- pandas.core.groupby.DataFrameGroupBy.__iter__
- pandas.core.groupby.SeriesGroupBy.__iter__
- pandas.core.groupby.DataFrameGroupBy.groups
- pandas.core.groupby.SeriesGroupBy.groups
- pandas.core.groupby.DataFrameGroupBy.indices
- pandas.core.groupby.SeriesGroupBy.indices
- pandas.core.groupby.DataFrameGroupBy.get_group
- pandas.core.groupby.SeriesGroupBy.get_group
- pandas.Grouper
- pandas.NamedAgg
- pandas.core.groupby.SeriesGroupBy.apply
- pandas.core.groupby.DataFrameGroupBy.apply
- pandas.core.groupby.SeriesGroupBy.agg
- pandas.core.groupby.DataFrameGroupBy.agg
- pandas.core.groupby.SeriesGroupBy.aggregate
- pandas.core.groupby.DataFrameGroupBy.aggregate
- pandas.core.groupby.SeriesGroupBy.transform
- pandas.core.groupby.DataFrameGroupBy.transform
- pandas.core.groupby.SeriesGroupBy.pipe
- pandas.core.groupby.DataFrameGroupBy.pipe
- pandas.core.groupby.DataFrameGroupBy.filter
- pandas.core.groupby.SeriesGroupBy.filter
- pandas.core.groupby.DataFrameGroupBy.all
- pandas.core.groupby.DataFrameGroupBy.any
- pandas.core.groupby.DataFrameGroupBy.bfill
- pandas.core.groupby.DataFrameGroupBy.corr
- pandas.core.groupby.DataFrameGroupBy.corrwith
- pandas.core.groupby.DataFrameGroupBy.count
- pandas.core.groupby.DataFrameGroupBy.cov
- pandas.core.groupby.DataFrameGroupBy.cumcount
- pandas.core.groupby.DataFrameGroupBy.cummax
- pandas.core.groupby.DataFrameGroupBy.cummin
- pandas.core.groupby.DataFrameGroupBy.cumprod
- pandas.core.groupby.DataFrameGroupBy.cumsum
- pandas.core.groupby.DataFrameGroupBy.describe
- pandas.core.groupby.DataFrameGroupBy.diff
- pandas.core.groupby.DataFrameGroupBy.ffill
- pandas.core.groupby.DataFrameGroupBy.fillna
- pandas.core.groupby.DataFrameGroupBy.first
- pandas.core.groupby.DataFrameGroupBy.head
- pandas.core.groupby.DataFrameGroupBy.idxmax
- pandas.core.groupby.DataFrameGroupBy.idxmin
- pandas.core.groupby.DataFrameGroupBy.last
- pandas.core.groupby.DataFrameGroupBy.max
- pandas.core.groupby.DataFrameGroupBy.mean
- pandas.core.groupby.DataFrameGroupBy.median
- pandas.core.groupby.DataFrameGroupBy.min
- pandas.core.groupby.DataFrameGroupBy.ngroup
- pandas.core.groupby.DataFrameGroupBy.nth
- pandas.core.groupby.DataFrameGroupBy.nunique
- pandas.core.groupby.DataFrameGroupBy.ohlc
- pandas.core.groupby.DataFrameGroupBy.pct_change
- pandas.core.groupby.DataFrameGroupBy.prod
- pandas.core.groupby.DataFrameGroupBy.quantile
- pandas.core.groupby.DataFrameGroupBy.rank
- pandas.core.groupby.DataFrameGroupBy.resample
- pandas.core.groupby.DataFrameGroupBy.rolling
- pandas.core.groupby.DataFrameGroupBy.sample
- pandas.core.groupby.DataFrameGroupBy.sem
- pandas.core.groupby.DataFrameGroupBy.shift
- pandas.core.groupby.DataFrameGroupBy.size
- pandas.core.groupby.DataFrameGroupBy.skew
- pandas.core.groupby.DataFrameGroupBy.std
- pandas.core.groupby.DataFrameGroupBy.sum
- pandas.core.groupby.DataFrameGroupBy.var
- pandas.core.groupby.DataFrameGroupBy.tail
- pandas.core.groupby.DataFrameGroupBy.take
- pandas.core.groupby.DataFrameGroupBy.value_counts
- pandas.core.groupby.SeriesGroupBy.all
- pandas.core.groupby.SeriesGroupBy.any
- pandas.core.groupby.SeriesGroupBy.bfill
- pandas.core.groupby.SeriesGroupBy.corr
- pandas.core.groupby.SeriesGroupBy.count
- pandas.core.groupby.SeriesGroupBy.cov
- pandas.core.groupby.SeriesGroupBy.cumcount
- pandas.core.groupby.SeriesGroupBy.cummax
- pandas.core.groupby.SeriesGroupBy.cummin
- pandas.core.groupby.SeriesGroupBy.cumprod
- pandas.core.groupby.SeriesGroupBy.cumsum
- pandas.core.groupby.SeriesGroupBy.describe
- pandas.core.groupby.SeriesGroupBy.diff
- pandas.core.groupby.SeriesGroupBy.ffill
- pandas.core.groupby.SeriesGroupBy.fillna
- pandas.core.groupby.SeriesGroupBy.first
- pandas.core.groupby.SeriesGroupBy.head
- pandas.core.groupby.SeriesGroupBy.last
- pandas.core.groupby.SeriesGroupBy.idxmax
- pandas.core.groupby.SeriesGroupBy.idxmin
- pandas.core.groupby.SeriesGroupBy.is_monotonic_increasing
- pandas.core.groupby.SeriesGroupBy.is_monotonic_decreasing
- pandas.core.groupby.SeriesGroupBy.max
- pandas.core.groupby.SeriesGroupBy.mean
- pandas.core.groupby.SeriesGroupBy.median
- pandas.core.groupby.SeriesGroupBy.min
- pandas.core.groupby.SeriesGroupBy.ngroup
- pandas.core.groupby.SeriesGroupBy.nlargest
- pandas.core.groupby.SeriesGroupBy.nsmallest
- pandas.core.groupby.SeriesGroupBy.nth
- pandas.core.groupby.SeriesGroupBy.nunique
- pandas.core.groupby.SeriesGroupBy.unique
- pandas.core.groupby.SeriesGroupBy.ohlc
- pandas.core.groupby.SeriesGroupBy.pct_change
- pandas.core.groupby.SeriesGroupBy.prod
- pandas.core.groupby.SeriesGroupBy.quantile
- pandas.core.groupby.SeriesGroupBy.rank
- pandas.core.groupby.SeriesGroupBy.resample
- pandas.core.groupby.SeriesGroupBy.rolling
- pandas.core.groupby.SeriesGroupBy.sample
- pandas.core.groupby.SeriesGroupBy.sem
- pandas.core.groupby.SeriesGroupBy.shift
- pandas.core.groupby.SeriesGroupBy.size
- pandas.core.groupby.SeriesGroupBy.skew
- pandas.core.groupby.SeriesGroupBy.std
- pandas.core.groupby.SeriesGroupBy.sum
- pandas.core.groupby.SeriesGroupBy.var
- pandas.core.groupby.SeriesGroupBy.tail
- pandas.core.groupby.SeriesGroupBy.take
- pandas.core.groupby.SeriesGroupBy.value_counts
- pandas.core.groupby.DataFrameGroupBy.boxplot
- pandas.core.groupby.DataFrameGroupBy.hist
- pandas.core.groupby.SeriesGroupBy.hist
- pandas.core.groupby.DataFrameGroupBy.plot
- pandas.core.groupby.SeriesGroupBy.plot
Learning pandas' groupby functionality can be approached systematically by grouping related methods together. Here's a structured way to learn these methods, organized by functionality and complexity:
1. Basic GroupBy Mechanics
Start with understanding how to create groups and inspect them:
- DataFrameGroupBy.__iter__, SeriesGroupBy.__iter__ (iteration over groups)
- DataFrameGroupBy.groups, SeriesGroupBy.groups (dictionary of groups)
- DataFrameGroupBy.indices, SeriesGroupBy.indices (group indices)
- DataFrameGroupBy.get_group, SeriesGroupBy.get_group (extract a specific group)
- pandas.Grouper (flexible grouping rules, e.g., time-based)
- pandas.NamedAgg (for named aggregation, e.g., agg(new_col=('old_col', 'sum')))
2. Core Aggregation Methods
Learn how to compute summaries/statistics on groups:
- Basic Aggregations:
  - count(), size(), sum(), mean(), median(), min(), max()
  - nunique(), first(), last(), prod(), var(), std(), sem(), skew()
- Advanced Aggregations:
  - agg() / aggregate() (flexible single/multiple operations)
  - describe() (summary stats)
  - quantile() (percentiles)
  - value_counts() (frequency counts)
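A minimal sketch of these aggregations, assuming a small hypothetical DataFrame (the names scores, team, and points are invented for illustration):

```python
import pandas as pd

# Hypothetical data, invented for illustration
scores = pd.DataFrame({
    "team": ["A", "A", "B", "B", "B"],
    "points": [10, 20, 5, 15, 40],
})

by_team = scores.groupby("team")["points"]

totals = by_team.sum()                          # A -> 30, B -> 60
sizes = by_team.size()                          # A -> 2,  B -> 3
summary = by_team.agg(["min", "max", "mean"])   # several statistics at once

print(totals)
print(summary)
```

Each call collapses a group to a single row, so the results are indexed by team rather than by the original row labels.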
3. Group-Wise Transformations
Methods that return data aligned with the original input:
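- transform() (apply a function to each group, returning a result with the same shape as the input)
- rank() (rank of each value within its group)
- cumsum(), cumprod(), cummin(), cummax() (cumulative operations)
- pct_change(), diff() (group-wise differences/percent changes)
- shift() (shift values within each group)
- bfill(), ffill() (filling missing data within groups)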
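To illustrate what "aligned with the original input" means, here is a short sketch on hypothetical data (scores, team, and points are invented names). Every result has one value per original row:

```python
import pandas as pd

# Hypothetical data, invented for illustration
scores = pd.DataFrame({
    "team": ["A", "A", "B", "B", "B"],
    "points": [10, 20, 5, 15, 40],
})
by_team = scores.groupby("team")["points"]

scores["team_mean"] = by_team.transform("mean")   # group mean repeated on every row
scores["running_total"] = by_team.cumsum()        # cumulative sum restarts per group
scores["previous"] = by_team.shift(1)             # previous value within the same group

print(scores)
```

The first row of each group has no predecessor, so previous is NaN there.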
4. Filtering Groups
Select subsets of groups based on conditions:
- filter() (keep groups that meet a criterion)
- head(), tail(), nth() (select rows within groups)
- sample() (random sampling from groups)
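A brief sketch of the difference between dropping whole groups and selecting rows inside each group, again on hypothetical data (scores, team, and points are invented names):

```python
import pandas as pd

# Hypothetical data, invented for illustration
scores = pd.DataFrame({
    "team": ["A", "A", "B", "B", "B"],
    "points": [10, 20, 5, 15, 40],
})
grouped = scores.groupby("team")

# filter(): keep every row of the groups that pass the test (here: 3+ rows)
big_teams = grouped.filter(lambda g: len(g) >= 3)

# head(): keep the first n rows of every group, with the original index
first_rows = grouped.head(1)

print(big_teams)
print(first_rows)
```

Both results are subsets of the original rows; nothing is aggregated.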
5. Specialized Operations
- Index/Position Handling:
  - idxmin(), idxmax() (return index of min/max)
  - ngroup() (group IDs)
  - cumcount() (counter within each group)
- Time-Series Specific:
  - resample() (time-based regrouping)
  - rolling() (rolling window operations)
- Correlation/Covariance:
  - corr(), cov(), corrwith() (group-wise relationships)
- Unique Values:
  - unique() (unique values per group)
  - nlargest(), nsmallest() (top/bottom values per group)
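A small sketch of the index/position helpers and nlargest(), assuming hypothetical data (scores, team, and points are invented names):

```python
import pandas as pd

# Hypothetical data, invented for illustration
scores = pd.DataFrame({
    "team": ["A", "A", "B", "B", "B"],
    "points": [10, 20, 5, 15, 40],
})
grouped = scores.groupby("team")

group_id = grouped.ngroup()                 # 0 for team A rows, 1 for team B rows
row_in_group = grouped.cumcount()           # 0, 1, ... restarting in each group
best_row = grouped["points"].idxmax()       # index label of the max in each group
top_one = grouped["points"].nlargest(1)     # largest value per group (SeriesGroupBy)

print(best_row)
print(top_one)
```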
6. Visualization
- plot(), hist(), boxplot() (visualizing grouped data)
7. Advanced Workflow Tools
- apply() (flexible apply-any-function)
- pipe() (method chaining with groupby)
- ohlc() (open-high-low-close for financial data)
Suggested Learning Order
- Basic Mechanics (Section 1) → How groups are formed and inspected.
- Aggregations (Section 2) → Summarizing data (most common use case).
- Transformations (Section 3) → Modify data while preserving shape.
- Filtering (Section 4) → Subset groups.
- Specialized Ops (Section 5) → As needed for your use case.
- Visualization/Advanced (Sections 6–7) → For deeper analysis.
Key Notes
- agg() / aggregate() are interchangeable; agg() is more commonly used.
- apply() is powerful but slower; prefer built-in methods (e.g., sum()) when possible.
- transform() vs. agg(): Use transform to broadcast results back to the original rows.
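These three notes can be checked on a tiny example. The sketch below uses an Animal/Speed DataFrame; the variable names are illustrative only:

```python
import pandas as pd

df = pd.DataFrame({
    "Animal": ["Dog", "Cat", "Dog", "Cat"],
    "Speed": [40, 30, 35, 25],
})
speeds = df.groupby("Animal")["Speed"]

# agg and aggregate are the same method under two names
same_result = speeds.agg("max").equals(speeds.aggregate("max"))

# apply with a custom function gives the same numbers as the built-in,
# but calls the Python function once per group
via_apply = speeds.apply(lambda s: s.sum())
via_builtin = speeds.sum()

# agg returns one row per group; transform returns one row per input row
per_group = speeds.agg("max")        # length 2, indexed by Animal
per_row = speeds.transform("max")    # length 4, indexed like df

print(per_group)
print(per_row)
```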
Pandas GroupBy: Basic Mechanics Study Notes
Understanding how to create, inspect, and extract groups in pandas.
1. DataFrameGroupBy.__iter__ & SeriesGroupBy.__iter__
Purpose: Iterate over groups in a GroupBy object.
Syntax:
```python
for name, group in df.groupby('column'):
    # name = group key (e.g., unique value in 'column')
    # group = DataFrame/Series for that group
    ...  # process the group here
```
Example:
```python
import pandas as pd

df = pd.DataFrame({
    'Animal': ['Dog', 'Cat', 'Dog', 'Cat'],
    'Speed': [40, 30, 35, 25]
})

for animal, group in df.groupby('Animal'):
    print(f"Animal: {animal}")
    print(group)
```
Output:
```
Animal: Cat
  Animal  Speed
1    Cat     30
3    Cat     25
Animal: Dog
  Animal  Speed
0    Dog     40
2    Dog     35
```
When to Use:
- When you need to manually process each group (e.g., custom aggregations, filtering).
- Useful for debugging or inspecting groups before applying operations.
Note:
- Prefer built-in GroupBy methods (e.g., agg(), transform()) for efficiency.
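As a rough illustration of that note, the sketch below computes the same per-group mean twice: once with a manual loop and once with the built-in method. The variable names manual and builtin are illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    "Animal": ["Dog", "Cat", "Dog", "Cat"],
    "Speed": [40, 30, 35, 25],
})

# Manual iteration: one Python-level step per group
manual = {}
for animal, group in df.groupby("Animal"):
    manual[animal] = group["Speed"].mean()

# Built-in aggregation: the same result in a single call
builtin = df.groupby("Animal")["Speed"].mean()

print(manual)             # {'Cat': 27.5, 'Dog': 37.5}
print(builtin.to_dict())  # {'Cat': 27.5, 'Dog': 37.5}
```

The loop is still handy when each group needs logic that has no built-in equivalent.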
2. DataFrameGroupBy.groups & SeriesGroupBy.groups
Purpose: Returns a dictionary mapping each group key to the row labels (index values) of that group.
Syntax:
```python
grouped = df.groupby('column')
grouped.groups  # Returns {group_key: Index of row labels}
```
Example:
```python
grouped = df.groupby('Animal')
print(grouped.groups)
```
Output:
```
{'Cat': [1, 3], 'Dog': [0, 2]}
```
When to Use:
- To inspect which rows belong to each group.
- Useful for debugging or when you need direct access to group indices.
Note:
- The keys are the unique values in the grouping column.
- The values are pandas Index objects holding row labels. They match row positions only when the DataFrame has the default integer index, as in this example.
3. DataFrameGroupBy.indices & SeriesGroupBy.indices
Purpose: Similar to .groups, but returns a dictionary of NumPy arrays holding integer row positions rather than row labels.
Syntax:
```python
grouped = df.groupby('column')
grouped.indices  # Returns {group_key: np.array(positions)}
```
Example:
```python
print(grouped.indices)
```
Output:
```
{'Cat': array([1, 3]), 'Dog': array([0, 2])}
```
When to Use:
- When you need integer positions, e.g., for .iloc or for indexing into NumPy arrays.
- If you need indices for further NumPy-based computations.
Note:
- Looks almost identical to .groups on a default integer index, but .indices always gives positions (as arrays) while .groups gives labels.
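The difference between .groups and .indices only shows up when the index is not the default 0, 1, 2, ... range. A minimal sketch, assuming hypothetical string row labels 'w' to 'z':

```python
import pandas as pd

df = pd.DataFrame(
    {"Animal": ["Dog", "Cat", "Dog", "Cat"], "Speed": [40, 30, 35, 25]},
    index=["w", "x", "y", "z"],   # hypothetical non-default row labels
)
grouped = df.groupby("Animal")

labels = grouped.groups       # values are row labels
positions = grouped.indices   # values are integer positions

print(list(labels["Dog"]))    # ['w', 'y']
print(positions["Dog"])       # [0 2]

# Labels pair with .loc, positions pair with .iloc; both select the same rows
same_rows = df.loc[labels["Dog"]].equals(df.iloc[positions["Dog"]])
print(same_rows)              # True
```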
4. DataFrameGroupBy.get_group & SeriesGroupBy.get_group
Purpose: Extract a single group as a DataFrame or Series.
Syntax:
```python
grouped = df.groupby('column')
grouped.get_group('group_name')
```
Example:
```python
dog_group = grouped.get_group('Dog')
print(dog_group)
```
Output:
```
  Animal  Speed
0    Dog     40
2    Dog     35
```
When to Use:
- When you need to work with a specific subset of data.
- Useful for debugging or isolating a group for further analysis.
Note:
- Raises KeyError if the group does not exist.
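One way to guard against a missing key is sketched below. The helper get_group_or_none is hypothetical (not part of pandas) and simply wraps the KeyError:

```python
import pandas as pd

df = pd.DataFrame({
    "Animal": ["Dog", "Cat", "Dog", "Cat"],
    "Speed": [40, 30, 35, 25],
})
grouped = df.groupby("Animal")


def get_group_or_none(grouped, key):
    """Hypothetical helper: return the group for key, or None if it is absent."""
    try:
        return grouped.get_group(key)
    except KeyError:
        return None


dogs = get_group_or_none(grouped, "Dog")     # DataFrame with two rows
birds = get_group_or_none(grouped, "Bird")   # None, no such group

# Alternatively, test membership up front
has_bird = "Bird" in grouped.groups          # False
```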
5. pandas.Grouper
Purpose: Flexible grouping (especially for time-based grouping).
Syntax:
```python
df.groupby(pd.Grouper(key='date_column', freq='D'))  # Daily grouping
```
Example:
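```python
# Separate names so the Animal/Speed df and grouped from earlier stay available
ts_df = pd.DataFrame({
    'Date': pd.date_range('2023-01-01', periods=4, freq='D'),
    'Value': [10, 20, 15, 25]
})
two_day_totals = ts_df.groupby(pd.Grouper(key='Date', freq='2D')).sum()
print(two_day_totals)
```
Output: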
```
            Value
Date
2023-01-01     30  # Jan 1 + Jan 2
2023-01-03     40  # Jan 3 + Jan 4
```
When to Use:
- For time-based resampling (e.g., daily, weekly, monthly).
- When grouping by a column with a custom frequency.
Note:
- Similar to resample(), but works inside groupby().
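Because pd.Grouper works inside groupby(), it can be combined with ordinary column keys. A minimal sketch, assuming a hypothetical sales table with a Store column (all names and values are invented for illustration):

```python
import pandas as pd

# Hypothetical data, invented for illustration
sales = pd.DataFrame({
    "Date": pd.to_datetime([
        "2023-01-01", "2023-01-02", "2023-01-01", "2023-01-03", "2023-01-04",
    ]),
    "Store": ["N", "N", "S", "N", "S"],
    "Value": [10, 20, 5, 15, 25],
})

# Group by store and by 2-day time bins in a single groupby call
totals = sales.groupby(
    ["Store", pd.Grouper(key="Date", freq="2D")]
)["Value"].sum()

print(totals)
```

The result is indexed by (Store, start of the 2-day bin), so each store gets its own totals per time window.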
6. pandas.NamedAgg
Purpose: Named aggregation (clear column naming in agg()).
Syntax:
```python
df.groupby('column').agg(
    new_name=pd.NamedAgg(column='col', aggfunc='sum')
)
# Or (shorthand):
df.groupby('column').agg(new_name=('col', 'sum'))
```
Example:
```python
result = df.groupby('Animal').agg(
    Avg_Speed=('Speed', 'mean'),
    Max_Speed=('Speed', 'max')
)
print(result)
```
Output:
```
        Avg_Speed  Max_Speed
Animal
Cat          27.5         30
Dog          37.5         40
```
When to Use:
- When you need readable column names after aggregation.
- An alternative to the dict syntax ({'col': ['mean', 'sum']}), which still works but produces MultiIndex columns.
Note:
- Introduced in pandas 0.25 for cleaner aggregation syntax.
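The sketch below compares the explicit pd.NamedAgg form, the tuple shorthand, and the dict-of-lists form on the same Animal/Speed data; the variable names are illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    "Animal": ["Dog", "Cat", "Dog", "Cat"],
    "Speed": [40, 30, 35, 25],
})

explicit = df.groupby("Animal").agg(
    Avg_Speed=pd.NamedAgg(column="Speed", aggfunc="mean"),
    Max_Speed=pd.NamedAgg(column="Speed", aggfunc="max"),
)
shorthand = df.groupby("Animal").agg(
    Avg_Speed=("Speed", "mean"),
    Max_Speed=("Speed", "max"),
)
# Dict-of-lists: same numbers, but two-level column labels
nested = df.groupby("Animal").agg({"Speed": ["mean", "max"]})

print(explicit.equals(shorthand))   # True
print(list(nested.columns))         # [('Speed', 'mean'), ('Speed', 'max')]
```

The named forms give flat, self-describing column names, which is usually easier to work with downstream.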
Summary Table
| Method/Class | Purpose | Example Use Case |
| --- | --- | --- |
| __iter__ | Loop over groups | Custom per-group processing |
| .groups | Get row labels per group | Debugging group memberships |
| .indices | Get integer row positions per group (as arrays) | Positional indexing (iloc, NumPy) |
| get_group() | Extract a single group | Isolating a subset |
| pd.Grouper | Time-based grouping | Resampling by day/week |
| pd.NamedAgg | Named aggregations | Clean column naming in agg() |
Best Practices
✅ Use get_group() to inspect a single group.
✅ Use .indices when you need integer positions (e.g., for iloc); use .groups when you need row labels.
✅ Use pd.Grouper for time-series grouping.
✅ Use NamedAgg for readable aggregation results.