Certification Courses

Created with over a decade of experience and thousands of feedback.

Learn Python

Learn HTML

Learn JavaScript

Learn SQL

Learn DSA

View all Courses on

Learn C

Learn C++

Learn Java

Pandas groupby()

The groupby() method in Pandas is used for grouping rows based on some columns and then performing an aggregation function.

Example

import pandas as pd

# creating a dictionary 
data = {
    'City': ['NY', 'LA', 'NY', 'LA', 'NY'],         
    'Temperature': [55, 78, 56, 76, 54],            
    'Humidity': [65, 50, 60, 49, 63]                
}

# convert dictionary to a dataframe named df
df = pd.DataFrame(data)

# group dataframe by City column and 
# calculate the mean for each group
grouped = df.groupby('City').mean()

# display the grouped and aggregated data
print(grouped)

'''
Output

     Temperature  Humidity
City                       
LA          77.0      49.5
NY          55.0      62.666667
'''

Run Code

groupby() Syntax

The syntax of the groupby() method in Pandas is:

df.groupby(by=None, axis=0, level=None, sort=True, as_index=True, dropna=True)

groupby() Arguments

The groupby() method takes following arguments:

by - determines how to group the data
axis (optional) - specifies if we want to group by rows or columns
level (optional) - determines which level to use for grouping
sort (optional) - sorts the grouped data
as_index (optional) - if True, group labels become indices; if False, they remain columns
dropna (optional) - exclude null values if True.

groupby() Return Value

The groupby() method returns an object which represents data grouped by a key(s).

Example 1: Grouping by a Single Column in Pandas

import pandas as pd

# create a dictionary containing the data
data = {'Genre': ['Fiction', 'Non-Fiction', 'Fiction', 'Non-Fiction', 'Fiction'],
        'BooksSold': [150, 90, 80, 60, 200]}

# create a DataFrame using the data dictionary
df = pd.DataFrame(data)

# group the DataFrame by the Genre column and
# calculate the total number of books sold for each genre
grouped = df.groupby('Genre')['BooksSold'].sum()

# print the grouped data
print(grouped)

Run Code

Output

Genre
Fiction        430
Non-Fiction    150
Name: BooksSold, dtype: int64

In the above example, df.groupby('Genre')['BooksSold'].sum() is used to group by a single column and calculate sum.

This line does the following:

df.groupby('Category') - groups the df DataFrame by the unique values in the Genre column.
[BooksSold] - specifies that we are interested in the 'BooksSold' column within each group.
.sum() - calculates the sum of the 'BookSold' values for each group.

Example 2: Use of axis Argument in groupby()

import pandas as pd

# create a sample DataFrame
data = {
    'A': [1, 2, 1, 2],
    'B': [3, 4, 3, 4],
    'C': [5, 6, 7, 8]
}

df = pd.DataFrame(data)

# group along rows (axis=0) based on column A
grouped_rows = df.groupby('A', axis=0)

# calculate the sum of each group
sum_rows = grouped_rows.sum()

print("Grouped along rows:")
print(sum_rows)

# group along columns (axis=1) based on index labels
grouped_cols = df.groupby(df.columns, axis=1)

# calculate the sum of each group
sum_cols = grouped_cols.sum()

print("\nGrouped along columns:")
print(sum_cols)

Run Code

Output

Grouped along rows:
A   B   C
       
1   6  12
2   8  14

Grouped along columns:
   A  B  C
0  1  3  5
1  2  4  6
2  1  3  7
3  2  4  8

In this example, we first create a DataFrame with columns A, B, and C.

We then use the groupby() method to group the data along rows based on the A column and along columns based on the index labels (columns A, B, and C).

The axis parameter is used to specify whether the grouping should be done along rows or columns.

Example 3: Use of level argument in groupby()

import pandas as pd

# create a DataFrame with a multi-level index
data = {'A': [1, 2, 3, 4, 5],
        'B': [10, 20, 30, 40, 50]}
index = pd.MultiIndex.from_tuples([('Group1', 'A'), ('Group1', 'B'), ('Group2', 'A'), ('Group2', 'B'), ('Group3', 'A')],
                                  names=['Group', 'Category'])
df = pd.DataFrame(data, index=index)

# display original DataFrame
print("Original DataFrame:")
print(df)

# group by the Group level and calculate the sum
grouped = df.groupby(level='Group').sum()

# display the result
print("\nGrouped by 'Group' level and summed:")
print(grouped)

Run Code

Output

Original DataFrame:
                  A   B
Group  Category        
Group1 A          1  10
       B          2  20
Group2 A          3  30
       B          4  40
Group3 A          5  50

Grouped by 'Group' level and summed:
        A   B
Group        
Group1  3  30
Group2  7  70
Group3  5  50

Here, we create a DataFrame with a multi-level index where the levels are named Group and Category.

We then use the groupby() method with the level argument set to Group to group the data by the Group level and calculate the sum for each group.

As a result, we get a new DataFrame with the groups as the index and the sums of columns A and B for each group.

Example 4: Sort the Grouped Data

import pandas as pd

# create a DataFrame
data = {'Category': ['A', 'B', 'A', 'B', 'A', 'B'],
        'Value': [10, 15, 20, 25, 30, 35]}
df = pd.DataFrame(data)

# group by Category column and sort the grouped data
grouped = df.groupby('Category', sort=True)

# calculate the mean of Value for each group
result = grouped['Value'].mean()

print(result)

Run Code

Output

Category
A    20.0
B    25.0
Name: Value, dtype: float64

In this example, we use the groupby() method to group the DataFrame by the Category column and specify sort=True, which means the groups will be sorted alphabetically based on the Category values.

Example 5: Use of as_index Argument in groupby()

The as_index() argument is used to specify whether grouping columns should be treated as index columns or not.

as_index=True - grouped columns become the index of the resulting DataFrame
as_index=False - grouped columns remain as regular columns in the resulting DataFrame

Let's look at an example.

import pandas as pd

data = {'Category': ['A', 'B', 'A', 'B', 'A'],
        'Value': [10, 20, 15, 25, 30]}

df = pd.DataFrame(data)

# group by Category and calculate the mean, with 'Category' as a regular column.
result1 = df.groupby('Category', as_index=False).mean()
print("When as_index=Flase:\n", result1)

print()

# group by Category and calculate the mean, with 'Category' becoming the index.
result2 = df.groupby('Category', as_index=True).mean()
print("\nWhen as_index=True:\n", result2)

Run Code

Output

When as_index=Flase:
    Category     Value
0        A       18.333333
1        B       22.500000

When as_index=True:
          Value
Category           
A         18.333333
B         22.500000

Here,

when as_index=False, the Category column is not set as the index of the resulting grouped DataFrame, and it remains a regular column
when as_index=True, the Category column becomes the index of the resulting grouped DataFrame

Example 6: Grouping by a Multiple Column in Pandas

import pandas as pd

# create a DataFrame with student data
data = {
    'Gender': ['Male', 'Female', 'Male', 'Female', 'Male'],
    'Grade': ['A', 'B', 'A', 'A', 'B'],
    'Score': [90, 85, 92, 88, 78],
    'Attendance': [95, 98, 92, 97, 88]
}

df = pd.DataFrame(data)

# define the aggregate functions to be applied to the Score and Attendance columns
agg_functions = {
    'Score': 'min',        # Calculate the minimum Score
    'Attendance': 'mean'  # Calculate the mean Attendance
}

# group the DataFrame by Gender and Grade, then apply the aggregate functions
grouped = df.groupby(['Gender', 'Grade']).aggregate(agg_functions)

# print the resulting grouped DataFrame
print(grouped)

Run Code

Output

               Score  Attendance
Gender  Grade                   
Female  A       88      97.0
        B       85      98.0
Male    A       90      93.5
        B       78      88.0

In the above example,

The DataFrame is grouped by Gender and Grade, creating multi-level row indices.
The Score column shows the minimum score for each combination of Gender and Grade.
The Attendance column shows the mean attendance for each combination of Gender and Grade.

Example 7: Use of dropna Argument in groupby()

The dropna argument specifies how the grouping operation should handle rows with missing values in the columns by which you are grouping your data.

When dropna=True - excludes rows with missing values in the grouping columns from the groups
When dropna=False - includes rows with missing values in the grouping columns in their own separate group

Let's look at an example.

import pandas as pd

data = {'Category': ['A', 'B', 'A', 'B', None],
        'Value': [10, 20, 15, 25, 30]}

df = pd.DataFrame(data)

# group by Category with dropna=True (default)
grouped_true = df.groupby('Category', dropna=True).mean()
print("With grouped_true: ")
print(grouped_true)
print()

# group by Category with dropna=False
grouped_false = df.groupby('Category', dropna=False).mean()
print("With grouped_false: ")
print(grouped_false)

Run Code

Output

With grouped_true: 
          Value
Category       
A          12.5
B          22.5
With grouped_false: 
           Value
Category       
A          12.5
B          22.5
NaN        30.0

Here, rows with None in the Category column are excluded from the grouping when dropna=True.

However, a separate group is created for the rows with None in the Category column when dropna=False.

Our premium learning platform, created with over a decade of experience and thousands of feedbacks.

Learn and improve your coding skills like never before.

Try Programiz PRO

Interactive Courses
Certificates
AI Help
2000+ Challenges

Popular Tutorials

Popular Examples

Reference Materials

Certification Courses

Learn Python practically
and Get Certified.

Popular Tutorials

Reference Materials

Popular Examples

Pandas groupby()

Example

groupby() Syntax

groupby() Arguments

groupby() Return Value

Example 1: Grouping by a Single Column in Pandas

Example 2: Use of axis Argument in groupby()

Example 3: Use of level argument in groupby()

Example 4: Sort the Grouped Data

Example 5: Use of as_index Argument in groupby()

Example 6: Grouping by a Multiple Column in Pandas

Example 7: Use of dropna Argument in groupby()

Popular Tutorials

Popular Examples

Reference Materials

Certification Courses

Learn Python practically and Get Certified.

Popular Tutorials

Reference Materials

Popular Examples

Pandas groupby()

Example

groupby() Syntax

groupby() Arguments

groupby() Return Value

Example 1: Grouping by a Single Column in Pandas

Example 2: Use of axis Argument in groupby()

Example 3: Use of level argument in groupby()

Example 4: Sort the Grouped Data

Example 5: Use of as_index Argument in groupby()

Example 6: Grouping by a Multiple Column in Pandas

Example 7: Use of dropna Argument in groupby()

Learn Python practically
and Get Certified.