Pandas aggregate()

The aggregate() method in Pandas applies one or more summary functions, such as sum, mean, or max, to a Series or DataFrame and returns the reduced result. It is often used on grouped data.

Example

import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3],
                   'B': [4, 5, 6]})

# apply sum to each column
result = df.aggregate('sum')  

print(result)

'''
Output

A     6
B    15
dtype: int64

'''

aggregate() Syntax

The syntax of the aggregate() method in Pandas is:

df.aggregate(func, axis=0, *args, **kwargs)
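The method is also available under the shorter name agg(), which is an alias of aggregate(). A quick check on the small DataFrame from the first example shows that both spellings give the same result:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3],
                   'B': [4, 5, 6]})

# agg() is an alias of aggregate(), so both calls compute the same column sums
long_form = df.aggregate('sum')
short_form = df.agg('sum')

print(long_form.equals(short_form))  # True
```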

aggregate() Arguments

The aggregate() method takes the following arguments:

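  • func - the aggregation to apply: a function (such as sum), a string naming a function (such as 'sum' or 'mean'), a list of functions or names, or a dictionary that maps column labels to any of these.
  • axis - 0 or 'index' (the default) applies func to each column; 1 or 'columns' applies func to each row.
  • *args and **kwargs - additional positional and keyword arguments that are passed on to func.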
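The sketch below illustrates the accepted forms of func on the same small DataFrame. The helper count_above() is a hypothetical function defined only for this illustration; it shows how an extra keyword argument (threshold) is forwarded through **kwargs to func.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3],
                   'B': [4, 5, 6]})

# func as a string: one result per column (a Series)
totals = df.aggregate('sum')

# func as a list: one row per function, one column per original column
summary = df.aggregate(['sum', 'min'])

# func as a dict: a different function for each column
per_column = df.aggregate({'A': 'max', 'B': 'min'})

# hypothetical helper used only for this example:
# counts how many values in a column are greater than threshold
def count_above(column, threshold):
    return (column > threshold).sum()

# func as a callable; threshold=2 is forwarded to count_above via **kwargs
above_two = df.aggregate(count_above, threshold=2)

print(totals, summary, per_column, above_two, sep="\n\n")
```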

aggregate() Return Value

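The aggregate() method returns a scalar, a Series, or a DataFrame, depending on the input and the aggregation functions specified: a single function applied to a Series usually returns a scalar, a single function applied to a DataFrame returns a Series, and a list of functions applied to a DataFrame returns a DataFrame.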
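A short check of the three return types, assuming the same small DataFrame as in the first example:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3],
                   'B': [4, 5, 6]})

scalar_result = df['A'].aggregate('sum')      # single function on a Series
series_result = df.aggregate('sum')           # single function on a DataFrame
frame_result = df.aggregate(['sum', 'mean'])  # several functions on a DataFrame

print(pd.api.types.is_scalar(scalar_result))  # True
print(type(series_result).__name__)           # Series
print(type(frame_result).__name__)            # DataFrame
```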


Example 1: Apply Single Aggregate Function

import pandas as pd

# create a DataFrame with 'Region' and 'Sales' columns
data = {
    'Region': ['East', 'West', 'East', 'North', 'West', 'East', 'North', 'West'],
    'Sales': [100, 200, 150, 120, 250, 175, 100, 300]
}

df = pd.DataFrame(data)

# calculate total sum of the Sales column
total_sales_sum = df['Sales'].aggregate('sum')
print("Total Sales Sum:", total_sales_sum)

# calculate the mean of the Sales column
average_sales = df['Sales'].aggregate('mean')
print("Average Sales:", average_sales)

# calculate the maximum value in the Sales column
max_sales = df['Sales'].aggregate('max')
print("Maximum Sales:", max_sales)

Output

Total Sales Sum: 1395
Average Sales: 174.375
Maximum Sales: 300

Here,

  • df['Sales'].aggregate('sum') - calculates the total sum of the Sales column in the df DataFrame.
  • df['Sales'].aggregate('mean') - calculates the mean (average) of the Sales column in the df DataFrame.
  • df['Sales'].aggregate('max') - computes the maximum value in the Sales column.
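The three calls above can also be combined into one: passing a list of function names to aggregate() on the Sales column returns a Series indexed by the function names. A minimal sketch reusing the Example 1 data:

```python
import pandas as pd

data = {
    'Region': ['East', 'West', 'East', 'North', 'West', 'East', 'North', 'West'],
    'Sales': [100, 200, 150, 120, 250, 175, 100, 300]
}
df = pd.DataFrame(data)

# one call, three aggregations; the result is a Series indexed by function name
sales_stats = df['Sales'].aggregate(['sum', 'mean', 'max'])
print(sales_stats)
```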

Example 2: Apply Multiple Aggregate Functions in Pandas

import pandas as pd

# create a DataFrame
data = {
    'Product': ['Widget', 'Widget', 'Gadget', 'Gadget', 'Widget', 'Gadget'],
    'Sales': [240, 350, 560, 470, 680, 590]
}

df = pd.DataFrame(data)

# group by the 'Product' column and aggregate the 'Sales' column
result = df.groupby('Product')['Sales'].agg(['sum', 'mean', 'max', 'min'])
print(result)

Output

          sum        mean  max  min
Product                            
Gadget   1620  540.000000  590  470
Widget   1270  423.333333  680  240

In the above example, we use agg(), an alias of aggregate(), to apply multiple aggregation functions (sum, mean, max, and min) to the Sales column after grouping by the Product column.

The resulting DataFrame has one row per product and one column per aggregation function.
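As an alternative, groupby().agg() also accepts named aggregation, where each keyword becomes an output column and its value is a (column, function) tuple. The output names total and average below are arbitrary choices for this sketch:

```python
import pandas as pd

data = {
    'Product': ['Widget', 'Widget', 'Gadget', 'Gadget', 'Widget', 'Gadget'],
    'Sales': [240, 350, 560, 470, 680, 590]
}
df = pd.DataFrame(data)

# each keyword names an output column: (input column, aggregation function)
named = df.groupby('Product').agg(
    total=('Sales', 'sum'),
    average=('Sales', 'mean'),
)
print(named)
```

This avoids renaming columns afterwards, since the output labels are chosen in the call itself.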


Example 3: Apply Different Aggregation Functions

import pandas as pd

data = {
    'Type': ['X', 'X', 'Y', 'Y', 'X', 'Y'],
    'Quantity': [100, 150, 200, 250, 300, 350],
    'Price': [20, 30, 40, 50, 60, 70]
}

# create the DataFrame
df = pd.DataFrame(data)

# define aggregation functions for each column
agg_funcs = {
    # apply 'sum' to the Quantity column
    'Quantity': 'sum',
    # apply 'mean' and 'max' to the Price column
    'Price': ['mean', 'max']
}

# group by the 'Type' column and aggregate
result = df.groupby('Type').aggregate(agg_funcs)
print(result)

Output

     Quantity      Price    
          sum       mean max
Type                        
X         550  36.666667  60
Y         800  53.333333  70

Here, we use the aggregate() method with a dictionary to apply different aggregation functions to different columns after grouping by the Type column: sum to Quantity, and both mean and max to Price.

The resulting DataFrame has two levels of column labels: the top level names the original column (Quantity or Price), and the second level names the aggregation function applied to it.
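Because the result has two-level column labels, it is sometimes convenient to flatten them into single strings. One common approach, sketched here on the Example 3 data, joins each (column, function) pair with an underscore; the underscore naming scheme is just a convention chosen for this example:

```python
import pandas as pd

data = {
    'Type': ['X', 'X', 'Y', 'Y', 'X', 'Y'],
    'Quantity': [100, 150, 200, 250, 300, 350],
    'Price': [20, 30, 40, 50, 60, 70]
}
df = pd.DataFrame(data)

agg_funcs = {'Quantity': 'sum', 'Price': ['mean', 'max']}
result = df.groupby('Type').aggregate(agg_funcs)

# result.columns is a MultiIndex of (column, function) tuples,
# e.g. ('Price', 'mean'); join each pair into one label
result.columns = ['_'.join(pair) for pair in result.columns]
print(result)
```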


Example 4: Use of axis Argument

import pandas as pd

data = {
    'Value1': [10, 15, 20, 25, 30, 35],
    'Value2': [5, 8, 12, 15, 18, 21]
}

df = pd.DataFrame(data)

# apply the sum function column-wise (down the rows)
column_sum = df.aggregate('sum', axis=0)
print("Column-wise sum:")
print(column_sum)
print()

# apply the sum function row-wise (across the columns)
row_sum = df.aggregate('sum', axis=1)
print("Row-wise sum:")
print(row_sum)

Output

Column-wise sum:
Value1    135
Value2     79
dtype: int64

Row-wise sum:
0    15
1    23
2    32
3    40
4    48
5    56
dtype: int64

In the above example,

  1. column_sum (axis=0) computes the sum of the values within each column individually. For the column Value1, it adds up the numbers 10, 15, 20, 25, 30, and 35 to get 135.
  2. row_sum (axis=1) calculates the sum of the values across each row. For the first row, it adds the values in Value1 and Value2, which are 10 and 5, respectively, to get 15.
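The axis argument also accepts the names 'index' and 'columns' in place of 0 and 1, and it works with lists of functions too. A short sketch on the Example 4 data computes each row's sum, minimum, and maximum:

```python
import pandas as pd

data = {
    'Value1': [10, 15, 20, 25, 30, 35],
    'Value2': [5, 8, 12, 15, 18, 21]
}
df = pd.DataFrame(data)

# axis='columns' is equivalent to axis=1: aggregate across each row
row_sum = df.aggregate('sum', axis='columns')

# a list of functions with axis=1 gives one output column per function
row_range = df.aggregate(['min', 'max'], axis=1)

print(row_sum.tolist())  # [15, 23, 32, 40, 48, 56]
print(row_range)
```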