Pandas apply()

The apply() method in Pandas applies a function along an axis of a DataFrame or to each value of a Series.

Example

import pandas as pd

# create a Pandas Series
data = pd.Series([1, 2, 3, 4, 5])

# define a function to square each element
def square(x):
    return x ** 2

# apply the square function to each element of the Series
result = data.apply(square)
print(result)

'''
Output
0     1
1     4
2     9
3    16
4    25
dtype: int64
'''

apply() Syntax

The syntax of the apply() method in Pandas is:

apply(func, axis=0, args=(), **kwargs)

apply() Arguments

The apply() method takes the following arguments:

  • func - the function to apply
  • axis (optional, DataFrame only) - the axis along which the function is applied: 0 passes each column to the function, 1 passes each row
  • args and **kwargs (optional) - a tuple of additional positional arguments and any additional keyword arguments to pass to the function
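As a minimal sketch of how extra arguments reach the function: in pandas, additional positional arguments are supplied through the args tuple and keyword arguments are passed by name. The helper scale_and_shift() and its parameters factor and offset are hypothetical names used only for illustration.

```python
import pandas as pd

data = pd.Series([1, 2, 3])

# hypothetical helper: multiply a value by factor, then add offset
def scale_and_shift(x, factor, offset=0):
    return x * factor + offset

# factor is passed positionally through the args tuple,
# offset is passed as a keyword argument
result = data.apply(scale_and_shift, args=(10,), offset=1)
print(result)
```

Here each element x is passed to scale_and_shift(x, 10, offset=1), so the values 1, 2, 3 become 11, 21, 31.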

apply() Return Value

The apply() method returns a new DataFrame or Series as a result of applying the specified function.
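Whether the result is a Series or a DataFrame depends on what the function returns. The following sketch, using a small made-up DataFrame df, shows both cases for a DataFrame with the default axis=0.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# a function that reduces each column to a single value gives a Series
col_max = df.apply(lambda col: col.max())

# a function that returns a same-length Series per column gives a DataFrame
doubled = df.apply(lambda col: col * 2)

print(type(col_max).__name__)
print(type(doubled).__name__)
```

The first call produces one value per column, and the second keeps the original shape of df.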


Example 1: Apply a Function to Each Element of a Pandas Series

import pandas as pd

# create a Pandas Series
data = pd.Series([1, 2, 3, 4, 5])

# define a function to add a constant value to each element
def add_constant(x):
    # add 10 to each element
    return x + 10

# apply the add_constant function to each element of the Series
result = data.apply(add_constant)
print(result)

Output

0    11
1    12
2    13
3    14
4    15
dtype: int64

In the above example, we have created a Pandas Series data containing the numbers from 1 to 5.

We then defined a function add_constant() that accepts x as an argument and adds the constant value 10 to it.

Then we used the apply() method to apply the add_constant() function to each element of the Series data.

Hence, the result is a new Series result where each element is the original value x plus 10.


Example 2: Apply a Function to Each Row of a DataFrame

import pandas as pd

data = pd.DataFrame({'A': [1, 2, 3],
                     'B': [4, 5, 6]})

# define a function to calculate the sum of each row
def sum_row(row):
    return row['A'] + row['B']

# apply the function row-wise (axis=1)
result = data.apply(sum_row, axis=1)
print(result)

Output

0    5
1    7
2    9
dtype: int64

In the above example, we have used the apply() method with axis=1 to apply the sum_row() function to each row of the DataFrame.

This means that the function will be applied horizontally, adding the A and B values for each row separately.

Hence, the output shows the sum of A and B for each row in the DataFrame. For example, in the first row, A is 1, B is 4, and the sum is 5.

Similarly, the sums for the other rows are calculated and displayed in the Series.
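For contrast, here is a short sketch of the same DataFrame processed along both axes. With axis=0 (the default) the function receives each column, and with axis=1 it receives each row. The variable names column_sums and row_sums are illustrative only.

```python
import pandas as pd

data = pd.DataFrame({'A': [1, 2, 3],
                     'B': [4, 5, 6]})

# axis=0: the function is called once per column
column_sums = data.apply(lambda col: col.sum(), axis=0)

# axis=1: the function is called once per row
row_sums = data.apply(lambda row: row.sum(), axis=1)

print(column_sums)
print(row_sums)
```

The column-wise result is indexed by the column labels A and B, while the row-wise result is indexed by the row labels 0, 1, and 2.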


Example 3: Using Lambda Functions with apply()

import pandas as pd

# create a Pandas Series
data = pd.Series([1, 2, 3, 4, 5])

# define a lambda function to square each element of the Series
square_function = lambda x: x ** 2

# apply the square_function to each element of the Series
result = data.apply(square_function)

# display the result
print(result)

Output

0     1
1     4
2     9
3    16
4    25
dtype: int64

In this example, we have defined a lambda function named square_function that accepts x as an argument and returns its square.

Then, we used the apply() method to apply the square_function to each element of the Series data.

This results in a new Series named result with squared values.
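The lambda does not need a name; it can also be passed to apply() directly. The sketch below does that and compares the result with the equivalent vectorized expression data ** 2, which is usually the faster choice for simple arithmetic like this.

```python
import pandas as pd

data = pd.Series([1, 2, 3, 4, 5])

# pass the lambda inline instead of assigning it to a variable first
result = data.apply(lambda x: x ** 2)

# the same computation written as a vectorized operation
vectorized = data ** 2

print(result.equals(vectorized))
```

apply() is most useful when the logic cannot be expressed with built-in vectorized operations.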


Example 4: Applying Functions to Grouped DataFrames

import pandas as pd

data = pd.DataFrame({'Category': ['A', 'B', 'A', 'B'],
                     'Value': [1, 2, 3, 4]})

# group the DataFrame by 'Category' and
# calculate the mean for each group
result = data.groupby('Category')['Value'].apply(lambda x: x.mean())
print(result)

Output

Category
A    2.0
B    3.0
Name: Value, dtype: float64

Here, we have a DataFrame named data with two columns: 'Category' and 'Value'. And we have grouped the DataFrame by the Category column.

Then, we used the apply() method with a lambda function to calculate the mean value for each group based on the Value column.

The result is a Pandas Series where the mean values are calculated separately for each unique Category.
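For a common statistic such as the mean, the built-in groupby method gives the same values without apply(). apply() becomes more useful for custom per-group logic. The sketch below shows both, where value_range() is a hypothetical helper that returns the difference between the largest and smallest value in a group.

```python
import pandas as pd

data = pd.DataFrame({'Category': ['A', 'B', 'A', 'B'],
                     'Value': [1, 2, 3, 4]})

grouped = data.groupby('Category')['Value']

# built-in aggregation, equivalent to apply(lambda x: x.mean())
means = grouped.mean()

# hypothetical helper: spread between the largest and smallest value
def value_range(group):
    return group.max() - group.min()

# custom logic applied to each group's values
ranges = grouped.apply(value_range)

print(means)
print(ranges)
```

Each call to value_range() receives the Value entries of one category as a Series, so group A yields 3 - 1 and group B yields 4 - 2.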