Pandas diff()

The diff() method in Pandas calculates the difference between each element of a DataFrame or Series and another element, by default the one in the previous row.

Example

import pandas as pd

# sample DataFrame
data = pd.DataFrame({'A': [1, 2, 4, 7, 11]})

# calculate the difference with the previous row
diff_data = data.diff()

print(diff_data)

'''
Output

     A
0  NaN
1  1.0
2  2.0
3  3.0
4  4.0
'''

diff() Syntax

The syntax of the diff() method in Pandas is:

df.diff(periods=1, axis=0)

diff() Arguments

The diff() method in Pandas has the following arguments:

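  • periods (optional): number of positions to shift when calculating the difference. The default is 1; a negative value compares each element with a later one instead of an earlier one.
  • axis (optional): 0 (the default) takes the difference down the rows, 1 takes it across the columns. This argument applies to DataFrames only; Series.diff() accepts just periods.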
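To make the two arguments more concrete, here is a minimal sketch. The DataFrame and its values are made up for illustration; it shows a negative periods value and the 'columns' alias for axis=1.

```python
import pandas as pd

# hypothetical sample data, chosen only for illustration
data = pd.DataFrame({'A': [1, 2, 4, 7], 'B': [10, 20, 40, 70]})

# periods=-1: subtract the NEXT row from the current row,
# so the last row has nothing to compare with and becomes NaN
forward = data.diff(periods=-1)

# axis='columns' is an alias for axis=1
across = data.diff(axis='columns')

print(forward)
print(across)
```

With a negative periods value the NaN entries move from the start of the result to the end.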

diff() Return Value

The diff() method returns a DataFrame or Series with the same shape and index as the input, containing the calculated differences. Positions that have no element to compare against (the first row by default) contain NaN.
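The following sketch checks these properties on a small Series. The values are hypothetical and chosen only for illustration.

```python
import pandas as pd

# hypothetical Series used only for illustration
s = pd.Series([5, 8, 6, 10])

result = s.diff()

# the result keeps the length and index of the input
print(result.shape == s.shape)
print(result.index.equals(s.index))

# the first entry has no predecessor, so it is NaN;
# because NaN is a float, the integer input becomes a float result
print(result)
```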


Example 1: Calculating Differences Over Columns

import pandas as pd

# sample DataFrame
data = pd.DataFrame({'A': [1, 3, 6, 10], 'B': [1, 5, 15, 35]})

# calculate the difference across columns
diff_data = data.diff(axis=1)

print(diff_data)

Output

    A   B
0 NaN   0
1 NaN   2
2 NaN   9
3 NaN  25

In this example, we compute the differences across columns.

NaN appears in column A because, with axis=1, diff() subtracts from each element the value in the preceding column, and the first column has no column before it. Each value in column B is the value of B minus the value of A in the same row.


Example 2: Non-default Periods

import pandas as pd

# sample DataFrame
data = pd.DataFrame({'A': [2, 4, 8, 16, 32]})

# calculate the difference with a period of 2 rows
diff_data = data.diff(periods=2)

print(diff_data)

Output

      A
0   NaN
1   NaN
2   6.0
3  12.0
4  24.0

Here, we used diff() with periods=2, so it calculates the difference between each element and the one two rows before it. The first two rows have no such element, so they contain NaN.
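One way to see what periods does is to compare diff() with an explicit subtraction of a shifted copy. This is a sketch that reuses the data from this example; the names manual and same are introduced here for illustration.

```python
import pandas as pd

# same data as in the example above
data = pd.DataFrame({'A': [2, 4, 8, 16, 32]})

# built-in difference with a period of 2 rows
diff_data = data.diff(periods=2)

# manual equivalent: subtract a copy shifted down by 2 rows
manual = data - data.shift(2)

# True if both results match, including the NaN positions
same = diff_data.equals(manual)
print(same)
```

The leading NaN rows can be dropped with dropna() if only the computed differences are needed.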