The diff() method in Pandas calculates the difference between each element of a DataFrame or Series and another element in the same DataFrame or Series (by default, the element in the previous row).
Example
import pandas as pd
# sample DataFrame
data = pd.DataFrame({'A': [1, 2, 4, 7, 11]})
# calculate the difference with the previous row
diff_data = data.diff()
print(diff_data)
'''
Output
A
0 NaN
1 1.0
2 2.0
3 3.0
4 4.0
'''
diff() Syntax
The syntax of the diff() method in Pandas is:
df.diff(periods=1, axis=0)
diff() Arguments
The diff() method in Pandas has the following arguments:
periods (optional): number of periods to shift when calculating the difference; the default is 1
axis (optional): whether to take the difference over rows (0, the default) or columns (1).
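As a small sketch of how the periods argument behaves beyond its default, the snippet below passes a negative value, which makes diff() compare each row with the row after it instead of the row before it. The variable name forward is only illustrative.

```python
import pandas as pd

# same sample DataFrame as in the first example
data = pd.DataFrame({'A': [1, 2, 4, 7, 11]})

# periods=-1 subtracts the NEXT row from each row,
# so the last row has no successor and becomes NaN
forward = data.diff(periods=-1)
print(forward)
```

With a negative value, the NaN moves from the top of the result to the bottom.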
diff() Return Value
The diff() method returns a DataFrame or Series that is the same size as the input, containing the calculated differences.
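To illustrate the return value, here is a minimal sketch that calls diff() on a Series and checks that the result keeps the input's type and shape. The names s and result are placeholders chosen for this example.

```python
import pandas as pd

# a Series instead of a DataFrame
s = pd.Series([1, 2, 4, 7, 11])

# diff() on a Series returns a Series of the same length
result = s.diff()

print(type(result).__name__)    # type of the returned object
print(result.shape == s.shape)  # compares the shapes of input and output
print(result)
```

The first position is NaN because it has no previous element to subtract.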
Example 1: Calculating Differences Over Columns
import pandas as pd
# sample DataFrame
data = pd.DataFrame({'A': [1, 3, 6, 10], 'B': [1, 5, 15, 35]})
# calculate the difference across columns
diff_data = data.diff(axis=1)
print(diff_data)
Output
A B
0 NaN 0
1 NaN 2
2 NaN 9
3 NaN 25
In this example, we compute the differences across columns.
NaN appears in the first column because diff() calculates the difference between each element and its predecessor, and the first column has no column before it.
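One way to check the column-wise result is to compare it with a manual subtraction of the two columns. The sketch below assumes the same two-column DataFrame as the example; manual is an illustrative name.

```python
import pandas as pd

# same sample DataFrame as in Example 1
data = pd.DataFrame({'A': [1, 3, 6, 10], 'B': [1, 5, 15, 35]})

# difference across columns
diff_data = data.diff(axis=1)

# the same numbers, computed by hand: column B minus column A
manual = data['B'] - data['A']

print((diff_data['B'] == manual).all())  # do the B values match B - A?
print(diff_data['A'].isna().all())       # is column A entirely NaN?
```

Column A has nothing to its left, so every value in it is missing, while column B holds B minus A for each row.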
Example 2: Non-default Periods
import pandas as pd
# sample DataFrame
data = pd.DataFrame({'A': [2, 4, 8, 16, 32]})
# calculate the difference with a period of 2 rows
diff_data = data.diff(periods=2)
print(diff_data)
Output
A
0 NaN
1 NaN
2 6.0
3 12.0
4 24.0
Here, we used diff() with a periods value of 2, meaning it calculates the difference between each element and the one two places before it. The first two rows are NaN because they have no element two places before them.
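The same result can be reproduced with shift(), which may help make the role of periods concrete. This is a sketch rather than a description of how Pandas implements diff() internally; via_diff and via_shift are illustrative names.

```python
import pandas as pd

# same sample DataFrame as in Example 2
data = pd.DataFrame({'A': [2, 4, 8, 16, 32]})

# difference with the element two rows earlier
via_diff = data.diff(periods=2)

# equivalent formulation: subtract a copy shifted down by two rows
via_shift = data - data.shift(2)

print(via_diff.equals(via_shift))  # compares values and NaN positions
```

Shifting by two rows leaves the first two positions empty, which is why those rows are NaN in both versions.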