Pandas cumsum()

The cumsum() method in Pandas computes the cumulative sum of elements along a given axis.

Example

import pandas as pd

# create a DataFrame
df = pd.DataFrame({
    'A': [10, 20, 30, 40],
    'B': [1, 2, 3, 4]
})

# compute cumulative sum along rows
df_cumsum = df.cumsum()

print(df_cumsum)

'''
Output
     A   B
0   10   1
1   30   3
2   60   6
3  100  10
'''

cumsum() Syntax

The syntax of the cumsum() method in Pandas is:

cumsum(axis=None, skipna=True, *args, **kwargs)

cumsum() Arguments

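The cumsum() method takes the following arguments:

  • axis (optional) - specifies the axis along which the cumulative sum is computed: 0 or 'index' (the default behavior) accumulates down each column, 1 or 'columns' accumulates across each row
  • skipna (optional) - specifies whether to exclude null (NaN) values; defaults to True
  • *args and **kwargs (optional) - additional positional and keyword arguments; they are accepted for compatibility with NumPy and have no effect on the result.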

cumsum() Return Value

The cumsum() method returns a DataFrame or Series of the same size as the original, containing the cumulative sum of elements along the given axis.
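As a small sketch of the return value, the snippet below calls cumsum() on a DataFrame and on a single column. The variable names df_result and series_result are chosen only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [10, 20, 30, 40],
    'B': [1, 2, 3, 4]
})

# called on a DataFrame, cumsum() returns a new DataFrame of the same shape
df_result = df.cumsum()
print(type(df_result).__name__, df_result.shape)

# called on a single column (a Series), it returns a Series of the same length
series_result = df['A'].cumsum()
print(type(series_result).__name__, series_result.tolist())
```

The original df is not modified; cumsum() builds a new object each time.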


Example 1: Get Cumulative Sum Using cumsum()

import pandas as pd

# create a DataFrame
df = pd.DataFrame({
    'Sales': [150, 200, 250, 300],
    'Expenses': [50, 60, 70, 80]
})

print("Original DataFrame:")
print(df)
print()

# compute the cumulative sum across rows (default behavior)
df_cumsum = df.cumsum()

print("DataFrame after cumsum:")
print(df_cumsum)

Output

Original DataFrame:
    Sales  Expenses
0    150        50
1    200        60
2    250        70
3    300        80

DataFrame after cumsum:
     Sales  Expenses
0    150        50
1    350       110
2    600       180
3    900       260

In the above example, we have created the df DataFrame that represents sales and expenses over four time periods.

The cumsum() method on this df DataFrame computes the running total down each of the two columns: Sales and Expenses. For instance, the Sales value in row 2 is 150 + 200 + 250 = 600.

Note: The cumsum() method is useful when we want to see the accumulated values over time.
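To illustrate accumulated values over time, here is a minimal sketch that reuses the sales data above to track a running net amount. The column names Net and Running Net are hypothetical and are not part of the original example.

```python
import pandas as pd

df = pd.DataFrame({
    'Sales': [150, 200, 250, 300],
    'Expenses': [50, 60, 70, 80]
})

# net amount for each period (hypothetical column name)
df['Net'] = df['Sales'] - df['Expenses']

# running total of the net amount across the periods
df['Running Net'] = df['Net'].cumsum()

print(df)
```

The last value of Running Net (640) equals total sales minus total expenses (900 - 260), which matches the final row of the cumsum() output above.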


Example 2: Compute Cumulative Sum Across Columns

import pandas as pd

# create a DataFrame
df = pd.DataFrame({
    'Price': [10, 20, 30],
    'Tax': [2, 4, 6],
    'Discount': [1, 2, 3]
})

print("Original DataFrame:")
print(df)
print()

# compute the cumulative sum across columns
df_cumsum_col = df.cumsum(axis=1)

print("\nDataFrame after cumulative sum over columns:")
print(df_cumsum_col)

Output

Original DataFrame:
   Price  Tax  Discount
0     10    2         1
1     20    4         2
2     30    6         3


DataFrame after cumulative sum over columns:
   Price  Tax  Discount
0     10   12        13
1     20   24        26
2     30   36        39

Here, we have used df.cumsum(axis=1) to compute the cumulative sum over the columns.

This means for each row, we're adding the values from left to right (across the columns).
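One way to check the left-to-right behavior is to compare the last column of the result with the plain row totals. This is only a quick sanity check, and the names row_cumsum and row_totals are chosen for this sketch.

```python
import pandas as pd

df = pd.DataFrame({
    'Price': [10, 20, 30],
    'Tax': [2, 4, 6],
    'Discount': [1, 2, 3]
})

# cumulative sum from left to right within each row
row_cumsum = df.cumsum(axis=1)

# plain total of each row
row_totals = df.sum(axis=1)

# the last column of the cumulative sum should hold each row's total
print((row_cumsum['Discount'] == row_totals).all())
```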


Example 3: Handle Missing Data with skipna

In pandas, the skipna parameter in cumsum() determines whether to exclude missing values when performing the cumulative sum operation.

Let's look at an example.

import pandas as pd

# create a DataFrame with missing values
df = pd.DataFrame({
    'A': [1, 2, None, 4],
    'B': [None, 2, 3, 4]
})

print("Original DataFrame:")
print(df)

# cumulative sum with skipna=True (default)
cumsum_skipna_true = df.cumsum(skipna=True)

print("\nCumulative sum with skipna=True:")
print(cumsum_skipna_true)

# cumulative sum with skipna=False
cumsum_skipna_false = df.cumsum(skipna=False)

print("\nCumulative sum with skipna=False:")
print(cumsum_skipna_false)

Output

Original DataFrame:
     A    B
0  1.0  NaN
1  2.0  2.0
2  NaN  3.0
3  4.0  4.0

Cumulative sum with skipna=True:
     A    B
0  1.0  NaN
1  3.0  2.0
2  NaN  5.0
3  7.0  9.0

Cumulative sum with skipna=False:
     A   B
0  1.0 NaN
1  3.0 NaN
2  NaN NaN
3  NaN NaN

Here, when

  1. skipna=True (default) - cumsum() skips the missing values during its computation: each NaN stays NaN in its own position, and the running total continues from the last valid value
  2. skipna=False - once cumsum() encounters a NaN, that value and all subsequent values in the column are set to NaN.
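If the goal is instead to count missing values as zero, so that no NaN appears in the result, one possible approach is to fill them before accumulating. This is a sketch of an alternative, not a behavior of the skipna parameter itself, and the name filled_cumsum is chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, None, 4],
    'B': [None, 2, 3, 4]
})

# replace missing values with 0, then accumulate down each column
filled_cumsum = df.fillna(0).cumsum()

print(filled_cumsum)
```

Whether a missing value should count as zero depends on the data, so this choice is an assumption to check against your own use case.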