Pandas rolling()

The rolling() method in Pandas is used to perform rolling window calculations on sequential data.

A rolling window is a fixed-size interval or subset of data that moves sequentially through a larger dataset.

It is used to compute statistics such as averages or sums at each window position. The window advances one step at a time through the data, which helps reveal trends and patterns in the dataset.

Example

import pandas as pd

# create a DataFrame with sequential data
data = pd.DataFrame({'value': [1, 2, 3, 4, 5, 6, 7, 8, 9]})

# use rolling() to calculate the rolling maximum
window_size = 3
rolling_max = data['value'].rolling(window=window_size).max()

# display the rolling_max
print(rolling_max)

'''
Output
0    NaN
1    NaN
2    3.0
3    4.0
4    5.0
5    6.0
6    7.0
7    8.0
8    9.0
Name: value, dtype: float64
'''

rolling() Syntax

The syntax of the rolling() method in Pandas is:

df.rolling(window, min_periods=None, center=False, on=None, axis=0, closed=None)

rolling() Arguments

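The rolling() method takes the following arguments:

  • window - size of the rolling window: either a fixed number of observations (an integer) or, for datetime-like data, a time offset such as '3D'
  • min_periods (optional) - minimum number of non-null observations needed in a window for a valid result. By default, it equals the window size for integer windows and 1 for offset-based windows
  • center (optional) - use center label as result index if True, else right end label (default)
  • on (optional) - for a DataFrame, specifies the datetime-like column to use instead of the index when computing the rolling window
  • axis (optional) - specifies the axis along which the rolling window is applied. Default is 0 (along rows). This argument is deprecated as of pandas 2.1
  • closed (optional) - specifies which endpoints of the window interval are included. Default is 'right'.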

rolling() Return Value

The rolling() method returns a Rolling object. This is not a final computed result but an intermediate object on which we call aggregation functions to compute values within each rolling window.
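
As a minimal sketch of this two-step pattern, the snippet below stores the intermediate object in a variable before aggregating it. The names roller and result are illustrative only and do not appear in the other examples.

```python
import pandas as pd

data = pd.DataFrame({'value': [1, 2, 3, 4, 5]})

# rolling() alone only describes the windows; nothing is computed yet
roller = data['value'].rolling(window=3)
print(type(roller).__name__)

# calling an aggregation on the intermediate object produces a Series
result = roller.mean()
print(type(result).__name__)
```

The first print shows the class name of the intermediate object, and the second shows that the aggregated result is an ordinary Series.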


Example 1: Use rolling() to Calculate Rolling Minimum

import pandas as pd

# create a DataFrame with sequential data
data = pd.DataFrame({'value': [1, 2, 3, 4, 5, 6, 7, 8, 9]})

# use rolling() to calculate the rolling minimum
window_size = 3
rolling_min = data['value'].rolling(window=window_size).min()

# display the rolling_min
print(rolling_min)

Output

0    NaN
1    NaN
2    1.0
3    2.0
4    3.0
5    4.0
6    5.0
7    6.0
8    7.0
Name: value, dtype: float64

In the above example, we have used the rolling() method on the value column of the data DataFrame to calculate the rolling minimum.

The window_size is set to 3, which means it calculates the minimum value within a rolling window of size 3 as it moves through the value column.

In the output, the first two values (at index 0 and 1) are NaN because there are not enough data points to calculate the minimum in the beginning due to the window size of 3.

Starting from index 2, each subsequent value represents the minimum value within a rolling window of size 3.

For example, at index 2, the rolling window includes [1, 2, 3], and the minimum is 1.0. Similarly, at index 3, the rolling window includes [2, 3, 4], and the minimum is 2.0, and so on.

Note: After calling rolling(), we can apply any aggregation functions to compute calculations within the rolling window, such as mean(), sum(), min(), max(), etc.
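
To illustrate the note above, here is a hedged sketch that applies several built-in aggregations at once with agg() and a custom one with apply(). The custom function (the range of each window, max minus min) and the variable names are assumptions chosen for illustration.

```python
import pandas as pd

data = pd.DataFrame({'value': [1, 2, 3, 4, 5, 6, 7, 8, 9]})
rolling_window = data['value'].rolling(window=3)

# built-in aggregations, computed together with agg()
summary = rolling_window.agg(['mean', 'sum'])

# custom aggregation: range (max - min) of each window
# raw=True passes each window to the function as a NumPy array
rolling_range = rolling_window.apply(lambda w: w.max() - w.min(), raw=True)

print(summary)
print(rolling_range)
```

Because the values increase by 1 at every step, each complete window of size 3 has a range of 2.0, while the first two positions remain NaN.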


Example 2: Handle Missing Data in Rolling Calculations

import pandas as pd

# create a DataFrame with missing values
data = pd.DataFrame({'value': [1, None, 3, 4, 5, None, 7, 8, 9]})

# calculate the rolling mean with
# window size of 2 and min_periods set to 2
window_size = 2
rolling_mean = data['value'].rolling(window=window_size, min_periods=2).mean()

# display the rolling_mean
print(rolling_mean)

Output

0    NaN
1    NaN
2    NaN
3    3.5
4    4.5
5    NaN
6    NaN
7    7.5
8    8.5
Name: value, dtype: float64

In this example, the rolling() method calculates the mean using a specified window size.

We've set window=2, which means it calculates the mean of every 2 consecutive values.

The parameter min_periods is set to 2, which means that at least 2 non-NaN values are needed to compute the mean. If there are fewer than 2 non-NaN values within a window, the result will be NaN.
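
For contrast, the sketch below reruns the same data with min_periods=1, so a window with a single non-NaN value still produces a result. The variable name relaxed_mean is illustrative.

```python
import pandas as pd

data = pd.DataFrame({'value': [1, None, 3, 4, 5, None, 7, 8, 9]})

# with min_periods=1, one valid observation per window is enough
relaxed_mean = data['value'].rolling(window=2, min_periods=1).mean()
print(relaxed_mean)
```

Here every position has a value: windows that contain a NaN simply return the one valid number they hold (for example, 1.0 at index 1 and 5.0 at index 5).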


Example 3: Centered Rolling Window Calculations in Pandas

import pandas as pd

# create a DataFrame with sequential data
data = pd.DataFrame({'value': [1, 2, 3, 4, 5, 6, 7, 8, 9]})

# calculate a centered rolling sum with a window size of 3
window_size = 3
centered_rolling_sum = data['value'].rolling(window=window_size, center=True).sum()

# display the result
print(centered_rolling_sum)

Output

0     NaN
1     6.0
2     9.0
3    12.0
4    15.0
5    18.0
6    21.0
7    24.0
8     NaN
Name: value, dtype: float64

Here, we have used the rolling() method to apply a moving window calculation on the value column of the data DataFrame.

We've set a window size of 3 and specified the center=True parameter, which means each calculated value is centered on its respective window.

Due to the centered approach, the window for the first entry has no previous value and the window for the last entry has no next value. Each contains only two of the three required values, so their rolling sum is NaN.
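
One way to see what center=True changes is to compare it with the default trailing window. The sketch below assumes a window size of 3, for which the centered result is the trailing result moved up by one position. The variable names are illustrative.

```python
import pandas as pd

data = pd.DataFrame({'value': [1, 2, 3, 4, 5, 6, 7, 8, 9]})

# default: each label sits at the right end of its window
trailing = data['value'].rolling(window=3).sum()

# centered: each label sits in the middle of its window
centered = data['value'].rolling(window=3, center=True).sum()

# for window=3, shifting the trailing result up by one row matches the centered result
print(centered.equals(trailing.shift(-1)))
```

The sums themselves are identical in both cases. Only the index label each sum is attached to differs.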


Example 4: Use on Argument in rolling() For Date-based Calculations

import pandas as pd

# sample DataFrame
df = pd.DataFrame({
    'date': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05'],
    'value': [10, 20, 30, 40, 50]
})

# convert 'date' column to datetime type
df['date'] = pd.to_datetime(df['date'])

# compute rolling sum based on the 'date' column with a window size of 3 days
df['rolling_sum'] = df.rolling(window='3D', on='date')['value'].sum()

print(df)

Output

        date  value  rolling_sum
0 2023-01-01     10         10.0
1 2023-01-02     20         30.0
2 2023-01-03     30         60.0
3 2023-01-04     40         90.0
4 2023-01-05     50        120.0

In the above example, we've used the rolling() method with the window='3D' argument, specifying a rolling window of 3 days.

By setting on='date', we ensure that the rolling calculation is based on the dates in the date column rather than the default index.

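The result is stored in a new column called rolling_sum, which contains the sum of value over the 3-day window ending at each row's date. The first two rows are not NaN because min_periods defaults to 1 for offset-based windows.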
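
The difference between a time-based window and a fixed-size window is easier to see when the dates are irregular. The following sketch uses made-up dates with gaps. The data and variable names are assumptions for illustration only.

```python
import pandas as pd

# dates with gaps: a '3D' window may cover one, two, or three rows
df = pd.DataFrame({
    'date': pd.to_datetime(['2023-01-01', '2023-01-02', '2023-01-05',
                            '2023-01-06', '2023-01-10']),
    'value': [10, 20, 30, 40, 50]
})

# each window spans the 3 days ending at the row's date
rolling_sum = df.rolling(window='3D', on='date')['value'].sum()
print(rolling_sum)
```

With the default closed='right', the start of each 3-day window is excluded. The row for 2023-01-05 therefore sums only its own value (30.0), and the row for 2023-01-10 sums only 50.0 because no other date falls within its window.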


Example 5: Applying Column-Wise Rolling Operations

import pandas as pd

# sample DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [5, 6, 7, 8, 9],
    'C': [9, 8, 7, 6, 5]
})

# apply rolling sum with window size of 2, along columns
df_rolling = df.rolling(window=2, axis=1).sum()

print(df_rolling)

Output

     A     B     C
0  NaN   6.0  14.0
1  NaN   8.0  14.0
2  NaN  10.0  14.0
3  NaN  12.0  14.0
4  NaN  14.0  14.0

In this example, we applied the rolling window column-wise by specifying axis=1.

For each row:

  1. The value in the A column is NaN because there's no preceding column to form a window of size 2.
  2. The value in the B column is the sum of columns A and B.
  3. The value in the C column is the sum of columns B and C.
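
Since the axis argument is deprecated in recent pandas releases (as of pandas 2.1), a version-independent sketch of the same column-wise calculation is to transpose the DataFrame, roll along the rows, and transpose back. This is an alternative formulation, not the approach used in the example above.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [5, 6, 7, 8, 9],
    'C': [9, 8, 7, 6, 5]
})

# transpose so the columns become rows, roll along the rows, then transpose back
df_rolling = df.T.rolling(window=2).sum().T

print(df_rolling)
```

The result has the same shape and values as the axis=1 version: column A is NaN, column B holds A + B, and column C holds B + C.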

Example 6: Window Boundaries Using closed Parameter

The possible values for the closed parameter are:

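  1. 'right' (default) - the right endpoint of the window is included and the left endpoint is excluded.
  2. 'left' - the left endpoint of the window is included and the right endpoint is excluded.
  3. 'both' - both endpoints of the window are included.
  4. 'neither' - both endpoints of the window are excluded.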

Let's look at an example.

import pandas as pd

# sample DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [5, 6, 7, 8]
})

# rolling with different closed values
print("Closed on right (default):")
print(df.rolling(window=2, closed='right').sum())

print("\nClosed on left:")
print(df.rolling(window=2, closed='left').sum())

print("\nClosed on both:")
print(df.rolling(window=2, closed='both').sum())

print("\nClosed on neither:")
print(df.rolling(window=2, closed='neither').sum())

Output

Closed on right (default):
     A     B
0  NaN   NaN
1  3.0  11.0
2  5.0  13.0
3  7.0  15.0

Closed on left:
     A     B
0  NaN   NaN
1  NaN   NaN
2  3.0  11.0
3  5.0  13.0

Closed on both:
     A     B
0  NaN   NaN
1  3.0  11.0
2  6.0  18.0
3  9.0  21.0

Closed on neither:
   A   B
0 NaN NaN
1 NaN NaN
2 NaN NaN
3 NaN NaN

Here,

  1. closed='right' - sums are for the current row and the row just before it
  2. closed='left' - sums are for the row just before the current row and the one before that
  3. closed='both' - sums up to three rows (the current row and the two rows before it), because including both endpoints makes the window cover one extra row beyond the specified size of 2
  4. closed='neither' - all values are NaN because excluding both endpoints leaves only the previous row in each window, which is fewer than the 2 observations required by the default min_periods.
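
The closed parameter is often easier to reason about with time-based windows, where the endpoints are actual timestamps. The sketch below reuses the dates from Example 4 and computes a 3-day rolling sum for each option. The dictionary sums and the loop variable option are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    'date': pd.to_datetime(['2023-01-01', '2023-01-02', '2023-01-03',
                            '2023-01-04', '2023-01-05']),
    'value': [10, 20, 30, 40, 50]
})

# compute the 3-day rolling sum once per closed option
sums = {}
for option in ['right', 'left', 'both', 'neither']:
    sums[option] = df.rolling(window='3D', on='date', closed=option)['value'].sum()

# show the four results side by side
print(pd.DataFrame(sums))
```

For the last row (2023-01-05), 'right' sums the three days ending on that date (120.0), 'both' also includes 2023-01-02 (140.0), 'left' drops the current date but includes 2023-01-02 (90.0), and 'neither' excludes both endpoints (70.0).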