Pandas resample()

The resample() method in Pandas converts time series data to a different frequency.

Example

import pandas as pd

# create a time series DataFrame
index = pd.date_range('1/1/2020', periods=4, freq='T')
data = pd.Series([0.0, None, 2.0, 3.0], index=index)
df = pd.DataFrame(data, columns=['A'])

print('Original Data:')
print(df)

# resample the data by 2 minutes and sum the values
resampled_data = df.resample('2T').sum()

print()
print('Resampled Data:')
print(resampled_data)

'''
Output

Original Data:
                       A
2020-01-01 00:00:00  0.0
2020-01-01 00:01:00  NaN
2020-01-01 00:02:00  2.0
2020-01-01 00:03:00  3.0

Resampled Data:
                       A
2020-01-01 00:00:00  0.0
2020-01-01 00:02:00  5.0
'''

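Here, we converted a DataFrame with a one-minute frequency into a two-minute frequency using the resample('2T') call. Then, .sum() adds up the values in each two-minute bin, skipping NaN, which is why the first bin totals 0.0. Note that pandas 2.2 deprecated the 'T' alias in favor of 'min', so resample('2min') is the equivalent call on recent versions.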
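Because sum() skips NaN by default, a bin that is partly missing still reports a total. The sketch below shows one way to make that visible, assuming the min_count argument of the Resampler sum() method is available in your pandas version. It uses the 'min' alias rather than 'T', and the variable names are only illustrative.

```python
import pandas as pd

# same data as the example above: one value is missing
index = pd.date_range('2020-01-01', periods=4, freq='min')
df = pd.DataFrame({'A': [0.0, None, 2.0, 3.0]}, index=index)

# default behavior: NaN is skipped, so the first bin sums to 0.0
default = df.resample('2min').sum()

# min_count=2: a bin needs at least two non-NaN values, otherwise it is NaN
strict = df.resample('2min').sum(min_count=2)

print(default)
print(strict)
```

With min_count=2, the first bin becomes NaN because it holds only one valid value, while the second bin still sums to 5.0.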


resample() Syntax

The syntax of the resample() method in Pandas is:

df.resample(rule, axis=0, closed=None, label=None, convention='start', kind=None, loffset=None, base=None, on=None, level=None)

resample() Arguments

The resample() method in Pandas has the following arguments:

  • rule: the target frequency for resampling
  • axis (optional): specifies the axis to resample on
  • closed (optional): defines which side of each interval is closed - 'right' or 'left'
  • label (optional): decides which side of each interval is labeled - 'right' or 'left'
  • convention (optional): for resampling with PeriodIndex, defines whether to use the start or end of the rule
  • kind (optional): chooses the index type for the resampled data
  • loffset (optional): adjusts the resampled time labels by the given offset (removed in pandas 2.0)
  • base (optional): sets the offset for the resample operation (removed in pandas 2.0; newer versions use origin or offset instead)
  • on (optional): specifies the DataFrame column to resample on instead of the index
  • level (optional): identifies a particular level of a MultiIndex to resample
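The closed, label, and on arguments are easier to follow with a small comparison. The following is a minimal sketch, not taken from the original examples; the DataFrame contents and the column name time are made up for illustration.

```python
import pandas as pd

index = pd.date_range('2020-01-01', periods=5, freq='min')
df = pd.DataFrame({'A': [1, 2, 3, 4, 5]}, index=index)

# default for minute frequencies: bins are closed and labeled on the left
# [00:00, 00:02) -> 3, [00:02, 00:04) -> 7, [00:04, 00:06) -> 5
left = df.resample('2min').sum()

# bins closed and labeled on the right
# (23:58, 00:00] -> 1, (00:00, 00:02] -> 5, (00:02, 00:04] -> 9
right = df.resample('2min', closed='right', label='right').sum()

# 'on' resamples by a datetime column instead of the index
events = pd.DataFrame({'time': index, 'A': [1, 2, 3, 4, 5]})
by_column = events.resample('2min', on='time').sum()

print(left)
print(right)
print(by_column)
```

Changing closed moves the boundary values into a different bin, so the totals change even though the data is the same.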

resample() Return Value

The resample() method returns a Resampler object. On its own it holds no results; calling an aggregation or fill method on it, such as sum(), mean(), or ffill(), produces the resampled data.
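As a rough illustration of this, the sketch below stores the Resampler object in a variable and reuses it for two different aggregations. The data and variable names are assumptions made for the example.

```python
import pandas as pd

index = pd.date_range('2020-01-01', periods=4, freq='min')
df = pd.DataFrame({'A': [1, 2, 3, 4]}, index=index)

# resample() only defines the bins; nothing is aggregated yet
resampler = df.resample('2min')
print(type(resampler).__name__)

# the same Resampler can be reused for several aggregations
totals = resampler.sum()
summary = resampler['A'].agg(['min', 'max'])

print(totals)
print(summary)
```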


Example 1: Downsampling and Aggregating

Downsampling is the process of reducing the frequency of a time series dataset by aggregating data points within larger intervals.

Let's look at an example.

import pandas as pd

# create a time series DataFrame
range_of_dates = pd.date_range('1/1/2020', periods=5, freq='T')
df = pd.DataFrame({ 'A': [1, 2, 3, 4, 5] }, index=range_of_dates)

# resample to 3-minute intervals and compute the mean
downsampled = df.resample('3T').mean()

print(downsampled)

Output

                       A
2020-01-01 00:00:00  2.0
2020-01-01 00:03:00  4.5

In this example, we decreased the data frequency to every three minutes (downsampling) and used .mean() for aggregation.

To learn more about aggregate functions, visit Pandas Aggregate Function.
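When a DataFrame has several columns, each one may need a different aggregation during downsampling. Here is a minimal sketch using agg() with a dictionary; the column names price and volume are hypothetical and chosen only for illustration.

```python
import pandas as pd

range_of_dates = pd.date_range('2020-01-01', periods=5, freq='min')
df = pd.DataFrame({
    'price': [1.0, 2.0, 3.0, 4.0, 5.0],
    'volume': [10, 20, 30, 40, 50],
}, index=range_of_dates)

# average the price and total the volume within each 3-minute bin
downsampled = df.resample('3min').agg({'price': 'mean', 'volume': 'sum'})

print(downsampled)
```

The first bin covers three rows and the second covers the remaining two, so price averages to 2.0 and 4.5 while volume totals 60 and 90.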


Example 2: Upsampling and Filling

Upsampling is the process of increasing the frequency of a time series dataset by introducing additional data points within smaller intervals, often requiring data imputation methods such as filling or interpolation.

Let's look at an example.

import pandas as pd

# time series data
range_of_dates = pd.date_range('1/1/2020', periods=2, freq='D')
df = pd.DataFrame({ 'A': [1, 2] }, index=range_of_dates)

# resample to finer granularity and forward-fill the values
upsampled = df.resample('12H').ffill()

print(upsampled)

Output

                     A
2020-01-01 00:00:00  1
2020-01-01 12:00:00  1
2020-01-02 00:00:00  2

In this example, we upsampled the data from daily to 12-hourly frequency. The new 12:00 row has no original value, so ffill() copies the previous value (1) into it.
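Forward filling is only one way to handle the rows that upsampling creates. The sketch below compares two alternatives, assuming the asfreq() and interpolate() methods of the Resampler object: the first leaves the new rows as NaN, and the second estimates them linearly. It uses the lowercase 'h' alias, which recent pandas versions prefer over 'H'.

```python
import pandas as pd

range_of_dates = pd.date_range('2020-01-01', periods=2, freq='D')
df = pd.DataFrame({'A': [1.0, 2.0]}, index=range_of_dates)

# asfreq() adds the new timestamps but leaves their values as NaN
gaps = df.resample('12h').asfreq()

# interpolate() fills the new rows linearly between known values
linear = df.resample('12h').interpolate()

print(gaps)
print(linear)
```

Here the 12:00 row is NaN with asfreq() and 1.5 with interpolate(), halfway between the two daily values.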