Pandas quantile()

The quantile() method in Pandas returns values at the given quantile over the requested axis.

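A quantile is a cut point in sorted data: the q quantile is the value at or below which roughly a fraction q of the observations lie. For example, the 0.5 quantile is the median, and the 0.25 and 0.75 quantiles are the first and third quartiles.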
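To make the definition concrete, here is a minimal sketch using a Series of the numbers 1 to 10 (made-up data for illustration). It computes the 0.9 quantile and then checks what fraction of the values lie at or below it.

```python
import pandas as pd

# made-up data: the integers 1 through 10
s = pd.Series(range(1, 11))

# the 0.9 quantile: roughly 90% of the values lie at or below it
q90 = s.quantile(0.9)
print(q90)  # about 9.1 with the default linear interpolation

# check: fraction of observations at or below q90
print((s <= q90).mean())  # 0.9
```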

Example

import pandas as pd

# sample DataFrame
data = {'A': [1, 2, 3],
        'B': [4, 5, 6]}

df = pd.DataFrame(data)

# calculate the median, which is the 50th percentile or quantile(0.5)
median = df.quantile(0.5)

print(median)

'''
Output

A    2.0
B    5.0
Name: 0.5, dtype: float64
'''
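Because the default 'linear' interpolation averages the two middle values when a column has an even number of entries, quantile(0.5) should agree with median(). A quick sketch of that check, reusing the even-length data from the later examples:

```python
import pandas as pd

# even-length columns, so the median falls between two values
df = pd.DataFrame({'A': [1, 3, 5, 7],
                   'B': [2, 4, 6, 8]})

median_via_quantile = df.quantile(0.5)
median_direct = df.median()

print(median_direct)  # A 4.0, B 5.0
# element-wise comparison of the two Series (both indexed by A and B)
print((median_via_quantile == median_direct).all())  # True
```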

quantile() Syntax

The syntax of the quantile() method in Pandas is:

df.quantile(q=0.5, axis=0, numeric_only=True, interpolation='linear')

quantile() Arguments

The quantile() method has the following arguments.

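  • q (optional): the quantile(s) to compute, given as a float or a list of floats, each between 0 and 1 inclusive (default 0.5)
  • axis (optional): the axis to compute the quantile along; 0 or 'index' gives one result per column, 1 or 'columns' gives one result per row (default 0)
  • numeric_only (optional): if True, only numeric columns are used; if False, the quantile of datetime and timedelta data is computed as well (default True in pandas 1.x; changed to False in pandas 2.0)
  • interpolation (optional): the method used when the desired quantile lies between two data points: 'linear', 'lower', 'higher', 'midpoint', or 'nearest' (default 'linear')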
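The sketch below illustrates axis and numeric_only on a DataFrame with an extra text column, 'label', added here purely for illustration. numeric_only=True is passed explicitly so the text column is skipped regardless of which default your pandas version uses.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 3, 5, 7],
                   'B': [2, 4, 6, 8],
                   'label': ['w', 'x', 'y', 'z']})  # non-numeric column

# axis=0 (default): one median per numeric column; 'label' is skipped
col_medians = df.quantile(0.5, numeric_only=True)
print(col_medians)  # A 4.0, B 5.0

# axis=1: one median per row, computed across columns A and B
row_medians = df.quantile(0.5, axis=1, numeric_only=True)
print(row_medians)  # 1.5, 3.5, 5.5, 7.5
```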

quantile() Return Value

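When called on a DataFrame, quantile() returns a Series (indexed by column name) if q is a single value, and a DataFrame (indexed by the q values) if q is a list of quantiles. When called on a single Series, it returns a scalar for a single q and a Series for a list.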
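These return types can be checked directly. This minimal sketch assumes the same small DataFrame used in the examples below.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 3, 5, 7], 'B': [2, 4, 6, 8]})

single = df.quantile(0.5)            # Series indexed by column name
several = df.quantile([0.25, 0.75])  # DataFrame indexed by the q values
scalar = df['A'].quantile(0.5)       # Series.quantile with one q gives a scalar

print(type(single).__name__)   # Series
print(type(several).__name__)  # DataFrame
print(scalar)                  # 4.0
```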


Example 1: Single Quantile

import pandas as pd

data = {'A': [1, 3, 5, 7],
        'B': [2, 4, 6, 8]}
df = pd.DataFrame(data)

# calculate the 25th percentile
quantile_25 = df.quantile(0.25)

print(quantile_25)

Output

A    2.5
B    3.5
Name: 0.25, dtype: float64

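Here, we calculated the 25th percentile (first quartile) of each column. Because this position falls between two observed values, the default 'linear' interpolation returns 2.5 and 3.5, values that do not appear in the data itself.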
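To see where 2.5 and 3.5 come from: the default 'linear' method places the quantile at position (n - 1) * q in the sorted data and interpolates between the two neighboring values. linear_quantile below is a hypothetical helper written for illustration, not part of pandas; it reproduces that arithmetic for a single list of values.

```python
import math
import pandas as pd

def linear_quantile(values, q):
    """Hypothetical helper: quantile by linear interpolation between neighbors."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q  # fractional position in the sorted data
    lo = math.floor(pos)          # index of the lower neighbor
    hi = math.ceil(pos)           # index of the upper neighbor
    frac = pos - lo               # how far pos lies past the lower neighbor
    return ordered[lo] + (ordered[hi] - ordered[lo]) * frac

# column A: position 3 * 0.25 = 0.75, so 1 + (3 - 1) * 0.75 = 2.5
print(linear_quantile([1, 3, 5, 7], 0.25))      # 2.5
print(pd.Series([1, 3, 5, 7]).quantile(0.25))   # 2.5
```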


Example 2: Multiple Quantiles

import pandas as pd

data = {'A': [1, 3, 5, 7],
        'B': [2, 4, 6, 8]}
df = pd.DataFrame(data)

# calculate the 25th and 75th percentiles
quantiles = df.quantile([0.25, 0.75])

print(quantiles)

Output

        A    B
0.25  2.5  3.5
0.75  5.5  6.5

In this example, we passed a list of quantiles, so the result is a DataFrame with one row per quantile (indexed by 0.25 and 0.75) and one column per original column.
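Since the q values become the row index of the result, individual quantiles can be selected with .loc. As an illustration, subtracting the 25th percentile row from the 75th percentile row gives the interquartile range of each column.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 3, 5, 7], 'B': [2, 4, 6, 8]})
quantiles = df.quantile([0.25, 0.75])

# the q values form the row index, so .loc selects them directly
upper = quantiles.loc[0.75]          # Series: A 5.5, B 6.5
a_lower = quantiles.loc[0.25, 'A']   # scalar: 2.5

# spread between the 75th and 25th percentiles (interquartile range)
iqr = quantiles.loc[0.75] - quantiles.loc[0.25]
print(iqr)  # A 3.0, B 3.0
```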


Example 3: Quantile with Interpolation

import pandas as pd

data = {'A': [1, 3, 5, 7],
        'B': [2, 4, 6, 8]}
df = pd.DataFrame(data)

# calculate the median with a different interpolation method
median_higher = df.quantile(0.5, interpolation='higher')

print(median_higher)

Output

A    5
B    6
Name: 0.5, dtype: int64

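In this example, we set the interpolation parameter to 'higher'.

With four values, the median position lies halfway between the second and third observations. Instead of interpolating between them (which would give 4.0 for A and 5.0 for B), 'higher' returns the larger of the two neighboring observed values, so the result is 5 for column A and 6 for column B.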
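To compare the interpolation options side by side, this sketch computes the median of column A with each one. 'lower' picks the smaller neighbor, 'midpoint' averages the two neighbors, and 'nearest' picks whichever neighbor is closer; its result is left unannotated here because the median sits exactly halfway and the tie-breaking comes from NumPy.

```python
import pandas as pd

s = pd.Series([1, 3, 5, 7])  # column A from the example above

methods = ['linear', 'lower', 'higher', 'midpoint', 'nearest']
results = {m: s.quantile(0.5, interpolation=m) for m in methods}

for m, value in results.items():
    print(m, value)
# linear and midpoint give 4.0 (halfway between 3 and 5),
# lower gives 3 and higher gives 5
```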