Pandas qcut()

The qcut() method in Pandas is used for dividing a continuous variable into quantile-based bins, effectively transforming it into a categorical variable.

Example

import pandas as pd

# define a list of numeric data
data = [320, 280, 345, 378, 290, 310, 260, 300]

# use qcut() to divide the data into 4 quantiles
result = pd.qcut(data, 4)
print(result)

'''
Output
[(305.0, 326.25], (259.999, 287.5], (326.25, 378.0], (326.25, 378.0], (287.5, 305.0], (305.0, 326.25], (259.999, 287.5], (287.5, 305.0]]
Categories (4, interval[float64, right]): [(259.999, 287.5] < (287.5, 305.0] < (305.0, 326.25] < (326.25, 378.0]]
'''

qcut() Syntax

The syntax of the qcut() method in Pandas is:

pandas.qcut(x, q, labels=None, retbins=False, precision=3)

qcut() Arguments

The qcut() method takes the following arguments:

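  • x - the input data to be binned (a one-dimensional array, list, or Series)
  • q - the number of quantiles (for example, 4 for quartiles) or an array of quantile edges (for example, [0, 0.25, 0.5, 0.75, 1])
  • labels (optional) - the labels for the returned bins
  • retbins (optional) - whether to also return the bin edges
  • precision (optional) - the number of decimal places used to store and display the bin labels.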
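
The two forms of q can be compared with a small sketch. It reuses the data from the first example; the variable names are illustrative only. It also passes labels=False, an option not covered in the list above, to get integer bin codes instead of interval labels.

```python
import pandas as pd

data = [320, 280, 345, 378, 290, 310, 260, 300]

# q as an integer: 4 equal-frequency bins (quartiles)
by_count = pd.qcut(data, 4)

# q as a list of quantile edges: the same quartiles written out explicitly
by_edges = pd.qcut(data, [0, 0.25, 0.5, 0.75, 1.0])

# labels=False returns integer bin codes instead of interval labels
codes = pd.qcut(data, 4, labels=False)

# both calls should assign every value to the same bin
print(by_count.equals(by_edges))
print(codes)
```

Passing the edges explicitly is mainly useful when the bins should not be equally sized, for example [0, 0.1, 0.9, 1.0] to isolate the lowest and highest 10% of the values.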

qcut() Return Value

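The qcut() method in Pandas returns a Categorical object (or a Series with a categorical dtype when x is a Series) that records the bin each value falls into. The bins are chosen so that each holds roughly the same number of values. When retbins=True, it returns the binned data together with the bin edges.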
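
As a quick check of the return type, the sketch below calls qcut() once on a plain list and once on a Series. The variable names are illustrative only.

```python
import pandas as pd

values = [320, 280, 345, 378, 290, 310, 260, 300]

# a list (or array) input gives back a Categorical
from_list = pd.qcut(values, 4)

# a Series input gives back a Series with a categorical dtype
from_series = pd.qcut(pd.Series(values), 4)

print(type(from_list).__name__)    # Categorical
print(type(from_series).__name__)  # Series
print(from_series.dtype)           # category
```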


Example 1: Categorizing Data Using qcut()

import pandas as pd

# create a list of temperatures
temperatures = [68, 72, 75, 80, 85, 90, 95, 100, 65, 70, 78, 82]

# use qcut() to categorize each temperature into 4 equal-sized bins (quartiles)
temperature_categories = pd.qcut(temperatures, 4)
print(temperature_categories)

Output

[(64.999, 71.5], (71.5, 79.0], (71.5, 79.0], (79.0, 86.25], (79.0, 86.25], ..., (86.25, 100.0], (64.999, 71.5], (64.999, 71.5], (71.5, 79.0], (79.0, 86.25]]
Length: 12
Categories (4, interval[float64, right]): [(64.999, 71.5] < (71.5, 79.0] < (79.0, 86.25] <
                                           (86.25, 100.0]]

In the above example, we have the list named temperatures containing various temperature readings.

We then used pd.qcut() to divide these temperature values into quartiles. Because the bin edges are taken from the data's own quantiles, each of the 4 bins holds the same number of temperatures (3 of the 12 readings).
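
One way to confirm the equal split is to count the values in each bin. The following is a minimal sketch that wraps the result in a Series so that value_counts() can be used; the counts name is illustrative only.

```python
import pandas as pd

temperatures = [68, 72, 75, 80, 85, 90, 95, 100, 65, 70, 78, 82]
temperature_categories = pd.qcut(temperatures, 4)

# count how many temperatures landed in each quartile,
# keeping the bins in their natural order
counts = pd.Series(temperature_categories).value_counts(sort=False)
print(counts)
```

Each of the four bins should report a count of 3. With data that contains many repeated values the counts can differ, since equal values always fall into the same bin.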


Example 2: Naming Bins in Pandas qcut()

import pandas as pd

# create a list of exam scores
scores = [67, 85, 78, 92, 74, 70, 56, 90]

# define custom labels for the bins
bin_labels = ['D', 'C', 'B', 'A']

# use qcut() to divide scores into 4 quantiles and assign the custom labels
score_categories = pd.qcut(scores, 4, labels=bin_labels)
print(score_categories)

Output

['D', 'B', 'B', 'A', 'C', 'C', 'D', 'A']
Categories (4, object): ['D' < 'C' < 'B' < 'A']

In this example, we defined the bin_labels list with string labels D, C, B, A that correspond to quartile grades.

The pd.qcut() method is used to categorize the scores into 4 bins (quartiles) based on their distribution, with each bin getting a label from bin_labels.
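
To see which score received which label, the scores and their grades can be placed side by side. The sketch below is one possible way to do that; the report DataFrame and its column names are illustrative and not part of the original example.

```python
import pandas as pd

scores = [67, 85, 78, 92, 74, 70, 56, 90]
bin_labels = ['D', 'C', 'B', 'A']
score_categories = pd.qcut(scores, 4, labels=bin_labels)

# pair every score with its grade and sort from lowest to highest score
report = pd.DataFrame({'score': scores, 'grade': score_categories})
print(report.sort_values('score'))
```

The labels are applied from the lowest bin to the highest, so the first label in bin_labels goes to the lowest quartile. The number of labels must match the number of bins.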


Example 3: Extract Bin Information Using retbins Argument in qcut()

import pandas as pd

# create a list of data points
data_points = [12, 20, 19, 27, 25, 35, 29, 40, 31, 38]

# use qcut() with retbins=True to get both the binned data and the bin edges
binned_data, bins = pd.qcut(data_points, 4, retbins=True)

print("Binned Data:")
print(binned_data)
print("\nBin Edges:")
print(bins)

Output

Binned Data:
[(11.999, 21.25], (11.999, 21.25], (11.999, 21.25], (21.25, 28.0], (21.25, 28.0], (34.0, 40.0], (28.0, 34.0], (34.0, 40.0], (28.0, 34.0], (34.0, 40.0]]
Categories (4, interval[float64, right]): [(11.999, 21.25] < (21.25, 28.0] < (28.0, 34.0] <
                                           (34.0, 40.0]]

Bin Edges:
[12.   21.25 28.   34.   40.  ]

In the above example, we use the pd.qcut() method with the retbins=True argument to categorize a list of numeric data points into quantiles and also to obtain the precise bin edges that define these quantiles.
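
A common reason to keep the bin edges is to apply the same bins to other data later. The sketch below does this with pd.cut(), assuming a hypothetical list new_points that is not part of the original example.

```python
import pandas as pd

data_points = [12, 20, 19, 27, 25, 35, 29, 40, 31, 38]
binned_data, bins = pd.qcut(data_points, 4, retbins=True)

# hypothetical new observations, not part of the original example
new_points = [15, 22, 33, 39]

# pd.cut() applies the previously computed edges; include_lowest=True
# keeps a value equal to the lowest edge (12) inside the first bin
new_binned = pd.cut(new_points, bins=bins, include_lowest=True, labels=False)
print(new_binned)
```

Here labels=False returns the position of each bin (0 for the lowest) rather than an interval. Values outside the original range, below 12 or above 40, would not fall into any bin and would come back as missing.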


Example 4: Specify the Precision of the Bin Labels

import pandas as pd

# create a list of floating-point numbers
data = [1.123, 2.345, 3.567, 4.789, 5.901, 6.234, 7.456, 8.678]

# use qcut() to divide data into 4 quantiles
quantiles = pd.qcut(data, 4, precision=2)
print(quantiles)

Output

[(1.11, 3.26], (1.11, 3.26], (3.26, 5.34], (3.26, 5.34], (5.34, 6.54], (5.34, 6.54], (6.54, 8.68], (6.54, 8.68]]
Categories (4, interval[float64, right]): [(1.11, 3.26] < (3.26, 5.34] < (5.34, 6.54] <
                                           (6.54, 8.68]]

Here, we used pd.qcut() with precision=2. This means that the labels of the bins will be displayed with 2 decimal places.
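
To see the effect of precision more directly, the same data can be binned with two different settings and the resulting labels compared. This is a small sketch with illustrative variable names; it relies on the understanding that precision only changes how the bin edges are rounded in the labels, not which bin a value is assigned to.

```python
import pandas as pd

data = [1.123, 2.345, 3.567, 4.789, 5.901, 6.234, 7.456, 8.678]

# precision=3 is the default
default_precision = pd.qcut(data, 4)

# coarser labels, rounded to 1 decimal place
low_precision = pd.qcut(data, 4, precision=1)

print(default_precision.categories)
print(low_precision.categories)

# compare the bin position assigned to each value under both settings
print(list(default_precision.codes) == list(low_precision.codes))
```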