Pandas describe()

The describe() method in Pandas generates a statistical summary of a Series or DataFrame. It describes the central tendency, dispersion, and shape of each column's distribution, and it excludes NaN values.

Example

import pandas as pd

# create a sample DataFrame
data = {'A': [1, 2, 3, 4, 5],
        'B': [5, 6, 7, 8, 9]}

df = pd.DataFrame(data)

# use describe() to get the statistical summary of the DataFrame
summary = df.describe()

print(summary)

Output

              A         B
count  5.000000  5.000000
mean   3.000000  7.000000
std    1.581139  1.581139
min    1.000000  5.000000
25%    2.000000  6.000000
50%    3.000000  7.000000
75%    4.000000  8.000000
max    5.000000  9.000000
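Each row of the summary can be reproduced with an ordinary Series method. The sketch below reuses the sample data above and rebuilds column A by hand. It also shows why std is 1.581139: describe() reports the sample standard deviation (ddof=1), not the population value.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5],
                   'B': [5, 6, 7, 8, 9]})
summary = df.describe()

col = df['A']
# rebuild the 'A' column of the summary from individual Series methods
manual = pd.Series({
    'count': col.count(),        # number of non-missing values
    'mean': col.mean(),
    'std': col.std(),            # sample standard deviation (ddof=1 by default)
    'min': col.min(),
    '25%': col.quantile(0.25),
    '50%': col.quantile(0.50),
    '75%': col.quantile(0.75),
    'max': col.max(),
}, dtype=float)

print(np.allclose(summary['A'], manual.reindex(summary.index)))  # True
print(round(col.std(ddof=0), 6))  # population std: 1.414214
```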

describe() Syntax

The syntax of the describe() method in Pandas is:

obj.describe(percentiles=None, include=None, exclude=None)

describe() Arguments

The describe() method takes the following arguments:

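  • percentiles (optional) - a list-like object of numbers between 0 and 1 that determines the percentiles to include in the output (default: [.25, .5, .75])
  • include (optional) - 'all', a list-like object of data types to include in the output, or None (default), which summarizes only the numeric columns when the DataFrame has any
  • exclude (optional) - a list-like object of data types to exclude from the output, or None (default), which excludes nothing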
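Percentiles are passed as fractions, and the output rows are labelled as percentages. The sketch below uses a one-column DataFrame for illustration. It also shows that a value outside the range 0 to 1, such as 10 instead of 0.1, is rejected with a ValueError.

```python
import pandas as pd

df = pd.DataFrame({'Values': [10, 20, 30, 40, 50]})

# fractions in [0, 1] become row labels such as '10%'
desc = df.describe(percentiles=[0.1, 0.5, 0.9])
print(list(desc.index))
# ['count', 'mean', 'std', 'min', '10%', '50%', '90%', 'max']

# whole-number percentages are not accepted
rejected = False
try:
    df.describe(percentiles=[10, 50, 90])
except ValueError as err:
    rejected = True
    print("ValueError:", err)
```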

describe() Return Value

The describe() method returns the descriptive statistics of its input. Called on a Series, it returns a Series. Called on a DataFrame, it returns a DataFrame.
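A quick way to confirm the return type is to call describe() on the same data as a Series and as a one-column DataFrame. The name A below is only for illustration.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], name='A')
df = s.to_frame()  # the same data as a one-column DataFrame

# a Series gives back a Series; a DataFrame gives back a DataFrame
print(type(s.describe()).__name__)   # Series
print(type(df.describe()).__name__)  # DataFrame
```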


Example 1: describe() for Categorical Data

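We can also use describe() on non-numeric data such as strings. For these columns it reports the count, the number of unique values, the most frequent value (top), and how often that value occurs (freq). Here Red and Blue both occur twice, so top shows only one of the tied values.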

import pandas as pd

# create a sample DataFrame with categorical data
data = {'Colors': ['Red', 'Blue', 'Blue', 'Red', 'Green']}
df = pd.DataFrame(data)

# get the description of categorical data
description = df.describe(include='all')

print(description)

Output

       Colors
count       5
unique      3
top      Red
freq        2
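The Colors column is summarized here because the DataFrame has no numeric columns. When a DataFrame mixes numeric and string columns, the default include=None summarizes only the numeric ones. Passing include='all' summarizes both, and statistics that do not apply to a column are filled with NaN. The sketch below assumes a hypothetical Price column added for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Colors': ['Red', 'Blue', 'Blue', 'Red', 'Green'],
    'Price': [1.5, 2.0, 2.0, 3.5, 1.0],  # hypothetical numeric column
})

# default: only the numeric column is summarized
print(df.describe().columns.tolist())  # ['Price']

# include='all': both columns; rows that don't apply become NaN
full = df.describe(include='all')
print(full.columns.tolist())           # ['Colors', 'Price']
print(full.loc['unique', 'Price'])     # nan
print(full.loc['mean', 'Colors'])      # nan
```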

Example 2: Custom Percentiles

import pandas as pd

# create a sample DataFrame
data = {'Values': [10, 20, 30, 40, 50]}
df = pd.DataFrame(data)

# use describe() specifying custom percentiles
description = df.describe(percentiles=[.1, .5, .9])

print(description)

Output

          Values
count   5.000000
mean   30.000000
std    15.811388
min    10.000000
10%    14.000000
50%    30.000000
90%    46.000000
max    50.000000

In this example, we passed the custom percentiles 10%, 50%, and 90% to describe() as fractions (.1, .5, .9). These replace the default 25%, 50%, and 75% rows in the output.
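The value 14.0 for the 10% row comes from linear interpolation, which is the default method of Series.quantile. The position is 0.1 × (5 − 1) = 0.4, which lies 40% of the way from 10 to 20. The 90% row works the same way: position 3.6 gives 40 + 0.6 × 10 = 46. The sketch below reproduces the 10% value by hand and compares it with describe().

```python
import pandas as pd

values = [10, 20, 30, 40, 50]
q = 0.1

# linear interpolation between neighbouring sorted values
s = sorted(values)
pos = q * (len(s) - 1)        # 0.4, between s[0] and s[1]
lower = int(pos)
frac = pos - lower
manual = s[lower] + frac * (s[lower + 1] - s[lower])

from_describe = pd.DataFrame({'Values': values}).describe(percentiles=[q])
print(manual)                              # 14.0
print(from_describe.loc['10%', 'Values'])  # 14.0
```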


Example 3: Including and Excluding Data Types

import numpy as np
import pandas as pd

# create a mixed DataFrame
data = {
    'Age': [25, 30, 35, 40],
    'Name': ['Alice', 'Bob', 'Charlie', 'David']
}
df = pd.DataFrame(data)

# describe only numeric columns
numeric_description = df.describe(include=[np.number])
print("Numbers only:")
print(numeric_description)

print()
# describe only non-numeric columns
print("Other types only:")
str_description = df.describe(exclude=[np.number])
print(str_description)

Output

Numbers only:
             Age
count   4.000000
mean   32.500000
std     6.454972
min    25.000000
25%    28.750000
50%    32.500000
75%    36.250000
max    40.000000

Other types only:
         Name
count       4
unique      4
top     Alice
freq        1

In this example, we used include to summarize only the numeric columns, and exclude to summarize every column except the numeric ones.

Here, we used np.number, NumPy's generic type that covers all integer and floating-point data types. Because Pandas is built on NumPy, describe() accepts NumPy types like this to select columns.
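Pandas also accepts dtype strings for include and exclude, in the style of DataFrame.select_dtypes. As a sketch on the same sample data, the string 'number' appears to select the same columns as np.number.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Age': [25, 30, 35, 40],
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
})

# np.number and the string alias 'number' select the same columns
by_numpy = df.describe(include=[np.number])
by_alias = df.describe(include=['number'])
print(by_numpy.equals(by_alias))  # True

# the alias works for exclude as well
print(df.describe(exclude=['number']).columns.tolist())  # ['Name']
```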