Pandas crosstab()

The crosstab() function in Pandas allows us to create contingency tables, also known as cross-tabulations. It is called as pd.crosstab() rather than on a DataFrame.

A contingency table helps us understand the relationship between two or more categorical variables within a dataset.

Example

import pandas as pd

# sample DataFrame
data = {'Gender': ['Male', 'Female', 'Male', 'Female', 'Male'],
        'Smoker': ['Yes', 'No', 'Yes', 'No', 'No']}

df = pd.DataFrame(data)

# create a cross-tabulation of Gender and Smoker
cross_tab = pd.crosstab(df['Gender'], df['Smoker'])

print(cross_tab)

'''
Output

Smoker  No  Yes
Gender         
Female   2    0
Male     1    2
'''

crosstab() Syntax

The syntax of the crosstab() function in Pandas is:

pd.crosstab(index, columns, values=None, rownames=None, colnames=None, aggfunc=None, margins=False, margins_name='All', dropna=True, normalize=False)

crosstab() Arguments

The crosstab() function has the following arguments:

index: the column or array-like object (or list of them) whose values will be used as rows
columns: the column or array-like object (or list of them) whose values will be used as columns
values (optional): the array of values to aggregate for each combination of index and columns; must be used together with aggfunc
rownames (optional): the names to be used for the row index
colnames (optional): the names to be used for the column index
aggfunc (optional): the aggregation function to apply to values; must be used together with values
margins (optional): whether to include row and column totals
margins_name (optional): the label to be used for the margin row and column
dropna (optional): whether to exclude columns whose entries are all NaN
normalize (optional): whether to show proportions instead of counts; accepts True (or 'all'), 'index', or 'columns'
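
As a rough illustration of the rownames and colnames arguments, the sketch below passes NumPy arrays instead of DataFrame columns. Arrays carry no name, so the axis labels are supplied explicitly. The array contents are made-up sample data that mirror the other examples on this page.

```python
import numpy as np
import pandas as pd

# made-up sample data held in plain arrays rather than a DataFrame
gender = np.array(['Male', 'Female', 'Male', 'Female', 'Male'], dtype=object)
smoker = np.array(['Yes', 'No', 'Yes', 'No', 'No'], dtype=object)

# arrays have no name, so label the row and column axes explicitly
cross_tab = pd.crosstab(gender, smoker, rownames=['Gender'], colnames=['Smoker'])

print(cross_tab)
```

Without rownames and colnames, Pandas falls back to generic labels such as row_0 and col_0 for unnamed arrays.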

crosstab() Return Value

The crosstab() function returns a DataFrame representing the cross-tabulation of the factors specified in index and columns.
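
Because the result is a regular DataFrame, the usual DataFrame operations apply to it. The following is a minimal sketch, using the same sample data as the other examples on this page, of checking the return type and reading individual cells; the variable name male_smokers is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Gender': ['Male', 'Female', 'Male', 'Female', 'Male'],
                   'Smoker': ['Yes', 'No', 'Yes', 'No', 'No']})

cross_tab = pd.crosstab(df['Gender'], df['Smoker'])

# the result is an ordinary DataFrame
print(type(cross_tab))

# look up a single cell by row label and column label
male_smokers = cross_tab.loc['Male', 'Yes']
print(male_smokers)

# sum across the columns to get the number of people per gender
print(cross_tab.sum(axis=1))
```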

Example 1: Basic Cross-Tabulation

import pandas as pd

data = {'Gender': ['Male', 'Female', 'Male', 'Female', 'Male'],
        'Employed': ['Yes', 'Yes', 'Yes', 'Yes', 'No']}

df = pd.DataFrame(data)

# create a basic cross-tabulation of Gender and Employed
cross_tab = pd.crosstab(df['Gender'], df['Employed'])

print(cross_tab)

Output

Employed  No  Yes
Gender            
Female      0    2
Male        1    2

In this example, we created a basic cross-tabulation of Gender and Employed to understand the distribution of employed and unemployed people among genders.
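
A cross-tabulation is not limited to two variables. As a sketch, assuming a DataFrame that holds the Gender, Employed, and Smoker columns together (a combination of the sample data used on this page), a list of columns can be passed as index to group the rows by more than one factor:

```python
import pandas as pd

# sample data combining the columns used in the other examples
df = pd.DataFrame({'Gender': ['Male', 'Female', 'Male', 'Female', 'Male'],
                   'Employed': ['Yes', 'Yes', 'Yes', 'Yes', 'No'],
                   'Smoker': ['Yes', 'No', 'Yes', 'No', 'No']})

# pass a list of columns to build rows from two factors at once
cross_tab = pd.crosstab([df['Gender'], df['Employed']], df['Smoker'])

print(cross_tab)
```

The rows of the result are indexed by (Gender, Employed) pairs, and each cell counts the people in that pair who fall under the given Smoker value.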

Example 2: Margins in crosstab()

import pandas as pd

data = {'Gender': ['Male', 'Female', 'Male', 'Female', 'Male'],
        'Smoker': ['Yes', 'No', 'Yes', 'No', 'No']}

df = pd.DataFrame(data)

# create a cross-tabulation with margins
cross_tab = pd.crosstab(df['Gender'], df['Smoker'], margins=True, margins_name='Total')

print(cross_tab)

Output

Smoker  No  Yes  Total
Gender                
Female   2    0      2
Male     1    2      3
Total    3    2      5

In this example, we included row and column margins in the cross-tabulation to show the totals for each row and column.

Example 3: Normalized Cross-Tabulation

import pandas as pd

data = {'Gender': ['Male', 'Female', 'Male', 'Female', 'Male'],
        'Smoker': ['Yes', 'No', 'Yes', 'No', 'No']}

df = pd.DataFrame(data)

# create a normalized cross-tabulation of Gender and Smoker
cross_tab = pd.crosstab(df['Gender'], df['Smoker'], normalize=True)

print(cross_tab)

Output

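Smoker   No  Yes
Gender          
Female  0.4  0.0
Male    0.2  0.4

In this example, we created a normalized cross-tabulation to show proportions instead of raw counts. With normalize=True, each count is divided by the total number of observations (5 here), so all the cells together sum to 1.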
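
The normalize argument also accepts 'index' or 'columns' to normalize within each row or each column instead of over the whole table. The following is a small sketch on the same sample data that computes row-wise proportions; the variable name row_share is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Gender': ['Male', 'Female', 'Male', 'Female', 'Male'],
                   'Smoker': ['Yes', 'No', 'Yes', 'No', 'No']})

# normalize within each row, so every row sums to 1
row_share = pd.crosstab(df['Gender'], df['Smoker'], normalize='index')

print(row_share)
```

Here each row answers the question "what share of this gender smokes?": all of the females in the sample are non-smokers, while two of the three males are smokers.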

Example 4: Aggregate Functions with crosstab()

import pandas as pd

data = {'Gender': ['Male', 'Female', 'Male', 'Female', 'Male'],
        'Smoker': ['Yes', 'No', 'Yes', 'No', 'No'],
        'Age': [25, 30, 35, 40, 45]}

df = pd.DataFrame(data)

# create a cross-tabulation of Gender and Smoker with average Age as the aggregation
cross_tab = pd.crosstab(df['Gender'], df['Smoker'], values=df['Age'], aggfunc='mean')

print(cross_tab)

Output

Smoker    No   Yes
Gender            
Female  35.0   NaN
Male    45.0  30.0

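In this example, we used aggfunc='mean' to calculate the mean age for smokers and non-smokers of each gender. The Female/Yes cell is NaN because the sample contains no female smokers, so there are no ages to average.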
