Pandas Categorical

Categorical data is a type of data that represents categories or labels rather than numerical values.

In simple words, it is a way of classifying data into distinct categories, such as genders, country names, or education levels.

Categorical data is handy when we have data that naturally fit into predefined options.

Create Categorical Data Type in Pandas

In Pandas, the pd.Categorical() constructor is used to create categorical data from a given sequence of values.

import pandas as pd

data = ['red', 'blue', 'green', 'red', 'blue']

# create a categorical column
categorical_data = pd.Categorical(data)

print(categorical_data)

Output

['red', 'blue', 'green', 'red', 'blue']
Categories (3, object): ['blue', 'green', 'red']

In the above example, pd.Categorical() converts the data list into a Categorical object. Note that this is an array-like object, not a Series.

The output includes the original data values and the unique categories found in the data, listed in sorted order.
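
By default, the categories are inferred from the data. As a minimal sketch, the categories parameter of pd.Categorical() can instead fix the set of categories and their order up front. The sample values below are made up for illustration; any value that is not listed becomes a missing value.

```python
import pandas as pd

# sample values made up for this illustration
data = ['red', 'blue', 'green', 'red', 'purple']

# fix the allowed categories and their order explicitly
colors = pd.Categorical(data, categories=['red', 'green', 'blue'])

# the categories keep the order we passed in
print(colors.categories.tolist())

# 'purple' is not a listed category, so it is treated as missing
print(colors.isna().tolist())
```

This is useful when the valid options are known in advance and unexpected values should not silently become new categories.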

Convert Pandas Series to Categorical Series

In Pandas, we can convert a regular Series to a categorical Series using either the astype() method or the dtype parameter of the pd.Series() constructor.

Using the astype() Function

import pandas as pd

# create a regular Series
data = ['red', 'blue', 'green', 'red', 'blue']
series1 = pd.Series(data)

# convert the Series to a categorical Series using .astype()
categorical_s = series1.astype('category')

print(categorical_s)

Output

0      red
1     blue
2    green
3      red
4     blue
dtype: category
Categories (3, object): ['blue', 'green', 'red']

Here, series1.astype('category') returns a new Series with the category data type. The original series1 is left unchanged.
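
The same astype() call also works on a single DataFrame column. The following is a small sketch under that assumption; the DataFrame and its column names (color, price) are hypothetical and not part of the example above.

```python
import pandas as pd

# hypothetical sample data, made up for this illustration
df = pd.DataFrame({
    'color': ['red', 'blue', 'green', 'red', 'blue'],
    'price': [10, 12, 9, 11, 13],
})

# convert only the 'color' column; 'price' stays numeric
df['color'] = df['color'].astype('category')

print(df.dtypes)

# check the resulting data type programmatically
print(isinstance(df['color'].dtype, pd.CategoricalDtype))
```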

Using the dtype parameter Inside Series()

import pandas as pd

# create a categorical Series
data = ['A', 'B', 'A', 'C', 'B']
cat_series = pd.Series(data, dtype="category")

print(cat_series)

Here, we have passed dtype="category" to Series() to create a categorical Series directly from the list.

The output has the same form as in the previous example: the five values, followed by dtype: category and the categories ['A', 'B', 'C'].
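
Besides the string "category", the dtype parameter also accepts a pd.CategoricalDtype object, which lets us state the categories and their order explicitly. The snippet below is a sketch of that variant; the size labels and variable names are made up for illustration.

```python
import pandas as pd

# describe the categories and their order once
size_type = pd.CategoricalDtype(categories=['small', 'medium', 'large'], ordered=True)

# sample values made up for this illustration
data = ['medium', 'small', 'large', 'small']
sizes = pd.Series(data, dtype=size_type)

# the categories follow the order given in size_type, not sorted order
print(sizes.cat.categories.tolist())
print(sizes.cat.ordered)
```

Defining the type once makes it easy to reuse the same categories across several Series or columns.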

Access Categories and Codes in Pandas

In Pandas, the cat accessor allows us to access categories and codes. Here are the attributes provided by the cat accessor for this purpose:

categories - returns the unique categories present in the categorical variable
codes - returns the integer codes representing the categories for each element in the categorical variable

Let's look at an example.

import pandas as pd

# create a categorical Series
data = ['A', 'B', 'A', 'C', 'B']
cat_series = pd.Series(data, dtype="category")

# using .cat accessor
print(cat_series.cat.categories)
print(cat_series.cat.codes)

Output

Index(['A', 'B', 'C'], dtype='object')
0    0
1    1
2    0
3    2
4    1
dtype: int8

In the above example, first we have used cat_series.cat.categories to access the unique categories present in cat_series.

In this case, the output will be Index(['A', 'B', 'C'], dtype='object'), which are the distinct categories in the data.

Then, we have used cat_series.cat.codes to access the integer codes corresponding to the categories in cat_series.

Each code is the position of the element's category in cat_series.cat.categories:

The element at index 0 of cat_series is A, the category at position 0, so its code is 0.
The element at index 1 of cat_series is B, the category at position 1, so its code is 1.
The element at index 2 of cat_series is A, so its code is again 0.
The element at index 3 of cat_series is C, the category at position 2, so its code is 2.
The element at index 4 of cat_series is B, so its code is again 1.
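
The mapping described above can be checked in code. As a small sketch, we can look up each code in the categories index to rebuild the original labels. The second part, using a made-up Series named with_missing, shows that a missing value has no category and therefore gets the code -1.

```python
import pandas as pd

data = ['A', 'B', 'A', 'C', 'B']
cat_series = pd.Series(data, dtype="category")

categories = cat_series.cat.categories
codes = cat_series.cat.codes

# look up each code in the categories index to recover the labels
rebuilt = categories.take(codes.to_numpy()).tolist()
print(rebuilt)  # ['A', 'B', 'A', 'C', 'B']

# a missing value has no category, so its code is -1
with_missing = pd.Series(['A', None, 'B'], dtype="category")
print(with_missing.cat.codes.tolist())  # [0, -1, 1]
```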

Rename Categories in Pandas

We can rename the categories in Pandas using the cat.rename_categories() method. For example,

import pandas as pd

# create a categorical Series
data = ['A', 'B', 'A', 'C', 'B']
cat_series = pd.Series(data, dtype="category")

# create a dictionary for renaming categories
category_mapping = {"A": "Category A", "B": "Category B", "C": "Category C"}

# rename categories; rename_categories() returns a new Series
cat_series_renamed = cat_series.cat.rename_categories(category_mapping)

print(cat_series_renamed)

Output

0    Category A
1    Category B
2    Category A
3    Category C
4    Category B
dtype: category
Categories (3, object): ['Category A', 'Category B', 'Category C']

In this example, the categories A, B, and C are renamed to Category A, Category B, and Category C respectively.
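
A dictionary is not the only accepted input. As a sketch, rename_categories() also takes a list, which replaces the categories by position, or a function, which is applied to every category. The new labels used here (x, y, z and the lowercase letters) are arbitrary choices for illustration.

```python
import pandas as pd

data = ['A', 'B', 'A', 'C', 'B']
cat_series = pd.Series(data, dtype="category")

# a list replaces the categories by position: A -> x, B -> y, C -> z
by_list = cat_series.cat.rename_categories(['x', 'y', 'z'])

# a function is applied to every category
by_func = cat_series.cat.rename_categories(lambda label: label.lower())

print(by_list.tolist())
print(by_func.cat.categories.tolist())

# renaming changes the labels only; the underlying codes stay the same
print((by_list.cat.codes == cat_series.cat.codes).all())
```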

Add New Categories in Pandas

In Pandas, we can add new categories to the existing set of categories using the cat.add_categories() method.

Let's look at an example.

import pandas as pd

# create a categorical Series
data = ['A', 'B', 'A', 'C', 'B']
cat_series = pd.Series(data, dtype="category")

# add new categories and reassign the variable
new_categories = ['D', 'E']
cat_series = cat_series.cat.add_categories(new_categories)

print(cat_series)

Output

0    A
1    B
2    A
3    C
4    B
dtype: category
Categories (5, object): ['A', 'B', 'C', 'D', 'E']

Here, we added the new categories D and E to the categorical Series and assigned the result back to cat_series. The existing values are unchanged; D and E are simply available as valid categories that no element uses yet.
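
One reason to add a category is that a categorical Series only accepts values from its existing categories. The sketch below illustrates this: assigning D is rejected before the category exists and succeeds afterwards. The exact exception type has varied between Pandas versions, so the example catches both TypeError and ValueError as a precaution.

```python
import pandas as pd

data = ['A', 'B', 'A', 'C', 'B']
cat_series = pd.Series(data, dtype="category")

# 'D' is not a category yet, so the assignment is rejected
try:
    cat_series.iloc[0] = 'D'
    assigned_without_category = True
except (TypeError, ValueError) as error:
    assigned_without_category = False
    print("Rejected:", error)

# after adding the category, the same assignment succeeds
cat_series = cat_series.cat.add_categories(['D'])
cat_series.iloc[0] = 'D'

print(cat_series.tolist())
```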

Remove Categories in Pandas

To remove categories from a categorical variable in Pandas, we can use the cat.remove_categories() method.

Let's look at an example.

import pandas as pd

# create a categorical Series
data = ['A', 'B', 'A', 'C', 'B']
cat_series = pd.Series(data, dtype="category")

# display the original categorical variable
print("Original Series:")
print(cat_series)

# remove specific categories
categories_to_remove = ["B", "C"]
cat_series_removed = cat_series.cat.remove_categories(categories_to_remove)

# display the modified categorical variable
print("\nModified Series:")
print(cat_series_removed)

Output

Original Series:
0    A
1    B
2    A
3    C
4    B
dtype: category
Categories (3, object): ['A', 'B', 'C']

Modified Series:
0      A
1    NaN
2      A
3    NaN
4    NaN
dtype: category
Categories (1, object): ['A']

In this example, we have used cat.remove_categories() to remove the categories B and C from cat_series. Values that belonged to a removed category become NaN, and only A remains as a category.
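
A related case is a category that is still listed but no longer used by any value, for example after filtering. As a sketch, the cat.remove_unused_categories() method drops such categories without touching the data. The variable names only_a, trimmed and removed are chosen for this illustration.

```python
import pandas as pd

data = ['A', 'B', 'A', 'C', 'B']
cat_series = pd.Series(data, dtype="category")

# filtering rows keeps the full category list, even for labels no longer present
only_a = cat_series[cat_series == 'A']
print(only_a.cat.categories.tolist())

# drop the categories that no value uses any more
trimmed = only_a.cat.remove_unused_categories()
print(trimmed.cat.categories.tolist())

# by contrast, removing categories that are in use turns those values into NaN
removed = cat_series.cat.remove_categories(['B', 'C'])
print(removed.isna().sum())
```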

Check if Categorical Variable is Ordered or Not

To check if a categorical variable is ordered, we can use the ordered attribute. It is available directly on a Categorical object and, for a categorical Series, through the cat accessor as series.cat.ordered. For example,

import pandas as pd

# create an ordered Categorical (an array-like object, not a Series)
data = ['low', 'medium', 'high', 'low', 'medium']
ordered_cat_series = pd.Categorical(data, categories=['low', 'medium', 'high'], ordered=True)

# check if the categorical variable is ordered
is_ordered = ordered_cat_series.ordered

print("Is ordered:", is_ordered)

Output

Is ordered: True

In this example, ordered_cat_series.ordered will be True because the categorical variable ordered_cat_series was created with the ordered=True parameter.

Note: Marking a categorical variable as ordered tells Pandas that its categories have a logical sequence, such as low < medium < high. Ordered categoricals can be compared with operators like < and >, support min() and max(), and sort by category order rather than alphabetically.
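
To make the effect of ordering concrete, here is a short sketch that wraps an ordered Categorical in a Series and then compares, sorts and takes the minimum and maximum. The variable name levels is chosen for this illustration.

```python
import pandas as pd

data = ['low', 'medium', 'high', 'low', 'medium']
levels = pd.Series(
    pd.Categorical(data, categories=['low', 'medium', 'high'], ordered=True)
)

# on a Series, the ordered attribute is reached through the cat accessor
print(levels.cat.ordered)

# comparisons follow the category order: low < medium < high
print((levels > 'low').tolist())

# sorting also follows the category order, not alphabetical order
print(levels.sort_values().tolist())

# min() and max() are defined because the categories are ordered
print(levels.min(), levels.max())
```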
