Pandas duplicated()

The duplicated() method in Pandas marks duplicate rows in a DataFrame. By default, a row counts as a duplicate when its values in every column match those of an earlier row.

Example

import pandas as pd

# sample DataFrame
data = {'A': [1, 2, 2],
        'B': [4, 5, 5]}

df = pd.DataFrame(data)

# identify duplicate rows
duplicates = df.duplicated()

print(duplicates)
'''
Output

0    False
1    False
2     True
dtype: bool
'''


duplicated() Syntax

The syntax of the duplicated() method in Pandas is:

df.duplicated(subset=None, keep='first')

duplicated() Arguments

The duplicated() method has the following arguments:

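subset (optional): column label or sequence of labels to compare when identifying duplicates; by default, all columns are used
keep (optional): determines which occurrences to mark as duplicates. 'first' (the default) marks every occurrence except the first as True, 'last' marks every occurrence except the last, and False marks every occurrence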
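
As a minimal sketch of how subset changes the comparison, the snippet below passes a list of labels so that only columns A and B are compared. The DataFrame and the column C are made up for illustration and do not appear in the examples on this page.

```python
import pandas as pd

# illustrative data: column 'C' differs between rows 1 and 2
df = pd.DataFrame({'A': [1, 2, 2, 1],
                   'B': [4, 5, 5, 4],
                   'C': [7, 8, 9, 7]})

# compare rows on columns 'A' and 'B' only
by_a_and_b = df.duplicated(subset=['A', 'B'])

# with no subset, all three columns are compared
by_all = df.duplicated()

print(by_a_and_b.tolist())  # [False, False, True, True]
print(by_all.tolist())      # [False, False, False, True]
```

Row 2 is a duplicate only when C is ignored, since its value in C differs from row 1.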

duplicated() Return Value

The duplicated() method returns a boolean Series indicating whether each row is a duplicate.
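
Because the result is a boolean Series, it can be used as a mask. The sketch below, which reuses the data from the first example, counts the duplicates and selects the duplicate and non-duplicate rows. The variable names are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 2],
                   'B': [4, 5, 5]})

mask = df.duplicated()

# True counts as 1, so the sum is the number of duplicate rows
num_duplicates = int(mask.sum())

# boolean indexing: rows flagged as duplicates
duplicate_rows = df[mask]

# inverting the mask keeps the first occurrence of each row
unique_rows = df[~mask]

print(num_duplicates)                 # 1
print(duplicate_rows.index.tolist())  # [2]
print(unique_rows.index.tolist())     # [0, 1]
```

With the default arguments, df[~df.duplicated()] selects the same rows as df.drop_duplicates().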

Example 1: Identifying Duplicates in a Specific Column

import pandas as pd

data = {'A': [1, 2, 2],
        'B': [4, 5, 6]}
df = pd.DataFrame(data)

# identify duplicates in column 'A'
duplicates_in_A = df.duplicated(subset='A')

print(duplicates_in_A)


Output

0    False
1    False
2     True
dtype: bool

In this example, we identified duplicates based on column A using the subset='A' argument.

Here, the third row is marked True because its value in column A (2) already appears in the second row, even though the values in column B differ.

Example 2: Keeping Last Occurrences

import pandas as pd

data = {'A': [1, 2, 2, 2],
        'B': [4, 5, 5, 5]}
df = pd.DataFrame(data)

# mark duplicate rows as True, except for the last occurrence
last_occurrences = df.duplicated(keep='last')

print(last_occurrences)


Output

0    False
1     True
2     True
3    False
dtype: bool

In this example, we marked all duplicates as True except for the last occurrence using the keep='last' argument.

Here, there are three occurrences of the row values [2, 5]. The first two are marked True whereas the last one is marked False.
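
To make the difference from the default visible, the following sketch runs keep='first' and keep='last' on the same data and then uses the second mask to retain only the last occurrence of each row. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 2, 2],
                   'B': [4, 5, 5, 5]})

# default: the first occurrence is left unmarked
first = df.duplicated(keep='first')

# the last occurrence is left unmarked instead
last = df.duplicated(keep='last')

# inverting the mask keeps only the rows that were not marked
kept_last = df[~last]

print(first.tolist())            # [False, False, True, True]
print(last.tolist())             # [False, True, True, False]
print(kept_last.index.tolist())  # [0, 3]
```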

Example 3: Marking All Duplicates

import pandas as pd

data = {'A': [1, 2, 2, 2],
        'B': [4, 5, 5, 5]}
df = pd.DataFrame(data)

# mark all duplicates
all_duplicates = df.duplicated(keep=False)

print(all_duplicates)


Output

0    False
1     True
2     True
3     True
dtype: bool

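In this example, we marked every occurrence of a duplicated row as True, including the first one, using the keep=False argument. Only the first row, which appears once, is marked False.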
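
One common use of keep=False is to pull out every row that belongs to a group of duplicates, or the rows that occur exactly once. The sketch below shows both on the same data; the variable names are chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 2, 2],
                   'B': [4, 5, 5, 5]})

mask = df.duplicated(keep=False)

# every row that has at least one identical row elsewhere
repeated_rows = df[mask]

# rows that occur exactly once
single_rows = df[~mask]

print(repeated_rows.index.tolist())  # [1, 2, 3]
print(single_rows.index.tolist())    # [0]
```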
