Pandas set_index()

The set_index() method in Pandas is used to set the index of the DataFrame.

This method allows us to use one or more columns as the index. Once set, the specified column(s) will become the new row labels of the DataFrame.

Example

import pandas as pd

# sample DataFrame
df = pd.DataFrame({
    'A': ['A0', 'A1', 'A2', 'A3'],
    'B': ['B0', 'B1', 'B2', 'B3'],
    'C': ['C0', 'C1', 'C2', 'C3']
})

# set column 'A' as the index
df = df.set_index('A')

print(df)

'''
Output

     B   C
A
A0  B0  C0
A1  B1  C1
A2  B2  C2
A3  B3  C3
'''

set_index() Syntax

The syntax of the set_index() method in Pandas is:

df.set_index(keys, drop=True, append=False, inplace=False, verify_integrity=False)

set_index() Arguments

The set_index() method takes the following arguments:

  • keys - specifies which column(s) to use as the new index.
  • drop (optional) - if True, removes the column(s) used as the new index. If False, the column(s) are retained in the DataFrame.
  • append (optional) - if True, adds the new index alongside the existing index. If False, the existing index is replaced with the new one.
  • inplace (optional) - if True, modifies the original DataFrame in place and returns None. If False, returns a new DataFrame.
  • verify_integrity (optional) - if True, ensures the new index doesn't have duplicate values. If False, doesn't check for duplicates.

set_index() Return Value

The set_index() method returns a new DataFrame with the specified column(s) set as the index. If inplace=True is passed, it modifies the original DataFrame and returns None instead.
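
To make the return value concrete, here is a minimal sketch that compares the default call with an inplace=True call. The small two-row DataFrame and the variable names indexed and result are only assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({'ID': [101, 102], 'Name': ['Alice', 'Bob']})

# default (inplace=False): a new DataFrame is returned, df is unchanged
indexed = df.set_index('ID')
print(indexed.index.name)    # ID
print(list(df.columns))      # ['ID', 'Name']

# inplace=True: df itself is modified and the call returns None
result = df.set_index('ID', inplace=True)
print(result)                # None
print(df.index.name)         # ID
```

Because the in-place call returns None, assigning its result back to df would replace the DataFrame with None.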


Example 1: Set a Single Column as the Index

import pandas as pd

# creating a sample DataFrame
data = {
    'ID': [101, 102, 103, 104],
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40]
}

df = pd.DataFrame(data)

print("Original DataFrame:")
print(df)
print()

# setting 'ID' column as the index
df_indexed = df.set_index('ID')

print("DataFrame after setting 'ID' as index:")
print(df_indexed)

Output

Original DataFrame:
    ID     Name  Age
0  101    Alice   25
1  102      Bob   30
2  103  Charlie   35
3  104    David   40

DataFrame after setting 'ID' as index:
        Name  Age
ID               
101    Alice   25
102      Bob   30
103  Charlie   35
104    David   40

In the above example, after using the set_index('ID') method, the ID column is now the index of the df_indexed DataFrame.
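
One practical effect of setting ID as the index is that rows can be looked up by their ID label. The sketch below assumes the same data as Example 1. It uses .loc and reset_index(), which are standard pandas features not covered in this article, so treat it as an illustration rather than part of the example itself.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [101, 102, 103, 104],
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40]
})
df_indexed = df.set_index('ID')

# look up a row by its ID label
row = df_indexed.loc[102]
print(row['Name'])   # Bob

# reset_index() moves 'ID' back into a regular column
restored = df_indexed.reset_index()
print(list(restored.columns))   # ['ID', 'Name', 'Age']
```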


Example 2: Retain Columns While Setting Them as Index

import pandas as pd

# creating a sample DataFrame
data = {
    'ID': [101, 102, 103, 104],
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40]
}

df = pd.DataFrame(data)

print("Original DataFrame:")
print(df)
print()

# setting 'ID' column as the index but retaining it in the DataFrame
df_indexed = df.set_index('ID', drop=False)

print("DataFrame after setting 'ID' as index but retaining it as a column:")
print(df_indexed)

Output

Original DataFrame:
    ID     Name  Age
0  101    Alice   25
1  102      Bob   30
2  103  Charlie   35
3  104    David   40

DataFrame after setting 'ID' as index but retaining it as a column:
      ID     Name  Age
ID                    
101  101    Alice   25
102  102      Bob   30
103  103  Charlie   35
104  104    David   40

Here, we have used drop=False inside set_index() to retain columns while setting them as index.

As the output shows, the ID column is now the index of the df_indexed DataFrame, and it is also retained as a regular column.
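
The difference between the two drop settings can also be checked programmatically. This is a small sketch on a shortened two-row DataFrame. The names dropped and retained are hypothetical and chosen only for this comparison.

```python
import pandas as pd

df = pd.DataFrame({'ID': [101, 102], 'Name': ['Alice', 'Bob']})

dropped = df.set_index('ID')                # drop=True is the default
retained = df.set_index('ID', drop=False)   # keep 'ID' as a column too

print('ID' in dropped.columns)    # False
print('ID' in retained.columns)   # True
```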


Example 3: Set Multiple Columns as the Index

import pandas as pd

# create a sample DataFrame
data = {
    'Country': ['USA', 'USA', 'Canada', 'Canada'],
    'State': ['California', 'New York', 'Ontario', 'Quebec'],
    'City': ['Los Angeles', 'New York City', 'Toronto', 'Montreal'],
    'Population': [3977687, 8175133, 2731571, 1704694]
}

df = pd.DataFrame(data)

print("Original DataFrame:")
print(df)
print() 

# setting 'Country' and 'State' columns as the index
df_multi_indexed = df.set_index(['Country', 'State'])

print("DataFrame after setting 'Country' and 'State' as indices:")
print(df_multi_indexed)

Output

Original DataFrame:
  Country       State           City  Population
0     USA  California    Los Angeles     3977687
1     USA    New York  New York City     8175133
2  Canada     Ontario        Toronto     2731571
3  Canada      Quebec       Montreal     1704694

DataFrame after setting 'Country' and 'State' as indices:
                             City  Population
Country State
USA     California    Los Angeles     3977687
        New York    New York City     8175133
Canada  Ontario           Toronto     2731571
        Quebec           Montreal     1704694

In this example, set_index(['Country', 'State']) sets both the Country and State columns as the index, resulting in a multi-index DataFrame.
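
With a multi-index in place, rows can be selected by one or both index levels. The following sketch reuses the data from Example 3 and relies on .loc, which this article does not otherwise cover, so it is meant only as a hedged illustration of how such an index might be used.

```python
import pandas as pd

df = pd.DataFrame({
    'Country': ['USA', 'USA', 'Canada', 'Canada'],
    'State': ['California', 'New York', 'Ontario', 'Quebec'],
    'City': ['Los Angeles', 'New York City', 'Toronto', 'Montreal'],
    'Population': [3977687, 8175133, 2731571, 1704694]
})
df_multi_indexed = df.set_index(['Country', 'State'])

# the index now has two named levels
print(list(df_multi_indexed.index.names))   # ['Country', 'State']

# a (Country, State) tuple selects a single row
print(df_multi_indexed.loc[('Canada', 'Quebec'), 'City'])   # Montreal

# an outer-level label alone selects every row for that country
usa_rows = df_multi_indexed.loc['USA']
print(list(usa_rows.index))   # ['California', 'New York']
```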


Example 4: Append a Column to the Existing Index

import pandas as pd

# creating a sample DataFrame
data = {
    'ID': [101, 102, 103, 104],
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston']
}

df = pd.DataFrame(data)

# initially set the 'ID' column as our index
df.set_index('ID', inplace=True)

print("DataFrame with 'ID' as index:")
print(df)
print()

# append 'City' to the existing index, creating a multi-index
df.set_index(['City'], append=True, inplace=True)

print("DataFrame after appending 'City' to the existing index:")
print(df)

Output

DataFrame with 'ID' as index:
        Name  Age         City
ID                            
101    Alice   25     New York
102      Bob   30  Los Angeles
103  Charlie   35      Chicago
104    David   40      Houston

DataFrame after appending 'City' to the existing index:
                    Name  Age
ID  City
101 New York       Alice   25
102 Los Angeles      Bob   30
103 Chicago      Charlie   35
104 Houston        David   40

In the above example, we have the df DataFrame initially indexed by ID.

We then called set_index() with append=True to append the City column to the index, creating a multi-index consisting of ID and City.

Here, the inplace=True argument modifies the original DataFrame directly, so set_index() returns None instead of a new DataFrame.
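
To see what append changes, we can compare the index level names with and without it. This is a minimal sketch on a shortened two-row DataFrame that skips inplace=True so both results can be kept side by side. The names appended and replaced are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [101, 102],
    'Name': ['Alice', 'Bob'],
    'City': ['New York', 'Los Angeles']
}).set_index('ID')

# append=True keeps 'ID' and adds 'City' as a second index level
appended = df.set_index('City', append=True)
print(list(appended.index.names))   # ['ID', 'City']

# append=False (the default) replaces 'ID' with 'City'
replaced = df.set_index('City')
print(list(replaced.index.names))   # ['City']
```

Note that in the second case the old ID index is discarded, since it was not a column of the DataFrame anymore.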


Example 5: Check for Duplicates in the New Index

import pandas as pd

# sample DataFrame
data = {
    'ID': [101, 102, 103, 101],  # Note the duplicate ID '101'
    'Name': ['Alice', 'Bob', 'Charlie', 'Eve'],
    'Age': [25, 30, 35, 28]
}

df = pd.DataFrame(data)

# attempt to set 'ID' as the index and check for duplicates
try:
    df.set_index('ID', verify_integrity=True, inplace=True)
except ValueError as e:
    print(e)

Output

Index has duplicate keys: Int64Index([101], dtype='int64', name='ID')

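Here, since there's a duplicate in the ID column, a ValueError is raised indicating the presence of the duplicate key in the index. The exact message depends on the pandas version: in pandas 2.0 and later, where Int64Index was removed, the duplicate keys are shown as Index([101], dtype='int64', name='ID').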
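
If we would rather inspect the duplicates before calling set_index(), one possible approach is to check the column first. The sketch below uses Series.is_unique and Series.duplicated(), which are standard pandas features not discussed in this article. The variable name duplicates is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [101, 102, 103, 101],
    'Name': ['Alice', 'Bob', 'Charlie', 'Eve']
})

# is_unique is False because 101 appears twice
print(df['ID'].is_unique)   # False

# keep=False marks every occurrence of a repeated value
duplicates = df[df['ID'].duplicated(keep=False)]
print(list(duplicates['Name']))   # ['Alice', 'Eve']
```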

Note: To learn more about exception handling, please visit Python Exception Handling.