In Pandas, an index refers to the labeled array that identifies rows or columns in a DataFrame or a Series. For example,
    Name  Age      City
0   John   25  New York
1  Alice   28    London
2    Bob   32     Paris
In the above DataFrame, the numbers 0, 1, and 2 represent the index, providing unique labels to each row.
We can use indexes to uniquely identify data and access data with efficiency and precision.
Create Indexes in Pandas
Pandas offers several ways to create indexes. Some common methods are as follows:
- Default Index
- Setting Index
- Creating a Range Index
Default Index
When we create a DataFrame or Series without specifying an index explicitly, Pandas assigns a default integer index starting from 0. For example,
import pandas as pd
data = {'Name': ['John', 'Alice', 'Bob'],
'Age': [25, 28, 32],
'City': ['New York', 'London', 'Paris']}
df = pd.DataFrame(data)
print(df)
Output
    Name  Age      City
0   John   25  New York
1  Alice   28    London
2    Bob   32     Paris
In this example, the default index [0, 1, 2] is automatically assigned to the rows.
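The same default applies to a Series. The following is a minimal sketch; the variable name ages is chosen here only for illustration.

```python
import pandas as pd

# a Series created without an explicit index
ages = pd.Series([25, 28, 32])

# the default index is a RangeIndex that starts at 0
print(ages.index)
print(list(ages.index))
```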
Setting Index
We can set an existing column as the index using the set_index() method. For example,
import pandas as pd
# create dataframe
data = {'Name': ['John', 'Alice', 'Bob'],
'Age': [25, 28, 32],
'City': ['New York', 'London', 'Paris']}
df = pd.DataFrame(data)
# set the 'Name' column as index
df.set_index('Name', inplace=True)
print(df)
Output
       Age      City
Name
John    25  New York
Alice   28    London
Bob     32     Paris
In this example, the Name column is set as the index, replacing the default integer index.
Here, the inplace=True argument modifies df directly instead of returning a new DataFrame.
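Without inplace=True, set_index() leaves the original DataFrame untouched and returns a new one. A short sketch of that behavior, where the name indexed is an illustrative choice:

```python
import pandas as pd

data = {'Name': ['John', 'Alice', 'Bob'],
        'Age': [25, 28, 32]}
df = pd.DataFrame(data)

# set_index() returns a new DataFrame; df itself is unchanged
indexed = df.set_index('Name')

print(list(df.columns))     # 'Name' is still a column of df
print(list(indexed.index))  # the 'Name' values are now the index of indexed
```

Assigning the result to a new variable keeps the original data available, which can make a chain of transformations easier to follow.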
Creating a Range Index
We can create a range index with specific start and stop values using the pd.RangeIndex() constructor. For example,
import pandas as pd
# create dataframe
data = {'Name': ['John', 'Alice', 'Bob'],
'Age': [25, 28, 32],
'City': ['New York', 'London', 'Paris']}
# create the dataframe with a range index from 5 to 7
df = pd.DataFrame(data, index=pd.RangeIndex(5, 8, name='Index'))
print(df)
Output
        Name  Age      City
Index
5       John   25  New York
6      Alice   28    London
7        Bob   32     Paris
Here, a range index from 5 to 8 (exclusive) is created with the name Index.
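RangeIndex also accepts a step, and an index can be assigned to an existing DataFrame through its index attribute. A minimal sketch, with the start, stop, and step values picked only for illustration:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Alice', 'Bob']})

# assign a range index that counts in steps of 10; stop is excluded
df.index = pd.RangeIndex(start=10, stop=40, step=10)

print(list(df.index))
```

The new index must have the same length as the DataFrame, otherwise the assignment raises an error.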
Modifying Indexes in Pandas
Pandas allows us to make changes to indexes easily. Some common modification operations are:
- Renaming Index
- Resetting Index
Renaming Index
We can rename index labels using the rename() method. For example,
import pandas as pd
# create a dataframe
data = {'Name': ['John', 'Alice', 'Bob'],
'Age': [25, 28, 32],
'City': ['New York', 'London', 'Paris']}
df = pd.DataFrame(data)
# display original dataframe
print('Original DataFrame:')
print(df)
print()
# rename index
df.rename(index={0: 'A', 1: 'B', 2: 'C'}, inplace=True)
# display dataframe after index is renamed
print('Modified DataFrame')
print(df)
Output
Original DataFrame:
    Name  Age      City
0   John   25  New York
1  Alice   28    London
2    Bob   32     Paris

Modified DataFrame
    Name  Age      City
A   John   25  New York
B  Alice   28    London
C    Bob   32     Paris
In this example, we renamed the index labels 0, 1, and 2 to 'A', 'B', and 'C' respectively.
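Besides a dictionary, rename() also accepts a function that is applied to every index label. A small sketch of this variant; the row_ prefix is an arbitrary choice for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Alice', 'Bob'],
                   'Age': [25, 28, 32]})

# apply a function to each index label and return a new DataFrame
renamed = df.rename(index=lambda label: f'row_{label}')

print(list(renamed.index))
```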
Resetting Index
We can reset the index to the default integer index using the reset_index() method. By default, the old index is kept as a new column named index. For example,
import pandas as pd
data = {'Name': ['John', 'Alice', 'Bob'],
'Age': [25, 28, 32],
'City': ['New York', 'London', 'Paris']}
# create a dataframe
df = pd.DataFrame(data)
# rename index
df.rename(index={0: 'A', 1: 'B', 2: 'C'}, inplace=True)
# display dataframe
print('Original DataFrame:')
print(df)
print('\n')
# reset index
df.reset_index(inplace=True)
# display dataframe after index is reset
print('Modified DataFrame:')
print(df)
Output
Original DataFrame:
    Name  Age      City
A   John   25  New York
B  Alice   28    London
C    Bob   32     Paris


Modified DataFrame:
   index   Name  Age      City
0      A   John   25  New York
1      B  Alice   28    London
2      C    Bob   32     Paris
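If the old index labels are not needed as a column, reset_index() can discard them with drop=True. A minimal sketch, assuming the same labels 'A', 'B', and 'C' as above:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Alice', 'Bob'],
                   'Age': [25, 28, 32]},
                  index=['A', 'B', 'C'])

# drop=True discards the old labels instead of adding an 'index' column
reset = df.reset_index(drop=True)

print(reset)
```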
Access Rows by Index
We can access rows of a DataFrame by their integer position using the .iloc property. For example,
import pandas as pd
# create a dataframe
data = {'Name': ['John', 'Alice', 'Bob'],
'Age': [25, 28, 32],
'City': ['New York', 'London', 'Paris']}
df = pd.DataFrame(data)
second_row = df.iloc[1]
print(second_row)
Output
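Name     Alice
Age         28
City    London
Name: 1, dtype: object
In this example, we displayed the second row of the df DataFrame by its integer position (1) using the .iloc property. With the default index, the position and the index label are the same; to select a row by its index label, use the .loc property instead.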
To learn more, please visit the Pandas Indexing and Slicing article.
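The difference between position and label becomes visible once the index is not the default one. The sketch below sets Name as the index and fetches the same row both ways; the variable names are illustrative.

```python
import pandas as pd

data = {'Name': ['John', 'Alice', 'Bob'],
        'Age': [25, 28, 32],
        'City': ['New York', 'London', 'Paris']}
df = pd.DataFrame(data).set_index('Name')

# .iloc selects by integer position (second row)
by_position = df.iloc[1]

# .loc selects by index label
by_label = df.loc['Alice']

print(by_position.equals(by_label))
```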
Get DataFrame Index
We can access the DataFrame index using the index attribute. For example,
import pandas as pd
# create a dataframe
data = {'Name': ['John', 'Alice', 'Bob'],
'Age': [25, 28, 32],
'City': ['New York', 'London', 'Paris']}
df = pd.DataFrame(data)
# return index object
print(df.index)
# return index values
print(df.index.values)
Output
RangeIndex(start=0, stop=3, step=1)
[0 1 2]
Here,
- df.index - returns the index object
- df.index.values - returns the index values as a NumPy array
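When a plain Python list is needed, the index can be converted with tolist(); to_numpy() is an alternative way to get the values as an array. A brief sketch with illustrative variable names:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Alice', 'Bob']})

# convert the index labels to a Python list
labels_list = df.index.tolist()

# get the index labels as a NumPy array
labels_array = df.index.to_numpy()

print(labels_list)
print(type(labels_array).__name__)
```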
Types of Indexes
Pandas supports different types of indexes that offer various functionalities based on the data requirements. A few notable types are listed in the table below.
Type | Description | Examples |
---|---|---|
Range Index (RangeIndex) | It represents a sequence of integers within a specified range. It is of type int64. The range index [0, 1, 2, ...] is used as the default index when creating a DataFrame. | [0, 1, 2, 3, 4, 5, 6]; [100, 101, 102, 103, 104] |
Categorical Index (CategoricalIndex) | It is used when dealing with categorical data. Its values are drawn from a fixed set of categories and may repeat. | ['Red', 'Green', 'Blue', 'Red', 'Blue']; ['Category A', 'Category B', 'Category C', 'Category A', 'Category B'] |
Datetime Index (DatetimeIndex) | It is used when working with time series data. It is of type datetime64. | ['2023-06-01', '2023-06-02', '2023-06-03', '2023-06-04', '2023-06-05']; ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05'] |
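As a rough illustration of the table above, the sketch below builds a datetime index and a categorical index from the listed example values. The variable names are illustrative, and pd.date_range() is just one of several ways to build a DatetimeIndex.

```python
import pandas as pd

# five consecutive days starting at 2023-06-01
dates = pd.date_range('2023-06-01', periods=5, freq='D')

# labels may repeat; the set of categories is inferred from the values
colors = pd.CategoricalIndex(['Red', 'Green', 'Blue', 'Red', 'Blue'])

print(dates)
print(list(colors.categories))
```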
In addition to these, there are other types of indexes:
Type | Description |
---|---|
Multi Index (MultiIndex) | It allows us to have multiple levels of indexing on one or more axes of a DataFrame or a Series object. |
Interval Index (IntervalIndex) | It is used to represent intervals or ranges of values in pandas. |
Timedelta Index (TimedeltaIndex) | It represents a sequence of time durations. Each element in the index represents a specific duration of time, such as hours, minutes, seconds, or a combination of these. |
Period Index (PeriodIndex) | It represents a sequence of time periods. Each element in the index represents a specific time period, such as a day, month, quarter, or year. |
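Each of these index types can be created with a dedicated pandas constructor or helper function. The following is a minimal sketch; all values and variable names are made up for illustration.

```python
import pandas as pd

# two-level index built from (Country, City) tuples
multi = pd.MultiIndex.from_tuples(
    [('USA', 'New York'), ('UK', 'London'), ('France', 'Paris')],
    names=['Country', 'City'])

# three unit-length intervals between 0 and 3
intervals = pd.interval_range(start=0, end=3)

# three durations: 1 day, 2 days, 3 days
durations = pd.timedelta_range(start='1 day', periods=3, freq='D')

# three monthly periods starting at January 2023
periods = pd.period_range(start='2023-01', periods=3, freq='M')

for idx in (multi, intervals, durations, periods):
    print(type(idx).__name__, len(idx))
```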
To learn more, please refer to the official documentation on Pandas Index.