Pandas pivot()

The pivot() method in Pandas is used to reshape data. It's particularly handy when we have data in a long format and want to pivot it to a wide format based on column values.

Example

import pandas as pd

# create a sample dataset
data = {
    'Student': ['Alice', 'Alice', 'Bob', 'Bob', 'Charlie', 'Charlie'],
    'Subject': ['Math', 'History', 'Math', 'History', 'Math', 'History'],
    'Score': [90, 85, 78, 88, 92, 79]
}

# convert data dictionary to DataFrame
df = pd.DataFrame(data)

# use pivot() to reshape the data pivot_df = df.pivot(index='Student', columns='Subject', values='Score')
# display the reshaped data print(pivot_df) ''' Output Subject History Math Student Alice 85 90 Bob 88 78 Charlie 79 92 '''

pivot() Syntax

The syntax of the pivot() method in Pandas is:

df.pivot(columns, index=None, values=None)

pivot() Arguments

The pivot() function takes following arguments:

  • columns - column(s) to use to make new dataframe's columns
  • index (optional) - column to use as the new frame's index. If None, use the existing index
  • values (optional) - column(s) to use for populating new dataframe's values. If not specified, all remaining columns will be used and the result will have hierarchically indexed columns.

pivot() Return Value

The pivot() method returns a reshaped DataFrame organized by the given index/column values.


Example1: Using pivot() to Reshape the DataFrame

import pandas as pd

# sample DataFrame
data = {
    'Date': ['2023-10-01', '2023-10-02', '2023-10-01', '2023-10-02'],
    'City': ['New York', 'New York', 'Los Angeles', 'Los Angeles'],
    'Temperature': [65, 66, 78, 79],
    'Humidity': [56, 57, 54, 52]
}

df = pd.DataFrame(data)

print("Original DataFrame:")
print(df)
print()

# pivot with just the 'City' column
df_pivot = df.pivot(columns='City')

print("\nPivoted DataFrame:")
print(df_pivot)

Output

Original DataFrame:
Date         City  Temperature  Humidity
0  2023-10-01     New York           65        56
1  2023-10-02     New York           66        57
2  2023-10-01  Los Angeles           78        54
3  2023-10-02  Los Angeles           79        52

Pivoted DataFrame:
Date             Temperature             Humidity         
City Los Angeles    New York Los Angeles New York Los Angeles New York
0            NaN  2023-10-01         NaN     65.0         NaN     56.0
1            NaN  2023-10-02         NaN     66.0         NaN     57.0
2     2023-10-01         NaN        78.0      NaN        54.0      NaN
3     2023-10-02         NaN        79.0      NaN        52.0      NaN
> 

Here, the code above produces a multi-level column DataFrame where the top level of columns represents the unique cities and the second level represents the original columns Temperature and Humidity.

Each row then corresponds to a specific date from the original Date column.

The NaN values in the pivoted DataFrame indicates there's no available data for a particular combination.


Example2: Use pivot() with the index argument

import pandas as pd

# sample DataFrame
data = {
    'Date': ['2023-10-01', '2023-10-02', '2023-10-01', '2023-10-02'],
    'City': ['New York', 'New York', 'Los Angeles', 'Los Angeles'],
    'Temperature': [65, 66, 78, 79],
    'Humidity': [56, 57, 54, 52]
}

df = pd.DataFrame(data)

print("Original DataFrame:")
print(df)
print()

# pivot using 'Date' as the index df_pivot = df.pivot(index='Date', columns='City')
print("\nPivoted DataFrame:") print(df_pivot)

Output

Original DataFrame:
   Date           City         Temperature   Humidity
0  2023-10-01     New York           65        56
1  2023-10-02     New York           66        57
2  2023-10-01  Los Angeles           78        54
3  2023-10-02  Los Angeles           79        52

Pivoted DataFrame:
                  Temperature             Humidity         
City       Los Angeles New York Los Angeles New York
Date                                                
2023-10-01          78       65          54       56
2023-10-02          79       66          52       57

In this example, the Date column values become the index of the pivoted DataFrame.

The City values become new columns, yielding a multi-level DataFrame with cities as the top columns and Temperature and Humidity as sub-columns.


Example3: Using pivot() in Pandas with the values Argument

import pandas as pd

# sample DataFrame
data = {
    'Date': ['2023-10-01', '2023-10-02', '2023-10-01', '2023-10-02'],
    'City': ['New York', 'New York', 'Los Angeles', 'Los Angeles'],
    'Temperature': [65, 66, 78, 79],
    'Humidity': [56, 57, 54, 52]
}

df = pd.DataFrame(data)

print("Original DataFrame:")
print(df)
print()

# pivot with Date as the index, City as the columns, # and Temperature as the values df_pivot = df.pivot(index='Date', columns='City', values='Temperature')
print("\nPivoted DataFrame:") print(df_pivot)

Output

Original DataFrame:
     Date         City         Temperature  Humidity
0  2023-10-01     New York           65        56
1  2023-10-02     New York           66        57
2  2023-10-01  Los Angeles           78        54
3  2023-10-02  Los Angeles           79        52

Pivoted DataFrame:
City        Los Angeles  New York
Date                             
2023-10-01           78        65
2023-10-02           79        66

In this example, the pivot() method is called on df using the syntax: df.pivot(index='Date', columns='City', values='Temperature').

Here,

  • index='Date' - unique values in Date become rows in the new DataFrame
  • columns='City' - unique values in City set the columns (New York and Los Angeles)
  • values='Temperature' - Temperature values from original data populate the cells in the pivoted table

And finally, a grid is formed where rows are dates, columns are cities, and cells show temperatures, allowing easy visual comparison.