Pandas pivot_table()

The pivot_table() function in Pandas allows us to create a spreadsheet-style pivot table making it easier to group and analyze our data.

Example

import pandas as pd

# create a dataframe
data = {'Date': ['2023-01-01', '2023-01-01', '2023-01-02', '2023-01-02'],
        'City': ['New York', 'Los Angeles', 'New York', 'Los Angeles'],
        'Temperature': [32, 75, 30, 77]}
df = pd.DataFrame(data)

# pivot the dataframe
pivot_df = df.pivot_table(index='Date', columns='City', values='Temperature')

print(pivot_df)

'''
Output

           Temperature         
City       Los Angeles New York
Date                           
2023-01-01          75       32
2023-01-02          77       30
'''

In this example, we used the pivot_table() method to create a pivot table where Date becomes the index, City becomes the columns, and Temperature becomes the values.


pivot_table() Syntax

The syntax of the pivot_table() method in Pandas is:

df.pivot_table(values=None, index=None, columns=None, aggfunc='mean', fill_value=None, margins=False, dropna=True, margins_name='All')

pivot_table() Arguments

The pivot_table() method has the following arguments:

  • values (optional): the column to aggregate
  • index (optional): the key or keys to group by on the pivot table index
  • columns (optional): the key or keys to group by on the pivot table columns
  • aggfunc (optional): the aggregation function or list of functions to be used
  • fill_value (optional): value to replace missing values with after pivot
  • margins (optional): whether to add all rows/columns
  • dropna (optional): if set to False, do not include columns whose entries are all NaN
  • margins_name (optional): the name to use for the row/column that contains totals when margins is True

pivot_table() Return Value

The pivot_table() method returns a reshaped DataFrame organized by the given index/column values.


Example: pivot_table() with Multiple Values

If we omit the values argument in pivot_table(), it selects all the remaining columns (besides the ones specified index and columns) as values for the pivot table.

import pandas as pd

# create a dataframe
data = {'Date': ['2023-01-01', '2023-01-01', '2023-01-02', '2023-01-02'],
        'City': ['New York', 'Los Angeles', 'New York', 'Los Angeles'],
        'Temperature': [32, 75, 30, 77],
        'Humidity': [80, 10, 85, 5]}

df = pd.DataFrame(data)

# pivot the dataframe
pivot_df = df.pivot_table(index='Date', columns='City')

print(pivot_df)

Output

Humidity          Temperature         
City       Los Angeles New York Los Angeles New York
Date                                                
2023-01-01        10.0     80.0        75.0     32.0
2023-01-02         5.0     85.0        77.0     30.0

In this example, we created a pivot table for multiple values i.e. Temperature and Humidity.


Example: pivot_table() With Aggregate Functions

We can use the pivot_table() method with different aggregate functions using the aggfunc parameter. We can set the value of aggfunc to functions such as 'sum', 'mean', 'count', 'max' or 'min'.

import pandas as pd

data = {'Date': ['2023-01-01', '2023-01-01', '2023-01-02', '2023-01-02', '2023-01-03', '2023-01-03'],
        'City': ['New York', 'Los Angeles', 'New York', 'Los Angeles', 'New York', 'Los Angeles'],
        'Temperature': [32, 75, 30, 77, 33, 78],
        'Humidity': [80, 10, 85, 5, 81, 7]}

df = pd.DataFrame(data)

# calculate mean temperature for each city using pivot_table() mean_temperature = df.pivot_table(index='City', values='Temperature', aggfunc='mean')
print(mean_temperature)

Output

Temperature
City                    
Los Angeles    76.666667
New York       31.666667

In the above example, we calculated the mean temperature of each city using the aggfunc='mean' argument in pivot_table().


Example: pivot_table() With MultiIndex

We can create a pivot table with MultiIndex using the pivot_table() method.

import pandas as pd

# create a dataframe
data = {'Date': ['2023-01-01', '2023-01-01', '2023-01-02', '2023-01-02', '2023-01-01', '2023-01-01', '2023-01-02', '2023-01-02'],
        'City': ['New York', 'Los Angeles', 'New York', 'Los Angeles','Delhi', 'Chennai', 'Delhi', 'Chennai'],
        'Country': ['USA', 'USA', 'USA', 'USA', 'India', 'India', 'India', 'India'],
        'Temperature': [32, 75, 30, 77, 75, 80, 78, 79]}
df = pd.DataFrame(data)

# create a pivot table with multiindex pivot_df = df.pivot_table(index=['Country', 'City'], columns='Date', values='Temperature')
print(pivot_df)

Output

Date                 2023-01-01  2023-01-02
Country City                               
India   Chennai            80.0        79.0
        Delhi              75.0        78.0
USA     Los Angeles        75.0        77.0
        New York           32.0        30.0

In this example, we created a pivot table with a MultiIndex by passing a list of columns as an index argument.


Example: pivot_table() with fill_value

The fill_value argument replaces all the NaN values with a specified value.

import pandas as pd
import numpy as np

# create the DataFrame
data = {'Date': ['2023-01-01', '2023-01-01', '2023-01-02', '2023-01-02', '2023-01-03', '2023-01-03'],
        'City': ['New York', 'Los Angeles', 'New York', 'Los Angeles', 'New York', 'Los Angeles'],
        'Temperature': [32, np.nan, 30, 77, np.nan, 76]}
df = pd.DataFrame(data)

# create a pivot table pivot_df = df.pivot_table(index='Date', columns='City', values='Temperature')
print("\nDefault Pivot Table\n", pivot_df) print()
# create a pivot table with fill_value=0 pivot_df_dropna = df.pivot_table(index='Date', columns='City', values='Temperature', fill_value=0)
print("\nPivot Table with fill_value=0:\n", pivot_df_dropna)

Output

Default Pivot Table
City        Los Angeles  New York
Date                             
2023-01-01          NaN      32.0
2023-01-02         77.0      30.0
2023-01-03         76.0       NaN

Pivot Table with fill_value=0:
City        Los Angeles  New York
Date                             
2023-01-01          0.0      32.0
2023-01-02         77.0      30.0
2023-01-03         76.0       0.0

In this example, we replaced the NaN values with 0 using the fill_value=0 argument.


Example: pivot_table() with dropna

The dropna argument specifies whether to remove the columns whose entries are all NaN. The default value of dropna is True.

import pandas as pd
import numpy as np

# create the DataFrame
data = {'Date': ['2023-01-01', '2023-01-01', '2023-01-02', '2023-01-02', '2023-01-03', '2023-01-03', '2023-01-03'],
        'City': ['New York', 'Los Angeles', 'New York', 'Los Angeles', 'New York', 'Los Angeles', 'Chicago'],
        'Temperature': [32, 75, 30, 77, np.nan, 76, np.nan]}
df = pd.DataFrame(data)

# create a pivot table pivot_df = df.pivot_table(index='Date', columns='City', values='Temperature')
print("\nDefault Pivot Table\n", pivot_df) print()
# create a pivot table with dropna=True pivot_df_dropna = df.pivot_table(index='Date', columns='City', values='Temperature', dropna=False)
print("\nPivot Table with dropna=False:\n", pivot_df_dropna)

Output

Default Pivot Table
 City        Los Angeles  New York
Date                             
2023-01-01         75.0      32.0
2023-01-02         77.0      30.0
2023-01-03         76.0       NaN

Pivot Table with dropna=False:
 City        Chicago  Los Angeles  New York
Date                                      
2023-01-01      NaN         75.0      32.0
2023-01-02      NaN         77.0      30.0
2023-01-03      NaN         76.0       NaN

In this example, we used the dropna argument to determine the handling of columns with entirely NaN entries. By default, the dropna parameter is set to True, resulting in the automatic removal of the Chicago column.

Notice that the New York column is not dropped despite having one NaN value. This is because dropna removes the columns whose entries are all NaN.