The pivot_table()
function in Pandas allows us to create a spreadsheet-style pivot table making it easier to group and analyze our data.
Example
import pandas as pd
# create a dataframe
data = {'Date': ['2023-01-01', '2023-01-01', '2023-01-02', '2023-01-02'],
'City': ['New York', 'Los Angeles', 'New York', 'Los Angeles'],
'Temperature': [32, 75, 30, 77]}
df = pd.DataFrame(data)
# pivot the dataframe
pivot_df = df.pivot_table(index='Date', columns='City', values='Temperature')
print(pivot_df)
'''
Output
Temperature
City Los Angeles New York
Date
2023-01-01 75 32
2023-01-02 77 30
'''
In this example, we used the pivot_table()
method to create a pivot table where Date
becomes the index, City
becomes the columns, and Temperature
becomes the values.
pivot_table() Syntax
The syntax of the pivot_table()
method in Pandas is:
df.pivot_table(values=None, index=None, columns=None, aggfunc='mean', fill_value=None, margins=False, dropna=True, margins_name='All')
pivot_table() Arguments
The pivot_table()
method has the following arguments:
values
(optional): the column to aggregateindex
(optional): the key or keys to group by on the pivot table indexcolumns
(optional): the key or keys to group by on the pivot table columnsaggfunc
(optional): the aggregation function or list of functions to be usedfill_value
(optional): value to replace missing values with after pivotmargins
(optional): whether to add all rows/columnsdropna
(optional): if set toFalse
, do not include columns whose entries are allNaN
margins_name
(optional): the name to use for the row/column that contains totals when margins isTrue
pivot_table() Return Value
The pivot_table()
method returns a reshaped DataFrame organized by the given index/column values.
Example: pivot_table() with Multiple Values
If we omit the values
argument in pivot_table()
, it selects all the remaining columns (besides the ones specified index
and columns
) as values for the pivot table.
import pandas as pd
# create a dataframe
data = {'Date': ['2023-01-01', '2023-01-01', '2023-01-02', '2023-01-02'],
'City': ['New York', 'Los Angeles', 'New York', 'Los Angeles'],
'Temperature': [32, 75, 30, 77],
'Humidity': [80, 10, 85, 5]}
df = pd.DataFrame(data)
# pivot the dataframe
pivot_df = df.pivot_table(index='Date', columns='City')
print(pivot_df)
Output
Humidity Temperature City Los Angeles New York Los Angeles New York Date 2023-01-01 10.0 80.0 75.0 32.0 2023-01-02 5.0 85.0 77.0 30.0
In this example, we created a pivot table for multiple values i.e. Temperature
and Humidity
.
Example: pivot_table() With Aggregate Functions
We can use the pivot_table()
method with different aggregate functions using the aggfunc
parameter. We can set the value of aggfunc
to functions such as 'sum'
, 'mean'
, 'count'
, 'max'
or 'min'
.
import pandas as pd
data = {'Date': ['2023-01-01', '2023-01-01', '2023-01-02', '2023-01-02', '2023-01-03', '2023-01-03'],
'City': ['New York', 'Los Angeles', 'New York', 'Los Angeles', 'New York', 'Los Angeles'],
'Temperature': [32, 75, 30, 77, 33, 78],
'Humidity': [80, 10, 85, 5, 81, 7]}
df = pd.DataFrame(data)
# calculate mean temperature for each city using pivot_table()
mean_temperature = df.pivot_table(index='City', values='Temperature', aggfunc='mean')
print(mean_temperature)
Output
Temperature City Los Angeles 76.666667 New York 31.666667
In the above example, we calculated the mean temperature of each city using the aggfunc='mean'
argument in pivot_table()
.
Example: pivot_table() With MultiIndex
We can create a pivot table with MultiIndex
using the pivot_table()
method.
import pandas as pd
# create a dataframe
data = {'Date': ['2023-01-01', '2023-01-01', '2023-01-02', '2023-01-02', '2023-01-01', '2023-01-01', '2023-01-02', '2023-01-02'],
'City': ['New York', 'Los Angeles', 'New York', 'Los Angeles','Delhi', 'Chennai', 'Delhi', 'Chennai'],
'Country': ['USA', 'USA', 'USA', 'USA', 'India', 'India', 'India', 'India'],
'Temperature': [32, 75, 30, 77, 75, 80, 78, 79]}
df = pd.DataFrame(data)
# create a pivot table with multiindex
pivot_df = df.pivot_table(index=['Country', 'City'], columns='Date', values='Temperature')
print(pivot_df)
Output
Date 2023-01-01 2023-01-02 Country City India Chennai 80.0 79.0 Delhi 75.0 78.0 USA Los Angeles 75.0 77.0 New York 32.0 30.0
In this example, we created a pivot table with a MultiIndex
by passing a list of columns as an index
argument.
Example: pivot_table() with fill_value
The fill_value
argument replaces all the NaN
values with a specified value.
import pandas as pd
import numpy as np
# create the DataFrame
data = {'Date': ['2023-01-01', '2023-01-01', '2023-01-02', '2023-01-02', '2023-01-03', '2023-01-03'],
'City': ['New York', 'Los Angeles', 'New York', 'Los Angeles', 'New York', 'Los Angeles'],
'Temperature': [32, np.nan, 30, 77, np.nan, 76]}
df = pd.DataFrame(data)
# create a pivot table
pivot_df = df.pivot_table(index='Date', columns='City', values='Temperature')
print("\nDefault Pivot Table\n", pivot_df)
print()
# create a pivot table with fill_value=0
pivot_df_dropna = df.pivot_table(index='Date', columns='City', values='Temperature', fill_value=0)
print("\nPivot Table with fill_value=0:\n", pivot_df_dropna)
Output
Default Pivot Table City Los Angeles New York Date 2023-01-01 NaN 32.0 2023-01-02 77.0 30.0 2023-01-03 76.0 NaN Pivot Table with fill_value=0: City Los Angeles New York Date 2023-01-01 0.0 32.0 2023-01-02 77.0 30.0 2023-01-03 76.0 0.0
In this example, we replaced the NaN
values with 0 using the fill_value=0
argument.
Example: pivot_table() with dropna
The dropna
argument specifies whether to remove the columns whose entries are all NaN
. The default value of dropna
is True
.
import pandas as pd
import numpy as np
# create the DataFrame
data = {'Date': ['2023-01-01', '2023-01-01', '2023-01-02', '2023-01-02', '2023-01-03', '2023-01-03', '2023-01-03'],
'City': ['New York', 'Los Angeles', 'New York', 'Los Angeles', 'New York', 'Los Angeles', 'Chicago'],
'Temperature': [32, 75, 30, 77, np.nan, 76, np.nan]}
df = pd.DataFrame(data)
# create a pivot table
pivot_df = df.pivot_table(index='Date', columns='City', values='Temperature')
print("\nDefault Pivot Table\n", pivot_df)
print()
# create a pivot table with dropna=True
pivot_df_dropna = df.pivot_table(index='Date', columns='City', values='Temperature', dropna=False)
print("\nPivot Table with dropna=False:\n", pivot_df_dropna)
Output
Default Pivot Table City Los Angeles New York Date 2023-01-01 75.0 32.0 2023-01-02 77.0 30.0 2023-01-03 76.0 NaN Pivot Table with dropna=False: City Chicago Los Angeles New York Date 2023-01-01 NaN 75.0 32.0 2023-01-02 NaN 77.0 30.0 2023-01-03 NaN 76.0 NaN
In this example, we used the dropna
argument to determine the handling of columns with entirely NaN
entries. By default, the dropna
parameter is set to True
, resulting in the automatic removal of the Chicago
column.
Notice that the New York
column is not dropped despite having one NaN
value. This is because dropna
removes the columns whose entries are all NaN
.