The hist() method in Pandas is used to plot histograms, which visually summarize the distribution of a dataset. A histogram represents the frequency distribution of numerical data by dividing the data range into bins and showing how many values fall into each bin.
When called on a DataFrame, this method calls matplotlib.pyplot.hist() on each Series (column) in the DataFrame, producing one histogram per column.
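To make the "one histogram per column" behavior concrete, here is a rough, hand-written approximation of what df.hist() does for a small DataFrame: it creates one subplot per column and fills each one with matplotlib's Axes.hist(), the object-oriented counterpart of pyplot.hist(). This is a simplified sketch, not pandas's actual implementation, which also selects the numeric columns, works out the grid layout and applies the other hist() arguments. The sample data is hypothetical.

```python
import pandas as pd
import matplotlib.pyplot as plt

# hypothetical sample data
df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [6, 7, 8, 9, 10]})

# one subplot per column, side by side; squeeze=False always gives a 2D array
fig, axes = plt.subplots(1, len(df.columns), squeeze=False)

for ax, name in zip(axes.flatten(), df.columns):
    # draw this column's histogram with matplotlib's default of 10 bins
    ax.hist(df[name].dropna(), bins=10)
    ax.set_title(name)   # pandas also titles each subplot with the column name
    ax.grid(True)        # mirrors hist()'s default grid=True

# plt.show()  # uncomment to display the figure
```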
Example
import pandas as pd
import matplotlib.pyplot as plt
# sample DataFrame
data = {'A': [1, 2, 3, 4, 5],
'B': [6, 7, 8, 9, 10]}
df = pd.DataFrame(data)
# plot histogram for all columns
hist_plot = df.hist()
plt.show()
hist() Syntax
The syntax of the hist() method in Pandas is:
df.hist(column=None, by=None, grid=True, xlabelsize=None, xrot=None, ylabelsize=None, yrot=None, ax=None, sharex=False, sharey=False, figsize=None, layout=None, bins=10, **kwargs)
hist() Arguments
The hist() method has the following arguments:
column (optional): the column name, or list of column names, to plot
by (optional): a column to group the data by; a separate histogram is drawn for each group
grid (optional): whether to show grid lines (default True)
xlabelsize and ylabelsize (optional): font size of the x-axis and y-axis labels, respectively
xrot and yrot (optional): rotation of the x-axis and y-axis labels, in degrees
ax (optional): the matplotlib Axes object on which to draw the histogram
sharex and sharey (optional): whether the subplots share the x-axis (sharex) or the y-axis (sharey)
figsize (optional): a tuple (width, height), in inches, giving the size of the figure
layout (optional): a tuple (rows, columns) controlling how the histograms are arranged
bins (optional): the number of bins (default 10) or a sequence of bin edges
**kwargs (optional): additional keyword arguments passed on to matplotlib.pyplot.hist()
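Several of these arguments can be combined in one call. The sketch below uses hypothetical three-column data: it restricts the plot to columns A and B with column, stacks the two histograms vertically with layout, passes explicit bin edges through bins, shares the y-axis and rotates the x-axis labels. The particular values are illustrative choices, not recommendations.

```python
import pandas as pd
import matplotlib.pyplot as plt

# hypothetical sample data with three numeric columns
df = pd.DataFrame({
    'A': [1, 2, 2, 3, 3, 3, 4, 5],
    'B': [2, 4, 4, 6, 6, 8, 8, 10],
    'C': [5, 5, 6, 7, 7, 7, 8, 9],
})

axes = df.hist(
    column=['A', 'B'],          # plot only these columns; C is skipped
    layout=(2, 1),              # 2 rows x 1 column
    bins=[0, 2, 4, 6, 8, 10],   # explicit bin edges instead of a bin count
    sharey=True,                # both subplots use the same y-axis scale
    figsize=(6, 6),             # width x height in inches
    xrot=45,                    # rotate x-axis labels by 45 degrees
)

print(axes.shape)                                  # (2, 1)
print([ax.get_title() for ax in axes.flatten()])   # ['A', 'B']
# plt.show()  # uncomment to display the figure
```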
hist() Return Value
The hist() method returns a single matplotlib Axes object when called on a Series, and a NumPy array of Axes objects (one per histogram) when called on a DataFrame.
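A quick way to see the two return types is to inspect them directly. This sketch uses small hypothetical data; it calls hist() on a DataFrame and on a single column, then uses the returned Axes to add a y-axis label, since further customization is done on these matplotlib objects.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.axes import Axes

# hypothetical sample data
df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [6, 7, 8, 9, 10]})

# DataFrame.hist(): a NumPy array of Axes, one per column
axes = df.hist()
print(type(axes).__name__, axes.shape)   # ndarray (1, 2)

# Series.hist(): a single Axes; start a fresh figure so it is drawn separately
plt.figure()
ax = df['A'].hist()
print(isinstance(ax, Axes))              # True

# the returned Axes can be customized further with matplotlib methods
for sub in axes.flatten():
    sub.set_ylabel('Count')
```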
Example 1: Basic Histogram
import pandas as pd
import matplotlib.pyplot as plt
data = {'A': [12, 13, 14, 27, 29, 41, 43, 45],
'B': [20, 35, 30, 35, 27, 28, 32, 44]}
df = pd.DataFrame(data)
# plot a basic histogram of one column
hist_plot = df['A'].hist(bins=5)
plt.show()
Output
In this example, we displayed a histogram of column A with 5 bins.
Here, the minimum value is 12 and the maximum value is 45, so the width of each bin is:
(45 - 12) / 5 = 6.6
So the bin ranges are:
Bin 1: 12 to 18.6
Bin 2: 18.6 to 25.2
Bin 3: 25.2 to 31.8
Bin 4: 31.8 to 38.4
Bin 5: 38.4 to 45
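The bin arithmetic above can be checked numerically. The sketch below uses numpy.histogram(), which computes the same equal-width bins for a given bin count, and then reads the bar heights back from the plotted histogram to confirm they match. Note that every bin includes its left edge, and only the last bin also includes its right edge, which is why 45 lands in Bin 5.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'A': [12, 13, 14, 27, 29, 41, 43, 45],
                   'B': [20, 35, 30, 35, 27, 28, 32, 44]})

# compute 5 equal-width bins over the range of column A
counts, edges = np.histogram(df['A'], bins=5)
print(np.round(edges, 1).tolist())   # [12.0, 18.6, 25.2, 31.8, 38.4, 45.0]
print(counts.tolist())               # [3, 0, 2, 0, 3]

# draw the pandas histogram on its own Axes and read back the bar heights
fig, ax = plt.subplots()
df['A'].hist(bins=5, ax=ax)
heights = [int(patch.get_height()) for patch in ax.patches]
print(heights)                       # [3, 0, 2, 0, 3]
```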
Example 2: Customize a Histogram
import pandas as pd
import matplotlib.pyplot as plt
data = {'A': [12, 13, 14, 27, 29, 41, 43, 45],
'B': [20, 35, 30, 35, 27, 28, 32, 44]}
df = pd.DataFrame(data)
# plot histogram with additional customizations
hist_plot = df.hist(bins=3, grid=False, figsize=(8,6), color='#86bf91', zorder=2, rwidth=0.9)
plt.show()
Output
In this example, we customized the histogram in several ways: we reduced the number of bins to 3, turned off the grid for a cleaner look, chose a specific color for the bars, narrowed the bars slightly so there are gaps between them, and enlarged the figure.
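Here,
bins=3: sets the number of bins to 3
grid=False: turns off the grid lines
figsize=(8,6): sets the figure size to 8 x 6 inches
color='#86bf91': sets the bar color using a hex color code
rwidth=0.9: sets each bar's width to 90% of its bin width
zorder=2: controls the drawing order (elements with a higher zorder are drawn on top)
The last three are not parameters of hist() itself; they are passed through **kwargs to matplotlib.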
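The syntax shown earlier has no arguments for axis labels or a figure title, so one way to add them is through the objects hist() returns. This sketch repeats the call from Example 2, then labels every subplot, adds a figure-level title with suptitle() and checks that the bar color was applied. The label and title strings are arbitrary examples.

```python
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.colors import to_hex

df = pd.DataFrame({'A': [12, 13, 14, 27, 29, 41, 43, 45],
                   'B': [20, 35, 30, 35, 27, 28, 32, 44]})

axes = df.hist(bins=3, grid=False, figsize=(8, 6),
               color='#86bf91', zorder=2, rwidth=0.9)

# label each subplot through the returned Axes objects
for ax in axes.flatten():
    ax.set_xlabel('Value')
    ax.set_ylabel('Frequency')

# every Axes knows its parent Figure, which can hold an overall title
fig = axes[0, 0].figure
fig.suptitle('Distribution of A and B')

print(to_hex(axes[0, 0].patches[0].get_facecolor()))   # #86bf91
# plt.show()  # uncomment to display the figure
```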
Example 3: Group Histograms by a Column
import pandas as pd
import matplotlib.pyplot as plt
data = {'Scores': [90, 85, 92, 88, 91],
'Class': ['A', 'B', 'A', 'B', 'B']}
df = pd.DataFrame(data)
# plot a histogram of scores grouped by class
hist_plot = df.hist(column='Scores', by='Class')
plt.show()
Output
In this example, we created histograms for the Scores column, grouped by the Class column. This generated a separate histogram for each class.
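Because each group gets its own subplot, the axis ranges can differ between groups, which makes them harder to compare. A minimal sketch, using the same data as Example 3, is to pass sharex and sharey so both class histograms use the same scales, and layout to place them side by side. The bins=4 value is an arbitrary illustrative choice. The result is flattened with NumPy so the code works whether hist() returns a 1D or 2D array of Axes.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'Scores': [90, 85, 92, 88, 91],
                   'Class': ['A', 'B', 'A', 'B', 'B']})

# one subplot per class, with shared x- and y-axes for easy comparison
axes = df.hist(column='Scores', by='Class', bins=4,
               sharex=True, sharey=True, layout=(1, 2))

flat = np.asarray(axes).ravel()
print([ax.get_title() for ax in flat])            # ['A', 'B']
print(flat[0].get_xlim() == flat[1].get_xlim())   # True
# plt.show()  # uncomment to display the figure
```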