Pandas hist()

The hist() method in Pandas is used for plotting histograms to visually summarize the distribution of a dataset. A histogram represents the frequency distribution of numerical data by dividing the data range into bins and showing how many values fall into each bin.

This method calls matplotlib.pyplot.hist() on each numeric column (Series) of the DataFrame, resulting in one histogram per column.
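The sketch below checks the one-histogram-per-column behaviour. It adds a hypothetical text column, label, to the data. In recent pandas versions, non-numeric columns are left out of DataFrame.hist(), so only A and B should get a subplot, and each subplot is titled with its column name. The sketch assumes the non-interactive Agg backend, so no window opens, and reads the titles back with Axes.get_title().

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so no window is opened
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5],
                   'B': [6, 7, 8, 9, 10],
                   'label': ['a', 'b', 'c', 'd', 'e']})  # hypothetical text column

axes = df.hist()

# each subplot is titled with the column it shows; the text column gets none
titles = [ax.get_title() for ax in axes.ravel() if ax.get_visible()]
print(titles)  # ['A', 'B']
plt.close('all')
```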

Example

import pandas as pd
import matplotlib.pyplot as plt

# sample DataFrame
data = {'A': [1, 2, 3, 4, 5],
        'B': [6, 7, 8, 9, 10]}

df = pd.DataFrame(data)

# plot histogram for all columns
hist_plot = df.hist()
plt.show()

hist() Syntax

The syntax of the hist() method in Pandas is:

df.hist(column=None, by=None, grid=True, xlabelsize=None, xrot=None, ylabelsize=None, yrot=None, ax=None, sharex=False, sharey=False, figsize=None, layout=None, bins=10, backend=None, legend=False, **kwargs)

hist() Arguments

The hist() method has the following arguments:

  • column (optional): a column label or list of labels that limits the plot to those columns
  • by (optional): a column to group the data by; a separate histogram is drawn for each group
  • grid (optional): whether to show grid lines on the axes (default True)
  • xlabelsize and ylabelsize (optional): font size of the x-axis and y-axis tick labels, respectively
  • xrot and yrot (optional): rotation of the x-axis and y-axis tick labels, in degrees
  • ax (optional): the matplotlib Axes object on which to draw the histogram
  • sharex and sharey (optional): whether the subplots share the x-axis (sharex) or the y-axis (sharey)
  • figsize (optional): a (width, height) tuple giving the figure size in inches
  • layout (optional): a (rows, columns) tuple giving the grid of subplots
  • bins (optional): the number of bins (default 10) or a sequence of bin edges
  • backend (optional): the plotting backend to use instead of the one set in the plotting.backend option
  • legend (optional): whether to show a legend (default False)
  • **kwargs (optional): additional keyword arguments passed on to matplotlib.pyplot.hist()
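Here is a minimal sketch that combines several of these arguments on the data used in the examples below. It assumes the headless Agg backend. It reads each bar's height, which is the count for that bin, from the matplotlib Axes.patches list. The explicit bin edges [10, 20, 30, 40, 50] are illustrative values chosen for this sketch.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so no window is opened
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({'A': [12, 13, 14, 27, 29, 41, 43, 45],
                   'B': [20, 35, 30, 35, 27, 28, 32, 44]})

# column: plot only A and B; layout: 2 rows x 1 column;
# figsize: 4 x 6 inches; sharex: one common x-axis;
# bins: explicit edges (illustrative) instead of a bin count
axes = df.hist(column=['A', 'B'], layout=(2, 1), figsize=(4, 6),
               sharex=True, bins=[10, 20, 30, 40, 50])

print(axes.shape)  # (2, 1)

# bar heights are the counts per bin: [10, 20), [20, 30), [30, 40), [40, 50]
heights_a = [float(p.get_height()) for p in axes[0, 0].patches]
heights_b = [float(p.get_height()) for p in axes[1, 0].patches]
print(heights_a)  # [3.0, 2.0, 0.0, 3.0]
print(heights_b)  # [0.0, 3.0, 4.0, 1.0]
plt.close('all')
```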

hist() Return Value

DataFrame.hist() returns a NumPy array of matplotlib Axes objects (one per subplot), while Series.hist(), for example df['A'].hist(), returns a single Axes object.
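The sketch below checks both return types and assumes the headless Agg backend. Note the call order. Series.hist() draws on the current figure if one exists, so it is called before DataFrame.hist(), which creates a new figure when ax is not given.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so no window is opened
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.axes import Axes

df = pd.DataFrame({'A': [1, 2, 3, 4, 5],
                   'B': [6, 7, 8, 9, 10]})

# Series.hist() draws on the current figure, so call it first,
# before DataFrame.hist() creates a figure of its own
series_result = df['A'].hist()
frame_result = df.hist()

print(isinstance(series_result, Axes))                   # True
print(type(frame_result).__name__, frame_result.shape)   # ndarray (1, 2)

# the returned Axes objects can still be customized after plotting
series_result.set_xlabel('value')
for ax in frame_result.ravel():
    ax.set_ylabel('count')
plt.close('all')
```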


Example 1: Basic Histogram

import pandas as pd
import matplotlib.pyplot as plt

data = {'A': [12, 13, 14, 27, 29, 41, 43, 45],
        'B': [20, 35, 30, 35, 27, 28, 32, 44]}
df = pd.DataFrame(data)

# plot a basic histogram of one column
hist_plot = df['A'].hist(bins=5)
plt.show()

Output

[Figure: Basic Histogram]

In this example, we displayed a histogram for column A with 5 bins. Because df['A'] is a Series, this calls Series.hist(), which draws a single histogram.

Here, the minimum value is 12 and the maximum value is 45, so the width of each bin is:

(45-12)/5 = 6.6

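So the bin ranges are as follows. Each bin includes its left edge but not its right edge, except the last bin, which includes both: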

  • Bin1: 12 to 18.6
  • Bin2: 18.6 to 25.2
  • Bin3: 25.2 to 31.8
  • Bin4: 31.8 to 38.4
  • Bin5: 38.4 to 45
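You can check the arithmetic above with numpy.histogram(). Matplotlib's hist() uses this function internally to bin the data. The sketch assumes the headless Agg backend. It prints the five bin edges and counts, then reads the bar heights that df['A'].hist(bins=5) actually draws. Bins 2 and 4 are empty. The value 45 is counted in Bin5 because the last bin includes its right edge.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so no window is opened
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [12, 13, 14, 27, 29, 41, 43, 45]})

# equal-width bins from min (12) to max (45), computed with NumPy
counts, edges = np.histogram(df['A'], bins=5)
print(np.round(edges, 1).tolist())  # [12.0, 18.6, 25.2, 31.8, 38.4, 45.0]
print(counts.tolist())              # [3, 0, 2, 0, 3]

# the bars that pandas draws have the same heights
ax = df['A'].hist(bins=5)
heights = [float(p.get_height()) for p in ax.patches]
print(heights)  # [3.0, 0.0, 2.0, 0.0, 3.0]
plt.close('all')
```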

Example 2: Customize a Histogram

import pandas as pd
import matplotlib.pyplot as plt

data = {'A': [12, 13, 14, 27, 29, 41, 43, 45],
        'B': [20, 35, 30, 35, 27, 28, 32, 44]}
df = pd.DataFrame(data)

# plot histogram with additional customizations
hist_plot = df.hist(bins=3, grid=False, figsize=(8,6), color='#86bf91', zorder=2, rwidth=0.9)
plt.show()

Output

[Figure: Customized Histogram]

In this example, we customized the histogram in many ways. We changed the number of bins to 3, turned off the grid for a cleaner look, chose a specific color for the bars, and adjusted the size of the figure to make it larger.

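Here, bins, grid and figsize are arguments of hist() itself, while color, rwidth and zorder are passed through **kwargs to matplotlib: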

  • bins=3: sets the number of bins to 3
  • grid=False: turns off the grid lines
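  • figsize=(8,6): sets the figure size to 8 x 6 inches (width x height)
  • color='#86bf91': sets the bar color using a hex color code
  • rwidth=0.9: makes each bar 90% as wide as its bin, leaving a small gap between bars
  • zorder=2: sets the drawing order; artists with a higher zorder are drawn on top of those with a lower one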
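The sketch below checks that these **kwargs really reach matplotlib. It assumes the headless Agg backend, repeats Example 2, and inspects the bars of column A with the matplotlib Patch getters get_width(), get_zorder() and get_facecolor().

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so no window is opened
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.colors import to_rgba

data = {'A': [12, 13, 14, 27, 29, 41, 43, 45],
        'B': [20, 35, 30, 35, 27, 28, 32, 44]}
df = pd.DataFrame(data)

axes = df.hist(bins=3, grid=False, figsize=(8, 6),
               color='#86bf91', zorder=2, rwidth=0.9)

ax_a = axes.ravel()[0]      # subplot for column A
bars_a = list(ax_a.patches)  # its three bars

# A spans 12..45, so each of the 3 bins is (45 - 12) / 3 = 11 wide,
# and rwidth=0.9 makes each bar 0.9 * 11 = 9.9 wide
widths = [round(float(p.get_width()), 2) for p in bars_a]
print(widths)  # [9.9, 9.9, 9.9]

print(all(p.get_zorder() == 2 for p in bars_a))                    # True
print(np.allclose(bars_a[0].get_facecolor(), to_rgba('#86bf91')))  # True
print(ax_a.figure.get_size_inches().tolist())                      # [8.0, 6.0]
plt.close('all')
```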

Example 3: Group Histograms by a Column

import pandas as pd
import matplotlib.pyplot as plt

data = {'Scores': [90, 85, 92, 88, 91],
        'Class': ['A', 'B', 'A', 'B', 'B']}
df = pd.DataFrame(data)

# plot a histogram of scores grouped by class
hist_plot = df.hist(column='Scores', by='Class')
plt.show()

Output

[Figure: Group Histograms by a Column]

In this example, we created histograms for the Scores column, grouped by the values of the Class column. This generated a separate histogram for each class (A and B).
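The small sketch below confirms that each subplot holds one class. It assumes the headless Agg backend. The subplots are titled with the group keys, and summing the bar heights in each subplot gives the number of scores in that class. The sketch uses np.ravel() because the shape of the returned array depends on the number of groups.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so no window is opened
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

data = {'Scores': [90, 85, 92, 88, 91],
        'Class': ['A', 'B', 'A', 'B', 'B']}
df = pd.DataFrame(data)

# flatten the returned Axes, whatever shape the array has
axes = np.ravel(df.hist(column='Scores', by='Class'))

totals = {}
for ax in axes:
    # every bar height is a count, so the heights in one subplot
    # add up to the number of scores in that class
    totals[ax.get_title()] = int(sum(p.get_height() for p in ax.patches))

print(totals)  # {'A': 2, 'B': 3}
plt.close('all')
```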