Pandas mean()

The mean() method in Pandas computes the arithmetic mean (average) of the values in a DataFrame or Series.

Example

import pandas as pd

# sample DataFrame
data = {
    'Math': [85, 90, 78],
    'Physics': [92, 88, 84]
}

df = pd.DataFrame(data)

# compute the mean for each subject (column)
mean_scores = df.mean()
print(mean_scores)

'''
Output
Math       84.333333
Physics    88.000000
dtype: float64
'''

mean() Syntax

The syntax of the mean() method in Pandas is:

df.mean(axis=0, skipna=True, numeric_only=False)

mean() Arguments

The mean() method takes the following arguments:

  • axis (optional) - the axis along which the mean is computed: 0 (the default) for each column, 1 for each row
  • skipna (optional) - whether to exclude missing values (NaN) from the computation; True by default
  • numeric_only (optional) - whether to include only numeric columns in the computation; False by default in pandas 2.0 and later
  • level (optional) - for a DataFrame with a MultiIndex, computes the mean at a particular level; accepted only by pandas versions before 2.0, which removed it
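
The level argument only matters for a DataFrame with a MultiIndex, and pandas 2.0 no longer accepts it. A minimal sketch of the usual replacement, groupby(level=...).mean(), is shown here; the index names and scores are made up for illustration.

```python
import pandas as pd

# hypothetical scores indexed by (class, student)
index = pd.MultiIndex.from_tuples(
    [('A', 'Alice'), ('A', 'Bob'), ('B', 'Charlie'), ('B', 'David')],
    names=['Class', 'Student']
)
df = pd.DataFrame({'Math': [85, 90, 78, 82]}, index=index)

# mean of each column within each value of the 'Class' index level
class_means = df.groupby(level='Class').mean()
print(class_means)
```

Here the result has one row per class: 87.5 for class A and 80.0 for class B.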

mean() Return Value

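When called on a DataFrame, the mean() method returns a Series containing the average value of each column (axis=0) or each row (axis=1). When called on a Series, such as a single column, it returns a single number.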
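
As a quick check of the return value, the sketch below calls mean() on a whole DataFrame and on one of its columns. It reuses the scores from the first example; the variable names are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Math': [85, 90, 78],
    'Physics': [92, 88, 84]
})

# DataFrame.mean() returns a Series indexed by column label
column_means = df.mean()
print(isinstance(column_means, pd.Series))
print(list(column_means.index))

# Series.mean() returns a single floating-point number
math_mean = df['Math'].mean()
print(isinstance(math_mean, float))
```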


Example 1: Compute mean() Along Different Axis

import pandas as pd

# sample DataFrame
data = {
    'Math': [85, 90, 78],
    'Physics': [92, 88, 84]
}

df = pd.DataFrame(data, index=['Alice', 'Bob', 'Charlie'])

print("DataFrame:")
print(df)
print()

# compute the mean for each subject (column)
mean_scores = df.mean()
print("\nMean scores for each subject:")
print(mean_scores)
print()

# compute the mean score for each student (row)
mean_scores_by_student = df.mean(axis=1)
print("\nMean scores for each student:")
print(mean_scores_by_student)

Output

DataFrame:
         Math  Physics
Alice      85       92
Bob        90       88
Charlie    78       84

Mean scores for each subject:
Math       84.333333
Physics    88.000000
dtype: float64

Mean scores for each student:
Alice      88.5
Bob        89.0
Charlie    81.0
dtype: float64

In the above example,

  1. The mean() method without any arguments computes the mean for each column (i.e., the average score for each subject).
  2. The mean(axis=1) computes the mean across each row (i.e., the average score for each student).

Note: We can also pass axis=0 inside mean() to compute the mean of each column.
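
To make the two axis values concrete, the following sketch reproduces both results by hand from sum() and the shape of the DataFrame. It is only a cross-check of the example above, not how Pandas computes the mean internally.

```python
import pandas as pd

df = pd.DataFrame(
    {'Math': [85, 90, 78], 'Physics': [92, 88, 84]},
    index=['Alice', 'Bob', 'Charlie']
)

# axis=0: add up each column and divide by the number of rows
manual_by_subject = df.sum(axis=0) / df.shape[0]
print(manual_by_subject)

# axis=1: add up each row and divide by the number of columns
manual_by_student = df.sum(axis=1) / df.shape[1]
print(manual_by_student)
```

Both printed Series should match the values returned by df.mean() and df.mean(axis=1) in the output above.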


Example 2: Calculate Mean of a Specific Column

import pandas as pd

# sample DataFrame
df = pd.DataFrame({
    'A': [10, 20, 30, 40],
    'B': [5, 15, 25, 35],
    'C': [1, 2, 3, 4]
})

# calculate the mean of column 'A'
mean_A = df['A'].mean()
print(f"Mean of column 'A': {mean_A}")

Output

Mean of column 'A': 25.0

In this example, we've created the df DataFrame with three columns: A, B, and C.

Then, we used df['A'].mean() to compute the average of the values in column A, which resulted in a mean of 25.0.
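
The same idea extends to more than one column. As a small sketch, selecting columns with a list of labels keeps a DataFrame, so mean() returns one average per selected column; the name mean_AB is just illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [10, 20, 30, 40],
    'B': [5, 15, 25, 35],
    'C': [1, 2, 3, 4]
})

# select columns 'A' and 'B' with a list of labels, then average each one
mean_AB = df[['A', 'B']].mean()
print(mean_AB)
```

This gives 25.0 for column A and 20.0 for column B, and leaves column C out of the result.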


Example 3: Use of numeric_only Argument in mean()

import pandas as pd

# sample DataFrame with a mix of numeric and non-numeric columns
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'Salary': [50000, 60000, 70000, 80000],
    'Department': ['HR', 'IT', 'Finance', 'Admin']
})

# compute mean with numeric_only set to True
mean_values_numeric = df.mean(numeric_only=True)
print("Mean with numeric_only=True:")
print(mean_values_numeric)
print()

# try to compute mean with numeric_only set to False
try:
    mean_values_all = df.mean(numeric_only=False)
    print("Mean with numeric_only=False:")
    print(mean_values_all)
except TypeError as e:
    print(f"Error: {e}")

Output

Mean with numeric_only=True:
Age          32.5
Salary    65000.0
dtype: float64

Error: Could not convert ['AliceBobCharlieDavid' 'HRITFinanceAdmin'] to numeric

Here,

  • When numeric_only=True, mean() only computes the mean for the numeric columns, ignoring the non-numeric ones.
  • When numeric_only=False, it attempts to compute the mean for all columns, including non-numeric ones. This raises a TypeError because it's not possible to compute the mean of non-numeric data in this context.

Note: To learn more about exception handling, please visit Python Exception Handling.
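
Another way to avoid the error is to drop the non-numeric columns before calling mean(). The sketch below uses select_dtypes(include='number') for this; for this DataFrame it should give the same averages as numeric_only=True.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'Salary': [50000, 60000, 70000, 80000],
    'Department': ['HR', 'IT', 'Finance', 'Admin']
})

# keep only the numeric columns, then average them
numeric_means = df.select_dtypes(include='number').mean()
print(numeric_means)
```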


Example 4: Effect of the skipna Argument on Calculating Averages

import pandas as pd

# sample DataFrame with missing values
df = pd.DataFrame({
    'A': [1, 2, 3, None],
    'B': [4, 5, None, 8],
    'C': [7, 10, 13, 19],
    'D': [None, 10, 11, 12]
})

# compute mean with skipna set to True (default behavior)
mean_values_skipna_true = df.mean(skipna=True)
print("Mean with skipna=True:")
print(mean_values_skipna_true)
print()

# compute mean with skipna set to False
mean_values_skipna_false = df.mean(skipna=False)
print("Mean with skipna=False:")
print(mean_values_skipna_false)

Output

Mean with skipna=True:
A     2.000000
B     5.666667
C    12.250000
D    11.000000
dtype: float64

Mean with skipna=False:
A      NaN
B      NaN
C    12.25
D      NaN
dtype: float64

In this example,

  • With skipna=True, missing values are left out of the calculation. The means of columns A, B, and D are each computed from their three valid values, while column C has no missing values, so all four of its values are used.
  • With skipna=False, columns A, B, and D each contain a missing value, so their means are NaN. Column C has no missing values, so its mean is computed as usual.
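
To see what skipping missing values means in practice, the sketch below rebuilds the skipna=True result for columns A and B from sum() and count(), and contrasts it with replacing the missing values by 0 first. The fillna(0) step is only an illustration of a different choice, not a recommendation.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, None],
    'B': [4, 5, None, 8]
})

# count() reports the number of non-missing values in each column
valid_counts = df.count()
print(valid_counts)

# with skipna=True, the mean is the sum of the valid values divided by their count
manual_mean = df.sum() / valid_counts
print(manual_mean)

# treating missing values as 0 keeps all four rows in the divisor
filled_mean = df.fillna(0).mean()
print(filled_mean)
```

Skipping the missing values gives 2.0 for column A, while filling them with 0 gives 1.5, because the divisor becomes 4 instead of 3.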