The mean() method in Pandas is used to compute the arithmetic mean of a set of numbers.
Example
import pandas as pd
# sample DataFrame
data = {
    'Math': [85, 90, 78],
    'Physics': [92, 88, 84]
}
df = pd.DataFrame(data)
# compute the mean for each subject (column)
mean_scores = df.mean()
print(mean_scores)
'''
Output
Math 84.333333
Physics 88.000000
dtype: float64
'''
mean() Syntax
The syntax of the mean() method in Pandas is:
df.mean(axis=0, skipna=True, level=None, numeric_only=None)
mean() Arguments
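The mean() method takes the following arguments:
- axis (optional) - specifies the axis along which the mean will be computed: 0 for each column (the default), 1 for each row
- skipna (optional) - determines whether missing values are excluded (True, the default) or included in the computation
- level (optional) - computes the mean at a particular level of a MultiIndex; this argument was removed in pandas 2.0
- numeric_only (optional) - specifies whether to include only numeric columns in the computation or not; in pandas 2.0 and later the default is False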
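In pandas releases where mean() no longer accepts level (2.0 and later, to our knowledge), a per-level mean can usually be obtained by grouping on the index level first. The sketch below assumes a hypothetical two-level index named Class and Student; the data and names are for illustration only.

```python
import pandas as pd

# hypothetical data: scores indexed by (Class, Student)
index = pd.MultiIndex.from_tuples(
    [('A', 'Alice'), ('A', 'Bob'), ('B', 'Charlie'), ('B', 'David')],
    names=['Class', 'Student']
)
df = pd.DataFrame({'Math': [85, 90, 78, 82]}, index=index)

# group rows by the 'Class' level of the index, then average each group
mean_by_class = df.groupby(level='Class').mean()
print(mean_by_class)
```

Here, class A averages 85 and 90, and class B averages 78 and 82.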
mean() Return Value
The mean() method returns a Series object that represents the average value for each column or each row.
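As a quick check of the return type, the sketch below calls mean() on a DataFrame and on a single column. The variable names are illustrative; the point is that a DataFrame gives back a Series, while a single column (itself a Series) gives back one number.

```python
import pandas as pd

df = pd.DataFrame({'Math': [85, 90, 78], 'Physics': [92, 88, 84]})

# DataFrame.mean() returns a Series with one entry per column
column_means = df.mean()

# Series.mean() returns a single floating-point number
math_mean = df['Math'].mean()

print(type(column_means).__name__)
print(math_mean)
```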
Example 1: Compute mean() Along Different Axis
import pandas as pd
# sample DataFrame
data = {
    'Math': [85, 90, 78],
    'Physics': [92, 88, 84]
}
df = pd.DataFrame(data, index=['Alice', 'Bob', 'Charlie'])
print("DataFrame:")
print(df)
print()
# compute the mean for each subject (column)
mean_scores = df.mean()
print("\nMean scores for each subject:")
print(mean_scores)
print()
# compute the mean score for each student (row)
mean_scores_by_student = df.mean(axis=1)
print("\nMean scores for each student:")
print(mean_scores_by_student)
Output
DataFrame:
         Math  Physics
Alice      85       92
Bob        90       88
Charlie    78       84

Mean scores for each subject:
Math       84.333333
Physics    88.000000
dtype: float64

Mean scores for each student:
Alice      88.5
Bob        89.0
Charlie    81.0
dtype: float64
In the above example,
- The mean() method without any arguments computes the mean for each column (i.e., the average score for each subject).
- The mean(axis=1) call computes the mean across each row (i.e., the average score for each student).
Note: We can also pass axis=0 inside mean() to compute the mean of each column.
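The note above can be checked directly. This sketch also uses the string aliases 'index' and 'columns', which pandas accepts in place of 0 and 1; the comparison with equals() is only there to show that both spellings give the same result.

```python
import pandas as pd

df = pd.DataFrame(
    {'Math': [85, 90, 78], 'Physics': [92, 88, 84]},
    index=['Alice', 'Bob', 'Charlie']
)

# axis=0 is the default, so these give the same per-column means
same_as_default = df.mean().equals(df.mean(axis=0))

# 'index' is an alias for 0 and 'columns' is an alias for 1
same_columns = df.mean(axis=0).equals(df.mean(axis='index'))
same_rows = df.mean(axis=1).equals(df.mean(axis='columns'))

print(same_as_default, same_columns, same_rows)
```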
Example 2: Calculate Mean of a Specific Column
import pandas as pd
# sample DataFrame
df = pd.DataFrame({
    'A': [10, 20, 30, 40],
    'B': [5, 15, 25, 35],
    'C': [1, 2, 3, 4]
})
# calculate the mean of column 'A'
mean_A = df['A'].mean()
print(f"Mean of column 'A': {mean_A}")
Output
Mean of column 'A': 25.0
In this example, we've created the df DataFrame with three columns: A, B, and C.
Then, we used df['A'].mean() to compute the average of the values in column A, which resulted in a mean of 25.0.
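The same idea extends to several columns at once. As a small sketch, selecting with a list of column names returns a DataFrame, so mean() then produces one value per selected column; the choice of A and C here is arbitrary.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [10, 20, 30, 40],
    'B': [5, 15, 25, 35],
    'C': [1, 2, 3, 4]
})

# select columns 'A' and 'C' with a list, then average each of them
mean_A_C = df[['A', 'C']].mean()
print(mean_A_C)
```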
Example 3: Use of numeric_only Argument in mean()
import pandas as pd
# sample DataFrame with a mix of numeric and non-numeric columns
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'Salary': [50000, 60000, 70000, 80000],
    'Department': ['HR', 'IT', 'Finance', 'Admin']
})
# compute mean with numeric_only set to True
mean_values_numeric = df.mean(numeric_only=True)
print("Mean with numeric_only=True:")
print(mean_values_numeric)
print()
# try to compute mean with numeric_only set to False
try:
    mean_values_all = df.mean(numeric_only=False)
    print("Mean with numeric_only=False:")
    print(mean_values_all)
except TypeError as e:
    print(f"Error: {e}")
Output
Mean with numeric_only=True:
Age          32.5
Salary    65000.0
dtype: float64

Error: Could not convert ['AliceBobCharlieDavid' 'HRITFinanceAdmin'] to numeric
Here,
- When numeric_only=True, mean() only computes the mean for the numeric columns, ignoring the non-numeric ones.
- When numeric_only=False, it attempts to compute the mean for all columns, including non-numeric ones. This raises a TypeError because it's not possible to compute the mean of non-numeric data in this context.
Note: To learn more about exception handling, please visit Python Exception Handling.
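Another way to avoid the TypeError is to keep only the numeric columns before calling mean(). The sketch below uses select_dtypes() for that step; it is one possible approach, not the only one, and the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'Salary': [50000, 60000, 70000, 80000],
    'Department': ['HR', 'IT', 'Finance', 'Admin']
})

# keep only the numeric columns (Age and Salary), then compute the mean
numeric_df = df.select_dtypes(include='number')
mean_values = numeric_df.mean()
print(mean_values)
```

For this DataFrame the result matches df.mean(numeric_only=True).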
Example 4: Effect of the skipna Argument on Calculating Averages
import pandas as pd
# sample DataFrame with missing values
df = pd.DataFrame({
    'A': [1, 2, 3, None],
    'B': [4, 5, None, 8],
    'C': [7, 10, 13, 19],
    'D': [None, 10, 11, 12]
})
# compute mean with skipna set to True (default behavior)
mean_values_skipna_true = df.mean(skipna=True)
print("Mean with skipna=True:")
print(mean_values_skipna_true)
print()
# compute mean with skipna set to False
mean_values_skipna_false = df.mean(skipna=False)
print("Mean with skipna=False:")
print(mean_values_skipna_false)
Output
Mean with skipna=True:
A     2.000000
B     5.666667
C    12.250000
D    11.000000
dtype: float64

Mean with skipna=False:
A      NaN
B      NaN
C    12.25
D      NaN
dtype: float64
In this example,
- With skipna=True, the averages of columns A and B are computed without considering the missing values, column C has no missing values, and the average of column D is computed from its three valid numbers.
- With skipna=False, columns A, B, and D contain None, so their means are NaN, while column C has no None, so its average is calculated.
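To make the effect of skipping missing values concrete, the sketch below reproduces the mean of column A by hand and contrasts it with filling the gap first. Using fillna(0) is only an illustrative choice; whether a missing value should count as zero depends on the data.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, None]})

# default skipna=True: the missing value is ignored, so the mean is 6 / 3
mean_skip = df['A'].mean()

# same result by hand: sum() and count() both ignore missing values
mean_manual = df['A'].sum() / df['A'].count()

# filling the gap with 0 keeps four values, so the mean becomes 6 / 4
mean_filled = df['A'].fillna(0).mean()

print(mean_skip, mean_manual, mean_filled)
```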