Pandas sum()

The sum() method in Pandas is used to calculate the sum of a DataFrame along a specific axis.

Example

import pandas as pd

# create a DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6]
})

# calculate the sum of each column
column_sum = df.sum()

print(column_sum)

'''
Output

A     6
B    15
dtype: int64
'''

sum() Syntax

The syntax of the sum() method in Pandas is:

df.sum(axis=0, skipna=True, numeric_only=False, min_count=0)

sum() Arguments

The sum() method takes the following arguments:

  • axis (optional) - the axis along which the sum is computed: 0 or 'index' sums each column, 1 or 'columns' sums each row
  • skipna (optional) - whether to exclude missing values (NaN) from the sum; True by default
  • numeric_only (optional) - whether to include only numeric columns in the computation
  • min_count (optional) - the minimum number of non-missing values required to compute a sum; if fewer are present, the result is NaN

sum() Return Value

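The sum() method returns the sum of the values along the specified axis. For a DataFrame, the result is a Series with one entry per column (or per row when axis=1). For a single column, the result is a scalar.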
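
As a quick check of the return value, here is a minimal sketch. The variable names frame_result and column_result are illustrative and not part of the original examples. It shows that calling sum() on a DataFrame gives a Series indexed by column label, while calling it on one column gives a single value.

```python
import pandas as pd

# small illustrative DataFrame (values chosen for this sketch)
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# DataFrame.sum() returns a Series with one entry per column
frame_result = df.sum()
print(type(frame_result).__name__)   # Series

# Series.sum() returns a single scalar value
column_result = df['A'].sum()
print(column_result)                 # 6
```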


Example 1: Compute Sum Along Different Axis

import pandas as pd

# create a DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
})

# calculate the sum of each column
column_sum = df.sum()

# calculate the sum of each row
row_sum = df.sum(axis=1)

print("Sum of each column:")
print(column_sum)

print("\nSum of each row:")
print(row_sum)

Output

Sum of each column:
A     6
B    15
C    24
dtype: int64

Sum of each row:
0    12
1    15
2    18
dtype: int64

In the above example,

  1. column_sum = df.sum() - calculates the sum of values in each column of the df DataFrame. Default axis=0 means it operates column-wise.
  2. row_sum = df.sum(axis=1) - calculates the sum of values in each row of df by setting axis=1, meaning it operates row-wise.

Note: We can also pass axis=0 inside sum() to compute the sum of each column.
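
The axis can also be given by name instead of by number. The sketch below assumes the same three-column DataFrame as above; by_index and by_columns are illustrative names. It suggests that 'index' behaves like 0 and 'columns' behaves like 1.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

# axis='index' sums down each column, like axis=0
by_index = df.sum(axis='index')

# axis='columns' sums across each row, like axis=1
by_columns = df.sum(axis='columns')

print(by_index.equals(df.sum(axis=0)))     # True
print(by_columns.equals(df.sum(axis=1)))   # True
```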


Example 2: Calculate Sum of a Specific Column

import pandas as pd

# create a DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
})

# calculate the sum of column 'A'
sum_A = df['A'].sum()

# calculate the sum of column 'B'
sum_B = df['B'].sum()

print("sum of column A:", sum_A)
print("sum of column B:", sum_B)

Output

sum of column A: 6
sum of column B: 15

In this example, df['A'] selects column A of the df DataFrame, and sum() calculates the sum of its values. The same is done for column B.
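
The same idea extends to several columns at once. As a small sketch (subset_sum and grand_total are names introduced here for illustration), selecting a list of columns before calling sum() gives one sum per selected column, and summing those sums gives a single total.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

# select columns A and B, then sum each one
subset_sum = df[['A', 'B']].sum()
print(subset_sum)

# sum the column sums to get one total for the whole DataFrame
grand_total = df.sum().sum()
print(grand_total)   # 45
```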


Example 3: Use of numeric_only Argument in sum()

import pandas as pd

# create a DataFrame with both numeric and non-numeric columns
data = {
    'A': [10, 20, 30, 40],
    'B': [5, 3, 2, 1],
    'C': ['a', 'b', 'c', 'd'],
    'D': [1.5, 2.5, 3.5, 4.5]
}

df = pd.DataFrame(data)

# sum only the numeric columns
summed = df.sum(numeric_only=True)

print(summed)

Output

A    100.0
B     11.0
D     12.0
dtype: float64

Here, with numeric_only=True, the sum is calculated only for columns A, B, and D. Column C is excluded because it contains string data.

If we had not passed numeric_only, as in

summed_all = df.sum()

the string values in column C would be concatenated, and the output would be:

A     100
B      11
C    abcd
D    12.0
dtype: object
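
Another way to restrict the sum to numeric columns is to select them first with select_dtypes(). This is offered as an alternative sketch rather than something the original example uses; numeric_sum and flag_sum are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [10, 20, 30, 40],
    'C': ['a', 'b', 'c', 'd'],
    'D': [1.5, 2.5, 3.5, 4.5]
})

# keep only the numeric columns, then sum them
numeric_sum = df.select_dtypes(include='number').sum()

# for comparison, let sum() skip the non-numeric column itself
flag_sum = df.sum(numeric_only=True)

print(numeric_sum)
print(flag_sum)
```

Both calls leave out column C. The select_dtypes() form can be handy when the same numeric subset is reused for several calculations.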

Example 4: Effect of skipna Argument on Calculating sum

import pandas as pd

# create a DataFrame with NaN values
df = pd.DataFrame({
    'A': [1, None, 3],
    'B': [4, 5, None],
    'C': [7, 8, 9]
})

# calculate the sum of each column, ignoring NaN values
sum_skipna_true = df.sum()

# calculate the sum of each column, including NaN values
sum_skipna_false = df.sum(skipna=False)

print("sum with skipna=True (default):")
print(sum_skipna_true)

print("\nsum with skipna=False:")
print(sum_skipna_false)

Output

sum with skipna=True (default):
A     4.0
B     9.0
C    24.0
dtype: float64

sum with skipna=False:
A     NaN
B     NaN
C    24.0
dtype: float64

In this example,

  • With skipna=True - the sums of columns A, B, and C are 4.0, 9.0, and 24.0, respectively, because the None values (stored as NaN) are ignored.
  • With skipna=False - the sums of columns A and B are NaN because each contains a missing value, while the sum of C is 24.0.
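
To see why A and B are affected, it can help to count the missing values first. The following sketch reuses the same data; missing_per_column and row_sum_strict are names made up for this illustration. It also applies skipna=False along rows, which follows the same rule.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, None, 3],
    'B': [4, 5, None],
    'C': [7, 8, 9]
})

# None is stored as NaN, so isna() marks it as missing;
# summing the boolean result counts missing values per column
missing_per_column = df.isna().sum()
print(missing_per_column)

# with skipna=False, any row containing NaN sums to NaN
row_sum_strict = df.sum(axis=1, skipna=False)
print(row_sum_strict)
```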

Example 5: Calculate sums With Minimum Value Counts

import pandas as pd

# create a DataFrame with some missing values
df = pd.DataFrame({
    'A': [1, None, 3],
    'B': [4, 5, None],
    'C': [None, None, 9]
})

# calculate the sum of each column with min_count set to 1
sum_min_count_1 = df.sum(min_count=1)

# calculate the sum of each column with min_count set to 2
sum_min_count_2 = df.sum(min_count=2)

# calculate the sum of each column with min_count set to 3
sum_min_count_3 = df.sum(min_count=3)

# sep="" avoids a stray space before the first row of each result
print("sum with min_count=1:\n", sum_min_count_1, sep="")
print("\nsum with min_count=2:\n", sum_min_count_2, sep="")
print("\nsum with min_count=3:\n", sum_min_count_3, sep="")

Output

sum with min_count=1:
A    4.0
B    9.0
C    9.0
dtype: float64

sum with min_count=2:
A    4.0
B    9.0
C    NaN
dtype: float64

sum with min_count=3:
A   NaN
B   NaN
C   NaN
dtype: float64

Here,

  • When min_count=1, the sum is calculated if there is at least one non-missing value in the column. All three columns meet this criterion.
  • When min_count=2, the sum is calculated if there are at least two non-missing values in the column. Columns A and B meet this criterion, but C has only one non-missing value, so its result is NaN.
  • When min_count=3, the sum is calculated if there are at least three non-missing values in the column. None of the columns meets this criterion, so all results are NaN.
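
The default, min_count=0, matters most when a column has no valid values at all. The sketch below uses a small DataFrame invented for this illustration (it is not one of the examples above) to contrast the default with min_count=1.

```python
import pandas as pd

# column B contains only missing values (illustrative data)
df = pd.DataFrame({
    'A': [1.0, 2.0],
    'B': [float('nan'), float('nan')]
})

# with the default min_count=0, an all-NaN column sums to 0.0
default_sum = df.sum()
print(default_sum)

# with min_count=1, the same column gives NaN instead
strict_sum = df.sum(min_count=1)
print(strict_sum)
```

Setting min_count=1 is one way to tell a column that is truly empty apart from a column whose values happen to add up to zero.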