Pandas sum()

The sum() method in Pandas calculates the sum of the values in a DataFrame along a specified axis.

Example

import pandas as pd

# create a DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6]
})

# calculate the sum of each column
column_sum = df.sum()

print(column_sum)

'''
Output

A     6
B    15
dtype: int64
'''

sum() Syntax

The syntax of the sum() method in Pandas is:

df.sum(axis=0, skipna=True, numeric_only=False, min_count=0)

sum() Arguments

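The sum() method takes the following arguments:

  • axis (optional) - specifies the axis along which the sum is computed: 0 (or 'index') sums each column, 1 (or 'columns') sums each row
  • skipna (optional) - determines whether missing values are excluded from the sum (True, the default) or included, in which case any missing value makes the result NaN (False)
  • numeric_only (optional) - specifies whether to include only numeric (int, float, and boolean) columns in the computation
  • min_count (optional) - the minimum number of non-missing values required to perform the operation; if fewer are present, the result is NaN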

sum() Return Value

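The sum() method returns the sum of the values along the specified axis: a Series of sums when called on a DataFrame, and a single value when called on a Series such as one column.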
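To make the return type concrete, here is a minimal sketch using a small made-up DataFrame. The variable names column_sum and total_A are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6]
})

# summing a DataFrame gives a Series indexed by the column labels
column_sum = df.sum()
print(type(column_sum).__name__)  # Series

# summing a single column (a Series) gives a single value
total_A = df['A'].sum()
print(total_A)  # 6
```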


Example 1: Compute Sum Along Different Axis

import pandas as pd

# create a DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
})

# calculate the sum of each column
column_sum = df.sum()

# calculate the sum of each row
row_sum = df.sum(axis=1)

print("Sum of each column:")
print(column_sum)
print("\nSum of each row:")
print(row_sum)

Output

Sum of each column:
A     6
B    15
C    24
dtype: int64

Sum of each row:
0    12
1    15
2    18
dtype: int64

In the above example,

  1. column_sum = df.sum() - calculates the sum of values in each column of the df DataFrame. Default axis=0 means it operates column-wise.
  2. row_sum = df.sum(axis=1) - calculates the sum of values in each row of df by setting axis=1, meaning it operates row-wise.

Note: We can also pass axis=0 inside sum() to compute the sum of each column.
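The axis can also be given by name instead of number. The sketch below assumes the same DataFrame as in this example and shows that 'index' behaves like 0 and 'columns' behaves like 1. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
})

# axis='index' is the same as axis=0: one sum per column
by_column = df.sum(axis='index')

# axis='columns' is the same as axis=1: one sum per row
by_row = df.sum(axis='columns')

print(by_column.tolist())  # [6, 15, 24]
print(by_row.tolist())     # [12, 15, 18]
```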


Example 2: Calculate Sum of a Specific Column

import pandas as pd

# create a DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
})

# calculate the sum of column 'A'
sum_A = df['A'].sum()

# calculate the sum of column 'B'
sum_B = df['B'].sum()

print("sum of column A:", sum_A)
print("sum of column B:", sum_B)

Output

sum of column A: 6
sum of column B: 15

In this example, df['A'] selects column A of the df DataFrame, and sum() calculates the sum of its values. The same is done for column B.
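The same idea extends to several columns at once. As a rough sketch, selecting a list of columns returns a smaller DataFrame, so sum() again works along either axis. The choice of columns A and C here is arbitrary.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
})

# select columns 'A' and 'C', then sum each of them
subset_column_sum = df[['A', 'C']].sum()

# sum 'A' and 'C' together for each row
subset_row_sum = df[['A', 'C']].sum(axis=1)

print(subset_column_sum.tolist())  # [6, 24]
print(subset_row_sum.tolist())     # [8, 10, 12]
```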


Example 3: Use of numeric_only Argument in sum()

import pandas as pd

# create a DataFrame with both numeric and non-numeric columns
data = {
    'A': [10, 20, 30, 40],
    'B': [5, 3, 2, 1],
    'C': ['a', 'b', 'c', 'd'],
    'D': [1.5, 2.5, 3.5, 4.5]
}

df = pd.DataFrame(data)

# sum only the numeric columns
summed = df.sum(numeric_only=True)

print(summed)

Output

A    100.0
B     11.0
D     12.0
dtype: float64

Here, with numeric_only=True, the sum is calculated only for columns A, B, and D. Column C is excluded because it contains string data.

If we hadn't specified any value for numeric_only, as in

summed_all = df.sum()

the output would be as follows, with the strings in column C concatenated:

A     100
B      11
C    abcd
D    12.0
dtype: object
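An alternative way to restrict the sum to numeric columns is to filter the DataFrame first with select_dtypes(). This is only a sketch of an equivalent approach for this particular data, not something the example above relies on.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [10, 20, 30, 40],
    'B': [5, 3, 2, 1],
    'C': ['a', 'b', 'c', 'd'],
    'D': [1.5, 2.5, 3.5, 4.5]
})

# keep only the numeric columns, then sum them
numeric_sum = df.select_dtypes(include='number').sum()

# for this data, the result matches numeric_only=True
same_as_flag = df.sum(numeric_only=True)

print(list(numeric_sum.index))  # ['A', 'B', 'D']
print(numeric_sum.tolist())     # [100.0, 11.0, 12.0]
```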

Example 4: Effect of skipna Argument on sum()

import pandas as pd

# create a DataFrame with NaN values
df = pd.DataFrame({
    'A': [1, None, 3],
    'B': [4, 5, None],
    'C': [7, 8, 9]
})

# calculate the sum of each column, ignoring NaN values
sum_skipna_true = df.sum()

# calculate the sum of each column, including NaN values
sum_skipna_false = df.sum(skipna=False)

print("sum with skipna=True (default):")
print(sum_skipna_true)
print("\nsum with skipna=False:")
print(sum_skipna_false)

Output

sum with skipna=True (default):
A     4.0
B     9.0
C    24.0
dtype: float64

sum with skipna=False:
A     NaN
B     NaN
C    24.0
dtype: float64

In this example,

  • With skipna=True - sums of columns A, B, and C are 4.0, 9.0, and 24.0, respectively, ignoring None values.
  • With skipna=False - sums of columns A and B are NaN due to None values, while C is 24.0.
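The skipna argument behaves the same way for row sums. The sketch below reuses the DataFrame from this example with axis=1. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, None, 3],
    'B': [4, 5, None],
    'C': [7, 8, 9]
})

# sum each row, skipping missing values (the default)
row_sum_skip = df.sum(axis=1)

# sum each row, letting any missing value turn the result into NaN
row_sum_strict = df.sum(axis=1, skipna=False)

print(row_sum_skip.tolist())    # [12.0, 13.0, 12.0]
print(row_sum_strict.tolist())  # [12.0, nan, nan]
```

Only the first row has no missing values, so it is the only row with a numeric result when skipna=False.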

Example 5: Calculate Sums With a Minimum Count of Values

import pandas as pd

# create a DataFrame with some missing values
df = pd.DataFrame({
    'A': [1, None, 3],
    'B': [4, 5, None],
    'C': [None, None, 9]
})

# calculate the sum of each column with min_count set to 1
sum_min_count_1 = df.sum(min_count=1)

# calculate the sum of each column with min_count set to 2
sum_min_count_2 = df.sum(min_count=2)

# calculate the sum of each column with min_count set to 3
sum_min_count_3 = df.sum(min_count=3)

print("sum with min_count=1:\n", sum_min_count_1)
print("\nsum with min_count=2:\n", sum_min_count_2)
print("\nsum with min_count=3:\n", sum_min_count_3)

Output

sum with min_count=1:
A    4.0
B    9.0
C    9.0
dtype: float64

sum with min_count=2:
A    4.0
B    9.0
C    NaN
dtype: float64

sum with min_count=3:
A   NaN
B   NaN
C   NaN
dtype: float64

Here,

  • When min_count=1, the sum is calculated if there is at least one non-missing value in the column. Here, all columns meet this criterion.
  • When min_count=2, the sum is calculated if there are at least two non-missing values in the column. Columns A and B meet this criterion, but column C has only one non-missing value, so its result is NaN.
  • When min_count=3, the sum is calculated if there are at least three non-missing values in the column. None of the columns meets this criterion, so all results are NaN.
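One practical use of min_count is telling "no data" apart from a real total of zero. As a small sketch, a hypothetical Series containing only missing values sums to 0.0 by default, but returns NaN once min_count=1 is set.

```python
import pandas as pd

# a Series that contains only missing values
all_missing = pd.Series([None, None], dtype='float64')

# with the default min_count=0, the sum of no valid values is 0.0
default_sum = all_missing.sum()

# with min_count=1, at least one valid value is required, so the result is NaN
guarded_sum = all_missing.sum(min_count=1)

print(default_sum)  # 0.0
print(guarded_sum)  # nan
```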
