Pandas var()

The var() method in Pandas computes the variance of the values in a Series or DataFrame. Variance measures how far a set of data points is spread out around its mean. By default, var() returns the sample variance (ddof=1).

Example

import pandas as pd

# sample DataFrame
data = {'A': [1, 20, 333],
        'B': [4, 5, 7]}

df = pd.DataFrame(data)

# calculate the variance
variance = df.var()

print(variance)

'''
Output

A    34759.000000
B        2.333333
dtype: float64
'''

var() Syntax

The syntax of the var() method in Pandas is:

df.var(axis=0, skipna=True, ddof=1, numeric_only=False, **kwargs)

var() Arguments

The var() method includes the following arguments:

  • axis (optional): the axis along which to compute the variance: 0 (default) for each column, 1 for each row
  • skipna (optional): whether to exclude null values when computing the result (default True)
  • ddof (optional): Delta Degrees of Freedom; the divisor used in the calculation is N - ddof, where N is the number of elements (default 1)
  • numeric_only (optional): whether to include only float, int, and boolean columns
  • **kwargs: additional keyword arguments.

var() Return Value

The var() method returns:

  • a scalar for a Series
  • a Series for a DataFrame, with one variance value per column (or per row when axis=1)
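
A quick way to check these return types is sketched below. The data and the variable names series_result and frame_result are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
df = pd.DataFrame({'A': [2, 4, 6, 8, 10],
                   'B': [1, 3, 5, 7, 8]})

# var() on a single column (a Series) returns one number
series_result = df['A'].var()

# var() on the whole DataFrame returns a Series with one entry per column
frame_result = df.var()

print(pd.api.types.is_scalar(series_result))   # True
print(isinstance(frame_result, pd.Series))     # True
```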

Example 1: Simple Variance Calculation

import pandas as pd

data = {'A': [2, 4, 6, 8, 10],
        'B': [1, 3, 5, 7, 8]}
df = pd.DataFrame(data)

# calculate the variance
variance = df.var()

print(variance)

Output

A    10.0
B     8.2
dtype: float64

In this example, we calculated the variance for each column. The output is a Series containing variance values for each column of the df DataFrame.


Example 2: Variance with Different ddof

import pandas as pd

data = {'A': [2, 4, 6, 8, 10],
        'B': [1, 3, 5, 7, 8]}
df = pd.DataFrame(data)

# calculate the variance with ddof=0
variance = df.var(ddof=0)

print(variance)

Output

A    8.00
B    6.56
dtype: float64

In this example, we calculated the variance with ddof=0 instead of the default ddof=1.

The ddof argument sets the divisor used in the variance calculation, which is N - ddof. For example,

  • when ddof=0, the divisor is N
  • when ddof=1, the divisor is N−1

where N is the number of data points. Use ddof=0 for the population variance and ddof=1 (the default) for the sample variance.
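
To make the effect of the divisor concrete, here is a minimal sketch that computes both variants by hand for column A from Example 2 and compares them with var(). The variable names squared_dev, population_var, and sample_var are chosen for illustration only.

```python
import pandas as pd

# Column A from Example 2
s = pd.Series([2, 4, 6, 8, 10])

n = len(s)

# Sum of squared deviations from the mean
squared_dev = ((s - s.mean()) ** 2).sum()

population_var = squared_dev / n        # divisor N, same as ddof=0
sample_var = squared_dev / (n - 1)      # divisor N - 1, same as ddof=1

print(population_var, s.var(ddof=0))
print(sample_var, s.var(ddof=1))
```

Each printed line should show the manual result and the var() result agreeing: 8.0 for ddof=0 and 10.0 for ddof=1.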


Example 3: Variance Excluding Null Values for Numeric Columns Only

import pandas as pd

data = {'A': [2, None, 6, 8, 10],
        'B': [1, 3, 5, None, 8],
        'C': ['a', 'b', 'c', 'd', 'e']}
df = pd.DataFrame(data)

# calculate the variance excluding NA values
# for numeric columns only
variance = df.var(skipna=True, numeric_only=True)

print(variance)

Output

A    11.666667
B     8.916667
dtype: float64

Here, we calculated the variance while excluding null values. Since skipna=True is the default, passing it explicitly only makes the intent clear.

We also excluded the non-numeric column C using numeric_only=True.
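
For contrast, the sketch below reuses the numeric columns A and B from this example and passes skipna=False. The variable name with_nulls is chosen for illustration only.

```python
import pandas as pd

# Numeric columns from Example 3, each with one missing value
df = pd.DataFrame({'A': [2, None, 6, 8, 10],
                   'B': [1, 3, 5, None, 8]})

# With skipna=False, a single null value makes the whole result NaN
with_nulls = df.var(skipna=False)
print(with_nulls)

# count() shows how many non-null values each column has,
# which is the N used when skipna=True
print(df.count())
```

Both columns should come back as NaN here, while count() reports 4 non-null values per column.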


Example 4: Variance of Rows

import pandas as pd

data = {'A': [2, 4, 6, 8, 10],
        'B': [1, 3, 5, 7, 8]}
df = pd.DataFrame(data)

# calculate the variance with axis=1
variance = df.var(axis=1)

print(variance)

Output

0    0.5
1    0.5
2    0.5
3    0.5
4    2.0
dtype: float64

In this example, we calculated the variance of each row (across columns A and B) using the axis=1 argument. The output is a Series indexed by the row labels of df.
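
As a sanity check on the row-wise result, the following sketch pulls out the last row as its own Series and computes its variance separately. The names row_var and last_row_var are chosen for illustration only.

```python
import pandas as pd

df = pd.DataFrame({'A': [2, 4, 6, 8, 10],
                   'B': [1, 3, 5, 7, 8]})

# Variance of each row, computed across the columns
row_var = df.var(axis=1)

# The last row (label 4) holds the values 10 and 8;
# taken as a Series on its own, it has the same variance
last_row_var = df.loc[4].var()

print(row_var.loc[4], last_row_var)
```

Both values should be 2.0, since the two points 10 and 8 each lie 1 away from their mean of 9 and the divisor is N - 1 = 1.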