Pandas where()

The where() method in Pandas keeps the values of a DataFrame where a condition is True and replaces the values where the condition is False.

Example

import pandas as pd

# create a DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [2, 3, 4, 5]
})

# use where() to replace values less than 3 with 0
df_modified = df.where(df >= 3, other=0)

print(df_modified)

'''
Output
   A  B
0  0  0
1  0  3
2  3  4
3  4  5
'''

where() Syntax

The syntax of the where() method in Pandas is:

df.where(cond, other=NaN, inplace=False, axis=None, level=None)

where() Arguments

The where() method takes the following arguments:

  • cond - the condition we want to check for.
  • other (optional) - the value to replace with where the condition is False. By default, it is NaN.
  • inplace (optional) - if True, it will modify the DataFrame in place. By default, it's False, which means it will return a new DataFrame.
  • axis (optional) - the axis (rows or columns) along which other is aligned with the DataFrame when other is a Series.
  • level (optional) - alignment level if other is a Series or DataFrame.

where() Return Value

The where() method returns a new DataFrame with the original data where the condition is True and the specified replacement value where the condition is False.
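The difference between the default return behavior and inplace=True can be checked with a short sketch. The variable names (original_a, new_df, returned) are illustrative only and not part of the Pandas API.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [2, 3, 4, 5]})

# default (inplace=False): a new DataFrame is returned and df is left unchanged
new_df = df.where(df >= 3, other=0)
original_a = df['A'].tolist()   # snapshot of column A before any in-place change
print(original_a)
print(new_df['A'].tolist())

# inplace=True: df itself is modified and the call returns None
returned = df.where(df >= 3, other=0, inplace=True)
print(returned)
print(df['A'].tolist())
```

Because the in-place call returns None, assigning its result back to df would discard the DataFrame.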


Example 1: Use where() to Conditionally Replace Values

import pandas as pd

# create a DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9],
})

# replace values in column 'A'
# where the condition is False (if the values are not equal to 2)
result = df['A'].where(df['A'] == 2, other=-1)

print(result)

Output

0   -1
1    2
2   -1
Name: A, dtype: int64

In this example, we are using the where() method to replace values in the A column.

So only the value in column A that equals 2 remains unchanged, while all other values in the same column are replaced with -1.
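Since where() is called on a single column here, the result is a Series and the DataFrame itself is not changed. One possible way to store the result back is sketched below. It works on a copy named df_updated (a name chosen only for this illustration) so that the original df stays as it is.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9],
})

# work on a copy so that the original df is not modified
df_updated = df.copy()

# assign the Series returned by where() back to column 'A'
df_updated['A'] = df_updated['A'].where(df_updated['A'] == 2, other=-1)

print(df_updated)
```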

If we omit the other argument, as in

# without other argument
result = df['A'].where(df['A'] == 2)

All the values in result that do not meet the condition (df['A'] == 2) will be replaced with NaN by default.

Hence, the output will be

0    NaN
1    2.0
2    NaN
Name: A, dtype: float64
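Notice that the dtype changes from int64 to float64, because NaN is a floating-point value and cannot be stored in an integer column. The sketch below checks this and shows one possible way to get back to integers, assuming a fill value of -1 is acceptable. The names s, with_nan and filled are illustrative only.

```python
import pandas as pd

s = pd.Series([1, 2, 3], name='A')

# no other argument: values that fail the condition become NaN
with_nan = s.where(s == 2)

print(s.dtype)          # dtype of the original Series
print(with_nan.dtype)   # dtype after NaN has been introduced

# fill the NaN values and convert back to integers
filled = with_nan.fillna(-1).astype('int64')
print(filled.tolist())
```

Passing other=-1 directly, as in the first version of this example, avoids the intermediate float conversion.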

Example 2: Use of axis Argument in where()

import pandas as pd

# create a DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
})

# define a condition
condition = df > 2

# create a Series for replacing
replacement_series = pd.Series([-1, -2, -3])

# axis=0: align the Series index with the row labels of the DataFrame
result_axis_0 = df.where(condition, other=replacement_series, axis=0)

# axis=1: align the Series index with the column labels of the DataFrame
result_axis_1 = df.where(condition, other=replacement_series, axis=1)

print("Replacement with axis=0:")
print(result_axis_0)
print("\nReplacement with axis=1:")
print(result_axis_1)

Output

Replacement with axis=0:
   A  B  C
0 -1  4  7
1 -2  5  8
2  3  6  9

Replacement with axis=1:
     A  B  C
0  NaN  4  7
1  NaN  5  8
2  3.0  6  9

Here,

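  1. With axis=0, the index of the Series (0, 1, 2) is aligned with the row labels of the DataFrame, so -1 is the replacement for row 0, -2 for row 1, and -3 for row 2. Only column A has values that are not > 2 (in rows 0 and 1), so these become -1 and -2.
  2. With axis=1, the index of the Series is aligned with the column labels. Since the Series labels 0, 1, 2 do not match the column labels A, B, C, no column has a corresponding replacement value, and we get NaN wherever the condition is False.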
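If the Series is indexed by the column labels instead, axis=1 finds a replacement value for each column. A minimal sketch, assuming a hypothetical Series named column_fill that is not part of the example above:

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
})

condition = df > 2

# hypothetical replacement Series whose index matches the column labels
column_fill = pd.Series({'A': -1, 'B': -2, 'C': -3})

# axis=1 aligns the Series index with the columns of df
result_labeled = df.where(condition, other=column_fill, axis=1)

print(result_labeled)
```

Here only column A contains values that fail the condition, so only the replacement value for A (-1) is used.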

Example 3: Use of level Argument in where()

import pandas as pd

# create a DataFrame with a MultiIndex
index = pd.MultiIndex.from_product([['A', 'B'], [1, 2]], names=['Upper', 'Lower'])
df = pd.DataFrame({'Value': [10, 1, 20, 2]}, index=index)

# define a condition to keep the values that are not 1
condition = df['Value'] != 1

# apply where() with level='Upper'
level_example = df.where(condition, other=-1, level='Upper')

print(level_example)

Output

             Value
Upper Lower
A     1         10
      2         -1
B     1         20
      2          2

In the above example, the where() method replaces values with -1 in the DataFrame where the condition df['Value'] != 1 is False.

The level argument is used only to align other when it is a Series or DataFrame. Since other is the scalar -1 here, level has no effect on the result, and the condition is applied to every row of the MultiIndex.

Thus, all occurrences of 1 in the DataFrame are replaced by -1.
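The level argument starts to matter when other is a Series whose index corresponds to one level of the MultiIndex. The following is a sketch under that assumption. The Series upper_fill and its values are hypothetical and chosen only for illustration.

```python
import pandas as pd

# same MultiIndex DataFrame as in the example above
index = pd.MultiIndex.from_product([['A', 'B'], [1, 2]], names=['Upper', 'Lower'])
df = pd.DataFrame({'Value': [10, 1, 20, 2]}, index=index)

condition = df['Value'] != 1

# hypothetical Series with one replacement value per 'Upper' label
upper_fill = pd.Series({'A': -100, 'B': -200})

# level='Upper' matches the Series index against that level of the MultiIndex,
# and axis=0 aligns the Series along the rows
result_level = df.where(condition, other=upper_fill, axis=0, level='Upper')

print(result_level)
```

The only value that fails the condition is in the row (A, 2), so it is replaced with the value that upper_fill holds for the label A.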