Pandas dropna()

The dropna() method in Pandas removes rows or columns that contain missing (NaN) values from a DataFrame.

Example

import pandas as pd

# create a DataFrame with missing values
data = {'A': [1, 2, None, 4, 5],
        'B': [None, 2, 3, None, 5]}

df = pd.DataFrame(data)

# drop missing values
df_dropped = df.dropna()

print(df_dropped)

'''
Output

     A    B
1  2.0  2.0
4  5.0  5.0
'''

dropna() Syntax

The syntax of the dropna() method in Pandas is:

df.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)

dropna() Arguments

The dropna() method takes the following arguments:

  • axis (optional) - specifies whether to drop rows (0 or 'index', the default) or columns (1 or 'columns')
  • how (optional) - determines the condition for dropping: 'any' (the default) drops a row/column that has at least one missing value, while 'all' drops it only if every value is missing
  • thresh (optional) - specifies the minimum number of non-null values required to keep the row/column
  • subset (optional) - specifies which columns to consider when dropping rows with missing values
  • inplace (optional) - if True, modifies the original DataFrame in place and returns None; if False (the default), returns a new DataFrame

dropna() Return Value

The dropna() method returns a new DataFrame with missing values dropped according to the specified parameters. If inplace=True is passed, it modifies the original DataFrame instead and returns None.
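
The difference between the two return behaviours is easy to overlook, so here is a minimal sketch of it. The DataFrame and the variable names (df_new, df_copy, result) are made up for illustration only.

```python
import pandas as pd

# illustrative data, not taken from the examples in this tutorial
df = pd.DataFrame({'A': [1, None, 3], 'B': [4, 5, None]})

# default (inplace=False): df is left untouched and a new DataFrame is returned
df_new = df.dropna()

# inplace=True: the DataFrame itself is modified and the call returns None
df_copy = df.copy()
result = df_copy.dropna(inplace=True)

print(len(df))       # the original still has all of its rows
print(len(df_new))   # the returned DataFrame has fewer rows
print(result)        # None
```

Because the in-place call returns None, writing df = df.dropna(inplace=True) would replace the DataFrame with None, which is a common mistake.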


Example 1: Drop Missing Values

import pandas as pd

# create a DataFrame with missing values
data = {'A': [10, 20, None, 25, 55],
        'B': [None, 2, 13, None, 65]}

df = pd.DataFrame(data)

# drop missing values
df_dropped = df.dropna()

print(df_dropped)

Output

      A     B
1  20.0   2.0
4  55.0  65.0

In the above example, we have used the dropna() method to remove rows containing missing values from the df DataFrame and store the result in the df_dropped DataFrame.

Here, df_dropped contains only the rows from df that don't have any missing values.
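
Before dropping rows, it can help to check how much data will be lost. The following is a small sketch, assuming the same data as above; isna().sum() is used here only as one possible way to count missing values per column.

```python
import pandas as pd

data = {'A': [10, 20, None, 25, 55],
        'B': [None, 2, 13, None, 65]}

df = pd.DataFrame(data)

# count the missing values in each column
missing_per_column = df.isna().sum()
print(missing_per_column)

# compare the number of rows before and after dropping
rows_removed = len(df) - len(df.dropna())
print("Rows removed:", rows_removed)
```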


Example 2: Use axis Argument to Drop Rows and Columns Containing Missing Values

import pandas as pd

# create a DataFrame with missing values
data = {'A': [1, 2, None, 4],
        'B': [None, 2, 3, 4],
        'C': [1, 2, 3, 4]}
df = pd.DataFrame(data)

# display the original DataFrame
print("Original DataFrame:")
print(df)
print()

# drop rows with any missing values and create a new DataFrame
df_rows_dropped = df.dropna(axis=0, inplace=False)
print("DataFrame with rows dropped:")
print(df_rows_dropped)
print()

# drop columns with any missing values and create a new DataFrame
df_columns_dropped = df.dropna(axis=1, inplace=False)
print("DataFrame with columns dropped:")
print(df_columns_dropped)

Output

Original DataFrame:
     A    B  C
0  1.0  NaN  1
1  2.0  2.0  2
2  NaN  3.0  3
3  4.0  4.0  4

DataFrame with rows dropped:
     A    B  C
1  2.0  2.0  2
3  4.0  4.0  4

DataFrame with columns dropped:
   C
0  1
1  2
2  3
3  4

Here,

  • Rows with any missing values are dropped using axis=0, and the result is stored in df_rows_dropped.
  • Columns with any missing values are dropped using axis=1, and the result is stored in df_columns_dropped.

Also, inplace=False is the default, so passing it explicitly is optional. Either way, the original DataFrame remains unchanged and the results are stored in new DataFrames.
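
The axis argument also accepts the string labels 'index' and 'columns' in place of 0 and 1, which some readers find easier to read. A brief sketch using the same data (the variable names rows_by_name and cols_by_name are illustrative):

```python
import pandas as pd

data = {'A': [1, 2, None, 4],
        'B': [None, 2, 3, 4],
        'C': [1, 2, 3, 4]}
df = pd.DataFrame(data)

# axis='index' behaves like axis=0 (drop rows)
rows_by_name = df.dropna(axis='index')

# axis='columns' behaves like axis=1 (drop columns)
cols_by_name = df.dropna(axis='columns')

# both comparisons print True
print(rows_by_name.equals(df.dropna(axis=0)))
print(cols_by_name.equals(df.dropna(axis=1)))
```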


Example 3: Determine Condition for Dropping

import pandas as pd

data = {'A': [1, 2, None, 4],
        'B': [None, 2, None, 4]}

df = pd.DataFrame(data)

# drop rows with any missing values
result_any = df.dropna(how='any')
print("Using how='any':")
print(result_any)
print()

# drop rows where all values are missing
result_all = df.dropna(how='all')
print("Using how='all':")
print(result_all)

Output

Using how='any':
     A    B
1  2.0  2.0
3  4.0  4.0

Using how='all':
     A    B
0  1.0  NaN
1  2.0  2.0
3  4.0  4.0

Here,

  • how='any' (default) - rows containing at least one missing value are dropped, leaving only the rows where both columns 'A' and 'B' have non-null values.
  • how='all' - only rows in which every value is missing are dropped (here, the row at index 2), so rows with at least one non-null value are kept.
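
One way to see what the two how options check is to build the equivalent boolean masks by hand. This is only a sketch for illustration, not how dropna() is implemented internally; mask_any and mask_all are made-up names.

```python
import pandas as pd

data = {'A': [1, 2, None, 4],
        'B': [None, 2, None, 4]}

df = pd.DataFrame(data)

# how='any' keeps a row only if every value in it is present
mask_any = df.notna().all(axis=1)

# how='all' keeps a row if at least one value in it is present
mask_all = df.notna().any(axis=1)

# both comparisons print True
print(df[mask_any].equals(df.dropna(how='any')))
print(df[mask_all].equals(df.dropna(how='all')))
```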

Example 4: Drop Rows Based on Threshold

import pandas as pd

# creating a DataFrame with some NaN values
data = {
    'A': [1, 2, None, 4],
    'B': [5, None, None, 8],
    'C': [9, 10, 11, 12],
    'D': [13, 14, 15, None]
}
df = pd.DataFrame(data)

print("Original DataFrame:")
print(df)
print()

# use dropna() with thresh parameter
# keep only those rows which have at least 3 non-NaN values
cleaned_df_rows = df.dropna(thresh=3)
print("DataFrame after dropping rows with less than 3 non-NaN values:")
print(cleaned_df_rows)

Output

Original DataFrame:
     A    B   C     D
0  1.0  5.0   9  13.0
1  2.0  NaN  10  14.0
2  NaN  NaN  11  15.0
3  4.0  8.0  12   NaN

DataFrame after dropping rows with less than 3 non-NaN values:
     A    B   C     D
0  1.0  5.0   9  13.0
1  2.0  NaN  10  14.0
3  4.0  8.0  12   NaN

In the above example, we have used dropna(thresh=3) to remove rows that do not have at least 3 non-NaN values.

Hence, the row at index 2, which has only 2 non-NaN values, is removed.
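
To see why only the row at index 2 is removed, we can count the non-NaN values in each row ourselves. The sketch below uses count(axis=1) for this and then filters manually; the names non_null_counts and manual are illustrative, and the manual filter is shown only for comparison with thresh.

```python
import pandas as pd

data = {
    'A': [1, 2, None, 4],
    'B': [5, None, None, 8],
    'C': [9, 10, 11, 12],
    'D': [13, 14, 15, None]
}
df = pd.DataFrame(data)

# number of non-NaN values in each row
non_null_counts = df.count(axis=1)
print(non_null_counts)

# keep the rows that have at least 3 non-NaN values
manual = df[non_null_counts >= 3]

# prints True: the manual filter matches dropna(thresh=3)
print(manual.equals(df.dropna(thresh=3)))
```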


Example 5: Selectively Remove Rows Containing Missing Data

import pandas as pd

# creating a DataFrame with some NaN values
data = {
    'A': [1, 2, None, 4],
    'B': [5, None, None, 8],
    'C': [9, 10, 11, 12],
    'D': [13, 14, 15, None]
}
df = pd.DataFrame(data)

# use dropna() with subset parameter
# drop the rows where NaN appears in column 'B' or 'D'
cleaned_df = df.dropna(subset=['B', 'D'])

print("Original DataFrame:")
print(df)
print("\nDataFrame after dropping rows with NaN in columns 'B' or 'D':")
print(cleaned_df)

Output

Original DataFrame:
     A    B   C     D
0  1.0  5.0   9  13.0
1  2.0  NaN  10  14.0
2  NaN  NaN  11  15.0
3  4.0  8.0  12   NaN

DataFrame after dropping rows with NaN in columns 'B' or 'D':
     A    B  C     D
0  1.0  5.0  9  13.0

Here, when we apply dropna(subset=['B', 'D']), it checks only columns B and D for missing values.

If any missing value is found in these columns, the corresponding row is removed.
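
The subset argument can also be combined with how. As a rough sketch, assuming we only want to drop rows where both 'A' and 'B' are missing (a different column pair from the one used above, chosen so that a row is actually removed):

```python
import pandas as pd

data = {
    'A': [1, 2, None, 4],
    'B': [5, None, None, 8],
    'C': [9, 10, 11, 12],
    'D': [13, 14, 15, None]
}
df = pd.DataFrame(data)

# drop a row only if both 'A' and 'B' are NaN in that row
cleaned_df = df.dropna(subset=['A', 'B'], how='all')

print(cleaned_df)
```

Here, only the row at index 2 is removed, because it is the only row where both 'A' and 'B' are missing. The row at index 1 is kept since column 'A' still holds a value.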