The dropna() method in Pandas removes rows or columns that contain missing (NaN) values from a DataFrame.
Example
import pandas as pd
# create a DataFrame with missing values
data = {'A': [1, 2, None, 4, 5],
'B': [None, 2, 3, None, 5]}
df = pd.DataFrame(data)
# drop missing values
df_dropped = df.dropna()
print(df_dropped)
'''
Output
A B
1 2.0 2.0
4 5.0 5.0
'''
dropna() Syntax
The syntax of the dropna() method in Pandas is:
df.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)
dropna() Arguments
The dropna() method takes the following arguments:
- axis (optional) - specifies whether to drop rows or columns
- how (optional) - determines the condition for dropping
- thresh (optional) - specifies the minimum number of non-null values required to keep the row/column
- subset (optional) - allows us to specify a subset of columns to consider when dropping rows with missing values
- inplace (optional) - if True, modifies the original DataFrame in place; if False, returns a new DataFrame.
dropna() Return Value
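The dropna() method returns a new DataFrame with missing values dropped according to the specified parameters. When inplace=True is used, the original DataFrame is modified instead and no new DataFrame is returned.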
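The difference between the two inplace settings can be sketched as follows. This is a minimal illustration on a small made-up DataFrame (the column names and values are assumptions, not taken from the examples in this article).

```python
import pandas as pd

df = pd.DataFrame({'A': [1, None, 3], 'B': [4, 5, None]})

# inplace=False (the default): the result is a new DataFrame, df is untouched
new_df = df.dropna()
print(len(df), len(new_df))  # 3 1

# inplace=True: df itself is modified, so there is no new object to keep;
# the pandas documentation describes the return value as None in this case
df.dropna(inplace=True)
print(len(df))  # 1
```

Because the in-place call does not hand back a new DataFrame, writing df = df.dropna(inplace=True) is a common mistake that would most likely leave df unusable.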
Example 1: Drop Missing Values
import pandas as pd
# create a DataFrame with missing values
data = {'A': [10, 20, None, 25, 55],
'B': [None, 2, 13, None, 65]}
df = pd.DataFrame(data)
# drop missing values
df_dropped = df.dropna()
print(df_dropped)
Output
A B
1 20.0 2.0
4 55.0 65.0
In the above example, we have used the dropna() method to remove rows containing missing values from the df DataFrame and store the result in the df_dropped DataFrame.
df_dropped contains only the rows from df that don't have any missing values.
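Note that the rows kept in the output above still carry their original index labels (1 and 4). If consecutive labels are preferred, one possible follow-up step is reset_index(drop=True). The sketch below reuses the data from this example; the name df_renumbered is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [10, 20, None, 25, 55],
                   'B': [None, 2, 13, None, 65]})

# dropna() keeps the original index labels of the surviving rows
df_dropped = df.dropna()
print(df_dropped.index.tolist())     # [1, 4]

# reset_index(drop=True) renumbers the remaining rows from 0
# and discards the old labels instead of adding them as a column
df_renumbered = df_dropped.reset_index(drop=True)
print(df_renumbered.index.tolist())  # [0, 1]
```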
Example 2: Use axis Argument to Drop Rows and Columns Containing Missing Values
import pandas as pd
# create a DataFrame with missing values
data = {'A': [1, 2, None, 4],
'B': [None, 2, 3, 4],
'C': [1, 2, 3, 4]}
df = pd.DataFrame(data)
# display the original DataFrame
print("Original DataFrame:")
print(df)
print()
# drop rows with any missing values and create a new DataFrame
df_rows_dropped = df.dropna(axis=0, inplace=False)
print("DataFrame with rows dropped:")
print(df_rows_dropped)
print()
# drop columns with any missing values and create a new DataFrame
df_columns_dropped = df.dropna(axis=1, inplace=False)
print("DataFrame with columns dropped:")
print(df_columns_dropped)
Output
Original DataFrame:
A B C
0 1.0 NaN 1
1 2.0 2.0 2
2 NaN 3.0 3
3 4.0 4.0 4
DataFrame with rows dropped:
A B C
1 2.0 2.0 2
3 4.0 4.0 4
DataFrame with columns dropped:
C
0 1
1 2
2 3
3 4
Here,
- Rows with any missing values are dropped using axis=0, and the result is stored in df_rows_dropped.
- Columns with any missing values are dropped using axis=1, and the result is stored in df_columns_dropped.
Also, the inplace=False argument (the default) ensures that the original DataFrame remains unchanged and the results are stored in new DataFrames.
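The axis argument also accepts the string labels 'index' and 'columns' in place of 0 and 1, which some readers find easier to follow. A short sketch using the same data as this example (the variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, None, 4],
                   'B': [None, 2, 3, 4],
                   'C': [1, 2, 3, 4]})

# axis='index' behaves like axis=0: rows with missing values are dropped
rows_dropped = df.dropna(axis='index')
print(rows_dropped.index.tolist())    # [1, 3]

# axis='columns' behaves like axis=1: columns with missing values are dropped
cols_dropped = df.dropna(axis='columns')
print(cols_dropped.columns.tolist())  # ['C']
```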
Example 3: Determine Condition for Dropping
import pandas as pd
data = {'A': [1, 2, None, 4],
'B': [None, 2, None, 4]}
df = pd.DataFrame(data)
# drop rows with any missing values
result_any = df.dropna(how='any')
print("Using how='any':")
print(result_any)
print()
# drop rows only if all of their values are missing
result_all = df.dropna(how='all')
print("Using how='all':")
print(result_all)
Output
Using how='any':
A B
1 2.0 2.0
3 4.0 4.0
Using how='all':
A B
0 1.0 NaN
1 2.0 2.0
3 4.0 4.0
Here,
- how='any' (default) - rows containing any missing value are dropped, leaving only the rows where both columns 'A' and 'B' have non-null values.
- how='all' - only rows in which every value is missing are removed, so rows with at least one non-null value are kept.
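One way to check which rows each setting keeps is to build the equivalent boolean masks by hand with notna(). This is only a sketch of the logic on the same data, not a description of how Pandas implements dropna() internally; the names any_mask and all_mask are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, None, 4],
                   'B': [None, 2, None, 4]})

# how='any' keeps a row only if every value in it is non-missing
any_mask = df.notna().all(axis=1)
print(df[any_mask].index.tolist())  # [1, 3]

# how='all' keeps a row if at least one value in it is non-missing
all_mask = df.notna().any(axis=1)
print(df[all_mask].index.tolist())  # [0, 1, 3]
```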
Example 4: Drop Rows Based on Threshold
import pandas as pd
# creating a DataFrame with some NaN values
data = {
'A': [1, 2, None, 4],
'B': [5, None, None, 8],
'C': [9, 10, 11, 12],
'D': [13, 14, 15, None]
}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)
print()
# use dropna() with thresh parameter
# Keeping only those rows which have at least 3 non-NaN values
cleaned_df_rows = df.dropna(thresh=3)
print("\nDataFrame after dropping rows with less than 3 non-NaN values:")
print(cleaned_df_rows)
Output
Original DataFrame:
A B C D
0 1.0 5.0 9 13.0
1 2.0 NaN 10 14.0
2 NaN NaN 11 15.0
3 4.0 8.0 12 NaN
DataFrame after dropping rows with less than 3 non-NaN values:
A B C D
0 1.0 5.0 9 13.0
1 2.0 NaN 10 14.0
3 4.0 8.0 12 NaN
In the above example, we have used dropna(thresh=3) to remove rows that do not have at least 3 non-NaN values.
Hence, the row at index 2, which has only 2 non-NaN values, is removed.
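To see why only index 2 is removed, it can help to count the non-NaN values per row directly. The sketch below reuses the same data; it also shows thresh combined with axis=1, which applies the same rule to columns. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, None, 4],
                   'B': [5, None, None, 8],
                   'C': [9, 10, 11, 12],
                   'D': [13, 14, 15, None]})

# number of non-NaN values in each row
non_null_per_row = df.notna().sum(axis=1)
print(non_null_per_row.tolist())  # [4, 3, 2, 3]

# rows that meet the threshold of 3, matching dropna(thresh=3)
kept = df[non_null_per_row >= 3]
print(kept.index.tolist())        # [0, 1, 3]

# with axis=1 the threshold is applied to columns instead of rows
cols_kept = df.dropna(axis=1, thresh=3)
print(cols_kept.columns.tolist()) # ['A', 'C', 'D']
```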
Example 5: Selectively Remove Rows Containing Missing Data
import pandas as pd
# creating a DataFrame with some NaN values
data = {
'A': [1, 2, None, 4],
'B': [5, None, None, 8],
'C': [9, 10, 11, 12],
'D': [13, 14, 15, None]
}
df = pd.DataFrame(data)
# use dropna() with subset parameter
# drop the rows where NaN appears in column 'B' or 'D'
cleaned_df = df.dropna(subset=['B', 'D'])
print("Original DataFrame:")
print(df)
print("\nDataFrame after dropping rows with NaN in columns 'B' or 'D':")
print(cleaned_df)
Output
Original DataFrame:
A B C D
0 1.0 5.0 9 13.0
1 2.0 NaN 10 14.0
2 NaN NaN 11 15.0
3 4.0 8.0 12 NaN
DataFrame after dropping rows with NaN in columns 'B' or 'D':
A B C D
0 1.0 5.0 9 13.0
Here, when we apply dropna(subset=['B', 'D']), it checks only columns B and D for missing values.
If any missing value is found in these columns, the corresponding row is removed.
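The subset argument can also be combined with how. As a sketch on the same data, the columns 'A' and 'B' are used here (an arbitrary choice for illustration) because index 2 is missing in both of them, which makes the difference between how='any' and how='all' visible.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, None, 4],
                   'B': [5, None, None, 8],
                   'C': [9, 10, 11, 12],
                   'D': [13, 14, 15, None]})

# default how='any': drop a row if 'A' or 'B' is missing
any_missing = df.dropna(subset=['A', 'B'])
print(any_missing.index.tolist())   # [0, 3]

# how='all': drop a row only if both 'A' and 'B' are missing
both_missing = df.dropna(subset=['A', 'B'], how='all')
print(both_missing.index.tolist())  # [0, 1, 3]
```

In both calls, the NaN in column 'D' at index 3 is ignored because 'D' is not part of the subset.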