The filter() method in Pandas selects rows or columns of a DataFrame by their labels (index labels or column names). It matches on the labels only, not on the values stored in the DataFrame.
Example
import pandas as pd
# create a sample DataFrame
data = {'A': [1, 2, 3],
'B': [4, 5, 6],
'C': [7, 8, 9]}
df = pd.DataFrame(data)
# use filter() to select specific columns by name
selected_columns = df.filter(items=['A', 'C'])
# print the resulting DataFrame
print(selected_columns)
'''
Output
   A  C
0  1  7
1  2  8
2  3  9
'''
filter() Syntax
The syntax of the filter() method in Pandas is:
df.filter(items=None, like=None, regex=None, axis=None)
filter() Arguments
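The filter() method takes the following arguments:
items (optional) - a list containing the labels of the columns we want to keep
like (optional) - a string that represents a substring to match in the column names
regex (optional) - a regular expression pattern to match against the column names
axis (optional) - the axis whose labels are matched; for a DataFrame, the default is the columns
Only one of items, like, and regex can be passed in a single call.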
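The examples in this article all match column names, but filter() can also match row labels when axis=0 is passed. The following is a minimal sketch; the row labels row_x, row_y, and other are made up for illustration.

```python
import pandas as pd

# hypothetical sample data with string row labels
df = pd.DataFrame(
    {'A': [1, 2, 3], 'B': [4, 5, 6]},
    index=['row_x', 'row_y', 'other']
)

# axis=0 applies the match to the row labels instead of the column names
rows = df.filter(like='row', axis=0)
print(rows)

# passing more than one of items, like, and regex raises a TypeError
try:
    df.filter(items=['A'], like='B')
    raised = False
except TypeError as err:
    raised = True
    print('TypeError:', err)
```

Here, rows keeps only the rows labelled row_x and row_y, with both columns intact.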
filter() Return Value
The filter() method returns a new DataFrame that contains only the columns (or rows) whose labels match the given names, substring, or regular expression pattern.
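Since filter() matches labels rather than cell values, selecting rows by value is usually done with a boolean mask instead. The sketch below contrasts the two and also shows that the original DataFrame is left unchanged; the variable names subset and large_a are chosen only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3],
                   'B': [4, 5, 6],
                   'C': [7, 8, 9]})

# filter() picks columns by label and returns a new DataFrame
subset = df.filter(items=['A', 'C'])
print(subset.shape)

# the original DataFrame still has all of its columns
print(df.shape)

# selecting rows by value uses a boolean mask, not filter()
large_a = df[df['A'] > 1]
print(large_a)
```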
Example 1: Select Columns by Name Using the items Parameter
import pandas as pd
# create a dictionary
data = {
'Name': ['Alice', 'Bob', 'Charlie'],
'Age': [25, 30, 22],
'City': ['New York', 'Los Angeles', 'Chicago']
}
# create a DataFrame df from data
df = pd.DataFrame(data)
# use filter() to select specific columns ('Name' and 'Age') from df
selected_columns = df.filter(items=['Name', 'Age'])
# display the selected columns
print(selected_columns)
Output
      Name  Age
0    Alice   25
1      Bob   30
2  Charlie   22
In the above example, we first created the df DataFrame with three columns: Name, Age, and City.
Then, we used the filter() method with the items parameter to select only the Name and Age columns.
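Two details of the items parameter are worth seeing in code: the result follows the order given in items, and labels that do not exist in the DataFrame are skipped rather than raising an error. In this sketch, 'Country' is a hypothetical label added only to show the second point.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 22],
    'City': ['New York', 'Los Angeles', 'Chicago']
})

# 'Country' is a hypothetical label that does not exist in df
result = df.filter(items=['City', 'Country', 'Name'])

# columns come back in the order given in items; the missing label is dropped
print(list(result.columns))

# by contrast, plain bracket selection raises a KeyError for a missing label
try:
    df[['City', 'Country', 'Name']]
    key_error = False
except KeyError:
    key_error = True
    print('KeyError: Country is not a column')
```

This makes filter() convenient when the list of wanted columns may include names that are not always present.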
Example 2: Use like Parameter to Select Columns Containing Certain Substring
import pandas as pd
# sample DataFrame
data = {'apple_count': [3, 2, 5],
'banana_count': [1, 4, 6],
'orange_count': [4, 3, 2]}
df = pd.DataFrame(data)
# select columns containing the substring "apple"
filtered_columns = df.filter(like='apple')
print(filtered_columns)
Output
   apple_count
0            3
1            2
2            5
In this example, we used the filter() method with the like parameter to select columns in the DataFrame that contain the substring apple in their column names.
The result is stored in the filtered_columns DataFrame, which contains only the apple_count column, since it is the only column whose name contains the substring apple.
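The like parameter can match more than one column, and the match is case-sensitive. A short sketch using the same data; the variable names all_counts and no_match are chosen only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({'apple_count': [3, 2, 5],
                   'banana_count': [1, 4, 6],
                   'orange_count': [4, 3, 2]})

# every column name contains "count", so all three columns are kept
all_counts = df.filter(like='count')
print(list(all_counts.columns))

# the match is case-sensitive: no column name contains "Apple"
no_match = df.filter(like='Apple')
print(list(no_match.columns))
```

When nothing matches, filter() returns a DataFrame with no columns instead of raising an error.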
Example 3: Select Columns Using Regular Expression Pattern
import pandas as pd
# create a sample DataFrame
data = {'A_column': [1, 2, 3],
'B_column': [4, 5, 6],
'C_Column': [7, 8, 9]}
df = pd.DataFrame(data)
# use filter() with a regular expression pattern to select columns
filtered_df = df.filter(regex='^A|C_')
print(filtered_df)
Output
   A_column  C_Column
0         1         7
1         2         8
2         3         9
Here, we have created the df DataFrame with columns A_column, B_column, and C_Column.
We have used the filter() method with the regex parameter set to '^A|C_', which selects columns whose names start with 'A' or contain 'C_'. The ^ anchor applies only to the first alternative, so 'C_' may appear anywhere in the name.
As a result, filtered_df contains only the columns 'A_column' and 'C_Column'.
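The pattern passed to regex is searched for anywhere in each column name, so anchors such as ^ and $ are needed to pin a match to the start or end. The sketch below illustrates this on the same DataFrame; the patterns themselves are examples chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A_column': [1, 2, 3],
                   'B_column': [4, 5, 6],
                   'C_Column': [7, 8, 9]})

# unanchored and case-sensitive: matches names containing lowercase "column"
lower = df.filter(regex='column')
print(list(lower.columns))

# $ anchors the match to the end of the name; [cC] accepts either case
either_case = df.filter(regex='_[cC]olumn$')
print(list(either_case.columns))

# ^ anchors the match to the start of the name
starts_with_c = df.filter(regex='^C_')
print(list(starts_with_c.columns))
```

Note that 'column' does not match C_Column because of the capital C, which is why the second pattern uses a character class.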
Note: To learn more about Regular Expressions, please visit Python RegEx.