The str.contains() method in Pandas tests whether a pattern or regex is contained within each string of a Series.
Example
import pandas as pd
# create a pandas Series
cities = pd.Series(['New York', 'London', 'Tokyo', 'Paris', 'Moscow'])
# use contains() to check which city names contain the substring 'o'
contains_o = cities.str.contains('o')
print(contains_o)
'''
Output
0 True
1 True
2 True
3 False
4 True
dtype: bool
'''
str.contains() Syntax
The syntax of the str.contains() method in Pandas is:
Series.str.contains(pat, case=True, flags=0, na=nan, regex=True)
str.contains() Arguments
The str.contains() method takes the following arguments:
pat - string pattern or regular expression we are looking for within each element of the Series
case (optional) - specifies whether to perform case-sensitive (True, the default) or case-insensitive (False) matching
flags (optional) - flags to pass through to Python's re module, such as re.IGNORECASE
na (optional) - the value to use in the result for missing values in the Series
regex (optional) - specifies whether to treat the pattern as a regular expression (True, the default) or as a literal string (False)
str.contains() Return Value
The str.contains() method returns a Boolean Series showing whether each element in the Series contains the pattern or regex.
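As a small sketch of what that return value looks like in practice, the snippet below uses a hypothetical Series with custom index labels (the data and labels are made up for illustration). It suggests that the result keeps the index of the input, and that summing the Boolean values is one simple way to count the matches.

```python
import pandas as pd

# hypothetical data with custom index labels (for illustration only)
cities = pd.Series(['London', 'Toronto', 'Paris'], index=['uk', 'ca', 'fr'])

# the result is a new Series of Booleans aligned with the original index
mask = cities.str.contains('on')

print(mask.index.tolist())  # index labels of the result
print(mask.sum())           # True counts as 1, so this is the number of matches
```

Because the result shares the index of the original Series, it can be used directly as a mask on that Series.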
Example 1: Check Which Series Elements Contain a Given Substring
import pandas as pd
# create a Series
data = pd.Series(['apple', 'banana', 'cherry', 'date'])
# use contains() to check which elements contain the substring 'a'
contains_a = data.str.contains('a')
print(contains_a)
Output
0     True
1     True
2    False
3     True
dtype: bool
In the above example, we first created the data Series with fruit names.
Then, we used the str.contains() method to check which elements in the Series contain the substring a.
The result is a Series of Boolean values (True or False), indicating whether each element in data contains a.
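A common next step is to use the Boolean result as a mask to keep only the matching elements. The following is a minimal sketch of that idea using the same fruit data. The names with_a and without_a are introduced here for illustration.

```python
import pandas as pd

# same fruit names as in the example above
data = pd.Series(['apple', 'banana', 'cherry', 'date'])

# Boolean mask: True where the element contains 'a'
contains_a = data.str.contains('a')

# keep only the elements where the mask is True
with_a = data[contains_a]

# ~ inverts the mask, keeping the elements that do not contain 'a'
without_a = data[~contains_a]

print(with_a.tolist())
print(without_a.tolist())
```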
Example 2: Case-Sensitive and Case-Insensitive Searches with case Parameter
import pandas as pd
# create a Series
data = pd.Series(['Apple', 'banana', 'Cherry', 'Date', 'APRICOT'])
# case-sensitive search (default behavior)
case_sensitive_result = data.str.contains('a')
# case-insensitive search
case_insensitive_result = data.str.contains('a', case=False)
print("Case-sensitive search:\n", case_sensitive_result)
print("\nCase-insensitive search:\n", case_insensitive_result)
Output
Case-sensitive search:
0    False
1     True
2    False
3     True
4    False
dtype: bool

Case-insensitive search:
0     True
1     True
2    False
3     True
4     True
dtype: bool
Here,
data.str.contains('a') - returns True only for elements where a appears in the exact case specified (lowercase a).
data.str.contains('a', case=False) - ignores the case of a, thus matching both a and A in any element of the data Series.
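str.contains() also accepts a flags argument that is passed through to Python's re module. As a rough sketch, and assuming the pattern is treated as a regular expression (the default), passing re.IGNORECASE should give the same result as case=False. The variable names via_case and via_flags are introduced here for illustration.

```python
import re
import pandas as pd

# same data as in the example above
data = pd.Series(['Apple', 'banana', 'Cherry', 'Date', 'APRICOT'])

# case-insensitive search using the case parameter
via_case = data.str.contains('a', case=False)

# case-insensitive search using a flag from the re module
via_flags = data.str.contains('a', flags=re.IGNORECASE)

# compare the two results element by element
print(via_case.tolist() == via_flags.tolist())
```

For a plain case-insensitive search, case=False is the shorter option. The flags argument is mainly useful when other re flags are needed as well.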
Example 3: Handle Missing Data with na Parameter in Pandas str.contains()
import pandas as pd
# create a Series with missing values
data = pd.Series(['apple', 'banana', None, 'cherry', None, 'date'])
# check which elements contain 'a', treating missing values as False
result_with_na_false = data.str.contains('a', na=False)
# check which elements contain 'a', treating missing values as True
result_with_na_true = data.str.contains('a', na=True)
print("With na=False:\n", result_with_na_false)
print("\nWith na=True:\n", result_with_na_true)
Output
With na=False:
0     True
1     True
2    False
3    False
4    False
5     True
dtype: bool

With na=True:
0     True
1     True
2     True
3    False
4     True
5     True
dtype: bool
In this example, when
na=False, missing values (None) in the Series result in False in the output result_with_na_false Series.
na=True, missing values in the Series result in True in the output result_with_na_true Series.
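One practical reason to set na explicitly is filtering. A mask used for Boolean indexing needs a definite True or False for every element, and how missing values appear without na can depend on the pandas version and the dtype of the Series. The sketch below, using the same data, fills missing values with False so they are simply excluded. The names mask and matches are introduced here for illustration.

```python
import pandas as pd

# same data as in the example above, including missing values
data = pd.Series(['apple', 'banana', None, 'cherry', None, 'date'])

# na=False makes the mask fully Boolean: missing values count as "no match"
mask = data.str.contains('a', na=False)

# keep only the elements that contain 'a'; missing values are dropped
matches = data[mask]

print(matches.tolist())
```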
Example 4: Using Regular Expression in str.contains()
import pandas as pd
# create a Series
data = pd.Series(['Apple123', 'banana', 'Cherry', 'Date', 'XYZ', '12345', 'abc'])
# regular expression to find strings containing digits or a, b, c (case insensitive)
regex_pattern = '[0-9abcABC]'
# use str.contains() with the regex pattern
result = data.str.contains(regex_pattern, regex=True)
print(result)
Output
0     True
1     True
2     True
3     True
4    False
5     True
6     True
dtype: bool
In the above example, the regex pattern [0-9abcABC] looks for any character that is either a digit from 0 to 9 or one of the letters a, b, or c in either upper or lower case.
The str.contains() method with regex=True applies this pattern to each element in the data Series.
Hence, the result Series contains True for elements that match the pattern and False for those that don't.
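Since regex=True is the default, characters with a special meaning in regular expressions can give surprising results when a literal match is intended. The sketch below uses hypothetical file names (made up for illustration) to compare three options for the dot character: the default regex search, a literal search with regex=False, and a regex search with the pattern escaped through re.escape().

```python
import re
import pandas as pd

# hypothetical file names (for illustration only)
files = pd.Series(['report.csv', 'README', 'notes.txt'])

# default regex=True: '.' means "any character", so every non-empty string matches
as_regex = files.str.contains('.')

# regex=False: '.' is treated as a literal dot
as_literal = files.str.contains('.', regex=False)

# alternative: keep regex matching but escape the special character
escaped = files.str.contains(re.escape('.'))

print(as_regex.tolist())
print(as_literal.tolist())
print(escaped.tolist())
```

When the pattern is a plain substring, regex=False avoids the need to think about escaping at all.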
Note: To learn more about Regular Expressions, please visit Python RegEx.