Pandas str.contains()

The str.contains() method in Pandas tests whether each string in a Series contains a given substring or regular expression pattern.

Example

import pandas as pd

# create a pandas Series 
cities = pd.Series(['New York', 'London', 'Tokyo', 'Paris', 'Moscow'])

# use contains() to check which city names contain the substring 'o'
contains_o = cities.str.contains('o')

print(contains_o)

'''
Output
0     True
1     True
2     True
3    False
4     True
dtype: bool
'''

str.contains() Syntax

The syntax of the str.contains() method in Pandas is:

Series.str.contains(pat, case=True, na=nan, regex=True)

str.contains() Arguments

The str.contains() method takes the following arguments:

  • pat - the substring or regular expression to look for within each element of the Series
  • case (optional) - specifies whether matching is case-sensitive (True, the default) or case-insensitive (False)
  • na (optional) - the value to use in the result for missing values
  • regex (optional) - specifies whether pat is treated as a regular expression (True, the default) or as a literal string (False)
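
The effect of the regex argument is easiest to see with a pattern that contains a regex metacharacter. The following is a minimal sketch; the files Series and its values are made up for illustration.

```python
import pandas as pd

# hypothetical file names, used only for illustration
files = pd.Series(['report.csv', 'notes', 'data.json', 'README'])

# regex=False: '.' is matched as a literal dot
literal_match = files.str.contains('.', regex=False)

# regex=True (default): '.' matches any character,
# so every non-empty string matches
regex_match = files.str.contains('.')

print(literal_match)
print(regex_match)
```

With regex=False only the names that actually contain a dot match, while the default regex=True treats the dot as a wildcard.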

str.contains() Return Value

The str.contains() method returns a Boolean Series showing whether each element in the Series contains the pattern or regex.
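
Because the return value is a Boolean Series, it can be used as a mask to select the matching elements. A minimal sketch, reusing the city names from the first example (the names mask and matching are illustrative):

```python
import pandas as pd

cities = pd.Series(['New York', 'London', 'Tokyo', 'Paris', 'Moscow'])

# Boolean Series: True where the city name contains 'o'
mask = cities.str.contains('o')

# boolean indexing keeps only the elements where the mask is True
matching = cities[mask]
print(matching)
```

Here, matching holds every city except Paris, with the original index labels preserved.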


Example 1: Check Which Series Elements Contain Given Substring

import pandas as pd

# create a Series
data = pd.Series(['apple', 'banana', 'cherry', 'date'])

# use contains() to check which elements contain the substring 'a'
contains_a = data.str.contains('a')

print(contains_a)

Output

0     True
1     True
2    False
3     True
dtype: bool

In the above example, we first created the data Series with fruit names.

Then, we used the str.contains() method to check which elements in the Series contain the substring a.

The result is a Series of Boolean values (True or False), indicating whether each element in data contains a.


Example 2: Case-Sensitive and Case-Insensitive Searches with case Parameter

import pandas as pd

# create a Series
data = pd.Series(['Apple', 'banana', 'Cherry', 'Date', 'APRICOT'])

# case-sensitive search (default behavior)
case_sensitive_result = data.str.contains('a')

# case-insensitive search
case_insensitive_result = data.str.contains('a', case=False)

print("Case-sensitive search:\n", case_sensitive_result)
print("\nCase-insensitive search:\n", case_insensitive_result)

Output

Case-sensitive search:
0    False
1     True
2    False
3     True
4    False
dtype: bool

Case-insensitive search:
0     True
1     True
2    False
3     True
4     True
dtype: bool

Here,

  1. data.str.contains('a') - only returns True for elements where a appears in the exact case specified (lowercase a).
  2. data.str.contains('a', case=False) - ignores the case of a, thus matching both a and A in any element of the data Series.
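
As a side note, str.contains() also accepts a flags argument, not covered in the argument list above, which is passed on to Python's re module. The sketch below assumes the default regex=True and shows that flags=re.IGNORECASE gives the same result as case=False for this data; the variable names are illustrative.

```python
import re
import pandas as pd

data = pd.Series(['Apple', 'banana', 'Cherry', 'Date', 'APRICOT'])

# case-insensitive search using the case parameter
by_case = data.str.contains('a', case=False)

# the same search expressed with a regex flag from the re module
by_flags = data.str.contains('a', flags=re.IGNORECASE)

print(by_case.tolist())
print(by_flags.tolist())
```

For a plain case-insensitive search, case=False is the simpler choice; flags is mainly useful when other re options are needed as well.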

Example 3: Handle Missing Data with na Parameter in Pandas str.contains()

import pandas as pd

# create a Series with missing values
data = pd.Series(['apple', 'banana', None, 'cherry', None, 'date'])

# check which elements contain 'a', treating missing values as False
result_with_na_false = data.str.contains('a', na=False)

# check which elements contain 'a', treating missing values as True
result_with_na_true = data.str.contains('a', na=True)

print("With na=False:\n", result_with_na_false)
print("\nWith na=True:\n", result_with_na_true)

Output

With na=False:
0     True
1     True
2    False
3    False
4    False
5     True
dtype: bool

With na=True:
0     True
1     True
2     True
3    False
4     True
5     True
dtype: bool

In this example, when

  1. na=False, missing values (None) in the Series result in False in the output result_with_na_false Series.
  2. na=True, missing values in the Series result in True in the output result_with_na_true Series.
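
A common reason to set na is filtering: passing na=False gives a purely Boolean result that can be used directly as a mask, so missing entries are dropped along with non-matches. A minimal sketch under that assumption (the names mask and filtered are illustrative):

```python
import pandas as pd

data = pd.Series(['apple', 'banana', None, 'cherry', None, 'date'])

# na=False fills missing positions with False, so the mask is fully Boolean
mask = data.str.contains('a', na=False)

# keep only the elements that are present and contain 'a'
filtered = data[mask]
print(filtered)
```

Using na=True instead would keep the missing entries in the filtered result, which is rarely what is wanted when selecting matches.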

Example 4: Using Regular Expression in str.contains()

import pandas as pd

# create a Series
data = pd.Series(['Apple123', 'banana', 'Cherry', 'Date', 'XYZ', '12345', 'abc'])

# regular expression to find strings containing digits or a, b, c (in either case)
regex_pattern = '[0-9abcABC]'

# use str.contains() with the regex pattern
result = data.str.contains(regex_pattern, regex=True)

print(result)

Output

0     True
1     True
2     True
3     True
4    False
5     True
6     True
dtype: bool

In the above example, the regex pattern [0-9abcABC] looks for any character that is either a digit from 0 to 9 or one of the letters a, b, or c in either upper or lower case.

The str.contains() method applies this pattern to each element in the data Series. Since regex=True is the default, passing it explicitly here only makes the intent clear.

Hence, the result Series will contain True for elements that match the pattern and False for those that don't.
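
Regular expressions also allow matching at a specific position in the string. The following sketch reuses the same data and assumes two illustrative patterns: one anchored at the start with ^ and one anchored at the end with $.

```python
import pandas as pd

data = pd.Series(['Apple123', 'banana', 'Cherry', 'Date', 'XYZ', '12345', 'abc'])

# '^' anchors the match at the start: strings beginning with an uppercase letter
starts_upper = data.str.contains(r'^[A-Z]')

# '$' anchors the match at the end: strings ending with a digit
ends_digit = data.str.contains(r'[0-9]$')

print(starts_upper)
print(ends_digit)
```

Without the anchors, these patterns would match an uppercase letter or a digit anywhere in the string.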

Note: To learn more about Regular Expressions, please visit Python RegEx.