The str.split() method in Pandas splits each string in a Series into separate parts based on a specified delimiter.
Example
import pandas as pd
# create a Series
data = pd.Series(['apple,banana,cherry', 'cat,dog'])
# split each string by the comma delimiter
split_data = data.str.split(',')
print(split_data)
'''
Output
0 [apple, banana, cherry]
1 [cat, dog]
dtype: object
'''
str.split() Syntax
The syntax of the str.split() method in Pandas is:
Series.str.split(pat=None, n=-1, expand=False, regex=None)
str.split() Arguments
The str.split() method takes the following arguments:
pat (optional) - the string or regular expression to split on; if omitted, strings are split on whitespace
n (optional) - an integer specifying the maximum number of splits; the default -1 means no limit
expand (optional) - if True, returns a DataFrame with separate columns for each split; if False, returns a Series
regex (optional) - specifies whether to treat the pattern as a regular expression or as a literal string
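As a small illustration of the default arguments, the sketch below calls str.split() with no arguments at all. The names Series is made-up sample data, not part of the examples in this article.

```python
import pandas as pd

# hypothetical sample data, used only for illustration
names = pd.Series(['Ada  Lovelace', 'Alan Turing'])

# pat is omitted, so each string is split on runs of whitespace;
# n is left at -1, so there is no limit on the number of splits
parts = names.str.split()

print(parts.tolist())
# [['Ada', 'Lovelace'], ['Alan', 'Turing']]
```

Note that the two consecutive spaces in the first string are treated as a single separator when pat is not given.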
str.split() Return Value
The str.split() method returns a DataFrame with a separate column for each split part if expand=True. Otherwise (expand=False, the default), it returns a Series in which each element is a list of the split parts.
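One way to confirm the two return types is to inspect the results directly. This is a minimal sketch; the variable names as_series and as_frame are chosen here for illustration.

```python
import pandas as pd

data = pd.Series(['apple,banana', 'cat,dog'])

# expand=False (the default): a Series whose elements are lists
as_series = data.str.split(',')

# expand=True: a DataFrame with one column per split part
as_frame = data.str.split(',', expand=True)

print(type(as_series).__name__)  # Series
print(type(as_frame).__name__)   # DataFrame
print(as_frame.shape)            # (2, 2)
```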
Example 1: Basic Split on Delimiter
import pandas as pd
# create a Series
data = pd.Series(['apple,banana,cherry', 'dog,cat,mouse'])
# split each string in the Series by commas
result = data.str.split(',')
print(result)
Output
0 [apple, banana, cherry]
1 [dog, cat, mouse]
dtype: object
In the above example, we first created the data Series with fruit names and animal names.
Then, we used the str.split(',') method to split each string in data by commas.
Example 2: Limit the Number of Splits
import pandas as pd
# create a Series
data = pd.Series(['apple-banana-cherry', 'dog-cat-mouse', 'sun-moon-stars'])
# split each string only at the first hyphen
result = data.str.split('-', n=1)
print(result)
Output
0 [apple, banana-cherry]
1 [dog, cat-mouse]
2 [sun, moon-stars]
dtype: object
Here, we have used the str.split() method on each string in the data Series, specifying a hyphen - as the separator.
The parameter n=1 limits the operation to at most one split per string.
Hence, the result is a Series where each element is a list containing two strings, the part before the first hyphen and the remainder of the string.
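To see how the value of n changes the result, the following sketch splits the same string with several limits. The results dictionary is a helper introduced here only to collect the outputs.

```python
import pandas as pd

data = pd.Series(['apple-banana-cherry'])

# collect the split result of the single string for several values of n
results = {}
for n in (-1, 1, 2):
    results[n] = data.str.split('-', n=n)[0]
    print(n, results[n])

# -1 ['apple', 'banana', 'cherry']
# 1 ['apple', 'banana-cherry']
# 2 ['apple', 'banana', 'cherry']
```

With n=-1 (the default) every hyphen is used. With n=2 the string has only two hyphens, so the result is the same as with no limit.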
Example 3: Split and Expand into DataFrame
import pandas as pd
# create a Series
data = pd.Series(['apple,banana,cherry', 'dog,cat,mouse', 'sun,moon,stars'])
# split each string by the comma and
# expand the result into separate DataFrame columns
result = data.str.split(',', expand=True)
print(result)
Output
       0       1       2
0  apple  banana  cherry
1    dog     cat   mouse
2    sun    moon   stars
In this example, the expand=True parameter causes the split segments to be turned into separate columns in a DataFrame.
So, a string like apple,banana,cherry is split into three separate columns with the values apple, banana, and cherry in the first row of the DataFrame.
The same process applies to the other strings in the data Series.
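The strings above all contain the same number of parts. As a hedged sketch of what happens when they do not, the example below uses a made-up Series with rows of different lengths; the shorter row is padded with a missing value (shown as None or NaN, depending on the pandas version).

```python
import pandas as pd

# hypothetical data: the second string has only two parts
data = pd.Series(['apple,banana,cherry', 'cat,dog'])

result = data.str.split(',', expand=True)

# the DataFrame is as wide as the longest split
print(result.shape)  # (2, 3)

# True marks the cell that was padded with a missing value
print(result.isna())
```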
Example 4: Split Using Regular Expression
import pandas as pd
# create a Series with dates in different formats
data = pd.Series(['2023-11-21', '11/21/2023', '21.11.2023'])
# regular expression pattern to match different date separators
regex_pattern = r'[-/.]'
# use str.split() with the regex pattern to split each date string into parts
result = data.str.split(regex_pattern, regex=True)
print(result)
Output
0 [2023, 11, 21]
1 [11, 21, 2023]
2 [21, 11, 2023]
dtype: object
In the above example, the regex pattern r'[-/.]' matches common date separators: hyphen -, forward slash /, and dot .
The str.split() method is then used on the data Series with the specified regex pattern. The regex=True argument tells pandas to interpret the pattern as a regular expression.
Hence, the result Series will contain lists of date components, where each date string is split into separate parts based on the separators.
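The regex argument also matters when the delimiter contains characters that are special in regular expressions. The sketch below uses made-up data with ' | ' as the delimiter and assumes a pandas version that supports the regex argument. As a regular expression, ' | ' means "a space or a space", so the two settings give different results.

```python
import pandas as pd

# hypothetical data with ' | ' as the delimiter
data = pd.Series(['red | green', 'blue | black'])

# regex=False: ' | ' is treated as a literal three-character string
literal = data.str.split(' | ', regex=False)

# regex=True: '|' is the alternation operator, so every single space
# is a separator and the pipe character ends up as its own part
pattern = data.str.split(' | ', regex=True)

print(literal.tolist())
# [['red', 'green'], ['blue', 'black']]
print(pattern.tolist())
# [['red', '|', 'green'], ['blue', '|', 'black']]
```

Passing regex=False is one way to avoid escaping such delimiters by hand.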
Note: To learn more about Regular Expressions, please visit Python RegEx.