Pandas str.split()

The str.split() method in Pandas splits each string element of a Series into parts around a specified delimiter.

Example

import pandas as pd

# create a Series
data = pd.Series(['apple,banana,cherry', 'cat,dog'])

# split each string by the comma delimiter
split_data = data.str.split(',')
print(split_data)

'''
Output
0    [apple, banana, cherry]
1    [cat, dog]
dtype: object
'''

str.split() Syntax

The syntax of the str.split() method in Pandas is:

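Series.str.split(pat=None, *, n=-1, expand=False, regex=None)

The * indicates that, in pandas 2.0 and later, every argument after pat must be passed by keyword (for example, n=1 rather than a bare 1).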

str.split() Arguments

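The str.split() method takes the following arguments:

  • pat (optional) - the string or regular expression to split on; if not specified, strings are split on whitespace
  • n (optional) - an integer specifying the maximum number of splits; the default, -1, performs all possible splits
  • expand (optional) - if True, returns a DataFrame with a separate column for each split; if False (the default), returns a Series of lists
  • regex (optional) - whether to treat pat as a regular expression; if None (the default), a single-character pat is treated as a literal string and a longer pat as a regular expression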

str.split() Return Value

The str.split() method returns a Series of lists when expand=False (the default), and a DataFrame with a separate column for each split when expand=True.
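As a quick illustration of the pat and expand arguments described above, here is a minimal sketch. The sample strings and variable names are made up for this illustration.

```python
import pandas as pd

# sample data (hypothetical); note the double space in the first string
data = pd.Series(['red green  blue', 'one two'])

# with no pat, each string is split on runs of whitespace
as_lists = data.str.split()
print(as_lists.tolist())  # [['red', 'green', 'blue'], ['one', 'two']]

# expand=True returns a DataFrame instead of a Series of lists
as_frame = data.str.split(expand=True)
print(type(as_lists).__name__, type(as_frame).__name__)  # Series DataFrame
```

Because the second string has only two words, its third column in as_frame holds a missing value.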


Example 1: Basic Split on Delimiter

import pandas as pd

# create a Series
data = pd.Series(['apple,banana,cherry', 'dog,cat,mouse'])

# split each string in the Series by commas
result = data.str.split(',')
print(result)

Output

0    [apple, banana, cherry]
1    [dog, cat, mouse]
dtype: object

In the above example, we first created the data Series with fruit names and animal names.

Then, we used the str.split(',') method to split each string in data by commas.


Example 2: Limit the Number of Splits

import pandas as pd

# create a Series
data = pd.Series(['apple-banana-cherry', 'dog-cat-mouse', 'sun-moon-stars'])

# split each string only at the first hyphen
result = data.str.split('-', n=1)
print(result)

Output

0    [apple, banana-cherry]
1    [dog, cat-mouse]
2    [sun, moon-stars]
dtype: object

Here, we have used the str.split() method on each string in the data Series, specifying a hyphen - as the separator.

The parameter n=1 limits the operation to perform only one split per string.

Hence, the result is a Series where each element is a list containing two strings, the part before the first hyphen and the remainder of the string.
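Once each string has been split once, the two pieces can be pulled out of the lists with the .str indexer. The following is a small sketch of that follow-up step, not part of the example above; the names first and rest are chosen here for illustration.

```python
import pandas as pd

data = pd.Series(['apple-banana-cherry', 'dog-cat-mouse', 'sun-moon-stars'])

# split each string only at the first hyphen
parts = data.str.split('-', n=1)

# .str[i] picks the i-th item from each list in the Series
first = parts.str[0]
rest = parts.str[1]

print(first.tolist())  # ['apple', 'dog', 'sun']
print(rest.tolist())   # ['banana-cherry', 'cat-mouse', 'moon-stars']
```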


Example 3: Split and Expand into DataFrame

import pandas as pd

# create a Series
data = pd.Series(['apple,banana,cherry', 'dog,cat,mouse', 'sun,moon,stars'])

# split each string by the comma and
# expand the result into separate DataFrame columns
result = data.str.split(',', expand=True)
print(result)

Output

     0       1       2
0  apple  banana  cherry
1    dog     cat   mouse
2    sun    moon   stars

In this example, the expand=True parameter causes the split segments to be turned into separate columns in a DataFrame.

So, for a string like apple,banana,cherry, it would be split into three separate columns with the values apple, banana, and cherry in the first row of the DataFrame.

The same process applies to the other strings in the data Series.
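The strings in this example all have three parts. When the strings split into different numbers of parts, pandas fills the shorter rows with missing values. The sketch below shows this with made-up data; the column names are hypothetical labels added for readability.

```python
import pandas as pd

# the second string has only two parts
data = pd.Series(['apple,banana,cherry', 'cat,dog'])

# the widest row determines the number of columns
result = data.str.split(',', expand=True)

# assign readable column names (hypothetical labels)
result.columns = ['first', 'second', 'third']
print(result)

# the shorter row is padded with a missing value
print(pd.isna(result.loc[1, 'third']))  # True
```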


Example 4: Split Using Regular Expression

import pandas as pd

# create a Series with dates in different formats
data = pd.Series(['2023-11-21', '11/21/2023', '21.11.2023'])

# regular expression pattern to match different date separators 
regex_pattern = r'[-/.]'

# use str.split() with the regex pattern to split each date string into parts
result = data.str.split(regex_pattern, regex=True)
print(result)

Output

0    [2023, 11, 21]
1    [11, 21, 2023]
2    [21, 11, 2023]
dtype: object

In the above example, the regex pattern r'[-/.]' matches common date separators: hyphen (-), forward slash (/), and dot (.). Inside the square brackets, the dot is matched literally rather than as "any character".

We then call the str.split() method on the data Series with this pattern. The regex=True argument tells pandas to interpret the pattern as a regular expression.

Hence, the result Series will contain lists of date components, where each date string is split into separate parts based on the separators.
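When regex is left at its default of None, pandas decides how to read the pattern from its length: a single character is taken literally, and anything longer is treated as a regular expression. The sketch below illustrates both cases with made-up strings, assuming a pandas version that supports the regex argument (1.4 or later).

```python
import pandas as pd

dotted = pd.Series(['a.b.c', 'x.y'])

# a single-character pattern is treated as a literal string by default
literal_default = dotted.str.split('.')
print(literal_default.tolist())  # [['a', 'b', 'c'], ['x', 'y']]

# regex=False makes the literal interpretation explicit
literal_explicit = dotted.str.split('.', regex=False)

spaced = pd.Series(['a ,  b', 'x,y'])

# a longer pattern is treated as a regular expression by default;
# this one also swallows whitespace around each comma
trimmed = spaced.str.split(r'\s*,\s*')
print(trimmed.tolist())  # [['a', 'b'], ['x', 'y']]
```

Passing regex explicitly, as in Example 4, avoids relying on this length-based rule.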

Note: To learn more about Regular Expressions, please visit Python RegEx.