Pandas first()

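The first() method in Pandas selects the first entry of each group when called on a grouped DataFrame. When called directly on a DataFrame with a date-based index, it selects the initial rows that fall within a given date offset.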

Example

import pandas as pd

# sample DataFrame
data = {
    'Group': ['A', 'B', 'A', 'B'],
    'Data': [1, 2, 3, 4]
}

df = pd.DataFrame(data)

# group by 'Group' and get the first row for each group
first_rows = df.groupby('Group').first()

print(first_rows)

'''
Output

       Data
Group
A         1
B         2
'''

first() Syntax

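The first() method has two forms in Pandas. On a DataFrame with a date-based index, the syntax is:

df.first(offset)

On a grouped DataFrame, it is called on the result of groupby():

df.groupby(by).first()

Note: The DataFrame form, df.first(offset), has been deprecated since pandas 2.1. The grouped form is not affected.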

first() Arguments

The DataFrame form of first() takes the following argument:

  • offset - a date offset, such as '3D' for three days, that defines how much data to select from the start of the index
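
To make the offset argument more concrete, here is a minimal sketch of how an offset string can be turned into the end of the selected window. The start date is an assumption chosen for illustration, and to_offset() is the pandas helper that parses offset strings; this is an approximation of the idea, not a copy of the internal implementation.

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset

# hypothetical first index value, chosen only for illustration
start = pd.Timestamp('2021-01-01')

# parse the offset string into a date offset object
offset = to_offset('3D')

# rows dated before this timestamp fall inside a '3D' window
end = start + offset

print(end)  # 2021-01-04 00:00:00
```

With a start of January 1st and an offset of '3D', the window covers January 1st, 2nd and 3rd.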

first() Return Value

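The first() method in Pandas returns a DataFrame. For grouped data, it contains the first non-null entry of each column for every group. For time series data, it contains the rows that fall within the given offset from the first index value, assuming the index is sorted.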


Example 1: Use first() for Grouped Data Selection

import pandas as pd

# create a sample DataFrame
data = {
    'Group': ['A', 'A', 'A', 'B', 'B', 'C', 'C'],
    'Value': [1, 2, 3, 4, 5, 6, 7],
    'Date': ['2021-01-01', '2021-01-02', '2021-01-03', '2021-03-01', '2021-01-02', '2022-01-01', '2022-01-02']
}

df = pd.DataFrame(data)

# group by 'Group' column
grouped = df.groupby('Group')

# use first() to get the first entry for each group
first_entries = grouped.first()

print(first_entries)

Output

       Value        Date
Group
A          1  2021-01-01
B          4  2021-03-01
C          6  2022-01-01

In the above example, we have created the df DataFrame and grouped df by the Group column using the groupby() method.

Then the first() method is applied to the grouped object, and the result is printed out, showing the first occurrence of each Group along with the corresponding Value and Date.
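
One detail worth illustrating is that the grouped first() works column by column and skips missing values, so the values in one result row can come from different source rows. The sketch below uses made-up data (the Score column and the NaN entries are assumptions added for illustration) and compares first() with head(1), which keeps the first physical row of each group.

```python
import numpy as np
import pandas as pd

# hypothetical data with missing values, for illustration only
df = pd.DataFrame({
    'Group': ['A', 'A', 'B', 'B'],
    'Value': [np.nan, 2.0, 3.0, 4.0],
    'Score': [10.0, 20.0, np.nan, 40.0],
})

# first() returns the first non-null value of each column per group
first_non_null = df.groupby('Group').first()

# head(1) returns the first row of each group, missing values included
first_rows = df.groupby('Group').head(1)

print(first_non_null)
print(first_rows)
```

For group A, first() combines Value 2.0 from the second row with Score 10.0 from the first row, while head(1) keeps the first row as it is, including its missing Value.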


Example 2: First Entries of a Time Series DataFrame

import pandas as pd

# create a sample time series data
dates = pd.date_range('20210101', periods=6, freq='D')
data = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6]}, index=dates)

# get the first three days of data
first_days = data.first('3D')

print(first_days)

Output

            A
2021-01-01  1
2021-01-02  2
2021-01-03  3

Here, we first created a range of 6 consecutive dates starting from January 1st, 2021, with a daily frequency D, using pd.date_range().

Next, we created the data DataFrame with a single column A and set its index to dates.

Then we used data.first('3D') to select the rows that fall within the first three days of the time series.

Note: To learn more about how to create date ranges, please visit Pandas date_range().
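
Because the DataFrame form of first() is deprecated in recent pandas releases, the same selection can also be written as a boolean mask with .loc. This is a minimal sketch of one possible replacement, reusing the data from the example above; the cutoff variable is a name introduced here for illustration.

```python
import pandas as pd

# same sample time series as in the example above
dates = pd.date_range('20210101', periods=6, freq='D')
data = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6]}, index=dates)

# end of the window: three days after the first index value (exclusive)
cutoff = data.index[0] + pd.Timedelta(days=3)

# keep only the rows dated before the cutoff
first_days = data.loc[data.index < cutoff]

print(first_days)
```

This keeps the rows for January 1st, 2nd and 3rd, matching the output shown above.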


Example 3: first() on Sorted Groups

import pandas as pd

data = {
    'Group': ['A', 'A', 'B', 'B', 'C', 'C'],
    'Value': [2, 1, 4, 3, 6, 5],
    'Date': ['2021-01-02', '2021-01-01', '2021-01-02', '2021-01-01', '2021-01-02', '2021-01-01']
}

# create DataFrame and sort using sort_values()
df = pd.DataFrame(data).sort_values(by=['Group', 'Date'])

# group by 'Group' and get first entry per group
grouped = df.groupby('Group')
first_entries = grouped.first()

print(first_entries)

Output

       Value        Date
Group
A          1  2021-01-01
B          3  2021-01-01
C          5  2021-01-01

In the above example, the data is first sorted by the Group column and then by the Date column using sort_values(). It is then grouped by the Group column.

The first() method is applied to each group. Because the rows are already sorted, it returns the entry with the earliest Date in each group.

Note: To learn more about how we sort values, please visit Pandas sort_values().
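
To see why the sort step matters, the sketch below runs first() on the same data with and without sorting. The variable names unsorted_first and sorted_first are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Group': ['A', 'A', 'B', 'B', 'C', 'C'],
    'Value': [2, 1, 4, 3, 6, 5],
    'Date': ['2021-01-02', '2021-01-01', '2021-01-02',
             '2021-01-01', '2021-01-02', '2021-01-01'],
})

# without sorting, first() follows the original row order
unsorted_first = df.groupby('Group').first()

# after sorting by Group and Date, first() picks the earliest date per group
sorted_first = df.sort_values(by=['Group', 'Date']).groupby('Group').first()

print(unsorted_first['Value'].tolist())  # [2, 4, 6]
print(sorted_first['Value'].tolist())    # [1, 3, 5]
```

Without sorting, each group returns the row that appears first in the original data, which here is the later date.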