The sort_values() method in Pandas is used to sort a DataFrame by one or more columns.
Example
import pandas as pd
# create a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
'Age': [25, 30, 22, 28]}
df = pd.DataFrame(data)
# sort df by 'Age' column in ascending order
df_sorted = df.sort_values(by='Age')
print(df_sorted)
'''
Output
Name Age
2 Charlie 22
0 Alice 25
3 David 28
1 Bob 30
'''
sort_values() Syntax
The syntax of the sort_values() method in Pandas is:
df.sort_values(by, axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last', ignore_index=False)
sort_values() Arguments
The sort_values() method takes the following arguments:
- by - column name or a list of column names by which we want to sort the DataFrame
- axis (optional) - specifies if we want to sort by rows or columns
- ascending (optional) - boolean or a list of booleans that determines the sorting order
- inplace (optional) - boolean that determines whether to sort the DataFrame in place or return a new sorted DataFrame
- kind (optional) - specifies the sorting algorithm to use
- na_position (optional) - determines where NaN values should be placed during sorting
- ignore_index (optional) - boolean that determines whether to reset the index of the resulting DataFrame
sort_values() Return Value
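By default, the sort_values() method in Pandas returns a new DataFrame that contains the sorted data based on the specified criteria. If inplace=True is passed, the DataFrame is sorted in place and the method returns None.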
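The difference between the two return behaviors can be checked with a short sketch. It reuses the Name/Age data from the first example; the variable names (original_order, sorted_order, result, inplace_order) are chosen here for illustration only.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David'],
                   'Age': [25, 30, 22, 28]})

# default (inplace=False): a new DataFrame is returned and df is left unchanged
df_sorted = df.sort_values(by='Age')
original_order = df['Age'].tolist()
sorted_order = df_sorted['Age'].tolist()

# inplace=True: df itself is reordered and the call returns None
result = df.sort_values(by='Age', inplace=True)
inplace_order = df['Age'].tolist()

print(original_order)  # [25, 30, 22, 28]
print(sorted_order)    # [22, 25, 28, 30]
print(result)          # None
print(inplace_order)   # [22, 25, 28, 30]
```

Because the in-place call returns None, writing df = df.sort_values(by='Age', inplace=True) would replace df with None, so the two styles should not be combined.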
Example 1: Sort Column in Descending Order
import pandas as pd
# create a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
'Age': [25, 30, 22, 28]}
df = pd.DataFrame(data)
# sort df by 'Age' column in descending order
df_sorted = df.sort_values(by='Age', ascending=False)
print(df_sorted)
Output
Name Age
1 Bob 30
3 David 28
0 Alice 25
2 Charlie 22
In the above example, we have used the sort_values() method to sort the df DataFrame by the Age column in descending order (ascending=False).
This means that the individuals will be arranged in the DataFrame with the oldest person at the top.
Example 2: Sort DataFrame by Multiple Columns
import pandas as pd
data = {'Name': ['Eve', 'Frank', 'Grace', 'Hank'],
'Age': [28, 22, 30, 25],
'Score': [75, 80, 85, 90]}
df = pd.DataFrame(data)
# sort DataFrame by 'Age' and then by 'Score' (Both in ascending order)
df1 = df.sort_values(by=['Age', 'Score'])
print("Sorting by 'Age' (ascending) and then by 'Score' (ascending):\n")
print(df1.to_string(index=False))
print()
# sort DataFrame by 'Age' in ascending order, and then by 'Score' in descending order
df2 = df.sort_values(by=['Age', 'Score'], ascending=[True, False])
print("Sorting by 'Age' (ascending) and then by 'Score' (descending):\n")
print(df2.to_string(index=False))
Output
Sorting by 'Age' (ascending) and then by 'Score' (ascending):
Name Age Score
Frank 22 80
Hank 25 90
Eve 28 75
Grace 30 85

Sorting by 'Age' (ascending) and then by 'Score' (descending):
Name Age Score
Frank 22 80
Hank 25 90
Eve 28 75
Grace 30 85
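Here,
- df1 shows the default sorting behavior (both columns Age and Score are in ascending order).
- df2 shows custom sorting, where Age is in ascending and Score is in descending order.
Both outputs are identical because every Age value in this data is unique, so the Score column is never needed to break a tie.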
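The second sort key only matters when the first key contains repeated values. The sketch below uses hypothetical data (the same names, but with Age values changed so that each age appears twice) to make the difference between the two calls visible.

```python
import pandas as pd

# hypothetical data: ages repeat, so 'Score' decides the order within each age
df = pd.DataFrame({'Name': ['Eve', 'Frank', 'Grace', 'Hank'],
                   'Age': [25, 22, 25, 22],
                   'Score': [75, 80, 85, 90]})

# both keys ascending
asc = df.sort_values(by=['Age', 'Score'])

# 'Age' ascending, 'Score' descending within each age
mixed = df.sort_values(by=['Age', 'Score'], ascending=[True, False])

print(asc['Name'].tolist())    # ['Frank', 'Hank', 'Eve', 'Grace']
print(mixed['Name'].tolist())  # ['Hank', 'Frank', 'Grace', 'Eve']
```

Within each age group the rows swap places between the two results, while the groups themselves stay in ascending Age order.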
Example 3: Sort DataFrame Based on Rows or Columns
import pandas as pd
data = {'A': [3, 1, 2, 4],
'B': [9, 7, 8, 6]}
df = pd.DataFrame(data)
# sort the DataFrame by rows based on column 'A' values in ascending order
df_sorted_rows = df.sort_values(by='A', axis=0)
print("Sorted by rows based on 'A' values:")
print(df_sorted_rows)
# sort the DataFrame by columns based on the values in the first row (index 0)
df_sorted_columns = df.sort_values(by=0, axis=1, ascending=False, ignore_index=True)
print("\nSorted by columns based on values in the first row:")
print(df_sorted_columns)
Output
Sorted by rows based on 'A' values:
A B
1 1 7
2 2 8
0 3 9
3 4 6
Sorted by columns based on values in the first row:
0 1
0 9 3
1 7 1
2 8 2
3 6 4
In the above example, we first sorted the df DataFrame by rows (axis=0) based on the values in column A in ascending order.
Then, we sorted the same DataFrame by columns (axis=1) based on the values in the first row (index 0) in descending order.
Here, ignore_index=True is passed in the column sort, so it applies to the axis being sorted: the original column labels (B, A) are discarded and replaced with sequential labels (0, 1), while the row index is unchanged.
When sorting by rows (axis=0), ignore_index=True resets the row index in the same way, which can be helpful when you want a clean, sequential index after sorting your DataFrame.
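For the more common row sort, the effect of ignore_index can be seen by comparing the row index with and without it. This is a minimal sketch using the same A/B data; the names kept and reset are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [3, 1, 2, 4],
                   'B': [9, 7, 8, 6]})

# original row labels travel with their rows
kept = df.sort_values(by='A')

# row labels are replaced with 0, 1, ..., n - 1
reset = df.sort_values(by='A', ignore_index=True)

print(kept.index.tolist())   # [1, 2, 0, 3]
print(reset.index.tolist())  # [0, 1, 2, 3]
print(reset['A'].tolist())   # [1, 2, 3, 4]
```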
Example 4: Specify Sorting Algorithm to Sort DataFrame
By default, Pandas uses the quicksort algorithm for the sort_values() method. To specify a different sorting algorithm, we can use the kind parameter.
import pandas as pd
# create a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
'Age': [25, 30, 22, 28]}
df = pd.DataFrame(data)
# sort df by 'Age' column using merge sort algorithm
df_sorted = df.sort_values(by='Age', kind='mergesort')
print(df_sorted)
Output
Name Age
2 Charlie 22
0 Alice 25
3 David 28
1 Bob 30
Here, the kind='mergesort' parameter is used to specify the merge sort algorithm for sorting the DataFrame by the Age column in ascending order.
Note: We can replace 'mergesort' with other available sorting algorithms like 'quicksort', 'heapsort', or 'stable' as needed.
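One practical reason to choose an algorithm is stability: 'mergesort' and 'stable' keep rows with equal keys in their original relative order, while 'quicksort' and 'heapsort' do not guarantee this. The sketch below uses hypothetical data with repeated ages to show the guaranteed order under 'mergesort'.

```python
import pandas as pd

# hypothetical data: ages repeat, so the order of tied rows is observable
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David'],
                   'Age': [25, 30, 25, 30]})

# a stable sort keeps tied rows in their original order:
# Alice (0) before Charlie (2), Bob (1) before David (3)
stable_sorted = df.sort_values(by='Age', kind='mergesort')

print(stable_sorted.index.tolist())   # [0, 2, 1, 3]
print(stable_sorted['Name'].tolist()) # ['Alice', 'Charlie', 'Bob', 'David']
```

With an unstable algorithm the same result may still appear on small inputs, but it is not guaranteed.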
Example 5: Determine the Placement of Missing Values During Sorting Operation
The na_position argument is used to determine the placement of missing values during the sorting operation.
- na_position='last' (default) - missing values are placed at the end of the sorted column
- na_position='first' - missing values are placed at the beginning of the sorted column
Let's look at an example.
import pandas as pd
data = {'A': [3, 1, 2, None, 4],
'B': [9, None, 8, 6, 7]}
df = pd.DataFrame(data)
# sort df by column 'A' in ascending order with missing values at the end
df_sorted_last = df.sort_values(by='A', na_position='last')
print("Sorted by 'A' with missing values at the end:")
print(df_sorted_last)
# sort df by column 'B' in ascending order with missing values at the beginning
df_sorted_first = df.sort_values(by='B', na_position='first')
print("\nSorted by 'B' with missing values at the beginning:")
print(df_sorted_first)
Output
Sorted by 'A' with missing values at the end:
A B
1 1.0 NaN
2 2.0 8.0
0 3.0 9.0
4 4.0 7.0
3 NaN 6.0
Sorted by 'B' with missing values at the beginning:
A B
1 1.0 NaN
3 NaN 6.0
4 4.0 7.0
2 2.0 8.0
0 3.0 9.0
Here,
- df.sort_values(by='A', na_position='last') - missing values are placed at the end of column A
- df.sort_values(by='B', na_position='first') - missing values are placed at the beginning of column B
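The placement of missing values is controlled by na_position alone and does not flip when the sort order is reversed. A small sketch using column A from the example above, sorted in descending order (the names desc_last and desc_first are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'A': [3, 1, 2, None, 4]})

# descending sort: NaN still goes to the end by default
desc_last = df.sort_values(by='A', ascending=False)

# descending sort with NaN moved to the beginning
desc_first = df.sort_values(by='A', ascending=False, na_position='first')

print(desc_last.index.tolist())   # [4, 0, 2, 1, 3]
print(desc_first.index.tolist())  # [3, 4, 0, 2, 1]
```

In both results the non-missing values run 4, 3, 2, 1; only the position of the NaN row (index 3) changes.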