Sorting is a fundamental operation in data manipulation and analysis that involves arranging data in a specific order.
Sorting is crucial for tasks such as organizing data for better readability, identifying patterns, making comparisons, and facilitating further analysis.
Sort DataFrame in Pandas
In Pandas, we can use the sort_values() method to sort a DataFrame. For example,
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie'],
'Age': [28, 22, 25]}
df = pd.DataFrame(data)
# sort DataFrame by Age in ascending order
sorted_df = df.sort_values(by='Age')
print(sorted_df.to_string(index=False))
Output
Name Age
Bob 22
Charlie 25
Alice 28
In the above example, df.sort_values(by='Age') sorts the df DataFrame by the values in the Age column in ascending order (the default), and the result is stored in the sorted_df variable.
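By default, sort_values() returns a new DataFrame and leaves df unchanged, and each row keeps its original index label. The short check below reuses the same data to show both points. It also uses the ignore_index parameter (available in pandas 1.0 and later) to renumber the result from 0. The variable name reset_df is only for illustration.

```python
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [28, 22, 25]}
df = pd.DataFrame(data)

sorted_df = df.sort_values(by='Age')

# the original DataFrame is not modified
print(df['Name'].tolist())          # ['Alice', 'Bob', 'Charlie']

# rows keep their original index labels after sorting
print(sorted_df.index.tolist())     # [1, 2, 0]

# ignore_index=True relabels the sorted rows as 0, 1, 2, ...
reset_df = df.sort_values(by='Age', ignore_index=True)
print(reset_df.index.tolist())      # [0, 1, 2]
print(reset_df['Name'].tolist())    # ['Bob', 'Charlie', 'Alice']
```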
To sort values in descending order, we set the ascending parameter to False:
sorted_df = df.sort_values(by='Age', ascending=False)
The output would be:
Name Age
Alice 28
Charlie 25
Bob 22
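Here is a complete version of the descending sort. It also sorts the string column Name in descending order, which orders the names in reverse lexicographic order. The variable names desc_df and by_name_desc are only for illustration.

```python
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [28, 22, 25]}
df = pd.DataFrame(data)

# ascending=False puts the largest Age first
desc_df = df.sort_values(by='Age', ascending=False)
print(desc_df['Name'].tolist())        # ['Alice', 'Charlie', 'Bob']

# a string column is compared lexicographically
by_name_desc = df.sort_values(by='Name', ascending=False)
print(by_name_desc['Name'].tolist())   # ['Charlie', 'Bob', 'Alice']
```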
Note: .to_string(index=False) is used to display the values without the index.
Sort Pandas DataFrame by Multiple Columns
We can also sort a DataFrame by multiple columns. When we do, the columns are prioritized in the order they are listed: rows are sorted by the first column, and each later column is used only to break ties in the columns before it.
To sort by multiple columns, we pass the desired columns as a list to the by parameter of the sort_values() method. Here's how we do it.
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
'Age': [25, 22, 30, 22],
'Score': [85, 90, 75, 80]}
df = pd.DataFrame(data)
# 1. Sort DataFrame by 'Age' and then by 'Score' (Both in ascending order)
df1 = df.sort_values(by=['Age', 'Score'])
print("Sorting by 'Age' (ascending) and then by 'Score' (ascending):\n")
print(df1.to_string(index=False))
print()
# 2. Sort DataFrame by 'Age' in ascending order, and then by 'Score' in descending order
df2 = df.sort_values(by=['Age', 'Score'], ascending=[True, False])
print("Sorting by 'Age' (ascending) and then by 'Score' (descending):\n")
print(df2.to_string(index=False))
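Output
Sorting by 'Age' (ascending) and then by 'Score' (ascending):

Name Age Score
David 22 80
Bob 22 90
Alice 25 85
Charlie 30 75

Sorting by 'Age' (ascending) and then by 'Score' (descending):

Name Age Score
Bob 22 90
David 22 80
Alice 25 85
Charlie 30 75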
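Here,
- df1 shows the default sorting behavior (both columns in ascending order). Bob and David are both 22, so Score breaks the tie and David (80) comes before Bob (90).
- df2 shows custom sorting, where Age is in ascending order and Score is in descending order, so Bob (90) now comes before David (80).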
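Because priority follows the order of the list passed to by, swapping the columns can change the result. In this sketch on the same data, Score only breaks ties when Age comes first. When Score comes first, it decides the whole order. The variable names age_first and score_first are only for illustration.

```python
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 22, 30, 22],
        'Score': [85, 90, 75, 80]}
df = pd.DataFrame(data)

# 'Age' first: Score is only used for rows with the same Age
age_first = df.sort_values(by=['Age', 'Score'])
print(age_first['Name'].tolist())    # ['David', 'Bob', 'Alice', 'Charlie']

# 'Score' first: rows are ordered by Score; Age would only break Score ties
score_first = df.sort_values(by=['Score', 'Age'])
print(score_first['Name'].tolist())  # ['Charlie', 'David', 'Alice', 'Bob']
```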
Sort Pandas Series
In Pandas, we can also use the sort_values() method to sort a Series. For example,
import pandas as pd
ages = pd.Series([28, 22, 25], name='Age')
# sort Series in ascending order
sorted_ages = ages.sort_values()
print(sorted_ages.to_string(index=False))
Output
22
25
28
Here, ages.sort_values() sorts the ages Series in ascending order, and the sorted result is assigned to the sorted_ages variable.
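Series.sort_values() accepts the same ascending parameter as the DataFrame version, and the sorted Series also keeps its original index labels. A minimal sketch, using the hypothetical names desc_ages and asc_ages:

```python
import pandas as pd

ages = pd.Series([28, 22, 25], name='Age')

# descending order: largest value first
desc_ages = ages.sort_values(ascending=False)
print(desc_ages.tolist())        # [28, 25, 22]

# the original index labels travel with their values
asc_ages = ages.sort_values()
print(asc_ages.index.tolist())   # [1, 2, 0]
```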
Sort Pandas DataFrame Using sort_index()
We can also sort a DataFrame by its index using the sort_index() method.
The sort_index() method sorts a DataFrame or Series by its index labels. This is useful for restoring a logical row order, for example after sorting by values or combining data from several sources, and a sorted index can also make label-based lookups and slicing more efficient.
Let's look at an example.
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie'],
'Age': [28, 22, 25]}
# create a DataFrame with a non-sequential index
df = pd.DataFrame(data, index=[2, 0, 1])
print("Original DataFrame:")
print(df.to_string(index=True))
print("\n")
# sort DataFrame by index in ascending order
sorted_df = df.sort_index()
print("Sorted DataFrame by index:")
print(sorted_df.to_string(index=True))
Output
Original DataFrame:
Name Age
2 Alice 28
0 Bob 22
1 Charlie 25
Sorted DataFrame by index:
Name Age
0 Bob 22
1 Charlie 25
2 Alice 28
In the above example, we created the df DataFrame from the data dictionary with a non-sequential index. The index parameter is set to [2, 0, 1], so the rows get these labels instead of the default sequential index (0, 1, 2).
Then we sorted df by its index in ascending order using the sort_index() method.
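Like sort_values(), sort_index() accepts an ascending parameter. The sketch below reuses the same DataFrame to sort its index in descending order. It also shows axis=1, which sorts the column labels instead of the row labels, and applies sort_index() to a Series. The names desc_df and ages are only for illustration.

```python
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [28, 22, 25]}
df = pd.DataFrame(data, index=[2, 0, 1])

# ascending=False orders the index from largest to smallest label
desc_df = df.sort_index(ascending=False)
print(desc_df.index.tolist())     # [2, 1, 0]
print(desc_df['Name'].tolist())   # ['Alice', 'Charlie', 'Bob']

# axis=1 sorts the column labels alphabetically
print(df.sort_index(axis=1).columns.tolist())   # ['Age', 'Name']

# sort_index() works the same way on a Series
ages = pd.Series([28, 22, 25], index=[2, 0, 1])
print(ages.sort_index().tolist())  # [22, 25, 28]
```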