Pandas Correlation

Correlation is a statistical concept that quantifies the degree to which two variables are related to each other.

Correlation can be calculated in Pandas using the corr() method, which is available on both DataFrame and Series objects.

Let's look at an example.

import pandas as pd

# create dataframe
data = {
    "Temperature": [22, 25, 32, 28, 30],
    "Ice_Cream_Sales": [105, 120, 135, 130, 125]
}

df = pd.DataFrame(data)

# calculate correlation matrix
print(df.corr())

Output

                 Temperature  Ice_Cream_Sales
Temperature         1.000000         0.923401
Ice_Cream_Sales     0.923401         1.000000

In this example, we used the corr() method on the DataFrame df to calculate the correlation coefficients between the columns.

The output is a correlation matrix that displays the correlation coefficients between all pairs of columns in the dataframe. In this case, there are only two columns, so the matrix is 2x2.

Here, the correlation coefficient between Temperature and Ice_Cream_Sales is 0.923401, which is positive and close to 1. This indicates that ice cream sales tend to increase as the temperature increases.

The coefficient value of 1.000000 along the diagonal represents the correlation of each column with itself.
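To make the number in the matrix less of a black box, the following sketch recomputes the default (Pearson) coefficient by hand from the same data and compares it with the value returned by corr(). The variable names (dx, dy, manual_r, pandas_r) are chosen here for illustration only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Temperature": [22, 25, 32, 28, 30],
    "Ice_Cream_Sales": [105, 120, 135, 130, 125],
})

x = df["Temperature"].to_numpy(dtype=float)
y = df["Ice_Cream_Sales"].to_numpy(dtype=float)

# deviations of each value from its column mean
dx = x - x.mean()
dy = y - y.mean()

# Pearson's r: sum of products of deviations,
# divided by the square root of the product of the sums of squared deviations
manual_r = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# the same coefficient as computed by pandas
pandas_r = df["Temperature"].corr(df["Ice_Cream_Sales"])

print(manual_r)
print(pandas_r)
```

Both printed values should agree up to floating-point rounding, at roughly 0.9234.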

Positive and Negative Correlation

Positive correlation refers to a relationship between two variables where they both tend to change in the same direction. When one variable increases, the other variable also tends to increase, and when one variable decreases, the other variable also tends to decrease.

Figure: Positive correlation between temperature and ice cream sales

In the figure above, we can clearly see that ice cream sales increase with the increase in temperature. We can say that there is a positive correlation between temperature and ice cream sales.

Negative correlation, on the other hand, refers to a relationship between two variables where they tend to change in opposite directions. When one variable increases, the other variable tends to decrease, and vice versa.

Figure: Negative correlation between temperature and coffee sales

In the figure above, coffee sales decrease with increase in temperature. We can say that there is a negative correlation between temperature and coffee sales.
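The sign of the coefficient returned by corr() reflects these two directions. A minimal sketch follows. The Coffee_Sales values are made up for illustration and are not taken from a real dataset.

```python
import pandas as pd

df = pd.DataFrame({
    "Temperature": [22, 25, 32, 28, 30],
    "Ice_Cream_Sales": [105, 120, 135, 130, 125],
    # hypothetical values, chosen so that sales fall as temperature rises
    "Coffee_Sales": [158, 145, 118, 135, 126],
})

ice_cream_r = df["Temperature"].corr(df["Ice_Cream_Sales"])
coffee_r = df["Temperature"].corr(df["Coffee_Sales"])

print(f"Temperature vs Ice_Cream_Sales: {ice_cream_r:.3f}")  # positive value
print(f"Temperature vs Coffee_Sales: {coffee_r:.3f}")        # negative value
```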

Example: Correlation Between Two Columns

Instead of computing the whole correlation matrix, we can calculate the correlation between two specific columns.

import pandas as pd

# create dataframe
data = {
    "Temperature": [22, 25, 32, 28, 30],
    "Ice_Cream_Sales": [105, 120, 135, 130, 125]
}

df = pd.DataFrame(data)

# calculate correlation coefficient
correlation = df['Temperature'].corr(df["Ice_Cream_Sales"])

print(correlation)

Output

0.9234007664064656

In this example, we calculated the correlation between Temperature and Ice_Cream_Sales.

The syntax for doing so is:

df['column1'].corr(df['column2'])
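The same coefficient can also be read from the full correlation matrix by label, and swapping the two columns does not change the result because correlation is symmetric. A short sketch, reusing the data from the example above (the variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "Temperature": [22, 25, 32, 28, 30],
    "Ice_Cream_Sales": [105, 120, 135, 130, 125],
})

# column-to-column correlation, in both orders
direct = df["Temperature"].corr(df["Ice_Cream_Sales"])
reversed_order = df["Ice_Cream_Sales"].corr(df["Temperature"])

# the same entry looked up in the correlation matrix
from_matrix = df.corr().loc["Temperature", "Ice_Cream_Sales"]

print(direct)
print(reversed_order)
print(from_matrix)
```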

Example: Missing Values

A DataFrame may contain missing values (NaN). The corr() method excludes any pair of values in which at least one value is NaN.

import pandas as pd
import numpy as np

# create a dataframe
data = {
    "Temperature": [22, 25, 32, 28, 30],
    "Coffee_Sales": [158, 145, np.nan, np.nan, 140]
}

df = pd.DataFrame(data)

# calculate correlation between Temperature and Coffee_Sales
correlation1 = df["Temperature"].corr(df["Coffee_Sales"])

print("With NaN values")
print(df)
print(f"correlation = {correlation1}")
print()

# remove missing values
df.dropna(inplace=True)

# calculate correlation between Temperature and Coffee_Sales
correlation2 = df["Temperature"].corr(df["Coffee_Sales"])

print("Without NaN values")
print(df)
print(f"correlation = {correlation2}")
print()

Output

With NaN values
   Temperature  Coffee_Sales
0           22         158.0
1           25         145.0
2           32           NaN
3           28           NaN
4           30         140.0
correlation = -0.923177938058926

Without NaN values
   Temperature  Coffee_Sales
0           22         158.0
1           25         145.0
4           30         140.0
correlation = -0.923177938058926

Notice that the correlation value is the same before and after removing the NaN values. This means that the NaN values are completely ignored by corr().

We used np.nan from the NumPy library to represent the missing values.
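When corr() is called on a whole DataFrame, missing values are excluded pair by pair rather than by dropping entire rows, so a NaN in one column does not affect the coefficients of other column pairs. The sketch below combines the columns from the earlier examples to contrast this with calling dropna() first.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Temperature": [22, 25, 32, 28, 30],
    "Ice_Cream_Sales": [105, 120, 135, 130, 125],
    "Coffee_Sales": [158, 145, np.nan, np.nan, 140],
})

# pairwise handling: this pair contains no NaN, so all 5 rows are used
pairwise = df.corr().loc["Temperature", "Ice_Cream_Sales"]

# dropna() removes rows 2 and 3 entirely, so only 3 rows are used
listwise = df.dropna().corr().loc["Temperature", "Ice_Cream_Sales"]

print(f"pairwise: {pairwise:.4f}")
print(f"after dropna(): {listwise:.4f}")
```

The two values differ, so whether to call dropna() before corr() is a choice that can change the coefficients for columns that have no missing values of their own.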

Correlation Methods in Pandas

We can calculate correlation using three different methods in Pandas:

Pearson Method (Default): evaluates the linear relationship between two continuous variables
Kendall Method: measures the ordinal association between two measured quantities
Spearman Method: evaluates the monotonic relationship between two continuous or ordinal variables

By default, corr() computes the Pearson correlation coefficient, which measures the linear relationship between two variables.
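One way to see how the Spearman method relates to the Pearson method: Spearman's coefficient is Pearson's coefficient computed on the ranks of the values instead of the values themselves. A minimal sketch using the temperature and ice cream data from the earlier examples:

```python
import pandas as pd

df = pd.DataFrame({
    "Temperature": [22, 25, 32, 28, 30],
    "Ice_Cream_Sales": [105, 120, 135, 130, 125],
})

# Spearman coefficient computed directly
spearman = df.corr(method="spearman").loc["Temperature", "Ice_Cream_Sales"]

# Pearson coefficient computed on the ranks of each column
ranks = df.rank()
pearson_on_ranks = ranks.corr(method="pearson").loc["Temperature", "Ice_Cream_Sales"]

print(spearman)
print(pearson_on_ranks)
```

Both values should match up to floating-point rounding.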

Example: Pearson, Kendall and Spearman Methods

import pandas as pd

# create dataframe
data = {
    "Temperature": [22, 25, 32, 28, 30],
    "Ice_Cream_Sales": [105, 120, 135, 130, 125]
}

df = pd.DataFrame(data)

# calculate different correlation coefficients
pearson = df['Temperature'].corr(df["Ice_Cream_Sales"])
kendall = df['Temperature'].corr(df["Ice_Cream_Sales"], method='kendall')
spearman = df['Temperature'].corr(df["Ice_Cream_Sales"], method='spearman')

# display different correlation coefficient
print(f"Pearson's Coefficient: {pearson}")
print(f"Kendall's Coefficient: {kendall}")
print(f"Spearman's Coefficient: {spearman}")

Output

Pearson's Coefficient: 0.9234007664064656
Kendall's Coefficient: 0.7999999999999999
Spearman's Coefficient: 0.8999999999999998

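Here, all three coefficients are positive and close to 1, so every method reports a strong increasing relationship between temperature and ice cream sales. The values differ because each method measures a different kind of association, so coefficients from different methods should not be compared directly with one another.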
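The methods can disagree more visibly when a relationship is monotonic but not linear. In the sketch below, which uses made-up values where y is the cube of x, the Spearman method should report a perfect monotonic relationship while the Pearson coefficient stays below 1.

```python
import pandas as pd

# y always increases with x, but not along a straight line (y = x cubed)
df = pd.DataFrame({"x": [1, 2, 3, 4, 5]})
df["y"] = df["x"] ** 3

pearson = df.corr(method="pearson").loc["x", "y"]
spearman = df.corr(method="spearman").loc["x", "y"]

print(f"Pearson: {pearson:.4f}")
print(f"Spearman: {spearman:.4f}")
```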

Perfect, Good & Bad Correlation

We can interpret the correlation values as:

Perfect Correlation

A perfect positive correlation implies that for every increase in one variable, there is a proportionate increase in the other variable, indicated by a coefficient of +1.

A perfect negative correlation, represented by -1, signifies that an increase in one variable is matched by a proportionate decrease in the other.

Figure: Perfect negative correlation
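Perfect correlation is rare in real data, but it is easy to produce with columns that are exact linear functions of each other. A small sketch with made-up values (the column names up and down are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3, 4, 5]})
df["up"] = 2 * df["x"] + 3      # exact increasing linear function of x
df["down"] = -2 * df["x"] + 3   # exact decreasing linear function of x

perfect_pos = df["x"].corr(df["up"])
perfect_neg = df["x"].corr(df["down"])

print(perfect_pos)   # approximately +1
print(perfect_neg)   # approximately -1
```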

Good Correlation

A good correlation has an absolute value of roughly 0.5 to 0.9 and generally indicates a strong relationship between the variables, though not a perfect one. These cutoffs are rules of thumb rather than fixed standards.

Bad Correlation

A bad correlation is typically close to zero, indicating that there is little or no linear relationship between the two variables. The variables may still be related in a non-linear way.
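One caveat worth keeping in mind: Pearson's coefficient only captures linear relationships, so a value near zero does not by itself rule out other kinds of dependence. In this sketch, with made-up values, y is completely determined by x (y is x squared), yet the coefficient comes out at zero.

```python
import pandas as pd

# x is symmetric around zero and y depends entirely on x
df = pd.DataFrame({"x": [-2, -1, 0, 1, 2]})
df["y"] = df["x"] ** 2

r = df["x"].corr(df["y"])

print(r)
```

Plotting the data alongside the coefficient is a common way to catch patterns like this.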
