The read_csv() function in Pandas is used to convert a CSV file into a DataFrame.
Example
Let's suppose that sample_data.csv contains the following content:
Employee ID,First Name,Last Name,Department,Position,Salary
101,John,Doe,Marketing,Manager,50000
102,Jane,Smith,Sales,Associate,35000
103,Michael,Johnson,Finance,Analyst,45000
104,Emily,Williams,HR,Coordinator,40000
Now, let's write code to read the above CSV file using read_csv().
import pandas as pd
# load data from a CSV file
df = pd.read_csv('sample_data.csv')
print(df)
'''
Output
   Employee ID  First Name  Last Name  Department     Position  Salary
0          101        John        Doe   Marketing      Manager   50000
1          102        Jane      Smith       Sales    Associate   35000
2          103     Michael    Johnson     Finance      Analyst   45000
3          104       Emily   Williams          HR  Coordinator   40000
'''
read_csv() Syntax
The syntax for the read_csv() function in Pandas is:
pd.read_csv(filepath_or_buffer, sep=',', header='infer', names=None, index_col=None, usecols=None, dtype=None, skiprows=None, nrows=None, na_values=None, parse_dates=False)
read_csv() Arguments
The read_csv() function takes the following common arguments:
filepath_or_buffer: the path to the file or a file-like object
sep or delimiter (optional): the delimiter to use
header (optional): row number to use as the column names
names (optional): list of column names to use
index_col (optional): column(s) to set as the index
usecols (optional): return a subset of the columns
dtype (optional): data type for the whole data or for individual columns
skiprows (optional): number of lines to skip at the start of the file, or a list of line numbers to skip
nrows (optional): number of rows of the file to read
na_values (optional): additional strings to recognize as NaN
parse_dates (optional): whether to parse dates; accepts a boolean, a list of column numbers or names, a list of lists, or a dictionary
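As a rough illustration of how several of these arguments combine, the sketch below reads a small in-memory CSV instead of a file on disk. The CSV text, its Joined column, and the "missing" placeholder are made up for this example and are not part of sample_data.csv.

```python
import io
import pandas as pd

# Hypothetical in-memory CSV standing in for a file on disk (assumption).
csv_text = """Employee ID,Name,Joined,Salary
101,John,2021-03-15,50000
102,Jane,2020-11-01,missing
103,Michael,2019-07-22,45000
"""

df = pd.read_csv(
    io.StringIO(csv_text),    # a file-like object works in place of a path
    index_col="Employee ID",  # use this column as the row index
    na_values=["missing"],    # treat the string "missing" as NaN
    parse_dates=["Joined"],   # parse this column as datetimes
    nrows=2,                  # read only the first two data rows
)

print(df)
print(df.dtypes)
```

Because one Salary value becomes NaN, pandas stores that column as floating point rather than integers.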
read_csv() Return Value
The read_csv() function returns a DataFrame containing the data read from the CSV file.
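A quick way to confirm what comes back is to inspect the returned object. This is a minimal sketch that assumes a two-row in-memory CSV (invented here for illustration) rather than a real file.

```python
import io
import pandas as pd

# Hypothetical two-row CSV used only for this check (assumption).
csv_text = "Employee ID,First Name,Salary\n101,John,50000\n102,Jane,35000\n"

df = pd.read_csv(io.StringIO(csv_text))

print(type(df))             # the class of the returned object
print(df.shape)             # (number of rows, number of columns)
print(df.columns.tolist())  # column names taken from the header row
```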
Example 1: Basic CSV Reading
Let's suppose that sample_data.csv contains the following content:
Employee ID,First Name,Last Name,Department,Position,Salary
101,John,Doe,Marketing,Manager,50000
102,Jane,Smith,Sales,Associate,35000
103,Michael,Johnson,Finance,Analyst,45000
104,Emily,Williams,HR,Coordinator,40000
Now, let's write code to read the above CSV file using read_csv().
import pandas as pd
# load data from a CSV file
df = pd.read_csv('sample_data.csv')
print(df)
Output
   Employee ID  First Name  Last Name  Department     Position  Salary
0          101        John        Doe   Marketing      Manager   50000
1          102        Jane      Smith       Sales    Associate   35000
2          103     Michael    Johnson     Finance      Analyst   45000
3          104       Emily   Williams          HR  Coordinator   40000
In this example, we read data from sample_data.csv and print the DataFrame.
Example 2: Skipping Rows and Setting Index Column
For this example, let's use the same CSV file as in the first example (with a comma as the delimiter).
import pandas as pd
# skip the first row and set the first column as the index
df = pd.read_csv('sample_data.csv', skiprows=1, index_col=0)
print(df)
Output
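        John       Doe  Marketing      Manager  50000
101
102     Jane     Smith      Sales    Associate  35000
103  Michael   Johnson    Finance      Analyst  45000
104    Emily  Williams         HR  Coordinator  40000
Here, skiprows=1 skips the first line of the file, which is the original header. The next line (101,John,Doe,...) is therefore inferred to be the header, so its values become the column names and only three data rows remain. We also set the first column as the index using index_col=0, which is why 101 appears as the index name. To keep the original header and skip a data row instead, pass a list of line numbers, such as skiprows=[1].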
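To make the effect of skipping the header line concrete, the sketch below compares skiprows=1 with skiprows=[1]. It assumes the same content as sample_data.csv, supplied through io.StringIO so the snippet runs without a file on disk. The variable names are illustrative only.

```python
import io
import pandas as pd

# Same content as sample_data.csv, held in memory for this sketch (assumption).
csv_text = """Employee ID,First Name,Last Name,Department,Position,Salary
101,John,Doe,Marketing,Manager,50000
102,Jane,Smith,Sales,Associate,35000
103,Michael,Johnson,Finance,Analyst,45000
104,Emily,Williams,HR,Coordinator,40000
"""

# skiprows=1 drops the first line (the header), so the 101 row becomes the header
skipped_header = pd.read_csv(io.StringIO(csv_text), skiprows=1, index_col=0)

# skiprows=[1] drops only line number 1 (the 101 row) and keeps the real header
kept_header = pd.read_csv(io.StringIO(csv_text), skiprows=[1], index_col=0)

print(skipped_header.columns.tolist())
print(kept_header.columns.tolist())
```

An integer counts lines from the top of the file, while a list names specific zero-based line numbers, so the second form is usually the safer choice when the header must survive.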
Example 3: Reading Selected Columns with Data Types
For this example, let's use the same file, sample_data.csv.
import pandas as pd
# read specific columns and set their data types
df = pd.read_csv('sample_data.csv', usecols=['First Name', 'Salary'], dtype={'First Name': str, 'Salary': float})
print(df)
Output
  First Name   Salary
0       John  50000.0
1       Jane  35000.0
2    Michael  45000.0
3      Emily  40000.0
This example reads only the First Name and Salary columns from the file and sets the data type for each column.
Note: When working with large CSV files, consider the chunksize parameter, which reads the file in chunks of a given number of rows, or iterator=True, which returns a reader object for reading the file piece by piece.
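As a minimal sketch of chunked reading, the snippet below sums the Salary column two rows at a time. The chunk size of 2 is an arbitrary illustrative value, and the data is the same sample content held in memory rather than a genuinely large file.

```python
import io
import pandas as pd

# Same content as sample_data.csv, held in memory for this sketch (assumption).
csv_text = """Employee ID,First Name,Last Name,Department,Position,Salary
101,John,Doe,Marketing,Manager,50000
102,Jane,Smith,Sales,Associate,35000
103,Michael,Johnson,Finance,Analyst,45000
104,Emily,Williams,HR,Coordinator,40000
"""

total_salary = 0
chunk_sizes = []

# With chunksize set, read_csv() yields DataFrames of at most that many rows
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    chunk_sizes.append(len(chunk))
    total_salary += chunk["Salary"].sum()

print(chunk_sizes)
print(total_salary)
```

Only one chunk is held in memory at a time, which is the point of this approach for files too large to load whole.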
Example 4: Specifying Delimiter and Column Names
For this example, let's suppose that sample_data.csv has the following content:
Employee ID;First Name;Last Name;Department;Position;Salary
101;John;Doe;Marketing;Manager;50000
102;Jane;Smith;Sales;Associate;35000
103;Michael;Johnson;Finance;Analyst;45000
104;Emily;Williams;HR;Coordinator;40000
Notice the use of ; as the delimiter. Now, let's read this semicolon-delimited file.
import pandas as pd
# specify a delimiter and column names
df = pd.read_csv('sample_data.csv', delimiter=';', names=['ID', 'Name', 'Surname', 'Dept', 'Position', 'Salary'], header=0)
print(df)
Output
    ID     Name   Surname       Dept     Position  Salary
0  101     John       Doe  Marketing      Manager   50000
1  102     Jane     Smith      Sales    Associate   35000
2  103  Michael   Johnson    Finance      Analyst   45000
3  104    Emily  Williams         HR  Coordinator   40000
In this example, we specified the delimiter to be ;. We also specified the column names manually using the names argument.
Here, the header=0 argument tells read_csv() that row 0 of the file is a header row, so it is replaced by the supplied names instead of being read as data.
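To see why header=0 matters when names is supplied, the sketch below reads the same text with and without it. The two-column CSV is a shortened, hypothetical stand-in for the semicolon-delimited file above.

```python
import io
import pandas as pd

# Shortened, hypothetical version of the semicolon-delimited file (assumption).
csv_text = """Employee ID;First Name
101;John
102;Jane
"""
names = ["ID", "Name"]

# header=0: the file's own header row is consumed and replaced by `names`
with_header = pd.read_csv(io.StringIO(csv_text), sep=";", names=names, header=0)

# no header argument: with `names` given, the file's header row is kept as data
without_header = pd.read_csv(io.StringIO(csv_text), sep=";", names=names)

print(with_header)
print(without_header)
```

In the second DataFrame the original header text shows up as the first data row, which also forces the ID column to be read as strings.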