The to_datetime() function in Pandas converts dates stored in various formats, such as strings or epoch numbers, into a standardized datetime format.
Example
import pandas as pd
# create a Series with date strings in 'YYYYMMDD' format
date_series = pd.Series(['20200101', '20200201', '20200301'])
# convert string dates to datetime objects using pd.to_datetime
converted_dates = pd.to_datetime(date_series, format='%Y%m%d')
print(converted_dates)
'''
Output
0 2020-01-01
1 2020-02-01
2 2020-03-01
dtype: datetime64[ns]
'''
to_datetime() Syntax
The syntax of the to_datetime() function in Pandas is:
pd.to_datetime(arg, errors='raise', dayfirst=False, yearfirst=False, utc=None, format=None, exact=True, unit=None)
to_datetime() Arguments
The to_datetime() function takes the following arguments:
arg - an object to convert to a datetime
errors (optional) - specifies how to handle errors for unparsable dates
dayfirst (optional) - if True, parses dates with the day first
yearfirst (optional) - if True, parses dates with the year first
utc (optional) - if True, returns a UTC DatetimeIndex
format (optional) - string format to parse the date
unit (optional) - the unit of the arg for epoch times
to_datetime() Return Value
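The to_datetime() function returns a datetime object whose type depends on the input: a Timestamp for a single value, a DatetimeIndex for a list-like input, and a Series of datetime64 values for a Series.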
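To see how the returned object changes with the input, here is a minimal sketch. The variable names are illustrative only.

```python
import pandas as pd

# a single string returns a pandas Timestamp
single = pd.to_datetime('2020-01-01')

# a list returns a DatetimeIndex
from_list = pd.to_datetime(['2020-01-01', '2020-02-01'])

# a Series returns a Series of datetime64 values
from_series = pd.to_datetime(pd.Series(['2020-01-01', '2020-02-01']))

print(type(single).__name__)
print(type(from_list).__name__)
print(type(from_series).__name__)
```

Knowing which type comes back matters later on: for example, a Series exposes date parts through the .dt accessor, while a Timestamp exposes them directly as attributes.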
Example 1: Convert String Dates to Datetime Objects
import pandas as pd
# create a Series with date strings in 'YYYYMMDD' format
date_series = pd.Series(['20201010', '20201111', '20201212'])
# convert string dates to datetime objects using pd.to_datetime
converted_dates = pd.to_datetime(date_series)
print(converted_dates)
Output
0 2020-10-10
1 2020-11-11
2 2020-12-12
dtype: datetime64[ns]
In the above example, we have used the pd.to_datetime() function to convert string dates into datetime objects. Since no format is given, Pandas infers it from the strings.
The resulting datetime objects are then printed, showing the converted dates.
Example 2: Handle Date Parsing Errors with to_datetime()
import pandas as pd
# create a Series with some valid and some invalid date strings
date_series = pd.Series(['20200101', 'invalid date', '20200301', 'another invalid'])
# use 'coerce' in the errors argument to handle invalid dates
converted_dates = pd.to_datetime(date_series, format='%Y%m%d', errors='coerce')
print(converted_dates)
Output
0 2020-01-01
1 NaT
2 2020-03-01
3 NaT
dtype: datetime64[ns]
In this example, date_series includes both valid dates in the YYYYMMDD format and strings that are not valid dates.
Then we used pd.to_datetime() with errors='coerce'. This ensures that instead of raising an error for the invalid dates, Pandas converts them to NaT (Not a Time).
The result is a Series where valid dates are correctly parsed, and invalid dates are represented as NaT.
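Once invalid entries have become NaT, they can be counted or removed with the usual missing-value tools. The following is a minimal sketch, assuming we simply want to count the failures and keep only the valid dates. The names invalid_count and valid_dates are illustrative.

```python
import pandas as pd

date_series = pd.Series(['20200101', 'invalid date', '20200301'])
converted = pd.to_datetime(date_series, format='%Y%m%d', errors='coerce')

# isna() marks the entries that could not be parsed
invalid_count = converted.isna().sum()

# dropna() keeps only the successfully parsed dates
valid_dates = converted.dropna()

print(invalid_count)
print(valid_dates)
```

Counting the NaT values first is a cheap way to notice when a wrong format string has silently turned most of a column into missing values.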
Example 3: Use of dayfirst and yearfirst in to_datetime()
import pandas as pd
# create a Series with ambiguous date strings
date_series = pd.Series(['01-02-2020', '03-04-2021', '05-06-2022'])
# parse dates with dayfirst=True
dates_dayfirst = pd.to_datetime(date_series, dayfirst=True)
# parse dates with yearfirst=True
dates_yearfirst = pd.to_datetime(date_series, yearfirst=True)
print("Dates with dayfirst=True:\n", dates_dayfirst)
print("\nDates with yearfirst=True:\n", dates_yearfirst)
Output
Dates with dayfirst=True:
0 2020-02-01
1 2021-04-03
2 2022-06-05
dtype: datetime64[ns]

Dates with yearfirst=True:
0 2020-01-02
1 2021-03-04
2 2022-05-06
dtype: datetime64[ns]
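Here, we parsed the same strings twice: once with dayfirst=True, which tells Pandas to interpret the first number as the day, and once with yearfirst=True, which tells Pandas to prefer reading the first number as the year.
Because these strings end with a four-digit year, yearfirst=True has no effect on them, so they are parsed month-first, as they would be by default.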
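The effect of dayfirst is easiest to see on a single ambiguous string. This is a minimal sketch comparing the default behavior with dayfirst=True. The variable names are illustrative.

```python
import pandas as pd

ambiguous = '01-02-2020'

# default: the first number is read as the month
month_first = pd.to_datetime(ambiguous)

# dayfirst=True: the first number is read as the day
day_first = pd.to_datetime(ambiguous, dayfirst=True)

print(month_first.date())
print(day_first.date())
```

When the layout of the input is known in advance, passing an explicit format is usually the safer choice, since it removes the ambiguity entirely.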
Example 4: Convert datetime to UTC (Coordinated Universal Time)
import pandas as pd
# create a Series with date strings
date_series = pd.Series(['2021-01-01 12:00:00', '2021-06-01 15:30:00', '2021-12-31 23:59:59'])
# convert string dates to UTC datetime objects using pd.to_datetime
converted_dates_utc = pd.to_datetime(date_series, utc=True)
print(converted_dates_utc)
Output
0 2021-01-01 12:00:00+00:00
1 2021-06-01 15:30:00+00:00
2 2021-12-31 23:59:59+00:00
dtype: datetime64[ns, UTC]
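In this example, we used the pd.to_datetime() function to convert these string dates into datetime objects.
The utc parameter is set to True to return the dates in the UTC timezone. Since the input strings carry no timezone information, they are treated as UTC times, which is why each value gains a +00:00 offset while the clock time stays the same.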
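When the input strings do carry a UTC offset, utc=True converts each value to UTC rather than just labeling it. Here is a minimal sketch, assuming two made-up timestamps with different offsets.

```python
import pandas as pd

# the same wall-clock time in two different UTC offsets (illustrative values)
offset_strings = ['2021-01-01 12:00:00+02:00', '2021-01-01 12:00:00-05:00']

# utc=True shifts every value to UTC
converted = pd.to_datetime(offset_strings, utc=True)

print(converted)
```

The first value moves back two hours to 10:00 UTC and the second moves forward five hours to 17:00 UTC, so both end up on a common timeline and can be compared directly.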
Example 5: Use of unit Argument in to_datetime()
The unit argument of the to_datetime() function specifies the time unit for epoch time conversions.
The common units include D (days), s (seconds), ms (milliseconds), us (microseconds), and ns (nanoseconds).
Let's look at an example.
import pandas as pd
# create a Series with epoch times (in seconds)
epoch_series = pd.Series([1609459200, 1612137600, 1614556800])
# convert the epoch times to datetime objects
# the 'unit' argument is set to 's' for seconds
converted_dates = pd.to_datetime(epoch_series, unit='s')
print(converted_dates)
Output
0 2021-01-01
1 2021-02-01
2 2021-03-01
dtype: datetime64[ns]
Here, epoch_series is the Series containing epoch times. These are Unix timestamps representing the number of seconds since January 1, 1970.
pd.to_datetime() is used with the unit argument set to s to indicate that the input numbers are in seconds.
The function converts these epoch times to standard datetime objects, which are then printed.
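The same moment can be expressed in different units, and unit must match the scale of the numbers. The sketch below uses the first timestamp from the example above (1609459200 seconds) rewritten in milliseconds and in days. The variable names are illustrative.

```python
import pandas as pd

# the same instant expressed in three different units
from_seconds = pd.to_datetime(1609459200, unit='s')
from_millis = pd.to_datetime(1609459200000, unit='ms')
from_days = pd.to_datetime(18628, unit='D')

print(from_seconds)
print(from_millis)
print(from_days)
```

All three calls give 2021-01-01. Passing the wrong unit does not necessarily raise an error; it can simply yield a very different date, so it is worth checking a known value after conversion.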
Example 6: to_datetime() With Custom Format
import pandas as pd
# create a dataframe with date strings in custom format
df = pd.DataFrame({'date': ['2021/22/01', '2022/13/01', '2023/30/03']})
# convert the 'date' column to datetime with custom format
df['date'] = pd.to_datetime(df['date'], format='%Y/%d/%m')
print(df)
Output
        date
0 2021-01-22
1 2022-01-13
2 2023-03-30
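In this example, we converted the date column from strings (in YYYY/DD/MM format) to the datetime64 data type. The format string '%Y/%d/%m' tells Pandas that each value holds a four-digit year, then the day, then the month, separated by slashes.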
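Format strings can also describe a time of day, and the same directives work in reverse when turning a datetime back into text. This is a minimal sketch, assuming a made-up day-first string with hours and minutes.

```python
import pandas as pd

# parse a day-first date with a time component (illustrative value)
stamp = pd.to_datetime('22/01/2021 14:30', format='%d/%m/%Y %H:%M')

# strftime() formats the Timestamp back into a string
as_text = stamp.strftime('%Y/%d/%m')

print(stamp)
print(as_text)
```

Here %H and %M stand for the hour (24-hour clock) and the minute. The second print shows the date written back in the same YYYY/DD/MM layout used by the example above.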