python-course.eu

37. Time Series in Pandas and Python

By Bernd Klein. Last modified: 26 Apr 2023.

Introduction

The next chapter of our Pandas tutorial deals with time series. A time series is a series of data points listed (or indexed) in time order. Usually, a time series is a sequence of values taken at equally spaced points in time. Any measured data connected with the time of measurement can be seen as a time series. Measurements can be taken irregularly, but in most cases time series have a fixed frequency. This means that data is measured or taken in a regular pattern, for example every 5 milliseconds, every 10 seconds, or every hour. Time series are often plotted as line charts.

In this chapter of our tutorial on Python with Pandas, we will introduce the tools from Pandas dealing with time series. You will learn how to cope with large time series and how to modify time series.

Before you continue reading, it might be useful to go through our tutorial on the standard Python modules dealing with time processing, i.e. datetime, time and calendar.

Time Series in Pandas and Python

We can define a Pandas Series whose index consists of time stamps:

import numpy as np
import pandas as pd

from datetime import datetime, timedelta as delta
ndays = 10
start = datetime(2017, 3, 31)
dates = [start - delta(days=x) for x in range(0, ndays)]

values = [25, 50, 15, 67, 70, 9, 28, 30, 32, 12]

ts = pd.Series(values, index=dates)
ts

OUTPUT:

2017-03-31    25
2017-03-30    50
2017-03-29    15
2017-03-28    67
2017-03-27    70
2017-03-26     9
2017-03-25    28
2017-03-24    30
2017-03-23    32
2017-03-22    12
dtype: int64

Let's check the type of the newly created time series:

type(ts)

OUTPUT:

pandas.core.series.Series

What does the index of a time series look like? Let's see:

ts.index

OUTPUT:

DatetimeIndex(['2017-03-31', '2017-03-30', '2017-03-29', '2017-03-28',
               '2017-03-27', '2017-03-26', '2017-03-25', '2017-03-24',
               '2017-03-23', '2017-03-22'],
              dtype='datetime64[ns]', freq=None)
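
Because the index is a DatetimeIndex, the series can be sorted chronologically and queried by date. The following is a minimal sketch which rebuilds the series from above; the names ts_sorted, value_on_28th and last_days are only chosen for this illustration:

```python
from datetime import datetime, timedelta

import pandas as pd

# Rebuild the series from above: ten days counting backwards from 2017-03-31.
start = datetime(2017, 3, 31)
dates = [start - timedelta(days=x) for x in range(10)]
values = [25, 50, 15, 67, 70, 9, 28, 30, 32, 12]
ts = pd.Series(values, index=dates)

# The index was built in descending order; sort_index() returns a
# chronologically ordered copy.
ts_sorted = ts.sort_index()

# A single value can be looked up with a timestamp ...
value_on_28th = ts_sorted.loc[pd.Timestamp("2017-03-28")]

# ... and date strings can be used to slice the sorted series.
# Both ends of the slice are included.
last_days = ts_sorted.loc["2017-03-29":"2017-03-31"]

print(value_on_28th)
print(last_days)
```

Sorting first is a deliberate choice here: slicing with date strings is most predictable on a chronologically ordered index.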

We will now create another time series:

values2 = [32, 54, 18, 61, 72, 19, 21, 33, 29, 17]

ts2 = pd.Series(values2, index=dates)

It is possible to use arithmetic operations on time series like we did with other series. We can for example add the two previously created time series:

ts + ts2

OUTPUT:

2017-03-31     57
2017-03-30    104
2017-03-29     33
2017-03-28    128
2017-03-27    142
2017-03-26     28
2017-03-25     49
2017-03-24     63
2017-03-23     61
2017-03-22     29
dtype: int64

We can also calculate the arithmetic mean of the values of both series:

(ts + ts2) / 2

OUTPUT:

2017-03-31    28.5
2017-03-30    52.0
2017-03-29    16.5
2017-03-28    64.0
2017-03-27    71.0
2017-03-26    14.0
2017-03-25    24.5
2017-03-24    31.5
2017-03-23    30.5
2017-03-22    14.5
dtype: float64

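As with other series, the indices don't have to be the same. The values are aligned by their index, and dates which occur in only one of the two series yield NaN in the result: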

import pandas as pd

from datetime import datetime, timedelta as delta

ndays = 10

start = datetime(2017, 3, 31)
dates = [start - delta(days=x) for x in range(0, ndays)]

start2 = datetime(2017, 3, 26)
dates2 = [start2 - delta(days=x) for x in range(0, ndays)]

values = [25, 50, 15, 67, 70, 9, 28, 30, 32, 12]
values2 = [32, 54, 18, 61, 72, 19, 21, 33, 29, 17]

ts = pd.Series(values, index=dates)
ts2 = pd.Series(values2, index=dates2)

ts + ts2

OUTPUT:

2017-03-17     NaN
2017-03-18     NaN
2017-03-19     NaN
2017-03-20     NaN
2017-03-21     NaN
2017-03-22    84.0
2017-03-23    93.0
2017-03-24    48.0
2017-03-25    82.0
2017-03-26    41.0
2017-03-27     NaN
2017-03-28     NaN
2017-03-29     NaN
2017-03-30     NaN
2017-03-31     NaN
dtype: float64
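
If the NaN values are not wanted, there are at least two ways to deal with them. The following sketch uses the same two series; the names total_filled and total_common are only chosen for this illustration. Which option is appropriate depends on what a missing measurement means in your data:

```python
from datetime import datetime, timedelta

import pandas as pd

ndays = 10
dates = [datetime(2017, 3, 31) - timedelta(days=x) for x in range(ndays)]
dates2 = [datetime(2017, 3, 26) - timedelta(days=x) for x in range(ndays)]

ts = pd.Series([25, 50, 15, 67, 70, 9, 28, 30, 32, 12], index=dates)
ts2 = pd.Series([32, 54, 18, 61, 72, 19, 21, 33, 29, 17], index=dates2)

# Option 1: treat a missing value as 0, so that every date keeps a number.
total_filled = ts.add(ts2, fill_value=0)

# Option 2: keep only the dates for which both series have a value.
total_common = (ts + ts2).dropna()

print(total_filled)
print(total_common)
```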

Create Date Ranges

The date_range function of the pandas module can be used to generate a DatetimeIndex:

import pandas as pd

index = pd.date_range('12/24/1970', '01/03/1971')
index

OUTPUT:

DatetimeIndex(['1970-12-24', '1970-12-25', '1970-12-26', '1970-12-27',
               '1970-12-28', '1970-12-29', '1970-12-30', '1970-12-31',
               '1971-01-01', '1971-01-02', '1971-01-03'],
              dtype='datetime64[ns]', freq='D')

In our previous example, we passed a start and an end date to date_range. It is also possible to pass only a start date or only an end date to the function. In this case, we have to set the number of periods to generate with the keyword parameter 'periods':

index = pd.date_range(start='12/24/1970', periods=4)
print(index)

OUTPUT:

DatetimeIndex(['1970-12-24', '1970-12-25', '1970-12-26', '1970-12-27'], dtype='datetime64[ns]', freq='D')

index = pd.date_range(end='12/24/1970', periods=3)
print(index)

OUTPUT:

DatetimeIndex(['1970-12-22', '1970-12-23', '1970-12-24'], dtype='datetime64[ns]', freq='D')
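
With an end date and a number of periods, date_range can replace the hand-written list of dates from the beginning of this chapter. This is only a sketch; the name manual_dates is chosen for this illustration, and the values are listed oldest first, because date_range produces ascending dates:

```python
from datetime import datetime, timedelta

import pandas as pd

# Hand-built dates as used earlier: ten days counting back from 2017-03-31.
start = datetime(2017, 3, 31)
manual_dates = [start - timedelta(days=x) for x in range(10)]

# The same ten days from date_range, given an end date and a number of periods.
index = pd.date_range(end="2017-03-31", periods=10)

# date_range yields ascending dates, so the values are listed oldest first.
values = [12, 32, 30, 28, 9, 70, 67, 15, 50, 25]
ts = pd.Series(values, index=index)

print(index[0], index[-1])
print(ts)
```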

We can also create date ranges which consist only of business days, for example, by setting the keyword parameter 'freq' to the string 'B':

index = pd.date_range('2017-04-07', '2017-04-13', freq="B")
print(index)

OUTPUT:

DatetimeIndex(['2017-04-07', '2017-04-10', '2017-04-11', '2017-04-12',
               '2017-04-13'],
              dtype='datetime64[ns]', freq='B')
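
To see which days were left out, we can compare the business days with all calendar days of the same interval. The names all_days, business_days and weekend in this sketch are only chosen for this illustration. Note that 'B' skips Saturdays and Sundays only; public holidays are not taken into account:

```python
import pandas as pd

# All calendar days and only the business days in the same interval.
all_days = pd.date_range("2017-04-07", "2017-04-13", freq="D")
business_days = pd.date_range("2017-04-07", "2017-04-13", freq="B")

# dayofweek numbers Monday as 0 and Sunday as 6, so 5 and 6 mark the weekend.
weekend = all_days[all_days.dayofweek >= 5]

print(len(all_days), len(business_days))
print(weekend)
```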

In the following example, we create a DatetimeIndex which contains the month ends between two dates. We can see that February 2016 ended on the 29th, because 2016 was a leap year:

index = pd.date_range('2016-02-25', '2016-07-02', freq="M")
index

OUTPUT:

DatetimeIndex(['2016-02-29', '2016-03-31', '2016-04-30', '2016-05-31',
               '2016-06-30'],
              dtype='datetime64[ns]', freq='M')
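
The month end rule can be checked with a small sketch. It passes the offset object pd.offsets.MonthEnd() as 'freq' instead of a string alias, because the spelling of the string alias differs between pandas versions. The names month_ends, feb_2016 and feb_2017 are only chosen for this illustration:

```python
import pandas as pd

# pd.offsets.MonthEnd() is the offset object for the month end frequency.
month_ends = pd.date_range("2016-02-25", "2016-07-02", freq=pd.offsets.MonthEnd())

# Every generated date is the last day of its month.
print(month_ends.is_month_end.all())

# Compare February in a leap year with February in a common year.
feb_2016 = pd.date_range("2016-02-01", "2016-03-01", freq=pd.offsets.MonthEnd())
feb_2017 = pd.date_range("2017-02-01", "2017-03-01", freq=pd.offsets.MonthEnd())
print(feb_2016[0].day, feb_2017[0].day)
```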

Other aliases:

Alias   Description
B       business day frequency
C       custom business day frequency (experimental)
D       calendar day frequency
W       weekly frequency
M       month end frequency
BM      business month end frequency
MS      month start frequency
BMS     business month start frequency
Q       quarter end frequency
BQ      business quarter end frequency
QS      quarter start frequency
BQS     business quarter start frequency
A       year end frequency
BA      business year end frequency
AS      year start frequency
BAS     business year start frequency
H       hourly frequency
T       minutely frequency
S       secondly frequency
L       milliseconds
U       microseconds

Note: pandas 2.2 and later deprecate some of these spellings in favour of new ones, for example 'ME' instead of 'M', 'QE' instead of 'Q', 'YE' instead of 'A', 'h' instead of 'H' and 'min' instead of 'T'.
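
A weekly frequency can be anchored to a certain day of the week. In the following example, 'W-Mon' generates all the Mondays between two dates:

index = pd.date_range('2017-02-05', '2017-04-13', freq="W-Mon")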
index

OUTPUT:

DatetimeIndex(['2017-02-06', '2017-02-13', '2017-02-20', '2017-02-27',
               '2017-03-06', '2017-03-13', '2017-03-20', '2017-03-27',
               '2017-04-03', '2017-04-10'],
              dtype='datetime64[ns]', freq='W-MON')
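
The anchor can be any day of the week. The following sketch checks the weekday of the generated dates with day_name() and builds a second index anchored to Fridays; the names mondays and fridays are only chosen for this illustration:

```python
import pandas as pd

# Weekly frequencies anchored to Monday and to Friday over the same interval.
mondays = pd.date_range("2017-02-05", "2017-04-13", freq="W-MON")
fridays = pd.date_range("2017-02-05", "2017-04-13", freq="W-FRI")

# day_name() returns the weekday name of every date in the index.
print(set(mondays.day_name()))
print(fridays[0], fridays[-1])
```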
