Numerical & Scientific Computing with Python: Tutorial on Time Series

Python, Pandas and Time Series

Introduction


This chapter of our Pandas tutorial deals with time series. A time series is a series of data points listed (or indexed) in time order. Usually, the values of a time series are recorded at equally spaced points in time. Any measured data that is associated with the time of measurement can be seen as a time series. Measurements can be taken irregularly, but in most cases time series have a fixed frequency. This means that data is measured in a regular pattern, for example every 5 milliseconds, every 10 seconds, or every hour. Time series are often plotted as line charts.

In this chapter of our tutorial on Python with Pandas, we will introduce the tools Pandas provides for working with time series. You will learn how to cope with large time series and how to modify time series.

Before you continue reading, it might be useful to go through our tutorial on the standard Python modules dealing with time processing, i.e. datetime, time and calendar.

Time Series in Pandas and Python

We can define a Pandas Series whose index consists of time stamps:

import numpy as np
import pandas as pd
from datetime import datetime, timedelta as delta
ndays = 10
start = datetime(2017, 3, 31)
dates = [start - delta(days=x) for x in range(0, ndays)]
values = [25, 50, 15, 67, 70, 9, 28, 30, 32, 12]
ts = pd.Series(values, index=dates)
ts
We received the following output:
2017-03-31    25
2017-03-30    50
2017-03-29    15
2017-03-28    67
2017-03-27    70
2017-03-26     9
2017-03-25    28
2017-03-24    30
2017-03-23    32
2017-03-22    12
dtype: int64
type(ts)
The previous Python code returned the following output:
pandas.core.series.Series
ts.index
This gets us the following output:
DatetimeIndex(['2017-03-31', '2017-03-30', '2017-03-29', '2017-03-28',
               '2017-03-27', '2017-03-26', '2017-03-25', '2017-03-24',
               '2017-03-23', '2017-03-22'],
              dtype='datetime64[ns]', freq=None)
values2 = [32, 54, 18, 61, 72, 19, 21, 33, 29, 17]
ts2 = pd.Series(values2, index=dates)
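
Because the index of ts consists of time stamps, we can also use dates to look up values. The following is a minimal sketch, not part of the original example: it rebuilds ts, sorts it so that the index runs from the oldest to the newest date, and then selects a single day and a range of days. The names ts_sorted, single and window are our own choices for illustration.

```python
from datetime import datetime, timedelta

import pandas as pd

# Rebuild the series from the example above (newest date first).
start = datetime(2017, 3, 31)
dates = [start - timedelta(days=x) for x in range(10)]
values = [25, 50, 15, 67, 70, 9, 28, 30, 32, 12]
ts = pd.Series(values, index=dates)

# Sort by time stamp so the index runs from oldest to newest.
ts_sorted = ts.sort_index()

# Look up a single day with a datetime object.
single = ts_sorted.loc[datetime(2017, 3, 28)]

# Select a range of days with date strings; both endpoints are included.
window = ts_sorted.loc["2017-03-25":"2017-03-27"]

print(single)
print(window)
```

Sorting first is a deliberate choice here: slicing by date is easiest to reason about when the index is in ascending order.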

It is possible to use arithmetic operations on time series, just as we did with other series. We can, for example, add our two time series:

ts + ts2
The previous Python code returned the following:
2017-03-31     57
2017-03-30    104
2017-03-29     33
2017-03-28    128
2017-03-27    142
2017-03-26     28
2017-03-25     49
2017-03-24     63
2017-03-23     61
2017-03-22     29
dtype: int64

The arithmetic mean of both series, calculated value by value for each date:

(ts + ts2) / 2
The previous code returned the following output:
2017-03-31    28.5
2017-03-30    52.0
2017-03-29    16.5
2017-03-28    64.0
2017-03-27    71.0
2017-03-26    14.0
2017-03-25    24.5
2017-03-24    31.5
2017-03-23    30.5
2017-03-22    14.5
dtype: float64
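
The same mean can also be obtained by putting both series side by side in a DataFrame and averaging each row. This is only a sketch of an alternative approach; the column names "ts" and "ts2" and the variable row_mean are chosen for illustration.

```python
from datetime import datetime, timedelta

import pandas as pd

# Rebuild both series from the example above; they share the same dates.
start = datetime(2017, 3, 31)
dates = [start - timedelta(days=x) for x in range(10)]
ts = pd.Series([25, 50, 15, 67, 70, 9, 28, 30, 32, 12], index=dates)
ts2 = pd.Series([32, 54, 18, 61, 72, 19, 21, 33, 29, 17], index=dates)

# One column per series, one row per date.
df = pd.DataFrame({"ts": ts, "ts2": ts2})

# axis=1 averages across the columns, i.e. per date.
row_mean = df.mean(axis=1)
print(row_mean)
```

This form scales more naturally if a third or fourth series has to be included in the mean later on.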

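As with other series, the indices don't have to be the same. The values are aligned by their time stamps, and every date which occurs in only one of the two series yields NaN in the result: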

import pandas as pd
from datetime import datetime, timedelta as delta
ndays = 10
start = datetime(2017, 3, 31)
dates = [start - delta(days=x) for x in range(0, ndays)]
start2 = datetime(2017, 3, 26)
dates2 = [start2 - delta(days=x) for x in range(0, ndays)]
values = [25, 50, 15, 67, 70, 9, 28, 30, 32, 12]
values2 = [32, 54, 18, 61, 72, 19, 21, 33, 29, 17]
ts = pd.Series(values, index=dates)
ts2 = pd.Series(values2, index=dates2)
ts + ts2
The previous Python code returned the following output:
2017-03-17     NaN
2017-03-18     NaN
2017-03-19     NaN
2017-03-20     NaN
2017-03-21     NaN
2017-03-22    84.0
2017-03-23    93.0
2017-03-24    48.0
2017-03-25    82.0
2017-03-26    41.0
2017-03-27     NaN
2017-03-28     NaN
2017-03-29     NaN
2017-03-30     NaN
2017-03-31     NaN
dtype: float64
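
If the NaN values are not wanted, there are two common ways to deal with them. The sketch below, which goes beyond the original example, either keeps only the dates present in both series or treats a missing value as 0 by using the add method with fill_value. The names overlap and filled are our own.

```python
from datetime import datetime, timedelta

import pandas as pd

# Rebuild the two partially overlapping series from the example above.
ndays = 10
dates = [datetime(2017, 3, 31) - timedelta(days=x) for x in range(ndays)]
dates2 = [datetime(2017, 3, 26) - timedelta(days=x) for x in range(ndays)]
ts = pd.Series([25, 50, 15, 67, 70, 9, 28, 30, 32, 12], index=dates)
ts2 = pd.Series([32, 54, 18, 61, 72, 19, 21, 33, 29, 17], index=dates2)

# Option 1: keep only the dates which occur in both series.
overlap = (ts + ts2).dropna()

# Option 2: treat a missing value as 0, so that every date gets a sum.
filled = ts.add(ts2, fill_value=0)

print(overlap)
print(filled)
```

Which option is appropriate depends on the data: filling with 0 makes sense for counts or amounts, but it would be misleading for measurements such as temperatures.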

Create Date Ranges

The date_range function of the pandas module can be used to generate a DatetimeIndex:

import pandas as pd
index = pd.date_range('12/24/1970', '01/03/1971')
index
After having executed the Python code above we received the following output:
DatetimeIndex(['1970-12-24', '1970-12-25', '1970-12-26', '1970-12-27',
               '1970-12-28', '1970-12-29', '1970-12-30', '1970-12-31',
               '1971-01-01', '1971-01-02', '1971-01-03'],
              dtype='datetime64[ns]', freq='D')

We passed a start and an end date to date_range in our previous example. It is also possible to pass only a start date or only an end date to the function. In this case, we have to specify the number of periods to generate by setting the keyword parameter 'periods':

index = pd.date_range(start='12/24/1970', periods=4)
print(index)
DatetimeIndex(['1970-12-24', '1970-12-25', '1970-12-26', '1970-12-27'], dtype='datetime64[ns]', freq='D')
index = pd.date_range(end='12/24/1970', periods=3)
print(index)
DatetimeIndex(['1970-12-22', '1970-12-23', '1970-12-24'], dtype='datetime64[ns]', freq='D')
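
With an end date and a number of periods, date_range can replace the list comprehension we used at the beginning of this chapter. The following sketch assumes that we want the same ten days ending on 2017-03-31, newest date first; the variable names are chosen for illustration.

```python
from datetime import datetime, timedelta

import pandas as pd

# The manual construction used earlier in this chapter.
start = datetime(2017, 3, 31)
manual_dates = [start - timedelta(days=x) for x in range(10)]

# Ten consecutive days ending on 2017-03-31, reversed to get newest first.
index = pd.date_range(end="2017-03-31", periods=10)[::-1]

values = [25, 50, 15, 67, 70, 9, 28, 30, 32, 12]
ts = pd.Series(values, index=index)
print(ts)
```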

We can also create date ranges which consist only of business days, for example by setting the keyword parameter 'freq' to the string 'B':

index = pd.date_range('2017-04-07', '2017-04-13', freq="B")
print(index)
DatetimeIndex(['2017-04-07', '2017-04-10', '2017-04-11', '2017-04-12',
               '2017-04-13'],
              dtype='datetime64[ns]', freq='B')
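
To check that the weekend was really skipped, we can ask the index for the names of the weekdays. The sketch below also uses bdate_range, a pandas function which generates business days by default; the variable names are our own.

```python
import pandas as pd

index = pd.date_range("2017-04-07", "2017-04-13", freq="B")

# day_name() returns the weekday name for every time stamp.
weekdays = list(index.day_name())
print(weekdays)

# bdate_range uses business days as its default frequency.
same = pd.bdate_range("2017-04-07", "2017-04-13")
print(same)
```

The 8th and 9th of April 2017 were a Saturday and a Sunday, which is why they are missing from the result.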

In the following example, we create a DatetimeIndex which contains the month ends between two dates. We can see that the year 2016 contained the 29th of February, because it was a leap year:

index = pd.date_range('2016-02-25', '2016-07-02', freq="M")
index
We received the following output:
DatetimeIndex(['2016-02-29', '2016-03-31', '2016-04-30', '2016-05-31',
               '2016-06-30'],
              dtype='datetime64[ns]', freq='M')
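
In newer pandas releases (2.2 and later, as far as we are aware) the alias 'M' is deprecated in favour of 'ME', so the example above may emit a warning or fail depending on the installed version. A sketch which avoids the alias altogether passes an offset object from pd.offsets instead of a string:

```python
import pandas as pd

# MonthEnd() describes the same frequency as the month end alias,
# without depending on how the alias is spelled in a given pandas version.
index = pd.date_range("2016-02-25", "2016-07-02", freq=pd.offsets.MonthEnd())
print(index)

# The day of the month for each generated date.
days = list(index.day)
print(days)
```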

Other aliases:

Alias   Description
B       business day frequency
C       custom business day frequency (experimental)
D       calendar day frequency
W       weekly frequency
M       month end frequency
BM      business month end frequency
MS      month start frequency
BMS     business month start frequency
Q       quarter end frequency
BQ      business quarter end frequency
QS      quarter start frequency
BQS     business quarter start frequency
A       year end frequency
BA      business year end frequency
AS      year start frequency
BAS     business year start frequency
H       hourly frequency
T       minutely frequency
S       secondly frequency
L       milliseconds
U       microseconds
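
The aliases are used in exactly the same way as 'B' in the examples above. The following sketch tries out three of them, 'MS', 'QS' and 'W'. We picked these three because, to our knowledge, their spelling has not changed between pandas versions; the start date and the variable names are chosen for illustration.

```python
import pandas as pd

start = "2017-01-01"  # this day was a Sunday

# Month starts: the first day of three consecutive months.
month_starts = pd.date_range(start, periods=3, freq="MS")

# Quarter starts: the first day of three consecutive quarters.
quarter_starts = pd.date_range(start, periods=3, freq="QS")

# Weekly frequency: by default the weeks are anchored on Sundays.
weeks = pd.date_range(start, periods=3, freq="W")

for name, idx in [("MS", month_starts), ("QS", quarter_starts), ("W", weeks)]:
    print(name, [d.strftime("%Y-%m-%d") for d in idx])
```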