Pandas Tutorial: Multi-level Indexing

Introduction

In the previous chapter of this tutorial, we learned the basic concepts of Pandas. We introduced the data structures Series and DataFrame.

We also learned how to create and manipulate Series and DataFrame objects in numerous Python programs.

Now it is time to learn some further aspects of these data structures in this chapter of our tutorial.

We will start with advanced indexing possibilities in Pandas.

Advanced or Multi-Level Indexing

Advanced or multi-level indexing is available for both Series and DataFrames. It lets us work with higher-dimensional data using Pandas data structures: data of arbitrarily high dimension can be stored and manipulated efficiently in 1-dimensional (Series) and 2-dimensional tabular (DataFrame) structures. In other words, we can work with higher-dimensional data in lower dimensions. The following example builds a Series whose index has two levels, the city and the attribute being described:

import pandas as pd
cities = ["Vienna", "Vienna", "Vienna",
          "Hamburg", "Hamburg", "Hamburg",
          "Berlin", "Berlin", "Berlin",
          "Zürich", "Zürich", "Zürich"]
index = [cities, ["country", "area", "population",
                  "country", "area", "population",
                  "country", "area", "population",
                  "country", "area", "population"]]
print(index)
[['Vienna', 'Vienna', 'Vienna', 'Hamburg', 'Hamburg', 'Hamburg', 'Berlin', 'Berlin', 'Berlin', 'Zürich', 'Zürich', 'Zürich'], ['country', 'area', 'population', 'country', 'area', 'population', 'country', 'area', 'population', 'country', 'area', 'population']]
data = ["Austria", 414.60,    1805681,
        "Germany", 755.00,    1760433,
        "Germany", 891.85,    3562166,
        "Switzerland", 87.88, 378884]
city_series = pd.Series(data, index=index)
print(city_series)
Vienna   country           Austria
         area                414.6
         population        1805681
Hamburg  country           Germany
         area                  755
         population        1760433
Berlin   country           Germany
         area               891.85
         population        3562166
Zürich   country       Switzerland
         area                87.88
         population         378884
dtype: object
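
The index above is passed as two parallel lists of equal length. As a minimal sketch, the same two-level index could also be built explicitly with pd.MultiIndex.from_product, which pairs every city with every attribute. The level names "city" and "attribute" and the variable name attributes are illustrative assumptions and do not appear in the example above.

```python
import pandas as pd

cities = ["Vienna", "Hamburg", "Berlin", "Zürich"]
attributes = ["country", "area", "population"]

# Pair every city with every attribute, in the order given.
# The level names are assumptions chosen for readability.
index = pd.MultiIndex.from_product([cities, attributes],
                                   names=["city", "attribute"])

data = ["Austria", 414.60, 1805681,
        "Germany", 755.00, 1760433,
        "Germany", 891.85, 3562166,
        "Switzerland", 87.88, 378884]
city_series = pd.Series(data, index=index)

print(city_series.index.nlevels)  # number of index levels
print(city_series.index.names)    # names of the levels
```

Naming the levels is optional, but it makes later operations easier to read because levels can then be referred to by name instead of by position.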

We can access the data of a city in the following way:

print(city_series["Vienna"])
country       Austria
area            414.6
population    1805681
dtype: object

We can also access the information about the country, area or population of a city. We can do this in two ways:

print(city_series["Vienna"]["area"])
414.6

The other way is to pass both keys at once:

print(city_series["Vienna", "area"])
414.6
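
The two spellings give the same value but work differently: the chained form first builds an intermediate Series for Vienna and then indexes into it, while the second form performs a single lookup with both keys. The sketch below shows the same lookups with the label-based .loc accessor, which takes the two keys as a tuple. It uses a reduced data set with only two cities, and the variable names are illustrative.

```python
import pandas as pd

# Reduced data set: two cities, three attributes each.
index = [["Vienna"] * 3 + ["Hamburg"] * 3,
         ["country", "area", "population"] * 2]
data = ["Austria", 414.60, 1805681,
        "Germany", 755.00, 1760433]
city_series = pd.Series(data, index=index)

# Chained lookup: creates an intermediate Series for Vienna first.
area_chained = city_series["Vienna"]["area"]

# Single lookup with both keys passed as one tuple.
area_direct = city_series.loc[("Vienna", "area")]

# Outer key only: returns a Series indexed by the inner level.
vienna = city_series.loc["Vienna"]

print(area_chained == area_direct)
print(vienna)
```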

We can also select all the entries of a city explicitly by combining the city name with a slice for the inner level:

city_series["Hamburg",:]
The previous Python code returned the following:
country       Germany
area              755
population    1760433
dtype: object
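
To select several cities at once, a list of city names can be used as the key. The following is a minimal sketch that assumes the label-based .loc accessor; the variable name selected is illustrative.

```python
import pandas as pd

cities = ["Vienna", "Hamburg", "Berlin", "Zürich"]
index = [[city for city in cities for _ in range(3)],
         ["country", "area", "population"] * len(cities)]
data = ["Austria", 414.60, 1805681,
        "Germany", 755.00, 1760433,
        "Germany", 891.85, 3562166,
        "Switzerland", 87.88, 378884]
city_series = pd.Series(data, index=index)

# A list of outer-level labels selects all rows of those cities.
selected = city_series.loc[["Hamburg", "Berlin"]]
print(selected)
```

Unlike the lookup with a single city name, the result keeps both index levels, because more than one outer label may be present.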

If the index is sorted, we can also apply a slicing operation:

city_series = city_series.sort_index()
print("city_series with sorted index:")
print(city_series)
print("\n\nSlicing the city_series:")
city_series["Berlin":"Vienna"]
city_series with sorted index:
Berlin   area               891.85
         country           Germany
         population        3562166
Hamburg  area                  755
         country           Germany
         population        1760433
Vienna   area                414.6
         country           Austria
         population        1805681
Zürich   area                87.88
         country       Switzerland
         population         378884
dtype: object
Slicing the city_series:
This gets us the following output:
Berlin   area           891.85
         country       Germany
         population    3562166
Hamburg  area              755
         country       Germany
         population    1760433
Vienna   area            414.6
         country       Austria
         population    1805681
dtype: object
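
Note that the slice includes both endpoints: Vienna is part of the result. Whether an index is sorted can be checked before slicing. The sketch below uses the index property is_monotonic_increasing for this check and then slices with .loc; the variable names sorted_series and subset are illustrative.

```python
import pandas as pd

cities = ["Vienna", "Hamburg", "Berlin", "Zürich"]
index = [[city for city in cities for _ in range(3)],
         ["country", "area", "population"] * len(cities)]
data = ["Austria", 414.60, 1805681,
        "Germany", 755.00, 1760433,
        "Germany", 891.85, 3562166,
        "Switzerland", 87.88, 378884]
city_series = pd.Series(data, index=index)

# The index in insertion order is not sorted.
print(city_series.index.is_monotonic_increasing)

# sort_index() returns a new Series with a lexicographically sorted index.
sorted_series = city_series.sort_index()
print(sorted_series.index.is_monotonic_increasing)

# Label slices are inclusive on both ends.
subset = sorted_series.loc["Berlin":"Vienna"]
print(subset)
```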

In the next example, we show that it is possible to access the inner keys as well:

print(city_series[:, "area"])
Berlin     891.85
Hamburg       755
Vienna      414.6
Zürich      87.88
dtype: object
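
An alternative for selecting by an inner key is the xs method ("cross-section"), which names the level explicitly. This is a minimal sketch; level=1 refers to the second index level by position, and the variable name areas is illustrative.

```python
import pandas as pd

cities = ["Vienna", "Hamburg", "Berlin", "Zürich"]
index = [[city for city in cities for _ in range(3)],
         ["country", "area", "population"] * len(cities)]
data = ["Austria", 414.60, 1805681,
        "Germany", 755.00, 1760433,
        "Germany", 891.85, 3562166,
        "Switzerland", 87.88, 378884]
city_series = pd.Series(data, index=index)

# Select the "area" entries from the inner level (level 1).
# The selected level is dropped, so the result is indexed by city.
areas = city_series.xs("area", level=1)
print(areas)
```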

Swapping MultiIndex Levels

It is possible to swap the levels of a MultiIndex with the method swaplevel:

swaplevel(self, i=-2, j=-1, copy=True)
    Swap levels i and j in a MultiIndex

Parameters
----------
i, j : int, string (can be mixed)
       Level of index to be swapped. Can pass level name as string.
       The indexes 'i' and 'j' are optional, and default to
       the two innermost levels of the index

Returns
-------
swapped : Series

city_series = city_series.swaplevel()
city_series.sort_index(inplace=True)
city_series
This gets us the following output:
area        Berlin          891.85
            Hamburg            755
            Vienna           414.6
            Zürich           87.88
country     Berlin         Germany
            Hamburg        Germany
            Vienna         Austria
            Zürich     Switzerland
population  Berlin         3562166
            Hamburg        1760433
            Vienna         1805681
            Zürich          378884
dtype: object
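
As the documentation states, levels can also be given by name instead of by position. The following sketch assumes an index whose levels are named "city" and "attribute"; these names are illustrative and are not part of the example above.

```python
import pandas as pd

cities = ["Vienna", "Hamburg", "Berlin", "Zürich"]
attributes = ["country", "area", "population"]

# Build the two-level index with named levels (names are assumptions).
index = pd.MultiIndex.from_product([cities, attributes],
                                   names=["city", "attribute"])
data = ["Austria", 414.60, 1805681,
        "Germany", 755.00, 1760433,
        "Germany", 891.85, 3562166,
        "Switzerland", 87.88, 378884]
city_series = pd.Series(data, index=index)

# Swap the levels by name; swaplevel returns a new Series,
# so city_series itself keeps its original level order.
swapped = city_series.swaplevel("city", "attribute").sort_index()

print(swapped.index.names)
print(swapped["area", "Berlin"])
```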