Numerical & Scientific Computing with Python: Pandas Tutorial Continuation

Pandas Tutorial Continuation

Introduction


We learned the basic concepts of Pandas in the previous chapter of our tutorial, where we introduced the data structures Series and DataFrame.

We also learned how to create and manipulate the Series and DataFrame objects.

In this chapter, we will learn some further aspects of these data structures.

We will start with advanced indexing possibilities in Pandas.

Advanced or Multi-Level Indexing

Advanced or multi-level indexing is available both for Series and for DataFrames. It lets us work with higher-dimensional data using Pandas data structures: data of arbitrarily high dimension can be stored and manipulated efficiently in the one-dimensional Series and the two-dimensional, tabular DataFrame. In other words, we can work with higher-dimensional data in lower dimensions. It's time to present an example:

import pandas as pd
cities = ["Vienna", "Vienna", "Vienna",
          "Hamburg", "Hamburg", "Hamburg",
          "Berlin", "Berlin", "Berlin",
          "Zürich", "Zürich", "Zürich"]
index = [cities, ["country", "area", "population",
                  "country", "area", "population",
                  "country", "area", "population",
                  "country", "area", "population"]]
print(index)
[['Vienna', 'Vienna', 'Vienna', 'Hamburg', 'Hamburg', 'Hamburg', 'Berlin', 'Berlin', 'Berlin', 'Zürich', 'Zürich', 'Zürich'], ['country', 'area', 'population', 'country', 'area', 'population', 'country', 'area', 'population', 'country', 'area', 'population']]
data = ["Austria", 414.60, 1805681,
        "Germany", 755.00, 1760433,
        "Germany", 891.85, 3562166,
        "Switzerland", 87.88, 378884]
city_series = pd.Series(data, index=index)
print(city_series)
Vienna   country           Austria
         area                414.6
         population        1805681
Hamburg  country           Germany
         area                  755
         population        1760433
Berlin   country           Germany
         area               891.85
         population        3562166
Zürich   country       Switzerland
         area                87.88
         population         378884
dtype: object
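Passing a list of two lists as index makes pandas build a MultiIndex from them. As a sketch (assuming a reasonably recent pandas version), the same index can be constructed explicitly: pd.MultiIndex.from_product pairs every city with every attribute, while pd.MultiIndex.from_arrays takes the two parallel lists exactly as they were written above. The names cities, attributes and nested are only illustrative.

```python
import pandas as pd

cities = ["Vienna", "Hamburg", "Berlin", "Zürich"]
attributes = ["country", "area", "population"]

# Every (city, attribute) pair; the city varies slowest, as in the example above
index = pd.MultiIndex.from_product([cities, attributes])

data = ["Austria", 414.60, 1805681,
        "Germany", 755.00, 1760433,
        "Germany", 891.85, 3562166,
        "Switzerland", 87.88, 378884]
city_series = pd.Series(data, index=index)

# The nested-list form used above, converted explicitly
nested = [[city for city in cities for _ in attributes],
          attributes * len(cities)]
same_index = pd.MultiIndex.from_arrays(nested)

print(city_series.index.nlevels)  # 2
print(index.equals(same_index))   # True
```

from_product is the more compact choice whenever every outer key has the same set of inner keys, as is the case here.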

We can access the data of a city in the following way:

print(city_series["Vienna"])
country       Austria
area            414.6
population    1805681
dtype: object

We can also access the information about the country, area or population of a city. We can do this in two ways:

print(city_series["Vienna"]["area"])
414.6

Alternatively, we can pass both keys at once, separated by a comma:

print(city_series["Vienna", "area"])
414.6
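Both forms can also be written with the label-based indexer .loc. The following minimal sketch rebuilds city_series so that it runs on its own. Passing both keys in one .loc call performs a single lookup, whereas the chained form city_series["Vienna"]["area"] first creates an intermediate Series for Vienna.

```python
import pandas as pd

cities = ["Vienna", "Hamburg", "Berlin", "Zürich"]
attributes = ["country", "area", "population"]
data = ["Austria", 414.60, 1805681,
        "Germany", 755.00, 1760433,
        "Germany", 891.85, 3562166,
        "Switzerland", 87.88, 378884]
city_series = pd.Series(data, index=pd.MultiIndex.from_product([cities, attributes]))

# Outer key only: a Series indexed by the inner level
vienna = city_series.loc["Vienna"]

# Both keys in a single lookup: a scalar value
vienna_area = city_series.loc["Vienna", "area"]

print(vienna)
print(vienna_area)  # 414.6
```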

We can also get the content of multiple cities at the same time by using a list of city names as the key:

print(city_series[["Hamburg", "Berlin"]])
Hamburg  country       Germany
         area              755
         population    1760433
Berlin   country       Germany
         area           891.85
         population    3562166
dtype: object

If the index is sorted, we can also apply a slicing operation. Our index is not sorted yet, so we sort it first with sort_index:

city_series = city_series.sort_index()
print("city_series with sorted index:")
print(city_series)
print("\n\nSlicing the city_series:")
print(city_series["Berlin":"Vienna"])
city_series with sorted index:
Berlin   area               891.85
         country           Germany
         population        3562166
Hamburg  area                  755
         country           Germany
         population        1760433
Vienna   area                414.6
         country           Austria
         population        1805681
Zürich   area                87.88
         country       Switzerland
         population         378884
dtype: object


Slicing the city_series:
Berlin   area           891.85
         country       Germany
         population    3562166
Hamburg  area              755
         country       Germany
         population    1760433
Vienna   area            414.6
         country       Austria
         population    1805681
dtype: object
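With a sorted index, a slice on the outer level can also be combined with a key on the inner level. The sketch below uses .loc together with pd.IndexSlice, which merely builds the (slice, key) tuple in a readable way; index.is_monotonic_increasing is used to confirm the sort before slicing. Note that label slices with .loc include both end points.

```python
import pandas as pd

cities = ["Vienna", "Hamburg", "Berlin", "Zürich"]
attributes = ["country", "area", "population"]
data = ["Austria", 414.60, 1805681,
        "Germany", 755.00, 1760433,
        "Germany", 891.85, 3562166,
        "Switzerland", 87.88, 378884]
city_series = pd.Series(data, index=pd.MultiIndex.from_product([cities, attributes]))
city_series = city_series.sort_index()

# Label slicing on a MultiIndex needs a sorted index
print(city_series.index.is_monotonic_increasing)  # True

idx = pd.IndexSlice
# Cities from "Berlin" to "Hamburg" (both included), inner key "area" only
areas = city_series.loc[idx["Berlin":"Hamburg", "area"]]
print(areas)
```

Unlike city_series[:, "area"], this selection keeps both index levels in the result.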

In the next example, we show that it is possible to access the inner keys as well:

print(city_series[:, "area"])
Berlin     891.85
Hamburg       755
Vienna      414.6
Zürich      87.88
dtype: object
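The same cross-section can be taken with the xs method, whose level parameter names the index level to select from (here 1, the inner level). By default the selected level is dropped from the result; drop_level=False keeps it. A minimal sketch:

```python
import pandas as pd

cities = ["Vienna", "Hamburg", "Berlin", "Zürich"]
attributes = ["country", "area", "population"]
data = ["Austria", 414.60, 1805681,
        "Germany", 755.00, 1760433,
        "Germany", 891.85, 3562166,
        "Switzerland", 87.88, 378884]
city_series = pd.Series(data, index=pd.MultiIndex.from_product([cities, attributes]))
city_series = city_series.sort_index()

# All "area" values, indexed by city only
areas = city_series.xs("area", level=1)

# Same selection, but the inner level is kept in the index
areas_with_level = city_series.xs("area", level=1, drop_level=False)

print(areas)
print(areas_with_level)
```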

Swapping MultiIndex Levels

It is possible to swap the levels of a MultiIndex with the method swaplevel. Its docstring reads:

swaplevel(self, i=-2, j=-1, copy=True)
    Swap levels i and j in a MultiIndex

Parameters
----------
i, j : int, string (can be mixed)
       Level of index to be swapped. Can pass level name as string.
       The indexes 'i' and 'j' are optional, and default to
       the two innermost levels of the index
Returns
-------
swapped : Series


city_series.swaplevel()
This returns a new Series in which the two index levels are exchanged; city_series itself is not modified:
area        Berlin          891.85
country     Berlin         Germany
population  Berlin         3562166
area        Hamburg            755
country     Hamburg        Germany
population  Hamburg        1760433
area        Vienna           414.6
country     Vienna         Austria
population  Vienna         1805681
area        Zürich           87.88
country     Zürich     Switzerland
population  Zürich          378884
dtype: object
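Two details are worth noting: the swapped index is not sorted by its new outer level, and, as the parameter description says, levels can be addressed by name. The sketch below gives the levels the names "city" and "attribute" (our own choice, not part of the example above), swaps them by name and then calls sort_index to group the entries by attribute.

```python
import pandas as pd

cities = ["Vienna", "Hamburg", "Berlin", "Zürich"]
attributes = ["country", "area", "population"]
data = ["Austria", 414.60, 1805681,
        "Germany", 755.00, 1760433,
        "Germany", 891.85, 3562166,
        "Switzerland", 87.88, 378884]
index = pd.MultiIndex.from_product([cities, attributes],
                                   names=["city", "attribute"])
city_series = pd.Series(data, index=index).sort_index()

# Swap the levels by name instead of by position
swapped = city_series.swaplevel("city", "attribute")
print(list(swapped.index.names))  # ['attribute', 'city']

# Sorting regroups the entries under the new outer level
regrouped = swapped.sort_index()
print(regrouped["area"])
```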