python-course.eu

28. Pandas Pivot

By Bernd Klein. Last modified: 26 Apr 2023.

The pivot function is used to reshape a given DataFrame into a different shape. Pivot is has three parameters:

pivot(self, index=None, columns=None, values=None) -> 'DataFrame'

pivot

A call to pivot returns a reshaped DataFrame. The organization of the new DataFrame is done by setting the three parameters accordingly:

The values for these parameters correspond to column names or index of the DataFrame wich is to be reshaped.

We will define now a DataFrame with for columns which we will use to demonstrate the way of working of the pivot method:

import pandas as pd

d = {'A': ['kırmızı', 'yeşil', 'mavi', 'kırmızı', 'yeşil', 'mavi'],
     'B': ['bir', 'iki', 'bir',  'iki',  'bir', 'iki'],
     'C': [345, 325, 898, 989, 23, 143],
     'D': [1, 2, 3, 4, 5, 6]}

df = pd.DataFrame(d)
df
A B C D
0 kırmızı bir 345 1
1 yeşil iki 325 2
2 mavi bir 898 3
3 kırmızı iki 989 4
4 yeşil bir 23 5
5 mavi iki 143 6

We can choose no one of the columns as the new index and another columns as the new column names. We use A as the index and B for the new column names in the following examples. If no argument is given for the third parameter values, pivot will automatically use all the remaining columns for the dat and the result will have hierarchically indexed columns.

df2 = df.pivot(index='A',
               columns='B')

df2
C D
B bir iki bir iki
A
kırmızı 345 989 1 4
mavi 898 143 3 6
yeşil 23 325 5 2

The above call corresponds to the following one, in which we provide a list of value columns:

df2 = df.pivot(index='A',
               columns='B',
               values=['C', 'D'])

In the next call, we just use one column for the values:

df3 = df.pivot(index='A',
               columns='B',
               values='C')

df3
B bir iki
A
kırmızı 345 989
mavi 898 143
yeşil 23 325

Another Example:

d = {"products": ["product1", "product1", "product2", "product3"],
     "colours": ["blue", "green", "blue", "green"],
     "customer_price": [2345.89, 2390.50, 1820.00, 3100.00],
     "non_customer_price": [2445.89, 2495.50, 1980.00, 3400.00]}

df = pd.DataFrame(d)
print(df)

OUTPUT:

   products colours  customer_price  non_customer_price
0  product1    blue         2345.89             2445.89
1  product1   green         2390.50             2495.50
2  product2    blue         1820.00             1980.00
3  product3   green         3100.00             3400.00
df2 = df.pivot(index='products',
              columns='colours',
              values='customer_price')

print(df2)

OUTPUT:

colours      blue   green
products                 
product1  2345.89  2390.5
product2  1820.00     NaN
product3      NaN  3100.0
df3 = df.pivot(index='products', 
               columns='colours', 
               values=['customer_price', 'non_customer_price'])

print(df3)

OUTPUT:

         customer_price         non_customer_price        
colours            blue   green               blue   green
products                                                  
product1        2345.89  2390.5            2445.89  2495.5
product2        1820.00     NaN            1980.00     NaN
product3            NaN  3100.0                NaN  3400.0
help(pd.DataFrame.pivot)

OUTPUT:

Help on function pivot in module pandas.core.frame:

pivot(self, index=None, columns=None, values=None) -> 'DataFrame'
    Return reshaped DataFrame organized by given index / column values.
    
    Reshape data (produce a "pivot" table) based on column values. Uses
    unique values from specified `index` / `columns` to form axes of the
    resulting DataFrame. This function does not support data
    aggregation, multiple values will result in a MultiIndex in the
    columns. See the :ref:`User Guide <reshaping>` for more on reshaping.
    
    Parameters
    ----------
    index : str or object or a list of str, optional
        Column to use to make new frame's index. If None, uses
        existing index.
    
        .. versionchanged:: 1.1.0
           Also accept list of index names.
    
    columns : str or object or a list of str
        Column to use to make new frame's columns.
    
        .. versionchanged:: 1.1.0
           Also accept list of columns names.
    
    values : str, object or a list of the previous, optional
        Column(s) to use for populating new frame's values. If not
        specified, all remaining columns will be used and the result will
        have hierarchically indexed columns.
    
    Returns
    -------
    DataFrame
        Returns reshaped DataFrame.
    
    Raises
    ------
    ValueError:
        When there are any `index`, `columns` combinations with multiple
        values. `DataFrame.pivot_table` when you need to aggregate.
    
    See Also
    --------
    DataFrame.pivot_table : Generalization of pivot that can handle
        duplicate values for one index/column pair.
    DataFrame.unstack : Pivot based on the index values instead of a
        column.
    wide_to_long : Wide panel to long format. Less flexible but more
        user-friendly than melt.
    
    Notes
    -----
    For finer-tuned control, see hierarchical indexing documentation along
    with the related stack/unstack methods.
    
    Examples
    --------
    >>> df = pd.DataFrame({'foo': ['one', 'one', 'one', 'two', 'two',
    ...                            'two'],
    ...                    'bar': ['A', 'B', 'C', 'A', 'B', 'C'],
    ...                    'baz': [1, 2, 3, 4, 5, 6],
    ...                    'zoo': ['x', 'y', 'z', 'q', 'w', 't']})
    >>> df
        foo   bar  baz  zoo
    0   one   A    1    x
    1   one   B    2    y
    2   one   C    3    z
    3   two   A    4    q
    4   two   B    5    w
    5   two   C    6    t
    
    >>> df.pivot(index='foo', columns='bar', values='baz')
    bar  A   B   C
    foo
    one  1   2   3
    two  4   5   6
    
    >>> df.pivot(index='foo', columns='bar')['baz']
    bar  A   B   C
    foo
    one  1   2   3
    two  4   5   6
    
    >>> df.pivot(index='foo', columns='bar', values=['baz', 'zoo'])
          baz       zoo
    bar   A  B  C   A  B  C
    foo
    one   1  2  3   x  y  z
    two   4  5  6   q  w  t
    
    You could also assign a list of column names or a list of index names.
    
    >>> df = pd.DataFrame({
    ...        "lev1": [1, 1, 1, 2, 2, 2],
    ...        "lev2": [1, 1, 2, 1, 1, 2],
    ...        "lev3": [1, 2, 1, 2, 1, 2],
    ...        "lev4": [1, 2, 3, 4, 5, 6],
    ...        "values": [0, 1, 2, 3, 4, 5]})
    >>> df
        lev1 lev2 lev3 lev4 values
    0   1    1    1    1    0
    1   1    1    2    2    1
    2   1    2    1    3    2
    3   2    1    2    4    3
    4   2    1    1    5    4
    5   2    2    2    6    5
    
    >>> df.pivot(index="lev1", columns=["lev2", "lev3"],values="values")
    lev2    1         2
    lev3    1    2    1    2
    lev1
    1     0.0  1.0  2.0  NaN
    2     4.0  3.0  NaN  5.0
    
    >>> df.pivot(index=["lev1", "lev2"], columns=["lev3"],values="values")
          lev3    1    2
    lev1  lev2
       1     1  0.0  1.0
             2  2.0  NaN
       2     1  4.0  3.0
             2  NaN  5.0
    
    A ValueError is raised if there are any duplicates.
    
    >>> df = pd.DataFrame({"foo": ['one', 'one', 'two', 'two'],
    ...                    "bar": ['A', 'A', 'B', 'C'],
    ...                    "baz": [1, 2, 3, 4]})
    >>> df
       foo bar  baz
    0  one   A    1
    1  one   A    2
    2  two   B    3
    3  two   C    4
    
    Notice that the first two rows are the same for our `index`
    and `columns` arguments.
    
    >>> df.pivot(index='foo', columns='bar', values='baz')
    Traceback (most recent call last):
       ...
    ValueError: Index contains duplicate entries, cannot reshape

Live Python training

instructor-led training course

Enjoying this page? We offer live Python training courses covering the content of this site.

See: Live Python courses overview

Upcoming online Courses

Enrol here