28. Pandas Pivot
By Bernd Klein. Last modified: 26 Apr 2023.
The pivot function is used to reshape a given DataFrame into a different shape. Pivot is has three parameters:
pivot(self, index=None, columns=None, values=None) -> 'DataFrame'
A call to pivot returns a reshaped DataFrame. The organization of the new DataFrame is done by setting the three parameters accordingly:
The values for these parameters correspond to column names or index of the DataFrame wich is to be reshaped.
We will define now a DataFrame with for columns which we will use to demonstrate the way of working of the pivot
method:
import pandas as pd
d = {'A': ['kırmızı', 'yeşil', 'mavi', 'kırmızı', 'yeşil', 'mavi'],
'B': ['bir', 'iki', 'bir', 'iki', 'bir', 'iki'],
'C': [345, 325, 898, 989, 23, 143],
'D': [1, 2, 3, 4, 5, 6]}
df = pd.DataFrame(d)
df
A | B | C | D | |
---|---|---|---|---|
0 | kırmızı | bir | 345 | 1 |
1 | yeşil | iki | 325 | 2 |
2 | mavi | bir | 898 | 3 |
3 | kırmızı | iki | 989 | 4 |
4 | yeşil | bir | 23 | 5 |
5 | mavi | iki | 143 | 6 |
We can choose no one of the columns as the new index and another columns as the new column names. We use A
as the index and B
for the new column names in the following examples.
If no argument is given for the third parameter values
, pivot will automatically use all the remaining columns for the dat and the result will have hierarchically indexed columns.
df2 = df.pivot(index='A',
columns='B')
df2
C | D | |||
---|---|---|---|---|
B | bir | iki | bir | iki |
A | ||||
kırmızı | 345 | 989 | 1 | 4 |
mavi | 898 | 143 | 3 | 6 |
yeşil | 23 | 325 | 5 | 2 |
The above call corresponds to the following one, in which we provide a list of value columns:
df2 = df.pivot(index='A',
columns='B',
values=['C', 'D'])
In the next call, we just use one column for the values:
df3 = df.pivot(index='A',
columns='B',
values='C')
df3
B | bir | iki |
---|---|---|
A | ||
kırmızı | 345 | 989 |
mavi | 898 | 143 |
yeşil | 23 | 325 |
Another Example:
d = {"products": ["product1", "product1", "product2", "product3"],
"colours": ["blue", "green", "blue", "green"],
"customer_price": [2345.89, 2390.50, 1820.00, 3100.00],
"non_customer_price": [2445.89, 2495.50, 1980.00, 3400.00]}
df = pd.DataFrame(d)
print(df)
OUTPUT:
products colours customer_price non_customer_price 0 product1 blue 2345.89 2445.89 1 product1 green 2390.50 2495.50 2 product2 blue 1820.00 1980.00 3 product3 green 3100.00 3400.00
df2 = df.pivot(index='products',
columns='colours',
values='customer_price')
print(df2)
OUTPUT:
colours blue green products product1 2345.89 2390.5 product2 1820.00 NaN product3 NaN 3100.0
df3 = df.pivot(index='products',
columns='colours',
values=['customer_price', 'non_customer_price'])
print(df3)
OUTPUT:
customer_price non_customer_price colours blue green blue green products product1 2345.89 2390.5 2445.89 2495.5 product2 1820.00 NaN 1980.00 NaN product3 NaN 3100.0 NaN 3400.0
help(pd.DataFrame.pivot)
OUTPUT:
Help on function pivot in module pandas.core.frame: pivot(self, index=None, columns=None, values=None) -> 'DataFrame' Return reshaped DataFrame organized by given index / column values. Reshape data (produce a "pivot" table) based on column values. Uses unique values from specified `index` / `columns` to form axes of the resulting DataFrame. This function does not support data aggregation, multiple values will result in a MultiIndex in the columns. See the :ref:`User Guide <reshaping>` for more on reshaping. Parameters ---------- index : str or object or a list of str, optional Column to use to make new frame's index. If None, uses existing index. .. versionchanged:: 1.1.0 Also accept list of index names. columns : str or object or a list of str Column to use to make new frame's columns. .. versionchanged:: 1.1.0 Also accept list of columns names. values : str, object or a list of the previous, optional Column(s) to use for populating new frame's values. If not specified, all remaining columns will be used and the result will have hierarchically indexed columns. Returns ------- DataFrame Returns reshaped DataFrame. Raises ------ ValueError: When there are any `index`, `columns` combinations with multiple values. `DataFrame.pivot_table` when you need to aggregate. See Also -------- DataFrame.pivot_table : Generalization of pivot that can handle duplicate values for one index/column pair. DataFrame.unstack : Pivot based on the index values instead of a column. wide_to_long : Wide panel to long format. Less flexible but more user-friendly than melt. Notes ----- For finer-tuned control, see hierarchical indexing documentation along with the related stack/unstack methods. Examples -------- >>> df = pd.DataFrame({'foo': ['one', 'one', 'one', 'two', 'two', ... 'two'], ... 'bar': ['A', 'B', 'C', 'A', 'B', 'C'], ... 'baz': [1, 2, 3, 4, 5, 6], ... 'zoo': ['x', 'y', 'z', 'q', 'w', 't']}) >>> df foo bar baz zoo 0 one A 1 x 1 one B 2 y 2 one C 3 z 3 two A 4 q 4 two B 5 w 5 two C 6 t >>> df.pivot(index='foo', columns='bar', values='baz') bar A B C foo one 1 2 3 two 4 5 6 >>> df.pivot(index='foo', columns='bar')['baz'] bar A B C foo one 1 2 3 two 4 5 6 >>> df.pivot(index='foo', columns='bar', values=['baz', 'zoo']) baz zoo bar A B C A B C foo one 1 2 3 x y z two 4 5 6 q w t You could also assign a list of column names or a list of index names. >>> df = pd.DataFrame({ ... "lev1": [1, 1, 1, 2, 2, 2], ... "lev2": [1, 1, 2, 1, 1, 2], ... "lev3": [1, 2, 1, 2, 1, 2], ... "lev4": [1, 2, 3, 4, 5, 6], ... "values": [0, 1, 2, 3, 4, 5]}) >>> df lev1 lev2 lev3 lev4 values 0 1 1 1 1 0 1 1 1 2 2 1 2 1 2 1 3 2 3 2 1 2 4 3 4 2 1 1 5 4 5 2 2 2 6 5 >>> df.pivot(index="lev1", columns=["lev2", "lev3"],values="values") lev2 1 2 lev3 1 2 1 2 lev1 1 0.0 1.0 2.0 NaN 2 4.0 3.0 NaN 5.0 >>> df.pivot(index=["lev1", "lev2"], columns=["lev3"],values="values") lev3 1 2 lev1 lev2 1 1 0.0 1.0 2 2.0 NaN 2 1 4.0 3.0 2 NaN 5.0 A ValueError is raised if there are any duplicates. >>> df = pd.DataFrame({"foo": ['one', 'one', 'two', 'two'], ... "bar": ['A', 'A', 'B', 'C'], ... "baz": [1, 2, 3, 4]}) >>> df foo bar baz 0 one A 1 1 one A 2 2 two B 3 3 two C 4 Notice that the first two rows are the same for our `index` and `columns` arguments. >>> df.pivot(index='foo', columns='bar', values='baz') Traceback (most recent call last): ... ValueError: Index contains duplicate entries, cannot reshape
Live Python training
Enjoying this page? We offer live Python training courses covering the content of this site.
Upcoming online Courses