28. Pandas Pivot
By Bernd Klein. Last modified: 21 Feb 2024.
In the context of data manipulation, "pivoting" refers to the transformation of data from one form to another, typically involving the rotation or reorganization of the data around certain key variables. This rotation is akin to the movement around a pivot point in mechanical systems.
In Python, the pandas library provides the pivot function, which operates similarly to pivot tables in spreadsheet software. It allows users to reshape data frames by specifying index, column, and value variables, thereby pivoting the data around those variables. In other words: The pivot function is used to reshape a given DataFrame into a different shape. Pivot has three parameters:
The pivot function generally takes three main parameters:
-
index
: This parameter specifies the column whose unique values will be used as new index labels for the resulting DataFrame. These index values will represent the rows of the new DataFrame. -
columns
: Here, you specify the column whose unique values will be used as the new column headers for the resulting DataFrame. These column values will represent the columns of the new DataFrame. -
values
: This parameter indicates the column whose values will populate the cells of the new DataFrame. These are the actual data values that you want to reshape and organize according to the specified index and columns.
Additionally, there is an optional parameter:
aggfunc
: This parameter is used to specify how to aggregate duplicate values if there are multiple values for the same index/column pair. It could be a function (such as mean, sum, etc.) or a dictionary mapping columns to functions.
A call to pivot returns a reshaped DataFrame. The organization of the new DataFrame is done by setting the three parameters accordingly:
The values for these parameters correspond to column names or index of the DataFrame wich is to be reshaped.
We will define now a DataFrame with for columns which we will use to demonstrate the way of working of the pivot
method:
import pandas as pd
# Sample data
data = {
'Color': ['red', 'green', 'blue', 'red', 'green', 'blue'],
'Number': ['one', 'two', 'one', 'two', 'one', 'two'],
'Time': [345, 325, 898, 989, 23, 143],
'Point': [1, 2, 3, 4, 5, 6]
}
# Create DataFrame
df = pd.DataFrame(data)
df
Color | Number | Time | Point | |
---|---|---|---|---|
0 | red | one | 345 | 1 |
1 | green | two | 325 | 2 |
2 | blue | one | 898 | 3 |
3 | red | two | 989 | 4 |
4 | green | one | 23 | 5 |
5 | blue | two | 143 | 6 |
We can choose no one of the columns as the new index and another columns as the new column names. We use A
as the index and B
for the new column names in the following examples.
If no argument is given for the third parameter values
, pivot will automatically use all the remaining columns for the dat and the result will have hierarchically indexed columns.
df2 = df.pivot(index='Color',
columns='Number')
df2
Time | Point | |||
---|---|---|---|---|
Number | one | two | one | two |
Color | ||||
blue | 898 | 143 | 3 | 6 |
green | 23 | 325 | 5 | 2 |
red | 345 | 989 | 1 | 4 |
The above call corresponds to the following one, in which we provide a list of value columns:
df3 = df.pivot(index='Color',
columns='Number',
values=['Time', 'Point'])
df3
Time | Point | |||
---|---|---|---|---|
Number | one | two | one | two |
Color | ||||
blue | 898 | 143 | 3 | 6 |
green | 23 | 325 | 5 | 2 |
red | 345 | 989 | 1 | 4 |
In the next call, we just use one column for the values:
df3 = df.pivot(index='Color',
columns='Number',
values=['Time'])
df3
Time | ||
---|---|---|
Number | one | two |
Color | ||
blue | 898 | 143 |
green | 23 | 325 |
red | 345 | 989 |
Another Example:
import pandas as pd
data = {
'Product': ['Fancy Chair', 'Fancy Chair', 'Luxury Sofa', 'Designer Table', 'Luxury Sofa'],
'Color': ['Blue', 'Green', 'Blue', 'Green', 'Red'],
'Customer Price': [2345.89, 2390.50, 1820.00, 3100.00, 2750.00],
'Non-Customer Price': [2445.89, 2495.50, 1980.00, 3400.00, 2850.00]
}
# Create DataFrame
df = pd.DataFrame(data)
print(df)
OUTPUT:
Product Color Customer Price Non-Customer Price 0 Fancy Chair Blue 2345.89 2445.89 1 Fancy Chair Green 2390.50 2495.50 2 Luxury Sofa Blue 1820.00 1980.00 3 Designer Table Green 3100.00 3400.00 4 Luxury Sofa Red 2750.00 2850.00
df2 = df.pivot(index='Product',
columns='Color',
values='Customer Price')
print(df2)
OUTPUT:
Color Blue Green Red Product Designer Table NaN 3100.0 NaN Fancy Chair 2345.89 2390.5 NaN Luxury Sofa 1820.00 NaN 2750.0
df3 = df.pivot(index='Product',
columns='Color',
values=['Customer Price', 'Non-Customer Price'])
print(df3)
OUTPUT:
Customer Price Non-Customer Price \ Color Blue Green Red Blue Green Product Designer Table NaN 3100.0 NaN NaN 3400.0 Fancy Chair 2345.89 2390.5 NaN 2445.89 2495.5 Luxury Sofa 1820.00 NaN 2750.0 1980.00 NaN Color Red Product Designer Table NaN Fancy Chair NaN Luxury Sofa 2850.0
Live Python training
Enjoying this page? We offer live Python training courses covering the content of this site.
Upcoming online Courses