Reshaping Pandas Dataframes using melt() And pivot()

In data analysis and manipulation, reshaping data is a common task. Pandas, a powerful data manipulation library in Python, provides efficient and intuitive methods to reshape data. Two of the most useful functions for reshaping are melt() and pivot()(unmelt). In this blog, we will explore how to use these functions to melt and unmelt dataframes, providing detailed examples along the way.

Melt and Pivot

Before diving into the details, let’s clarify what melting and pivoting dataframes mean:

  • Melting a dataframe refers to transforming it from a wide format to a long format. This means converting columns into rows, making the dataframe longer and narrower.
  • Pivoting (or unmelting) a dataframe is the reverse operation, where we transform it from a long format back to a wide format, converting rows into columns.

The Melt Function

The melt() function in Pandas is used to convert a wide dataframe into a long dataframe. This is particularly useful when you have data spread across multiple columns that you want to stack into a single column.

Syntax
Markdown
pandas.melt(frame, id_vars=None, value_vars=None, var_name=None, value_name='value', col_level=None, ignore_index=True)
Parameters
  • frame: The dataframe to melt.
  • id_vars: Columns to keep as identifier variables.
  • value_vars: Columns to unpivot. If not specified, uses all columns not set as id_vars.
  • var_name: Name to use for the variable column.
  • value_name: Name to use for the value column.
  • col_level: If columns are a MultiIndex, this is the level to melt.
  • ignore_index: If True, the original index is ignored.
Example

Consider a simple dataframe that contains an ID column and two columns representing different variables, A and B:

Python
import pandas as pd

data = {
    'id': [1, 2, 3],
    'A': [10, 15, 20],
    'B': [5, 10, 15]
}

df = pd.DataFrame(data)
print(df)

This dataframe looks like this:

Markdown
   id   A   B
0   1  10   5
1   2  15  10
2   3  20  15

The melt() function is used to transform this wide dataframe into a long format. The id column is specified as the identifier variable, while columns A and B are to be melted into two columns: one for the variable names (variable) and one for the values (value):

Python
melted_df = pd.melt(df, id_vars=['id'], value_vars=['A', 'B'], var_name='variable', value_name='value')
print(melted_df)

The resulting dataframe is:

Markdown
   id variable  value
0   1        A     10
1   2        A     15
2   3        A     20
3   1        B      5
4   2        B     10
5   3        B     15

Here, the columns A and B have been stacked into a single variable column, with their corresponding values in the value column. Each row now uniquely identifies a combination of id and variable.

The Pivot Function

The pivot() function is used to reshape data (produce a “pivot” table) based on column values. It is the inverse of melt(), converting a long dataframe back to a wide format.

Syntax
Python
DataFrame.pivot(index=None, columns=None, values=None)
Parameters
  • index: Column(s) to use to make new frame’s index. If None, uses existing index.
  • columns: Column(s) to use to make new frame’s columns.
  • values: Column(s) to use for populating new frame’s values.
Example

To illustrate the reverse process, let’s use the pivot() function to transform the melted dataframe back to its original wide format. The pivot() function is called on the melted dataframe, specifying id as the index, variable as the columns, and value as the values to populate the new dataframe:

Python
unmelted_df = melted_df.pivot(index='id', columns='variable', values='value')
print(unmelted_df)

The output is:

Markdown
variable   A   B
id              
1         10   5
2         15  10
3         20  15

To match the original format exactly, we can reset the index:

Python
unmelted_df = unmelted_df.reset_index()
print(unmelted_df)

Output:

Markdown
variable   id   A   B
0           1  10   5
1           2  15  10
2           3  20  15

Now, the dataframe is back to its original wide format, with each row representing a unique id and the values of A and B in separate columns.

Practical Use Case

Consider a scenario where you have data on multiple products across different months, and you want to reshape this data for analysis.

Original Data
Python
data = {
    'Product': ['A', 'B', 'C'],
    'Jan': [30, 20, 50],
    'Feb': [35, 25, 55],
    'Mar': [40, 30, 60]
}

df = pd.DataFrame(data)
print(df)

This dataframe represents sales figures for three products (A, B, and C) across three months (January, February, and March):

Markdown
  Product  Jan  Feb  Mar
0       A   30   35   40
1       B   20   25   30
2       C   50   55   60
Melting the Data

To analyze this data more effectively, you might want to convert it to a long format where each row represents the sales of a single product in a specific month. This is done using the melt() function:

Python
melted_df = pd.melt(df, id_vars=['Product'], var_name='Month', value_name='Sales')
print(melted_df)

The melted dataframe now looks like this:

Markdown
  Product Month  Sales
0       A   Jan     30
1       B   Jan     20
2       C   Jan     50
3       A   Feb     35
4       B   Feb     25
5       C   Feb     55
6       A   Mar     40
7       B   Mar     30
8       C   Mar     60

Here, the data is now in a long format, with separate columns for Product, Month, and Sales. Each row uniquely identifies the sales of a product in a specific month.

Pivoting the Data

If you need to convert this long format data back to its original wide format, use the pivot() function:

Python
pivoted_df = melted_df.pivot(index='Product', columns='Month', values='Sales')
print(pivoted_df)

Output:

Markdown
Month    Jan  Feb  Mar
Product               
A         30   35   40
B         20   25   30
C         50   55   60

By resetting the index, you can get the dataframe back to its original form:

Python
pivoted_df = pivoted_df.reset_index()
print(pivoted_df)

Output:

Markdown
Month Product  Jan  Feb  Mar
0           A   30   35   40
1           B   20   25   30
2           C   50   55   60
Conclusion

The melt() and pivot() functions in Pandas provide powerful ways to reshape your data, making it easier to analyze and visualize. Understanding how to use these functions effectively can significantly enhance your data manipulation capabilities. Whether you need to convert a wide dataframe to a long format or vice versa, these tools have you covered.

Happy coding!

Also Explore:

Leave a Comment