Pandas melt() Function – Explained with Examples

Pandas is a Python library for data manipulation and analysis. One of its most useful reshaping functions is melt(), which transforms a DataFrame from wide format to long format. This is particularly helpful for visualization and analysis tasks, which often expect long-format data.

In this blog post, we will explore the pandas.melt() function, its syntax, parameters, and several examples to understand its practical applications.

What is pandas.melt()?

The pandas.melt() function unpivots a DataFrame from a wide format to a long format. In other words, it stacks the selected columns into two columns, one holding the original column name and one holding the cell value, so that each row represents a single observation. This transformation is roughly the reverse of a pivot operation.
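To make the "reverse of a pivot" idea concrete, here is a minimal sketch of a round trip, assuming a small made-up DataFrame (the names wide, long_df, and restored are only for illustration). It melts the data and then uses DataFrame.pivot() to spread it back out.

```python
import pandas as pd

# Small wide table: one identifier column and two measured columns.
wide = pd.DataFrame({
    'A': ['foo', 'bar', 'baz'],
    'B': [1, 2, 3],
    'C': [4, 5, 6],
})

# Wide -> long: 'B' and 'C' become rows under 'variable' / 'value'.
long_df = pd.melt(wide, id_vars=['A'])

# Long -> wide: pivot spreads 'variable' back into separate columns.
restored = long_df.pivot(index='A', columns='variable', values='value').reset_index()
restored.columns.name = None  # drop the leftover 'variable' label on the columns

print(long_df)
print(restored)
```

Note that pivot() sorts the rows by the index column, so the restored DataFrame holds the same values as the original but not necessarily in the same row order.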

Syntax
Python
pandas.melt(frame, id_vars=None, value_vars=None, var_name=None, value_name='value', col_level=None, ignore_index=True)
Parameters
  • frame: The DataFrame to melt.
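  • id_vars: Columns to use as identifier variables. These are kept as columns and repeated for each melted value.
  • value_vars: Columns to unpivot. If not specified, uses all columns that are not set as id_vars.
  • var_name: Name to use for the ‘variable’ column. If None, uses frame.columns.name if it is set, otherwise ‘variable’.
  • value_name: Name to use for the ‘value’ column. Defaults to ‘value’.
  • col_level: If the columns are a MultiIndex, the level to use when melting.
  • ignore_index: If True (the default), the original index is discarded and replaced with a default integer index. If False, the original index is kept, with labels repeated as necessary.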
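The less obvious defaults are easier to see in code. The following is a small sketch, assuming a made-up DataFrame and the hypothetical column label ‘metric’. It shows what happens when value_vars and var_name are left out, and how col_level selects a level of MultiIndex columns.

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'b', 'c'], 'B': [1, 3, 5], 'C': [2, 4, 6]})

# value_vars omitted: every column that is not in id_vars is unpivoted.
all_melted = pd.melt(df, id_vars=['A'])

# var_name omitted: the name of the columns axis is used when it is set.
named = df.copy()
named.columns.name = 'metric'  # hypothetical label, for illustration only
named_melted = pd.melt(named, id_vars=['A'])

# col_level: choose which level of MultiIndex columns to melt on.
multi = df.copy()
multi.columns = pd.MultiIndex.from_arrays([['A', 'B', 'C'], ['D', 'E', 'F']])
level0_melted = pd.melt(multi, col_level=0, id_vars=['A'], value_vars=['B'])

print(list(all_melted.columns))    # ['A', 'variable', 'value']
print(list(named_melted.columns))  # ['A', 'metric', 'value']
print(level0_melted)
```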
Example 1: Basic Usage

Let’s start with a simple example to understand the basic usage of pandas.melt().

Python
import pandas as pd

# Create a sample DataFrame
df = pd.DataFrame({
    'A': ['foo', 'bar', 'baz'],
    'B': [1, 2, 3],
    'C': [4, 5, 6]
})

print("Original DataFrame:")
print(df)

# Melt the DataFrame
melted_df = pd.melt(df, id_vars=['A'], value_vars=['B', 'C'])

print("\nMelted DataFrame:")
print(melted_df)

Output:

Markdown
Original DataFrame:
     A  B  C
0  foo  1  4
1  bar  2  5
2  baz  3  6

Melted DataFrame:
     A variable  value
0  foo        B      1
1  bar        B      2
2  baz        B      3
3  foo        C      4
4  bar        C      5
5  baz        C      6

In this example, we start with a simple DataFrame consisting of three columns: ‘A’, ‘B’, and ‘C’. The pandas.melt() function transforms this DataFrame from a wide format to a long format by using ‘A’ as the identifier variable and ‘B’ and ‘C’ as the value variables. In the result, each row pairs one value of ‘A’ with the name of a melted column (in ‘variable’) and the corresponding cell (in ‘value’), so the three original rows become six.
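As a quick sanity check on the shape of the result, the sketch below rebuilds the same sample DataFrame and confirms the row count. It also shows, as an extra illustration that is not part of the example above, that a column listed in neither id_vars nor value_vars is simply left out.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['foo', 'bar', 'baz'],
    'B': [1, 2, 3],
    'C': [4, 5, 6],
})
value_vars = ['B', 'C']
melted_df = pd.melt(df, id_vars=['A'], value_vars=value_vars)

# Each original row contributes one output row per melted column.
expected_rows = len(df) * len(value_vars)
print(len(melted_df), expected_rows)  # 6 6

# Rows are stacked column by column: all 'B' rows first, then all 'C' rows.
print(melted_df['variable'].tolist())

# Melting only 'B': column 'C' is in neither list, so it is dropped.
only_b = pd.melt(df, id_vars=['A'], value_vars=['B'])
print(only_b)
```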

Example 2: Customizing ‘var_name’ and ‘value_name’

You can customize the names of the ‘variable’ and ‘value’ columns using the var_name and value_name parameters.

Python
melted_df = pd.melt(df, id_vars=['A'], value_vars=['B', 'C'], var_name='Variable', value_name='Value')

print("\nMelted DataFrame with Custom Names:")
print(melted_df)

Output:

Markdown
Melted DataFrame with Custom Names:
     A Variable  Value
0  foo        B      1
1  bar        B      2
2  baz        B      3
3  foo        C      4
4  bar        C      5
5  baz        C      6

This example demonstrates how to customize the names of the ‘variable’ and ‘value’ columns in the melted DataFrame. By using the var_name and value_name parameters, we change the default column names to ‘Variable’ and ‘Value’, respectively. This can make the melted DataFrame more readable and better suited to the context of the analysis.

Example 3: Melting with Multiple Identifier Variables

You can also melt a DataFrame with multiple identifier variables.

Python
df = pd.DataFrame({
    'ID': [1, 2, 3],
    'Year': [2020, 2021, 2022],
    'Math': [85, 90, 95],
    'Science': [80, 89, 94]
})

print("Original DataFrame:")
print(df)

melted_df = pd.melt(df, id_vars=['ID', 'Year'], value_vars=['Math', 'Science'], var_name='Subject', value_name='Score')

print("\nMelted DataFrame with Multiple Identifier Variables:")
print(melted_df)

Output:

Markdown
Original DataFrame:
   ID  Year  Math  Science
0   1  2020    85       80
1   2  2021    90       89
2   3  2022    95       94

Melted DataFrame with Multiple Identifier Variables:
   ID  Year  Subject  Score
0   1  2020     Math     85
1   2  2021     Math     90
2   3  2022     Math     95
3   1  2020  Science     80
4   2  2021  Science     89
5   3  2022  Science     94

Here, we work with a DataFrame that includes multiple identifier variables (‘ID’ and ‘Year’). The pandas.melt() function is applied to unpivot the ‘Math’ and ‘Science’ columns, creating a long format DataFrame. Each row now contains the unique combination of ‘ID’, ‘Year’, and the corresponding ‘Subject’ and ‘Score’, making it easier to analyze the data across different subjects and years.
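To illustrate why the long format is convenient here, this short sketch (using the same sample data, with the variable names mean_by_subject and scores_2021 chosen just for this illustration) summarizes scores per subject with a single groupby and filters one year with a single condition.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 3],
    'Year': [2020, 2021, 2022],
    'Math': [85, 90, 95],
    'Science': [80, 89, 94],
})
melted_df = pd.melt(df, id_vars=['ID', 'Year'], value_vars=['Math', 'Science'],
                    var_name='Subject', value_name='Score')

# One groupby covers every subject, however many subject columns there were.
mean_by_subject = melted_df.groupby('Subject')['Score'].mean()
print(mean_by_subject)

# Filtering by an identifier works the same way for all subjects at once.
scores_2021 = melted_df[melted_df['Year'] == 2021]
print(scores_2021)
```

In the wide format, the same summary would need one calculation per subject column.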

Example 4: Keeping the Original Index

By default, pandas.melt() discards the index of the original DataFrame and returns a new default integer index. To keep the original index instead, set ignore_index=False (this parameter is available from pandas 1.1.0).

Python
df = pd.DataFrame({
    'A': ['foo', 'bar', 'baz'],
    'B': [1, 2, 3],
    'C': [4, 5, 6]
}, index=['x', 'y', 'z'])

print("Original DataFrame:")
print(df)

melted_df = pd.melt(df, id_vars=['A'], value_vars=['B', 'C'], ignore_index=False)

print("\nMelted DataFrame with Index:")
print(melted_df)

Output:

Markdown
Original DataFrame:
     A  B  C
x  foo  1  4
y  bar  2  5
z  baz  3  6

Melted DataFrame with Index:
     A variable  value
x  foo        B      1
y  bar        B      2
z  baz        B      3
x  foo        C      4
y  bar        C      5
z  baz        C      6

In this example, the original DataFrame has a custom index (‘x’, ‘y’, ‘z’). Setting ignore_index=False retains it in the melted DataFrame, which is useful when the index carries meaningful information that should survive the transformation. Note that each label is repeated once per melted column, so the resulting index is no longer unique.
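The sketch below checks the retained index on the same sample data. It also shows one possible alternative, which is a design choice and not something pandas.melt() requires: moving the index into a regular column with reset_index() before melting. The names kept and as_column are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['foo', 'bar', 'baz'],
    'B': [1, 2, 3],
    'C': [4, 5, 6],
}, index=['x', 'y', 'z'])

# Keep the original index: each label appears once per melted column.
kept = pd.melt(df, id_vars=['A'], ignore_index=False)
print(kept.index.tolist())   # ['x', 'y', 'z', 'x', 'y', 'z']
print(kept.index.is_unique)  # False

# Alternative: turn the index into a column first, then treat it as an identifier.
# reset_index() names the new column 'index' because the index has no name.
as_column = pd.melt(df.reset_index(), id_vars=['index', 'A'])
print(as_column)
```

The second approach keeps a unique integer index while still preserving the original labels as data.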

When and why to use pandas.melt()
  1. Data Visualization: Many plotting libraries work most naturally with long-format data, where one column can be mapped to an axis and another to color or grouping.
  2. Data Analysis: Long-format data is easier to group and summarize, because a single groupby on the variable column replaces separate per-column calculations.
  3. Data Cleaning: With all measurements in one column, missing values can be filtered or filled in a single step before statistical analysis or feature engineering.

Using pandas.melt() puts your data into the structure these tasks expect, which makes it a useful tool in data preprocessing.
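As one illustration of the cleaning case, here is a minimal sketch that assumes a made-up score table with gaps (the data and the names wide, long_df, observed, and counts are hypothetical). After melting, one dropna() call removes the missing measurements for every subject at once.

```python
import pandas as pd

# Hypothetical wide table with missing scores.
wide = pd.DataFrame({
    'ID': [1, 2, 3],
    'Math': [85.0, None, 95.0],
    'Science': [80.0, 89.0, None],
})

long_df = pd.melt(wide, id_vars=['ID'], var_name='Subject', value_name='Score')

# A single filter on the value column handles all subjects.
observed = long_df.dropna(subset=['Score'])
print(observed)

# Count how many scores were actually recorded per subject.
counts = observed.groupby('Subject')['Score'].count()
print(counts)
```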

Conclusion

The pandas.melt() function is a versatile tool for transforming DataFrames from wide to long format, making data more suitable for analysis and visualization. By understanding its parameters and how to use them, you can efficiently manipulate your data for various applications.

Feel free to experiment with the examples provided and explore more use cases to get a deeper understanding of pandas.melt().
