The stack() method in Pandas is a powerful tool for reshaping DataFrames. It pivots the columns of a DataFrame into its index, turning a wide table into a taller, narrower one. This makes it particularly useful for converting wide DataFrames into long format, which is often easier to group, filter, and plot.
In this blog, we will cover the following topics:
- Introduction to stack
- Basic usage of stack
- Using stack with multi-level columns
- Handling missing values with stack
- Practical examples
1. Introduction to stack
The stack method pivots the columns of a DataFrame into the index. It is the inverse of the unstack method, which pivots index levels back into columns. stack works on any DataFrame, but it is especially useful for DataFrames with hierarchical (MultiIndex) columns.
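A quick way to see that stack and unstack undo each other is a round trip on a small numeric DataFrame. This is a minimal sketch; the column names x and y are arbitrary choices for illustration, picked in sorted order so the comparison does not depend on how unstack orders the new columns.

```python
import pandas as pd

# A small, all-integer DataFrame (column names are illustrative)
df = pd.DataFrame({"x": [1, 2], "y": [3, 4]})

# stack: the column labels move into a new innermost index level -> Series
stacked = df.stack()
print(stacked)

# unstack: that innermost index level moves back out into the columns
restored = stacked.unstack()
print(restored)
```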
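Syntax
DataFrame.stack(level=-1, dropna=True)
- level: The column level(s) to stack, given as a position, a level name, or a list of either. Defaults to the innermost level (-1).
- dropna: Whether to drop rows with missing values from the resulting DataFrame/Series. Defaults to True. When the result is a DataFrame, only rows in which every value is missing are dropped.
This is the long-standing signature. Starting with pandas 2.1, a new implementation can be enabled with future_stack=True; it keeps missing values and does not accept dropna.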
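Because level accepts a position, a level name, or a list, naming the column levels makes calls easier to read. The sketch below reuses the cat/dog layout from the multi-level section of this post; the level names letter and animal are assumptions added for illustration.

```python
import pandas as pd

# MultiIndex columns with named levels ("letter" and "animal" are illustrative names)
columns = pd.MultiIndex.from_tuples(
    [("A", "cat"), ("A", "dog"), ("B", "cat"), ("B", "dog")],
    names=["letter", "animal"],
)
df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8]], columns=columns)

# Stacking by level name is equivalent to stacking by position
by_name = df.stack(level="letter")
by_position = df.stack(level=0)
print(by_name.equals(by_position))

# Passing a list stacks several levels at once; with every column level
# stacked, the result is a Series with a three-level index
all_levels = df.stack(level=["letter", "animal"])
print(all_levels)
```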
2. Basic Usage of stack
Let’s start with a simple example to understand the basic usage of stack.
import pandas as pd
# Creating a simple DataFrame
df = pd.DataFrame({
'A': {0: 'a', 1: 'b', 2: 'c'},
'B': {0: 1, 1: 3, 2: 5},
'C': {0: 2, 1: 4, 2: 6}
})
# Applying stack method
stacked_df = df.stack()
print(stacked_df)
Output:
0  A    a
   B    1
   C    2
1  A    b
   B    3
   C    4
2  A    c
   B    5
   C    6
dtype: object
In this example, the stack method has pivoted the columns A, B, and C into the index, creating a Series with a MultiIndex. Because the columns hold mixed types (strings and integers), the resulting Series has dtype object.
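Since the result is a Series indexed by (row label, column label) pairs, a single value can be looked up with a tuple, and everything that came from one original row can be selected with the outer label. A short sketch rebuilding the same df (written with lists instead of dicts, same data):

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["a", "b", "c"],
    "B": [1, 3, 5],
    "C": [2, 4, 6],
})
stacked = df.stack()

# Level 0 holds the original row labels, level 1 the former column names
print(stacked.index.nlevels)

# Look up the value that used to sit at row 1, column "B"
print(stacked.loc[(1, "B")])

# Select everything that came from row 2 (a Series indexed by A, B, C)
print(stacked.loc[2])
```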
3. Using stack with Multi-Level Columns
The stack method is especially useful when dealing with DataFrames that have MultiIndex columns.
import pandas as pd
# Creating a DataFrame with MultiIndex columns
columns = pd.MultiIndex.from_tuples([('A', 'cat'), ('A', 'dog'), ('B', 'cat'), ('B', 'dog')])
df_multi = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8]], columns=columns)
print(df_multi)
# Applying stack method
stacked_df_multi = df_multi.stack(level=0)
print(stacked_df_multi)
Output (df_multi first, then stacked_df_multi):
    A       B    
  cat dog cat dog
0   1   2   3   4
1   5   6   7   8

     cat  dog
0 A    1    2
  B    3    4
1 A    5    6
  B    7    8
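In this example, stack(level=0) has moved the outer column level (A and B) into the index. Each row is now identified by the original row label plus A or B, and only the inner level (cat and dog) remains as columns, so the 2-row, 4-column table becomes a 4-row, 2-column one.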
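Leaving level at its default of -1 stacks the innermost column level instead, so cat and dog move into the index while A and B stay as columns. A sketch with the same data, putting both choices side by side:

```python
import pandas as pd

columns = pd.MultiIndex.from_tuples([("A", "cat"), ("A", "dog"), ("B", "cat"), ("B", "dog")])
df_multi = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8]], columns=columns)

# Default level=-1: the inner level (cat/dog) is stacked into the index
inner = df_multi.stack()
print(inner)

# level=0: the outer level (A/B) is stacked instead, as in the example above
outer = df_multi.stack(level=0)
print(inner.columns.tolist(), outer.columns.tolist())
```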
4. Handling Missing Values with stack
By default, the stack method drops missing values from the result. However, this behavior can be controlled using the dropna parameter.
import pandas as pd
# Creating a DataFrame with missing values
df_missing = pd.DataFrame({
'A': {0: 'a', 1: None, 2: 'c'},
'B': {0: 1, 1: 3, 2: None},
'C': {0: 2, 1: 4, 2: 6}
})
# Applying stack method with dropna=False
stacked_df_missing = df_missing.stack(dropna=False)
print(stacked_df_missing)
Output:
0  A       a
   B     1.0
   C       2
1  A    None
   B     3.0
   C       4
2  A       c
   B     NaN
   C       6
dtype: object
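In this example, stack keeps the missing entries because dropna is set to False: the None in column A and the NaN in column B both appear in the result. Column B is stored as float because it contains a missing value, which is why its numbers print as 1.0 and 3.0.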
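How missing values are treated depends on which stack implementation your pandas version uses, so one portable option is to stack without relying on the dropna default and then call dropna() on the result yourself. A minimal sketch, assuming you want missing entries removed:

```python
import pandas as pd

df_missing = pd.DataFrame({
    "A": ["a", None, "c"],
    "B": [1, 3, None],
    "C": [2, 4, 6],
})

# Drop missing entries explicitly instead of relying on stack's default:
# implementations that already removed them leave nothing for dropna() to do
stacked = df_missing.stack().dropna()
print(stacked)

# Number of cells that were missing in the original DataFrame
print(df_missing.size - len(stacked))
```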
5. Practical Examples
Converting a Wide DataFrame to Long Format
import pandas as pd
# Creating a wide DataFrame
df_wide = pd.DataFrame({
'ID': [1, 2, 3],
'Math': [90, 80, 85],
'Science': [85, 80, 95],
'English': [78, 88, 92]
})
# Setting 'ID' as the index
df_wide.set_index('ID', inplace=True)
# Applying stack method to convert to long format
df_long = df_wide.stack().reset_index()
df_long.columns = ['ID', 'Subject', 'Score']
print(df_long)
Output:
   ID  Subject  Score
0   1     Math     90
1   1  Science     85
2   1  English     78
3   2     Math     80
4   2  Science     80
5   2  English     88
6   3     Math     85
7   3  Science     95
8   3  English     92
In this practical example, we used the stack method to convert a wide DataFrame (one column per subject) into long format (one row per ID and subject). This layout is convenient for grouping by subject and for plotting libraries that work with long-form data.
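Instead of overwriting df_long.columns positionally, the index levels and the value column can be named inside the chain with rename_axis and reset_index(name=...). For comparison, DataFrame.melt reaches the same long table; melt groups rows by subject rather than by ID, so this sketch sorts both results before comparing them.

```python
import pandas as pd

df_wide = pd.DataFrame({
    "ID": [1, 2, 3],
    "Math": [90, 80, 85],
    "Science": [85, 80, 95],
    "English": [78, 88, 92],
}).set_index("ID")

# Name both index levels, then turn the stacked Series into a "Score" column
df_long = (
    df_wide.stack()
    .rename_axis(["ID", "Subject"])
    .reset_index(name="Score")
)
print(df_long)

# melt starts from a regular "ID" column and produces the same rows
df_melted = df_wide.reset_index().melt(
    id_vars="ID", var_name="Subject", value_name="Score"
)

# Row order differs (stack: by ID, melt: by subject), so sort before comparing
key = ["ID", "Subject"]
a = df_long.sort_values(key).reset_index(drop=True)
b = df_melted.sort_values(key).reset_index(drop=True)
print(a.values.tolist() == b.values.tolist())
```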
Conclusion
The stack method in Pandas is a versatile tool for reshaping DataFrames. It pivots columns into the index, turning wide data into the long format that is often easier to group, filter, and plot. Whether you are working with simple DataFrames or with hierarchical columns, stack, together with its inverse unstack, covers many everyday reshaping tasks.
By mastering the stack method, you can efficiently transform and analyze your data, making your data science workflow more effective and streamlined.