Pandas DataFrame reindex() Method – Explained with Examples

The reindex() method in Pandas conforms the rows and columns of a DataFrame to a new set of labels. It does not rename existing labels; instead, it selects the data that matches each requested label and creates empty entries for labels that have no match. This makes it useful for reordering, adding, or dropping labels, and for aligning data with a particular set of labels.

What is the reindex() Method?

The reindex() method conforms a DataFrame to a new index and returns a new DataFrame. Labels that were not in the original get NaN (or a value you choose), and labels that are left out of the new index are dropped. It can be used for a variety of tasks, including:

  • Reordering rows or columns
  • Inserting missing indices
  • Dropping indices
  • Aligning with another DataFrame
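
The last two tasks in the list are not covered by the examples further down, so here is a minimal sketch of both. The frames `left` and `right` and their values are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data, not taken from the examples in this post
left = pd.DataFrame({'A': [1, 2, 3]}, index=['a', 'b', 'c'])
right = pd.DataFrame({'B': [10, 20]}, index=['b', 'd'])

# Dropping: labels left out of the new index are discarded
dropped = left.reindex(['a', 'c'])

# Aligning: conform `left` to the row labels of `right`;
# 'd' does not exist in `left`, so its row is filled with NaN
aligned = left.reindex(right.index)

print(dropped)
print(aligned)
```

Passing another object's index, as in `left.reindex(right.index)`, is a simple way to make two frames share the same row labels before comparing or combining them.
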
Syntax
Python
DataFrame.reindex(labels=None, index=None, columns=None, axis=None, method=None, copy=True, level=None, fill_value=nan, limit=None, tolerance=None)
Parameters
  • labels: array-like, optional. New labels to conform the axis given by axis to.
  • index: array-like, optional. New labels for the rows (the index).
  • columns: array-like, optional. New labels for the columns.
  • axis: {0 or ‘index’, 1 or ‘columns’}, optional. Target axis to reindex.
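  • method: {‘backfill’, ‘bfill’, ‘pad’, ‘ffill’, None}, optional. Method to use for filling holes in the reindexed DataFrame. It only works when the index is monotonically increasing or decreasing, and it only fills rows for newly added labels, not NaN values that were already in the data.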
  • copy: bool, default True. Return a new DataFrame, even if the passed indexes are the same.
  • level: int or level name, optional. Broadcast across a level, matching Index values on the passed MultiIndex level.
  • fill_value: scalar, default NaN. Value to use for missing values.
  • limit: int, optional. Maximum number of consecutive elements to forward/backward fill.
  • tolerance: optional. Maximum distance between original and new labels for inexact matches.
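
The labels and axis parameters are an alternative to the index and columns keywords. The sketch below, which reuses the small frame from Example 1, shows that both calling styles give the same result.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=['c', 'a', 'b'])

# Keyword style: name the axis through the `columns` argument
by_keyword = df.reindex(columns=['B', 'A'])

# labels + axis style: the same request, with the axis named separately
by_axis = df.reindex(['B', 'A'], axis='columns')

# Both calls produce the same DataFrame
print(by_keyword.equals(by_axis))
```
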

Examples

Let’s explore some examples to understand how reindex() works in different scenarios.

Example 1: Reordering Rows and Columns

Suppose we have a DataFrame with unsorted indices, and we want to reorder them.

Python
import pandas as pd

# Creating a sample DataFrame
data = {'A': [1, 2, 3], 'B': [4, 5, 6]}
df = pd.DataFrame(data, index=['c', 'a', 'b'])

print("Original DataFrame:")
print(df)

# Reindexing the DataFrame
df_reindexed = df.reindex(index=['a', 'b', 'c'], columns=['B', 'A'])

print("\nReindexed DataFrame:")
print(df_reindexed)
Output
Original DataFrame:
   A  B
c  1  4
a  2  5
b  3  6

Reindexed DataFrame:
   B  A
a  5  2
b  6  3
c  4  1

Output Explanation:

  • Original DataFrame: The initial DataFrame has indices [‘c’, ‘a’, ‘b’] and columns [‘A’, ‘B’].
  • Reindexed DataFrame: The DataFrame is reindexed to have indices [‘a’, ‘b’, ‘c’] and columns [‘B’, ‘A’]. The values are reordered accordingly.
Example 2: Adding Missing Indices

We can also add new indices that were not present in the original DataFrame. Missing values will be filled with NaN by default.

Python
# Reindexing to add new indices
df_reindexed = df.reindex(index=['a', 'b', 'c', 'd'])

print("\nDataFrame with added index:")
print(df_reindexed)
Output
DataFrame with added index:
     A    B
a  2.0  5.0
b  3.0  6.0
c  1.0  4.0
d  NaN  NaN

Output Explanation:

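  • DataFrame with added index: The new DataFrame includes the new index ‘d’ which was not in the original DataFrame. The values for ‘d’ are NaN because they do not exist in the original DataFrame. Because NaN is a floating-point value, the integer columns are converted to float, which is why 2 is now shown as 2.0.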
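
To see the effect of the added row on the column types, you can compare the dtypes before and after reindexing. This is a small sketch that rebuilds the frame from Example 1 so it runs on its own.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=['c', 'a', 'b'])
with_new_row = df.reindex(['a', 'b', 'c', 'd'])

print(df.dtypes)                  # integer columns
print(with_new_row.dtypes)        # float columns, because NaN needs a float dtype
print(with_new_row.isna().sum())  # one missing value per column, all in row 'd'
```
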
Example 3: Filling Missing Values

We can specify a value to fill in the missing values using the fill_value parameter.

Python
# Reindexing with a fill value
df_reindexed = df.reindex(index=['a', 'b', 'c', 'd'], fill_value=0)

print("\nDataFrame with fill value:")
print(df_reindexed)
Output
DataFrame with fill value:
   A  B
a  2  5
b  3  6
c  1  4
d  0  0

Output Explanation:

  • DataFrame with fill value: The new DataFrame includes the new index ‘d’. The missing values for ‘d’ are filled with 0 as specified by the fill_value parameter.
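
Note that fill_value applies only to labels that reindex() adds. NaN values that were already in the data stay as they are. The frame below is a made-up example with a NaN at label 'b' to show the difference.

```python
import numpy as np
import pandas as pd

# Hypothetical frame that already contains a NaN at label 'b'
df_with_nan = pd.DataFrame({'A': [1.0, np.nan, 3.0]}, index=['a', 'b', 'c'])

result = df_with_nan.reindex(['a', 'b', 'c', 'd'], fill_value=0)

# Row 'd' is new and gets 0; row 'b' keeps its original NaN
print(result)
```

If the existing NaN values should be replaced as well, a separate call such as `result.fillna(0)` is needed.
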
Example 4.1: Forward and Backward Filling

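We can use the method parameter to forward fill or backward fill the rows that reindex() adds. This requires the index of the DataFrame to be sorted (monotonically increasing or decreasing), so this example uses a DataFrame with the index [‘a’, ‘b’, ‘c’].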

Python
# Creating a sample DataFrame
data = {'A': [1, 2, 3], 'B': [4, 5, 6]}
df = pd.DataFrame(data, index=['a', 'b', 'c'])

print("Original DataFrame:")
print(df)

# Forward fill
df_ffill = df.reindex(index=['a', 'b', 'c', 'd'], method='ffill')

print("\nForward Fill:")
print(df_ffill)

# Backward fill
df_bfill = df.reindex(index=['a', 'b', 'c', 'd'], method='bfill')

print("\nBackward Fill:")
print(df_bfill)
Output
Original DataFrame:
   A  B
a  1  4
b  2  5
c  3  6

Forward Fill:
   A  B
a  1  4
b  2  5
c  3  6
d  3  6

Backward Fill:
     A    B
a  1.0  4.0
b  2.0  5.0
c  3.0  6.0
d  NaN  NaN

Output Explanation:

  • Original DataFrame: The initial DataFrame has the sorted index [‘a’, ‘b’, ‘c’] and no missing values.
  • Forward Fill: Each new label takes its values from the closest preceding label in the original index. For the new index ‘d’, it uses the values from ‘c’, so the columns stay integers.
  • Backward Fill: Each new label takes its values from the closest following label in the original index. For the new index ‘d’, there is no later label, so it remains NaN and the columns become floats.
Example 4.2: Forward and Backward Filling with a Date Index

Let’s create a more detailed example with a larger DataFrame to demonstrate forward and backward filling using the reindex() method. Suppose we have a DataFrame representing sales data for specific days, but some days are missing. We want to use forward fill and backward fill methods to handle these missing days.

Step 1: Creating the DataFrame
Python
import pandas as pd
import numpy as np

# Creating a sample DataFrame with missing days
data = {
    'Date': ['2023-07-01', '2023-07-02', '2023-07-05', '2023-07-06', '2023-07-09', '2023-07-10'],
    'Sales': [200, 210, np.nan, 215, np.nan, 220]
}

df = pd.DataFrame(data)
df['Date'] = pd.to_datetime(df['Date'])
df.set_index('Date', inplace=True)

print("Original DataFrame:")
print(df)

Output

Original DataFrame:
            Sales
Date             
2023-07-01  200.0
2023-07-02  210.0
2023-07-05    NaN
2023-07-06  215.0
2023-07-09    NaN
2023-07-10  220.0

Output Explanation:

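  • Original DataFrame: The initial DataFrame shows sales data for six specific days. Four days in the range (July 3, 4, 7, and 8) have no row at all, and two existing rows (July 5 and July 9) have NaN as their sales value.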
Step 2: Reindexing to Fill Missing Dates

We will reindex the DataFrame to include all dates in the range and then use forward fill and backward fill methods to handle the missing values.

Python
# Creating a date range to include all dates
date_range = pd.date_range(start='2023-07-01', end='2023-07-10')

# Reindexing the DataFrame to include all dates
df_reindexed = df.reindex(date_range)

print("\nDataFrame with Missing Dates:")
print(df_reindexed)

Output

DataFrame with Missing Dates:
            Sales
2023-07-01  200.0
2023-07-02  210.0
2023-07-03    NaN
2023-07-04    NaN
2023-07-05    NaN
2023-07-06  215.0
2023-07-07    NaN
2023-07-08    NaN
2023-07-09    NaN
2023-07-10  220.0

Output Explanation:

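  • DataFrame with Missing Dates: The DataFrame now includes all dates from July 1 to July 10, 2023. The rows for the newly added dates are filled with NaN. The index name ‘Date’ no longer appears because the new index created by pd.date_range() has no name.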
Step 3: Forward Fill
Python
# Forward fill
df_ffill = df_reindexed.ffill()

print("\nForward Fill:")
print(df_ffill)

Output

Forward Fill:
            Sales
2023-07-01  200.0
2023-07-02  210.0
2023-07-03  210.0
2023-07-04  210.0
2023-07-05  210.0
2023-07-06  215.0
2023-07-07  215.0
2023-07-08  215.0
2023-07-09  215.0
2023-07-10  220.0

Output Explanation:

  • Forward Fill: The missing values are filled forward using the last valid observation. For instance, the value for July 3, 2023, is filled with 210.0 from July 2, 2023. This method propagates the last valid value forward until a new valid value is encountered. Note that ffill() fills every NaN, including the ones that were already in the original data (July 5 and July 9).
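
Filling after reindexing is not the same as passing method='ffill' to reindex() directly. The sketch below uses a shortened, made-up version of the sales data to show the difference: reindex() only fills rows it creates, while ffill() also fills NaN values that were already present.

```python
import numpy as np
import pandas as pd

# Shortened version of the sales data; July 5 has a NaN in the original rows
dates = pd.to_datetime(['2023-07-01', '2023-07-02', '2023-07-05', '2023-07-06'])
sales = pd.DataFrame({'Sales': [200, 210, np.nan, 215]}, index=dates)
full_range = pd.date_range('2023-07-01', '2023-07-06')

# Option 1: fill while reindexing; only the newly created rows are filled
during = sales.reindex(full_range, method='ffill')

# Option 2: reindex first, then fill every NaN, including pre-existing ones
after = sales.reindex(full_range).ffill()

print(during)  # July 5 is still NaN
print(after)   # July 5 is filled with 210.0
```

Which option fits depends on whether an existing NaN means "no reading, carry the last one forward" or "this value is really unknown".
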
Step 4: Backward Fill
Python
# Backward fill
df_bfill = df_reindexed.bfill()

print("\nBackward Fill:")
print(df_bfill)

Output

Backward Fill:
            Sales
2023-07-01  200.0
2023-07-02  210.0
2023-07-03  215.0
2023-07-04  215.0
2023-07-05  215.0
2023-07-06  215.0
2023-07-07  220.0
2023-07-08  220.0
2023-07-09  220.0
2023-07-10  220.0

Output Explanation:

  • Backward Fill: The missing values are filled backward using the next valid observation. For example, the value for July 3, 2023, is filled with 215.0 from July 6, 2023. This method propagates the next valid value backward until a new valid value is encountered.
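
Both ffill() and bfill() accept their own limit argument, which caps how many consecutive NaN values are filled. The following sketch uses a shortened, made-up version of the sales data and a limit of 1.

```python
import pandas as pd

# Shortened sales data with a three-day gap between July 2 and July 6
dates = pd.to_datetime(['2023-07-01', '2023-07-02', '2023-07-06'])
short = pd.DataFrame({'Sales': [200, 210, 215]}, index=dates)
gappy = short.reindex(pd.date_range('2023-07-01', '2023-07-06'))

# Fill at most one consecutive missing day in each direction
capped_ffill = gappy.ffill(limit=1)  # only July 3 is filled (210.0)
capped_bfill = gappy.bfill(limit=1)  # only July 5 is filled (215.0)

print(capped_ffill)
print(capped_bfill)
```

A limit like this avoids carrying a single observation across a long gap where it may no longer be meaningful.
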
Conclusion

The reindex() method in Pandas allows you to conform the rows and columns of a DataFrame to a specific set of labels. Whether you need to reorder labels, add new ones, drop existing ones, or fill the gaps that new labels create, reindex() provides a straightforward way to do it.


This blog post provided a detailed explanation of the reindex() method in Pandas with practical examples. If you have any questions or need further clarification, feel free to leave a comment below.
