Pandas DataFrame where() – Explained with Examples

Pandas, the powerful Python data analysis library, offers a wide range of tools and functions to manipulate data efficiently. One such function is where(), which is used to apply conditions to a DataFrame. This blog will explain the where() function in detail and provide practical examples to demonstrate its usage.

Understanding ‘where()’

The where() function in Pandas is used to replace values in a DataFrame based on a condition. It returns a DataFrame of the same shape as the original, but with entries replaced based on the condition. The basic syntax is:

Python
DataFrame.where(cond, other=nan, inplace=False, axis=None, level=None, errors='raise', try_cast=False)
Parameters
  • cond: The condition to check. It can be a DataFrame or array-like structure. The condition should be boolean.
  • other: The value to use to replace entries where the condition is False. The default is NaN.
  • inplace: If True, performs the operation in place and returns None.
  • axis: The axis along which to apply the where() method.
  • level: Used with multi-level DataFrames to apply the function along a particular level.
  • errors: Specifies how to handle errors. Default is raise.
  • try_cast: If True, tries to cast the result back to the input type.

Examples

Let’s dive into some practical examples to understand how where() works.

Example 1: Basic Usage

Consider the following DataFrame:

Python
import pandas as pd

data = {
    'A': [1, 2, 3, 4],
    'B': [5, 6, 7, 8],
    'C': [9, 10, 11, 12]
}

df = pd.DataFrame(data)
print(df)

Output:

Markdown
   A  B   C
0  1  5   9
1  2  6  10
2  3  7  11
3  4  8  12

We want to replace values in column ‘A’ that are less than 3 with NaN(default):

Python
df_where = df.where(df['A'] >= 3)
print(df_where)

Output:

Markdown
     A    B   C
0  NaN  NaN NaN
1  NaN  NaN NaN
2  3.0  7.0 11.0
3  4.0  8.0 12.0

In this example, all values in rows where the condition df['A'] >= 3 is not met are replaced with NaN.

Note that, if any element in ‘A’ fails the condition (i.e., does not satisfy the condition), the corresponding row in the resulting DataFrame will be replaced with NaN.

Example 2: Using ‘other’ Parameter

Suppose we want to replace the values that do not meet the condition with a specific value, say 0 , we can use other parameter:

Python
df_where_other = df.where(df['A'] >= 3, other=0)
print(df_where_other)

Output:

Markdown
   A  B   C
0  0  0   0
1  0  0   0
2  3  7  11
3  4  8  12
Example 3: Using on Multiple Conditions

Let’s consider an example for conditional replacement based on multiple conditions. In this example, we’ll work with a DataFrame where we replace values based on more than one condition.

Python
data = {
    'A': [1, 2, 3, 4],
    'B': [5, 6, 7, 8],
    'C': [9, 10, 11, 12]
}

df = pd.DataFrame(data)
print("Original df")
print(df)

# Condition for column 'A'
condition_A = df['A'] >= 2

# Condition for column 'B'
condition_B = df['B'] <= 6

# Condition for column 'C'
condition_C = (df['C'] !=12) 

# Applying conditions using where() method
df_where_multi = df.where(condition_A & condition_B & condition_C, other=0)
print("Modified df")
print(df_where_multi)

Here, three conditions are defined:

  • condition_A checks if the values in column ‘A’ are greater than or equal to 2.
  • condition_B checks if the values in column ‘B’ are less than or equal to 6.
  • condition_C checks if the values in column ‘C’ are not equal to 12.

This combined condition checks if all individual conditions are met for each row. The where() method replaces values that do not meet the condition with 0 (specified by other=0).

The output DataFrame df_where_multi is:

Markdown
   A  B   C
0  0  0   0
1  2  6  10
2  0  0   0
3  0  0   0

Explanation:

  • For row 0, not all conditions are met (condition_A is False, condition_B is True, and condition_C is True), so all values are replaced with 0.
  • Then, For row 1, all conditions are met (condition_A is True, condition_B is True, and condition_C is True), so the values remain unchanged.
  • For row 2, not all conditions are met (condition_A is True, condition_B is False, and condition_C is True), so all values are replaced with 0.
  • For row 3, not all conditions are met (condition_A is True, condition_B is False, and condition_C is False), so all values are replaced with 0.
Example 4: In-place Modification

If you want to modify the original DataFrame in place without creating a new DataFrame, set inplace=True:

Python
df.where(df['A'] >= 3, inplace=True)
print(df)

Output:

Markdown
     A    B    C
0  NaN  NaN  NaN
1  NaN  NaN  NaN
2  3.0  7.0 11.0
3  4.0  8.0 12.0
Example 5: Applying ‘where()’ Element-Wise

When you apply a condition directly to the DataFrame df without specifying a column (df['A'], for example), Pandas applies the condition element-wise across the entire DataFrame. This means each element in the DataFrame is evaluated against the condition independently.

Python
import pandas as pd

data = {
    'A': [1, 2, 3, 4],
    'B': [5, 6, 7, 8],
    'C': [9, 10, 11, 12],
    'D': [13, 14, 15, 16]
}

df = pd.DataFrame(data)

# Applying condition element-wise to the entire DataFrame
df_where_element_wise = df.where(df <= 10, other=0)
print(df_where_element_wise)

Output:

Python
   A  B   C  D
0  1  5   9  0
1  2  6  10  0
2  3  7   0  0
3  4  8   0  0

Explanation:

  • Each element in the DataFrame df is checked independently against the condition (df <= 10).
  • If an element meets the condition (is less than or equal to 10), it remains unchanged.
  • If an element does not meet the condition (is greater than 10), it is replaced with 0.
Conclusion

The where() function in Pandas is a versatile tool for conditionally replacing values in a DataFrame. It can handle a variety of conditions, provide flexibility with the other parameter, and even modify DataFrames in place. Understanding and using where() effectively can enhance your data manipulation capabilities in Pandas.

By practicing with different examples and conditions, you’ll become more adept at leveraging this powerful function to meet your data analysis needs. Happy coding!

Also Explore:

Leave a Comment