Pandas DataFrame isin() Method – Explained with Examples

The isin() method in Pandas is used to filter data in a DataFrame. It checks whether each element in the DataFrame is contained in a given collection of values (e.g., a list, set, or Series) and returns a DataFrame of the same shape as the original, with boolean values indicating whether each element was found. Calling isin() on a single column (a Series) works the same way and returns a boolean Series, which is the form most often used for filtering rows.

Syntax
Python
DataFrame.isin(values)

Parameters:

  • values: An iterable (such as a list or set), a Series, a dictionary, or another DataFrame. A single scalar value is not accepted and raises a TypeError; wrap it in a list instead.

Returns:

  • A DataFrame of booleans indicating if each element is in the values.
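As a quick illustration of the parameter and return value described above, here is a minimal sketch. The small DataFrame and the variable names (mask, scalar_accepted) are made up for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Fruit': ['Apple', 'Banana', 'Cherry'],
    'Quantity': [10, 5, 7],
})

# A list is checked against every cell, so the result has the same shape as df.
mask = df.isin(['Apple', 'Cherry'])
print(mask)

# A bare scalar (here a string) is not list-like, so pandas rejects it.
try:
    df.isin('Apple')
    scalar_accepted = True
except TypeError:
    scalar_accepted = False
print(scalar_accepted)
```

To test for a single value, pass it inside a list, for example df.isin(['Apple']).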

Usage Examples

Let’s dive into some examples to understand how the isin() method works.

Example 1: Using isin() with a List

Suppose we have a DataFrame containing information about various fruits.

Python
import pandas as pd

data = {
    'Fruit': ['Apple', 'Banana', 'Cherry', 'Date', 'Elderberry'],
    'Quantity': [10, 5, 7, 8, 6]
}

df = pd.DataFrame(data)
print(df)

Output:

Markdown
        Fruit  Quantity
0       Apple        10
1      Banana         5
2      Cherry         7
3        Date         8
4  Elderberry         6

We want to check which fruits in the DataFrame appear in the list ['Apple', 'Date', 'Elderberry'].

Python
fruits_to_check = ['Apple', 'Date', 'Elderberry']
result = df['Fruit'].isin(fruits_to_check)
print(result)

Output:

Markdown
0     True
1    False
2    False
3     True
4     True
Name: Fruit, dtype: bool

In this example, isin() checks each fruit in the ‘Fruit’ column against the list ['Apple', 'Date', 'Elderberry']. It returns a boolean Series indicating whether each fruit is in the list.

To filter the DataFrame based on this result, we can use boolean indexing, which keeps only the rows where the value is True.

Python
filtered_df = df[df['Fruit'].isin(fruits_to_check)]
print(filtered_df)

Output:

Markdown
        Fruit  Quantity
0       Apple        10
3        Date         8
4  Elderberry         6

This example filters df to include only rows where the ‘Fruit’ column’s values are in the fruits_to_check list. df['Fruit'].isin(fruits_to_check) returns a boolean Series indicating which rows match, and df[...] uses this Series to filter the DataFrame.
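The same boolean Series can be inverted to keep the rows that are not in the list. The sketch below rebuilds the fruit DataFrame so it runs on its own; the name excluded_df is chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Fruit': ['Apple', 'Banana', 'Cherry', 'Date', 'Elderberry'],
    'Quantity': [10, 5, 7, 8, 6],
})
fruits_to_check = ['Apple', 'Date', 'Elderberry']

# The ~ operator flips each boolean, so the mask selects fruits NOT in the list.
excluded_df = df[~df['Fruit'].isin(fruits_to_check)]
print(excluded_df)
```

This leaves only the Banana and Cherry rows, with their original index labels 1 and 2.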

Example 2: Using isin() with a Dictionary

You can also use a dictionary with isin(). Each key in the dictionary corresponds to a column name, and the values are lists of items you want to check for in those columns.

Python
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [24, 27, 22, 32],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston']
}

df = pd.DataFrame(data)
print(df)

Output:

Markdown
      Name  Age         City
0    Alice   24     New York
1      Bob   27  Los Angeles
2  Charlie   22      Chicago
3    David   32      Houston

Let’s check for specific ages and cities.

Python
conditions = {
    'Age': [24, 32],
    'City': ['New York', 'Chicago']
}

result = df.isin(conditions)
print(result)

Output:

Markdown
    Name    Age   City
0  False   True   True
1  False  False  False
2  False  False   True
3  False   True  False

To filter rows where any condition is met, we can use the any method along with axis=1.

Python
filtered_df = df[result.any(axis=1)]
print(filtered_df)

Output:

Markdown
      Name  Age      City
0    Alice   24  New York
2  Charlie   22   Chicago
3    David   32   Houston

Explanation:

In this example, isin() is used with a dictionary to check multiple columns. The dictionary specifies the accepted values for the ‘Age’ and ‘City’ columns, and the method returns a DataFrame of boolean values. The ‘Name’ column is False throughout because the dictionary has no entry for it. Calling any(axis=1) then keeps every row where at least one column matched, which is why Charlie (city only) and David (age only) appear alongside Alice.
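If you need rows where every condition is met rather than any of them, one possible approach is sketched below. Because columns missing from the dictionary come back as all False, the check is restricted to the dictionary's keys first. The name both_match is chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [24, 27, 22, 32],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston'],
})
conditions = {'Age': [24, 32], 'City': ['New York', 'Chicago']}

result = df.isin(conditions)

# 'Name' is not a key in conditions, so its column is all False.
# Look only at the keyed columns before requiring every one to match.
both_match = result[list(conditions)].all(axis=1)
print(df[both_match])
```

Only Alice satisfies both the age and the city condition. Calling result.all(axis=1) without selecting the keyed columns would return no rows, since ‘Name’ is never True.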

Example 3: Using isin() with Another DataFrame

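You can use isin() to compare data from two DataFrames. Here we compare the ‘ID’ column of one DataFrame against the ‘ID’ column of another. Passing a whole DataFrame to isin() is also possible, but that form matches on index and column labels as well as values, so comparing a single column is usually the better fit for lookups like this.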

Python
data1 = {
    'ID': [1, 2, 3, 4],
    'Name': ['Alice', 'Bob', 'Charlie', 'David']
}

data2 = {
    'ID': [3, 4, 5, 6],
    'Name': ['Charlie', 'David', 'Edward', 'Frank']
}

df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)
print(df1)
print(df2)

Output:

Markdown
# df1
   ID     Name
0   1    Alice
1   2      Bob
2   3  Charlie
3   4    David

# df2
   ID     Name
0   3  Charlie
1   4    David
2   5   Edward
3   6    Frank

Let’s find the common IDs between df1 and df2.

Python
common_ids = df1['ID'].isin(df2['ID'])
print(common_ids)

Output:

Markdown
0    False
1    False
2     True
3     True
Name: ID, dtype: bool

Filter df1 to get only the rows with common IDs.

Python
common_df = df1[common_ids]
print(common_df)

Output:

Markdown
   ID     Name
2   3  Charlie
3   4    David

Explanation:

In this example, isin() checks each ID in df1 against the IDs in df2. It returns a boolean Series indicating whether each ID is in df2. We then use boolean indexing to filter df1 and keep only the rows whose ID appears in both DataFrames.
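For comparison, here is a sketch of what happens when a whole DataFrame is passed to isin(). In that form a cell counts as a match only if the same value sits at the same index and column label in the other DataFrame. The names aligned and by_value are chosen here for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'ID': [1, 2, 3, 4],
                    'Name': ['Alice', 'Bob', 'Charlie', 'David']})
df2 = pd.DataFrame({'ID': [3, 4, 5, 6],
                    'Name': ['Charlie', 'David', 'Edward', 'Frank']})

# Labels must match as well as values: row 2 of df1 is compared with row 2 of df2.
aligned = df1.isin(df2)
print(aligned)

# A dictionary of lists checks membership per column and ignores row position.
by_value = df1.isin({'ID': df2['ID'].tolist(), 'Name': df2['Name'].tolist()})
print(by_value)
```

Here aligned is False everywhere, even though IDs 3 and 4 occur in both DataFrames, because they sit in different rows. The dictionary form marks rows 2 and 3 as True in both columns.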

Conclusion

The isin() method is a versatile tool for filtering and comparing data in Pandas DataFrames. Whether you’re checking for values in a list, dictionary, or another DataFrame, isin() makes it easy to identify and filter data based on your criteria. By understanding and utilizing this method, you can efficiently manage and analyze your data in Pandas.

Happy coding!
