The isin() method in Pandas is a powerful tool used to filter data in a DataFrame. It checks whether each element in the DataFrame is contained in a given list-like object (e.g., a list, set, or Series). The method returns a DataFrame of the same shape as the original, with boolean values indicating whether each element is in the provided values. A Series has the same method, which returns a boolean Series.
Syntax
DataFrame.isin(values)
Parameters:
- values: A list-like object (such as a list, set, or Series), a dictionary, or another DataFrame. A single scalar value is not accepted and raises a TypeError; wrap it in a list instead.
Returns:
- A DataFrame of booleans indicating whether each element is in values.
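As a quick check of which inputs are accepted, here is a minimal sketch. The DataFrame and its column names are hypothetical sample data, not part of the examples in this article. It suggests that a list and a set holding the same items behave the same way, while a bare scalar is rejected.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({"A": [1, 2, 3], "B": [3, 4, 5]})

# A list and a set holding the same items give the same boolean DataFrame.
from_list = df.isin([1, 3])
from_set = df.isin({1, 3})

# A bare scalar is not list-like, so pandas raises a TypeError.
try:
    df.isin(1)
    scalar_ok = True
except TypeError:
    scalar_ok = False

print(from_list)
print(from_list.equals(from_set))
print(scalar_ok)
```

If you only have one value to look for, passing it as a one-element list such as [1] avoids the error.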
Usage Examples
Let’s dive into some examples to understand how the isin() method works.
Example 1: Using isin() with a List
Suppose we have a DataFrame containing information about various fruits.
import pandas as pd
data = {
    'Fruit': ['Apple', 'Banana', 'Cherry', 'Date', 'Elderberry'],
    'Quantity': [10, 5, 7, 8, 6]
}
df = pd.DataFrame(data)
print(df)
Output:
        Fruit  Quantity
0       Apple        10
1      Banana         5
2      Cherry         7
3        Date         8
4  Elderberry         6
We want to check which fruits are in the list ['Apple', 'Date', 'Elderberry'].
fruits_to_check = ['Apple', 'Date', 'Elderberry']
result = df['Fruit'].isin(fruits_to_check)
print(result)
Output:
0     True
1    False
2    False
3     True
4     True
Name: Fruit, dtype: bool
In this example, isin() checks each fruit in the ‘Fruit’ column against the list ['Apple', 'Date', 'Elderberry'] and returns a Series of boolean values indicating whether each fruit is in the list.
To filter the DataFrame based on this result, we use boolean indexing, which keeps only the rows where the condition is True.
filtered_df = df[df['Fruit'].isin(fruits_to_check)]
print(filtered_df)
Output:
        Fruit  Quantity
0       Apple        10
3        Date         8
4  Elderberry         6
This example filters df to include only rows where the ‘Fruit’ column’s values are in the fruits_to_check list. df['Fruit'].isin(fruits_to_check) returns a boolean Series indicating which rows match, and df[...] uses this Series to filter the DataFrame.
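To see what the mask stands for, the sketch below rebuilds the same DataFrame and compares the isin() mask with the equivalent chain of equality checks joined with | (OR). The names mask_isin and mask_or are introduced here for illustration only.

```python
import pandas as pd

df = pd.DataFrame({
    "Fruit": ["Apple", "Banana", "Cherry", "Date", "Elderberry"],
    "Quantity": [10, 5, 7, 8, 6],
})
fruits_to_check = ["Apple", "Date", "Elderberry"]

# Mask built with isin().
mask_isin = df["Fruit"].isin(fruits_to_check)

# The same mask written as equality checks joined with | (OR).
mask_or = (
    (df["Fruit"] == "Apple")
    | (df["Fruit"] == "Date")
    | (df["Fruit"] == "Elderberry")
)

# Both masks select the same rows.
print((mask_isin == mask_or).all())
print(df[mask_isin]["Fruit"].tolist())
```

The isin() form stays a single call no matter how many values are in the list, which is the main reason to prefer it over a long chain of comparisons.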
Example 2: Using isin() with a Dictionary
You can also use a dictionary with isin(). Each key in the dictionary corresponds to a column name, and each value is a list of items to check for in that column. Columns that do not appear in the dictionary come back as all False.
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [24, 27, 22, 32],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston']
}
df = pd.DataFrame(data)
print(df)
Output:
      Name  Age         City
0    Alice   24     New York
1      Bob   27  Los Angeles
2  Charlie   22      Chicago
3    David   32      Houston
Let’s check for specific ages and cities.
conditions = {
    'Age': [24, 32],
    'City': ['New York', 'Chicago']
}
result = df.isin(conditions)
print(result)
Output:
    Name    Age   City
0  False   True   True
1  False  False  False
2  False  False   True
3  False   True  False
To filter rows where any condition is met, we can use the any() method with axis=1.
filtered_df = df[result.any(axis=1)]
print(filtered_df)
Output:
      Name  Age      City
0    Alice   24  New York
2  Charlie   22   Chicago
3    David   32   Houston
Explanation:
In this example, isin() is used with a dictionary to check multiple columns. The dictionary specifies the conditions for the ‘Age’ and ‘City’ columns, and the method returns a DataFrame of boolean values. The ‘Name’ column is all False because it is not a key in the dictionary. Calling any() with axis=1 then keeps only the rows where at least one of the specified conditions is True.
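If you need rows where every condition holds rather than at least one, a possible variation is to switch from any() to all(). The sketch below assumes the same data as this example; the names checked, any_match and all_match are introduced here for illustration. Because the ‘Name’ column is not checked, it is dropped from the boolean result first, otherwise all() would never be True.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [24, 27, 22, 32],
    "City": ["New York", "Los Angeles", "Chicago", "Houston"],
})
conditions = {"Age": [24, 32], "City": ["New York", "Chicago"]}

result = df.isin(conditions)

# Keep only the columns that were actually checked ('Age' and 'City').
checked = result[list(conditions)]

any_match = df[checked.any(axis=1)]  # at least one condition holds
all_match = df[checked.all(axis=1)]  # every condition holds

print(any_match["Name"].tolist())
print(all_match["Name"].tolist())
```

With this data, only Alice satisfies both the age and the city condition, while Charlie and David each satisfy just one.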
Example 3: Using isin() with Another DataFrame
You can use isin() to compare two DataFrames. Here we check the values of one DataFrame’s column against a column of the other.
data1 = {
    'ID': [1, 2, 3, 4],
    'Name': ['Alice', 'Bob', 'Charlie', 'David']
}
data2 = {
    'ID': [3, 4, 5, 6],
    'Name': ['Charlie', 'David', 'Edward', 'Frank']
}
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)
print(df1)
print(df2)
Output for df1 and df2:
# df1
   ID     Name
0   1    Alice
1   2      Bob
2   3  Charlie
3   4    David
# df2
   ID     Name
0   3  Charlie
1   4    David
2   5   Edward
3   6    Frank
Let’s find the common IDs between df1 and df2.
common_ids = df1['ID'].isin(df2['ID'])
print(common_ids)
Output:
0    False
1    False
2     True
3     True
Name: ID, dtype: bool
Filter df1 to get only the rows with common IDs.
common_df = df1[common_ids]
print(common_df)
Output:
   ID     Name
2   3  Charlie
3   4    David
Explanation:
In this example, isin() checks each ID in df1 against the IDs in df2. It returns a Series of boolean values indicating whether each ID is in df2. We then use boolean indexing to filter df1 and keep only the rows where the ID is common to both DataFrames.
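Passing a whole DataFrame to isin() behaves differently from the column-to-column check used here: pandas compares cells only where both the index label and the column label match. The sketch below reuses df1 and df2 to show the contrast; the names aligned and by_value are introduced here for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Name": ["Alice", "Bob", "Charlie", "David"],
})
df2 = pd.DataFrame({
    "ID": [3, 4, 5, 6],
    "Name": ["Charlie", "David", "Edward", "Frank"],
})

# DataFrame argument: each cell of df1 is compared with the cell of df2
# at the same index and column label, so row 0 is checked against row 0.
aligned = df1.isin(df2)

# Series argument: membership is checked against all values of df2['ID'],
# regardless of which row they sit in.
by_value = df1["ID"].isin(df2["ID"])

print(aligned)
print(by_value.tolist())
```

Since no row of df1 holds the same values as the row with the same index in df2, the aligned comparison is False everywhere, even though IDs 3 and 4 appear in both DataFrames. For "does this value appear anywhere in the other table" questions, the column-to-column form is usually the one you want.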
Conclusion
The isin() method is a versatile tool for filtering and comparing data in Pandas DataFrames. Whether you’re checking for values in a list, dictionary, or another DataFrame, isin() makes it easy to identify and filter data based on your criteria. By understanding and utilizing this method, you can efficiently manage and analyze your data in Pandas.
Happy coding!