Pandas query() Method – Explained with examples

In data analysis and manipulation, efficiently filtering data is crucial. The Pandas library in Python offers a powerful and intuitive method for this purpose: the query() method. This method allows you to filter DataFrame rows based on a query expression, providing a more readable and concise alternative to traditional indexing.

What is query() Method

The query() method in Pandas enables you to filter DataFrame rows using a string expression. This method leverages the DataFrame’s column names directly within the expression, allowing for clean and readable code. The syntax for the query() method is:

Python
DataFrame.query(expr, inplace=False, **kwargs)
Parameters
  • expr: A string expression to evaluate. This expression must be a valid Python expression and can include DataFrame column names.
  • inplace: A boolean value indicating whether to modify the DataFrame in place. The default is False, meaning the method returns a new DataFrame.
  • kwargs: Additional keyword arguments to pass to the underlying DataFrame.eval() method.
Return Value

The query() method returns a DataFrame that matches the query expression.

Using the ‘query()’ Method

Example 1: Basic Filtering

Consider a DataFrame df containing information about a group of students:

Python
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [24, 27, 22, 23],
    'Score': [85, 62, 90, 70]
}

df = pd.DataFrame(data)

The dataframe will look like,

Markdown
      Name  Age  Score
0    Alice   24     85
1      Bob   27     62
2  Charlie   22     90
3    David   23     70

To filter students with a score greater than 80, you can use the query() method as follows:

Python
high_scorers = df.query('Score > 80')
print(high_scorers)

Output:

Markdown
      Name  Age  Score
0    Alice   24     85
2  Charlie   22     90

In this example, we have a DataFrame with information about students, including their names, ages, and scores. By using the query() method, we filter the DataFrame to include only those students who have a score greater than 80. This results in a new DataFrame containing only the students who meet the specified criteria, making it easy to identify high scorers.

Example 2: Filtering with Multiple Conditions

You can also combine multiple conditions using logical operators like & (and), | (or), and ~ (not). For instance, to filter students older than 22 and with a score greater than 70:

Python
filtered_students = df.query('Age > 22 & Score > 70')
print(filtered_students)

Output:

Markdown
    Name  Age  Score
0  Alice   24     85

This example demonstrates how to apply multiple conditions simultaneously. We filter the students to find those who are older than 22 and have a score greater than 70. By using the logical & operator, we combine these conditions in the query() method, resulting in a new DataFrame that includes only the students who satisfy both criteria.

Example 3: Using Variables in the Query

If you have variables that you want to include in your query, you can use the @ symbol to reference them. For example, to filter students based on a minimum score stored in a variable:

Python
min_score = 70
qualified_students = df.query('Score > @min_score')
print(qualified_students)

Output:

Markdown
      Name  Age  Score
0    Alice   24     85
2  Charlie   22     90
Example 4: Handling Column Names with Spaces

If your DataFrame has column names with spaces, you can use backticks (`) to enclose them in the query expression. For example:

Python
data = {
    'Student Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Student Age': [24, 27, 22, 23],
    'Test Score': [85, 62, 90, 70]
}

df = pd.DataFrame(data)

filtered_students = df.query('`Test Score` > 80')
print(filtered_students)

Output:

Markdown
  Student Name  Student Age  Test Score
0        Alice           24          85
2      Charlie           22          90
Benefits of Using the query() Method
  1. Readability: The query() method allows for more readable and concise code compared to traditional indexing methods.
  2. Flexibility: It supports complex expressions with logical operators and can reference variables directly within the query.
  3. Performance: The query() method can be more efficient for large datasets, as it leverages the underlying numexpr library for optimized evaluation.
Conclusion

The Pandas query() method is a powerful tool for filtering DataFrame rows based on a string expression. Its intuitive syntax, combined with the ability to handle complex conditions and reference variables, makes it a valuable addition to any data analyst’s toolkit. By mastering the query() method, you can write cleaner and more efficient data filtering code, enhancing the overall readability and performance of your data manipulation tasks.

Also Explore:

Leave a Comment