Pandas DataFrame aggregate() Function – Explained

Pandas is an open-source data analysis and manipulation library for Python that provides data structures and operations for working with numerical tables and time series. One of its most useful features is the aggregate() method (also available under the shorter alias agg()), which summarizes data by applying one or more functions in a single call. This blog post covers what aggregate() does, its syntax, and how to use it effectively.

What is aggregate()?

The aggregate() function in Pandas applies one or more aggregation operations along an axis of a DataFrame. An aggregation function takes multiple values, by default all the values in a column, and returns a single summarizing value. Common aggregation functions include mean, sum, min, and max.
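
As a quick illustration of that definition, the sketch below reduces a single column to one value and checks that agg() is simply another name for aggregate(). The Series name `prices` and its values are made up for this example.

```python
import pandas as pd

# Hypothetical column of values, used only for illustration.
prices = pd.Series([1, 2, 3, 4], name="prices")

# An aggregation collapses many values into one summarizing value.
total = prices.aggregate("sum")
average = prices.aggregate("mean")
print(total, average)  # 10 2.5

# agg is an alias of aggregate, so both names refer to the same method.
same_method = pd.DataFrame.agg is pd.DataFrame.aggregate
print(same_method)  # True
```

Because the two names are interchangeable, the rest of this post sticks to aggregate() for consistency.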

Syntax
Python
DataFrame.aggregate(func=None, axis=0, *args, **kwargs)
  • func: A function, a function name given as a string (for example 'sum'), a list of these, or a dictionary that maps column labels to functions or lists of functions.
  • axis: Axis along which the function is applied:
      • 0 or 'index': Apply the function to each column (default).
      • 1 or 'columns': Apply the function to each row.
  • *args: Positional arguments to pass to the function.
  • **kwargs: Keyword arguments to pass to the function.
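
The *args and **kwargs parameters are easiest to understand with a function that takes an extra argument. The following is a minimal sketch rather than an official example: scaled_range and its scale parameter are hypothetical names invented here, and the small DataFrame exists only to make the snippet runnable.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [5, 6, 7, 8]})

def scaled_range(column, scale=1):
    """Hypothetical helper: the column's max-min range multiplied by `scale`."""
    return (column.max() - column.min()) * scale

# The extra keyword argument is forwarded to scaled_range for every column.
result = df.aggregate(scaled_range, scale=2)
print(result)
```

Each column has a range of 3, so with scale=2 every entry in the result is 6.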

Applying aggregate()

Let’s explore some examples to understand how aggregate() works in different scenarios.

Example 1: Applying a Single Aggregation Function

Suppose we have the following DataFrame:

Python
import pandas as pd

data = {
    'A': [1, 2, 3, 4],
    'B': [5, 6, 7, 8],
    'C': [9, 10, 11, 12]
}

df = pd.DataFrame(data)

To apply a single aggregation function, such as finding the sum of each column, we can use:

Python
result = df.aggregate('sum')
print(result)

Output:

A    10
B    26
C    42
dtype: int64

Example 2: Applying Multiple Aggregation Functions

You can pass a list of functions to aggregate() to apply multiple aggregations at once. For instance, if we want to calculate both the sum and mean of each column:

Python
result = df.aggregate(['sum', 'mean'])
print(result)

Output:

         A     B     C
sum   10.0  26.0  42.0
mean   2.5   6.5  10.5

Example 3: Aggregating Specific Columns with Different Functions

You can specify different aggregation functions for different columns by passing a dictionary where keys are column names and values are functions or lists of functions.
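
In the simplest case every dictionary value is a single function name, and the result is a Series with one entry per listed column. The sketch below assumes a DataFrame with the same values as the one used in this post; columns left out of the dictionary (here B) are simply not aggregated.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [5, 6, 7, 8], "C": [9, 10, 11, 12]})

# One function per column: 'sum' for A and 'max' for C.
per_column = df.aggregate({"A": "sum", "C": "max"})
print(per_column)
```

Once any value in the dictionary is a list of functions, the result becomes a DataFrame with one row per function name instead.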

Python
result = df.aggregate({
    'A': 'sum',
    'B': ['mean', 'min'],
    'C': 'max'
})
print(result)

Output:

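         A    B     C
sum   10.0  NaN   NaN
mean   NaN  6.5   NaN
min    NaN  5.0   NaN
max    NaN  NaN  12.0

Each function is applied only to the column it was assigned to, so every other cell in the result is NaN.

Example 4: Aggregating Along Rows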

By changing the axis parameter, you can apply aggregation functions along rows instead of columns.

Python
result = df.aggregate('sum', axis=1)
print(result)

Output:

0    15
1    18
2    21
3    24
dtype: int64
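
The axis can also be given by name. As a small check, assuming a DataFrame with the same values as above, the sketch below confirms that axis=1 and axis='columns' give the same row totals, and that both match calling sum(axis=1) directly.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [5, 6, 7, 8], "C": [9, 10, 11, 12]})

by_number = df.aggregate("sum", axis=1)          # axis given as an integer
by_name = df.aggregate("sum", axis="columns")    # axis given by name
direct = df.sum(axis=1)                          # the plain method call

print(by_number.equals(by_name))
print(by_number.equals(direct))
```

For a single built-in function the direct method call is usually shorter; aggregate() becomes more useful once several functions are combined.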

Custom Aggregation Functions

You can also pass custom functions to aggregate(). Here’s an example of using a custom lambda function to find the range (difference between max and min values) for each column:

Python
result = df.aggregate(lambda x: x.max() - x.min())
print(result)

Output:

A    3
B    3
C    3
dtype: int64
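
When several functions are combined in a list, a lambda appears in the result under the label <lambda>, which is hard to read. One option, sketched here with a hypothetical helper called value_range, is to define a named function and pass it alongside the built-in names; pandas then uses the function's name as the row label.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [5, 6, 7, 8], "C": [9, 10, 11, 12]})

def value_range(column):
    """Hypothetical helper: difference between the largest and smallest value."""
    return column.max() - column.min()

# Mix built-in function names with the custom function in one call.
summary = df.aggregate(["min", "max", value_range])
print(summary)
```

The resulting DataFrame has one row each for min, max, and value_range, with the original columns A, B, and C across the top.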

Conclusion

The aggregate() function in Pandas is a versatile tool for performing multiple aggregation operations on DataFrames. Whether you need to summarize your data by applying built-in functions like sum, mean, or custom functions, aggregate() provides a clean and efficient way to achieve this. By mastering aggregate(), you can streamline your data analysis workflow and gain deeper insights from your data.

Experiment with different aggregation functions and explore the full potential of the aggregate() function in your data analysis projects!
