How to apply functions to Rows and Columns using ‘apply()’ in Pandas

When working with data in Python, the pandas library is an essential tool for data manipulation and analysis. One of its most useful features is the apply() method, which lets you apply a function along either the rows or the columns of a DataFrame. In this blog, we’ll explore how to use the apply() method for row and column operations in DataFrames.

Introduction to pandas ‘apply()’

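The apply() method in pandas is a versatile tool that lets you apply a custom function along an axis of a DataFrame. The axis parameter determines what the function receives: with axis=0 (the default), the function is called once per column, and with axis=1, it is called once per row. By default, each column or row is passed to the function as a pandas Series.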
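A quick way to see which axis does what is to return the name of whatever Series the function receives. This is a minimal sketch on a small two-column DataFrame made up for illustration: a column Series is named after its column label, and a row Series is named after its index label.

```python
import pandas as pd

# Small illustrative DataFrame
df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})

# axis=0 (default): the function receives each column; s.name is the column label
per_column = df.apply(lambda s: s.name, axis=0)

# axis=1: the function receives each row; s.name is the row's index label
per_row = df.apply(lambda s: s.name, axis=1)

print(per_column.tolist())  # ['A', 'B']
print(per_row.tolist())     # [0, 1]
```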

Syntax:

Python
DataFrame.apply(func, axis=0, raw=False, result_type=None, args=(), **kwds)
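  • func: The function to apply to each column or row.
  • axis: The axis along which the function is applied: 0 or ‘index’ passes each column to the function, and 1 or ‘columns’ passes each row.
  • raw: If False (the default), each row or column is passed to the function as a Series; if True, it is passed as a NumPy ndarray instead.
  • result_type: ‘expand’, ‘reduce’, ‘broadcast’, or None (default). These options only take effect when axis=1. ‘expand’ turns list-like results into columns, ‘reduce’ returns a Series of the results where possible instead of expanding list-like results, and ‘broadcast’ broadcasts the results back to the original shape of the DataFrame, keeping its index and columns. With None, list-like results are returned as a Series of lists, while a returned Series is expanded into columns.
  • args and **kwds: Additional positional arguments (args, a tuple) and keyword arguments passed to func after the row or column.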
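To see several of these parameters in action, here is a minimal sketch on a made-up three-row DataFrame. The helper scale_and_shift is hypothetical and exists only to show how args and extra keyword arguments are forwarded to func; raw=True and result_type='expand' are used as documented above.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Hypothetical helper (not part of pandas): scale a column, then add an offset
def scale_and_shift(col, factor, offset=0):
    return col * factor + offset

# args supplies positional arguments after the column; offset=1 is forwarded as a keyword
scaled = df.apply(scale_and_shift, args=(10,), offset=1)

# raw=True passes each column as a NumPy ndarray instead of a Series
ranges = df.apply(lambda arr: arr.max() - arr.min(), raw=True)

# result_type='expand' (only meaningful with axis=1) turns list results into columns
expanded = df.apply(lambda row: [row['A'] + row['B'], row['A'] * row['B']],
                    axis=1, result_type='expand')

print(scaled)
print(ranges)
print(expanded)
```

In this sketch, expanded ends up with two new columns labeled 0 and 1, one per element of the returned list.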

Applying Functions to Columns

Let’s start by applying a function to each column in a DataFrame. We’ll use a simple example DataFrame and demonstrate how to calculate the sum of each column.

Python
import pandas as pd

# Create a sample DataFrame
data = {
    'A': [1, 2, 3, 4],
    'B': [5, 6, 7, 8],
    'C': [9, 10, 11, 12]
}
df = pd.DataFrame(data)

# Define a function to sum the elements of a column
def column_sum(column):
    return column.sum()

# Apply the function to each column
column_sums = df.apply(column_sum, axis=0)
print(column_sums)

Output:

A    10
B    26
C    42
dtype: int64

In this example, the column_sum function is applied to each column of the DataFrame, resulting in a Series containing the sum of each column.

NOTE: sum() is a pandas Series method that adds up all the values in the Series (here, a single column) and returns the total.
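For a simple reduction like this, apply() is not strictly needed: DataFrame.sum() already works column-wise by default. This minimal sketch rebuilds the same sample DataFrame and checks that both approaches give the same totals.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8], 'C': [9, 10, 11, 12]})

# Column-wise apply, as in the example above
via_apply = df.apply(lambda col: col.sum(), axis=0)

# Built-in reduction: DataFrame.sum() sums each column by default
via_method = df.sum()

print(via_apply.tolist() == via_method.tolist())  # True
```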


Applying Functions to Rows

Next, let’s apply a function to each row in the DataFrame. We’ll demonstrate how to calculate the mean of each row.

Python
# Define a function to calculate the mean of a row
def row_mean(row):
    return row.mean()

# Apply the function to each row
row_means = df.apply(row_mean, axis=1)
print(row_means)

Output:

0    5.0
1    6.0
2    7.0
3    8.0
dtype: float64

In this example, the row_mean function is applied to each row of the DataFrame, resulting in a Series containing the mean of each row.

NOTE: mean() is a pandas Series method that calculates the mean (average) of all values in the Series (here, a single row).
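Because each row arrives as a Series indexed by the column labels, a row function can also pick out individual columns by name. This minimal sketch compares the row means with the built-in DataFrame.mean(axis=1) and then computes a per-row difference between two columns; the name spread is just an illustrative variable.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8], 'C': [9, 10, 11, 12]})

# Row-wise apply: each row is a Series whose index is ['A', 'B', 'C']
via_apply = df.apply(lambda row: row.mean(), axis=1)

# Built-in equivalent: average across the columns of each row
via_method = df.mean(axis=1)

# Access specific columns of each row by label
spread = df.apply(lambda row: row['C'] - row['A'], axis=1)

print(via_apply.tolist() == via_method.tolist())  # True
print(spread.tolist())                            # [8, 8, 8, 8]
```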


Using Lambda Functions with apply()

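You can also use lambda functions with apply() for concise operations. Here’s an example that uses a lambda function to add a constant value to each element in the DataFrame. Because axis defaults to 0, the lambda receives one column (a Series) at a time, and x + 10 adds 10 to every element of that column.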

Python
# Add 10 to each element using a lambda function
df_add_10 = df.apply(lambda x: x + 10)
print(df_add_10)

Output:

    A   B   C
0  11  15  19
1  12  16  20
2  13  17  21
3  14  18  22
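To confirm what the lambda actually receives, and that plain DataFrame arithmetic gives the same result, here is a minimal sketch. It uses pandas.testing.assert_frame_equal, which raises an AssertionError if the two frames differ.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8], 'C': [9, 10, 11, 12]})

# With the default axis=0, each call receives a whole column as a Series
received = df.apply(lambda x: type(x).__name__)
print(received.tolist())  # ['Series', 'Series', 'Series']

df_add_10 = df.apply(lambda x: x + 10)

# Vectorized arithmetic on the whole DataFrame; raises if the results differ
pd.testing.assert_frame_equal(df_add_10, df + 10)
```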

Conditional Operations with apply()

You can use apply() for more complex operations, such as conditional logic. For example, let’s create a new column that categorizes each row based on the sum of its values.

Python
# Define a function to categorize rows
def categorize_row(row):
    if row.sum() > 20:
        return 'High'
    else:
        return 'Low'

# Apply the function to each row and create a new column
df['Category'] = df.apply(categorize_row, axis=1)
print(df)

Output:

   A  B   C Category
0  1  5   9      Low
1  2  6  10      Low
2  3  7  11     High
3  4  8  12     High

In this example, the categorize_row function is applied to each row, and a new column Category is created based on the sum of the row values.
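The same categorization can also be written without a per-row Python function. This minimal sketch swaps in numpy.where on the row sums, which labels every row in one vectorized call; it is an alternative technique, not what apply() does internally. Row sums are 15, 18, 21 and 24, so the last two rows exceed 20.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8], 'C': [9, 10, 11, 12]})

# Sum only the numeric columns, so rerunning after adding 'Category' still works
row_sums = df[['A', 'B', 'C']].sum(axis=1)

# np.where chooses 'High' or 'Low' for all rows at once
df['Category'] = np.where(row_sums > 20, 'High', 'Low')
print(df)
```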

Conclusion

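The DataFrame.apply() method is a powerful tool for performing row and column operations in DataFrames. Whether you need to apply simple mathematical functions or complex conditional logic, apply() provides a flexible way to manipulate your data. Keep in mind that apply() calls your function once per row or column in Python, so built-in vectorized methods such as sum(), mean(), or plain arithmetic are usually faster when they can do the same job. By mastering this method, you can significantly enhance your data analysis capabilities with pandas.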

Experiment with different functions and DataFrame structures to fully harness the potential of DataFrame.apply() in your data projects! Happy Coding!!!
