Pandas rank() function – Explained with examples

Pandas is a Python library for manipulating structured data. One of its useful functions is rank(), which assigns a rank to each value in a DataFrame. Ranking is helpful whenever the relative standing of values within a dataset matters more than the values themselves.

What is rank()?

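The rank() function assigns ranks to entries in a DataFrame based on their values. The rank of a value is its position in a sorted list of values, starting at 1 for the smallest value when ranking in ascending order. By default, ranks are computed separately within each column. The function returns a DataFrame of the same shape, with each element replaced by its rank.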

Syntax
Python
DataFrame.rank(axis=0, method='average', numeric_only=False, na_option='keep', ascending=True, pct=False)
Parameters
  1. axis: {0 or ‘index’, 1 or ‘columns’}, default 0
    • The axis along which to rank. 0 or ‘index’ ranks the values within each column (down the rows); 1 or ‘columns’ ranks the values within each row (across the columns).
  2. method: {‘average’, ‘min’, ‘max’, ‘first’, ‘dense’}, default ‘average’
    • Specifies the method to assign ranks to equal values:
      • ‘average’: Assigns the average rank to the tied values.
      • ‘min’: Assigns the minimum rank to the tied values.
      • ‘max’: Assigns the maximum rank to the tied values.
      • ‘first’: Assigns ranks to tied values in the order they appear in the data.
      • ‘dense’: Like ‘min’, but the rank always increases by 1 between groups, so no ranks are skipped.
  3. numeric_only: bool, default False (None in pandas versions before 2.0)
    • If True, only float, int, and boolean columns are ranked.
  4. na_option: {‘keep’, ‘top’, ‘bottom’}, default ‘keep’
    • Specifies how to handle NA values:
      • ‘keep’: Leaves NA values in place; their rank is NaN.
      • ‘top’: Assigns the smallest ranks to NA values, so they are ranked first.
      • ‘bottom’: Assigns the largest ranks to NA values, so they are ranked last.
  5. ascending: bool, default True
    • If True, ranks in ascending order. If False, ranks in descending order.
  6. pct: bool, default False
    • If True, returns the ranks as fractions of the number of values instead of absolute ranks. Each result lies in the range (0, 1].
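
The examples in this article contain no missing values, so here is a minimal sketch of how na_option behaves. The scores DataFrame and its values are hypothetical, chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value (not from the examples in this article)
scores = pd.DataFrame({'score': [10.0, np.nan, 30.0, 20.0]})

# Rank the same column under each na_option setting
na_comparison = pd.DataFrame({
    'score': scores['score'],
    'keep': scores.rank(na_option='keep')['score'],      # NaN keeps a NaN rank
    'top': scores.rank(na_option='top')['score'],        # NaN receives rank 1
    'bottom': scores.rank(na_option='bottom')['score'],  # NaN receives rank 4
})
print(na_comparison)
```

With ‘keep’, the missing entry is left out of the ranking, so the remaining three values are ranked 1 to 3.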

Example Usage

Let’s walk through some examples to understand how rank() works in practice.

Example 1: Basic Ranking
Python
import pandas as pd

# Create a simple DataFrame
df = pd.DataFrame({'A': [10, 20, 20, 40], 'B': [30, 20, 10, 40]})

# Rank the DataFrame
ranked_df = df.rank()
print(ranked_df)

Output:

Markdown
     A    B
0  1.0  3.0
1  2.5  2.0
2  2.5  1.0
3  4.0  4.0

In this example, the rank() function is applied to a DataFrame df. Each value is assigned a rank based on its position within its own column. The two 20s in column ‘A’ occupy positions 2 and 3 in sorted order, so with the default ‘average’ method each receives the rank 2.5.
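
The ranking above runs down each column. As a short sketch of the other direction, the same df can be ranked with axis=1, which compares the values inside each row instead:

```python
import pandas as pd

# Same DataFrame as in Example 1
df = pd.DataFrame({'A': [10, 20, 20, 40], 'B': [30, 20, 10, 40]})

# axis=1 ranks A against B within every row
row_ranks = df.rank(axis=1)
print(row_ranks)
```

In rows 1 and 3 the two values are equal, so both columns receive the average rank 1.5.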

Example 2: Using Different Methods
Python
# Min method
ranked_df_min = df.rank(method='min')
print(ranked_df_min)

Output:

Markdown
     A    B
0  1.0  3.0
1  2.0  2.0
2  2.0  1.0
3  4.0  4.0

This example demonstrates the ‘min’ method. With method='min', tied values receive the lowest rank in their group instead of the average, so both 20s in column ‘A’ are assigned a rank of 2. Rank 3 is skipped, and 40 still receives a rank of 4.
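
To see how the remaining methods treat the same tie, the following sketch ranks column ‘A’ once per method and places the results side by side. The name method_ranks is introduced here only for illustration.

```python
import pandas as pd

# Same DataFrame as in Example 1
df = pd.DataFrame({'A': [10, 20, 20, 40], 'B': [30, 20, 10, 40]})

methods = ['average', 'min', 'max', 'first', 'dense']

# One column of ranks per tie-breaking method
method_ranks = pd.DataFrame({m: df['A'].rank(method=m) for m in methods})
method_ranks.insert(0, 'value', df['A'])
print(method_ranks)
```

For the tied 20s, ‘max’ gives 3 and 3, ‘first’ gives 2 and 3 in order of appearance, and ‘dense’ gives 2 and 2 while ranking 40 as 3 rather than 4.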

Example 3: Ranking in Descending Order
Python
ranked_df_desc = df.rank(ascending=False)
print(ranked_df_desc)

Output:

Markdown
     A    B
0  4.0  2.0
1  2.5  3.0
2  2.5  4.0
3  1.0  1.0

Here, the DataFrame is ranked in descending order by setting ascending=False in the rank() function. This results in the highest values receiving the lowest ranks, and vice versa. For instance, the highest value (40) in column ‘A’ gets a rank of 1.

Example 4: Percentage Ranks
Python
ranked_df_pct = df.rank(pct=True)
print(ranked_df_pct)

Output:

Markdown
       A     B
0  0.250  0.75
1  0.625  0.50
2  0.625  0.25
3  1.000  1.00

In this example, the rank() function is used with pct=True, which divides each rank by the number of values in the column. The result expresses the relative standing of each value within its column on a scale from 0 to 1. For instance, the 20s in column ‘A’ have an average rank of 2.5 among 4 values, so their percentage rank is 2.5 / 4 = 0.625.
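
As a quick check of that relationship, this sketch compares pct=True against a manual calculation for the same df. It assumes the columns contain no missing values, as is the case here.

```python
import pandas as pd

# Same DataFrame as in Example 1
df = pd.DataFrame({'A': [10, 20, 20, 40], 'B': [30, 20, 10, 40]})

pct_ranks = df.rank(pct=True)

# Manual version: ordinary ranks divided by the number of values per column
manual = df.rank() / df.count()

# Raises an AssertionError if the two results differ
pd.testing.assert_frame_equal(pct_ranks, manual)
print(pct_ranks)
```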

Practical Applications

The rank() function is particularly useful in scenarios where relative positioning is more important than absolute values. Some practical applications include:

  • Financial Analysis: Ranking stocks based on performance metrics.
  • Sports Analytics: Ranking players or teams based on performance stats.
  • Academic Grading: Ranking students based on scores.
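
As one illustration of the grading case, here is a minimal sketch that turns exam scores into class positions. The students DataFrame, its names, and its scores are invented for this example.

```python
import pandas as pd

# Hypothetical exam scores, invented for illustration
students = pd.DataFrame({
    'student': ['Ana', 'Ben', 'Cara', 'Dev'],
    'score': [88, 95, 88, 72],
})

# Highest score gets position 1; tied students share the better position
students['position'] = (
    students['score'].rank(method='min', ascending=False).astype(int)
)
print(students.sort_values('position'))
```

Combining ascending=False with method='min' is one reasonable choice here: the two students with 88 share position 2, and the next student is placed 4th rather than 3rd.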

Conclusion

The rank() function in Pandas is a versatile tool for ranking data within a DataFrame. Its parameters control how ties are resolved, how missing values are handled, whether ranks run in ascending or descending order, and whether they are returned as percentages, so you can adapt it to the needs of your analysis.
