Pandas DataFrame quantile() Method – Explained with examples

The quantile() method in pandas computes quantiles for numerical data, which helps you understand how values are distributed within your dataset. In this post, we will look at the quantile() method's syntax, parameters, and practical examples.

What is a Quantile?

A quantile is a cut point that divides sorted data into contiguous intervals, each holding an equal share of the observations. For instance:

  • The 0.5 quantile (or the median) splits the data into two equal halves.
  • The 0.25 and 0.75 quantiles (first and third quartiles), together with the median, divide the data into four equal parts.
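
To make the idea concrete, here is a minimal sketch of how a quantile can be located by hand. The helper name linear_quantile is hypothetical (it is not part of pandas), and the sketch assumes a non-empty input and the position rule q * (n - 1) with linear interpolation between neighbouring values, which is the convention behind the default ‘linear’ interpolation option.

```python
import math

def linear_quantile(values, q):
    """Hypothetical helper: q-th quantile using linear interpolation."""
    ordered = sorted(values)
    # Fractional position of the quantile within the sorted data
    position = q * (len(ordered) - 1)
    lower = math.floor(position)
    upper = math.ceil(position)
    fraction = position - lower
    # Interpolate between the two neighbouring values
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction

print(linear_quantile([1, 2, 3, 4, 5], 0.5))   # median of five values
print(linear_quantile([1, 2, 3, 4], 0.25))     # first quartile of four values
```

With five values the median lands exactly on the third one, while the first quartile of four values falls between the first and second and has to be interpolated.
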

Syntax of quantile()
Python
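DataFrame.quantile(q=0.5, axis=0, numeric_only=False, interpolation='linear', method='single')
  • q: The quantile(s) to compute, as a float or list of floats between 0 and 1 (default is 0.5).
  • axis: The axis to compute along. 0 or ‘index’ (the default) returns one quantile per column; 1 or ‘columns’ returns one per row.
  • numeric_only: Whether to include only float, int, or boolean data. The default is False since pandas 2.0; earlier versions defaulted to True.
  • interpolation: The interpolation method to use when the desired quantile lies between two data points: ‘linear’ (default), ‘lower’, ‘higher’, ‘nearest’, or ‘midpoint’.
  • method: Whether to compute quantiles per column (‘single’, the default) or over all columns at once (‘table’). This parameter exists only in recent pandas versions.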
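
As a quick illustration of the q and numeric_only parameters, the sketch below uses a small made-up DataFrame (the names mixed, score, and label are illustrative only) that contains a text column. Passing numeric_only=True explicitly keeps the behaviour the same regardless of which pandas version supplies the default.

```python
import pandas as pd

# Hypothetical data: one numeric column and one text column
mixed = pd.DataFrame({
    'score': [10, 20, 30, 40],
    'label': ['a', 'b', 'c', 'd'],
})

# Restrict the computation to numeric columns and request two quantiles
numeric_quartiles = mixed.quantile([0.25, 0.75], numeric_only=True)
print(numeric_quartiles)
```

The result should contain only the score column, with one row per requested quantile.
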

Examples

Let’s go through some examples to understand how to use the quantile() method.

Example 1: Basic Usage
Python
import pandas as pd

data = {
    'A': [1, 2, 3, 4, 5],
    'B': [10, 20, 30, 40, 50]
}
df = pd.DataFrame(data)
print(df)

# Compute the 50th percentile (median) for each column
median = df.quantile(0.5)
print(median)

Output:

Markdown
# Original Dataframe 
   A   B
0  1  10
1  2  20
2  3  30
3  4  40
4  5  50

# Compute the 50th percentile (median) for each column
A     3.0
B    30.0
Name: 0.5, dtype: float64

In this example, the median of column ‘A’ is 3, and the median of column ‘B’ is 30.

Example 2: Multiple Quantiles
Python
# Compute the 25th, 50th, and 75th percentiles
quantiles = df.quantile([0.25, 0.5, 0.75])
print(quantiles)

Output:

Markdown
        A     B
0.25  2.0  20.0
0.50  3.0  30.0
0.75  4.0  40.0

Here, we computed the first quartile (0.25), median (0.5), and third quartile (0.75) for each column.
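
One common use of the quartiles is the interquartile range (IQR), the distance between the third and first quartile. The following is a small sketch that rebuilds the same example DataFrame so it can run on its own; the variable names quartiles and iqr are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [10, 20, 30, 40, 50]})

# Rows of the result are indexed by the requested quantiles
quartiles = df.quantile([0.25, 0.75])

# Interquartile range per column: third quartile minus first quartile
iqr = quartiles.loc[0.75] - quartiles.loc[0.25]
print(iqr)
```

For this data the IQR is 2.0 for column ‘A’ and 20.0 for column ‘B’, matching the difference between the 0.75 and 0.25 rows shown above.
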

Example 3: Specifying Axis
Python
# Compute the median of each row (axis=1 reduces across the columns)
row_medians = df.quantile(0.5, axis=1)
print(row_medians)

Output:

Markdown
0     5.5
1    11.0
2    16.5
3    22.0
4    27.5
Name: 0.5, dtype: float64

This computes the median for each row by setting axis=1.
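
As a sanity check, the row-wise 0.5 quantile should agree with the row-wise median. The sketch below rebuilds the example DataFrame so it is self-contained and compares the two results by value, since the two Series carry different names.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [10, 20, 30, 40, 50]})

row_quantiles = df.quantile(0.5, axis=1)
row_medians = df.median(axis=1)

# Compare values only; the Series names differ (0.5 vs. no name)
max_difference = (row_quantiles - row_medians).abs().max()
print(max_difference)
```

A difference of zero (or very close to it) confirms that both calls describe the same statistic for each row.
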

Example 4: Interpolation Methods

The interpolation parameter allows you to specify how to interpolate when the desired quantile lies between two data points. Let’s compute the 50th percentile (median) using different interpolation methods.

Python
# Compute the 50th percentile using different interpolation methods
linear = df.quantile(0.5, interpolation='linear')
lower = df.quantile(0.5, interpolation='lower')
higher = df.quantile(0.5, interpolation='higher')
nearest = df.quantile(0.5, interpolation='nearest')
midpoint = df.quantile(0.5, interpolation='midpoint')

# Print each label on its own line, followed by the result and a blank line
print(f'Linear:\n{linear}\n')
print(f'Lower:\n{lower}\n')
print(f'Higher:\n{higher}\n')
print(f'Nearest:\n{nearest}\n')
print(f'Midpoint:\n{midpoint}\n')

Output:

Python
Linear:
A     3.0
B    30.0
Name: 0.5, dtype: float64

Lower:
A     3
B    30
Name: 0.5, dtype: int64

Higher:
A     3
B    30
Name: 0.5, dtype: int64

Nearest:
A     3
B    30
Name: 0.5, dtype: int64

Midpoint:
A     3.0
B    30.0
Name: 0.5, dtype: float64

Explanation:

  • Linear: The default method. It calculates the quantile by linearly interpolating between the two nearest data points. In this case, it results in a median of 3.0 for column ‘A’ and 30.0 for column ‘B’.
  • Lower: Chooses the lower of the two nearest data points. Here, it results in 3 for column ‘A’ and 30 for column ‘B’.
  • Higher: Chooses the higher of the two nearest data points. The result is the same as the lower interpolation, giving 3 for column ‘A’ and 30 for column ‘B’.
  • Nearest: Selects whichever of the two bounding data points is closest. It also gives 3 for column ‘A’ and 30 for column ‘B’.
  • Midpoint: Returns the average of the two bounding data points. Here the result is 3.0 for column ‘A’ and 30.0 for column ‘B’.

All five methods agree in this example because each column has five values, so the 0.5 quantile falls exactly on the third value and no interpolation is needed. The methods only produce different results when the requested quantile lies between two data points.
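
To see the interpolation methods actually disagree, the quantile has to fall between two data points. The sketch below is an illustration under assumed inputs: the DataFrame even_df and the choice of q=0.4 are made up for this purpose and do not come from the examples above. With four values per column, the 0.4 quantile sits at position 1.2, between the second and third values.

```python
import pandas as pd

# Hypothetical data: four values per column, so q=0.4 lands between two points
even_df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [10, 20, 30, 40]})

methods = ['linear', 'lower', 'higher', 'nearest', 'midpoint']
results = {m: even_df.quantile(0.4, interpolation=m) for m in methods}

for name, result in results.items():
    print(f"{name}: A={result['A']}, B={result['B']}")
```

For column ‘A’ the bounding values are 2 and 3, so ‘linear’ gives roughly 2.2, ‘lower’ and ‘nearest’ give 2, ‘higher’ gives 3, and ‘midpoint’ gives 2.5. Column ‘B’ follows the same pattern at ten times the scale.
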

Conclusion

The quantile() method in pandas is essential for statistical analysis, providing insights into the distribution of your data. By understanding and utilizing its parameters, you can tailor the computation of quantiles to fit your specific needs. Whether you’re analyzing a simple dataset or performing more complex statistical evaluations, the quantile() method is a valuable addition to your pandas toolkit.
