Pandas DataFrame rolling() method – Explained with examples

When working with time series data in Python, you often need to compute statistics over a rolling, or moving, window. The rolling() method in Pandas is designed for exactly this purpose: it lets you apply functions over a window that slides along the data. This is particularly useful for smoothing time series data, calculating moving averages, and similar tasks.

What is the rolling() method?

The rolling() method provides rolling window calculations over a DataFrame (a Series has the same method). It returns a Rolling object, or a Window object when win_type is set. You then call statistical functions such as mean(), sum(), or std() on that object to compute results over the defined rolling window.
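The following minimal sketch inspects the object that rolling() returns and reuses it for several statistics at once through its agg() method. The small Series and the names r and stats are illustrative and do not come from the article's examples.

```python
import pandas as pd

# Illustrative data, not taken from the article's examples
s = pd.Series([1, 2, 3, 4, 5])

# rolling() only defines the window; the statistics are computed when you call them
r = s.rolling(window=3)
print(type(r).__name__)  # Rolling

# Apply several statistics to the same window definition in one call
stats = r.agg(["mean", "sum", "max"])
print(stats)
```

Each column of stats holds one statistic. As with a single call such as r.mean(), the first two rows are NaN because those windows are not yet full.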

What is a Rolling Window?

A rolling window, also known as a moving window, is a subset of data points that moves along the data set to perform calculations on different segments of the data. The window “rolls” through the data, shifting one step at a time, allowing you to observe trends and patterns by smoothing out short-term fluctuations.

For example, with a window size of 3:

  • For the data points [1, 2, 3, 4, 5], the rolling windows would be [1, 2, 3], [2, 3, 4], and [3, 4, 5].
  • Each window is used to perform the desired calculation (mean, sum, etc.) on the contained values.
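The windows listed above can be reproduced with plain list slicing and compared with pandas. This is only a minimal sketch that illustrates the idea. It does not show how pandas computes results internally, and the names data, windows, and manual_means are hypothetical.

```python
import pandas as pd

data = [1, 2, 3, 4, 5]
window = 3

# Each full window ends at position i and holds the `window` values up to and including it
windows = [data[i - window + 1 : i + 1] for i in range(window - 1, len(data))]
print(windows)  # [[1, 2, 3], [2, 3, 4], [3, 4, 5]]

# Average each window by hand
manual_means = [sum(w) / len(w) for w in windows]
print(manual_means)  # [2.0, 3.0, 4.0]

# pandas labels each result with the window's last row, so the first window - 1 rows are NaN
rolling_means = pd.Series(data).rolling(window=window).mean()
print(rolling_means.tolist())  # [nan, nan, 2.0, 3.0, 4.0]
```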

Syntax of the rolling() method
Python
DataFrame.rolling(window, min_periods=None, center=False, win_type=None, on=None, axis=0, closed=None)
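  • window: Size of the moving window. This is a mandatory parameter. An integer gives a fixed number of observations per window. A time offset such as ‘2D’ gives a time-based window, which requires a datetime-like index or an on column.
  • min_periods: Minimum number of observations in the window required to have a value (otherwise the result is NaN). For an integer window the default equals the window size; for a time-based window the default is 1.
  • center: If True, set the labels at the center of the window instead of at its right edge. Default is False.
  • win_type: Window type used to weight the observations, such as ‘boxcar’, ‘triang’, or ‘blackman’. These are scipy.signal window names, so SciPy must be installed. Default is None, which weights all observations equally.
  • on: For a DataFrame, a column label (typically datetime-like) on which to calculate the rolling window, rather than the DataFrame’s index.
  • axis: The axis along which to apply the rolling window (0 or ‘index’, 1 or ‘columns’). Default is 0. This keyword is deprecated as of pandas 2.1; transpose the DataFrame instead.
  • closed: Define which side of the window interval is closed. The options are ‘right’, ‘left’, ‘both’, ‘neither’. Default is None, which means ‘right’.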
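The following minimal sketch uses the window and on parameters together, with hypothetical daily data. The dates, the values, and the names daily, rolled, and sum_2d are illustrative. The window is the offset string '2D', so each window covers the current day and the day before, measured on the date column rather than on the index.

```python
import pandas as pd

# Hypothetical daily observations stored in a regular column, not in the index
daily = pd.DataFrame({
    "date": pd.date_range("2024-01-01", periods=5, freq="D"),
    "value": [1, 2, 3, 4, 5],
})

# '2D' is a time-based window measured on the 'date' column (on="date").
# With the default closed='right', each window spans (date - 2 days, date].
# For time-based windows min_periods defaults to 1, so the first row is not NaN.
rolled = daily.rolling("2D", on="date").sum()
daily["sum_2d"] = rolled["value"]

print(daily)
```

The sum_2d column should read 1, 3, 5, 7, 9. Each value adds the current day to the previous day, and there is no previous day for the first row.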

Example Usage

Example 1: Rolling Mean

The rolling mean is calculated by taking the average of a specified number of consecutive values in the DataFrame. This “window” moves across the data, calculating the mean for each subset of values. The example below applies a window size of 3 to the values 1 through 10. The calculation for each row is broken down after the output:

Python
import pandas as pd

# Create a sample DataFrame
data = {'value': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]}
df = pd.DataFrame(data)

# Calculate the rolling mean with a window size of 3
df['rolling_mean'] = df['value'].rolling(window=3).mean()

print(df)

Output:

   value  rolling_mean
0      1           NaN
1      2           NaN
2      3           2.0
3      4           3.0
4      5           4.0
5      6           5.0
6      7           6.0
7      8           7.0
8      9           8.0
9     10           9.0

In this example, we calculate the rolling mean with a window size of 3. The output shows that for the first two rows, the rolling mean is NaN because there are not enough data points to form a complete window. From the third row onwards, each value is the average of the current and the two preceding values.

Detailed Calculation:
  • First Row (index 0): There are not enough data points to form a window of size 3, so the result is NaN.
  • Second Row (index 1): Still not enough data points for a complete window of size 3, so the result is NaN.
  • Third Row (index 2):
    • Window: [1, 2, 3]
    • Mean: (1 + 2 + 3) / 3 = 2.0
  • Fourth Row (index 3):
    • Window: [2, 3, 4]
    • Mean: (2 + 3 + 4) / 3 = 3.0
  • Fifth Row (index 4):
    • Window: [3, 4, 5]
    • Mean: (3 + 4 + 5) / 3 = 4.0
  • Sixth Row (index 5):
    • Window: [4, 5, 6]
    • Mean: (4 + 5 + 6) / 3 = 5.0
  • Seventh Row (index 6):
    • Window: [5, 6, 7]
    • Mean: (5 + 6 + 7) / 3 = 6.0
  • Eighth Row (index 7):
    • Window: [6, 7, 8]
    • Mean: (6 + 7 + 8) / 3 = 7.0
  • Ninth Row (index 8):
    • Window: [7, 8, 9]
    • Mean: (7 + 8 + 9) / 3 = 8.0
  • Tenth Row (index 9):
    • Window: [8, 9, 10]
    • Mean: (8 + 9 + 10) / 3 = 9.0
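The row-by-row calculation above can be reproduced with an explicit loop and checked against pandas. This is a minimal sketch for illustration only, and the name manual is hypothetical. In practice, rolling().mean() is the simpler choice.

```python
import math
import pandas as pd

values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
window = 3

# Rows without a full window get NaN; every other row averages
# itself and the two preceding values
manual = []
for i in range(len(values)):
    if i < window - 1:
        manual.append(math.nan)
    else:
        win = values[i - window + 1 : i + 1]
        manual.append(sum(win) / window)

expected = pd.Series(values).rolling(window=window).mean()

# assert_series_equal treats NaN in matching positions as equal
pd.testing.assert_series_equal(pd.Series(manual), expected)
print(manual)
```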

Example 2: Rolling Sum

You can also calculate the rolling sum.

Python
# Calculate the rolling sum with a window size of 3
df['rolling_sum'] = df['value'].rolling(window=3).sum()

print(df)

Output:

   value  rolling_mean  rolling_sum
0      1           NaN          NaN
1      2           NaN          NaN
2      3           2.0          6.0
3      4           3.0          9.0
4      5           4.0         12.0
5      6           5.0         15.0
6      7           6.0         18.0
7      8           7.0         21.0
8      9           8.0         24.0
9     10           9.0         27.0

Here, the rolling sum with a window size of 3 is computed. As with the rolling mean, the first two rows are NaN due to insufficient data points. Starting from the third row, each value is the sum of the current value and the two preceding values. This gives a moving total over the last three observations, not a cumulative total from the first row.

Detailed Calculation:
  • First Row (index 0): There are not enough data points to form a window of size 3, so the result is NaN.
  • Second Row (index 1): Still not enough data points for a complete window of size 3, so the result is NaN.
  • Third Row (index 2):
    • Window: [1, 2, 3]
    • Sum: 1 + 2 + 3 = 6
  • Fourth Row (index 3):
    • Window: [2, 3, 4]
    • Sum: 2 + 3 + 4 = 9
  • Fifth Row (index 4):
    • Window: [3, 4, 5]
    • Sum: 3 + 4 + 5 = 12
  • Sixth Row (index 5):
    • Window: [4, 5, 6]
    • Sum: 4 + 5 + 6 = 15
  • Seventh Row (index 6):
    • Window: [5, 6, 7]
    • Sum: 5 + 6 + 7 = 18
  • Eighth Row (index 7):
    • Window: [6, 7, 8]
    • Sum: 6 + 7 + 8 = 21
  • Ninth Row (index 8):
    • Window: [7, 8, 9]
    • Sum: 7 + 8 + 9 = 24
  • Tenth Row (index 9):
    • Window: [8, 9, 10]
    • Sum: 8 + 9 + 10 = 27

The rolling_sum column is populated with the sum of the values within each rolling window. The window “rolls” over the data by one position at a time, providing a moving sum across the DataFrame.
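Because the window moves by one position at a time, a rolling sum does not need to re-add every value in each window. It can keep a running total, adding the value that enters the window and subtracting the value that leaves it. The function below is a hypothetical, minimal sketch of that idea in plain Python. It is not pandas’ own implementation, but it produces the same numbers as the rolling_sum column for this data.

```python
import math
import pandas as pd

def rolling_sum(values, window):
    """Hypothetical helper: rolling sum kept as a running total."""
    result = []
    total = 0
    for i, v in enumerate(values):
        total += v                          # value entering the window
        if i >= window:
            total -= values[i - window]     # value leaving the window
        # Rows before the first full window get NaN, as in pandas' default
        result.append(total if i >= window - 1 else math.nan)
    return result

values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
manual = rolling_sum(values, 3)
print(manual)  # [nan, nan, 6, 9, 12, 15, 18, 21, 24, 27]

# Same numbers as pandas (which returns floats)
expected = pd.Series(values).rolling(window=3).sum().tolist()
print(manual[2:] == expected[2:])  # True
```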

Example 3: Rolling Standard Deviation

Another common use case is the rolling standard deviation.

Python
# Calculate the rolling standard deviation with a window size of 3
df['rolling_std'] = df['value'].rolling(window=3).std()

print(df)

Output:

   value  rolling_mean  rolling_sum  rolling_std
0      1           NaN          NaN          NaN
1      2           NaN          NaN          NaN
2      3           2.0          6.0          1.0
3      4           3.0          9.0          1.0
4      5           4.0         12.0          1.0
5      6           5.0         15.0          1.0
6      7           6.0         18.0          1.0
7      8           7.0         21.0          1.0
8      9           8.0         24.0          1.0
9     10           9.0         27.0          1.0

In this case, we calculate the rolling standard deviation with a window size of 3. The first two rows have NaN because there are not enough data points. From the third row onwards, each value shows the standard deviation of the current value and the two preceding values, indicating the variability within each rolling window.

For instance, consider the fourth row (index 3). With a window size of 3, the rolling standard deviation calculation includes the values at indices 1, 2, and 3.

  • Window: [2, 3, 4]
  • Mean: (2 + 3 + 4) / 3 = 3.0
  • Variance (sample variance, using Bessel’s correction: divide by n − 1 = 2):
    • [(2 − 3)² + (3 − 3)² + (4 − 3)²] / 2
    • = [1 + 0 + 1] / 2
    • = 2 / 2 = 1
  • Standard Deviation: √(1) = 1.0

So, the rolling standard deviation for the 4th row (index 3) is 1.0, which aligns with the output shown.
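The following minimal sketch checks this result with Python’s statistics module, which also uses the n − 1 denominator. It then shows the ddof argument of the rolling std() method. The default ddof=1 gives the sample formula used above, and ddof=0 switches to the population formula. The variable names are illustrative.

```python
import statistics
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# Sample standard deviation of the window [2, 3, 4] (n - 1 in the denominator)
print(statistics.stdev([2, 3, 4]))  # 1.0

# pandas' rolling std also defaults to ddof=1, so row 3 matches
sample_std = s.rolling(window=3).std()
print(sample_std.iloc[3])

# ddof=0 divides by n instead, giving sqrt(2/3) for this window
population_std = s.rolling(window=3).std(ddof=0)
print(population_std.iloc[3])  # approximately 0.8165
```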


Advanced Usage

Example 4: Centered Window

By default, each result is labeled at the right edge of its window, which is the current row. Setting center=True labels it at the center of the window instead.

Python
# Calculate the rolling mean with a window size of 3 and center the window
df['rolling_mean_centered'] = df['value'].rolling(window=3, center=True).mean()

print(df)

Output:

   value  rolling_mean  rolling_sum  rolling_std  rolling_mean_centered
0      1           NaN          NaN          NaN                    NaN
1      2           NaN          NaN          NaN                    2.0
2      3           2.0          6.0          1.0                    3.0
3      4           3.0          9.0          1.0                    4.0
4      5           4.0         12.0          1.0                    5.0
5      6           5.0         15.0          1.0                    6.0
6      7           6.0         18.0          1.0                    7.0
7      8           7.0         21.0          1.0                    8.0
8      9           8.0         24.0          1.0                    9.0
9     10           9.0         27.0          1.0                    NaN

This example demonstrates calculating the rolling mean with a centered window of size 3. The output shows NaN for the first and last rows because a three-value window centered on them would extend past the start or end of the data. That leaves fewer than 3 observations, which is below the default min_periods. The remaining rows present the mean of the current value and its immediate neighbors, with the window centered around each row.

For instance, for the fourth row (index 3), with a window size of 3 and a centered window, the rolling mean calculation includes the values at indices 2, 3, and 4.

  • Window: [3, 4, 5]
  • Mean: (3 + 4 + 5) / 3 = 12 / 3 = 4.0

So, the centered window rolling mean for the 4th row (index 3) is 4.0.
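For an odd window size such as 3, center=True gives the same result as computing the ordinary trailing window and shifting it up by one row. This works because the center of a 3-value window sits one row before its right edge. The minimal sketch below checks that equivalence on the article’s data. It illustrates what centering does and does not describe pandas internals.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

centered = s.rolling(window=3, center=True).mean()

# The trailing result at index i + 1 (window i..i+2) is relabeled to index i
shifted = s.rolling(window=3).mean().shift(-1)

pd.testing.assert_series_equal(centered, shifted)
print(centered.tolist())
# [nan, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, nan]
```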


Example 5: Rolling Window with Minimum Periods

Specify min_periods to determine the minimum number of observations required to calculate a value.

Python
# Calculate the rolling mean with a window size of 3 and min_periods of 1
df['rolling_mean_min_prds'] = df['value'].rolling(window=3, min_periods=1).mean()

print(df)

Output:

   value  rolling_mean  rolling_sum  rolling_std  rolling_mean_centered   rolling_mean_min_prds
0      1           NaN          NaN          NaN                    NaN                     1.0
1      2           NaN          NaN          NaN                    2.0                     1.5
2      3           2.0          6.0          1.0                    3.0                     2.0
3      4           3.0          9.0          1.0                    4.0                     3.0
4      5           4.0         12.0          1.0                    5.0                     4.0
5      6           5.0         15.0          1.0                    6.0                     5.0
6      7           6.0         18.0          1.0                    7.0                     6.0
7      8           7.0         21.0          1.0                    8.0                     7.0
8      9           8.0         24.0          1.0                    9.0                     8.0
9     10           9.0         27.0          1.0                    NaN                     9.0

Here, we compute the rolling mean with a window size of 3 and min_periods set to 1. Unlike the earlier columns, the new rolling_mean_min_prds column has no NaN values, because a result is produced as soon as at least one observation is available. The first rows therefore average only the values seen so far: 1.0 for index 0 and 1.5 for index 1. From index 2 onward, every window is full.

For instance, consider the fourth row (index 3). A full window of 3 values is already available there, so min_periods=1 has no effect. The calculation still uses the values at indices 1, 2, and 3.

  • Window: [2, 3, 4]
  • Mean: (2 + 3 + 4) / 3 = 9 / 3 = 3.0

So, the rolling mean with minimum periods for the 4th row (index 3) is 3.0.
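The effect of min_periods shows up only in windows that hold fewer values than the window size, which here means the first two rows. The minimal sketch below compares three settings side by side, using a shorter illustrative Series and a hypothetical results dictionary.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5])

# min_periods is the number of non-missing values a window needs before a result is produced
results = {mp: s.rolling(window=3, min_periods=mp).mean().tolist() for mp in (1, 2, 3)}

for mp, values in results.items():
    print(mp, values)
# 1 [1.0, 1.5, 2.0, 3.0, 4.0]
# 2 [nan, 1.5, 2.0, 3.0, 4.0]
# 3 [nan, nan, 2.0, 3.0, 4.0]
```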


Conclusion

The rolling() method in Pandas is a powerful tool for performing rolling window calculations. By understanding its parameters and capabilities, you can effectively smooth time series data, calculate moving averages, sums, standard deviations, and more. Experiment with different window sizes, minimum periods, and centered windows to best fit your data analysis needs.
