When working with time series data in Python, one often needs to perform operations on a rolling or moving window basis. The rolling() method in Pandas is designed specifically for this purpose, allowing you to apply functions over a rolling window. This is particularly useful for smoothing time series data, calculating moving averages, and other similar tasks.
What is the rolling() method?
The rolling() method provides rolling window calculations over a DataFrame or Series. It returns a Rolling object (or a Window object when win_type is set). You then call statistical functions such as mean(), sum(), or std() on that object to evaluate them over each window.
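Here is a minimal sketch on a small Series with illustrative values. It shows that rolling() by itself only builds the Rolling object, and that Rolling.agg() can apply several statistics over the same windows in one call:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5])

# rolling() only defines the windows; nothing is computed yet
r = s.rolling(window=3)
print(type(r).__name__)  # Rolling

# Apply several statistics over the same 3-value windows in one call
stats = r.agg(["mean", "sum", "std"])
print(stats)
```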
What is a Rolling Window?
A rolling window, also known as a moving window, is a subset of data points that moves along the data set to perform calculations on different segments of the data. The window “rolls” through the data, shifting one step at a time, allowing you to observe trends and patterns by smoothing out short-term fluctuations.
For example, with a window size of 3:
- For the data points [1, 2, 3, 4, 5], the rolling windows would be [1, 2, 3], [2, 3, 4], and [3, 4, 5].
- Each window is used to perform the desired calculation (mean, sum, etc.) on the contained values.
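The window list above can be reproduced in plain Python. This sketch slices out each full window by hand and checks that averaging the windows gives the same numbers as pandas. The variable names are only for illustration:

```python
import pandas as pd

values = [1, 2, 3, 4, 5]
window = 3

# Build each full window by hand: every run of `window` consecutive values
windows = [values[i - window + 1 : i + 1] for i in range(window - 1, len(values))]
print(windows)  # [[1, 2, 3], [2, 3, 4], [3, 4, 5]]

# Average each window manually...
manual_means = [sum(w) / window for w in windows]

# ...and compare with the non-NaN part of pandas' rolling mean
pandas_means = pd.Series(values).rolling(window=window).mean().dropna().tolist()
print(manual_means == pandas_means)  # True
```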
Syntax of the rolling() method
DataFrame.rolling(window, min_periods=None, center=False, win_type=None, on=None, axis=0, closed=None)
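- window: Size of the moving window. This is a required parameter. It is usually an integer number of observations. For a datetime-like index (or on column), it can also be a time offset such as ‘2D’.
- min_periods: Minimum number of observations in the window required to produce a value; otherwise the result is NaN. For an integer window, the default equals the window size. For an offset-based window, the default is 1.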
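- center: If True, set the labels at the center of the window instead of at its right edge. Default is False.
- win_type: Apply a weighting window such as ‘boxcar’, ‘triang’, or ‘blackman’ (this requires SciPy). Default is None, which weights all points equally.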
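- on: For a DataFrame, the column label (or index level) on which to calculate the rolling window instead of the DataFrame’s index. This is most useful with a datetime column and an offset-based window.
- axis: The axis along which to apply the rolling window (0 or ‘index’, 1 or ‘columns’). Default is 0. This parameter is deprecated as of pandas 2.1. Transpose the DataFrame instead if you need column-wise windows.
- closed: Which endpoints of the window interval are included: ‘right’, ‘left’, ‘both’, or ‘neither’. The default (None) behaves like ‘right’.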
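The examples below all use an integer window. However, window can also be a time offset when the index is datetime-like. The following sketch uses made-up daily readings with one missing day. It shows an offset-based window, its default min_periods of 1 (no leading NaN), and the effect of closed:

```python
import pandas as pd

# Hypothetical daily readings with a gap: 2024-01-03 is missing
s = pd.Series(
    [1, 2, 4, 5],
    index=pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-04", "2024-01-05"]),
)

# Offset window: each window spans the 2 days ending at the row's timestamp.
# The default closed='right' gives the half-open interval (t - 2 days, t].
right = s.rolling("2D").sum()

# closed='both' also includes the left edge: [t - 2 days, t]
both = s.rolling("2D", closed="both").sum()

print(right.tolist())  # [1.0, 3.0, 4.0, 9.0]
print(both.tolist())   # [1.0, 3.0, 6.0, 9.0]
```

The only difference is on 2024-01-04. With closed='both', the reading from 2024-01-02 sits exactly on the left edge of the window and is included.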
Example Usage
Example 1: Rolling Mean
The rolling mean is the average of a fixed number of consecutive values. This “window” moves down the column one row at a time, and the mean is computed for each position. The following code uses a window size of 3 on the values 1 through 10:
import pandas as pd
# Create a sample DataFrame
data = {'value': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]}
df = pd.DataFrame(data)
# Calculate the rolling mean with a window size of 3
df['rolling_mean'] = df['value'].rolling(window=3).mean()
print(df)
Output:
value rolling_mean
0 1 NaN
1 2 NaN
2 3 2.0
3 4 3.0
4 5 4.0
5 6 5.0
6 7 6.0
7 8 7.0
8 9 8.0
9 10 9.0
In this example, we calculate the rolling mean with a window size of 3. The first two rows are NaN because there are not enough data points to form a complete window. From the third row onwards, each value is the average of the current value and the two preceding values.
Detailed Calculation:
- First Row (index 0): There are not enough data points to form a window of size 3, so the result is NaN.
- Second Row (index 1): Still not enough data points for a complete window of size 3, so the result is NaN.
- Third Row (index 2):
- Window: [1, 2, 3]
- Mean: (1 + 2 + 3) / 3 = 2.0
- Fourth Row (index 3):
- Window: [2, 3, 4]
- Mean: (2 + 3 + 4) / 3 = 3.0
- Fifth Row (index 4):
- Window: [3, 4, 5]
- Mean: (3 + 4 + 5) / 3 = 4.0
- Sixth Row (index 5):
- Window: [4, 5, 6]
- Mean: (4 + 5 + 6) / 3 = 5.0
- Seventh Row (index 6):
- Window: [5, 6, 7]
- Mean: (5 + 6 + 7) / 3 = 6.0
- Eighth Row (index 7):
- Window: [6, 7, 8]
- Mean: (6 + 7 + 8) / 3 = 7.0
- Ninth Row (index 8):
- Window: [7, 8, 9]
- Mean: (7 + 8 + 9) / 3 = 8.0
- Tenth Row (index 9):
- Window: [8, 9, 10]
- Mean: (8 + 9 + 10) / 3 = 9.0
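The row-by-row calculations above can also be checked in bulk. This is a minimal sketch that assumes NumPy 1.20 or later, which provides sliding_window_view. That function builds every full window as one row of a 2-D array:

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

values = np.arange(1, 11)  # 1..10, as in the example DataFrame

# Each row of `windows` is one full window of 3 consecutive values
windows = sliding_window_view(values, window_shape=3)
manual = windows.mean(axis=1)  # one mean per full window

pandas_result = pd.Series(values).rolling(window=3).mean()

# pandas pads the first window - 1 rows with NaN; the remaining rows should match
print(np.allclose(manual, pandas_result.iloc[2:].to_numpy()))  # True
```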
Example 2: Rolling Sum
You can also calculate the rolling sum.
# Calculate the rolling sum with a window size of 3
df['rolling_sum'] = df['value'].rolling(window=3).sum()
print(df)
Output:
value rolling_mean rolling_sum
0 1 NaN NaN
1 2 NaN NaN
2 3 2.0 6.0
3 4 3.0 9.0
4 5 4.0 12.0
5 6 5.0 15.0
6 7 6.0 18.0
7 8 7.0 21.0
8 9 8.0 24.0
9 10 9.0 27.0
Here, the rolling sum with a window size of 3 is computed. As with the rolling mean, the first two rows are NaN due to insufficient data points. From the third row onwards, each value is the sum of the current value and the two preceding values. Unlike a cumulative sum, which keeps growing, the rolling sum only covers the values inside the current window.
Detailed Calculation:
- First Row (index 0): There are not enough data points to form a window of size 3, so the result is NaN.
- Second Row (index 1): Still not enough data points for a complete window of size 3, so the result is NaN.
- Third Row (index 2):
- Window: [1, 2, 3]
- Sum: 1 + 2 + 3 = 6
- Fourth Row (index 3):
- Window: [2, 3, 4]
- Sum: 2 + 3 + 4 = 9
- Fifth Row (index 4):
- Window: [3, 4, 5]
- Sum: 3 + 4 + 5 = 12
- Sixth Row (index 5):
- Window: [4, 5, 6]
- Sum: 4 + 5 + 6 = 15
- Seventh Row (index 6):
- Window: [5, 6, 7]
- Sum: 5 + 6 + 7 = 18
- Eighth Row (index 7):
- Window: [6, 7, 8]
- Sum: 6 + 7 + 8 = 21
- Ninth Row (index 8):
- Window: [7, 8, 9]
- Sum: 7 + 8 + 9 = 24
- Tenth Row (index 9):
- Window: [8, 9, 10]
- Sum: 8 + 9 + 10 = 27
The rolling_sum column holds the sum of the values within each rolling window. The window “rolls” over the data one position at a time, producing a moving sum down the DataFrame.
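To contrast the rolling sum with a cumulative sum, the sketch below computes both. It then shows that a window-3 rolling sum equals the difference of two cumulative sums taken 3 positions apart. This is an illustrative identity, not how pandas computes it internally:

```python
import pandas as pd

s = pd.Series(range(1, 11))

rolling_sum = s.rolling(window=3).sum()  # sum of the last 3 values only
cumulative_sum = s.cumsum()              # sum of all values so far

# Subtracting the cumulative sum from 3 rows earlier leaves just the last 3 values
via_cumsum = cumulative_sum - cumulative_sum.shift(3, fill_value=0)

print(rolling_sum.iloc[2:].tolist())  # [6.0, 9.0, 12.0, 15.0, 18.0, 21.0, 24.0, 27.0]
print(via_cumsum.iloc[2:].tolist())   # [6, 9, 12, 15, 18, 21, 24, 27]
```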
Example 3: Rolling Standard Deviation
Another common use case is the rolling standard deviation.
# Calculate the rolling standard deviation with a window size of 3
df['rolling_std'] = df['value'].rolling(window=3).std()
print(df)
Output:
value rolling_mean rolling_sum rolling_std
0 1 NaN NaN NaN
1 2 NaN NaN NaN
2 3 2.0 6.0 1.0
3 4 3.0 9.0 1.0
4 5 4.0 12.0 1.0
5 6 5.0 15.0 1.0
6 7 6.0 18.0 1.0
7 8 7.0 21.0 1.0
8 9 8.0 24.0 1.0
9 10 9.0 27.0 1.0
In this case, we calculate the rolling standard deviation with a window size of 3. The first two rows are NaN because there are not enough data points. From the third row onwards, each value is the sample standard deviation of the current value and the two preceding values, indicating the variability within each window. Because the values are evenly spaced integers, every full window has the same spread, so every result is 1.0.
For instance, consider the fourth row (index 3). With a window size of 3, the rolling standard deviation uses the values at indices 1, 2, and 3:
- Window: [2, 3, 4]
- Mean: (2 + 3 + 4) / 3 = 3.0
- Variance: [(2 − 3)² + (3 − 3)² + (4 − 3)²] / 2 = (1 + 0 + 1) / 2 = 1 (Note: pandas applies Bessel’s correction by default, so the denominator is n − 1 = 2)
- Standard Deviation: √(1) = 1.0
So, the rolling standard deviation for the 4th row (index 3) is 1.0, which aligns with the output shown.
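Bessel’s correction comes from the ddof argument of Rolling.std(), which defaults to 1. The sketch below compares the default with ddof=0 (the population standard deviation) and cross-checks the fourth row against Python’s statistics module:

```python
import statistics

import pandas as pd

s = pd.Series(range(1, 11))

# Default ddof=1: sample standard deviation (divide by n - 1)
sample_std = s.rolling(window=3).std()

# ddof=0: population standard deviation (divide by n)
population_std = s.rolling(window=3).std(ddof=0)

# Cross-check the fourth row (window [2, 3, 4]) against the standard library
print(sample_std.iloc[3], statistics.stdev([2, 3, 4]))
print(population_std.iloc[3], statistics.pstdev([2, 3, 4]))
```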
Advanced Usage
Example 4: Centered Window
You can center the rolling window labels by setting center=True.
# Calculate the rolling mean with a window size of 3 and center the window
df['rolling_mean_centered'] = df['value'].rolling(window=3, center=True).mean()
print(df)
Output:
value rolling_mean rolling_sum rolling_std rolling_mean_centered
0 1 NaN NaN NaN NaN
1 2 NaN NaN NaN 2.0
2 3 2.0 6.0 1.0 3.0
3 4 3.0 9.0 1.0 4.0
4 5 4.0 12.0 1.0 5.0
5 6 5.0 15.0 1.0 6.0
6 7 6.0 18.0 1.0 7.0
7 8 7.0 21.0 1.0 8.0
8 9 8.0 24.0 1.0 9.0
9 10 9.0 27.0 1.0 NaN
This example calculates the rolling mean with a centered window of size 3. The first and last rows are NaN because each one lacks a neighbor on one side. Their windows therefore hold only 2 observations, which is fewer than the default min_periods of 3. Every other row holds the mean of the current value and its two immediate neighbors.
For instance, for the fourth row (index 3), a centered window of size 3 includes the values at indices 2, 3, and 4:
- Window: [3, 4, 5]
- Mean: (3 + 4 + 5) / 3 = 12 / 3 = 4.0
So, the centered window rolling mean for the 4th row (index 3) is 4.0.
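For an odd window size like 3, centering amounts to moving each trailing result back by window // 2 rows. This sketch checks that equivalence on the example values. The shift-based version is only an illustration, not how pandas implements center=True:

```python
import pandas as pd

s = pd.Series(range(1, 11))
window = 3

centered = s.rolling(window=window, center=True).mean()

# Take the trailing (default) result and shift it up by window // 2 = 1 row
shifted = s.rolling(window=window).mean().shift(-(window // 2))

# equals() treats NaN in the same positions as equal
print(centered.equals(shifted))  # True
```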
Example 5: Rolling Window with Minimum Periods
Specify min_periods to set the minimum number of observations required to calculate a value.
# Calculate the rolling mean with a window size of 3 and min_periods of 1
df['rolling_mean_min_prds'] = df['value'].rolling(window=3, min_periods=1).mean()
print(df)
Output:
value rolling_mean rolling_sum rolling_std rolling_mean_centered rolling_mean_min_prds
0 1 NaN NaN NaN NaN 1.0
1 2 NaN NaN NaN 2.0 1.5
2 3 2.0 6.0 1.0 3.0 2.0
3 4 3.0 9.0 1.0 4.0 3.0
4 5 4.0 12.0 1.0 5.0 4.0
5 6 5.0 15.0 1.0 6.0 5.0
6 7 6.0 18.0 1.0 7.0 6.0
7 8 7.0 21.0 1.0 8.0 7.0
8 9 8.0 24.0 1.0 9.0 8.0
9 10 9.0 27.0 1.0 NaN 9.0
Here, we compute the rolling mean with a window size of 3 and min_periods=1. Unlike the previous examples, the new column contains no NaN values, because a result is produced as soon as a single observation is available. The first rows show the mean of the values available so far (1.0, then 1.5). Once the window is full, the results match the ordinary rolling mean.
For instance, for the fourth row (index 3), with a window size of 3 and min_periods=1, the window is already full. The calculation therefore uses the values at indices 1, 2, and 3, exactly as it would without min_periods:
- Window: [2, 3, 4]
- Mean: (2 + 3 + 4) / 3 = 9 / 3 = 3.0
So, the rolling mean with minimum periods for the 4th row (index 3) is 3.0.
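The effect of min_periods is easiest to see by varying it. This sketch counts the leading NaN values for each setting with the window fixed at 3. It also checks that, with min_periods=1, the first rows match an expanding mean (pandas’ expanding() window, which grows from the start of the data):

```python
import pandas as pd

s = pd.Series(range(1, 11))

# Number of NaN values produced for each min_periods setting
nan_counts = {
    mp: int(s.rolling(window=3, min_periods=mp).mean().isna().sum())
    for mp in (1, 2, 3)
}
print(nan_counts)  # {1: 0, 2: 1, 3: 2}

# Before the window fills up, min_periods=1 averages everything seen so far
partial = s.rolling(window=3, min_periods=1).mean()
print(partial.iloc[:2].tolist(), s.expanding().mean().iloc[:2].tolist())
```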
Conclusion
The rolling() method in Pandas is a powerful tool for performing rolling window calculations. By understanding its parameters and capabilities, you can smooth time series data and calculate moving averages, sums, standard deviations, and more. Experiment with different window sizes, minimum periods, and centered windows to best fit your data analysis needs.