What is Time Series Sampling? | Explained with Examples

Time series data is a sequence of data points recorded at successive points in time, usually at regular intervals. Sampling (or resampling) time series data means either selecting particular periods from the dataset or changing the frequency at which the data is represented. This is useful for reducing data volume, analyzing trends, and creating training and test sets for machine learning models.
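As a minimal sketch of the "selecting periods" idea, the snippet below picks out one day by label and then splits the series chronologically into training and test sets. The data, the variable names, and the 80/20 ratio are illustrative assumptions, not something prescribed by pandas.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly series with values 0..99 so the result is easy to check
idx = pd.date_range(start="2023-01-01", periods=100, freq="h")
series = pd.DataFrame({"Value": np.arange(100, dtype=float)}, index=idx)

# Select a single day using a date string on the DatetimeIndex
jan_second = series.loc["2023-01-02"]

# Chronological split: the earliest 80% of rows for training, the rest for testing.
# The rows are not shuffled, so the test set lies strictly after the training set.
split_point = int(len(series) * 0.8)
train_set = series.iloc[:split_point]
test_set = series.iloc[split_point:]

print(len(jan_second), len(train_set), len(test_set))
```

Keeping the split in time order avoids training on observations that come after the ones being predicted.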

Common Time Series Sampling Methods
  1. Downsampling: Reducing the frequency of the time series data (e.g., converting hourly data to daily data).
  2. Upsampling: Increasing the frequency of the time series data (e.g., converting daily data to hourly data).
  3. Rolling Windows: Creating overlapping or non-overlapping windows of data to capture trends or patterns.

Let’s explore each of these methods with examples.

1. Downsampling

Downsampling reduces the frequency of the time series data by aggregating data points over a specified time period.

Example: Downsampling

Consider a dataset with hourly data. We want to downsample it to daily data by taking the mean value for each day.

Python
import pandas as pd
import numpy as np

# Generate sample time series data
# 'h' is the hourly alias; the uppercase 'H' is deprecated since pandas 2.2
date_range = pd.date_range(start='2023-01-01', periods=100, freq='h')
data = {'Value': np.random.randn(100)}
df = pd.DataFrame(data, index=date_range)

# Downsample to daily frequency using mean
daily_data = df.resample('D').mean()
print(daily_data)

Output (the values differ between runs because the data is random):

                Value
2023-01-01  -0.158193
2023-01-02  -0.024147
2023-01-03  -0.173947
2023-01-04  -0.180742
2023-01-05   0.090837

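The example downsampled hourly data to daily data by taking the mean value for each day. This reduces the data frequency and makes daily trends easier to see. Note that 100 hourly points cover four full days plus four hours, so the last row (2023-01-05) is the mean of only four observations.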
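The mean is only one possible aggregation. The sketch below, using deterministic hypothetical data rather than the random values above, computes several statistics per day at once and adds a count, which makes partially filled days visible.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly series with values 0..99 so the result is reproducible
idx = pd.date_range(start="2023-01-01", periods=100, freq="h")
hourly = pd.DataFrame({"Value": np.arange(100, dtype=float)}, index=idx)

# Apply several aggregations per daily bin; "count" shows how many
# hourly observations contributed to each day
daily_summary = hourly["Value"].resample("D").agg(["mean", "min", "max", "count"])
print(daily_summary)
```

Here the count column shows 24 observations for each of the first four days and 4 for the last one, which is a quick way to spot bins that should be treated with caution.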

2. Upsampling

Upsampling increases the frequency of the time series data by interpolating or filling in new data points.

Example: Upsampling

Consider a dataset with daily data. We want to upsample it to hourly data by forward filling the values.

Python
import pandas as pd
import numpy as np

# Generate sample time series data
date_range = pd.date_range(start='2023-01-01', periods=10, freq='D')
data = {'Value': np.random.randn(10)}
df = pd.DataFrame(data, index=date_range)

# Upsample to hourly frequency and forward fill the values
# ('h' is the hourly alias; the uppercase 'H' is deprecated since pandas 2.2)
hourly_data = df.resample('h').ffill()
print(hourly_data.head(20))

Output (the values differ between runs because the data is random):

                     Value
2023-01-01 00:00:00 -0.174627
2023-01-01 01:00:00 -0.174627
2023-01-01 02:00:00 -0.174627
2023-01-01 03:00:00 -0.174627
2023-01-01 04:00:00 -0.174627
2023-01-01 05:00:00 -0.174627
2023-01-01 06:00:00 -0.174627
2023-01-01 07:00:00 -0.174627
2023-01-01 08:00:00 -0.174627
2023-01-01 09:00:00 -0.174627
2023-01-01 10:00:00 -0.174627
2023-01-01 11:00:00 -0.174627
2023-01-01 12:00:00 -0.174627
2023-01-01 13:00:00 -0.174627
2023-01-01 14:00:00 -0.174627
2023-01-01 15:00:00 -0.174627
2023-01-01 16:00:00 -0.174627
2023-01-01 17:00:00 -0.174627
2023-01-01 18:00:00 -0.174627
2023-01-01 19:00:00 -0.174627

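The example upsampled daily data to hourly data by forward filling the values. This increases the data frequency: each new hourly timestamp receives the most recent daily observation, so every hour of 2023-01-01 repeats the value recorded at 2023-01-01 00:00. Forward filling adds no new information; it only repeats existing values on a finer grid.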
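Forward filling is one of several ways to fill the new timestamps. The following sketch compares it with linear interpolation on two hypothetical daily values, chosen so the interpolated result is easy to verify by hand.

```python
import pandas as pd

# Two hypothetical daily observations: 0.0 on day one, 24.0 on day two
daily = pd.DataFrame(
    {"Value": [0.0, 24.0]},
    index=pd.date_range(start="2023-01-01", periods=2, freq="D"),
)

# asfreq() only inserts the new hourly timestamps and leaves them as NaN
hourly_nan = daily.resample("h").asfreq()

# ffill() repeats the last known value until the next observation
hourly_ffill = daily.resample("h").ffill()

# interpolate() draws a straight line between neighbouring observations
hourly_linear = daily.resample("h").interpolate(method="linear")

print(hourly_linear.head())
```

With these inputs the interpolated series rises by 1.0 per hour, while the forward-filled series stays at 0.0 until the second observation. Which one is appropriate depends on whether the quantity plausibly changes gradually between measurements.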

3. Rolling Windows

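Rolling windows slide a fixed-size window along the series and compute a statistic at each position, which captures trends or patterns at the scale of the window. With the default settings of `rolling` in pandas, consecutive windows overlap; non-overlapping segments are usually produced by resampling instead.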

Example: Rolling Windows

Consider a dataset with daily data. We want to calculate the rolling mean over a 3-day window.

Python
import pandas as pd
import numpy as np

# Generate sample time series data
date_range = pd.date_range(start='2023-01-01', periods=10, freq='D')
data = {'Value': np.random.randn(10)}
df = pd.DataFrame(data, index=date_range)

# Calculate the rolling mean over a 3-day window
# (window=3 means 3 consecutive rows, which is 3 days for daily data)
rolling_mean = df.rolling(window=3).mean()
print(rolling_mean)

Output (the values differ between runs because the data is random):

                Value
2023-01-01       NaN
2023-01-02       NaN
2023-01-03  0.165314
2023-01-04  0.151204
2023-01-05 -0.249631
2023-01-06 -0.145820
2023-01-07 -0.264034
2023-01-08 -0.412276
2023-01-09 -0.416507
2023-01-10 -0.329924

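The example calculated the rolling mean over a 3-day window for daily data. The first two rows are NaN because a full window of three observations is not yet available at those dates. Averaging over the window smooths the data, which makes underlying trends easier to observe.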
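The leading NaN values and the overlap between windows can both be controlled. The sketch below, on hypothetical values 1 to 9, shows the default rolling mean, a variant with `min_periods=1` that averages whatever is available, and non-overlapping 3-day blocks built with `resample`.

```python
import numpy as np
import pandas as pd

# Hypothetical daily series with values 1..9
daily = pd.DataFrame(
    {"Value": np.arange(1.0, 10.0)},
    index=pd.date_range(start="2023-01-01", periods=9, freq="D"),
)

# Default: a result is produced only once 3 observations are available,
# so the first two rows are NaN
overlapping = daily.rolling(window=3).mean()

# min_periods=1 averages the observations seen so far instead of returning NaN
partial = daily.rolling(window=3, min_periods=1).mean()

# Non-overlapping 3-day blocks: each observation belongs to exactly one bin
blocks = daily.resample("3D").mean()

print(blocks)
```

The rolling results keep one row per input date, whereas the block means reduce the nine rows to three, so the second approach is a form of downsampling rather than smoothing.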

Conclusion

Time series sampling methods in pandas, such as downsampling, upsampling, and rolling windows, are practical tools for analyzing and manipulating time series data. Downsampling reduces data volume by aggregating data over larger time intervals. Upsampling increases data frequency by interpolating or filling in values. Rolling windows smooth the series and expose trends at the scale of the chosen window. Understanding and applying these techniques makes it easier to work with time series data effectively.

Experiment with these methods to see how they can be applied to your specific time series analysis tasks. Happy analyzing!
