Time series analysis plays a critical role in the decision-making process for an array of industries and domains. Whether it's forecasting stock prices, predicting customer behavior, or analyzing sensor data, understanding and harnessing the patterns and trends within time-dependent data can provide invaluable insights. However, time series analysis comes with its fair share of challenges, including handling seasonality, dealing with missing values, and selecting appropriate forecasting models. This article outlines some effective methods for addressing these challenges, offering practical strategies and techniques that can enhance the accuracy and reliability of time series analysis.

Table of contents

- Cross-Validation for Time Series
- Decomposition and Transformation of Time Series
- Feature Engineering for Time Series
- Modeling in Time Series

Cross-Validation for Time Series

When performing cross-validation on time series data, we must respect the temporal order of observations. This means that classical methods like K-Fold cross-validation are not appropriate, as they randomly shuffle the data and destroy the temporal dependencies. There are two main methods for performing cross-validation on time series data: the "Rolling Forecast Origin" and the "Expanding Window" methods.

Rolling Forecast Origin

In this method, we use a sliding window of data for training, and a test set immediately following the training window. After each iteration, we "roll" the window forward in time. This is also known as "walk-forward" validation. In the next example, the data is split into 5 folds, and in each iteration the model is trained on a window of the most recent observations and tested on the fold that follows. Passing a fixed max_train_size to TimeSeriesSplit caps the training window at a constant size, so the window slides forward instead of growing:

```python
from sklearn.model_selection import TimeSeriesSplit
import numpy as np

# Assume 'X' and 'y' are your features and target
tscv = TimeSeriesSplit(n_splits=5, max_train_size=100)  # fixed window size; 100 is only illustrative

for train_index, test_index in tscv.split(X):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    # Fit and evaluate your model here
```

Expanding Window

The expanding window method is similar to the rolling forecast origin method, but instead of using a fixed-size sliding window, the training set expands over time to include more data. In this example, we leave max_train_size=None (the default), so the training set grows in each iteration: the model is first trained on the first fold, then on the first and second folds, and so on, testing on the next fold each time.

```python
from sklearn.model_selection import TimeSeriesSplit
import numpy as np

# Assume 'X' and 'y' are your features and target
tscv = TimeSeriesSplit(n_splits=5, max_train_size=None)

for train_index, test_index in tscv.split(X):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    # Fit and evaluate your model here
```

Decomposition and Transformation of Time Series

Breaking a time series into constituent parts can significantly facilitate analysis. This process falls into two categories: decomposition and transformation.

Decomposition of Time Series

Time series decomposition is a technique used to disassemble a time series into its constituent components to facilitate the understanding and analysis of underlying patterns and trends. The main components of a time series are:

- Trend: This denotes the long-term movement or overall direction of the time series data. It represents the underlying pattern or the general course the data is following.
- Seasonality: These are repeating patterns or fluctuations in the time series data that occur at regular intervals, such as daily, weekly, monthly, or yearly. Seasonality is driven by external factors like weather, holidays, or other seasonal events.
- Irregular Component: Also known as noise or residual, this is the unexplained variation in the time series data after removing the trend, seasonality, and cyclical components. It represents random, unpredictable fluctuations that cannot be attributed to any systematic pattern.

In Python, decomposition can be performed using the statsmodels or statsforecast libraries:

```python
import matplotlib.pyplot as plt
from statsforecast import StatsForecast
from statsforecast.models import MSTL, AutoARIMA

# Assume 'data' is a DataFrame in statsforecast's long format (unique_id, ds, y)
models = [
    MSTL(
        season_length=[12 * 7],        # seasonalities of the time series
        trend_forecaster=AutoARIMA(),  # model used to forecast the trend
    )
]

sf = StatsForecast(
    models=models,  # model used to fit each time series
    freq="D",       # frequency of the data
)
sf = sf.fit(data)

test = sf.fitted_[0, 0].model_
fig, ax = plt.subplots(1, 1, figsize=(10, 8))
test.plot(ax=ax, subplots=True, grid=True)
plt.tight_layout()
plt.show()
```

Various models handle these components differently. Linear models and Prophet have built-in handling for all of these components, while other models, such as boosting trees and neural networks, require explicit data transformation and feature engineering. One general approach after decomposition is to model each component separately. Once the individual forecasts for each component are obtained, they can be combined to form the final prediction. This approach allows for a more granular analysis of the time series data and can often lead to improved forecasting results.

Transformation of Time Series

Transformations are often applied to time series data to stabilize the variance, reduce the impact of outliers, improve the overall forecasting accuracy, and make the series more conducive to modeling. Here are a few common types of transformations:

Differencing: Differencing involves subtracting the previous value (t-1) from the current value (t). This transformation is primarily used to remove the trend and seasonality, making the data stationary, which is a requirement for certain time series models like ARIMA. In Python, differencing can be performed using the diff() function in pandas:

```python
import pandas as pd

df['diff'] = df['col'].diff()
```

Log Transformation: Log transformation is useful when dealing with data that exhibits exponential growth or has multiplicative seasonality. This transformation can help to stabilize variance and reduce heteroscedasticity. In Python, the numpy library can be used to apply a log transformation:

```python
import numpy as np

df['log_col'] = np.log(df['col'])
```

Square Root Transformation: Square root transformation is used to stabilize variance and minimize the impact of extreme values or outliers. This transformation can be done using numpy:

```python
df['sqrt_col'] = np.sqrt(df['col'])
```

Box-Cox Transformation: This transformation includes both log and square root transformations as special cases and helps stabilize the variance and make the data more normally distributed. The Box-Cox transformation can be represented as y(λ) = (y^λ - 1) / λ for λ ≠ 0, and y(λ) = log(y) for λ = 0. The SciPy library in Python provides a function for the Box-Cox transformation:

```python
from scipy import stats

df['boxcox_col'], lambda_param = stats.boxcox(df['col'])
```

After applying these transformations, it's crucial to remember that any forecasts made with the transformed data will need to be transformed back to the original scale before interpretation. For instance, if a log transformation was applied, the forecasted values should be exponentiated.
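As a minimal sketch of this back-transformation, assume a toy series with roughly multiplicative growth and a naive drift-style forecast made on the log scale (the values and the three-step horizon are only for illustration):

```python
import numpy as np
import pandas as pd

# Toy series with roughly multiplicative growth (values are illustrative)
y = pd.Series([100, 120, 150, 185, 230, 280], name='sales')

# Work on the log scale, where the growth is approximately linear
log_y = np.log(y)
avg_step = log_y.diff().mean()

# Naive drift-style forecast of the next 3 periods on the log scale
log_forecast = log_y.iloc[-1] + avg_step * np.arange(1, 4)

# Back-transform (exponentiate) to the original scale before interpretation
forecast = np.exp(log_forecast)
print(forecast)
```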
The choice of transformation depends on the specific problem, the characteristics of the data, and the assumptions of the subsequent modeling process. Each transformation has its strengths and weaknesses, and there is no one-size-fits-all solution. As such, it's recommended to experiment with different transformations and choose the one that results in the best model performance.

Feature Engineering for Time Series

Feature engineering in time series refers to the process of creating meaningful features or predictors from raw time series data to improve the performance of machine learning models. In contrast to tabular data problems, time series data offers an additional dimension to work with: time. Some effective feature engineering techniques for time series include the following (a combined sketch is shown after the lists below):

- Lag Features: Created by shifting the time series by a specific number of time steps (lags), these capture past behavior and dependencies and are extremely useful in practice.
- Rolling and Expanding Window Statistics: These involve computing summary statistics (e.g., mean, median, standard deviation) over a moving or expanding window of time to capture local patterns and trends.
- Seasonal and Trend Features: These help describe the behavior of each component for the model.
- Date/Time-Based Features: Details like the hour of the day, day of the week, month, or quarter can help capture recurring patterns in the data.
- Fourier Transformations: These can be useful to uncover periodic patterns in the data.

Use of Derivatives in Time Series

Derivatives in time series refer to the rate of change of a time series variable with respect to time. They measure how a variable's value changes over time and can be used as features in time series analysis or machine learning models, as they provide information about the underlying trends, seasonality, and other patterns in the data. Here are some ways derivatives can be used as features:

- First-Order Derivative: This represents the instantaneous rate of change of a variable with respect to time. It can help identify the direction and magnitude of the trend in the data.
- Second-Order Derivative: This represents the rate of change of the first-order derivative. It can help identify acceleration or deceleration in the trend and detect points of inflection.
- Seasonal Derivatives: These help capture seasonal variations in the data.
- Cross-Derivative: This helps to capture the influence of pairwise variables.
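As a minimal illustration of several of these ideas, the sketch below builds lag features, rolling and expanding statistics, calendar features, Fourier terms for a weekly cycle, and first- and second-order discrete derivatives. The column name, the chosen lags and windows, and the weekly period are arbitrary choices for illustration:

```python
import numpy as np
import pandas as pd

# Assume a DataFrame with a DatetimeIndex and a numeric column 'col' (names are illustrative)
df = pd.DataFrame(
    {"col": np.random.randn(100).cumsum()},
    index=pd.date_range("2023-01-01", periods=100, freq="D"),
)

# Lag features: past values shifted forward in time
for lag in (1, 7):
    df[f"lag_{lag}"] = df["col"].shift(lag)

# Rolling and expanding window statistics
df["roll_mean_7"] = df["col"].rolling(window=7).mean()
df["roll_std_7"] = df["col"].rolling(window=7).std()
df["expanding_mean"] = df["col"].expanding().mean()

# Date/time-based features
df["day_of_week"] = df.index.dayofweek
df["month"] = df.index.month

# Fourier terms for a weekly cycle (period of 7 days)
t = np.arange(len(df))
df["sin_7"] = np.sin(2 * np.pi * t / 7)
df["cos_7"] = np.cos(2 * np.pi * t / 7)

# First- and second-order derivatives as discrete differences
df["d1"] = df["col"].diff()
df["d2"] = df["col"].diff().diff()
```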
All the other feature engineering approaches, with code examples and explanations, can be found in previous articles:

https://hackernoon.com/must-know-base-tips-for-feature-engineering-with-time-series-data

https://hackernoon.com/advanced-techniques-for-time-series-data-feature-engineering

Modeling in Time Series

Using baseline models in time series forecasting is a crucial step in model evaluation, as it helps establish a performance benchmark for more complex models. Exponential smoothing, ARIMA, and Prophet are all excellent choices for baseline models.

Exponential Smoothing

In this example, we'll use the Holt-Winters method of exponential smoothing, which can handle trends and seasonality. The statsmodels library provides an implementation of this method.

```python
from statsmodels.tsa.api import ExponentialSmoothing

# Assume 'df' is a pandas DataFrame and 'col' is the time series to be forecasted
y = df['col']

# Fit the model
model = ExponentialSmoothing(y, seasonal_periods=12, trend='add', seasonal='add')
model_fit = model.fit()

# Forecast the next 5 periods
forecast = model_fit.forecast(5)
```

ARIMA

AutoRegressive Integrated Moving Average (ARIMA) is a popular method for forecasting univariate time series data. The statsmodels library provides an implementation of ARIMA.

```python
from statsmodels.tsa.arima.model import ARIMA

# Fit the model
model = ARIMA(y, order=(1, 1, 1))  # (p, d, q) parameters
model_fit = model.fit()

# Forecast the next 5 periods
forecast = model_fit.forecast(steps=5)
```

The (p, d, q) parameters represent the order of the autoregressive, integrated, and moving average parts of the model, respectively.

Prophet

Prophet is a procedure for forecasting time series data based on an additive model in which non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects. It was developed by Facebook.

```python
from fbprophet import Prophet  # in newer versions of the library: from prophet import Prophet

# The input to Prophet is always a DataFrame with two columns: ds and y
df = df.rename(columns={'date': 'ds', 'col': 'y'})

# Fit the model
model = Prophet()
model.fit(df)

# Create a DataFrame for future predictions
future = model.make_future_dataframe(periods=5)

# Forecast the next 5 periods
forecast = model.predict(future)
```

In these examples, the model parameters (such as the order of the ARIMA model and the type of trend and seasonality in the Exponential Smoothing model) are chosen somewhat arbitrarily. In practice, you would use model diagnostics, cross-validation, and possibly grid search to find the optimal parameters for your specific data. There are also newer libraries with various built-in models, such as those from Nixtla, that offer different perspectives. As baseline models, however, these methods provide a good starting point for time series forecasting and cover a wide range of time series characteristics. Advanced models like boosting trees, Neural Networks (NNs), N-BEATS, and Temporal Fusion Transformers (TFT) can be employed after obtaining some baseline scores.

Gradient Boosting

When it comes to time series analysis, one of the primary advantages of gradient boosting is that it does not make any assumptions about the underlying data distribution or the relationships between variables. This makes it highly flexible and capable of modeling complex, non-linear relationships, which are often present in time series data. While the application of gradient boosting to time series forecasting is not as straightforward as traditional time series models like ARIMA, it can still be highly effective. The key is to frame the forecasting problem as a supervised learning problem by creating lagged features that capture the temporal dependencies in the data.
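One possible way to build such a supervised dataset is sketched below; the DataFrame name, the column 'col', the number of lags, and the 80/20 chronological split are assumptions for illustration:

```python
import pandas as pd

# Assume 'df' has a numeric column 'col' (the series to forecast)
def make_supervised(df, col="col", n_lags=7):
    """Turn a series into a lagged feature matrix X and target y."""
    out = pd.DataFrame({f"lag_{i}": df[col].shift(i) for i in range(1, n_lags + 1)})
    out["y"] = df[col]
    out = out.dropna()  # drop rows where lags are not yet available
    return out.drop(columns="y"), out["y"]

X, y = make_supervised(df)

# Chronological split: the test set must come strictly after the training set
split = int(len(X) * 0.8)
X_train, X_test = X.iloc[:split], X.iloc[split:]
y_train, y_test = y.iloc[:split], y.iloc[split:]
```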
Here's an example of how to use the XGBoost library, a highly efficient implementation of gradient boosting, for time series forecasting:

```python
import numpy as np
import xgboost as xgb
from sklearn.metrics import mean_squared_error

# Assume 'X_train', 'y_train', 'X_test', and 'y_test' are your training and test datasets

# Convert datasets into DMatrix (optimized data structure for XGBoost)
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test)

# Define the model parameters
params = {
    'max_depth': 3,
    'eta': 0.01,
    'objective': 'reg:squarederror',
    'eval_metric': 'rmse'
}

# Train the model
model = xgb.train(params, dtrain, num_boost_round=1000)

# Make predictions
y_pred = model.predict(dtest)

# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
print(f'Test RMSE: {np.sqrt(mse)}')
```

Recurrent Neural Networks (RNNs)

We'll use the LSTM (Long Short-Term Memory) variant of RNNs, which can capture long-term dependencies and is less prone to the vanishing gradient problem.

```python
import torch
from torch import nn

# Assume 'X_train' and 'y_train' are your training dataset
# 'n_features' is the number of input features
# 'n_steps' is the number of time steps in each sample
X_train = torch.tensor(X_train, dtype=torch.float32)                # shape: (samples, n_steps, n_features)
y_train = torch.tensor(y_train, dtype=torch.float32).view(-1, 1)    # shape: (samples, 1)

class LSTMModel(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.hidden_size = hidden_size
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        x, _ = self.lstm(x)
        x = self.fc(x[:, -1, :])  # we only want the last time step
        return x

# Define the model
model = LSTMModel(n_features, 50, 1)

# Define loss and optimizer
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

# Training loop
for epoch in range(200):
    model.train()
    output = model(X_train)
    loss = criterion(output, y_train)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Make a prediction
model.eval()
x_input = ...  # some new input data
x_input = torch.tensor(x_input, dtype=torch.float32)
x_input = x_input.view((1, n_steps, n_features))
yhat = model(x_input)
```

N-BEATS

N-BEATS is a state-of-the-art model for univariate time series forecasting, available in the PyTorch-based pytorch-forecasting package.

```python
import pytorch_lightning as pl
from pytorch_forecasting import TimeSeriesDataSet, NBeats

# Assume 'data' is a pandas DataFrame containing your time series data
# 'group_ids' is the column(s) that identifies each time series
# 'target' is the column you want to forecast

# Create the dataset
max_encoder_length = 36
max_prediction_length = 6
training_cutoff = data["time_idx"].max() - max_prediction_length

context_length = max_encoder_length
prediction_length = max_prediction_length

training = TimeSeriesDataSet(
    data[lambda x: x.time_idx <= training_cutoff],
    time_idx="time_idx",
    target="target",
    group_ids=["group_ids"],
    max_encoder_length=context_length,
    max_prediction_length=prediction_length,
)
validation = TimeSeriesDataSet.from_dataset(training, data, min_prediction_idx=training_cutoff + 1)

# Create the dataloaders
train_dataloader = training.to_dataloader(train=True, batch_size=128, num_workers=0)
val_dataloader = validation.to_dataloader(train=False, batch_size=128, num_workers=0)

# Define the model
pl.seed_everything(42)
trainer = pl.Trainer(gpus=0, gradient_clip_val=0.1)
net = NBeats.from_dataset(
    training,
    learning_rate=3e-2,
    weight_decay=1e-2,
    widths=[32, 512],
    backcast_loss_ratio=1.0,
)

# Fit the model
trainer.fit(
    net,
    train_dataloader=train_dataloader,
)

# Predict
raw_predictions, x = net.predict(val_dataloader, return_x=True)
```

Temporal Fusion Transformers (TFT)

TFT is also available in the pytorch-forecasting package.
It's a powerful model that can handle multivariate time series and meta-data.

```python
import pytorch_lightning as pl
from pytorch_forecasting import TimeSeriesDataSet, TemporalFusionTransformer
from pytorch_forecasting.metrics import SMAPE
from pytorch_forecasting.data.examples import get_stallion_data

# Load example data
data = get_stallion_data()
data["time_idx"] = data["date"].dt.year * 12 + data["date"].dt.month
data["time_idx"] -= data["time_idx"].min()
data["month"] = data["date"].dt.month.astype(str).astype("category")  # known calendar feature

# Define dataset
max_prediction_length = 6
max_encoder_length = 24
training_cutoff = data["time_idx"].max() - max_prediction_length

context_length = max_encoder_length
prediction_length = max_prediction_length

training = TimeSeriesDataSet(
    data[lambda x: x.time_idx <= training_cutoff],
    time_idx="time_idx",
    target="volume",
    group_ids=["agency", "sku"],
    min_encoder_length=context_length,
    max_encoder_length=context_length,
    min_prediction_length=prediction_length,
    max_prediction_length=prediction_length,
    static_categoricals=["agency", "sku"],
    static_reals=["avg_population_2017"],
    time_varying_known_categoricals=["month"],
    time_varying_known_reals=["time_idx"],
    time_varying_unknown_categoricals=[],
    time_varying_unknown_reals=["volume"],
)
validation = TimeSeriesDataSet.from_dataset(training, data, min_prediction_idx=training_cutoff + 1)

# Create dataloaders
batch_size = 16
train_dataloader = training.to_dataloader(train=True, batch_size=batch_size, num_workers=0)
val_dataloader = validation.to_dataloader(train=False, batch_size=batch_size * 10, num_workers=0)

# Define model and trainer
pl.seed_everything(42)
trainer = pl.Trainer(
    gpus=0,
    # clipping gradients is a hyperparameter and important to prevent divergence
    # of the gradient for recurrent neural networks
    gradient_clip_val=0.1,
)

tft = TemporalFusionTransformer.from_dataset(
    training,
    learning_rate=0.03,
    hidden_size=16,
    lstm_layers=1,
    dropout=0.1,
    hidden_continuous_size=8,
    output_size=1,  # a single output per step for a point-forecast loss like SMAPE
    loss=SMAPE(),
    log_interval=10,
    reduce_on_plateau_patience=4,
)

# Fit the model
trainer.fit(
    tft,
    train_dataloader=train_dataloader,
    val_dataloaders=val_dataloader,
)

# Evaluate the model
raw_predictions, x = tft.predict(val_dataloader, mode="raw", return_x=True)
```

Most tasks will be covered by boosting trees thanks to their scalability and solid performance. However, if time and resources are not a constraint, Recurrent Neural Networks (RNNs), N-BEATS, and TFT have shown strong performance on time series tasks, despite their high computational cost and more limited scalability.

Conclusion

In conclusion, time series analysis can benefit from a range of strategies: start by decomposing and transforming the data to extract more information about its nature, then introduce time into your features and targets. Modeling can begin with common baselines, continue with, and often be finalized by, boosting, without rushing into neural networks and transformers. Lastly, using the Dickey-Fuller and Kwiatkowski-Phillips-Schmidt-Shin tests together can help ensure the robustness of the results.
```python
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import adfuller, kpss

# Assume 'data' is your time series data

# Dickey-Fuller test (null hypothesis: the series has a unit root, i.e. is non-stationary)
result = adfuller(data)
print('ADF Statistic: %f' % result[0])
print('p-value: %f' % result[1])
print('Critical Values:')
for key, value in result[4].items():
    print('\t%s: %.3f' % (key, value))

# KPSS test (null hypothesis: the series is stationary)
result = kpss(data, regression='c')
print('\nKPSS Statistic: %f' % result[0])
print('p-value: %f' % result[1])
print('Critical Values:')
for key, value in result[3].items():
    print('\t%s: %.3f' % (key, value))
```
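As a rough guide to reading the two tests together: ADF takes a unit root (non-stationarity) as its null hypothesis, while KPSS takes stationarity as its null, so agreement between them gives more confidence than either test alone. A small helper along these lines, a sketch with an assumed 5% significance level, could combine them:

```python
from statsmodels.tsa.stattools import adfuller, kpss

def check_stationarity(series, alpha=0.05):
    """Combine ADF (null: unit root) and KPSS (null: stationary) into one verdict."""
    adf_p = adfuller(series)[1]
    kpss_p = kpss(series, regression='c')[1]
    if adf_p < alpha and kpss_p >= alpha:
        return "stationary"          # both tests point to stationarity
    if adf_p >= alpha and kpss_p < alpha:
        return "non-stationary"      # both tests point to non-stationarity
    return "inconclusive: consider differencing or detrending"

print(check_stationarity(data))
```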