In the previous article, we explored key concepts and techniques for understanding and preprocessing time-series data. This included visualization, decomposition, stationarity assessment, autocorrelation analysis, and outlier detection. With a solid grasp of the data's characteristics, we are now prepared to model the time-series and generate forecasts.
For this task, we will leverage AutoGluon, an AutoML toolkit that streamlines the process of training and evaluating multiple models in parallel. AutoGluon's time-series capabilities allow us to quickly experiment with a diverse set of modeling techniques, identify the best-performing approaches, and explore their strengths and limitations.
We continue working with the hourly U.S. energy consumption dataset from 2015 to 2018. After importing the necessary libraries and preprocessing steps, we convert the data into the TimeSeriesDataFrame format required by AutoGluon:
import pandas as pd
from autogluon.timeseries import TimeSeriesDataFrame, TimeSeriesPredictor
df = pd.read_csv('AEP_hourly.csv', sep=',')
# ... preprocessing steps ...
single_series_ts = TimeSeriesDataFrame.from_data_frame(
    df_single_series,
    id_column='item_id',
    timestamp_column='datetime'
)
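TimeSeriesDataFrame expects the input in long format: one row per (item id, timestamp) pair, with the target in its own column. A pandas-only sketch of the expected shape (the column names 'item_id', 'datetime', and 'elec_cons' mirror the snippet above; the values are illustrative):

```python
import pandas as pd

# Build a small long-format frame: one series ('AEP') observed hourly.
timestamps = pd.date_range('2015-01-01', periods=5, freq='h')
df_single_series = pd.DataFrame({
    'item_id': 'AEP',              # constant id: a single univariate series
    'datetime': timestamps,        # one timestamp per row
    'elec_cons': [13.1, 12.8, 12.5, 12.9, 13.4],  # target column
})

# AutoGluon's from_data_frame() indexes this by (item_id, timestamp);
# here we just verify the long-format invariant it relies on.
assert df_single_series[['item_id', 'datetime']].duplicated().sum() == 0
print(df_single_series.shape)  # (5, 3)
```

With multiple series (e.g. several regions), the same frame simply stacks them with distinct `item_id` values.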
To assess model performance, we split the data into training and test sets, reserving the most recent 30 days (720 hours) for testing:
train_df_single, test_df_single = single_series_ts.train_test_split(prediction_length=24 * 30)
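The split holds out the last prediction-length timestamps of the series as the test tail. The same idea in plain pandas, on a toy hourly series (variable names are illustrative):

```python
import pandas as pd

# A toy hourly series standing in for the full dataset.
series = pd.Series(
    range(100),
    index=pd.date_range('2018-01-01', periods=100, freq='h'),
)

holdout = 24  # with the real data this would be 24 * 30 = 720 hours
train, test = series.iloc[:-holdout], series.iloc[-holdout:]

assert len(train) + len(test) == len(series)
assert test.index.min() > train.index.max()  # test set is strictly later
```

Unlike a random split, a time-series split must keep the test observations strictly after the training observations, which is what the tail-based slicing guarantees.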
We begin by modeling the electricity consumption as a single univariate time-series, excluding any additional covariate information:
predictor = TimeSeriesPredictor(
    prediction_length=30*24,
    target='elec_cons',
    eval_metric='RMSE',
    freq='H'
)
predictor.fit(
    train_df_single,
    presets='fast_training',
    num_val_windows=6,
    refit_full=True
)
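The num_val_windows=6 setting above asks AutoGluon to backtest each model on six validation windows sliced from the end of the training data. A simplified sketch of how such windows can be carved out, where successive windows shift the cutoff back by one prediction length (AutoGluon's exact window placement may differ):

```python
def validation_windows(n_obs, prediction_length, num_windows):
    """Each window is a (start, end) slice of length prediction_length;
    the model is trained on everything before `start` and scored on the
    window, so every window mimics a real forecasting situation."""
    return [
        (n_obs - (k + 1) * prediction_length, n_obs - k * prediction_length)
        for k in range(num_windows)
    ]

windows = validation_windows(n_obs=1000, prediction_length=100, num_windows=3)
print(windows)  # [(900, 1000), (800, 900), (700, 800)]
```

Averaging a model's error across several such windows gives a more stable estimate than a single validation cut, at the cost of extra training time.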
AutoGluon's TimeSeriesPredictor fits a variety of models in parallel, evaluating their performance on validation windows drawn from the training data. This allows the toolkit to automatically select the best-performing models and stack/ensemble them for robust predictions. Note that this example uses the 'fast_training' preset; AutoGluon also offers the 'medium_quality', 'high_quality', and 'best_quality' presets, which train progressively larger sets of models and trade longer training time for higher accuracy.
We can inspect the leaderboard of models evaluated during training and on the test set:
predictor.leaderboard(train_df_single)
predictor.leaderboard(test_df_single)
If the test score is noticeably worse than the validation score obtained on the training data, the models are likely under-tuned, suggesting potential for further improvement through more comprehensive hyperparameter tuning or a higher-quality preset.
To quantify the accuracy, we compute error metrics like root mean squared error (RMSE), mean absolute percentage error (MAPE), and mean absolute scaled error (MASE) on the test set:
error_dict = predictor.evaluate(test_df_single, metrics=['MAPE', 'RMSE', 'MASE'])
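For reference, these metrics can also be computed directly with NumPy. MASE scales the forecast's mean absolute error by the in-sample error of a seasonal-naive forecast; the function names and the tiny arrays below are illustrative, not part of the original pipeline:

```python
import numpy as np

def rmse(y_true, y_pred):
    # Root mean squared error: penalizes large errors quadratically.
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def mape(y_true, y_pred):
    # Mean absolute percentage error: scale-free but undefined at zeros.
    return np.mean(np.abs((y_true - y_pred) / y_true))

def mase(y_true, y_pred, y_train, season=24):
    # Scale by the MAE of a seasonal-naive forecast on the training data
    # (season=24 matches hourly data with a daily cycle).
    naive_mae = np.mean(np.abs(y_train[season:] - y_train[:-season]))
    return np.mean(np.abs(y_true - y_pred)) / naive_mae

y_train = np.array([10.0, 12.0, 11.0, 13.0, 12.0, 14.0])
y_true = np.array([13.0, 15.0])
y_pred = np.array([12.0, 16.0])

print(rmse(y_true, y_pred))                    # 1.0
print(mase(y_true, y_pred, y_train, season=2)) # 1.0
```

A MASE below 1 means the model beats the seasonal-naive baseline on average, which makes it a convenient sanity check alongside RMSE and MAPE.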
While univariate models rely solely on the target series' historical values, we can potentially improve forecasting accuracy by incorporating additional covariate information. In our case, we can provide relevant datetime-derived features like the day of the week, week of the year, the hour of the day, etc. as known covariates:
known_covariates = ['weekday', 'week', 'day', 'hour', 'date', 'month', 'year']
full_df = TimeSeriesDataFrame.from_data_frame(
    df,
    id_column='item_id',
    timestamp_column='datetime'
)
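The covariate columns listed above can be derived from the timestamp with pandas' .dt accessor before constructing the TimeSeriesDataFrame. A sketch of that feature derivation (column names follow the known_covariates list; the preprocessing in the original pipeline may differ in detail):

```python
import pandas as pd

df = pd.DataFrame({
    'item_id': 'AEP',
    'datetime': pd.date_range('2015-01-01', periods=4, freq='h'),
    'elec_cons': [13.1, 12.8, 12.5, 12.9],
})

dt = df['datetime'].dt
df['weekday'] = dt.dayofweek                     # 0 = Monday
df['week'] = dt.isocalendar().week.astype(int)   # ISO week of year
df['day'] = dt.day
df['hour'] = dt.hour
df['date'] = dt.normalize()                      # timestamp truncated to midnight
df['month'] = dt.month
df['year'] = dt.year

print(df[['weekday', 'hour', 'month']].iloc[0].tolist())  # [3, 0, 1]
```

Because these features are deterministic functions of the timestamp, they qualify as known covariates: their future values are available at forecast time.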
predictor_full = TimeSeriesPredictor(
    prediction_length=30*24,
    target='elec_cons',
    eval_metric='RMSE',
    known_covariates_names=known_covariates,
    freq='H'
)
We then split the full dataset as before and fit this covariate-aware model, again utilizing AutoGluon's automated machine-learning capabilities:
train_df_full, test_df_full = full_df.train_test_split(prediction_length=24 * 30)
predictor_full.fit(
    train_df_full,
    presets='fast_training',
    num_val_windows=6,
    refit_full=True
)
Inspecting the leaderboard and evaluating on the test set lets us assess the incremental impact of including covariate information compared to the univariate approach.
predictor_full.leaderboard(test_df_full)
The error metrics computed on the test sets quantify the predictive performance of the univariate and covariate-aware models. We can analyze these results to understand the relative strengths of each approach and identify areas for further investigation or improvement.
While this automated modeling process is efficient for rapid experimentation, careful data preparation, feature engineering, and human oversight remain critical for achieving optimal results. The leaderboard performances provide a starting point for iterative improvements and deeper, technique-specific tuning.
In this article, we demonstrated how to leverage AutoGluon's AutoML capabilities to streamline the process of training and evaluating diverse time-series forecasting models. We explored both univariate and multivariate modeling approaches, incorporating relevant covariate information to potentially boost predictive performance.
The empirical results and error analyses facilitated by AutoGluon enable practitioners to quickly identify high-performing model classes and gain insights into their respective strengths and limitations for the given data characteristics. This serves as a strong foundation for further tuning, ensembling, and developing bespoke solutions tailored to the specific time-series forecasting problem at hand.