It is becoming increasingly common for organizations to collect very large amounts of data over time, and to need to detect unusual or anomalous time series. For example, Yahoo has banks of mail servers that are monitored over time. Many measurements on servers/IoT device performances are collected every hour for each of thousands of servers in order to identify servers/devices that are behaving unusually.
Python library tsfeature helps to compute a vector of features on each time series, measuring different characteristic-features of the series. The features may include lag correlation, the strength of seasonality, spectral entropy, etc.
In this blog, we discuss about different feature extraction techniques from a time-series and demonstrate with two different time-series.
One of the most commonly used mechanisms of Feature Extraction mechanisms in Data Science – Principal Component Analysis (PCA) is also used in the context of time-series. After applying Principal Component Analysis(Decomposition) on the features, various bivariate outlier detection methods can be applied to the first two principal components. This enables the most unusual series, based on their feature vectors, to be identified. The bivariate outlier detection methods used are based on the highest density regions.
A change in the variance or volatility over time can cause problems when modeling time series with classical methods like ARIMA.
The ARCH or Autoregressive Conditional Heteroskedasticity method plays a vital role in time-series highly volatile models like a stock prediction to measure the change in variance that is time-dependent, such as increasing or decreasing volatility.
Below, we state some of the time-series features, functionality, and their description.
The following code snippet shows how we can extract relevant features with one line of code for each feature.
Source Code
tsf_hp = tf.holt_parameters(df2['# Direct_1'].values)
print(tsf_hp)
tsf_centrpy = tf.count_entropy(df2['# Direct_1'].values)
print(tsf_centrpy)
tsf_crossing_points =tf.crossing_points(df2['# Direct_1'].values)
print(tsf_centrpy)
tsf_entropy =tf.entropy(df2['# Direct_1'].values)
print(tsf_entropy)
tsf_flat_spots =tf.flat_spots(df2['# Direct_1'].values)
print(tsf_flat_spots)
tsf_frequency =tf.frequency(df2['# Direct_1'].values)
print(tsf_frequency)
tsf_heterogeneity = tf.heterogeneity(df2['# Direct_1'].values)
print(tsf_heterogeneity)
tsf_guerrero =tf.guerrero(df2['# Direct_1'].values)
print(tsf_guerrero)
tsf_hurst = tf.hurst(df2['# Direct_1'].values)
print(tsf_hurst)
tsf_hw_parameters = tf.hw_parameters(df2['# Direct_1'].values)
print(tsf_hw_parameters)
tsf_intv = tf.intervals(df2['# Direct_1'].values)
print(tsf_intv)
tsf_lmp = tf.lumpiness(df2['# Direct_1'].values)
print(tsf_lmp)
tsf_acf = tf.acf_features(df2['# Direct_1'].values)
print(tsf_acf)
tsf_arch_stat = tf.arch_stat(df2['# Direct_1'].values)
print(tsf_arch_stat)
tsf_pacf = tf.pacf_features(df2['# Direct_1'].values)
print(tsf_pacf)
tsf_sparsity = tf.sparsity(df2['# Direct_1'].values)
print(tsf_sparsity)
tsf_stability = tf.stability(df2['# Direct_1'].values)
print(tsf_stability)
tsf_stl_features = tf.stl_features(df2['# Direct_1'].values)
print(tsf_stl_features)
tsf_unitroot_kpss = tf.unitroot_kpss(df2['# Direct_1'].values)
print(tsf_unitroot_kpss)
tsf_unitroot_pp = tf.unitroot_pp(df2['# Direct_1'].values)
print(tsf_unitroot_pp)
The results section illustrates the values of extracted features from Fetal ECG.
Results Time Series -1 (Data from Fetal ECG)
The below figure illustrates a time series of data collected from Fetal ECG from where features have been extracted.
{'alpha': 0.9998016430979507, 'beta': 0.5262228301908355}
{'count_entropy': 1.783469256071135}
{'crossing_points': 436}
{'entropy': 0.6493414196542769}
{'flat_spots': 131}
{'frequency': 1}
{'arch_acf': 0.3347171050143251, 'garch_acf': 0.3347171050143251, 'arch_r2': 0.14089508110660665, 'garch_r2': 0.14089508110660665}
{'hurst': 0.4931972012451876}
{'hw_alpha': nan, 'hw_beta': nan, 'hw_gamma': nan}
{'intervals_mean': 2516.801557547009, 'intervals_sd': nan}
{'guerrero': nan}
{'lumpiness': 0.01205944072461473}
{'x_acf1': 0.8262122472240574, 'x_acf10': 3.079891123506255, 'diff1_acf1': -0.27648384824011435, 'diff1_acf10': 0.08236265771293629, 'diff2_acf1': -0.5980110240921641, 'diff2_acf10': 0.3724461872893135}
{'arch_lm': 0.7064704126082555}
{'x_pacf5': 0.7303549429779813, 'diff1x_pacf5': 0.09311680507880443, 'diff2x_pacf5': 0.7105000333917864}
{'sparsity': 0.0}
{'stability': 0.16986190432765097}
{'nperiods': 0, 'seasonal_period': 1, 'trend': nan, 'spike': nan, 'linearity': nan, 'curvature': nan, 'e_acf1': nan, 'e_acf10': nan}
{'unitroot_kpss': 0.06485903737928193}
{'unitroot_pp': -908.3309773009415}
The results section illustrates the values of extracted features for date wise temperature variation.
{'alpha': 0.4387345064923509, 'beta': 0.0}
{'count_entropy': -101348.71338310161}
{'crossing_points': 706}
{'entropy': 0.5089893350876903}
{'flat_spots': 10}
{'frequency': 1}
{'arch_acf': 0.016273743642920828, 'garch_acf': 0.016273743642920828, 'arch_r2': 0.015091960217949008, 'garch_r2': 0.015091960217949008}
{'hurst': 0.5716257806690483}
{'hw_alpha': nan, 'hw_beta': nan, 'hw_gamma': nan}
{'intervals_mean': 1216.0, 'intervals_sd': 1299.2740280633643}
{'guerrero': nan}
{'lumpiness': 5.464398615083545e-05}
{'x_acf1': -0.0005483958183129098, 'x_acf10': 3.0147995912148108e-06, 'diff1_acf1': -0.5, 'diff1_acf10': 0.25, 'diff2_acf1': -0.6666666666666666, 'diff2_acf10': 0.4722222222222222}
{'arch_lm': 3.6528279285796827e-06}
{'nonlinearity': 0.0}
{'x_pacf5': 1.5086491342316237e-06, 'diff1x_pacf5': 0.49138888888888893, 'diff2x_pacf5': 1.04718820861678}
{'sparsity': 0.0}
{'stability': 5.464398615083545e-05}
{'nperiods': 0, 'seasonal_period': 1, 'trend': nan, 'spike': nan, 'linearity': nan, 'curvature': nan, 'e_acf1': nan, 'e_acf10': nan}
{'unitroot_kpss': 0.29884876591708787}
{'unitroot_pp': -3643.7791982866393}