
Key Tactics The Pros Use For Feature Extraction From Time Series

Sharmistha Chatterjee (@sharmi1206)

https://www.linkedin.com/in/sharmistha-chatterjee-7a186310/

Introduction and Motivation

It is becoming increasingly common for organizations to collect very large amounts of data over time, and to need to detect unusual or anomalous time series. For example, Yahoo has banks of mail servers that are monitored over time. Many measurements of server and IoT-device performance are collected every hour for each of thousands of servers in order to identify servers or devices that are behaving unusually.

The Python library tsfeatures helps to compute a vector of features for each time series, each feature measuring a different characteristic of the series, such as lag correlation, the strength of seasonality, or spectral entropy.
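
For context, the package can also compute many features at once from a panel-style DataFrame. Below is a minimal sketch of that entry point; the panel construction and the chosen feature list are illustrative, and the exact API (the unique_id/ds/y columns and the features argument) is assumed from the tsfeatures documentation rather than shown in this article.

import numpy as np
import pandas as pd
from tsfeatures import tsfeatures, acf_features, entropy, stability

# Illustrative panel: one series identified by 'unique_id',
# with timestamps in 'ds' and observations in 'y'
panel = pd.DataFrame({
    'unique_id': 'series_1',
    'ds': pd.date_range('2020-01-01', periods=120, freq='D'),
    'y': np.random.default_rng(0).standard_normal(120).cumsum(),
})

# One row of features per unique_id
features = tsfeatures(panel, freq=7, features=[acf_features, entropy, stability])
print(features)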

In this blog, we discuss different feature-extraction techniques for time series and demonstrate them on two different series.

Popular Feature Extraction Metrics

One of the most commonly used feature-extraction techniques in data science, Principal Component Analysis (PCA), is also useful in the context of time series. After applying PCA (decomposition) to the feature vectors, various bivariate outlier detection methods can be applied to the first two principal components. This enables the most unusual series, based on their feature vectors, to be identified. The bivariate outlier detection methods used here are based on highest density regions.
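
To make the idea concrete, here is a minimal sketch of that workflow using scikit-learn and SciPy. The feature_matrix is a hypothetical array with one row of extracted features per series, and the kernel-density cut-off is only a simple stand-in for a full highest-density-region method.

import numpy as np
from scipy.stats import gaussian_kde
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical feature matrix: one row per series, one column per extracted feature
feature_matrix = np.random.default_rng(0).random((50, 10))

# Standardize the features, then project onto the first two principal components
scaled = StandardScaler().fit_transform(feature_matrix)
pcs = PCA(n_components=2).fit_transform(scaled)

# Simple density-based stand-in for highest-density-region outlier detection:
# flag the series whose (PC1, PC2) points fall in the lowest-density 5%
density = gaussian_kde(pcs.T)(pcs.T)
outliers = np.where(density < np.quantile(density, 0.05))[0]
print("Most unusual series (row indices):", outliers)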

A change in the variance or volatility over time can cause problems when modeling time series with classical methods like ARIMA.

The ARCH (Autoregressive Conditional Heteroskedasticity) method plays a vital role in modeling highly volatile time series, such as stock prices, by measuring time-dependent changes in variance, i.e. increasing or decreasing volatility.
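
As a quick way to check whether a series actually exhibits such ARCH effects before modeling, Engle's Lagrange multiplier test from statsmodels can be applied. The simulated series below is only illustrative, and this test is a complement to the arch_stat feature computed later in the article.

import numpy as np
from statsmodels.stats.diagnostic import het_arch

# Simulated series whose volatility triples halfway through
rng = np.random.default_rng(42)
x = rng.standard_normal(600) * np.repeat([1.0, 3.0], 300)

# Engle's LM test for ARCH effects: a small p-value indicates
# time-dependent (conditional) variance
lm_stat, lm_pvalue, f_stat, f_pvalue = het_arch(x)
print(f"ARCH LM statistic: {lm_stat:.2f}, p-value: {lm_pvalue:.4f}")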

Below, we state some of the time-series features, their functionality, and a short description of each.

The following code snippet shows how we can extract each feature with a single line of code.

Source Code

# Assumes the tsfeatures Python package is installed and imported as tf,
# and that df2 is a pandas DataFrame whose '# Direct_1' column holds the series.
import tsfeatures as tf

tsf_hp = tf.holt_parameters(df2['# Direct_1'].values)
print(tsf_hp)

tsf_centrpy = tf.count_entropy(df2['# Direct_1'].values)
print(tsf_centrpy)

tsf_crossing_points = tf.crossing_points(df2['# Direct_1'].values)
print(tsf_crossing_points)

tsf_entropy = tf.entropy(df2['# Direct_1'].values)
print(tsf_entropy)

tsf_flat_spots = tf.flat_spots(df2['# Direct_1'].values)
print(tsf_flat_spots)

tsf_frequency = tf.frequency(df2['# Direct_1'].values)
print(tsf_frequency)

tsf_heterogeneity = tf.heterogeneity(df2['# Direct_1'].values)
print(tsf_heterogeneity)

tsf_guerrero = tf.guerrero(df2['# Direct_1'].values)
print(tsf_guerrero)

tsf_hurst = tf.hurst(df2['# Direct_1'].values)
print(tsf_hurst)

tsf_hw_parameters = tf.hw_parameters(df2['# Direct_1'].values)
print(tsf_hw_parameters)

tsf_intv = tf.intervals(df2['# Direct_1'].values)
print(tsf_intv)

tsf_lmp = tf.lumpiness(df2['# Direct_1'].values)
print(tsf_lmp)

tsf_acf = tf.acf_features(df2['# Direct_1'].values)
print(tsf_acf)

tsf_arch_stat = tf.arch_stat(df2['# Direct_1'].values)
print(tsf_arch_stat)

tsf_pacf = tf.pacf_features(df2['# Direct_1'].values)
print(tsf_pacf)

tsf_sparsity = tf.sparsity(df2['# Direct_1'].values)
print(tsf_sparsity)

tsf_stability = tf.stability(df2['# Direct_1'].values)
print(tsf_stability)

tsf_stl_features = tf.stl_features(df2['# Direct_1'].values)
print(tsf_stl_features)

tsf_unitroot_kpss = tf.unitroot_kpss(df2['# Direct_1'].values)
print(tsf_unitroot_kpss)

tsf_unitroot_pp = tf.unitroot_pp(df2['# Direct_1'].values)
print(tsf_unitroot_pp)
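
Since each call above returns a small dictionary, the individual results can be merged into one flat record per series; stacked across many series, this becomes the feature matrix used for the PCA-based outlier detection sketched earlier. The snippet below only reuses the variables computed above.

import pandas as pd

# Merge the individual feature dictionaries into a single flat record
feature_dicts = [tsf_hp, tsf_centrpy, tsf_crossing_points, tsf_entropy,
                 tsf_flat_spots, tsf_frequency, tsf_heterogeneity, tsf_guerrero,
                 tsf_hurst, tsf_hw_parameters, tsf_intv, tsf_lmp, tsf_acf,
                 tsf_arch_stat, tsf_pacf, tsf_sparsity, tsf_stability,
                 tsf_stl_features, tsf_unitroot_kpss, tsf_unitroot_pp]

feature_row = {}
for d in feature_dicts:
    feature_row.update(d)

# One row per series; stacking rows for many series yields a feature matrix
feature_matrix = pd.DataFrame([feature_row])
print(feature_matrix.T)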

The results section below lists the values of the features extracted from the Fetal ECG series.

Results: Time Series 1 (Fetal ECG Data)

The figure below shows the Fetal ECG time series from which the features were extracted.

{'alpha': 0.9998016430979507, 'beta': 0.5262228301908355}
{'count_entropy': 1.783469256071135}
{'crossing_points': 436}
{'entropy': 0.6493414196542769}
{'flat_spots': 131}
{'frequency': 1}
{'arch_acf': 0.3347171050143251, 'garch_acf': 0.3347171050143251, 'arch_r2': 0.14089508110660665, 'garch_r2': 0.14089508110660665}
{'hurst': 0.4931972012451876}
{'hw_alpha': nan, 'hw_beta': nan, 'hw_gamma': nan}
{'intervals_mean': 2516.801557547009, 'intervals_sd': nan}
{'guerrero': nan}
{'lumpiness': 0.01205944072461473}
{'x_acf1': 0.8262122472240574, 'x_acf10': 3.079891123506255, 'diff1_acf1': -0.27648384824011435, 'diff1_acf10': 0.08236265771293629, 'diff2_acf1': -0.5980110240921641, 'diff2_acf10': 0.3724461872893135}
{'arch_lm': 0.7064704126082555}
{'x_pacf5': 0.7303549429779813, 'diff1x_pacf5': 0.09311680507880443, 'diff2x_pacf5': 0.7105000333917864}
{'sparsity': 0.0}
{'stability': 0.16986190432765097}
{'nperiods': 0, 'seasonal_period': 1, 'trend': nan, 'spike': nan, 'linearity': nan, 'curvature': nan, 'e_acf1': nan, 'e_acf10': nan}
{'unitroot_kpss': 0.06485903737928193}
{'unitroot_pp': -908.3309773009415}

The results section below lists the values of the extracted features for the date-wise temperature variation.

Results: Time Series 2 (Daily Temperature Data)

{'alpha': 0.4387345064923509, 'beta': 0.0}
{'count_entropy': -101348.71338310161}
{'crossing_points': 706}
{'entropy': 0.5089893350876903}
{'flat_spots': 10}
{'frequency': 1}
{'arch_acf': 0.016273743642920828, 'garch_acf': 0.016273743642920828, 'arch_r2': 0.015091960217949008, 'garch_r2': 0.015091960217949008}
{'hurst': 0.5716257806690483}
{'hw_alpha': nan, 'hw_beta': nan, 'hw_gamma': nan}
{'intervals_mean': 1216.0, 'intervals_sd': 1299.2740280633643}
{'guerrero': nan}
{'lumpiness': 5.464398615083545e-05}
{'x_acf1': -0.0005483958183129098, 'x_acf10': 3.0147995912148108e-06, 'diff1_acf1': -0.5, 'diff1_acf10': 0.25, 'diff2_acf1': -0.6666666666666666, 'diff2_acf10': 0.4722222222222222}
{'arch_lm': 3.6528279285796827e-06}
{'nonlinearity': 0.0}
{'x_pacf5': 1.5086491342316237e-06, 'diff1x_pacf5': 0.49138888888888893, 'diff2x_pacf5': 1.04718820861678}
{'sparsity': 0.0}
{'stability': 5.464398615083545e-05}
{'nperiods': 0, 'seasonal_period': 1, 'trend': nan, 'spike': nan, 'linearity': nan, 'curvature': nan, 'e_acf1': nan, 'e_acf10': nan}
{'unitroot_kpss': 0.29884876591708787}
{'unitroot_pp': -3643.7791982866393}

Conclusion

  • In this blog, we discussed easy steps to extract features from time series (both series have seasonality = 1) that can help us discover anomalies.
  • It is evident from the computed metrics that the first series is more stable (it has higher stability and entropy values), as its time-stamped data covers a longer period with relatively few fluctuations over that period.
  • The second time series exhibits higher fluctuations, as demonstrated by its larger number of crossing points.
  • Consequently, we also observe that the second time series has a lower lumpiness and intervals mean, signifying a lower variance of the variance. unitroot_kpss and unitroot_pp test for the existence of a unit root; in both series, unitroot_kpss is less than 1 and unitroot_pp is negative.
  • tsfeatures also supports the evaluation of custom feature functions that take a NumPy array as input and return a dictionary with the feature name as key and its value, as in the sketch below.
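
As a minimal sketch of that extension point, the function below computes a hypothetical series_range feature. The freq argument is included only to mirror the signature of the built-in features, and passing custom functions through the features argument of tsfeatures is assumed from the package documentation.

import numpy as np

def series_range(x, freq=1):
    # Hypothetical custom feature: the spread between the largest and
    # smallest observations of the series
    return {'series_range': float(np.max(x) - np.min(x))}

# Applied directly to one of the series used above
print(series_range(df2['# Direct_1'].values))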
