Merging Datasets from Different Timescales

Written by stylianoskampakis | Published 2022/12/04
Tech Story Tags: data | datasets | deep-learning | timescale | data-structures | data-analytics | optimization | tips

TL;DR: Deep neural networks can combine datasets collected at different frequencies. You create two subnetworks: one reads the daily data, the other reads the monthly data. The daily data is read by a layer built for sequences, such as an LSTM or a 1D convolution. The outputs of the two subnetworks are then joined together before they are passed into another layer.

Handling frequency in machine learning

One of the trickiest situations in machine learning is when you have to deal with datasets coming from different time scales. Let’s say, for example, that you are handling financial data, and some of the data is collected monthly (e.g. sales reports), and some other data comes at a daily frequency (e.g. stock market prices). How can you create a model that utilizes both pieces of information at the same time?

One solution is to aggregate the higher-frequency features. In this case, you can aggregate the daily features to a monthly level using functions like the mean and the standard deviation. However, this loses information, since everything that happens within a month is reduced to a few summary numbers, so it might be a suboptimal solution.
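As a rough illustration of the aggregation approach, the sketch below collapses a daily series into one mean and one standard deviation per month with pandas. The `price` column and the synthetic values are assumptions made up for this example; they are not taken from any real dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical daily data: 90 days of synthetic prices starting 2022-01-01.
rng = np.random.default_rng(0)
dates = pd.date_range("2022-01-01", periods=90, freq="D")
daily = pd.DataFrame(
    {"price": rng.normal(100.0, 5.0, size=len(dates))}, index=dates
)

# Collapse each calendar month into two summary statistics.
monthly = daily.groupby(daily.index.to_period("M"))["price"].agg(["mean", "std"])
print(monthly)
```

Each month ends up as a single row with two numbers, which is exactly where the day-to-day pattern gets lost.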

Deep learning offers a better solution.

Deep learning for handling data with different frequencies

Using deep neural networks, it is possible to do this very smoothly. You can create two subnetworks: one network reads the daily data, and the other network reads the monthly data. The outputs of the two subnetworks are then joined together before they are passed into another layer. The code below shows how you could do this for the two datasets we outlined above.

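import pandas as pd
from tensorflow import keras
from tensorflow.keras.layers import (
    LSTM, Dense, Concatenate, BatchNormalization, Dropout)

# Load the two datasets.
X_day = pd.read_csv('day_data.csv')
X_month = pd.read_csv('month_data.csv')

# Assumption (not stated in the original): the daily rows are sorted by date
# and every month has the same number of days, so they can be reshaped into
# the 3D array (months, days, features) that the LSTM expects.
day_X = X_day.to_numpy(dtype='float32').reshape(len(X_month), -1, X_day.shape[1])
month_X = X_month.to_numpy(dtype='float32')

# One input per time scale.
day_input = keras.Input(shape=day_X.shape[1:3], name="day_input")
monthly_input = keras.Input(shape=(month_X.shape[1],), name="monthly_input")

# Subnetworks: an LSTM reads the daily sequence, a dense layer the monthly row.
num_units = 50  # assumed value; left undefined in the original
x1 = LSTM(50)(day_input)
x2 = Dense(num_units, activation='elu')(monthly_input)

# Join the two subnetworks and classify into 3 classes.
merging = Concatenate()([x1, x2])
x = Dense(100, activation='elu')(merging)
x = BatchNormalization()(x)
x = Dropout(0.2)(x)
y = Dense(3, activation='softmax')(x)
model = keras.Model(inputs=[day_input, monthly_input], outputs=y)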
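The LSTM branch expects every sample to be a sequence of days, that is, a 3D array of shape (months, days, features), while a CSV file usually gives you one row per day. The sketch below shows one possible way to build that array when months have different lengths. The helper `to_monthly_sequences`, the column names and the zero-padding scheme are all assumptions made for illustration, not part of the original example.

```python
import numpy as np
import pandas as pd

# Hypothetical daily table: one row per day, two feature columns.
dates = pd.date_range("2022-01-01", "2022-03-31", freq="D")
daily = pd.DataFrame(
    {"price": np.arange(len(dates), dtype=float), "volume": 1.0},
    index=dates,
)

def to_monthly_sequences(df, max_days=31):
    """Stack daily rows into an array of shape (months, max_days, features).

    Months shorter than max_days are zero-padded at the end.
    """
    groups = df.groupby(df.index.to_period("M"))
    out = np.zeros((groups.ngroups, max_days, df.shape[1]), dtype="float32")
    for i, (_, block) in enumerate(groups):
        out[i, : len(block)] = block.to_numpy()
    return out

day_X = to_monthly_sequences(daily)
print(day_X.shape)  # (3, 31, 2)
```

Row i of the resulting array then lines up with row i of the monthly table, provided both are sorted by date. If zero is a meaningful value in your data, a different padding value, possibly combined with a Keras `Masking` layer, may be a better choice.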

The benefit of using deep learning

The benefit of using deep learning in this case is that the model sees the raw daily values, so it keeps information that aggregating the features would have discarded. This is done through the use of a layer that can read sequential data, like an LSTM, a GRU or a 1D convolution. The monthly data simply goes through a dense layer, with each month treated as a single input row, and then the two subnetworks are merged together.
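To make the choice of sequence layer concrete, here is a minimal sketch in which the layer that reads the daily data can be swapped between an LSTM, a GRU and a 1D convolution. The function `build_model`, the layer sizes and the synthetic arrays are assumptions chosen for illustration only.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

def build_model(encoder="lstm", timesteps=31, day_features=2, month_features=4):
    """Two-input model; `encoder` picks the layer that reads the daily sequence."""
    day_input = keras.Input(shape=(timesteps, day_features), name="day_input")
    monthly_input = keras.Input(shape=(month_features,), name="monthly_input")

    if encoder == "lstm":
        x1 = layers.LSTM(50)(day_input)
    elif encoder == "gru":
        x1 = layers.GRU(50)(day_input)
    elif encoder == "conv1d":
        x1 = layers.Conv1D(32, kernel_size=3, activation="elu")(day_input)
        x1 = layers.GlobalMaxPooling1D()(x1)  # collapse the time axis
    else:
        raise ValueError(f"unknown encoder: {encoder}")

    x2 = layers.Dense(16, activation="elu")(monthly_input)
    merged = layers.Concatenate()([x1, x2])
    output = layers.Dense(3, activation="softmax")(merged)
    return keras.Model(inputs=[day_input, monthly_input], outputs=output)

# Synthetic data: 24 months, each with 31 daily steps and one monthly row.
rng = np.random.default_rng(0)
day_X = rng.normal(size=(24, 31, 2)).astype("float32")
month_X = rng.normal(size=(24, 4)).astype("float32")

# Run the untrained models once to check that every variant fits the data.
preds = {
    name: build_model(name).predict([day_X, month_X], verbose=0)
    for name in ("lstm", "gru", "conv1d")
}
```

Whichever layer is chosen, the rest of the network stays the same: the sequence branch produces one fixed-length vector per month, which is concatenated with the monthly branch.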

So, make sure to check this trick out next time you are faced with this problem!




Written by stylianoskampakis | My name is Stylianos (Stelios) Kampakis and I am a data scientist.
Published by HackerNoon on 2022/12/04