With the effect of the pandemic increasing every day and taking a severe toll on almost all parts of the world, it becomes important to ask how we can contain the spread of the disease. In an effort to combat it, every country has increased not only its testing facilities but also the amount of medical help and the number of emergency and quarantine centers. In this blog, we model Single-Step Time Series Prediction, using Deep Learning models, on the basis of the medical information available for different states of India.

Motivation: Predict Number of Active Cases by Covid-19 Pandemic based on Medical Facilities (Volume of Testing, ICU beds, Ventilators, Isolation Units, etc) using Multi-variate LSTM based forecasting

Considering all these factors, it becomes important to have a predictive model that can predict the Number of Active Cases, Deaths and Recoveries based on the change in Medical Facilities as well as other changes in infrastructure.

Single Step Time Series Prediction

One-step time series prediction is a supervised machine learning task in which the observed values up to the current time step are available when the next value in the time series is predicted. In contrast, multi-step prediction uses the previous n values to predict x future steps.

The following figure depicts the different life cycle stages of time-series model training and prediction.

[Figure: life cycle stages of time-series model training and prediction]

1. Feeding multi-variate data from a single source, or from aggregated sources available directly from the cloud or other 3rd-party providers, into the ML modeling data ingestion system.
2. Cleaning, preprocessing, and feature engineering of the data, involving scaling and normalization.
3. Conversion of the data to a supervised time-series.
4. Feeding the data to a deep learning training source that can train different time-series models like LSTM, CNN, BI-LSTM, CNN+LSTM using different combinations of hidden layers, neurons, batch-size, and other hyper-parameters.
5. Forecasting for the near term or the far distant term in future, using Single-Step or Multi-Step Forecasting respectively.
6. Evaluation of some of the error metrics (MAPE, MAE, ME, RMSE, MPE) by comparing the forecast with the actual data, when it comes in.
7. Re-training the model and continuous improvements when the error exceeds a threshold.

Data Loading and Selecting Features

As Delhi had high Covid-19 cases, here we model different DL models for the "DELHI" State (National Capital of India). Further, we keep the scope of dates from 25th March to 6th June 2020. Data till 29th April has been used for training, whereas data from 30th April to 6th June has been used for testing/prediction.

Here, we have selected features mostly related to the availability of medical facilities like hospitals, ICU beds, the amount of testing facilities, and the numbers of cured/discharged/migrated and quarantined people.

    # unique_states and list_state_all are built earlier in the complete source code
    stateName = unique_states[34]
    dataset = list_state_all[34]
    dataset = dataset.sort_values(by='Date', ascending=True)
    dataset = dataset[(dataset['Date'] >= '2020-03-25') & (dataset['Date'] <= '2020-06-06')]
    daterange = dataset['Date'].values
    no_Dates = len(daterange)
    dateStart = daterange[0]
    dateEnd = daterange[no_Dates - 1]
    # 'Active Cases' is placed first so that it becomes var1, the column that is
    # kept as the prediction target after the data is reframed
    dataset = dataset[['Active Cases', 'Total Confirmed cases', 'Death',
                       'Cured/Discharged/Migrated', 'coronaenquirycalls',
                       'cumulativepeopleinquarantine', 'negative',
                       'numcallsstatehelpline', 'numicubeds', 'numisolationbeds',
                       'numventilators', 'populationncp2019projection', 'positive',
                       'testpositivityrate', 'testspermillion', 'testsperpositivecase',
                       'testsperthousand', 'totaln95masks',
                       'totalpeoplecurrentlyinquarantine',
                       'totalpeoplereleasedfromquarantine', 'totalppe', 'totaltested',
                       'unconfirmed']]

As we have 22 features in total, we ensure each of the input features is initially scaled and then time-shifted by one unit ((t+1)th output for the t-th input), to yield 22 input features plus one predicted outcome, i.e. the Number of Active Cases. The rest of the columns are dropped.
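As a minimal sketch of the "scale, then shift by one step" idea described above, the toy example below pairs each day's scaled row with the next day's target value. The numbers and variable names are invented for illustration and are not the Delhi data; the scaling is written out by hand with NumPy rather than taken from any library.

```python
import numpy as np

# Hypothetical toy data: 5 days, 2 columns (target first, then one feature).
# The values are invented purely for illustration.
data = np.array([
    [10.0, 100.0],
    [20.0, 150.0],
    [40.0, 300.0],
    [70.0, 320.0],
    [90.0, 500.0],
])

# Min-max scale every column to [0, 1]: (x - min) / (max - min).
col_min = data.min(axis=0)
col_max = data.max(axis=0)
scaled = (data - col_min) / (col_max - col_min)

# One-step framing: the inputs are all columns on day t,
# the output is the target column (index 0) on day t+1.
X = scaled[:-1, :]   # days 0..3
y = scaled[1:, 0]    # target on days 1..4

for day, (x_row, y_next) in enumerate(zip(X, y)):
    print(f"day {day}: input={np.round(x_row, 3)} -> target(day {day + 1})={y_next:.3f}")
```

One row is lost in the process, because the last day has no next-day target to be paired with.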
The code below shows these steps in detail: the features are scaled and the series is time-shifted by one unit, so that the (t+1)th output (the Number of Active Cases) is paired with the t-th input.

Feature Scaling

Scaling is very important here because, in the current problem scope, the features vary over a very wide range (10 to 1000000).

    # imports are not shown in the original post; these are the ones this step needs
    import numpy as np
    import pandas as pd
    from sklearn.preprocessing import MinMaxScaler

    #no_features = 22
    no_features = np.shape(dataset)[1] - 1
    print("No of features", no_features)
    values = dataset.values
    # ensure all data is float
    values = values.astype('float32')
    print(np.shape(values))
    # normalize features
    scaler = MinMaxScaler(feature_range=(0, 1))
    scaled = scaler.fit_transform(values)
    # frame as supervised learning with one lagged step in and one step out
    reframed = series_to_supervised(scaled, 1, 1)
    print(np.shape(reframed))
    # the columns we don't want to predict are dropped in the next step

Convert Time-Series to a Supervised DataSet

This procedure is known as one-step prediction in time series: it uses lagged observations (here a single lag, t-1) as input variables to forecast the current time step (t). The reframing itself only shifts the data; it does not difference or seasonally adjust the series, so stationarity has to be handled separately if it is required.

    # convert series to supervised learning
    def series_to_supervised(data, n_in=1, n_out=1, dropnan=True):
        n_vars = 1 if type(data) is list else data.shape[1]
        df = pd.DataFrame(data)
        cols, names = list(), list()
        # input sequence (t-n, ... t-1)
        for i in range(n_in, 0, -1):
            cols.append(df.shift(i))
            names += [('var%d(t-%d)' % (j + 1, i)) for j in range(n_vars)]
        # forecast sequence (t, t+1, ... t+n)
        for i in range(0, n_out):
            cols.append(df.shift(-i))
            if i == 0:
                names += [('var%d(t)' % (j + 1)) for j in range(n_vars)]
            else:
                names += [('var%d(t+%d)' % (j + 1, i)) for j in range(n_vars)]
        # put it all together
        agg = pd.concat(cols, axis=1)
        agg.columns = names
        # drop rows with NaN values
        if dropnan:
            agg.dropna(inplace=True)
        return agg

After the redundant/unnecessary columns are dropped (24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45), the entire dataset is split into training and testing datasets in the ratio of 60%:40%, and then we apply different deep learning techniques.
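To see why it is columns 24 to 45 that get dropped, the column layout produced by the reframing can be worked out without any data. The sketch below assumes 23 selected columns (the 22 features plus 'Active Cases') and one lag in, one step out; `reframed_columns` is a hypothetical helper that only mirrors the naming scheme of `series_to_supervised`.

```python
def reframed_columns(n_vars, n_in=1, n_out=1):
    """Column names in the order produced by the lag/lead reframing."""
    names = []
    # lagged input columns: (t-n_in), ..., (t-1)
    for i in range(n_in, 0, -1):
        names += [f"var{j + 1}(t-{i})" for j in range(n_vars)]
    # forecast columns: (t), (t+1), ...
    for i in range(n_out):
        suffix = "(t)" if i == 0 else f"(t+{i})"
        names += [f"var{j + 1}{suffix}" for j in range(n_vars)]
    return names

n_vars = 23  # assumed: 22 features plus 'Active Cases'
names = reframed_columns(n_vars)

dropped = list(range(24, 46))  # the column indices dropped in this post
kept = [i for i in range(len(names)) if i not in dropped]

print(len(names))        # total number of columns after reframing
print(names[kept[-1]])   # the only time-t column that survives the drop
```

Under these assumptions the reframed frame has 46 columns: indices 0 to 22 hold every variable at t-1, and index 23 holds var1(t). The surviving time-t column is therefore var1(t), so the prediction target is whichever column comes first in `dataset`.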
As we train only on the basis of 22 features and predict one output, columns 24 to 45 are dropped, and the remaining data is split into training and test sets in the ratio of 60%:40%.

    # keep the lagged inputs (columns 0 to 22) and column 23, i.e. var1(t), as the target
    reframed.drop(reframed.columns[[24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34,
                                    35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45]],
                  axis=1, inplace=True)
    values = reframed.values

    # split into train and test sets
    split_factor = int(dataset.shape[0] * 0.6)
    print(split_factor)
    train = values[:split_factor, :]
    test = values[split_factor:, :]
    print(np.shape(train))
    print(np.shape(test))

    # split into input and outputs
    train_X, train_y = train[:, :-1], train[:, -1]
    test_X, test_y = test[:, :-1], test[:, -1]
    print(train_X.shape, train_y.shape, test_X.shape, test_y.shape)

    # reshape input to be 3D [samples, timesteps, features]
    train_X = train_X.reshape((train_X.shape[0], 1, train_X.shape[1]))
    test_X = test_X.reshape((test_X.shape[0], 1, test_X.shape[1]))
    print(train_X.shape, train_y.shape, test_X.shape, test_y.shape)
    print(train_X.shape[1], train_X.shape[2])

Next, we train an LSTM model and plot the training and validation loss before making a prediction.
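As a quick check of the split and the 3D layout described above, here is a toy sketch on a small made-up array. The array is a hypothetical stand-in for `reframed.values`; only the slicing and reshaping pattern is taken from this post.

```python
import numpy as np

# Hypothetical stand-in for reframed.values: 10 rows, 4 input columns + 1 target.
values = np.arange(50, dtype="float32").reshape(10, 5)

# Chronological split: the first 60% of the rows are used for training.
split_factor = int(values.shape[0] * 0.6)
train, test = values[:split_factor, :], values[split_factor:, :]

# The last column is the target; everything before it is input.
train_X, train_y = train[:, :-1], train[:, -1]
test_X, test_y = test[:, :-1], test[:, -1]

# Recurrent layers expect [samples, timesteps, features]; with a single
# lagged observation per sample, timesteps is 1.
train_X = train_X.reshape((train_X.shape[0], 1, train_X.shape[1]))
test_X = test_X.reshape((test_X.shape[0], 1, test_X.shape[1]))

print(train_X.shape, train_y.shape, test_X.shape, test_y.shape)
```

The rows are not shuffled before splitting, so the test set always lies later in time than the training set.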
The stacked LSTM network is designed and fitted as follows; the training and validation loss is then plotted and a prediction is made.

    # imports are not shown in the original post; tensorflow.keras is assumed here
    import matplotlib.pyplot as plt
    from numpy import concatenate
    from sklearn.metrics import mean_squared_error
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import LSTM, Dense

    # design Stacked LSTM networks
    model = Sequential()
    model.add(LSTM(units=50, return_sequences=True,
                   input_shape=(train_X.shape[1], train_X.shape[2])))
    model.add(LSTM(units=50, return_sequences=True))
    model.add(LSTM(units=50))
    model.add(Dense(units=1))
    model.compile(loss='mae', optimizer='adam')

    # fit network
    history = model.fit(train_X, train_y, epochs=1500, batch_size=72,
                        validation_data=(test_X, test_y), verbose=2, shuffle=False)

    # plot history
    plt.figure(figsize=(14, 12))
    plt.plot(history.history['loss'], label='train')
    plt.plot(history.history['val_loss'], label='test')
    plt.legend()
    plt.show()

    # make a prediction
    y_predict = model.predict(test_X)
    # back to 2D [samples, features] for the inverse scaling below
    test_X = test_X.reshape((test_X.shape[0], test_X.shape[2]))
    print(train_X.shape, train_y.shape, test_X.shape, test_y.shape)

[Figure: Training vs Validation Loss]

This code snippet shows the mechanism to compute the error metrics and inverse scale the predicted outcome.

    # actual vs predicted on the scaled data
    plt.figure(figsize=(14, 12))
    plt.plot(test_y, label='actual')
    plt.plot(y_predict, label='predicted')
    plt.title('Scaled LSTM based Time Series Active Cases Prediction for state ' + stateName)
    plt.legend()
    plt.show()
    rmse = np.sqrt(mean_squared_error(test_y, y_predict))
    print('Test RMSE: %.3f' % rmse)

    # invert scaling for forecast: pad the prediction with the other columns
    # so that the row width matches what the scaler was fitted on
    inv_y_predict = concatenate((y_predict, test_X[:, -(no_features):]), axis=1)
    inv_y_predict = scaler.inverse_transform(inv_y_predict)
    inv_y_predict = inv_y_predict[:, 0]

    # invert scaling for actual
    test_y = test_y.reshape((len(test_y), 1))
    inv_y = concatenate((test_y, test_X[:, 1:]), axis=1)
    inv_y = scaler.inverse_transform(inv_y)
    inv_y = inv_y[:, 0]

    # calculate RMSE
    rmse = np.sqrt(mean_squared_error(inv_y, inv_y_predict))
    print('Test RMSE: %.3f' % rmse)

    # index the test period by date
    pred_len = len(inv_y_predict)
    print(pred_len)
    dateEnd = daterange[split_factor + 1]
    print(dateEnd)
    pred_index = pd.date_range(start=dateEnd, periods=pred_len, freq='D')
    print(pred_index)
    inv_y_actual = pd.Series(inv_y, pred_index)
    inv_y_predicted = pd.Series(inv_y_predict, pred_index)

    # actual vs predicted in the original units
    plt.figure(figsize=(14, 12))
    plt.plot(inv_y_actual, label='actual')
    plt.plot(inv_y_predicted, label='predicted')
    plt.title('LSTM based Time Series Active Cases Prediction for state ' + stateName)
    plt.legend()
    plt.show()

The figure below illustrates the actual vs. predicted outcome of the LSTM model, after the predicted outcome has been inverse-transformed (to remove the effect of scaling).

[Figure: Actual vs Predicted Outcome of LSTM model]

Bi-directional LSTM

A uni-directional LSTM preserves information only from the inputs that have already passed through it, using the hidden state. A bidirectional LSTM, in contrast, runs the inputs in two directions, one from past to future and one from future to past. The LSTM that runs backwards preserves information from the future, and by combining the two hidden states the model is able, at any point in time, to preserve information from both past and future.

Next, we train a Bi-LSTM model, plot the training and validation loss, and make a prediction.
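To make the two-direction idea concrete, here is a small NumPy sketch. It is an illustration under simplifying assumptions, not the Keras implementation: a plain tanh recurrent cell stands in for the LSTM cell, the weights are random, and the names `run_rnn` and `bidirectional` are hypothetical.

```python
import numpy as np

def run_rnn(seq, W, U):
    """Final hidden state of a plain tanh RNN over seq of shape (T, features)."""
    h = np.zeros(U.shape[0])
    for x_t in seq:
        h = np.tanh(W @ x_t + U @ h)
    return h

def bidirectional(seq, fwd, bwd):
    """Concatenate a forward pass with a pass over the time-reversed sequence."""
    h_fwd = run_rnn(seq, *fwd)
    h_bwd = run_rnn(seq[::-1], *bwd)
    return np.concatenate([h_fwd, h_bwd])

rng = np.random.default_rng(0)
units, features = 3, 2
# Separate (random, purely illustrative) weights for each direction.
fwd = (rng.normal(size=(units, features)), rng.normal(size=(units, units)))
bwd = (rng.normal(size=(units, features)), rng.normal(size=(units, units)))

seq = rng.normal(size=(4, features))   # 4 time steps
out = bidirectional(seq, fwd, bwd)
print(out.shape)                       # twice the number of units

single = seq[:1]                       # a sequence with one time step
same = np.allclose(run_rnn(single, *bwd), run_rnn(single[::-1], *bwd))
print(same)
```

The combined state has twice as many values as a single direction. Note that with one time step per sample, as in the reshape used in this post, the reversed sequence is identical to the original, so the two directions differ only in their weights.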
The Bi-LSTM network is designed and fitted as follows, with the training and validation loss recorded in `history`.

    from tensorflow.keras.layers import Bidirectional  # import assumed, not shown in the original post

    train = values[:split_factor, :]
    test = values[split_factor:, :]
    print(np.shape(train))
    print(np.shape(test))

    # split into input and outputs
    train_X, train_y = train[:, :-1], train[:, -1]
    test_X, test_y = test[:, :-1], test[:, -1]
    print(train_X.shape, train_y.shape, test_X.shape, test_y.shape)

    # reshape input to be 3D [samples, timesteps, features]
    train_X = train_X.reshape((train_X.shape[0], 1, train_X.shape[1]))
    test_X = test_X.reshape((test_X.shape[0], 1, test_X.shape[1]))
    print(train_X.shape, train_y.shape, test_X.shape, test_y.shape)
    print(train_X.shape[1], train_X.shape[2])

    # design Stacked LSTM networks/Bi-directional LSTM networks
    model = Sequential()
    model.add(Bidirectional(LSTM(50, activation='relu'),
                            input_shape=(train_X.shape[1], train_X.shape[2])))
    model.add(Dense(1))
    model.compile(loss='mae', optimizer='adam')

    # fit network
    history = model.fit(train_X, train_y, epochs=1500, batch_size=72,
                        validation_data=(test_X, test_y), verbose=2, shuffle=False)

The figure below illustrates the actual vs. predicted outcome of the Bi-LSTM model, after the predicted outcome has been inverse-transformed (to remove the effect of scaling).

[Figure: Actual vs Predicted Outcome of Bi-LSTM model]

CNN (Convolution Neural Network)

We also used a CNN to evaluate model performance for single-step time-series prediction.

Next, we train a CNN model, plot the training and validation loss, and make a prediction.
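For a Conv1D and MaxPooling1D stack such as the one used for the CNN in this section (kernel size 2, pool size 2, 64 filters), the length of the sequence after each layer can be worked out by hand. The sketch below assumes 'valid' padding with stride 1 for the convolutions and a pooling stride equal to the pool size, which are the Keras defaults, and an input of 23 columns per sample; the input width is an assumption, so adjust `steps` to your own data.

```python
def conv1d_out_len(length, kernel_size):
    """Output length of a 1D convolution with 'valid' padding and stride 1."""
    return length - kernel_size + 1

def maxpool1d_out_len(length, pool_size):
    """Output length of 1D max pooling with stride equal to pool_size."""
    return length // pool_size

def trace_lengths(length, layers):
    """Return the sequence length after each layer, starting with the input."""
    trace = [length]
    for kind, size in layers:
        if kind == "conv":
            length = conv1d_out_len(length, size)
        elif kind == "pool":
            length = maxpool1d_out_len(length, size)
        else:
            raise ValueError(f"unknown layer kind: {kind}")
        trace.append(length)
    return trace

steps = 23     # assumed number of input columns per sample
filters = 64   # filters in the last convolution
layers = [("conv", 2), ("pool", 2), ("conv", 2), ("pool", 2)]

trace = trace_lengths(steps, layers)
flattened = trace[-1] * filters   # size of the vector that Flatten() produces
print(trace, flattened)
```

With these assumptions the length shrinks from 23 to 22, 11, 10 and finally 5, so the flattened vector fed to the dense layer has 5 x 64 = 320 values.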
The CNN model is designed and fitted as follows, with the training and validation loss recorded in `history`.

    # imports assumed, not shown in the original post
    from tensorflow.keras.layers import Conv1D, MaxPooling1D, Flatten, TimeDistributed

    # split into input and outputs
    train_X, train_y = train[:, :-1], train[:, -1]
    test_X, test_y = test[:, :-1], test[:, -1]
    print(train_X.shape, train_y.shape, test_X.shape, test_y.shape)

    # reshape input to be 3D [samples, steps, channels]: each input column
    # becomes one step with a single channel
    train_X = train_X.reshape((train_X.shape[0], train_X.shape[1], 1))
    test_X = test_X.reshape((test_X.shape[0], test_X.shape[1], 1))
    print(train_X.shape, train_y.shape, test_X.shape, test_y.shape)

    #CNN
    model = Sequential()
    model.add(Conv1D(filters=64, kernel_size=2, activation='relu',
                     input_shape=(train_X.shape[1], train_X.shape[2])))
    model.add(MaxPooling1D(pool_size=2))
    # input_shape is only needed on the first layer, so it is omitted here
    model.add(Conv1D(filters=64, kernel_size=2, activation='relu'))
    model.add(MaxPooling1D(pool_size=2))
    model.add(Flatten())
    model.add(Dense(100, activation='relu'))
    model.add(Dense(1))
    model.compile(loss='mse', optimizer='adam')
    model.summary()

    #fit model
    history = model.fit(train_X, train_y, epochs=1500, batch_size=72,
                        validation_data=(test_X, test_y), verbose=2, shuffle=False)

Use the trained model for prediction

    # make a prediction
    y_predict = model.predict(test_X)
    # back to 2D [samples, features] for the inverse scaling below
    test_X = test_X.reshape((test_X.shape[0], test_X.shape[1]))
    print(train_X.shape, train_y.shape, test_X.shape, test_y.shape)

Inverse Transform Predicted and Computation of Error Metrics

    # invert scaling for forecast
    inv_y_predict = concatenate((y_predict, test_X[:, -(no_features):]), axis=1)
    inv_y_predict = scaler.inverse_transform(inv_y_predict)
    inv_y_predict = inv_y_predict[:, 0]

    # invert scaling for actual
    test_y = test_y.reshape((len(test_y), 1))
    inv_y = concatenate((test_y, test_X[:, 1:]), axis=1)
    inv_y = scaler.inverse_transform(inv_y)
    inv_y = inv_y[:, 0]

    # calculate RMSE
    rmse = np.sqrt(mean_squared_error(inv_y, inv_y_predict))
    print('Test RMSE: %.3f' % rmse)

    pred_len = len(inv_y_predict)
    print(pred_len)
    dateEnd = daterange[split_factor + 1]
    print(dateEnd)
    pred_index = pd.date_range(start=dateEnd, periods=pred_len, freq='D')
    #print(pred_index)
    inv_y_actual = pd.Series(inv_y, pred_index)
    inv_y_predicted = pd.Series(inv_y_predict, pred_index)

The figure below illustrates the actual vs. predicted outcome of the CNN model, after the predicted outcome has been inverse-transformed (to remove the effect of scaling).

[Figure: Actual vs Predicted Outcome of CNN model]

CNN + LSTM

Here we have used Conv1D with a TimeDistributed layer, which is then fed to a single layer of LSTM to predict different sequences, as illustrated by the figure below. The TimeDistributed layer is primarily used to present several sets of data (say images) that are chronologically ordered, in order to detect movements, actions and directions.

    # split into input and outputs
    train_X, train_y = train[:, :-1], train[:, -1]
    test_X, test_y = test[:, :-1], test[:, -1]
    print(train_X.shape, train_y.shape, test_X.shape, test_y.shape)

    #LSTM + CNN
    # reshape to 4D [samples, subsequences, timesteps, channels]
    subsequences = 1
    timesteps = train_X.shape[1]
    X_train_series_sub = train_X.reshape((train_X.shape[0], subsequences, timesteps, 1))
    X_valid_series_sub = test_X.reshape((test_X.shape[0], subsequences, timesteps, 1))
    print('Train set shape', X_train_series_sub.shape)
    print('Validation set shape', X_valid_series_sub.shape)

    model = Sequential()
    model.add(TimeDistributed(Conv1D(filters=64, kernel_size=1, activation='relu'),
                              input_shape=(None, X_train_series_sub.shape[2],
                                           X_valid_series_sub.shape[3])))
    model.add(TimeDistributed(MaxPooling1D(pool_size=2)))
    model.add(TimeDistributed(Flatten()))
    model.add(LSTM(50, activation='relu'))
    model.add(Dense(1))
    model.compile(loss='mse', optimizer='adam')
    history = model.fit(X_train_series_sub, train_y,
                        validation_data=(X_valid_series_sub, test_y),
                        epochs=1500, verbose=2)

The prediction and inverse scaling then yield the actual predicted outcomes.
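The inverse-scaling idiom used throughout this post can look odd at first: the scaler was fitted on full-width rows, so a single predicted column has to be padded with the remaining columns before it can be inverted. The toy sketch below illustrates this with made-up numbers and a hand-rolled min-max scaler (`transform` and `inverse_transform` here are hypothetical stand-ins, not the scikit-learn object used above).

```python
import numpy as np

# Hypothetical raw data: 4 rows, 3 columns, with the target in column 0.
raw = np.array([
    [100.0, 1.0, 10.0],
    [200.0, 2.0, 30.0],
    [400.0, 4.0, 20.0],
    [500.0, 3.0, 50.0],
])

# Hand-rolled min-max scaler, standing in for a scaler fitted on all columns.
col_min, col_max = raw.min(axis=0), raw.max(axis=0)

def transform(x):
    return (x - col_min) / (col_max - col_min)

def inverse_transform(x):
    return x * (col_max - col_min) + col_min

scaled = transform(raw)

# Pretend a model predicted these scaled target values (one column).
y_pred_scaled = np.array([[0.0], [0.25], [0.75], [1.0]])

# The scaler works on full-width rows, so pad the prediction with the
# remaining (already scaled) columns, invert, then keep column 0 only.
padded = np.concatenate((y_pred_scaled, scaled[:, 1:]), axis=1)
y_pred = inverse_transform(padded)[:, 0]

print(y_pred)   # predictions back in the original units of column 0
```

Only column 0 of the inverted array is meaningful as a prediction; the padding columns are there purely to match the width the scaler expects.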
The CNN + LSTM prediction is made and inverse-scaled as follows.

    #Prediction (LSTM + CNN)
    # store the CNN + LSTM predictions in y_predict so that the inverse
    # scaling below uses them rather than the previous model's output
    y_predict = model.predict(X_valid_series_sub)
    print(y_predict)
    test_X = X_valid_series_sub.reshape((X_valid_series_sub.shape[0],
                                         X_valid_series_sub.shape[2]))

    # invert scaling for forecast
    inv_y_predict = concatenate((y_predict, test_X[:, -(no_features):]), axis=1)
    inv_y_predict = scaler.inverse_transform(inv_y_predict)
    inv_y_predict = inv_y_predict[:, 0]

    # invert scaling for actual
    test_y = test_y.reshape((len(test_y), 1))
    inv_y = concatenate((test_y, test_X[:, 1:]), axis=1)
    inv_y = scaler.inverse_transform(inv_y)
    inv_y = inv_y[:, 0]

    # calculate RMSE
    rmse = np.sqrt(mean_squared_error(inv_y, inv_y_predict))
    print('Test RMSE: %.3f' % rmse)

    pred_len = len(inv_y_predict)
    print(pred_len)
    dateEnd = daterange[split_factor + 1]
    print(dateEnd)
    pred_index = pd.date_range(start=dateEnd, periods=pred_len, freq='D')
    #print(pred_index)
    inv_y_actual = pd.Series(inv_y, pred_index)
    inv_y_predicted = pd.Series(inv_y_predict, pred_index)

The figure below illustrates the actual vs. predicted outcome of the stacked LSTM and CNN model, after the predicted outcome has been inverse-transformed (to remove the effect of scaling).

[Figure: Actual vs Predicted Outcome of stacked LSTM and CNN model]

    Epoch 1494/1500
    58/58 - 0s - loss: 3.2615e-06 - val_loss: 0.0056
    Epoch 1495/1500
    58/58 - 0s - loss: 3.3479e-06 - val_loss: 0.0056
    Epoch 1496/1500
    58/58 - 0s - loss: 3.3705e-06 - val_loss: 0.0053
    Epoch 1497/1500
    58/58 - 0s - loss: 3.2291e-06 - val_loss: 0.0054
    Epoch 1498/1500
    58/58 - 0s - loss: 3.0793e-06 - val_loss: 0.0056
    Epoch 1499/1500
    58/58 - 0s - loss: 3.8484e-06 - val_loss: 0.0055
    Epoch 1500/1500
    58/58 - 0s - loss: 3.8213e-06 - val_loss: 0.0054

The following table depicts the computed RMSE metrics for each of the deep learning models.

Conclusion

Here we see that the bi-directional LSTM works the best, followed by multiple stacked layers of LSTM and single LSTM layer. This is just a basic study and results might differ based on the dataset. In the next blog (series 2) we will see different multi-step prediction results. More extensive hyper-parameter tuning is needed, along with features that capture dynamic changes in medical facilities and supplies.

For complete source code check out https://github.com/sharmi1206/covid-19-analysis

References

1. https://arxiv.org/pdf/1801.02143.pdf
2. https://machinelearningmastery.com/multi-step-time-series-forecasting/
3. https://machinelearningmastery.com/multi-step-time-series-forecasting-with-machine-learning-models-for-household-electricity-consumption/
4. https://machinelearningmastery.com/how-to-develop-lstm-models-for-multi-step-time-series-forecasting-of-household-power-consumption/
5. https://machinelearningmastery.com/convert-time-series-supervised-learning-problem-python/
6. https://www.tensorflow.org/tutorials/structured_data/time_series