Why you should be cautious with neural networks for trading

So I built a Deep Neural Network to predict the price of Bitcoin, and it's astonishingly accurate.

Curious? See the prediction results for yourself.

Looks pretty accurate, doesn't it?

And before you ask: yes, the above evaluation was performed on unseen test data. Only prior data was used to train the model (more details later).

So this is a money-making machine I can use to get rich! Right?

In fact, I am giving you the code for the above model so that you can use it yourself…

Ok, stop right there. Don't do it.

I repeat: Do not use it for trading. Don't do it!

Don't be fooled. There is something utterly deceptive about these results. Let me explain.

Too Good to be True

During the last couple of weeks and months, I've encountered many articles that take a similar approach to the one presented here and that show graphs of cryptocurrency price predictions that look like the one above.

The seemingly stunning accuracy of these price predictions should immediately set off alarm bells. These results are obviously too good to be true.

“When something looks too good to be true, it usually is.” (Emmy Rossum)

In the following, I want to demonstrate why this is the case.

Don't get me wrong: my intention is not to undermine the work put into those articles. They are good and deserve the claps they received. In fact, technically speaking, many of those approaches are very accurate.

The goal of this article is to bring out why those models are, in practice, fallacious, and why their predictions are not necessarily suitable for use in actual trading.

So why exactly is this the case? Let's take a close look.

Predicting the Price of Bitcoin using LSTMs

To explain, let me walk you through an example of building a multidimensional neural network to predict the price of Bitcoin that yields the prediction results you saw above.
Long Short Term Memory (LSTM)

LSTMs are a special kind of Recurrent Neural Network (RNN) that is particularly suitable for time series problems. Hence, they have become popular for forecasting cryptocurrency prices as well as stock markets.

For in-depth introductions to LSTMs, I recommend this and this article.

For the present implementation of the LSTM, I used Python and Keras. (You can find the corresponding Jupyter Notebook with the complete code on my Github.)

1. Getting the Data

First, I fetched historic Bitcoin price data (you can do this for any other cryptocurrency as well). To do so, I used the API from cryptocompare:

```python
import json

import pandas as pd
import requests

endpoint = 'https://min-api.cryptocompare.com/data/histoday'
res = requests.get(endpoint + '?fsym=BTC&tsym=USD&limit=2000')
hist = pd.DataFrame(json.loads(res.content)['Data'])
hist = hist.set_index('time')
hist.index = pd.to_datetime(hist.index, unit='s')
# keep numeric columns only, in case the response also contains text fields
# (the later normalisation divides every column by its first value)
hist = hist.select_dtypes(include='number')
hist.head()
```

A snapshot of historic Bitcoin price data.

Voilà, historic daily BTC data for the last 2000 days, from 2012-10-10 until 2018-04-04.

2. Train-Test Split

Then, I split the data into a training and a test set. I used the last 10% of the data for testing, which splits the data on 2017-09-14. All data before this date was used for training, and all data from this date on was used to test the trained model. Below, I plotted the close column of our DataFrame, which is the daily closing price I intended to predict.
```python
import matplotlib.pyplot as plt
import numpy as np


def train_test_split(df, test_size=0.1):
    """Split chronologically: the last `test_size` fraction of rows becomes the test set."""
    split_row = len(df) - int(test_size * len(df))
    train_data = df.iloc[:split_row]
    test_data = df.iloc[split_row:]
    return train_data, test_data


def line_plot(line1, line2, label1=None, label2=None, title=''):
    fig, ax = plt.subplots(1, figsize=(16, 9))
    ax.plot(line1, label=label1, linewidth=2)
    ax.plot(line2, label=label2, linewidth=2)
    ax.set_ylabel('price [USD]', fontsize=14)
    ax.set_title(title, fontsize=18)
    ax.legend(loc='best', fontsize=18)


train, test = train_test_split(hist, test_size=0.1)
line_plot(train.close, test.close, 'training', 'test', 'BTC')
```

Train-test split of historic Bitcoin price data

3. Building the Model

For training the LSTM, the data was split into windows of 7 days (this number is arbitrary, I simply chose a week here), and within each window I normalised the data to zero base, i.e. the first entry of each window is 0 and all other values represent the change with respect to the first value. Hence, I am predicting price changes rather than absolute prices.

```python
def normalise_zero_base(df):
    """Normalise dataframe column-wise to reflect changes with respect to first entry."""
    return df / df.iloc[0] - 1


def extract_window_data(df, window=7, zero_base=True):
    """Convert dataframe to overlapping sequences/windows of length `window`."""
    window_data = []
    for idx in range(len(df) - window):
        tmp = df.iloc[idx: (idx + window)].copy()
        if zero_base:
            tmp = normalise_zero_base(tmp)
        window_data.append(tmp.values)
    return np.array(window_data)


def prepare_data(df, window=7, zero_base=True, test_size=0.1):
    """Prepare data for LSTM."""
    # train test split
    train_data, test_data = train_test_split(df, test_size)

    # extract window data
    X_train = extract_window_data(train_data, window, zero_base)
    X_test = extract_window_data(test_data, window, zero_base)

    # extract targets: the close price `window` days after each window's first day
    y_train = train_data.close.iloc[window:].values
    y_test = test_data.close.iloc[window:].values
    if zero_base:
        # express targets as changes relative to the first day of each window
        y_train = y_train / train_data.close.iloc[:-window].values - 1
        y_test = y_test / test_data.close.iloc[:-window].values - 1

    return train_data, test_data, X_train, X_test, y_train, y_test


train, test, X_train, X_test, y_train, y_test = prepare_data(hist)
```

I used a simple neural network with a single LSTM layer consisting of 20 neurons, a dropout factor of 0.25, and a Dense layer with a single output and a linear activation function. In addition, I used Mean Absolute Error (MAE) as loss function and the Adam optimiser.

I trained the network for 50 epochs with a batch size of 4.

Note: The choice of the network architecture and all parameters is arbitrary, and I didn't optimise any of them, as this is not the focus of this article.

```python
from keras.layers import LSTM, Activation, Dense, Dropout
from keras.models import Sequential


def build_lstm_model(input_data, output_size, neurons=20,
                     activ_func='linear', dropout=0.25,
                     loss='mae', optimizer='adam'):
    model = Sequential()
    model.add(LSTM(neurons, input_shape=(input_data.shape[1], input_data.shape[2])))
    model.add(Dropout(dropout))
    model.add(Dense(units=output_size))
    model.add(Activation(activ_func))

    model.compile(loss=loss, optimizer=optimizer)
    return model


model = build_lstm_model(X_train, output_size=1)
history = model.fit(X_train, y_train, epochs=50, batch_size=4)
```

4. Results

Using the trained model to predict on the left-out test set, we obtain the graph shown at the beginning of this article.

So what exactly is wrong with these results? Why shouldn't we use this model for actual trading?

Let's take a closer look and zoom into the last 30 days of the plot.
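Before zooming in, recall that the targets were normalised to zero base: the network outputs the change relative to the first day of its input window, not a price. Plotting predictions as prices therefore requires inverting the normalisation, i.e. price = (first closing price of the window) × (1 + predicted change). A minimal sketch of that round trip on made-up numbers (the prices and the helper names `to_zero_base` and `from_zero_base` are hypothetical, not taken from the notebook):

```python
import numpy as np


def to_zero_base(prices):
    """Express a window of prices as changes relative to its first entry."""
    prices = np.asarray(prices, dtype=float)
    return prices / prices[0] - 1


def from_zero_base(base_price, change):
    """Invert the normalisation: map a change back to an absolute price."""
    return base_price * (change + 1)


# Hypothetical closing prices: 7 input days followed by the day to predict.
prices = np.array([100.0, 102.0, 101.0, 105.0, 107.0, 106.0, 110.0, 112.0])
window = 7

inputs = to_zero_base(prices[:window])   # what the LSTM sees; inputs[0] is always 0
target = prices[window] / prices[0] - 1  # training target, relative to the first day

# Converting a (here: perfect) prediction back recovers the actual price.
recovered = from_zero_base(prices[0], target)
print(inputs[0], round(target, 4), round(recovered, 2))  # → 0.0 0.12 112.0
```

In the article's code, the same inversion is applied to the whole test set at once via `test.close.values[:-window] * (preds + 1)`, where `test.close.values[:-window]` holds the first closing price of each window.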
```python
target_col = 'close'
window = 7  # must match the window length used in prepare_data

targets = test[target_col][window:]
preds = model.predict(X_test).squeeze()
# convert change predictions back to actual prices
preds = test.close.values[:-window] * (preds + 1)
preds = pd.Series(index=targets.index, data=preds)

n = 30
line_plot(targets[-n:], preds[-n:], 'actual', 'prediction')
```

See that? You might have already correctly guessed that the fundamental flaw with this model is that, for the prediction of a particular day, it is mostly using the value of the previous day. The prediction line doesn't seem to be much more than a shifted version of the actual price.

In fact, if we adjust the predictions and shift them by a day, this observation becomes even more obvious.

```python
line_plot(targets[-n:][:-1], preds[-n:].shift(-1))
```

As you can see, we suddenly observe an almost perfect match between actual data and predictions, indicating that the model is essentially learning the price at the previous day.

These results are exactly what I've been seeing in many of the examples using single-point predictions with LSTMs.

To make this point clearer, let's compute the expected returns as predicted by the model and compare them with the actual returns.

```python
actual_returns = targets.pct_change()[1:]
predicted_returns = preds.pct_change()[1:]
```

Looking at the actual and predicted returns, both in their original form and with the 1-day-shift applied to them, we obtain the same observation.

Actual and predicted returns. In the left plot predictions are adjusted by a day.
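This signature is exactly what a naive persistence forecast (tomorrow's price equals today's price) produces by construction. A minimal sketch, run on a synthetic random walk instead of the Bitcoin data (the random-walk volatility, seed and series length are assumptions chosen for illustration), that repeats the same return comparison and 1-day-shift used above:

```python
import numpy as np
import pandas as pd

# Hypothetical data: a geometric random walk standing in for daily closing prices.
rng = np.random.default_rng(seed=0)
log_returns = rng.normal(loc=0.0, scale=0.03, size=500)
actual = pd.Series(10_000 * np.exp(np.cumsum(log_returns)))

# Naive persistence forecast: the prediction for day t is the price of day t-1.
persistence = actual.shift(1).dropna()
actual_aligned = actual.iloc[1:]

# Mean absolute error in price terms, relative to the average price level.
mae = (actual_aligned - persistence).abs().mean()
relative_mae = mae / actual_aligned.mean()

# Returns: actual vs. those implied by the persistence forecast.
actual_returns = actual_aligned.pct_change().dropna()
predicted_returns = persistence.pct_change().dropna()
r_same_day = np.corrcoef(actual_returns, predicted_returns)[0, 1]

# Shift the forecast back by one day, as in the plots above.
shifted_predicted = predicted_returns.shift(-1).dropna()
shifted_actual = actual_returns.iloc[:-1]
r_shifted = np.corrcoef(shifted_actual, shifted_predicted)[0, 1]

print(f"relative MAE: {relative_mae:.3f}")
print(f"r (as predicted): {r_same_day:.2f}, r (shifted by a day): {r_shifted:.2f}")
```

On such data, the persistence forecast's price error is small relative to the price level, yet its same-day returns are essentially uncorrelated with the actual returns, while the shifted returns coincide exactly, because a random walk has no predictable returns. A low price MAE is therefore no evidence of predictive power, and comparing a model against this baseline is a cheap sanity check.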
Actually, if we compute the correlation between actual and predicted returns, both for the original predictions and for those adjusted by a day, we can make the following observation:

```python
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(18, 9))

# actual correlation
corr = np.corrcoef(actual_returns, predicted_returns)[0][1]
ax1.scatter(actual_returns, predicted_returns, color='k')
ax1.set_title('r = {:.2f}'.format(corr), fontsize=18)

# shifted correlation
shifted_actual = actual_returns[:-1]
shifted_predicted = predicted_returns.shift(-1).dropna()
corr = np.corrcoef(shifted_actual, shifted_predicted)[0][1]
ax2.scatter(shifted_actual, shifted_predicted, color='k')
ax2.set_title('r = {:.2f}'.format(corr));
```

As you can see from the plots above, actual and predicted returns are uncorrelated. Only after applying the 1-day-shift to the predictions do we obtain highly correlated returns that resemble the returns of the actual Bitcoin data.

Summary

The goal of this blog post was to address the many examples of cryptocurrency and stock market price predictions using deep neural networks that I have encountered in the past couple of months. These take a similar approach to the one employed here: implementing an LSTM that uses historic price data to predict future outcomes. I have demonstrated why these models are not necessarily viable for actual trading.

Yes, the network is effectively able to learn. But it ends up using a strategy in which predicting a value close to the previous one turns out to be successful in terms of minimising the mean absolute error.

However, no matter how accurate the predictions are in terms of the loss, in practice single-point prediction models based on historic price data alone, such as the one showcased here, remain hard to get right and are not particularly useful for trading.

Needless to say, more sophisticated approaches to implementing useful LSTMs for price prediction potentially do exist.
Using more data, as well as optimising the network architecture and hyperparameters, is a start. In my opinion, however, there is more potential in incorporating data and features that go beyond historic prices alone. After all, the finance world has long known that “past performance is not an indicator of future outcomes”. And the same might also hold for cryptocurrencies.

Disclaimer: This is not financial advice. The article and the presented model are for educational purposes only. Do not use it for trading or making investment decisions.