## Why you should be cautious with neural networks for trading

So I built a Deep Neural Network to predict the price of Bitcoin — and it’s astonishingly accurate.

Curious?

See the prediction results for yourself.

Looks pretty accurate, doesn’t it?

And before you ask: Yes, the above evaluation was performed on unseen test data — only prior data was used to train the model (more details later).

**So this is a money-making machine I can use to get rich!**

Right?

In fact, I am giving you the code for the above model so that you can use it yourself…

#### Ok, *stop right there*. Don’t do it.

I repeat: *Don’t do it!* Do not use it for trading.

Don’t be fooled.

There is something utterly deceptive about these results.

Let me explain.

### Too Good to be True

Over the past few weeks and months, I’ve encountered many articles that take an approach similar to the one presented here and show graphs of cryptocurrency price predictions like the one above.

The seemingly stunning accuracy of price predictions should immediately set off alarm bells.

*These results are obviously too good to be true.*

“When something looks too good to be true, it usually is.” — Emmy Rossum

In the following, I want to demonstrate why this is the case.

Don’t get me wrong — my intention is not to undermine the work put into those articles. They are good and deserve the claps they received. In fact, many of those approaches are very accurate — *technically speaking*.

The goal of this article is to show why those models are, *in practice*, fallacious and why their predictions are not necessarily suitable for actual trading.

So why exactly is this the case? Let’s take a close look.

### Predicting the Price of Bitcoin using LSTMs

To explain, let me walk you through an example of building a multidimensional Long Short-Term Memory (LSTM) neural network to predict the price of Bitcoin, which yields the prediction results you saw above.

LSTMs are a special kind of recurrent neural network (RNN) that is particularly suitable for time series problems. Hence, they have become popular for forecasting cryptocurrency prices as well as stock markets.

For in-depth introductions to LSTMs I recommend this and this article.

For the present implementation of the LSTM, I used Python and Keras. *(You can find the corresponding Jupyter Notebook with the complete code **on my GitHub**.)*

#### 1. Getting the Data

First, I fetched historic Bitcoin price data (you can do this for any other cryptocurrency as well). To do so I used the API from cryptocompare:

```python
import json

import requests
import pandas as pd

# fetch daily BTC/USD history from the cryptocompare API
endpoint = 'https://min-api.cryptocompare.com/data/histoday'
res = requests.get(endpoint + '?fsym=BTC&tsym=USD&limit=2000')
hist = pd.DataFrame(json.loads(res.content)['Data'])

# index the rows by date (the `time` column is parsed as Unix seconds)
hist = hist.set_index('time')
hist.index = pd.to_datetime(hist.index, unit='s')
hist.head()
```

Voilà, historic daily BTC data for the last *2000* days, from *2012–10–10* until *2018–04–04*.

#### 2. Train-Test Split

Then, I split the data into a *training* and a *test* set. I used the last *10%* of the data for testing, which splits the data on *2017–09–14*. All data before this date was used for training, and all data from this date on was used to test the trained model. Below, I plotted the `close` column of our DataFrame, which is the daily closing price I intended to predict.

```python
import matplotlib.pyplot as plt


def train_test_split(df, test_size=0.1):
    # chronological split: the last `test_size` fraction of rows is the test set
    split_row = len(df) - int(test_size * len(df))
    train_data = df.iloc[:split_row]
    test_data = df.iloc[split_row:]
    return train_data, test_data


def line_plot(line1, line2, label1=None, label2=None, title=''):
    # plot two series on a shared price axis
    fig, ax = plt.subplots(1, figsize=(16, 9))
    ax.plot(line1, label=label1, linewidth=2)
    ax.plot(line2, label=label2, linewidth=2)
    ax.set_ylabel('price [USD]', fontsize=14)
    ax.set_title(title, fontsize=18)
    ax.legend(loc='best', fontsize=18)


train, test = train_test_split(hist, test_size=0.1)
line_plot(train.close, test.close, 'training', 'test', 'BTC')
```

#### 3. Building the Model

For training the LSTM, the data was split into windows of `7` days (this number is arbitrary; I simply chose a week here), and within each window I normalised the data to *zero base*, i.e. the first entry of each window is `0` and all other values represent the change with respect to the first value. Hence, I am predicting price *changes* rather than absolute prices.

```python
import numpy as np


def normalise_zero_base(df):
    """ Normalise dataframe column-wise to reflect changes with
        respect to first entry.
    """
    return df / df.iloc[0] - 1


def extract_window_data(df, window=7, zero_base=True):
    """ Convert dataframe to overlapping sequences/windows of
        length `window`.
    """
    window_data = []
    for idx in range(len(df) - window):
        tmp = df.iloc[idx:idx + window].copy()
        if zero_base:
            tmp = normalise_zero_base(tmp)
        window_data.append(tmp.values)
    return np.array(window_data)


def prepare_data(df, window=7, zero_base=True, test_size=0.1):
    """ Prepare data for LSTM. """
    # train test split
    train_data, test_data = train_test_split(df, test_size)

    # extract window data
    X_train = extract_window_data(train_data, window, zero_base)
    X_test = extract_window_data(test_data, window, zero_base)

    # extract targets: the close on the day right after each window
    y_train = train_data.close.iloc[window:].values
    y_test = test_data.close.iloc[window:].values
    if zero_base:
        # express targets relative to the first close of their window
        y_train = y_train / train_data.close.iloc[:-window].values - 1
        y_test = y_test / test_data.close.iloc[:-window].values - 1

    return train_data, test_data, X_train, X_test, y_train, y_test


train, test, X_train, X_test, y_train, y_test = prepare_data(hist)
```
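To make the zero-base windowing concrete, here is a minimal worked sketch on a handful of made-up closing prices (the numbers and the window length of `3` are assumptions chosen for readability, not real BTC data). It mirrors the logic described above for a single column: each window is divided by its first entry, and each target is the next close relative to that same first entry.

```python
import numpy as np

# Hypothetical toy closing prices, not real market data.
close = np.array([100.0, 110.0, 121.0, 110.0, 99.0])
window = 3  # shorter than the article's 7 so the output stays readable

# Each window is divided by its first entry and shifted so it starts at 0.
windows = np.array([
    close[i:i + window] / close[i] - 1
    for i in range(len(close) - window)
])

# Each target is the close right after the window,
# expressed relative to the first entry of that window.
targets = close[window:] / close[:-window] - 1

# Undo the normalisation to recover absolute prices from the targets.
recovered = close[:-window] * (targets + 1)

print(windows)    # two windows of three relative changes each
print(targets)    # one relative change per window
print(recovered)  # the original closes that followed each window
```

Note that the target is measured against the first day of the window, not the last one, so a "change" of roughly zero does not by itself mean the price stayed flat from one day to the next.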

I used a simple neural network with a single LSTM layer consisting of `20` neurons, a dropout factor of `0.25`, and a Dense layer with a single output unit and a *linear* activation function. In addition, I used *Mean Absolute Error (MAE)* as the loss function and the *Adam* optimiser.

I trained the network for `50` epochs with a batch size of `4`.

Note: The choice of the network architecture and all parameters is arbitrary, and I didn’t optimise any of them, as this is not the focus of this article.

```python
from keras.models import Sequential
from keras.layers import Activation, Dense, Dropout, LSTM


def build_lstm_model(input_data, output_size, neurons=20,
                     activ_func='linear', dropout=0.25,
                     loss='mae', optimizer='adam'):
    model = Sequential()
    # input shape is (window length, number of features)
    model.add(LSTM(neurons, input_shape=(
        input_data.shape[1], input_data.shape[2])))
    model.add(Dropout(dropout))
    model.add(Dense(units=output_size))
    model.add(Activation(activ_func))
    model.compile(loss=loss, optimizer=optimizer)
    return model


model = build_lstm_model(X_train, output_size=1)
history = model.fit(X_train, y_train, epochs=50, batch_size=4)
```

#### 4. Results

Using the trained model to predict on the held-out test set, we obtain the graph shown at the beginning of this article.

So what exactly is wrong with these results?

Why shouldn’t we use this model for actual trading?

Let’s take a closer look and zoom into the last *30* days of the plot.

```python
# same settings as in prepare_data above
target_col = 'close'
window = 7

targets = test[target_col][window:]
preds = model.predict(X_test).squeeze()

# convert change predictions back to actual price
preds = test.close.values[:-window] * (preds + 1)
preds = pd.Series(index=targets.index, data=preds)

n = 30
line_plot(targets[-n:], preds[-n:], 'actual', 'prediction')
```

See that?

You might have already correctly guessed that the fundamental flaw with this model is that *for the prediction of a particular day, it is mostly using the value of the previous day.*

**The prediction line doesn’t seem to be much more than a shifted version of the actual price.**

In fact, if we adjust the predictions and shift them by a day, this observation becomes even more obvious.

```python
line_plot(targets[-n:][:-1], preds[-n:].shift(-1))
```

As you can see, we suddenly observe an almost perfect match between actual data and predictions, indicating that the model is essentially reproducing the price of the previous day.

These results are exactly what I’ve been seeing in many of the examples using single-point predictions with LSTMs.
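A simple sanity check for this kind of behaviour is to compare the model's error with a naive *persistence* forecast that predicts each day's price as the previous day's price. The sketch below is only an illustration under stated assumptions: it uses a synthetic random walk instead of BTC data, and `model_like` is a made-up stand-in that echoes yesterday's price plus a little noise, not the LSTM trained above. The helper names `mae` and `persistence_forecast` are introduced here for the example.

```python
import numpy as np


def mae(y_true, y_pred):
    """Mean absolute error between two equally long arrays."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))


def persistence_forecast(prices):
    """Predict each day's price as the previous day's price."""
    prices = np.asarray(prices, dtype=float)
    return prices[:-1]  # forecasts for days 1..n-1


# Synthetic random walk standing in for a price series (assumption).
rng = np.random.default_rng(0)
prices = 100 + np.cumsum(rng.normal(0, 1, size=500))

actual = prices[1:]
naive = persistence_forecast(prices)

# Stand-in "model" that only echoes yesterday's price plus small noise.
model_like = naive + rng.normal(0, 0.1, size=naive.shape)

naive_mae = mae(actual, naive)
model_mae = mae(actual, model_like)
print(f"naive MAE: {naive_mae:.3f}, model-like MAE: {model_mae:.3f}")
```

If a trained model's test error is not clearly below the error of this naive baseline, a visually tight fit between the prediction and the actual price says little about its usefulness.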

To make this point clearer, let’s compute the expected *returns* as predicted by the model and compare those with the actual returns.

```python
actual_returns = targets.pct_change()[1:]
predicted_returns = preds.pct_change()[1:]
```

Looking at the actual and predicted returns, both in their original form and with the *1-day-shift* applied to them, we arrive at the same observation.

Actually, if we compute the correlation between actual and predicted returns both for the original predictions as well as for those adjusted by a day, we can make the following observation:

```python
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(18, 9))

# actual correlation
corr = np.corrcoef(actual_returns, predicted_returns)[0][1]
ax1.scatter(actual_returns, predicted_returns, color='k')
ax1.set_title('r = {:.2f}'.format(corr), fontsize=18)

# shifted correlation
shifted_actual = actual_returns[:-1]
shifted_predicted = predicted_returns.shift(-1).dropna()
corr = np.corrcoef(shifted_actual, shifted_predicted)[0][1]
ax2.scatter(shifted_actual, shifted_predicted, color='k')
ax2.set_title('r = {:.2f}'.format(corr));
```

As you can see from the plots above, actual and predicted returns are uncorrelated. Only after applying the *1-day-shift* to the predictions do we obtain highly correlated returns that resemble the returns of the actual Bitcoin data.
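The same check can be generalised by scanning several lags and looking for the one at which the correlation peaks. The following is a minimal sketch on synthetic returns, assuming a "prediction" that is simply the previous day's actual return plus noise; `lagged_corr` is a hypothetical helper written for this example, and the numbers are not taken from the model above.

```python
import numpy as np


def lagged_corr(actual, predicted, lag):
    """Correlation between actual[t] and predicted[t + lag], for lag >= 0."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    if lag > 0:
        actual, predicted = actual[:-lag], predicted[lag:]
    return float(np.corrcoef(actual, predicted)[0, 1])


# Synthetic daily returns (assumption, not BTC data).
rng = np.random.default_rng(42)
actual_returns = rng.normal(0, 0.03, size=1000)

# A lagging "prediction": the return predicted for day t is roughly
# the actual return of day t - 1.
predicted_returns = np.empty_like(actual_returns)
predicted_returns[0] = 0.0
predicted_returns[1:] = actual_returns[:-1] + rng.normal(0, 0.005, size=999)

corrs = {lag: lagged_corr(actual_returns, predicted_returns, lag)
         for lag in range(4)}
best_lag = max(corrs, key=corrs.get)

for lag, value in corrs.items():
    print(f"lag {lag}: r = {value:.2f}")
print("correlation peaks at lag", best_lag)
```

A peak at lag `1` rather than lag `0` is the signature of a forecast that trails the actual series by a day instead of anticipating it.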

### Summary

The goal of this blog post was to address the many examples of cryptocurrency and stock market price predictions using deep neural networks that I have encountered in the past couple of months. These take an approach similar to the one employed here: implementing an LSTM that uses historic price data to predict future outcomes. I have demonstrated why these models might not necessarily be viable for actual trading.

Yes, the network is effectively able to learn. But it ends up using a strategy in which predicting a value close to the previous one turns out to be successful in terms of minimising the mean absolute error.
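A rough way to see why this strategy is so attractive under MAE: the single-point forecast that minimises the expected absolute error is the conditional median of the next price. The sketch below assumes, purely for illustration, that prices follow a random walk whose increments are independent of the past and have median zero. The symbols $p_t$, $\hat{p}_t^{\,*}$ and $\varepsilon_t$ are introduced here and are not part of the analysis above, and the actual model conditions on a 7-day window of several columns rather than on the previous price alone.

```latex
\hat{p}_t^{\,*}
  = \arg\min_{\hat{p}} \; \mathbb{E}\left[\, |p_t - \hat{p}| \;\middle|\; p_{t-1} \right]
  = \operatorname{median}\left(p_t \mid p_{t-1}\right)

p_t = p_{t-1} + \varepsilon_t, \quad \operatorname{median}(\varepsilon_t) = 0
  \;\Longrightarrow\; \hat{p}_t^{\,*} = p_{t-1}
```

Under that assumption, "tomorrow equals today" is already the MAE-optimal forecast, so a network that converges to it has found a low loss without learning anything a trader could exploit.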

However, no matter how accurate the predictions are in terms of the loss, in practice, useful single-point predictions based on *historic price data alone* remain hard to achieve, and the results of models like the one showcased here are not particularly useful for trading.

Needless to say, more sophisticated approaches to implementing useful LSTMs for price prediction may well exist. Using more data and optimising the network architecture and hyperparameters are a start. In my opinion, however, there is more potential in incorporating data and features that go beyond historic prices alone. After all, the finance world has long known that “past performance is not an indicator for future outcomes”.

And the same might also hold for cryptocurrencies.

*Disclaimer: This is not financial advice. The article and the presented model are for educational purposes only. Do not use it for trading or making investment decisions.*