Unfortunately, most of those attempts haven’t gone past fancy blog posts or research papers, and we are still waiting to see major applications that rely on predictive models for crypto assets.
Being a completely digital asset class, crypto assets generate a rich digital footprint, and it seems logical that this footprint can be used to predict their behavior. Additionally, we are living in the golden era of machine learning, with no lack of platforms and frameworks that make sophisticated predictive capabilities available to mainstream developers.
However, the fact of the matter is that predicting financial markets is brutally difficult and some of those challenges are accentuated in the crypto space. At IntoTheBlock, we have been doing a lot of work in predictive models for crypto assets and have been facing some unexpected and yet fascinating challenges.
The crypto market has some unique characteristics, but many of the challenges related to its predictability are also present in traditional capital markets. Financial markets are among the most complex environments in which to apply modern machine learning and deep learning techniques.
Markets are complex, non-linearly interconnected, constantly changing, and vulnerable to macro factors and to the psychological behavior of investors, and the list goes on and on.
Compare that with building a model that recognizes a cat in a picture and you will get an idea of what I am talking about 😉
From a conceptual standpoint, the process of predicting the behavior of crypto assets can be abstracted in three main elements:
· The Target: What are you trying to predict?
· The Thesis: What are you trying to predict the target with?
· The Approach: Which method to follow for the prediction model?
While this might sound simplistic, each one of those steps presents a universe of unique challenges that can drive the most seasoned machine learning practitioners insane 😉
Most predictive approaches in financial markets can be classified into two main groups: asset-based or factor-based. Asset-based models focus on predicting the price of an asset in a given timeframe (e.g., the price of Ethereum in the next 2 hours).
These strategies rely on modeling very in-depth characteristics of the target asset, to the point that the models can rarely be applied to a different asset. A different prediction strategy is factor-based, which tries to predict the impact of a specific characteristic across a group of assets.
For instance, a factor-based strategy can predict that crypto assets with strong momentum tend to outperform crypto assets with weak momentum in the next week.
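To make the momentum example concrete, here is a minimal sketch of a cross-sectional momentum ranking. The asset tickers, the prices, the lookback window and the "top n versus bottom n" split are all hypothetical choices made for illustration. This is one simple way to express a momentum factor, not a description of any production strategy.

```python
from typing import Dict, List, Tuple


def momentum(prices: List[float], lookback: int) -> float:
    """Simple momentum: percentage change over the last `lookback` periods."""
    if len(prices) <= lookback:
        raise ValueError("need more than `lookback` prices")
    past, latest = prices[-lookback - 1], prices[-1]
    return latest / past - 1.0


def rank_by_momentum(
    price_history: Dict[str, List[float]], lookback: int, n: int
) -> Tuple[List[str], List[str]]:
    """Return (the n strongest assets, the n weakest assets) by momentum."""
    scores = {asset: momentum(p, lookback) for asset, p in price_history.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:n], ranked[-n:]


# Hypothetical daily closes, made up for illustration only.
price_history = {
    "AAA": [100, 102, 105, 110, 118],
    "BBB": [50, 50, 49, 51, 50],
    "CCC": [20, 19, 18, 17, 15],
    "DDD": [10, 10.5, 10.4, 10.8, 11],
}

strong, weak = rank_by_momentum(price_history, lookback=4, n=1)
print("strongest:", strong, "weakest:", weak)
```

A factor-based thesis would then test whether the "strong" group actually outperforms the "weak" group over the following period, which is a separate backtesting step not shown here.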
Both asset-based and factor-based prediction strategies are viable in the crypto space. Asset-based prediction strategies allow for incorporating highly optimized theses about the behavior of a crypto asset, which can produce strong results.
However, this type of strategy introduces a high level of risk, as it is completely exposed to the behavior of a single asset. Factor-based prediction strategies tend to be more diversified across different assets, which means that they sometimes miss the uniqueness of a specific asset class.
For instance, it would be impossible to create a factor-based strategy across all crypto assets based on UTXO metrics, as those are not present in account-based blockchains.
In order to predict the price of Bitcoin, we need to understand what to predict it with. Yes, contrary to what some crypto experts might have told you, it’s kind of difficult to predict price with price 😉.
The next step in creating a predictive model for crypto assets is to identify the thesis and predictors we would like to use. Crypto is a unique asset class with some unique predictive features, but it also inherits many of the predictors of traditional capital markets.
In the current state of the crypto markets, there are five fundamental sources that can help formulate a predictive model thesis:
Blockchain: On-chain metrics are a unique source of intelligence in crypto assets. From global metrics like hash rates to inflows and outflows of exchanges, blockchains are an incredible source of predictive characteristics for crypto-assets. Typically, blockchain predictors are more effective in medium frequency strategies that forecast across several days instead of seconds or minutes.
Order Books: Like traditional capital markets, order books are the key source to predict the exchange behavior of crypto assets. Factors such as market imbalances or spread momentum have been the source of many quant predictive strategies in traditional financial markets and remain very relevant in the crypto space. Order book predictors are more effective in high frequency strategies that forecast in short periods of time.
Derivatives: Futures, perpetual swaps, options and other derivatives are becoming important citizens of crypto markets. From a predictive modeling standpoint, derivatives present unique factors that could be used to predict the behavior of crypto assets. For instance, a predictive model can use the open interest in futures contracts to estimate the impact on price after the settlement of those contracts takes place. Typically, derivative factors are more effective in medium-frequency strategies.
Protocol: Another unique source of predictive information in the crypto space is the behavior of the protocols governing a specific asset. Bitcoin halvening events, token issuances or even forks can have a major impact on the behavior of cryptocurrencies. This type of factor is mostly used in event-driven strategies that are applicable only during specific periods of time.
Alternative-Datasets: Social media channels such as Twitter or Telegram or news outlets have proven to be influential in the behavior of crypto assets. As a result, factors such as Twitter sentiment or news topics can also play a role in predictive models. Typically, this type of factor is more effective in medium-frequency strategies.
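As an illustration of the order book predictors mentioned above, the following sketch computes a volume imbalance and the bid-ask spread from a single order book snapshot. The formula used here (bid volume minus ask volume, divided by total volume) is one common definition of imbalance, assumed for this example. The snapshot values, the `depth` parameter and the function names are hypothetical. The sketch also assumes bids are sorted from highest to lowest price and asks from lowest to highest.

```python
from typing import List, Tuple

Level = Tuple[float, float]  # (price, quantity)


def book_imbalance(bids: List[Level], asks: List[Level], depth: int) -> float:
    """Volume imbalance over the top `depth` levels, in the range [-1, 1].

    Positive values mean more resting bid volume than ask volume.
    """
    bid_vol = sum(qty for _, qty in bids[:depth])
    ask_vol = sum(qty for _, qty in asks[:depth])
    total = bid_vol + ask_vol
    if total == 0:
        return 0.0  # empty book: treat as balanced
    return (bid_vol - ask_vol) / total


def spread(bids: List[Level], asks: List[Level]) -> float:
    """Difference between the best ask price and the best bid price."""
    return asks[0][0] - bids[0][0]


# Hypothetical snapshot, made up for illustration only.
bids = [(100.0, 3.0), (99.5, 2.0), (99.0, 5.0)]
asks = [(100.5, 1.0), (101.0, 1.0), (101.5, 2.0)]

print("imbalance (top 2 levels):", book_imbalance(bids, asks, depth=2))
print("spread:", spread(bids, asks))
```

In a high-frequency setting, values like these would be computed on every snapshot and fed to the model as a time series of features.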
A traditional prediction model can be mathematically formulated in the following way:
Let Y = {y_1, …, y_n} denote a time series. Forecasting denotes the process of estimating the future values of Y, y_{n+h}, where h denotes the forecasting horizon.
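In practice, this formulation is often turned into a supervised learning problem by pairing a window of past observations with the value h steps after the window ends. The sketch below shows one way to do that. The function name `make_supervised` and the `window` parameter are hypothetical and introduced only for this example.

```python
from typing import List, Tuple


def make_supervised(
    series: List[float], window: int, h: int
) -> List[Tuple[List[float], float]]:
    """Pair each length-`window` slice with the value h steps after its end."""
    if window < 1 or h < 1:
        raise ValueError("window and h must be positive")
    samples = []
    for end in range(window, len(series) - h + 1):
        features = series[end - window:end]  # last `window` observed values
        target = series[end + h - 1]         # value h steps after the window
        samples.append((features, target))
    return samples


# Toy series, made up for illustration only.
for features, target in make_supervised([1, 2, 3, 4, 5, 6], window=3, h=2):
    print(features, "->", target)
```

Any model, statistical or neural, can then be trained on these pairs, which keeps the choice of horizon separate from the choice of method.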
When it comes to building prediction models, practitioners typically have two main options:
Time Series Analysis: These are statistical models that focus on predicting a variable based on linear relationships with its own past values and, in some variants, with other attributes of the dataset. Auto-Regressive Integrated Moving Average (ARIMA) models are among the most popular techniques in this area.
Machine/Deep Learning Models: These are typically neural network architectures that forecast a variable based on hierarchical, non-linear relationships between predictors. Recurrent Neural Networks (RNNs), and Long Short-Term Memory (LSTM) networks in particular, have become very prominent approaches in this school of thought.
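To give a feel for the time series option, here is a minimal sketch of an AR(1) model, the simplest member of the ARIMA family, fitted by ordinary least squares and iterated h steps ahead. It is a deliberately simplified stand-in: a full ARIMA model also handles differencing and moving-average terms, and in practice one would normally use a statistics library instead of hand-written code. The function names and the toy series are assumptions for illustration.

```python
from typing import List, Tuple


def fit_ar1(series: List[float]) -> Tuple[float, float]:
    """Fit y_t = c + phi * y_{t-1} by ordinary least squares."""
    x, y = series[:-1], series[1:]
    n = len(x)
    if n < 2:
        raise ValueError("need at least three observations")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    if var_x == 0:
        raise ValueError("series is constant; phi is not identified")
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    phi = cov_xy / var_x
    c = mean_y - phi * mean_x
    return c, phi


def forecast(series: List[float], c: float, phi: float, h: int) -> float:
    """Iterate the fitted recursion h steps past the last observation."""
    value = series[-1]
    for _ in range(h):
        value = c + phi * value
    return value


# Toy series generated by y_t = 1 + 0.5 * y_{t-1}, made up for illustration.
series = [10.0, 6.0, 4.0, 3.0, 2.5, 2.25]
c, phi = fit_ar1(series)
print("c:", c, "phi:", phi)
print("2-step forecast:", forecast(series, c, phi, h=2))
```

Real price series are far noisier than this toy example, so the fitted coefficients would carry substantial uncertainty that this sketch does not estimate.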
The advantages and disadvantages of each approach are an active subject of debate in the machine learning community. Typically, time series models are believed to be more effective with smaller datasets and short-term predictions, while deep learning models can uncover incredibly complex relationships between predictors that hold over long periods of time.
Given that time series analysis has been used for a longer time in financial markets, its limitations are pretty well documented, but it is not yet clear whether deep learning techniques are able to improve in these areas.
When it comes to crypto assets, using either time series or deep learning models brings some unexpected challenges.
Building predictive models for financial markets is incredibly complex but the crypto space still manages to add its own set of unique challenges to make it even more interesting. Here are some of the difficulties you should be aware of when trying to create predictive strategies for crypto assets:
1. Limited Datasets: The crypto markets are incredibly young and there are many events that have only occurred once and under very unique circumstances. As a result, it’s challenging for a predictive model to forecast a market condition that it hasn’t seen before. For instance, predicting the next 2017-like bull market would be very difficult, as there were very unique factors, such as the ICO madness, that would be hard to replicate.
2. Lack of Labeled Datasets: The anonymity of blockchains creates a challenge for predictive models, as it’s hard to extract predictors without knowing some identity information. For instance, creating a predictive model that forecasts price based on flows of funds into exchanges first requires us to identify specific addresses as exchanges.
3. Exchange Fake Volumes and Wash Trading: Predictive models based on order book data are vulnerable to the constant market manipulation that takes place in centralized exchanges. For instance, behaviors such as wash trading can cause predictive models to learn patterns that can’t be replicated.
4. Lack of Established Predictors: Unlike other financial markets, the crypto space brings some very unique factors that haven’t been established as predictors. For instance, there has been plenty of speculation about whether blockchain metrics such as hash rate, the famous NVT ratio or exchange inflows and outflows have predictive influence on price fluctuations, but the statistical evidence remains questionable at best.
5. Lack of Negatively-Correlated Factors: Factor-based predictive strategies can only be effective in markets in which different assets show negative correlations with respect to a single factor. In the case of crypto, the market is highly correlated, which makes factor-based predictive strategies challenging. For instance, it’s hard to rely on a momentum strategy when the top 20 crypto assets show very similar momentum patterns.
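The correlation problem described in the last item can be checked with a quick measurement. The sketch below computes the average pairwise Pearson correlation of simple returns across a set of assets. The tickers and prices are hypothetical and chosen so that two of the assets move in lockstep. A value close to 1 would suggest there is little cross-sectional dispersion for a factor strategy to exploit.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List


def returns(prices: List[float]) -> List[float]:
    """Simple period-over-period returns."""
    return [b / a - 1.0 for a, b in zip(prices, prices[1:])]


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if std_x == 0 or std_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (std_x * std_y)


def average_pairwise_correlation(price_history: Dict[str, List[float]]) -> float:
    """Mean Pearson correlation of returns over all pairs of assets."""
    rets = {asset: returns(p) for asset, p in price_history.items()}
    pairs = list(combinations(rets, 2))
    return sum(pearson(rets[a], rets[b]) for a, b in pairs) / len(pairs)


# Hypothetical prices, made up for illustration only.
price_history = {
    "AAA": [100.0, 110.0, 99.0, 108.9],
    "BBB": [50.0, 55.0, 49.5, 54.45],   # same returns as AAA
    "CCC": [10.0, 10.5, 10.2, 11.0],
}

print("average pairwise correlation:", average_pairwise_correlation(price_history))
```

Running the same measurement over rolling windows would show whether the correlation regime changes over time, which is when factor strategies have the most room to work.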
Building predictive models for crypto assets is a fascinating challenge. The digital nature of the crypto space, the large number of machine learning techniques available to mainstream developers and the irrationality of crypto markets offer very unique opportunities to create robust predictive models for certain market conditions.
Like other financial markets, crypto behaves very predictably in some circumstances, while in others it will challenge the most sophisticated models.
(Disclaimer: The author is the CTO at IntoTheBlock)