Centralized crypto exchanges are the most important black box of the crypto ecosystem. We all use them, we have a love-hate relationship with them, and we understand very little about their internal behavior. At IntoTheBlock, we have been heads down working on a series of machine learning models that help us better understand the internals of crypto exchanges. Recently, we presented some of our initial findings at a highly oversubscribed webinar, and I thought it would be worth elaborating further on some of the ideas discussed there.
Several factors make the behavior of centralized crypto exchanges difficult to understand. Anonymity, non-standard blockchain reconciliation procedures, and regular wash trading all challenge most analyses of centralized crypto exchanges. In an environment without well-established rules, machine learning and data science offer the best framework to unlock some of the mysteries of crypto exchanges.
In previous posts, we’ve discussed some of the internal components of the architecture of centralized crypto exchanges. In essence, four key components drive the behavior of centralized crypto exchanges:
· Hot Wallets: Hot wallets are typically the main interaction point between external parties and an exchange. Exchanges use this type of wallet to make an asset available to trade.
· Cold Wallets: Exchanges use cold wallets as secure storage for crypto-assets. This type of wallet typically holds larger amounts of assets that are not intended to be traded frequently.
· Deposit Addresses: Deposit addresses are on-chain addresses, often temporary, used to transfer funds into an exchange. The purpose of this type of address is to facilitate user-to-exchange money flows.
· Withdrawal Addresses: Withdrawal addresses are on-chain addresses, often temporary, used to transfer funds out of the main exchange wallet. Sometimes withdrawal addresses play a dual role as deposit addresses.
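To make these four roles more concrete, here is a minimal Python sketch of how on-chain transfers could be aggregated into per-address activity statistics. The `AddressStats` fields, the `build_stats` helper, and the sample transfers are all illustrative assumptions, not the features or data that IntoTheBlock actually uses.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class AddressStats:
    """Simple per-address activity counters (an illustrative feature set)."""
    received: float = 0.0
    sent: float = 0.0
    senders: set = field(default_factory=set)
    receivers: set = field(default_factory=set)


def build_stats(transfers):
    """Aggregate (sender, receiver, amount) tuples into per-address stats."""
    stats = defaultdict(AddressStats)
    for sender, receiver, amount in transfers:
        stats[sender].sent += amount
        stats[sender].receivers.add(receiver)
        stats[receiver].received += amount
        stats[receiver].senders.add(sender)
    return dict(stats)


# Made-up transfers: two users fund deposit addresses, which are swept
# into a hot wallet, which in turn moves part of the funds to cold storage.
transfers = [
    ("user_a", "deposit_1", 2.0),
    ("user_b", "deposit_2", 3.0),
    ("deposit_1", "hot_wallet", 2.0),
    ("deposit_2", "hot_wallet", 3.0),
    ("hot_wallet", "cold_wallet", 4.0),
]
stats = build_stats(transfers)
print(stats["hot_wallet"].received, len(stats["hot_wallet"].senders))
```

Even in this toy data, the hot wallet stands out as the address with the most distinct senders, while each deposit address has a single sender and a single receiver.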
If we think about tackling the analysis of centralized crypto exchanges using traditional machine learning, we would think about designing and implementing a model that can effectively classify specific addresses as exchanges. While that concept seems logical, it has proven impractical when applied to centralized crypto exchanges.
For starters, the behavior of centralized crypto exchanges is too diverse and complex to encapsulate in a single model. Additionally, centralized crypto exchanges are constantly changing or updating their on-chain processes, which poses a challenge for the learning dynamics of any machine learning model.
Imagine that, instead of a single machine learning model, we are able to combine multiple models into a single knowledge structure that can understand the internal patterns of centralized crypto exchanges. Different models would perform better for different architectures of centralized exchanges, and the group as a whole should be more resilient to changes. In machine learning theory, this can be accomplished using techniques known as ensemble learning.
Conceptually, ensemble models in machine learning combine the decisions from multiple models to improve the overall performance. The goal of ensemble algorithms is to combine the predictions of several base estimators built with a given learning algorithm in order to improve robustness over a single estimator.
In the case of our target problem, an interesting idea would be to construct an ensemble of models that can classify specific addresses as hot wallets, cold wallets, deposit addresses, or withdrawal addresses. We followed a similar approach in the IntoTheBlock platform and the results have been incredibly encouraging.
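As a rough illustration of the ensemble idea, the following Python sketch combines three base classifiers by majority (hard) voting. Everything in it is an assumption made for the example: the feature names, the thresholds, and the hand-written rules that stand in for trained models. It is not the IntoTheBlock implementation.

```python
from collections import Counter

# Hypothetical base models. Each maps a feature dict to one of four labels:
# "hot_wallet", "cold_wallet", "deposit", "withdrawal".
# Simple rules stand in for trained classifiers.


def by_fan_in(f):
    """Vote based on how many distinct counterparties an address has."""
    if f["senders"] >= 50:
        return "hot_wallet"
    if f["senders"] <= 2 and f["receivers"] <= 1:
        return "deposit"
    return "withdrawal"


def by_activity(f):
    """Vote based on transaction frequency and balance."""
    if f["tx_per_day"] < 0.1 and f["balance"] > 1000:
        return "cold_wallet"
    if f["tx_per_day"] > 100:
        return "hot_wallet"
    return "deposit"


def by_lifetime(f):
    """Vote based on how long the address has been active."""
    if f["active_days"] <= 2:
        return "deposit" if f["receivers"] <= 1 else "withdrawal"
    return "cold_wallet" if f["balance"] > 1000 else "hot_wallet"


def ensemble_predict(features, models):
    """Return the majority label and the share of models that voted for it.

    Ties are broken in favor of the label that was voted for first.
    """
    votes = Counter(model(features) for model in models)
    label, count = votes.most_common(1)[0]
    return label, count / len(models)


models = [by_fan_in, by_activity, by_lifetime]

# A made-up address: rarely active, large balance, long-lived.
cold_like = {"senders": 1, "receivers": 1, "tx_per_day": 0.01,
             "balance": 50000, "active_days": 700}
label, agreement = ensemble_predict(cold_like, models)
print(label, round(agreement, 2))
```

In this example the fan-in rule mistakes the address for a deposit address, but the other two models outvote it. That is the robustness argument for ensembles in miniature: one weak or outdated model does not decide the outcome on its own.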
Building a robust ensemble model for classifying crypto exchanges is hardly enough. To begin with, there are not enough labeled datasets about crypto exchanges to train sophisticated machine learning models. Additionally, each of these models factors in a large number of parameter combinations, which makes its decisions extremely hard to interpret. From that perspective, complementing those machine learning models with robust data visualization frameworks can help us better understand the characteristics of centralized crypto exchanges. At IntoTheBlock, the use of data visualizations has proven to be a unique asset for understanding the behavior of centralized crypto exchanges. Let’s take a look at some things we discovered.
Not all centralized crypto exchanges are created equal. The variety of architectures and exchange patterns makes the classification of crypto exchange addresses an incredibly challenging exercise. Here are a few of the most common patterns you should know about:
1) Low Volume Deposit Transactions: Some exchanges like Poloniex combine a small number of deposit addresses into a single deposit transaction into a hot wallet.
2) High Volume Deposit Transactions: Other exchanges like Binance combine funds from a large number of deposit addresses into a single transaction going into a hot wallet.
3) Low Volume Withdrawal Transactions: Similarly, some exchanges structure withdrawals into a small number of addresses.
4) High Volume Withdrawal Transactions: Other exchanges use a single withdrawal transaction to distribute funds to a large number of addresses.
5) Direct Deposits: Some exchanges transfer funds directly from deposit addresses into a hot wallet.
6) Temp Deposit Addresses: Other exchanges transfer the funds from deposit addresses into a temporary address before they go to the exchange hot wallet.
7) UTXO Patterns: Unspent transaction output (UTXO) mechanics are constantly used by centralized exchanges to structure transactions.
8) Single Wallet Exchanges: Some exchanges like Binance use a single hot wallet to process transactions.
9) Dual Hot Wallet Exchanges: Exchanges like Poloniex use a dual hot wallet structure in which one hot wallet is used to receive deposits and one to distribute withdrawals.
10) Multi-Exchange Transfers: Some exchanges like BitStamp regularly interact with exchanges like Binance to access funds.
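The first four patterns differ mainly in how many addresses feed into or out of a single transaction. A minimal Python sketch of that distinction follows. The function name `classify_transaction`, the `fan_threshold` cutoff of 10, and the label strings are assumptions chosen for illustration; the sketch looks only at the shape of a transaction and cannot tell by itself whether an exchange is involved.

```python
def classify_transaction(inputs, outputs, fan_threshold=10):
    """Label a transaction by its fan-in / fan-out shape.

    inputs and outputs are lists of addresses. fan_threshold is a
    hypothetical cutoff separating "low volume" from "high volume".
    """
    n_in, n_out = len(set(inputs)), len(set(outputs))

    # Many addresses consolidated into one or two outputs
    # (a destination plus optional change): a deposit-style sweep.
    if n_in >= 2 and n_out <= 2:
        size = "high" if n_in >= fan_threshold else "low"
        return f"{size}_volume_deposit_sweep"

    # One or two sources paying out to several addresses:
    # a withdrawal-style batch.
    if n_in <= 2 and n_out >= 3:
        size = "high" if n_out >= fan_threshold else "low"
        return f"{size}_volume_withdrawal_batch"

    return "other"


# Made-up transactions, one per shape.
big_sweep = classify_transaction([f"dep_{i}" for i in range(40)], ["hot"])
small_sweep = classify_transaction(["dep_1", "dep_2", "dep_3"], ["hot"])
big_batch = classify_transaction(["hot"], [f"user_{i}" for i in range(25)])
small_batch = classify_transaction(["hot"], ["user_1", "user_2", "user_3"])
print(big_sweep, small_sweep, big_batch, small_batch)
```

Shape labels like these would be one input among many to the address-level models, rather than a classification on their own.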
These are just some of the early and most obvious patterns that we have discovered in our experiments. The IntoTheBlock platform leverages a lot of this information to create more intelligent signals that help researchers and traders make better decisions. However, the most important thing is to realize that machine learning and data visualizations can really help us better understand the behavior of crypto exchanges without the need for any inside information.
(Disclosure: The Author is the CTO at IntoTheBlock)