The fascinating story of me diving deep into an arbitrage business on the most active blockchain platform.

Intro

This story started one long evening while I was routinely analyzing traders’ performance on the BSC blockchain. I was using our Datamint data analytics engine to look for the most profitable actors. This engine contains historical and live data constantly streamed from the most active blockchain by number of daily active users: BNB Smart Chain (formerly Binance Smart Chain). I had a hypothesis that it was possible to find trading insights from these traders’ behavior.

Suddenly, something caught my attention. I found a trader who managed to beat the market by 270 BNB (65,000$ at the time of writing and even more than that at the time of the deal). He used only decentralized exchanges (DEX) to achieve this result.

Well, “not great not terrible,” I thought. Then, I ran a query to understand how much time it took him to achieve this result. The query console thought for a few seconds and left me with a stunning answer: one day. Really? Okay, but how many transactions did he send? Just one? How could it be?

Well, it’s technically possible to make multiple deals in one transaction. On Ethereum-like blockchains (like Polygon, Avalanche C-Chain, many others, and, of course, BSC) this would require you to write and deploy a smart contract. You’d need to code in the Solidity language, but it isn’t that difficult if you have some coding experience (just be aware that most code examples of “Uniswap Arbitrage Bot” promoted online are scams that just steal your money).

But how exactly can you extract so much value in one atomic step❓ It’s time to check the transaction contents.

https://bscscan.com/tx/0x724e64f8426af9329c71ff3eee16e2bf5fc03b979370217c277e8bbf917f3bb7

Let’s take a closer look at what’s happening here. We see that this lucky guy takes 1500 BNB and swaps them to BUSD stablecoin. Then, he swaps BUSD to BSC-USD (another dollar-pegged stablecoin).
After that, the trader swaps BSC-USD to CAKE (Pancakeswap DeFi protocol’s native token), which is then swapped back to BNB in its wrapped form. In the end, the trader gets 1769 BNB for the 1500 BNB he started with, effectively extracting 269 BNB of profit.

This is called arbitrage. Many assets are traded on multiple decentralized exchanges (DEX) and prices on different DEXes and asset pairs are constantly changing. A smart actor can leverage these price discrepancies and extract value in one single transaction.

After some research, I’ve found out that this example, while impressive, isn’t that common because of two things. First, in this case, the player used his own capital for trading. Second, he extracted a really big amount. So here’s another, much more common example of arbitrage:

https://bscscan.com/tx/0x257af7585ca32e8648c4307e839d70e2e7d2a6879375a29abe9c15145b204015

First of all, you may see that the order of operations is strange. The arbitrageur takes tokens from a liquidity pool before paying for them. This is called a “flash loan”. Exchange protocols like Uniswap v2 (Pancakeswap and most other DEXes on BSC are forks of Uniswap v2) allow you to take any amount of money (up to the total liquidity amount of the pool). But you have to repay all borrowed funds with a fee before the transaction ends. The protocol doesn’t bear any risk here because it’ll forcefully revert the transaction if you don’t fully repay the loan.

So, this arbitrageur leverages flash loans. This way, he can execute arbitrage without a penny of working capital (except for gas costs). But we also have another difference here: the small profit. The trader got only 1.26$ and paid something like 0.54$ for gas. So, his net profit for this transaction was less than 1$. However, arbitrageurs like this one may do dozens of arbitrage attempts per minute. This means he probably still earned a lot. Soon, we’ll know how much exactly.

At this point, I was so thrilled by these examples that I decided to dive deeper.
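To make the mechanics of both examples concrete, here is a minimal sketch of how a closed loop of swaps can end with more than it started with. It assumes Uniswap v2-style constant-product pools with a 0.3% swap fee (forks may charge a different fee). The function names and the pool reserves are hypothetical and are not taken from the transactions above.

```python
def amount_out(amount_in: float, reserve_in: float, reserve_out: float,
               fee: float = 0.003) -> float:
    """Output of one swap against a constant-product (x * y = k) pool.

    fee=0.003 is the 0.3% swap fee of Uniswap v2; treat it as an
    assumption for any particular fork.
    """
    amount_in_after_fee = amount_in * (1 - fee)
    return amount_in_after_fee * reserve_out / (reserve_in + amount_in_after_fee)


def loop_profit(amount_in: float, pools, fee: float = 0.003) -> float:
    """Push amount_in through a chain of pools and return end minus start.

    pools is a list of (reserve_in, reserve_out) pairs, one per swap,
    ordered so that each swap's output token is the next swap's input.
    """
    amount = amount_in
    for reserve_in, reserve_out in pools:
        amount = amount_out(amount, reserve_in, reserve_out, fee)
    return amount - amount_in


# Hypothetical reserves: pool 1 prices 1 WBNB at 100 CAKE,
# pool 2 prices 1 WBNB at only 90 CAKE, so CAKE is worth more there.
mispriced_loop = [(1_000.0, 100_000.0), (90_000.0, 1_000.0)]
# Same price in both pools: the loop only pays fees and slippage.
balanced_loop = [(1_000.0, 100_000.0), (100_000.0, 1_000.0)]

print(loop_profit(10.0, mispriced_loop))  # positive: an arbitrage opportunity
print(loop_profit(10.0, balanced_loop))   # negative: nothing to extract
```

A flash loan changes only where the starting amount comes from. The loop still has to end with enough to repay the borrowed funds plus the fee, which this sketch does not model separately.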
I was determined to find out how exactly this market works. And maybe… find out how to earn a Lambo by going into an arbitrage business 😃.

Spoiler: I still don’t own a Lambo, but this was a fascinating journey. I also learned a lot about arbitrage and even tried to put myself into the arbitrageur’s shoes.

Are you ready to follow me on this journey? Let’s start then!

How Big and Stable Is the Market?

I decided to start with calculating how much arbitrageurs are earning. Today, we’re talking solely about so-called “flash arbitrageurs”. These are the guys who take a flash loan, execute trades, repay the loan, and keep the profit, all in one transaction. This part of the arbitrage market is very sweet. In fact, you don’t need working capital to do it, and you don’t risk losing your capital. When you send the transaction, you either win or the transaction is reverted. You only lose a small gas fee, usually less than 10 cents.

But how to calculate the flash arbitrageurs’ total profits? After a few experiments with a manual review, I came up with a query that does the following steps:

1. Filter only transactions that satisfy the following criteria:
   - Assets traded in a closed loop (like WBNB → DOGE → CAKE → WBNB).
   - Value extracted in one of the most liquid assets on BSC: BNB, BUSD, USDT, USDC, WBTC, or WETH.
   - The tx receiver isn’t a public contract (like the PancakeSwap router, for example). We exclude these contracts to properly calculate the total arbitrage gas fees, including the fees for failed attempts.
2. Find out the contract creator. We need this because one arbitrageur may have multiple contracts and we want to group them together.
3. Calculate the total revenue and convert all incomes to USD value by using the median asset-BUSD DEX price of the day of the deal.
4. Calculate gas costs for both successful and unsuccessful attempts.

Note: when talking about an arbitrageur’s profit, we mean gross profit: revenues minus gas costs.
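The closed-loop filter and the gross profit definition above can be sketched in a few lines of Python. This is only an illustration of the logic, not the actual Datamint query: the `Swap` record, the function names, the amounts, and the prices are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Swap:
    """One swap inside a transaction (hypothetical record layout)."""
    token_in: str
    token_out: str
    amount_in: float
    amount_out: float


def is_closed_loop(swaps: List[Swap]) -> bool:
    """True if every swap feeds the next one and the last returns the first asset."""
    if len(swaps) < 2:
        return False
    chained = all(a.token_out == b.token_in for a, b in zip(swaps, swaps[1:]))
    return chained and swaps[0].token_in == swaps[-1].token_out


def revenue_usd(swaps: List[Swap], price_usd: Dict[str, float]) -> float:
    """Value extracted by one closed loop, converted to USD at the given price."""
    extracted = swaps[-1].amount_out - swaps[0].amount_in
    return extracted * price_usd[swaps[0].token_in]


def gross_profit_usd(successful_deals: List[List[Swap]],
                     gas_costs_usd: List[float],
                     price_usd: Dict[str, float]) -> float:
    """Revenues minus gas costs; gas covers successful AND failed attempts."""
    revenue = sum(revenue_usd(deal, price_usd) for deal in successful_deals)
    return revenue - sum(gas_costs_usd)


# Made-up numbers for a WBNB -> DOGE -> CAKE -> WBNB loop.
deal = [
    Swap("WBNB", "DOGE", 1.0, 4000.0),
    Swap("DOGE", "CAKE", 4000.0, 75.0),
    Swap("CAKE", "WBNB", 75.0, 1.005),
]
prices = {"WBNB": 300.0}            # assumed daily median price in USD
gas = [0.54, 0.10, 0.10]            # one successful and two failed attempts

print(is_closed_loop(deal))
print(gross_profit_usd([deal], gas, prices))
```

Counting gas for failed attempts is the design choice that matters here: an arbitrageur who wins one race in three still pays for all three transactions.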
In reality, arbitrageurs most likely have other significant costs that include R&D team payroll and infrastructure payments (server rent, private network links, etc.). The exact cost structure is each arbitrageur’s know-how, and we can’t discover it with on-chain analysis. This tool is powerful but obviously has limitations. That is why every profit figure in this article is gross profit.

The resulting query turned out to be really compute-intensive. In normal circumstances, it might have taken days or weeks to execute (especially given the enormous size of the BNB Smart Chain archive). However, thanks to the very fast analytical database at the core of Datamint’s data analytics engine and some help from my colleagues with query optimizations, I got all the data in a matter of minutes.

Then, I connected a visualization tool to the resulting data marts. And here you have it: the Datamint BSC Flash Arbitrage Monitor. You can check it out live on our website (see the link at the end of the article).

But let’s get back to our analysis. We can see that all profitable arbitrageurs earned 138.05m$ since the 1st of January 2021. This is essentially almost the whole history of BNB Smart Chain. The BNB Smart Chain was launched in September 2020 and gained momentum in Q4 2020.

We also can see a slow but steady decline in arbitrageurs’ profits. (That’s excluding the spikes which we’ll discuss later). I interpret this decline as a sign that DeFi markets are becoming more mature and efficient, leaving less room for value extraction by arbitrage.

I was surprised by how many arbitrageurs suffer direct losses, some of them losing thousands of dollars. And this isn’t because of gas costs (gas costs are really a small fraction of revenue for a wide majority of arbitrageurs). They don’t really have to carry these losses because you can revert a transaction if you see that it’s about to have negative profit. I can only think that their losses are due to a bad math job.
And this is not so surprising, as this sweet market attracts both professionals and amateurs.

Let’s take a look at the most successful guy. I call him “The BSC Arbitrage King”. He’s been in the game since the beginning and earned almost 8m$ by the end of Q2 2022. He owns 32 arbitrage contracts, all of them profitable (well, he’s a professional, no doubt).

Looking at his profit dynamics, we see that his earnings are generally lower in 2022 than in 2021. This is partly because of the more efficient market and partly because of the lower BNB price (the average BNB price in 2022 is ~300$ vs ~400$ in 2021) and the overall market dip.

However, we can see a huge spike in his profits in May. He typically earned around 5–15k$ a day in 2020. But on the 12th of May, he earned a stunning 320k$! (I can imagine what a wild party his team had that night). How?

Well, to get the right answer, I just had to Google “What happened on the crypto market on 11–12th of May 2022?”. But I decided to dive deeper into on-chain data. I wanted to learn everything the hard way. So, I’ve assembled the second query that allowed me to break down arbitrage profits by assets used in arbitrage.

Note: many arbitrages involve 3 and even 4 assets in a row, so the figures on the diagram below are non-additive (the total deal profit is credited to all participating assets).

When I ran the query and set the date filter to the 11th of May, I understood everything with one glance. Yes, it was LUNA and UST. This was the day when insufficient liquidity in the LUNA protocol caused the algorithmic UST stablecoin to de-peg from USD. Then, the massive automatic minting of LUNA tokens caused their price to plunge into the abyss.

But how did this affect the arbitrageurs’ profits? It’s easy. When things like this happen, massive sellouts occur. When people start to panic, they tend to sell faster instead of selling at an optimal price. And this is exactly what arbitrageurs crave.
In normal times, sellers would split their order, start selling in a pair with higher liquidity, wait for the price to recover, etc. But that day, they didn’t have enough time to do all that. Moreover, a lot of positions in overcollateralized on-chain loan protocols (usually used for shorting) were closed by liquidators. Collaterals were sold automatically to cover the loans. These massive automatic sales also created inefficiencies. And big sales in a DEX protocol mean big price changes. This causes big price discrepancies between different DEXes (e.g. Pancakeswap and BiSwap).

All of this brought 320k$ to our proud King of BSC Arbitrage. The total profit for all arbitrageurs on that single day was over 2m$.

To get the idea of what assets (and, specifically, combinations of assets I call “paths”) are the most profitable at “normal” times, I had to assemble another query. This query grouped profits (again, in a non-additive way) by sorted paths rather than single assets. Then, I adjusted the date filter to exclude spikes from observations.

I expected that paths made up exclusively of the most liquid and popular assets like CAKE, ALPACA, and SHIB would be most profitable. But that wasn’t true. For some reason, many top profitable paths included the FIST token, which I had never heard of before. Some brief Googling showed me several websites claiming to be FistSwap or FstSwap DEX affiliated with the FIST token. At least some of them are definitely scams. I’m sure that more investigation here can bring interesting insights, but I decided that this is out of scope for this research.

An interesting thing is that I couldn’t find any significant correlation between market volatility and arbitrage earnings. Here’s my initial hypothesis: the more volatile the market is, the more arbitrageurs earn. However, other than several big spikes, I couldn’t identify any strong relationship between arbitrage profits and volatility.

Okay, now we know two main things.
First, the arbitrage market does have some money. Second, the brave ones can still find profit opportunities to chase despite the steady decline in overall USD profits. However, before we dive deeper, we want to know a very important thing:

How Fair Is the Market? 💭

Because if the market is unfair, it doesn’t matter how much money it has: you won’t be able to take your share from a player with some secret advantage. And we have a lot of reasons to be suspicious of the market’s fairness. Here’s why: some actors on the BNB Smart Chain can have a massive advantage.

When the arbitrage opportunity appears, all arbitrageurs are trying to backrun it. This means they put their transaction in exactly the next position after the transaction that creates the opportunity (say, a swap of 2000 BNB for CAKE). I’m calling this transaction “a trigger”. The winner takes it all, others just pay gas for an unsuccessful attempt. To be that fast, arbitrageurs must race to detect the trigger in the mempool (temporary storage for pending transactions). Then, they’ll have to send their arbitrage transaction to the validator first.

Obviously, two types of actors have a massive advantage here: RPC nodes (that get transactions through the RPC API from, say, your Metamask wallet) and validator nodes. Validator nodes can reorder transactions to their advantage and even privately mine transactions without sending them to the mempool. This advantage is called the MEV (Miner Extractable Value). On the Ethereum network, we even find a special initiative called Flashbots MEV. This initiative attempts to provide fair access to private mining. However, the BSC doesn’t have anything like this. It only has 21 validators. And at any given moment, we can suspect that these validators are playing the arbitrage game to increase their profits.

So, let’s first take a look at the market share distribution. Since the 1st of January 2021, I’ve observed 1972 arbitrageurs using 6689 custom contracts.
Of the 1972 arbitrageurs (and their 6689 custom contracts), only 430 were profitable (with 2648 profitable contracts). If we look at the current situation (say, Q2 2022), we see 129 profitable arbitrageurs out of 645 in total. The market is also quite consolidated: the top 7 players hold 50+% of the market, and the top 20 hold 80+%. However, many smaller players are still within the tail of the profit distribution, and they still earn their penny.

Well, this sort of market distribution doesn’t reveal how fair the market is. We need some other way to figure it out. That’s when we want to use the on-chain data crystal ball 🔮 again.

At this moment, we have the complete history of all successful arbitrage transactions. And for each transaction, we know the number of the block. We also know the address (and sometimes the name) of its validator. We may assume that it’s unlikely that any arbitrageur has “special” relationships with all 21 validators at once. So, if an arbitrageur is leveraging MEV, his profit should be skewed to some validator.

Okay, looks like it’s time for another query. This time, I took the time period from the 1st of January 2021 and calculated the following indicators for each successful arbitrageur: total profit, average profit per deal, how many successful deals they made, and, most importantly, the biggest profit share they have with a single validator (“topValidatorProfitShare”). In theory, if an arbitrageur doesn’t have any preference for validators, this indicator should be close to 1/21 = ~5%. Actually, many other factors can skew the success rate. For instance, the arbitrageur’s nodes can share the same datacenter with some of the validators. Or their nodes can have direct peering with a validator. Heuristically, I’d assume that everything below 20% is an indicator of fair play.

So, I’ve run a query and sorted the result by total profit. And… Well, at least the top 10 arbitrageurs show no signs of special relations with validators.
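The “topValidatorProfitShare” indicator described above is simple enough to sketch in Python. This is an illustration under my own assumptions about the input shape (a list of validator and profit pairs per arbitrageur), not the actual query; the function name, validator labels, and numbers are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Tuple


def top_validator_profit_share(deals: Iterable[Tuple[str, float]]) -> Tuple[str, float]:
    """Return (validator, share) for the validator holding the biggest profit share.

    deals: (validator, profit_usd) for each successful deal of ONE arbitrageur.
    """
    per_validator = defaultdict(float)
    for validator, profit in deals:
        per_validator[validator] += profit
    total = sum(per_validator.values())
    if total <= 0:
        raise ValueError("the indicator is only defined for a positive total profit")
    validator, profit = max(per_validator.items(), key=lambda item: item[1])
    return validator, profit / total


# An arbitrageur with no validator preference: equal profit in all 21 validators' blocks.
uniform = [(f"validator_{i}", 10.0) for i in range(21)]
# An arbitrageur whose profit is heavily skewed to a single validator.
skewed = [("validator_A", 89.0)] + [(f"validator_{i}", 1.0) for i in range(11)]

print(top_validator_profit_share(uniform)[1])  # about 1/21
print(top_validator_profit_share(skewed))      # validator_A holds 89% of the profit
```

The 20% cut-off mentioned above would then be a plain comparison on the returned share.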
The first suspicious entry (0x92ef7fac0708fc3c49921907361429ec14cd8cb6) is at position 15 with 39% of profits. It gained 1.1m$ in total from blocks validated by NodeReal. But still, this isn’t enough to say that this is unfair play. It might be a strong sign of affiliation, but in this case, the total profit looks too small. It’s approximately 6 times less than 1/21 of the total market.

An even more suspicious entry (0xba5276f63492b351c7227a4f285593cefa250ad3) is at position 45 with 89% of profits. It got 566k$ in total from blocks validated by HashQuark.

Then, I tried to verify whether this entry is special in its affiliation with HashQuark. I’ve selected only arbitrageurs “favored” by HashQuark and sorted them by their “topValidatorProfitShare” descending.

The suspicious guy is in first place. And by the way, he truly has an amazing average profit per successful deal: 331$. That’s a lot. However, in second place, we see an absolutely unremarkable arbitrageur with a total profit of less than 2k$. At the same time, his topValidatorProfitShare is only two percent less than that of the first one. It doesn’t look like he profits from affiliation. He also has more than 6674 successful deals, so this doesn’t look like an accident.

Well, maybe both just accidentally share the same shelf in the datacenter? We’ll never know for sure. However, I can say that overall, for a market that competitive, the evidence of affiliation is very weak (or very well disguised). It looks like most players are playing fair, and no whales are dominating the market. 😌

Is It Possible to Enter the Arbitrage Market?

Okay, so we know now that the flash arbitrage market on BNB Smart Chain has a decent size, a seemingly fair structure, and a lot of opportunities. Looks like we’re ready to enter it and fight for the Lambo! Well, not so fast! One piece is still missing from the puzzle. How are arbitrage opportunities created? We know how that works in general: big DEX swaps, liquidations, etc.
But to start monitoring the mempool, we need to know exactly what types of trigger transactions arbitrageurs observe.

Let’s get back to the lucky one who extracted 60k$+ in one transaction. Remember? That guy started my fascinating journey. By exploring the block in the BSCScan explorer, we can easily find his trigger transaction. This is the transaction in the same block mined immediately before the arbitrageur’s transaction. Here it is:

https://bscscan.com/tx/0x06de0901e11bd19b3a42f0746e17604a9c8c502bdc4202737f24407bf1743f75

This is a liquidation transaction of the Alpaca Finance lending protocol. Someone had a very big position in this protocol and, clearly, it wasn’t exactly their day. Their position was liquidated and their collateral was automatically sold on PancakeSwap DEX.

Obviously, checking triggers by hand for each of the tens of millions of arbitrage transactions isn’t an option. And you know already how I’m going to solve this problem: Datamint servers and databases, data analysis tools, and a few cups of ☕️.

This query was probably the most complicated and compute-intensive. When I hit “Execute”, I could almost hear the fans of our data servers howling like jet engines. Thankfully, this server torture didn’t last long, and I got my results.

So, the results aren’t surprising at all. More than a third of arbitrage profits involve transactions to the PancakeSwap DEX as a trigger. The most popular function to serve as a trigger is “swapExactTokensForTokens”. It accounts for ~14% of all profits. This function is a part of Uniswap V2, which most DEXes on BSC (e.g. BiSwap, ApeSwap, etc.) have forked. The only surprising thing is again the FstSwap (or FistSwap?) in second place right after the PancakeRouter.

At this point, we can simply pick any combination of trigger addresses and functions. And now, we’re ready for experiments! My team and I spent some resources doing more in-depth research on triggers.
Mainly, we were analyzing the relationship between competition and the profitability of different triggers. However, this is clearly out of scope for this article. So, please contact us if you’re interested in more details.

At this point, I had all the theory I needed, but the theory is nothing without practice. Which means…

Let’s Try Arbitrage!

I have to start with a disclaimer: my goal wasn’t to build a profitable flash arbitrage bot. I have no illusions and I totally understand that this is a complex development task. My actual goal was to cut corners as much as possible. I wanted to understand how big the distance between an experienced player on the market and a newbie is.

So, I decided that I will NOT:

1. Implement simulation or heuristic estimates for potential profit created by every transaction. My guess is that this is where pro arbitrageurs compete in computation speed. For my research, I’ll craft the trigger myself and respond to my own trigger only.
2. Write and deploy an actual arbitrage contract. This isn’t that difficult, but I don’t need it for my research. If I can put my dummy transaction next to my own trigger, outrunning the competition, then writing the contract is a matter of technique.
3. Make specific optimizations. I won’t tweak blockchain nodes, optimize infrastructure, networking, etc. This is a rabbit hole. And if I can’t win without it, it definitely won’t be an easy walk.

Having all this in mind, I spun up a BSC node and wrote a simple script in Python. This way, I could monitor all new transactions in the mempool by constantly fetching them through the IPC connection. Then, I could look for my crafted trigger and immediately send a responding dummy transaction (transfer of 0 BNB to my own account). Here’s how it should’ve worked in theory.

But I want to say a few words about crafting a trigger. I needed to send a transaction that would catch the attention of active arbitrageurs. This should be a big swap for at least 10k$.
But how can I do this if I don’t want to put real money in the game? Luckily, when competing for reaction speed in the mempool, arbitrageurs don’t have time to check if the sender has funds to make the requested swap. So, I took my own small swap transaction as an example and edited it. This way, it appeared as a swap of 1m$ (BUSD) to CAKE at Pancakeswap. This is what the transaction payload looked like on BSCScan.

The first experiment has shown that this approach works perfectly. My trigger was mined, reverted (as I don’t have 1m$ in my wallet yet 😆), and backrun by approximately 400 arbitrage attempts. 400… At that point, I was thinking that my attempt to outplay the big guys was doomed.

But I was committed to seeing the thing through. So, I started my Python script, waited for the node connection initialization, and sent my carefully crafted trigger using Metamask’s custom Hex Data field and an online ABI (Application Binary Interface) encoder (https://abi.hashex.org/). Almost immediately, I saw a notification in the node’s console that a trigger was detected in the mempool and the dummy response transaction was sent. After a few seconds, both transactions were mined and I opened BSCscan (which isn’t easy to do when you’re crossing your fingers). I evaluated the results.

My expectations were low, but holy cow, the result was quite disappointing. Even though I managed to put my response transaction in the same block as the trigger, I lost the battle to hundreds of other arbitrageurs. That means all of them had a massive infrastructure advantage: they had tweaked powerful nodes, special network solutions, better peering with other nodes, and some other optimizations that I don’t even know about.

Should we stop here? It would’ve been wise, but I wanted to put things to the extreme. I was going to present a very hard test to active arbitrageurs.

Here’s a short digression to better understand how the Ethereum P2P protocol works.
When a node gets new transactions (either from RPC or from other nodes), it adds them to the mempool. Then, it retransmits them to other connected nodes (peers). This process has 2 special things:

- A node sends full new transactions only to a subset of peers (like 10 of 100). The rest of the peers only get hashes of new transactions. They’re free to request full transaction bodies if they don’t have them already. This reduces unnecessary clutter in the network but can slow down your transaction propagation if you aren’t lucky.
- Nodes usually don’t send transactions individually. Instead, they pack multiple transactions into a single network packet. Otherwise, transactions would have a significant delay, because message throughput in the Ethereum P2P network is rate-limited to prevent DDoS attacks.

Knowing these two things, I could simulate a massive advantage on my side. I could send both trigger and response transactions in one single packet. I could also be sure that I’m sending full transactions right away. So, in theory, arbitrageurs shouldn’t be able to insert their transactions between my trigger and my response. My two transactions are joined in one packet, so there is no latency between them.

This idea didn’t seem as wonderful as it had in the beginning after I had spent hours digging into the source code of bsc-geth, the official node software for the BNB Smart Chain. However, everything passes, and finally, I was able to send custom packets to 150 peers of my node. I carefully crafted trigger and response transactions, packed them into a single packet, and sent them to peers.

I’ve never been so nervous opening BSCScan. What I saw left me speechless. The result was almost the same as in the previous run! I still lost to 100+ arbitrageurs. How could it be possible? I see only two options here.
Either the validators are in the game (which is unlikely based on the results of my research) or… the BNB Smart Chain simply has that many arbitrageur nodes! This means they can intercept transactions on every route to the validator. Arbitrageurs will find triggers, insert their own response transactions, and delay weaker opponents’ transactions (yes, those like me 😢).

Looks like flash arbitrage profits aren’t easy money at all. I most likely won’t get a Lambo from this market…

However, moments before I slipped into depression thinking about my broken dreams, I recalled that my business isn’t arbitrage. I work in on-chain data analysis and value-added data services. And this research is great for my business! It shows how much you can learn using only public blockchain data, the right tools, and commitment.

Conclusion

This was a fascinating journey that, once again, proved the power of on-chain data. But not just any data; I mean data harnessed with proper analytics and efficient tools like the Datamint data engine. And the story isn’t over, as we have published the analytical application on our website. You can continue the journey yourself and look for more insights with it: Datamint BSC Arbitrage Monitor.

But to summarize this research, I’ll reiterate key findings:

- The flash arbitrage market of the BNB Smart Chain alone has a decent size of 100m$ per year.
- The market has a slow but steady decline because of market conditions and DeFi’s growing efficiency. But it still has many opportunities. Top players are still earning up to 8m$ per year.
- The market is highly competitive with hundreds of professional players, but it doesn’t display signs of large-scale “unfair play”.
- The market is open for brave new players, but you should be prepared for high R&D and infrastructure costs and risks.

That’s all for today. If you like this story, please drop us a few lines about topics that interest you in the realm of on-chain data.
This would help us to prioritize themes for upcoming content including articles, research, online tools, and webinars.

May the data be with you!

About the author: Ivan Vakhmyanin is a data analytics and visualization (BI, Big Data, Data Science) expert with years of experience. He is also a blockchain and Web 3.0 adept making the on-chain data from leading blockchain platforms (Ethereum, BNB Smart Chain, Solana, etc.) available for analysis. Ivan is passionate about sharing experience by developing educational programs in the field of Data-driven Management for specialists and executives.