How To Build A Secure Eth2 Staking Infrastructure

Written by blockdaemon | Published 2020/12/06
Tech Story Tags: ethereum | staking | blockchain | secure-eth2-staking | ethereum2-staking | eth2 | good-company | ethereum-2.0


Ethereum 1.0 was a landmark moment in blockchain technology, allowing for the trustless execution of code on a blockchain. In its current form, however, it does not have the ability to scale to the level of computation that would be needed to disrupt current financial systems.
The evolution to Ethereum 2.0 will usher in a new era of "Broadband Blockchain".
As part of this upgrade, Ethereum is changing some of the most foundational components of its system, most notably, its consensus algorithm. A consensus algorithm is the mechanism used to come to agreement on the state of the blockchain.
Ethereum 1.0 uses the now famous Proof of Work (or Nakamoto) consensus mechanism. This method is extremely energy intensive, with Bitcoin and Ethereum together consuming more energy coming to consensus than the entire annual energy consumption of Switzerland. Ethereum 2.0 will instead use Proof of Stake consensus. In simple terms, if PoW consensus is a guessing game trying to find a magic number, PoS consensus is an organised voting system where everyone keeps a large sum of money in escrow, and if they misbehave, some of their escrowed money is taken from them (called slashing).
The users that take part in this proof of stake system are called stakers. They lock up 32 ether in the escrow contract and deploy small servers on the internet, called validators, to perform the duties of a staker in exchange for rewards. The rewards vary depending on the number of people taking part, but range roughly from 2% to 18% APR.
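As a rough worked example of the figures above, the sketch below converts the quoted 2% to 18% APR range into ether per year for a single 32 ether validator. It assumes simple, non-compounding interest and ignores penalties, downtime and changes in the number of participants; the function name is illustrative only.

```python
from fractions import Fraction

STAKE_ETH = 32  # deposit required per validator, as stated above


def yearly_reward_eth(apr_percent: int, stake_eth: int = STAKE_ETH) -> Fraction:
    """Simple (non-compounding) yearly reward in ether for an APR given in percent."""
    return Fraction(stake_eth * apr_percent, 100)


if __name__ == "__main__":
    # The two ends of the APR range quoted in the text.
    for apr in (2, 18):
        print(f"{apr}% APR -> {float(yearly_reward_eth(apr)):.2f} ETH per year")
```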
Ethereum 2.0 had its genesis block on December 1st, 2020. At launch, more than $400m worth of ether was locked into this escrow, and 21,063 validators were in operation, communicating with one another through beacon nodes over peer-to-peer connections across the internet.
The infrastructure needed to run all of these validators is immense, and the security and reliability of these systems are of the utmost importance, as any downtime results in a loss of earnings or even a loss of the principal sum escrowed.
Blockdaemon has been developing secure and scalable non-custodial Eth2 Staking Infrastructure for its enterprise clients, and today we're sharing a few pro tips to maximise the safety and performance of your enterprise-grade staking infrastructure.

Let clients control their withdrawal key

By default, each of the Eth2 clients (Prysm, Lighthouse, Teku, Nimbus etc.) creates withdrawal keys and validating keys from the same mnemonic phrase. This is the easiest way to get started, but if you are building an enterprise staking solution for customers, it means you are taking full custody of their funds. This has major security and legal implications.
Instead, have your clients create an Eth2 address they control, and use a tool like ethdo to craft a custom deposit data file that allows your enterprise's validator to withdraw to the client's withdrawal address. This way, if your validating server is compromised, the attacker can force you to exit the validator (with or without getting slashed in the process), but the exited funds will be sent to your client's cold wallet. The attacker should not be able to access that wallet, because it should be nowhere near your compromised validator, which limits the potential damage. If an attacker can only burn your client's funds, but cannot steal them, they are less incentivised to attack your validating stack relative to other custodial enterprise staking solutions.
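Before submitting a deposit, it may be worth checking that the deposit data really does point at the client's withdrawal key. The sketch below is one way to do that, assuming BLS withdrawal credentials as defined in the Eth2 phase 0 specification (a 0x00 prefix followed by the last 31 bytes of the SHA-256 hash of the withdrawal public key) and a deposit entry that exposes a withdrawal_credentials hex field, as launchpad-style deposit data files do. The function names are hypothetical, and the check does not validate the deposit signature.

```python
import hashlib

# Prefix byte for BLS withdrawal credentials in the Eth2 phase 0 spec.
BLS_WITHDRAWAL_PREFIX = b"\x00"


def _strip_0x(hex_string: str) -> str:
    """Lower-case a hex string and drop an optional 0x prefix."""
    hex_string = hex_string.lower()
    return hex_string[2:] if hex_string.startswith("0x") else hex_string


def withdrawal_credentials(withdrawal_pubkey_hex: str) -> str:
    """Return 0x00 || SHA-256(pubkey)[1:] as a 64-character hex string."""
    pubkey = bytes.fromhex(_strip_0x(withdrawal_pubkey_hex))
    if len(pubkey) != 48:
        raise ValueError("a BLS public key must be 48 bytes long")
    return (BLS_WITHDRAWAL_PREFIX + hashlib.sha256(pubkey).digest()[1:]).hex()


def deposit_pays_out_to(deposit_entry: dict, client_withdrawal_pubkey_hex: str) -> bool:
    """True if the deposit entry's credentials match the client's withdrawal key."""
    expected = withdrawal_credentials(client_withdrawal_pubkey_hex)
    return _strip_0x(deposit_entry["withdrawal_credentials"]) == expected


if __name__ == "__main__":
    # Placeholder bytes for illustration only; this is not a real BLS key.
    client_key = "0x" + "11" * 48
    entry = {"withdrawal_credentials": withdrawal_credentials(client_key)}
    print(deposit_pays_out_to(entry, client_key))
```

Running a check like this on the operator's side and again on the client's side gives both parties independent confirmation of where exited funds will go.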

Have robust monitoring and alerting for your validator

If your validator goes offline, how long would it take you to notice? Make sure to invest in monitoring infrastructure that can watch your validator's participation in the network, as well as some of the more general server management risks like running out of free disk space on your device, your CPU getting overloaded, your server clock falling out of sync or your validator running out of free memory to use.
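To make the server-level checks concrete, here is a minimal sketch using only the Python standard library. It covers just two of the risks listed above (free disk space and CPU load); memory, clock drift and validator participation would need further probes. The thresholds and function names are assumptions chosen for illustration, and os.getloadavg is only available on Unix-like systems.

```python
import os
import shutil


def collect_metrics(path: str = "/") -> dict:
    """Gather a few host-level readings for the filesystem containing `path`."""
    usage = shutil.disk_usage(path)
    load_1m = os.getloadavg()[0]  # 1-minute load average (Unix only)
    return {
        "disk_free_fraction": usage.free / usage.total,
        "load_per_cpu": load_1m / (os.cpu_count() or 1),
    }


def find_problems(metrics: dict,
                  min_disk_free: float = 0.15,
                  max_load_per_cpu: float = 0.9) -> list:
    """Compare readings against thresholds and return human-readable problems."""
    problems = []
    if metrics["disk_free_fraction"] < min_disk_free:
        problems.append("low disk space")
    if metrics["load_per_cpu"] > max_load_per_cpu:
        problems.append("CPU overloaded")
    return problems


if __name__ == "__main__":
    print(find_problems(collect_metrics()))
```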
It is important to remember that monitoring without alerting is not sufficient. If you have to actively check your monitoring to know whether there is an issue, you are likely to get complacent as the months go by, and sooner or later you could miss a long period of inactivity. That would have been avoidable had you integrated Prometheus Alertmanager, Opsgenie, or any of a number of other alerting tools into your enterprise stack.
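The difference between monitoring and alerting can be sketched in a few lines: rather than waiting for someone to look at a dashboard, the check itself pushes a notification once a failure has persisted. The class below is a hypothetical illustration, not the API of any alerting product; in practice the notify callback would hand off to whichever alerting tool you use.

```python
from typing import Callable


class FailureAlerter:
    """Notify after `threshold` consecutive failed checks, and once more on recovery."""

    def __init__(self, threshold: int, notify: Callable[[str], None]) -> None:
        if threshold < 1:
            raise ValueError("threshold must be at least 1")
        self.threshold = threshold
        self.notify = notify
        self.consecutive_failures = 0
        self.alerting = False

    def record(self, healthy: bool) -> None:
        """Feed in the result of one health check."""
        if healthy:
            if self.alerting:
                self.notify("validator recovered")
            self.consecutive_failures = 0
            self.alerting = False
            return
        self.consecutive_failures += 1
        # Fire only once per incident to avoid flooding whoever is on call.
        if self.consecutive_failures >= self.threshold and not self.alerting:
            self.alerting = True
            self.notify(
                f"validator unhealthy for {self.consecutive_failures} consecutive checks"
            )


if __name__ == "__main__":
    alerter = FailureAlerter(threshold=3, notify=print)
    for result in (True, False, False, False, False, True):
        alerter.record(result)
```

Requiring several consecutive failures trades a little detection latency for fewer false alarms; the right threshold depends on how often the check runs.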

Consider remote key signers

At the time of writing, the Prysm, Lighthouse and Teku clients all have the ability to integrate with remote key signing infrastructure. These services store your private keys in an encrypted and secure fashion, and make it even more difficult for a hacker to extract your precious private keys. Dirk is an enterprise solution from Attestant for fault-tolerant and secure key management of Eth2 validating keys that might make your enterprise staking solution more robust.

Defend yourself against DoS attacks with Sentry Nodes

Currently, a significant concern when it comes to Eth2 staking is that it is relatively trivial to correlate validators to the IP address of the beacon node they attach to. Although the validators themselves should not be exposed to the internet, their beacon nodes are, and with this knowledge it becomes feasible to selectively DoS the beacon node that serves a validator that's about to propose a new block. This means an attacker can prevent your validator from doing its duties by overloading the beacon node it's attached to for about 12 seconds (one slot) roughly every 6 minutes (one epoch of 32 slots, or 6.4 minutes), when your validator is due to make an attestation or block proposal.
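The timing behind this attack follows from two mainnet phase 0 constants: a slot lasts 12 seconds and an epoch is 32 slots. A minimal sketch of the arithmetic is shown below. The genesis timestamp is the mainnet value as I understand it and should be checked against your network's configuration; the helper names are illustrative.

```python
SECONDS_PER_SLOT = 12   # mainnet phase 0 constant
SLOTS_PER_EPOCH = 32    # mainnet phase 0 constant
EPOCH_SECONDS = SECONDS_PER_SLOT * SLOTS_PER_EPOCH  # 384 seconds, i.e. 6.4 minutes

# Assumed mainnet beacon chain genesis: 2020-12-01 12:00:23 UTC.
MAINNET_GENESIS_TIME = 1606824023


def slot_at(unix_time: int, genesis_time: int = MAINNET_GENESIS_TIME) -> int:
    """Slot number that is in progress at the given Unix timestamp."""
    if unix_time < genesis_time:
        raise ValueError("timestamp is before genesis")
    return (unix_time - genesis_time) // SECONDS_PER_SLOT


def epoch_of(slot: int) -> int:
    """Epoch that contains the given slot."""
    return slot // SLOTS_PER_EPOCH


def slot_window(slot: int, genesis_time: int = MAINNET_GENESIS_TIME) -> tuple:
    """(start, end) Unix timestamps of a slot: the window an attacker would target."""
    start = genesis_time + slot * SECONDS_PER_SLOT
    return start, start + SECONDS_PER_SLOT


if __name__ == "__main__":
    print(f"one epoch lasts {EPOCH_SECONDS / 60} minutes")
    print(f"a single slot is {SECONDS_PER_SLOT / EPOCH_SECONDS:.1%} of an epoch")
```

Because the target window is only a small fraction of each epoch, an attacker does not need to sustain an attack for long to cause missed duties.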
To mitigate this issue, one can deploy an array of publicly accessible beacon nodes that act as sentries. The beacon node your validators attach to is not publicly accessible over the internet; instead, it peers only with your sentry nodes. This means an attacker needs to simultaneously DoS every one of these nodes to keep your validator offline, otherwise one of them will be able to relay your validator's attestations to the wider network successfully.
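The sentry layout can be expressed as a small model that checks the two properties described above: the private beacon node peers only with sentries, and attestations still get out as long as at least one sentry is reachable. This is a sketch under assumed names (the Topology class and node labels are hypothetical) and it models the design only; it does not configure any real client.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Topology:
    sentries: frozenset            # publicly reachable beacon nodes
    private_peers: frozenset       # peers configured on the private beacon node
    private_node_public: bool = False  # is the private node exposed to the internet?


def topology_problems(topology: Topology) -> list:
    """Return a list of ways the layout deviates from the sentry design."""
    problems = []
    if topology.private_node_public:
        problems.append("private beacon node is reachable from the internet")
    extra = topology.private_peers - topology.sentries
    if extra:
        problems.append(f"private beacon node peers with non-sentry nodes: {sorted(extra)}")
    if not topology.private_peers & topology.sentries:
        problems.append("private beacon node has no sentry peers")
    return problems


def can_relay(topology: Topology, nodes_under_attack: set) -> bool:
    """True if at least one sentry peer is not currently being attacked."""
    usable = (topology.private_peers & topology.sentries) - set(nodes_under_attack)
    return bool(usable)


if __name__ == "__main__":
    sentries = frozenset({"sentry-a", "sentry-b", "sentry-c"})
    layout = Topology(sentries=sentries, private_peers=sentries)
    print(topology_problems(layout))
    print(can_relay(layout, {"sentry-a", "sentry-b"}))
```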

Be client agnostic

In August 2020, a bug in the Prysm client's clock management code caused every Prysm client to go offline, resulting in more than 60% of the Medalla testnet validators going offline. This problem was exacerbated when an emergency release intended to solve the problem was itself broken, rendering the Prysm validators unusable on that version and requiring a downgrade to rectify. In an emergency situation where a specific client becomes unusable, it helps tremendously to be able to run your staking system using more than one of the clients. Interoperability of clients is a strong priority amongst the Eth2 teams, and through their hard work it is becoming achievable to seamlessly replace a Prysm validator with a Teku validator or a Lighthouse validator. With multiple supported client implementations, your company and your users are better protected than if your entire infrastructure relies on just one Ethereum 2.0 client.
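One way to reason about this risk is to ask what fraction of your validators would stay online if a given client failed outright. The sketch below does that for a hypothetical fleet; the client split and the function name are invented for illustration and do not describe any real deployment.

```python
from fractions import Fraction


def online_fraction(validators_per_client: dict, failed_clients: set) -> Fraction:
    """Fraction of validators still online if every listed client goes down."""
    total = sum(validators_per_client.values())
    if total == 0:
        raise ValueError("fleet has no validators")
    online = sum(
        count
        for client, count in validators_per_client.items()
        if client not in failed_clients
    )
    return Fraction(online, total)


if __name__ == "__main__":
    # Hypothetical split of 100 validators across three clients.
    fleet = {"prysm": 40, "lighthouse": 30, "teku": 30}
    for client in fleet:
        remaining = online_fraction(fleet, {client})
        print(f"if {client} fails, {float(remaining):.0%} of validators stay online")
```

A fleet that runs on a single client has an online fraction of zero in the same scenario, which is the situation the Medalla incident illustrated.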

Practice, practice, practice

No system is fully reliable, and no system has zero downtime forever. Don't be afraid to break your testnet systems so that you can work on your recovery skills. Make it fun: form a red team and a blue team from your engineers and incentivise them to break their own infrastructure. Develop playbooks for handling the different types of faults, so that when they happen in production people immediately know how to remedy the situation. Build experience and confidence in your team that they can successfully manage this brand new infrastructure in a brand new threat landscape. Mistakes happen, but not learning from your mistakes is probably the worst one of all.
To learn more about how you can participate in Ethereum 2.0 with us, visit our marketplace.
By Oisin Kyne, Developer at Blockdaemon 

Published by HackerNoon on 2020/12/06