Take a Deep Dive Into Verkle Tree For Ethereum

by Sin7Y, May 20th, 2022

Too Long; Didn't Read

Merkle Tree is a common Accumulator, which can be used to prove that an element exists in the Accumulator. Compared to Merkle Tree, Verkle Tree greatly reduces the Proof size and is a critical part of the ETH2.0 upgrade. The 23rd tech review by Sin7Y demonstrates the principle of Verkle Tree. For a data set of one billion elements, a Merkle Tree proof takes about 1kB, while a Verkle Tree proof needs no more than 150 bytes.



Compared to Merkle Tree, Verkle Tree greatly reduces the Proof size, which makes it a critical part of the ETH2.0 upgrade. For a data set of one billion elements, a Merkle Tree proof takes about 1kB, while a Verkle Tree proof needs no more than 150 bytes.


The Verkle Tree concept was proposed in 2018 (more details can be found in John Kuszmaul's paper, listed in the References). The 23rd tech review by Sin7Y will demonstrate the principle of Verkle Tree.

Merkle Tree

Before digging into Verkle Tree, it is important to understand the Merkle Tree concept. Merkle Tree is a common Accumulator, which can be used to prove that one element exists in the Accumulator as shown in the following figure:

Figure 1: Merkle Tree

To prove that (key: value) = (06: 32) exists in the Tree (green-marked), the Proof must contain all red-marked nodes in the figure.


The verifier can calculate the Root according to the method shown in Figure 1, and then compare it with the expected Root (grey-marked).


As the depth and width of the Tree grow, the Proof size grows as well: for a binary tree the proof contains log_2(n) nodes, while for a k-ary tree it contains (k−1)log_k(n) nodes.


In addition, the verifier needs to perform a large number of Hash calculations, from the leaf level up to the Root. Thus, increasing the depth and width of the Tree increases the verifier's workload, which is unacceptable.
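To make the verifier's work concrete, here is a minimal sketch of a binary Merkle proof in Python. The hash function (SHA-256), the leaf encoding, and the helper names `build_levels`, `make_proof`, and `verify` are assumptions chosen for illustration; they are not taken from the figure or from any Ethereum specification.

```python
import hashlib

def h(data: bytes) -> bytes:
    # SHA-256 is an illustrative choice of hash function.
    return hashlib.sha256(data).digest()

def build_levels(leaves):
    """Return all tree levels, leaves first; len(leaves) must be a power of two."""
    levels = [[h(leaf) for leaf in leaves]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([h(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels

def make_proof(levels, index):
    """Collect the sibling hash at every level below the root."""
    proof = []
    for level in levels[:-1]:
        proof.append(level[index ^ 1])  # sibling of the current node
        index //= 2
    return proof

def verify(root, leaf, index, proof):
    """Recompute the root from the leaf and its siblings, one hash per level."""
    node = h(leaf)
    for sibling in proof:
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == root

# Eight hypothetical "key:value" leaves; leaf 6 encodes (06: 32).
leaves = [f"{i:02d}:{i + 26}".encode() for i in range(8)]
levels = build_levels(leaves)
root = levels[-1][0]
proof = make_proof(levels, 6)

print(len(proof), verify(root, leaves[6], 6, proof))  # 3 True
```

The proof for eight leaves holds log_2(8) = 3 sibling hashes, and the verifier performs one hash per level, which is the cost that grows with the depth of the Tree.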

Verkle Tree - Concept

Simply increasing the Tree’s width can reduce its depth, but the proof size will not be reduced because the proof size changes from the original log_2(n) to (k−1)log_k(n).


That is, for each layer, the prover needs to provide (k−1) additional node information. In the paper Verkle Tree, John Kuszmaul mentioned a scheme to reduce the proof complexity to log_k(n).


If we set k=1024, the proof will be reduced by a factor of log_2(k) = 10 compared to the binary Merkle Tree, since log_2(n) = log_2(k)·log_k(n).
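The arithmetic behind these three formulas can be checked with a few lines of Python. This is only a counting sketch: `depth` is a hypothetical helper, n = 2^30 stands in for "one billion" elements, and the counts are numbers of proof elements, not bytes.

```python
def depth(n: int, k: int) -> int:
    """Smallest d with k**d >= n, i.e. the depth of a full k-ary tree over n leaves."""
    d, capacity = 0, 1
    while capacity < n:
        capacity *= k
        d += 1
    return d

n = 2 ** 30  # roughly one billion leaves
k = 1024

binary_merkle = depth(n, 2)           # log_2(n): one sibling per level
wide_merkle = (k - 1) * depth(n, k)   # (k-1)*log_k(n): k-1 siblings per level
verkle = depth(n, k)                  # log_k(n): one opening per level

print(binary_merkle, wide_merkle, verkle)  # 30 3069 3
print(binary_merkle // verkle)             # 10
```

Assuming 32-byte hashes, the 30 siblings of the binary Merkle proof occupy 30 × 32 = 960 bytes, which matches the figure of roughly 1kB quoted at the start of the article.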


The Verkle Tree design is shown as follows:

Figure 2. Verkle Tree

For each node, there are two pieces of information: (1) value; (2) existence proof π. For example, the green-marked (H(k,v),π_03) shows that H(k,v) exists in commitment C_0 and π_03 is the proof of this argument.


Similarly, (C_0,π_0) means that C_0 exists in commitment C_Root and π_0 is the proof of this argument.


In the paper Verkle Tree, the method of such existence commitment is called Vector commitment. If the Vector commitment scheme is applied directly to the original data, the Proof has O(1) complexity, while the complexity of constructing the Proof and of updating the Proof is O(n^2) and O(n), respectively.


Therefore, to strike a balance, the k-ary Verkle Tree scheme is used in the paper Verkle Tree (as shown in Figure 2), so that the complexity of constructing the Proof, updating the Proof, and the Proof itself is O(kn), O(k·log_k(n)), and O(log_k(n)), respectively.


The specific performance comparison is shown in Table 1:


In this article, we do not intend to provide a detailed introduction to specific vector commitment schemes, which John Kuszmaul has explained well in his paper.


Fortunately, compared to the vector commitment, we have a more efficient tool called polynomial commitment.


Given a coordinate set (c_1,c_2,....,c_n) and a value set (y_1,y_2,....,y_n), you can construct a polynomial P(x) by Lagrange interpolation, satisfying P(c_i)=y_i for every i, and commit to this polynomial.
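The interpolation step can be sketched as follows. The prime modulus `P_MOD` (the Mersenne prime 2^31 − 1) and the sample points are assumptions chosen to keep the example small; a real scheme would work in the scalar field of the chosen curve. Coefficient lists are stored lowest degree first.

```python
P_MOD = 2**31 - 1  # small prime field, for illustration only

def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] = (out[i + j] + ai * bj) % P_MOD
    return out

def poly_eval(coeffs, x):
    """Evaluate a polynomial at x with Horner's rule."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % P_MOD
    return acc

def lagrange_interpolate(xs, ys):
    """Return the coefficients of the polynomial through all (xs[i], ys[i])."""
    result = [0] * len(xs)
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        basis, denom = [1], 1
        for j, xj in enumerate(xs):
            if j != i:
                basis = poly_mul(basis, [(-xj) % P_MOD, 1])  # multiply by (x - xj)
                denom = denom * (xi - xj) % P_MOD
        scale = yi * pow(denom, -1, P_MOD) % P_MOD  # modular inverse, Python 3.8+
        for d, c in enumerate(basis):
            result[d] = (result[d] + scale * c) % P_MOD
    return result

cs = [0, 1, 2, 3]      # coordinates c_i
ys = [5, 12, 7, 99]    # values y_i
coeffs = lagrange_interpolate(cs, ys)

print(all(poly_eval(coeffs, c) == y for c, y in zip(cs, ys)))  # True
```

The resulting coefficient list is what a polynomial commitment scheme would then commit to.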


KZG10 and IPA are common polynomial commitment schemes (At this point, the commitment is a point on the elliptic curve, typically between 32 and 48 bytes in size).

Basis

KZG for Single Point

Take KZG10 as an example. For the polynomial P(x), we use [P(s)]_1 to represent the polynomial commitment.


For P(x), if P(z)=y, then (x−z)|(P(x)−y). That is to say, if we set Q(x)=(P(x)−y)/(x−z), then Q(x) is a polynomial.


Now, we generate a proof that P(x) satisfies P(z)=y. That is, the prover calculates [Q(s)]_1 and sends it to the verifier, who needs to verify:

e([Q(s)]_1, [s−z]_2) = e([P(s)]_1 − [y]_1, [1]_2)

Because s is a randomly chosen point in the finite field F, the probability that a cheating prover succeeds is degree(Q)/p, where p is the order of F (Schwartz–Zippel lemma).
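The divisibility argument behind the single-point proof can be checked at the level of plain field arithmetic. The sketch below leaves out elliptic curves and pairings entirely: it only shows that (P(x)−y)/(x−z) has zero remainder exactly when y = P(z). The modulus `P_MOD`, the sample polynomial, and the helper `divide_by_linear` are assumptions for illustration.

```python
P_MOD = 2**31 - 1  # small prime field, for illustration only

def poly_eval(coeffs, x):
    """Evaluate a polynomial (coefficients lowest degree first) at x."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % P_MOD
    return acc

def divide_by_linear(coeffs, z):
    """Synthetic division by (x - z); returns (quotient, remainder)."""
    quotient = [0] * (len(coeffs) - 1)
    carry = 0
    for i in range(len(coeffs) - 1, 0, -1):
        carry = (coeffs[i] + carry * z) % P_MOD
        quotient[i - 1] = carry
    remainder = (coeffs[0] + carry * z) % P_MOD
    return quotient, remainder

P = [3, 0, 2, 1]        # P(x) = x^3 + 2x^2 + 3
z = 4
y = poly_eval(P, z)     # the honest claim, P(4) = 99

shifted = [(P[0] - y) % P_MOD] + P[1:]   # P(x) - y
Q, remainder = divide_by_linear(shifted, z)
print(Q, remainder)     # [24, 6, 1] 0

# A false claim y + 1 leaves a non-zero remainder, so no polynomial Q exists.
bad = [(P[0] - (y + 1)) % P_MOD] + P[1:]
print(divide_by_linear(bad, z)[1] != 0)  # True
```

In KZG10 the verifier cannot see s, so the same identity P(s) − y = Q(s)·(s − z) is checked "in the exponent" through the pairing equation.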

KZG for Multiple Points

Now, we want to prove that the values of the polynomial P(x) at (z_0,z_1,....,z_k−1) are (y_0,y_1,....,y_k−1), respectively. Therefore, we need to define two polynomials:

I(x): the interpolation polynomial satisfying I(z_i)=y_i for every i

V(x) = (x−z_0)(x−z_1)...(x−z_k−1)

According to the description mentioned above, we need to satisfy V(x)|(P(x)−I(x)). That is, there exists a polynomial Q(x), satisfying:

P(x) − I(x) = Q(x)·V(x)


Therefore, the Prover needs to provide the commitments [P(s)]_1,[Q(s)]_1 for P(x) and Q(x), and send the commitments to the verifier.


The Verifier calculates [I(s)]_1,[V(s)]_2 locally, and verifies the equation:

e([Q(s)]_1, [V(s)]_2) = e([P(s)]_1 − [I(s)]_1, [1]_2)


It is clear that the proof size is constant no matter how many Points there are. If we choose the BLS12-381 curve, the Proof size is only 48 bytes, which is very efficient.
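The multi-point case can be sketched the same way, again without any curve operations: build the interpolation polynomial through the claimed points and the vanishing polynomial over the opening points, then check that the vanishing polynomial divides their difference from P(x). In the code, `I` and `V` play those two roles; all helper names, the modulus `P_MOD`, and the sample polynomial are assumptions for illustration.

```python
P_MOD = 2**31 - 1  # small prime field, for illustration only

def trim(a):
    while len(a) > 1 and a[-1] == 0:
        a = a[:-1]
    return a

def poly_eval(coeffs, x):
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % P_MOD
    return acc

def poly_mul(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] = (out[i + j] + ai * bj) % P_MOD
    return out

def poly_sub(a, b):
    n = max(len(a), len(b))
    a, b = a + [0] * (n - len(a)), b + [0] * (n - len(b))
    return trim([(x - y) % P_MOD for x, y in zip(a, b)])

def poly_divmod(num, den):
    """Polynomial long division; returns (quotient, remainder)."""
    num, den = trim(list(num)), trim(list(den))
    inv_lead = pow(den[-1], -1, P_MOD)
    quot = [0] * max(len(num) - len(den) + 1, 1)
    rem = list(num)
    for shift in range(len(num) - len(den), -1, -1):
        factor = rem[shift + len(den) - 1] * inv_lead % P_MOD
        quot[shift] = factor
        for i, d in enumerate(den):
            rem[shift + i] = (rem[shift + i] - factor * d) % P_MOD
    return trim(quot), trim(rem)

def lagrange_interpolate(xs, ys):
    result = [0] * len(xs)
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        basis, denom = [1], 1
        for j, xj in enumerate(xs):
            if j != i:
                basis = poly_mul(basis, [(-xj) % P_MOD, 1])
                denom = denom * (xi - xj) % P_MOD
        scale = yi * pow(denom, -1, P_MOD) % P_MOD
        for d, c in enumerate(basis):
            result[d] = (result[d] + scale * c) % P_MOD
    return result

P = [7, 3, 0, 5, 1]                  # P(x) = x^4 + 5x^3 + 3x + 7
zs = [1, 2, 3]
ys = [poly_eval(P, z) for z in zs]   # honest claims

I = lagrange_interpolate(zs, ys)     # I(z_i) = y_i
V = [1]
for z in zs:
    V = poly_mul(V, [(-z) % P_MOD, 1])   # V(x) = product of (x - z_i)

Q, R = poly_divmod(poly_sub(P, I), V)
print(R)  # [0]
```

A single quotient Q(x) covers all three points, which is why the proof stays at one group element regardless of the number of points.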

Verkle Tree - ETH

In a Merkle Tree, proving the existence of an element still requires a proof of size O(log_2(n)). Verkle Tree greatly improves on this proof size.


Let’s check out a simple example of Verkle Tree.

Figure 3. Verkle Tree for ETH


It can be seen that, similar to the Merkle Patricia Tree structure, nodes can be divided into three types - empty node, inner node, and leaf node.


Each inner node has a width of 16 (indices 0000 to 1111 in binary, i.e. one hexadecimal digit). To prove that the state of the leaf node is (0101 0111 1010 1111 -> 1213), we need to open the commitments along the path, i.e. those of Inner node B, Inner node A, and Root:


  1. Prove that the value of Inner node B’s commitment is hash (0101 0111 1010 1111, 1213) at index 1010.


  2. Prove that the value of Inner node A’s commitment is hash (cm_B) at index 0111.


  3. Prove that the value of node Root’s commitment is hash (cm_A) at index 0101.
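The indices used in the three steps above are simply successive 4-bit chunks of the key. A minimal sketch, assuming a 16-bit key and width-16 nodes as in this example (the function name `path_indices` is hypothetical):

```python
def path_indices(key: int, key_bits: int = 16, width_bits: int = 4):
    """Split a key into child indices, most significant chunk first."""
    mask = (1 << width_bits) - 1
    return [(key >> shift) & mask
            for shift in range(key_bits - width_bits, -1, -width_bits)]

key = 0b0101_0111_1010_1111
indices = path_indices(key)
print([format(i, "04b") for i in indices])  # ['0101', '0111', '1010', '1111']
```

The first three chunks select the child at Root, Inner node A, and Inner node B, respectively. In this example the last chunk (1111) is not used as an index, because the leaf already stores the full key.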


Use C_0 (Inner node B), C_1 (Inner node A), and C_2 (Root) to represent the commitments mentioned above, and let each C_i correspond to a polynomial f_i(x).


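Therefore, the Prover needs to prove the following three openings, where each polynomial is evaluated at the point that corresponds to the given index:

f_0(1010) = hash(0101 0111 1010 1111, 1213)

f_1(0111) = hash(C_0)

f_2(0101) = hash(C_1)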

Compress for Multiple Polys

For simplicity, we will use z_i to represent the index.


The prover needs to prove that the polynomials f_0(x),f_1(x),....,f_m−1(x) satisfy the following conditions at points z_0,z_1,....,z_m−1, respectively:

f_i(z_i) = y_i, for i = 0,1,....,m−1

According to the previous description (KZG for Single Point), for each polynomial, there exists a quotient polynomial satisfying:

q_i(x) = (f_i(x) − y_i)/(x − z_i)

The Prover needs to commit to each original polynomial and each quotient polynomial, and send the commitments to the Verifier:

C_i = [f_i(s)]_1, π_i = [q_i(s)]_1

The Verifier executes the verification for each i:

e(π_i, [s − z_i]_2) = e(C_i − [y_i]_1, [1]_2)

Obviously, we don’t want the verifier to execute so many pairing operations, because they are expensive. Therefore, we need to compress the proofs as follows.


Generate some random numbers r_0,r_1,....,r_m−1, and combine the above quotient polynomials:

g(x) = r_0·q_0(x) + r_1·q_1(x) + ... + r_m−1·q_m−1(x) = Σ_i r_i·(f_i(x) − y_i)/(x − z_i)

We assume that g(x) is a polynomial if and only if each q_i(x) is a polynomial (because of the random numbers, the probability that the fractional parts of the q_i(x) exactly cancel out is very low).


The prover commits to the polynomial g(x) and sends [g(s)]_1 to the verifier.


Next, the prover must convince the verifier that [g(s)]_1 is indeed the commitment to the polynomial g(x).


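Observe the form of the polynomial g(x), which can be written as:

g(x) = Σ_i r_i·f_i(x)/(x − z_i) − Σ_i r_i·y_i/(x − z_i)

Choose a value t randomly and there is:

g(t) = Σ_i r_i·f_i(t)/(t − z_i) − Σ_i r_i·y_i/(t − z_i)

Define the polynomial:

h(x) = Σ_i r_i·f_i(x)/(t − z_i)

Its commitment can be calculated with the following method, where C_i = [f_i(s)]_1 is the commitment to f_i(x):

[h(s)]_1 = Σ_i (r_i/(t − z_i))·C_i

Then the value of the polynomial h(x)−g(x) at point t is:

y = h(t) − g(t) = Σ_i r_i·y_i/(t − z_i)

Calculate the quotient polynomial q(x)=(h(x)−g(x)−y)/(x−t).


Calculate the commitment π = [q(s)]_1=[(h(s)−g(s)−y)/(s−t)]_1, and send it to the verifier.


Verifier performs the following verification:

  1. Calculate y = Σ_i r_i·y_i/(t − z_i) and [h(s)]_1 = Σ_i (r_i/(t − z_i))·C_i

  2. Verify e([h(s)]_1 − [g(s)]_1 − [y]_1, [1]_2) = e(π, [s − t]_2)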
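The algebra of the compression step can be replayed over a small prime field. The sketch below is not a KZG implementation: there are no commitments or pairings, and the polynomials, the modulus `P_MOD`, the seeded random values, and the helper names are all assumptions for illustration. It only checks that h(t) − g(t) equals the value y that the verifier can compute from the public r_i, y_i, z_i, and t, and that (h(x) − g(x) − y) is divisible by (x − t).

```python
import random

P_MOD = 2**31 - 1  # small prime field, for illustration only

def poly_eval(coeffs, x):
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % P_MOD
    return acc

def divide_by_linear(coeffs, z):
    """Synthetic division by (x - z); returns (quotient, remainder)."""
    quotient = [0] * (len(coeffs) - 1)
    carry = 0
    for i in range(len(coeffs) - 1, 0, -1):
        carry = (coeffs[i] + carry * z) % P_MOD
        quotient[i - 1] = carry
    return quotient, (coeffs[0] + carry * z) % P_MOD

def scaled_sum(polys, scalars):
    """Return sum_i scalars[i] * polys[i] as a coefficient list."""
    out = [0] * max(len(p) for p in polys)
    for p, k in zip(polys, scalars):
        for d, c in enumerate(p):
            out[d] = (out[d] + k * c) % P_MOD
    return out

rng = random.Random(7)
fs = [[rng.randrange(P_MOD) for _ in range(4)] for _ in range(3)]  # f_0, f_1, f_2
zs = [10, 7, 5]                                  # indices 1010, 0111, 0101
ys = [poly_eval(f, z) for f, z in zip(fs, zs)]   # y_i = f_i(z_i)

# Prover: g(x) = sum_i r_i * q_i(x)
rs = [rng.randrange(1, P_MOD) for _ in fs]
qs = [divide_by_linear([(f[0] - y) % P_MOD] + f[1:], z)[0]
      for f, y, z in zip(fs, ys, zs)]
g = scaled_sum(qs, rs)

# Random evaluation point t, chosen outside the set of z_i
t = rng.randrange(11, P_MOD)
weights = [r * pow((t - z) % P_MOD, -1, P_MOD) % P_MOD for r, z in zip(rs, zs)]
h = scaled_sum(fs, weights)                               # h(x)
y = sum(w * yi for w, yi in zip(weights, ys)) % P_MOD     # verifier's y

# q(x) = (h(x) - g(x) - y) / (x - t) must divide exactly
diff = [(a - b) % P_MOD for a, b in zip(h, g + [0])]
diff[0] = (diff[0] - y) % P_MOD
q, remainder = divide_by_linear(diff, t)

print((poly_eval(h, t) - poly_eval(g, t)) % P_MOD == y, remainder)  # True 0
```

In the real protocol the final identity h(s) − g(s) − y = q(s)·(s − t) is what the single pairing check in step 2 verifies, using [h(s)]_1 computed by the verifier from the commitments C_i.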


Key Properties

  1. Any number of points can be proved using this scheme without changing the Proof size. (For each commitment, there is a proof π.)


  2. The values y_i do not need to be provided explicitly, as each one is the hash of the next layer's value.


  3. The values z_i do not need to be provided explicitly, as they can be derived from the Key.


  4. The public information used includes the key/value pair to be proved and the corresponding commitments from the basic level to the upper level.

References

  1. Dankrad Feist, “PCS multiproofs using random evaluation,” https://dankradfeist.de/ethereum/2021/06/18/pcs-multiproofs.html, accessed: 2022-05-10.


  2. Vitalik Buterin, “Verkle trees,” https://vitalik.ca/general/2021/06/18/verkle.html, accessed: 2022-05-10.


  3. John Kuszmaul, “Verkle Trees,” https://math.mit.edu/research/highschool/primes/materials/2018/Kuszmaul.pdf, accessed: 2022-05-10.