Sin7Y is a tech team that explores layer 2, cross-chain, ZK, and privacy computing.
As a critical part of the ETH2.0 upgrade, the Verkle Tree greatly reduces the proof size compared to the Merkle Tree. For a data set of one billion entries, a Merkle Tree proof takes about 1 kB, while a Verkle Tree proof needs no more than 150 bytes.
The Verkle Tree concept was proposed in 2018 (for details, see John Kuszmaul's paper in the references). This 23rd tech review by Sin7Y explains the principle behind the Verkle Tree.
Before digging into the Verkle Tree, it is important to understand the Merkle Tree. A Merkle Tree is a common accumulator: it can be used to prove that an element exists in the accumulated set, as shown in the following figure:
To prove that (key: value) = (06: 32) exists in the Tree (green-marked), the Proof must contain all red-marked nodes in the figure.
The verifier can calculate the Root according to the method shown in Figure 1, and then compare it with the expected Root (grey-marked).
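The check described above can be sketched in a few lines of Python. This is only an illustration: SHA-256, the eight-leaf layout, and every leaf value other than (06: 32) are our assumptions, not details taken from the figure.

```python
import hashlib

def h(data: bytes) -> bytes:
    """Hash used for leaves and inner nodes (SHA-256 is an assumption)."""
    return hashlib.sha256(data).digest()

def build_levels(leaves):
    """Return every level of a binary Merkle tree: hashed leaves first, root last.
    The number of leaves must be a power of two."""
    level = [h(leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def make_proof(levels, index):
    """Collect the sibling hash on each level below the root."""
    proof = []
    for level in levels[:-1]:
        proof.append(level[index ^ 1])  # sibling differs in the lowest bit
        index //= 2
    return proof

def verify(leaf, index, proof, root):
    """Recompute the root from the leaf and its siblings, then compare."""
    node = h(leaf)
    for sibling in proof:
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == root

values = [10, 7, 19, 4, 25, 8, 32, 16]  # hypothetical; only (06: 32) is from the text
leaves = [f"{k:02d}:{v}".encode() for k, v in enumerate(values)]
levels = build_levels(leaves)
root = levels[-1][0]
proof = make_proof(levels, 6)
print(len(proof), verify(b"06:32", 6, proof, root))
```

With eight leaves the proof holds three sibling hashes, one per level, which is the log_2(n) growth discussed next.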
As expected, the proof grows as the tree becomes deeper and wider: for a binary tree the proof contains log_2(n) nodes, while for a k-ary tree it contains (k−1)log_k(n) nodes.
In addition, the verifier has to perform many hash calculations, working from the bottom level up to the root. Thus, increasing the depth and width of the tree increases the verifier's workload, which is unacceptable.
Simply increasing the Tree’s width can reduce its depth, but the proof size will not be reduced because the proof size changes from the original log_2(n) to (k−1)log_k(n).
That is, for each layer, the prover needs to provide (k−1) additional nodes. In the Verkle Tree paper, John Kuszmaul describes a scheme that reduces the proof complexity to log_k(n).
If we set k=1024, the proof becomes log_2(k) = 10 times smaller than the log_2(n) proof of a binary Merkle Tree.
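To make these counts concrete, the short sketch below evaluates the three formulas for n = 2^30 (roughly one billion entries). The helper names are ours, and the counts are numbers of nodes or commitments, not bytes.

```python
def depth(n: int, k: int) -> int:
    """Smallest d with k**d >= n, computed with integers to avoid float rounding."""
    d, capacity = 0, 1
    while capacity < n:
        capacity *= k
        d += 1
    return d

def merkle_proof_nodes(n: int, k: int) -> int:
    """A k-ary Merkle proof carries (k - 1) siblings per level."""
    return (k - 1) * depth(n, k)

def verkle_proof_nodes(n: int, k: int) -> int:
    """A k-ary Verkle proof carries one item per level."""
    return depth(n, k)

n = 2**30  # about one billion entries
print(merkle_proof_nodes(n, 2))     # 30
print(merkle_proof_nodes(n, 1024))  # 3069
print(verkle_proof_nodes(n, 1024))  # 3
```

Assuming 32-byte hashes, the 30 nodes of the binary Merkle proof come to 960 bytes, in line with the roughly 1 kB figure quoted at the beginning.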
The Verkle Tree design is shown as follows:
For each node, there are two pieces of information: (1) value; (2) existence proof π. For example, the green-marked (H(k,v),π_03) shows that H(k,v) exists in commitment C_0 and π_03 is the proof of this argument.
Similarly, (C_0,π_0) means that C_0 exists in commitment C_Root and π_0 is the proof of this argument.
In the Verkle Tree paper, this kind of existence commitment is called a vector commitment. If a vector commitment is applied directly to the original data, the proof has O(1) size, but constructing the proofs costs O(n^2) and updating a proof costs O(n).
Therefore, to strike a balance, the paper uses a k-ary Verkle Tree (as shown in Figure 2), which brings the cost of constructing proofs, the cost of updating a proof, and the proof size to O(kn), O(k log_k n), and O(log_k n), respectively.
The specific performance comparison is shown in Table 1:
In this article, we do not intend to give a detailed introduction to specific vector commitment schemes, which John Kuszmaul has explained well in his paper.
Fortunately, compared to the vector commitment, we have a more efficient tool called polynomial commitment.
Given a set of coordinates (c_1, c_2, ..., c_n) and a set of values (y_1, y_2, ..., y_n), you can construct a polynomial P(x) by Lagrange interpolation that satisfies P(c_i)=y_i, and then commit to this polynomial.
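As a minimal sketch of this interpolation step, the Python below builds the coefficients of P(x) over a small prime field. The modulus 2^31 − 1 and the sample points are assumptions chosen for readability; a real scheme works in the scalar field of the chosen curve.

```python
P = 2**31 - 1  # small prime modulus, for illustration only

def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists (lowest degree first)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] = (out[i + j] + ai * bj) % P
    return out

def lagrange_interpolate(cs, ys):
    """Coefficients of the unique polynomial of degree < len(cs) with p(cs[i]) = ys[i]."""
    result = [0] * len(cs)
    for i, (ci, yi) in enumerate(zip(cs, ys)):
        basis, denom = [1], 1
        for j, cj in enumerate(cs):
            if j != i:
                basis = poly_mul(basis, [-cj % P, 1])   # multiply by (x - c_j)
                denom = denom * (ci - cj) % P
        scale = yi * pow(denom, -1, P) % P              # y_i / prod (c_i - c_j)
        result = [(r + scale * b) % P for r, b in zip(result, basis)]
    return result

def evaluate(coeffs, x):
    """Horner evaluation modulo P."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % P
    return acc

cs = [1, 2, 3, 4]   # hypothetical coordinates
ys = [5, 0, 7, 3]   # hypothetical values
coeffs = lagrange_interpolate(cs, ys)
print(all(evaluate(coeffs, c) == y for c, y in zip(cs, ys)))  # True
```

The modular inverse uses the three-argument `pow` with exponent −1, which requires Python 3.8 or later.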
KZG10 is one of the common polynomial commitment schemes (here, the commitment is a point on an elliptic curve, typically between 32 and 48 bytes in size).
Take KZG10 as an example. For the polynomial P(x), we use [P(s)]_1 to represent the polynomial commitment.
For a polynomial P(x), if P(z)=y, then (x−z) divides P(x)−y. That is to say, if we set Q(x)=(P(x)−y)/(x−z), then Q(x) is a polynomial.
Now, we generate a proof that P(x) satisfies P(z)=y. That is, the prover calculates [Q(s)]_1 and sends it to the verifier, who needs to verify:
e([Q(s)]_1, [s−z]_2) = e([P(s)]_1 − [y]_1, [1]_2)
Because s is a randomly chosen point in the finite field F, the probability that a cheating prover succeeds is at most degree(Q)/p, where p is the size of F (Schwartz–Zippel lemma).
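The algebra behind this check can be tried out without any elliptic-curve library. The sketch below is a toy under loud assumptions: it works with plain field elements, it exposes the secret s, and it replaces the pairing equation by the identity Q(s)·(s−z) = P(s)−y. Real KZG10 never reveals s and performs this comparison on curve points. The modulus and the sample polynomial are ours.

```python
import random

P = 2**31 - 1  # small prime modulus, for illustration only

def evaluate(coeffs, x):
    """Horner evaluation modulo P (coefficients lowest degree first)."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % P
    return acc

def quotient_by_linear(coeffs, z):
    """Synthetic division: coefficients of (f(x) - f(z)) / (x - z)."""
    quot = [0] * (len(coeffs) - 1)
    carry = 0
    for i in range(len(coeffs) - 1, 0, -1):
        carry = (coeffs[i] + carry * z) % P
        quot[i - 1] = carry
    return quot

def check(commit, proof, z, y, s):
    """Toy stand-in for the pairing check: Q(s) * (s - z) == P(s) - y."""
    return proof * (s - z) % P == (commit - y) % P

poly = [7, 0, 2, 5]            # P(x) = 7 + 2x^2 + 5x^3 (hypothetical)
z = 11
y = evaluate(poly, z)          # claimed value P(z)
Q = quotient_by_linear(poly, z)

s = random.randrange(1, P)     # toy secret; hidden inside the setup in real KZG10
commit_P = evaluate(poly, s)   # stands in for [P(s)]_1
proof_Q = evaluate(Q, s)       # stands in for [Q(s)]_1

print(check(commit_P, proof_Q, z, y, s))      # True
print(check(commit_P, proof_Q, z, y + 1, s))  # False
```

The second call shows that the honest proof does not verify against a wrong claimed value.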
Now, we want to prove that the values of the polynomial P(x) at (z_0, z_1, ..., z_{k−1}) are (y_0, y_1, ..., y_{k−1}), respectively. Therefore, we need to define two polynomials:
I(x) = Σ_{i=0}^{k−1} y_i · Π_{j≠i} (x − z_j)/(z_i − z_j), the Lagrange interpolation polynomial satisfying I(z_i) = y_i
V(x) = (x − z_0)(x − z_1)...(x − z_{k−1})
According to the description above, we need V(x)|(P(x)−I(x)). That is, there exists a polynomial Q(x) satisfying:
Q(x)·V(x) = P(x) − I(x)
Therefore, the prover needs to compute the commitments [P(s)]_1 and [Q(s)]_1 for P(x) and Q(x), and send them to the verifier.
The verifier calculates [I(s)]_1 and [V(s)]_2 locally, and verifies the equation:
e([Q(s)]_1, [V(s)]_2) = e([P(s)]_1 − [I(s)]_1, [1]_2)
It is clear that the proof size is constant no matter how many Points there are. If we choose the BLS12-381 curve, the Proof size is only 48 bytes, which is very efficient.
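The multi-point case can be sketched the same way. As before, this is a toy under stated assumptions: field elements stand in for curve points, the secret s is exposed, and the pairing equation is replaced by Q(s)·V(s) = P(s)−I(s). The modulus, the sample polynomial, and the helper names are ours.

```python
import random

P = 2**31 - 1  # small prime modulus, for illustration only

def evaluate(coeffs, x):
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % P
    return acc

def poly_mul(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] = (out[i + j] + ai * bj) % P
    return out

def poly_sub(a, b):
    n = max(len(a), len(b))
    a, b = a + [0] * (n - len(a)), b + [0] * (n - len(b))
    return [(x - y) % P for x, y in zip(a, b)]

def poly_divmod(num, den):
    """Polynomial long division modulo P; returns (quotient, remainder)."""
    num = num[:]
    quot = [0] * max(len(num) - len(den) + 1, 1)
    inv_lead = pow(den[-1], -1, P)
    for i in range(len(num) - len(den), -1, -1):
        coef = num[i + len(den) - 1] * inv_lead % P
        quot[i] = coef
        for j, d in enumerate(den):
            num[i + j] = (num[i + j] - coef * d) % P
    return quot, num[:len(den) - 1]

def interpolate(zs, ys):
    """I(x): Lagrange interpolation through the points (z_i, y_i)."""
    result = [0] * len(zs)
    for i, (zi, yi) in enumerate(zip(zs, ys)):
        basis, denom = [1], 1
        for j, zj in enumerate(zs):
            if j != i:
                basis = poly_mul(basis, [-zj % P, 1])
                denom = denom * (zi - zj) % P
        scale = yi * pow(denom, -1, P) % P
        result = [(r + scale * b) % P for r, b in zip(result, basis)]
    return result

def vanishing(zs):
    """V(x) = product of (x - z_i)."""
    v = [1]
    for z in zs:
        v = poly_mul(v, [-z % P, 1])
    return v

poly = [3, 1, 4, 1, 5, 9]              # P(x), lowest degree first (hypothetical)
zs = [1, 2, 3]
ys = [evaluate(poly, z) for z in zs]
I, V = interpolate(zs, ys), vanishing(zs)
Q, remainder = poly_divmod(poly_sub(poly, I), V)

s = random.randrange(1, P)             # toy secret, exposed only in this sketch
lhs = evaluate(Q, s) * evaluate(V, s) % P
rhs = (evaluate(poly, s) - evaluate(I, s)) % P
print(all(r == 0 for r in remainder), lhs == rhs)  # True True
```

The zero remainder confirms that V(x) divides P(x)−I(x). However many points are opened, the prover still sends a single value for Q.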
Compared to the Merkle Tree, where proving the existence of an element still requires a proof of size O(log_2 n), the Verkle Tree greatly reduces the proof size.
Let’s check out a simple example of Verkle Tree.
It can be seen that, similar to the Merkle Patricia Tree structure, nodes can be divided into three types - empty node, inner node, and leaf node.
Each inner node has a width of 16 (child indices 0000 to 1111 in binary, i.e. one hexadecimal digit). To prove that the leaf node holds the state (0101 0111 1010 1111 -> 1213), we need to open the commitments along the path through Inner node A and Inner node B:
Prove that the value of Inner node B's commitment is hash(0101 0111 1010 1111, 1213) at index 1010.
Prove that the value of Inner node A's commitment is hash(cm_B) at index 0111.
Prove that the value of the Root node's commitment is hash(cm_A) at index 0101.
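The indices used in these three statements are simply the 4-bit chunks of the key, read from the most significant end. A small helper (the function name and defaults are ours) makes the mapping explicit:

```python
def key_to_indices(key: int, key_bits: int = 16, width_bits: int = 4):
    """Split a key into per-level child indices, most significant chunk first."""
    mask = (1 << width_bits) - 1
    return [(key >> shift) & mask
            for shift in range(key_bits - width_bits, -1, -width_bits)]

key = 0b0101_0111_1010_1111
indices = key_to_indices(key)
print([f"{i:04b}" for i in indices])  # ['0101', '0111', '1010', '1111']
```

The first three chunks select the child at the Root, at Inner node A, and at Inner node B. In this example the leaf is reached after three levels, so the last chunk is not used as an index.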
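Let C_0 (Inner node B), C_1 (Inner node A), and C_2 (Root) denote the commitments mentioned above, corresponding to the polynomials f_0(x), f_1(x), and f_2(x), respectively.
Therefore, the prover needs to prove:
f_0 takes the value hash(0101 0111 1010 1111, 1213) at index 1010
f_1 takes the value hash(C_0) at index 0111
f_2 takes the value hash(C_1) at index 0101
For simplicity, we will use z_i to denote the index.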
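The prover needs to prove that the polynomials f_0(x), f_1(x), ..., f_{m−1}(x) satisfy the following conditions at the points z_0, z_1, ..., z_{m−1}, respectively:
f_i(z_i) = y_i, for i = 0, 1, ..., m−1
As in the polynomial commitment discussion above, each condition holds exactly when the quotient q_i(x) = (f_i(x) − y_i)/(x − z_i) is a polynomial. With C_i = [f_i(s)]_1 and π_i = [q_i(s)]_1, the verifier could check each opening on its own:
e(π_i, [s − z_i]_2) = e(C_i − [y_i]_1, [1]_2)
This costs one proof and one pairing check per polynomial, so the openings are aggregated as follows.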
Generate some random numbers r_0, r_1, ..., r_{m−1}, and combine the quotient polynomials (f_i(x) − y_i)/(x − z_i) into one:
g(x) = Σ_{i=0}^{m−1} r_i · (f_i(x) − y_i)/(x − z_i)
The prover commits to the polynomial g(x) and sends [g(s)]_1 to the verifier.
Next, the prover must convince the verifier that [g(s)]_1 is indeed the commitment to the polynomial g(x).
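Observe the form of the polynomial g(x), which can be written as:
g(x) = Σ_{i=0}^{m−1} r_i · f_i(x)/(x − z_i) − Σ_{i=0}^{m−1} r_i · y_i/(x − z_i)
Choose a value t randomly, and there is:
g(t) = Σ_{i=0}^{m−1} r_i · f_i(t)/(t − z_i) − Σ_{i=0}^{m−1} r_i · y_i/(t − z_i)
Define the polynomial:
h(x) = Σ_{i=0}^{m−1} r_i · f_i(x)/(t − z_i)
Its commitment can be calculated from the commitments C_i = [f_i(s)]_1 with the following method:
[h(s)]_1 = Σ_{i=0}^{m−1} (r_i/(t − z_i)) · C_i
Then the value of the polynomial h(x)−g(x) at point t is:
y = h(t) − g(t) = Σ_{i=0}^{m−1} r_i · y_i/(t − z_i)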
Calculate the quotient polynomial q(x)=(h(x)−g(x)−y)/(x−t).
Calculate the commitment π = [q(s)]_1=[(h(s)−g(s)−y)/(s−t)]_1, and send it to the verifier.
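The verifier computes [h(s)]_1 and y from public values and performs the following verification:
e(π, [s − t]_2) = e([h(s)]_1 − [g(s)]_1 − [y]_1, [1]_2)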
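The whole aggregation can be exercised end to end with a toy model. The sketch below is not a secure implementation: it uses plain field elements in place of curve points, exposes the secret s, and replaces the final pairing check by the identity h(s) − g(s) − y = q(s)·(s − t). The modulus, the random polynomials, the use of the index itself as the evaluation point z_i, and all helper names are our assumptions.

```python
import random

P = 2**31 - 1  # small prime modulus, for illustration only

def evaluate(coeffs, x):
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % P
    return acc

def quotient_by_linear(coeffs, z):
    """Synthetic division: coefficients of (f(x) - f(z)) / (x - z)."""
    quot = [0] * (len(coeffs) - 1)
    carry = 0
    for i in range(len(coeffs) - 1, 0, -1):
        carry = (coeffs[i] + carry * z) % P
        quot[i - 1] = carry
    return quot

def add_scaled(a, b, scale):
    """Return a(x) + scale * b(x) modulo P."""
    n = max(len(a), len(b))
    a, b = a + [0] * (n - len(a)), b + [0] * (n - len(b))
    return [(x + scale * y) % P for x, y in zip(a, b)]

def inv(x):
    return pow(x % P, -1, P)

rng = random.Random(2022)
m, width = 3, 16
fs = [[rng.randrange(P) for _ in range(width)] for _ in range(m)]  # f_i, degree < 16
zs = [0b1010, 0b0111, 0b0101]          # indices for C_0, C_1, C_2 in the example
ys = [evaluate(f, z) for f, z in zip(fs, zs)]
rs = [rng.randrange(1, P) for _ in range(m)]

# Prover: g(x) = sum_i r_i * (f_i(x) - y_i) / (x - z_i)
g = [0]
for f, z, r in zip(fs, zs, rs):
    g = add_scaled(g, quotient_by_linear(f, z), r)

t = rng.randrange(width, P)            # random point outside the index range, so t != z_i

# Prover: h(x) = sum_i r_i * f_i(x) / (t - z_i), then q(x) = (h(x) - g(x) - y) / (x - t)
h = [0]
for f, z, r in zip(fs, zs, rs):
    h = add_scaled(h, f, r * inv(t - z) % P)
q = quotient_by_linear(add_scaled(h, g, -1), t)

s = rng.randrange(1, P)                # toy secret, exposed only in this sketch
C = [evaluate(f, s) for f in fs]       # stand-ins for C_i = [f_i(s)]_1
D = evaluate(g, s)                     # stand-in for [g(s)]_1
pi = evaluate(q, s)                    # stand-in for the proof [q(s)]_1

def verify(C, D, pi, zs, ys, rs, t, s):
    """Verifier rebuilds h(s) and y from public values, then checks the opening at t."""
    E = sum(r * inv(t - z) * c for r, z, c in zip(rs, zs, C)) % P
    y = sum(r * inv(t - z) * yi for r, z, yi in zip(rs, zs, ys)) % P
    return (E - D - y) % P == pi * (s - t) % P

print(verify(C, D, pi, zs, ys, rs, t, s))                      # True
print(verify(C, D, pi, zs, [ys[0] + 1] + ys[1:], rs, t, s))    # False
```

The verifier only receives D and pi from the prover, however many polynomials are opened; changing a single claimed value y_i makes the check fail.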
Any number of points can be proved using this scheme without changing the Proof size. (For each commitment, there is a proof π.)
The value of y_i does not need to be provided explicitly, as it is the hash of the next-layer value.
The value of z_i does not need to be provided explicitly, as it can be derived from the key.
The public information used includes the key/value pair to be proved and the corresponding commitments from the basic level to the upper level.
Dankrad Feist, “PCS multiproofs using random evaluation,” https://dankradfeist.de/ethereum/2021/06/18/pcs-multiproofs.html, accessed: 2022-05-10.
Vitalik Buterin, “Verkle trees,” https://vitalik.ca/general/2021/06/18/verkle.html, accessed: 2022-05-10.
John Kuszmaul, “Verkle Trees,” https://math.mit.edu/research/highschool/primes/materials/2018/Kuszmaul.pdf, accessed: 2022-05-10.