Optimizing EIP-4844 transaction validation for mempool (using KZG proofs) #5088
Conversation
Great to see that this optimization was implemented
FWIW, I'm working on some explanation and code for the optimizations discussed at the end of the post. WIP.
Force-pushed from 6dae5c8 to b967f25
In the previous post of this PR, we mentioned a possible optimization that allows us to verify any number of blob commitments in approximately the same time it takes us to verify a single blob commitment. We just pushed two commits to this PR that implement the technique (603ab2a and 7d3b449), so let's dive into how it all works.

Spoiler: it's all about taking random linear combinations and KZG having a neat algebraic structure.

We will demonstrate the technique for a transaction with two blobs, but it can be generalized to an arbitrary number of blobs. Here is a transaction with two blobs (and hence two commitments and two proofs) arranged in matrix form:
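As a toy sketch of the aggregation step (illustrative names, and a small prime standing in for the BLS scalar field; the real scheme applies the same coefficients to the commitments as well):

```python
# Toy sketch: combine two blobs column-wise with powers of a random
# challenge r. Illustrative only -- the spec works over BLS_MODULUS and
# aggregates the commitments with the same coefficients.
MODULUS = 101  # toy prime standing in for BLS_MODULUS

def vector_lincomb(vectors, scalars):
    """Linear combination of each column of `vectors` with `scalars`."""
    result = [0] * len(vectors[0])
    for vector, scalar in zip(vectors, scalars):
        result = [(acc + scalar * v) % MODULUS for acc, v in zip(result, vector)]
    return result

blobs = [[1, 2, 3], [4, 5, 6]]  # two tiny "blobs"
r = 7                           # random challenge; coefficients are [1, r]
aggregated_blob = vector_lincomb(blobs, [1, r])  # -> [29, 37, 45]
```

A cheating prover would need to make the error terms of several bad blobs cancel under coefficients it cannot predict, which is why the challenge must be derived after the blobs are fixed.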
The previous post's verification logic would go over each blob and verify its commitment using the corresponding proof. The time to verify each proof is 2.5ms, and hence this would take us 2.5ms * 2 = 5ms. Let's dig a bit deeper now and see how we can minimize the verification cost.

The rough idea is that instead of verifying each blob individually, we combine all blobs into a single aggregated blob using a random linear combination. This new aggregated blob corresponds to a single polynomial (let's call it the aggregated polynomial).

In the diagram below, we show how we produce the aggregated blob and aggregated commitment via a random linear combination using a random scalar `r`.

So right now we have a single aggregated blob that we need to check against this new aggregated commitment. In terms of security, the fact that we used a random linear combination means that checking the aggregated commitment against the aggregated blob is practically equivalent to checking each individual blob against its individual commitment. This is a classic argument in cryptography when aggregating proofs.

The only missing part now is how to actually check the aggregated commitment against the aggregated blob. To do this verification, the transaction creator includes an aggregated proof in the transaction, which the verifier checks using the Barycentric formula technique of the previous post. There is no need for individual blob proofs anymore, and hence the transaction looks a bit like this:
To summarize, the verifier creates an aggregated commitment and an aggregated blob using a random linear combination and verifies them using a provided aggregated proof. Now let's analyze the computational cost of the above procedure:
The computational cost is dominated by the KZG proof verification, and hence we expect the total procedure to take about 3.5ms (benchmarks pending). This pretty much brings the cost of verifying EIP-4844 blocks and transactions to a near-optimal level: instead of verifying a linear number of KZG proofs, we verify a single KZG proof and instead perform a linear number of finite-field operations (which are cheap).

Post written with the invaluable help of @adietrichs. Code once again stolen from @dankrad's danksharding code with minor modifications.
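The coefficients of the linear combination are the successive powers of the challenge, computed with a simple field-multiplication loop (a minimal sketch; the function name is illustrative, though the loop mirrors the `current_power` snippet quoted later in this thread):

```python
# Sketch of computing [1, r, r^2, ...] for the random linear combination.
BLS_MODULUS = 52435875175126190479447740508185965837690552500527637822603658699938581184513

def compute_powers(x: int, n: int) -> list:
    """Return [x**0, x**1, ..., x**(n-1)], each reduced mod BLS_MODULUS."""
    powers = []
    current_power = 1
    for _ in range(n):
        powers.append(current_power)
        current_power = current_power * x % BLS_MODULUS
    return powers
```

This is the "linear finite field operations" part of the cost: one multiplication per blob, negligible next to a pairing.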
Force-pushed from 201a077 to 6df884c
All code pretty much straight up copied from ethereum/EIPs#5088
Added some comments to assist Dankrad with code review, by highlighting the differences between this code and the Danksharding PR.
EIPS/eip-4844.md
Outdated
```python
    Compute the modular inverse of x using the eGCD algorithm
    i.e. return y such that x * y % BLS_MODULUS == 1 and return 0 for x == 0
    """
    if x == 0:
```
Difference from danksharding PR: Check for x == 0
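A minimal sketch of an inverse helper with that zero check (the `bls_modular_inverse` name follows the spec code quoted below; this sketch uses Python's built-in `pow(x, -1, m)` from Python 3.8+ instead of an explicit eGCD):

```python
# Sketch: modular inverse with the x == 0 guard discussed above.
BLS_MODULUS = 52435875175126190479447740508185965837690552500527637822603658699938581184513

def bls_modular_inverse(x: int) -> int:
    """Return y such that x * y % BLS_MODULUS == 1, and 0 for x == 0."""
    if x == 0:
        return 0  # zero has no inverse; the spec returns 0 in this case
    return pow(x, -1, BLS_MODULUS)
```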
```python
    return x * inv(y) % MODULUS
```
```python
def evaluate_polynomial_in_evaluation_form(poly: List[BLSFieldElement], x: BLSFieldElement) -> BLSFieldElement:
```
Difference from danksharding PR: Switch barycentric code with this one from the research repo: https://github.com/ethereum/research/blob/master/verkle_trie/kzg_utils.py#L35
The barycentric formula code from the danksharding PR was not giving the right results based on some rudimentary tests.
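For reference, the barycentric formula in question evaluates a polynomial given in evaluation form as f(x) = (x^N - 1)/N * Σ d_i·w_i/(x - w_i), where the w_i are the N-th roots of unity. A toy sketch over a small prime (the modulus, width, and roots below are illustrative, not the spec's constants):

```python
# Toy illustration of barycentric evaluation in evaluation form.
MODULUS = 17
WIDTH = 4
ROOTS_OF_UNITY = [1, 4, 16, 13]  # powers of 4, which has order 4 mod 17

def inv(a: int) -> int:
    return pow(a % MODULUS, -1, MODULUS)

def evaluate_polynomial_in_evaluation_form(poly, x):
    """Evaluate at x (x must not itself be a root of unity)."""
    total = 0
    for d, w in zip(poly, ROOTS_OF_UNITY):
        total = (total + d * w * inv(x - w)) % MODULUS
    return total * (pow(x, WIDTH, MODULUS) - 1) * inv(WIDTH) % MODULUS

# poly is f(X) = X in evaluation form (its values at the roots of unity)
assert evaluate_polynomial_in_evaluation_form([1, 4, 16, 13], 2) == 2
```

Only field additions, multiplications, and inversions are needed, which is what makes this cheap compared to reconstructing the commitment.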
```diff
@@ -46,6 +46,7 @@ Compared to full data sharding, this EIP has a reduced cap on the number of thes
 | `BLS_MODULUS` | `52435875175126190479447740508185965837690552500527637822603658699938581184513` |
 | `KZG_SETUP_G2` | `Vector[G2Point, FIELD_ELEMENTS_PER_BLOB]`, contents TBD |
 | `KZG_SETUP_LAGRANGE` | `Vector[KZGCommitment, FIELD_ELEMENTS_PER_BLOB]`, contents TBD |
+| `ROOTS_OF_UNITY` | `Vector[BLSFieldElement, FIELD_ELEMENTS_PER_BLOB]` |
```
Difference from danksharding PR: Add the roots of unity list as a global constant instead of having explicit code that generates them on demand.
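A sketch of how such a table could be generated from a primitive root of unity (toy constants for illustration; the spec's table has `FIELD_ELEMENTS_PER_BLOB = 4096` entries over `BLS_MODULUS`):

```python
# Sketch: generating a roots-of-unity table from a primitive root.
MODULUS = 17
WIDTH = 4
PRIMITIVE_ROOT_OF_UNITY = 4  # has multiplicative order WIDTH mod 17

ROOTS_OF_UNITY = [pow(PRIMITIVE_ROOT_OF_UNITY, i, MODULUS) for i in range(WIDTH)]
# -> [1, 4, 16, 13]
```

Precomputing this as a constant trades a small amount of spec text for not having root-generation code on the verification path.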
Force-pushed from 3cb77a8 to 08e5f92
There is no `tx.message.blob_commitments` anymore, or `kzg_to_commitment()`
To validate a 4844 transaction in the mempool, the verifier checks that each provided KZG commitment matches the polynomial represented by the corresponding blob data:

| d_1 | d_2 | d_3 | ... | d_4096 | -> commitment

Before this patch, to do this validation, we reconstructed the commitment from the blob data (the d_i above) and checked it against the provided commitment. This was expensive because computing a commitment from blob data (even using the Lagrange basis) involves N scalar multiplications, where N is the number of field elements per blob. Initial benchmarking showed that this was about 40ms for N=4096, which was deemed too expensive. For more details see: https://hackmd.io/@protolambda/eip-4844-implementer-notes#Optimizations and protolambda/go-ethereum#4

In this patch, we speed this up by providing a KZG proof for each commitment. The verifier can check that proof to ensure that the KZG commitment matches the polynomial represented by the corresponding blob data:

| d_1 | d_2 | d_3 | ... | d_4096 | -> commitment, proof

To do so, we evaluate the blob data polynomial at a random point `x` to get a value `y`. We then use the KZG proof to ensure that the committed polynomial (i.e. the commitment) also evaluates to `y` at `x`. If the check passes, it means that the KZG commitment matches the polynomial represented by the blob data.

This is significantly faster since evaluating the blob data polynomial at a random point using the Barycentric formula can be done efficiently with only field operations (see https://hackmd.io/@vbuterin/barycentric_evaluation). Then, verifying a KZG proof takes two pairing operations (which take about 0.6ms each). This brings the total verification cost to about 2ms per blob. With some additional optimizations (using linear combination tricks like the ones linked above) we can batch all the blobs together into a single efficient verification, and hence verify the entire transaction in 2.5ms.
The same techniques can be used to efficiently verify blocks on the consensus side.
Also abstract `lincomb()` out of the `blob_to_kzg()` function to be used in the verification.
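The random point `x` above comes from a Fiat-Shamir hash over the prover-visible data. An illustrative sketch of the challenge derivation (the actual spec hashes SSZ-serialized objects; this just hashes raw bytes, and the input string is a placeholder):

```python
import hashlib

# Illustrative Fiat-Shamir challenge: hash the transcript into a field
# element so the prover cannot choose the evaluation point.
BLS_MODULUS = 52435875175126190479447740508185965837690552500527637822603658699938581184513

def hash_to_bls_field(data: bytes) -> int:
    """Map bytes to a scalar in [0, BLS_MODULUS)."""
    return int.from_bytes(hashlib.sha256(data).digest(), "little") % BLS_MODULUS

x = hash_to_bls_field(b"blob data || commitments")  # placeholder transcript
```

Deriving `x` from everything the prover has already committed to is what makes the single evaluation check binding.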
Force-pushed from 08e5f92 to ea9ae70
All tests passed; auto-merging...(pass) eip-4844.md
Rebased and force pushed because of conflicts with #5106
Triggering bot.
Some implementation issues.
```python
    assert width == FIELD_ELEMENTS_PER_BLOB
    inverse_width = bls_modular_inverse(width)

    for i in range(width):
```
initialize r
```diff
-for i in range(width):
+r = 0
+for i in range(width):
```
### Helpers

Converts a blob to its corresponding KZG point:

```python
def lincomb(points: List[KZGCommitment], scalars: List[BLSFieldElement]) -> KZGCommitment:
```
Although it's from the EL's point of view, this EIP uses SSZ to define the parameters and constants. It would be better to use the more general `Sequence` type so that (i) it can accept all sequence types (basic Python sequences and SSZ sequences), (ii) there is less confusion, and (iii) it is similar to the CL specs.
```diff
-def lincomb(points: List[KZGCommitment], scalars: List[BLSFieldElement]) -> KZGCommitment:
+def lincomb(points: Sequence[KZGCommitment], scalars: Sequence[BLSFieldElement]) -> KZGCommitment:
```
```python
    return x * bls_modular_inverse(y) % BLS_MODULUS


def evaluate_polynomial_in_evaluation_form(poly: List[BLSFieldElement], x: BLSFieldElement) -> BLSFieldElement:
```
ditto
```diff
-def evaluate_polynomial_in_evaluation_form(poly: List[BLSFieldElement], x: BLSFieldElement) -> BLSFieldElement:
+def evaluate_polynomial_in_evaluation_form(poly: Sequence[BLSFieldElement], x: BLSFieldElement) -> BLSFieldElement:
```
```python
        current_power = current_power * int(x) % BLS_MODULUS
    return powers


def vector_lincomb(vectors: List[List[BLSFieldElement]], scalars: List[BLSFieldElement]) -> List[BLSFieldElement]:
```
```diff
-def vector_lincomb(vectors: List[List[BLSFieldElement]], scalars: List[BLSFieldElement]) -> List[BLSFieldElement]:
+def vector_lincomb(vectors: Sequence[Sequence[BLSFieldElement]], scalars: Sequence[BLSFieldElement]) -> Sequence[BLSFieldElement]:
```
```python
    Given a list of vectors, compute the linear combination of each column with `scalars`, and return the resulting
    vector.
    """
    r = [0]*len(vectors[0])
```
```diff
-    r = [0]*len(vectors[0])
+    r = [0] * len(vectors[0])
```
```python
    number_of_blobs = len(blobs)

    # Generate random linear combination challenges
    r = hash_to_bls_field([blobs, commitments])
```
`hash_to_bls_field` accepts a `Container`. I think it needs to define a Container and cast the type here:

```python
class BlobsAndCommitments(Container):
    blobs: List[Blob, MAX_BLOBS_PER_BLOCK]
    blob_kzgs: List[KZGCommitment, MAX_BLOBS_PER_BLOCK]
```

and do

```python
r = hash_to_bls_field(BlobsAndCommitments(blobs=blobs, blob_kzgs=commitments))
```
```python
    aggregated_poly = vector_lincomb(blobs, r_powers)

    # Generate challenge `x` and evaluate the aggregated polynomial at `x`
    x = hash_to_bls_field([aggregated_poly, aggregated_poly_commitment])
```
ditto, need casting
…ofs) (ethereum#5088)

* Fix missing variables/funcs in validate_blob_transaction_wrapper(): there is no `tx.message.blob_commitments` anymore, or `kzg_to_commitment()`
* Introduce KZGProof as its own type instead of using KZGCommitment
* Introduce high-level logic of new efficient transaction validation (the full commit message is reproduced earlier in this thread)
* Introduce polynomial helper functions for transaction validation
* Implement high-level logic of aggregated proof verification
* Add helper functions for aggregated proof verification; also abstract `lincomb()` out of the `blob_to_kzg()` function to be used in the verification
* Fixes after review on the consensus PR
Hello,
this PR uses KZG proofs to speed up the procedure of validating blobs of EIP-4844 transactions in the mempool.
With the current EIP-4844 proposal it takes about 40ms to validate the blobs, and we received concerns that it would be too slow for mempool validation. With this PR we can bring the verification time of the entire transaction to about 3.5 ms regardless of the number of blobs included (also see subsequent post on this PR).
Details
To validate a 4844 transaction in the mempool, the verifier checks that each provided KZG commitment matches the polynomial represented by the corresponding blob data (see `validate_blob_transaction_wrapper()`):

| d_1 | d_2 | d_3 | ... | d_4096 | -> commitment

Before this patch, to do this validation, we reconstructed the commitment from the blob data (the `d_i` above) and checked it against the provided commitment. This was expensive because computing a commitment from blob data (even using the Lagrange basis) involves N scalar multiplications, where N is the number of field elements per blob. Initial benchmarking showed that this was about 40ms for N=4096, which was deemed too expensive.
In this patch, we speed this up by providing a KZG proof for each commitment. The verifier can check the proof to ensure that the KZG commitment matches the polynomial represented by the corresponding blob data.
To do so, we evaluate the blob data polynomial at a random point `x` to get a value `y`. We then use the KZG proof to ensure that the committed polynomial (i.e. the commitment) also evaluates to `y` at `x`. If the check passes, it means that the KZG commitment matches the polynomial represented by the blob data.

This is significantly faster since evaluating the blob data polynomial at a random point using the Barycentric formula can be done efficiently using only field operations. Then, verifying a KZG proof takes two pairing operations (which take about 0.6ms each). This brings the total verification cost to about 2ms per blob.
Drawbacks
The main drawback of this technique is that it requires an implementation of the Barycentric formula. You can see in the PR that it's not that much code, but it's still an increase in required math. All the math code in this PR has been stolen from the Danksharding PR (ethereum/consensus-specs#2792) with a few simplifications and bug fixes.
It also very slightly increases the transaction size: 48 bytes per blob (each blob is 128kb, so the proof is 48 bytes of overhead).
Optimizations
There are a bunch of optimizations that can/should be done here:
We can aggregate and verify all blobs using random linear combinations to bring the validation time to 2.5ms per transaction (instead of per blob). See my next post in this PR on how this is done.
We can apply the same technique on the consensus side, which will allow verifying the blobs of the entire block (up to 16 blobs) much more efficiently.
Also, you can see that this patch removes a call to `blob_to_kzg()`. The only other use of `blob_to_kzg()` is in the blob verification precompile. This means that if that precompile gets removed (as discussed in the past, in favor of the point verification one), we can completely remove `blob_to_kzg()` and the structures associated with it.

Kudos to @dankrad for suggesting this approach and for the danksharding code.
Thanks to @adietrichs for proofreading and for catching a mistake in the Fiat-Shamir computation.