Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Engine API: add getPayloadBodiesByRangeV1 to #146 #218

Closed
wants to merge 12 commits into from

Conversation

arnetheduck
Copy link
Contributor

This PR extends #146 to also specify a by-range request.

Similar to the consensus p2p spec, a by-range request allows execution
clients to store finalized blocks in linear by-number storage instead of
relying on by-hash indices, significantly increasing efficiency in
fetching them from cold storage.

Clients whose database design does not permit efficient by-number
lookups may opt to not implement this call, but must then give a
well-known error code allowing consensus later clients to fall back to a
less efficient method of fetching the blocks.

This specification assumes that execution clients know nothing of slot
numbers as seen on the consensus layer. Should execution clients later
learn about these, the specification may be amended to work with slot
numbers instead.

Until then, consensus clients must be careful to compute block numbers
correctly.

Consensus clients must also be careful when this request is used to
fetch non-finalized blocks, reverting to by-root requests if an
unexpected chain is returned.

mkalinin and others added 6 commits December 10, 2021 15:47
Similar to the consensus p2p spec, a by-range request allows execution
clients to store finalized blocks in linear by-number storage instead of
relying on by-hash indices, significantly increasing efficiency in
fetching them from cold storage.

Clients whose database design does not permit efficient by-number
lookups may opt to not implement this call, but must then give a
well-known error code allowing consensus later clients to fall back to a
less efficient method of fetching the blocks.

This specification assumes that execution clients know nothing of slot
numbers as seen on the consensus layer. Should execution clients later
learn about these, the specification may be amended to work with slot
numbers instead.

Until then, consensus clients must be careful to compute block numbers
correctly.

Consensus clients must also be careful when this request is used to
fetch non-finalized blocks, reverting to by-root requests if an
unexpected chain is returned.
@arnetheduck arnetheduck changed the title Lots of bodies Execution API: add getPayloadBodiesByRangeV1 to #146 May 6, 2022
@arnetheduck arnetheduck changed the title Execution API: add getPayloadBodiesByRangeV1 to #146 Engine API: add getPayloadBodiesByRangeV1 to #146 May 6, 2022
Copy link
Collaborator

@mkalinin mkalinin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a few comments.

One recent addition worth considering is timeouts. I think we should modify Timeouts section to say that if timeout parameter isn't specified, CL has a liberty to decide how much time it should wait for response. But in this particular case I think that timeouts matter as these requests may be time consuming as they require disk access.

Also, we may consider a limitation on a number of block bodies in the response. Say something like 1024 as a hard cap to prevent buggy CL making EL read and send back a million of blocks.

src/engine/specification.md Show resolved Hide resolved
src/engine/specification.md Outdated Show resolved Hide resolved
* result: `Array of ExecutionPayloadBodyV1` - Array of [`ExecutionPayloadBodyV1`](#ExecutionPayloadBodyV1) objects.
* error: code and message set in case an exception happens while processing the method call.
* Clients that don't support fetching bodies by range **MUST** return the error code `-32601 Method not found The method does not exist / is not available.`. Callers may then revert to `engine_getPayloadBodiesByRootV1`.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider moving this statement to the Specification section of this method

src/engine/specification.md Outdated Show resolved Hide resolved
arnetheduck and others added 2 commits May 10, 2022 16:31
Co-authored-by: Mikhail Kalinin <noblesse.knight@gmail.com>
Co-authored-by: Mikhail Kalinin <noblesse.knight@gmail.com>
@arnetheduck
Copy link
Contributor Author

Thanks!

But in this particular case I think that timeouts matter as these requests may be time consuming as they require disk access.

In the consensus layer, we have to start responding respond within 5s, and the entire response must take no longer than 10s in total which effectively puts a few caps on request sizes - if we can't serve a response within this time, it's likely useless. I haven't followed the rationale for adding timeouts to this spec in particular, but if we're going to add it, these are the values to take into consideration.

Say something like 1024 as a hard cap to prevent buggy CL making EL read and send back a million of blocks.

On the CL side, we "soft limit" requests to 1024 slots, meaning that anyone that requests more will not get a "full" answer.

Since the EL/CL is more of trusted connection, we could also opt for a hard limit where the server gives an error if more is requested - I agree it's useful to add this as a coordination point so that clients don't make outrageous requests.

I'd probably go with a soft limit - I'll put that and 1024 in the next edit unless someone comes along with an opinion :)

Copy link
Contributor

@djrtwo djrtwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Agree with @mkalinin's comments

Also, summoning @paulhauner to ensure that this fits with the deduplication engineering they've been working on in lighthouse

#### Specification

1. Given a `start` and a `count`, the client software **MUST** respond with array of `ExecutionPayloadBodyV1` objects with the corresponding execution block number respecting the order of blocks in the canonical chain, as selected by the latest `forkChoiceUpdated` call.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to add an error condition for if the EL does not have the requisite data (either a malformed CL request or some sort of failure/resync on EL side

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated response to give nil when blocks are missing

1. Given array of block hashes client software **MUST** respond with array of `ExecutionPayloadBodyV1` objects with the corresponding hashes respecting the order of block hashes in the input array.

1. Client software **MUST** skip the payload body in the response array if the data of this body is missing. For instance, if the request is `[A.block_hash, B.block_hash, C.block_hash]` and client software has data of payloads `A` and `C`, but doesn't have data of `B`, the response **MUST** be `[A.body, C.body]`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think thats a bit weird, if I request n-blocks I would expect n-blocks even if some blocks are nil.
So I would rather return [A.body, nil, C.body]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this raises a question: should CL:s verify the hash of the data? including nil here alleviates CL:s of this requirement (which they otherwise have to to, to match response to request

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be in favour of including the nils, as it's conceptually cleaner to have one response per request, and I think CL verification should be optional (although we should probably enable it by default, especially initially).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not only I'm in favor of including nil but I think CL software should be mandated to check the hashes or do some verification that the data was not corrupted in the wire.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CL software should be mandated

the only viable way to mandate it is by making it technically infeasible to not check - if we mandate it, we should also reap the benefits of having the hash check there (and use a more compact response that potentially can be reordered) - I see this mostly as a trust decision - if we include nil, we're saying that the CL fully trusts the EL and at that point, I don't think it's reasonable to mandate verification because it makes serving block requests more expensive for little gain (ie no trust gain, tiny chance of detecting corruption that whoever did the block request on the far end will have to repeat anyway).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nil has been added as a required response for missing blocks - consumers may or may not verify the payload - this is up to their own quality policy - when they are sending the EL contents to someone on the internet, that someone will have to verify it regardless, so it's a bit of a waste to redo it, assuming they trust their EL.

@lightclient lightclient added the A-engine Area: for future consideration label Jun 22, 2022

1. Client software **MUST** skip the payload body in the response array if the data of this body is missing. For instance, if the request is `[A.block_hash, B.block_hash, C.block_hash]` and client software has data of payloads `A` and `C`, but doesn't have data of `B`, the response **MUST** be `[A.body, C.body]`.

1. Clients that support `engine_getPayloadBodiesByRangeV1` **MAY NOT** respond to requests for finalized blocks by hash.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is because they may have pruned them?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as CL really - this allows us to drop hash-based indices and retrieve blocks from linear archival storage - the aim with this clause is to ensure that CL:s use the linear request for prefinality blocks and only "top up" with by-hash requests where forks are possible.

@mkalinin
Copy link
Collaborator

There is a rough consensus to attempt to include these methods into Shanghai, if there is any Shanghai delay related to implementation of proposed changes then we can reconsider these methods for a later inclusion.

@arnetheduck would you mind to rebase this PR with the most recent changes from main, it implies adding these new methods into shanghai.md and appending withdrawals to the ExecutionPayloadBodyV1 structure. After a conversation with EL client devs I feel like both methods are easy to be implemented thus I think both should be required, i.e. I'd remove optionality from the spec of the ByRangeV1.

@mkalinin
Copy link
Collaborator

Rebased, refined and moved to #352

@mkalinin mkalinin closed this Jan 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-engine Area: for future consideration
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants