Create a downloader component separate from the stages #764
Comments
We're already doing this for bodies, right? Can you elaborate on "At this point I'm thinking we can achieve all that just by converting the downloader into a"?
No, they are not downloaded optimistically. The bodies stage asks for a range of bodies and the current downloader will download only that range. If we are blocked on e.g. the first body, we spend a lot of time not downloading anything. The main idea here is that the downloader will try to download as much as possible on its own, not only when the stages ask for a range of data.
In reference to what specifically?
Blocked by what exactly? Do you refer to the execute function of the BodiesStage, which only tries to download a fixed batch per invocation and therefore takes as long as the longest response plus validation (worst case: the first request arrives last), preventing follow-up downloads? This could be fixed by turning the bodies downloader into a stream itself that automatically sends new requests once a response arrives.
Separate from what: do we want to extract them out of the stages (Headers, Bodies)?
Hmm, it sounds plausible that this may be addressable this way without a larger refactor? Do we still want to do the refactor, since it should improve headers download speed as well?
I would want the refactor to be done; it makes sense to merge the downloaders into one since they are so similar. It also makes sense to look ahead and download as much as possible. If we just turn the bodies downloader into a stream, we would likely still have the same issues, I think, in the sense that the stage will request what it needs and no more. If we want it to request more than it needs, then a lot of additional logic is needed in the stage, and it ends up not really benefiting us.
I'm still not entirely sure what you mean. If you are referencing this line: "The reason the client is separated from the downloader itself is so it can be swapped", then what I mean is that the downloader component itself takes some trait
Slow peers.
See above: we request a range of blocks. If the last 9 batches arrive and the first batch is blocked on a slow peer, then naturally we should repurpose the fast peers to download all the other batches. If the downloader component was separate, it could:
In the current case, each time the bodies stage is executed, we will probably end up talking to the slow peer, which means that if the peer takes 20 seconds to respond, that's 20 seconds wasted for every
Hopefully that makes sense?
Ok, I think I understand now: basically we would have a long-lived stream with an internal buffer, and e.g. the bodies stage would:
And the stream would:
(and all the other stuff like validation, making sure bodies are returned sequentially, etc.)
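To make the buffered-stream idea above concrete, here is a minimal sketch; the type and method names are hypothetical, not reth's actual API, and the real version would be an async `Stream` rather than a synchronous drain:

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a downloaded block body.
struct BlockBody {
    number: u64,
    // transactions, ommers, ...
}

/// Hypothetical long-lived bodies downloader with an internal buffer.
struct BufferedBodiesDownloader {
    /// Responses that arrived out of order, keyed by block number.
    buffer: BTreeMap<u64, BlockBody>,
    /// Next block number the stage expects.
    next: u64,
}

impl BufferedBodiesDownloader {
    /// Called internally whenever a peer response arrives (possibly out of order).
    fn on_response(&mut self, body: BlockBody) {
        self.buffer.insert(body.number, body);
    }

    /// Called by the stage: drain everything that is ready, in order. In the
    /// real design this would be a `Stream` the stage polls, and the
    /// downloader would keep requests in flight to idle peers regardless of
    /// whether the stage is currently asking for data.
    fn drain_ready(&mut self) -> Vec<BlockBody> {
        let mut ready = Vec::new();
        while let Some(body) = self.buffer.remove(&self.next) {
            ready.push(body);
            self.next += 1;
        }
        ready
    }
}
```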
Describe the feature
Currently we have two downloaders: a concurrent one for bodies and a linear one for headers.
Ideally, both downloaders would be concurrent. They have some shared logic:
Both of the downloaders also have retry logic and peer penalization.
Both downloaders also have their own issues:
These issues lead to not saturating the network properly: each downloader, at some point, is not requesting as much data as it could, for various reasons. This leads to slow sync times for online stages.
To address these issues, the plan is now to create a downloader component that lives outside of the stages, with a channel to communicate with the stages.
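As a rough illustration of that split, the sketch below uses plain std channels as a stand-in for whatever channel type is actually chosen; all names and the thread-based downloader are hypothetical:

```rust
use std::ops::RangeInclusive;
use std::sync::mpsc;
use std::thread;

/// Requests a stage can send to the long-lived downloader component.
enum DownloadRequest {
    Headers(RangeInclusive<u64>),
    Bodies(RangeInclusive<u64>),
}

/// Simplified responses; the real ones would carry full headers/bodies.
enum DownloadResponse {
    Header(u64),
    Body(u64),
}

fn main() {
    let (req_tx, req_rx) = mpsc::channel::<DownloadRequest>();
    let (resp_tx, resp_rx) = mpsc::channel::<DownloadResponse>();

    // The downloader lives outside the stages; a thread stands in for it here.
    let downloader = thread::spawn(move || {
        while let Ok(request) = req_rx.recv() {
            match request {
                DownloadRequest::Headers(range) => {
                    for number in range {
                        // Real version: fan out batches to peers, validate, retry...
                        let _ = resp_tx.send(DownloadResponse::Header(number));
                    }
                }
                DownloadRequest::Bodies(range) => {
                    for number in range {
                        let _ = resp_tx.send(DownloadResponse::Body(number));
                    }
                }
            }
        }
    });

    // A stage asks only for the range it needs and consumes responses in order.
    req_tx.send(DownloadRequest::Headers(1..=1000)).unwrap();
    drop(req_tx); // in reality the channel stays open for the whole sync
    for response in resp_rx {
        match response {
            DownloadResponse::Header(_number) => { /* persist header */ }
            DownloadResponse::Body(_number) => { /* persist body */ }
        }
    }
    downloader.join().unwrap();
}
```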
High level plan
Point 2) also means that the downloader component must satisfy these invariants:
Flow
The downloader MUST NOT download headers and bodies at the same time; see the concurrency section.
It is up to the stages to control how much data to request, i.e. it is OK for stages to only ask for e.g. 1000 headers and commit.
Concurrency
The bodies are already downloaded concurrently and the headers will follow the same general idea:
For a range of blocks a to b, slice the range into n batches and request each batch from idle peers in the set, at most c at a time. Note that the batch size for headers and bodies should be different, because the fundamental constraint on request size is the devp2p message size limit.
For headers to be downloaded concurrently, we have to request headers by block number instead of block hash.
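A minimal sketch of the batching described above; the batch sizes and function name are illustrative only:

```rust
use std::ops::RangeInclusive;

/// Split `range` into batches of at most `batch_size` blocks. Because headers
/// are requested by block number rather than hash, every batch is independent
/// and can be handed to a different idle peer, at most `c` in flight at once.
fn split_into_batches(range: RangeInclusive<u64>, batch_size: u64) -> Vec<RangeInclusive<u64>> {
    let mut batches = Vec::new();
    let mut start = *range.start();
    while start <= *range.end() {
        let end = (start + batch_size - 1).min(*range.end());
        batches.push(start..=end);
        start = end + 1;
    }
    batches
}

fn main() {
    // Illustrative sizes only: bodies need smaller batches than headers
    // because responses are bounded by the devp2p message size limit.
    let header_batches = split_into_batches(1..=10_000, 1_000);
    let body_batches = split_into_batches(1..=10_000, 200);
    assert_eq!(header_batches.len(), 10);
    assert_eq!(body_batches.len(), 50);
}
```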
There was some thought about downloading headers and bodies optimistically at the same time (i.e. as soon as a header has been downloaded, download the corresponding body); however, this does not make a lot of sense for two reasons:
Validation
For headers:
For bodies:
Note that the validation for bodies can be expensive.
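To illustrate why body validation is costly: accepting a body means hashing its contents and comparing the results against the already-validated header. The sketch below uses hypothetical stub functions and simplified types, not reth's actual validation code:

```rust
/// Simplified header: only the fields needed to check a body against it.
struct Header {
    transactions_root: [u8; 32],
    ommers_hash: [u8; 32],
}

/// Simplified block body.
struct Body {
    transactions: Vec<Vec<u8>>, // RLP-encoded transactions (simplified)
    ommers: Vec<Header>,
}

/// Hypothetical stub: would build the merkle-patricia trie over the transactions.
fn compute_transactions_root(_txs: &[Vec<u8>]) -> [u8; 32] {
    [0u8; 32]
}

/// Hypothetical stub: would keccak-hash the RLP-encoded ommer headers.
fn compute_ommers_hash(_ommers: &[Header]) -> [u8; 32] {
    [0u8; 32]
}

/// A downloaded body is accepted only if the roots recomputed from it match
/// the (already validated) header; this hashing is what makes it expensive.
fn body_matches_header(header: &Header, body: &Body) -> bool {
    compute_transactions_root(&body.transactions) == header.transactions_root
        && compute_ommers_hash(&body.ommers) == header.ommers_hash
}
```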
Configuration
The following parameters at least must be configurable:
Client vs downloader
The downloader sends requests to a client. For most of sync, this client will be the FetchClient in the networking component, which forwards requests to peers. The reason the client is separated from the downloader itself is so it can be swapped; this is particularly important for post-merge syncing, where syncing from a consensus client via the engine API is more appropriate. It is still possible to ask execution client peers for block data, but this will lead to us always being a bit behind.
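As a sketch of what that separation could look like (trait and type names here are hypothetical, not reth's actual interfaces):

```rust
use std::ops::RangeInclusive;

/// Minimal stand-ins for real types.
struct Header;
struct Body;

/// Anything the downloader can fetch block data from. For most of sync this
/// would be implemented by the P2P-backed FetchClient; post-merge, a
/// different implementation can be plugged in.
trait BlockClient {
    fn get_headers(&self, range: RangeInclusive<u64>) -> Vec<Header>;
    fn get_bodies(&self, hashes: &[[u8; 32]]) -> Vec<Body>;
}

/// The downloader is generic over the client, so swapping the data source
/// never touches the download/validation/retry logic itself.
struct Downloader<C: BlockClient> {
    client: C,
}

impl<C: BlockClient> Downloader<C> {
    fn download_headers(&self, range: RangeInclusive<u64>) -> Vec<Header> {
        // Real version: batching, concurrency, validation, retries...
        self.client.get_headers(range)
    }
}
```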
Post-merge download
The consensus client will post new blocks to us via the Engine API. These new blocks (called payloads) should be kept in-memory by some component (TBD) until the consensus client sends a new forkchoice state. When the forkchoice state is sent, we need to figure out what payloads are now on the canonical chain and which we can discard. The new canonical blocks are kept in a buffer for the downloader to pick from later.
I propose that this buffering mechanism between EL/CL is kept in an engine API-specific downloader client. At some point, based on some condition, we switch between the P2P client and this engine API client. This switching logic can be handled in a client that wraps both the P2P client and the engine API client.
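A sketch of such a wrapping client; the switching condition, names, and return types are hypothetical:

```rust
/// Which data source the wrapping client currently forwards to.
enum SyncMode {
    /// Far behind the tip: ask execution-layer peers over devp2p.
    P2p,
    /// At the tip post-merge: serve blocks buffered from engine API payloads.
    EngineApi,
}

/// Stand-ins for the two underlying clients.
struct P2pClient;
struct EngineApiClient;

/// Wraps both clients and decides which one answers a request.
struct SwitchingClient {
    mode: SyncMode,
    p2p: P2pClient,
    engine: EngineApiClient,
}

impl SwitchingClient {
    fn get_header(&self, number: u64) -> Option<Vec<u8>> {
        match self.mode {
            SyncMode::P2p => self.p2p_get_header(number),
            SyncMode::EngineApi => self.engine_get_header(number),
        }
    }

    // Hypothetical forwarding helpers.
    fn p2p_get_header(&self, _number: u64) -> Option<Vec<u8>> {
        None
    }
    fn engine_get_header(&self, _number: u64) -> Option<Vec<u8>> {
        None
    }
}
```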
Error handling
The downloader should internally handle timeouts, invalid responses, and retries. These should not propagate to the stages themselves.
The only errors that should propagate to the stages themselves are fatal errors from which the downloader can never recover. This means that the stage will block until it gets more data; however, this is fine, since the pipeline cannot meaningfully progress without the requested data.
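A sketch of that error split; the names are illustrative, and whether exhausting all retries counts as fatal is a policy choice rather than something decided here:

```rust
/// Transient problems the downloader handles internally; these never reach a stage.
enum RetryableError {
    Timeout,
    InvalidResponse,
}

/// The only kind of error a stage ever sees.
#[derive(Debug)]
enum FatalDownloadError {
    Unrecoverable(String),
}

/// Hypothetical request to a peer.
fn try_request() -> Result<Vec<u8>, RetryableError> {
    Err(RetryableError::Timeout)
}

/// Retry transient failures internally (penalizing the peer, picking another
/// one, etc.); only surface an error once the downloader considers the
/// request truly unrecoverable.
fn download_with_retries(max_retries: usize) -> Result<Vec<u8>, FatalDownloadError> {
    for _attempt in 0..max_retries {
        match try_request() {
            Ok(data) => return Ok(data),
            // Timeouts and invalid responses are handled here, not by the stage.
            Err(RetryableError::Timeout) | Err(RetryableError::InvalidResponse) => continue,
        }
    }
    Err(FatalDownloadError::Unrecoverable(
        "all retries exhausted".to_string(),
    ))
}
```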
Re-orgs
The downloader should keep track of whether a re-org occurred or not and communicate this to the stages. For beacon consensus this would occur when a fork choice state is sent from the consensus layer to us and the new tip does not connect directly onto our local head.
The downloader should emit an event upon request of data containing:
Upon receiving this event, online stages must unwind to the latest block we have that connects to the new tip, in order to discard any data that is no longer on the canonical chain.
In my opinion, this can be left for later, when we have the engine API and are able to meaningfully test it.
Additional context
Supersedes #744, #741 and #391.