
wip: feat(sync): headers stage #58

Closed · wants to merge 17 commits

Conversation

@rkrasiuk (Member) commented Oct 13, 2022

Still WIP; currently doesn't compile because of the db types.

This is the basic implementation of the headers stage. The current steps are:

  1. Query for the chain tip
  2. Download headers in batches in reverse from the chain tip up to the last stored header
  3. Validate headers and run consistency checks as they are received
  4. Store the headers in the db
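The four steps above can be sketched as a self-contained toy (everything here, including the `u64` stand-in hashes and the in-memory "network", is illustrative only, not reth's actual types):

```rust
// Illustrative sketch of the headers stage flow; `Header`, the in-memory
// "network" and the u64 stand-in hashes are NOT reth's real types.
#[derive(Debug, Clone)]
struct Header {
    number: u64,
    hash: u64,        // stand-in for an H256 block hash
    parent_hash: u64, // stand-in for the parent's H256
}

/// Steps 2 and 3: walk backwards from the chain tip, checking parent links
/// on receipt, until we reach the last header already stored in the db.
fn download_reverse(network: &[Header], tip: u64, last_stored: u64) -> Result<Vec<Header>, String> {
    let mut out: Vec<Header> = Vec::new();
    let mut current = tip;
    while current != last_stored {
        let header = network
            .iter()
            .find(|h| h.hash == current)
            .ok_or_else(|| format!("unknown header {current}"))?;
        // Consistency check: the previously accepted (higher) header
        // must point at the one we just received.
        if let Some(child) = out.last() {
            if child.parent_hash != header.hash {
                return Err("parent hash mismatch".to_string());
            }
        }
        out.push(header.clone());
        current = header.parent_hash;
    }
    out.reverse(); // ascending block numbers, ready for step 4 (db insert)
    Ok(out)
}

fn main() {
    // Step 1 stand-in: the queried chain tip is hash 4; hash 1 (genesis)
    // is the last header we already have stored.
    let network = vec![
        Header { number: 1, hash: 2, parent_hash: 1 },
        Header { number: 2, hash: 3, parent_hash: 2 },
        Header { number: 3, hash: 4, parent_hash: 3 },
    ];
    let headers = download_reverse(&network, 4, 1).unwrap();
    // Step 4 stand-in: "store" the headers in ascending order.
    for h in &headers {
        println!("storing header #{}", h.number);
    }
}
```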

Things left to be done:

  • fix db types
  • implement unwind
  • add timeouts on batch downloads
  • move downloader to interfaces crate
  • add docs & logging
  • add tests

@rkrasiuk added the C-enhancement (New feature or request) and A-staged-sync (Related to staged sync (pipelines and stages)) labels Oct 13, 2022
@rkrasiuk changed the title from "wip: feat(sync): headers stage scaffolding" to "wip: feat(sync): headers stage" Oct 13, 2022
@gakonst (Member) left a comment:

Suggest taking a more Stream-centric approach vs Akula's loops, cc @mattsse on my proposed design. @rkrasiuk please investigate if the design I suggest is doable, as it feels super readable & easy to optimize.

Comment on lines 44 to 45
#[error(transparent)]
Internal(Box<dyn std::error::Error + Send + Sync>),
@gakonst (Member):

Could you try doing trait HeaderClient { type Error: std::error::Error + Send + Sync } and using <H as HeaderClient>::Error instead of the boxed dyn? The same thing applies across all stages; I'd rather avoid internal Box<dyn Error> errors because they're hard to test.
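A minimal sketch of the suggested associated-error pattern, with illustrative trait and type names (the client returns plain `u64` block numbers just to keep the example self-contained):

```rust
use std::fmt;

// Sketch of the associated-error-type pattern; trait and type names
// are illustrative, not reth's actual interfaces.
trait HeadersClient {
    type Error: std::error::Error + Send + Sync + 'static;
    fn get_headers(&self, start: u64) -> Result<Vec<u64>, Self::Error>;
}

#[derive(Debug)]
struct TestError(&'static str);

impl fmt::Display for TestError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "test client error: {}", self.0)
    }
}
impl std::error::Error for TestError {}

#[derive(Debug)]
struct TestClient;

impl HeadersClient for TestClient {
    type Error = TestError; // concrete, so tests can match on it directly
    fn get_headers(&self, start: u64) -> Result<Vec<u64>, Self::Error> {
        if start == 0 {
            Err(TestError("no such block"))
        } else {
            Ok(vec![start, start + 1])
        }
    }
}

// The stage error stays generic over the client's error type
// instead of holding a Box<dyn std::error::Error>.
#[derive(Debug)]
enum StageError<H: HeadersClient> {
    Client(<H as HeadersClient>::Error),
}

fn run_stage<H: HeadersClient>(client: &H) -> Result<Vec<u64>, StageError<H>> {
    client.get_headers(1).map_err(StageError::Client)
}

fn main() {
    let headers = run_stage(&TestClient).unwrap();
    println!("{headers:?}");
}
```

Because the error type stays concrete, a test can assert on the exact variant rather than downcasting a boxed trait object.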

Comment on lines 113 to 118
cursor_header_number.put(
    hash.to_fixed_bytes().to_vec(),
    header.number,
    Some(WriteFlags::APPEND),
)?;
cursor_header.put((header.number, hash), header, Some(WriteFlags::APPEND))?;
@gakonst (Member):
do we need to write the header hash/header keys + values in order, or can we do them in any order?

@rkrasiuk (Member Author):
i'd assume they have to be ordered: if an intermediate error occurs, we need to be able to walk the db somehow to unwind

Comment on lines 176 to 189
let mut out = Vec::<HeaderLocked>::new();
loop {
    match self.download_batch(head, &forkchoice_state, &mut stream, &mut out).await {
        Ok(done) => {
            if done {
                return Ok(out)
            }
        }
        Err(e) if e.is_retryable() && retries > 0 => {
            retries -= 1;
        }
        Err(e) => return Err(e),
    }
}
@gakonst (Member) commented Oct 13, 2022:

Could we take a mapped Stream-like API approach here?

async fn execute(&self, input) -> ... {
    // get head
    let (state_tip, _) = self.next_forkchoice_state(&head.hash()).await;
    let headers = self.download(state_tip).await?;
    // handle the error

    // get the tx cursors and write
}

/// Requests new headers starting from the specified block.
async fn request_headers<T: Into<BlockId>>(&self, start: T) -> impl Stream<...> {
    // Send a P2P request to make the stream start receiving values; if we don't
    // send this out, it's possible we won't receive any headers that we care about.
    // This is going to be a no-op, I suspect, in the test impls.
    let request = HeaderRequest { start: start.into(), limit: self.batch_size, reverse: true };
    let request_id = rand::thread_rng().gen();
    self.client.send_header_request(request_id, request).timeout(self.timeout).await;

    // This should return an `impl Stream<Item = (u64, Vec<Header>)>`.
    // The headers in the stream are both unvalidated and unordered.
    self.client.stream_headers()
}

async fn download(&self, start: BlockId) -> Result<Vec<Header>> {
    // Get the stream (request_headers would also need to hand back the
    // request_id so we can match responses against it below)
    let stream = self.request_headers(start).await?;

    // This makes the stream retryable
    let stream = RetriableStream::new(stream);

    // This makes the stream time out (with some configurable timeout)
    // if no headers are received back for a particular request
    let stream = stream.timeout(5);

    // Filter the stream's output for only non-empty responses
    // that match our request_id
    let filtered_stream = stream.filter(|(id, headers)| request_id == id && !headers.is_empty());

    let headers = {
        // maybe we can avoid collecting if we can operate on an ordered stream somehow?
        // https://docs.rs/ordered-stream/0.0.1/ordered_stream/
        let mut h = filtered_stream.collect::<Vec<_>>().await;
        h.sort_unstable_by_key(|h| h.number);
        h
    };

    // TODO: Investigate if this can be done in parallel by validating sorted
    // buckets in parallel and checking the boundaries between each bucket
    self.consensus.validate_headers(&headers, start).await?;

    Ok(headers)
}

cc @mattsse I feel like this should be doable as a cleaner abstraction: avoid the loops and whiles as much as possible, stay in the stream abstraction, and collect & sort only when we want to run verification.
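The collect-then-sort-then-validate step can be illustrated with a std-only analogy (no async streams here; unordered batches stand in for stream items, and a simple contiguity check stands in for consensus validation — all names are illustrative):

```rust
// Std-only analogy of the proposed collect/sort/validate step; in the real
// design these batches would arrive over an async `Stream`.
#[derive(Debug, Clone)]
struct Header {
    number: u64,
}

/// Keep only non-empty responses matching our request id, then flatten
/// and sort by block number (responses may arrive in any order).
fn assemble(request_id: u64, batches: Vec<(u64, Vec<Header>)>) -> Vec<Header> {
    let mut headers: Vec<Header> = batches
        .into_iter()
        .filter(|(id, headers)| *id == request_id && !headers.is_empty())
        .flat_map(|(_, headers)| headers)
        .collect();
    headers.sort_unstable_by_key(|h| h.number);
    headers
}

/// Stand-in for `consensus.validate_headers`: block numbers must be contiguous.
fn validate(headers: &[Header]) -> Result<(), String> {
    for pair in headers.windows(2) {
        if pair[1].number != pair[0].number + 1 {
            return Err(format!("gap between {} and {}", pair[0].number, pair[1].number));
        }
    }
    Ok(())
}

fn main() {
    let batches = vec![
        (7, vec![Header { number: 3 }, Header { number: 4 }]), // out of order
        (9, vec![Header { number: 99 }]),                      // wrong request id
        (7, vec![]),                                           // empty response
        (7, vec![Header { number: 1 }, Header { number: 2 }]),
    ];
    let headers = assemble(7, batches);
    validate(&headers).unwrap();
    println!("{headers:?}");
}
```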

Comment on lines 155 to 157
let mut state_rcv = self.consensus.forkchoice_state();
loop {
    state_rcv.changed().await;
@gakonst (Member):
Is it possible that this deadlocks things somehow if we wait?

Should we instead have a self.consensus.tip() method which immediately returns H256 instead of having a receiver? Main tradeoff would be that maybe we receive an old tip and need to run an extra iteration of the loop?

@rkrasiuk (Member Author):
that makes sense. should we simply early-exit on a tip that was already processed?

@gakonst (Member):
Yeah I think so? @rakita

@rakita (Collaborator):
I am a little confused here: is forkchoice in essence a forkid? In the forkid EIP there is (passed_block_hash, next_fork_number): https://eips.ethereum.org/EIPS/eip-2364.

What I am thinking about is a flow of calls that could potentially deadlock, something like:
HeaderStage -> calls fork_choice -> Consensus
Consensus -> pushes new_block -> HeaderStage
We need to know where the mutexes are.

@gakonst (Member) commented Oct 20, 2022:
@rakita it gives you the current tip and the last finalized block. ForkId is used in eth p2p which we already have in master

@gakonst mentioned this pull request Oct 14, 2022
@mattsse (Collaborator) left a comment:

perhaps this is a stupid question but can headers be downloaded concurrently?

@@ -11,6 +11,7 @@ description = "Commonly used types in reth."
ethers-core = { git = "https://github.com/gakonst/ethers-rs", default-features = false }
bytes = "1.2"
serde = "1.0"
ethereum-types = { version = "0.13.1", default-features = false }
@mattsse (Collaborator):
what's missing from primitives that makes this necessary?

Comment on lines +49 to +50
// For uint to hash conversion
pub use ethereum_types::BigEndianHash;
Collaborator:
I see, will add this to primitives

@rkrasiuk (Member Author):
thanks ❤️‍🔥

Member:
@rkrasiuk this is done^

@rkrasiuk (Member Author):
@mattsse currently we download them in batches by hash; when we receive a batch, we move the cursor to the earliest header hash within that batch. I've been thinking about this yesterday as well: we could optimistically request the headers by block number, unless there is something preventing us from doing that. cc @gakonst

@mattsse (Collaborator) commented Oct 17, 2022:

If I understand correctly, it really depends on how we request them:
if we can request headers from concurrent connections, I think we should be able to request block ranges concurrently.

If the range is not full, then we need to send a follow-up request though, right?

Worth thinking about and outlining how and where requests are sent to.
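The idea of requesting block ranges concurrently can be sketched with std threads standing in for concurrent p2p connections (split_range and the thread-per-batch layout are illustrative, not an actual reth API):

```rust
use std::thread;

// Sketch of concurrent range downloads: std threads stand in for concurrent
// p2p connections; `split_range` and the per-batch closure are illustrative.
fn split_range(start: u64, end: u64, batch: u64) -> Vec<(u64, u64)> {
    assert!(batch > 0, "batch size must be non-zero");
    let mut ranges = Vec::new();
    let mut lo = start;
    while lo <= end {
        let hi = (lo + batch - 1).min(end);
        ranges.push((lo, hi));
        lo = hi + 1;
    }
    ranges
}

fn main() {
    // "Request" blocks 1..=10 in batches of 4, each range on its own thread.
    let handles: Vec<_> = split_range(1, 10, 4)
        .into_iter()
        .map(|(lo, hi)| thread::spawn(move || (lo..=hi).collect::<Vec<u64>>()))
        .collect();

    // Responses can complete in any order, so sort before validating;
    // a range that comes back short is where a follow-up request would go.
    let mut headers: Vec<u64> = handles
        .into_iter()
        .flat_map(|handle| handle.join().unwrap())
        .collect();
    headers.sort_unstable();
    println!("{headers:?}");
}
```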

@gakonst (Member) left a comment:

nice progress!

Comment on lines 75 to 76
// Unwrap the latest stream message which will be either
// the msg with headers or timeout error
@gakonst (Member):
are we sure we want only the next stream message? could it be that there's more than one message and we need to process them in a loop? or do we just leave that for the next time the stage is executed in the loop? @onbjerg

Member:
Given we only send one request for a range of headers, and we perform a check above on line 71 to match the response with the request we sent, I think it's OK if this is the direction we are going. I don't think we're going to end up in a place where we have more than one message waiting for us here.

Member:
Actually, rethinking this: we might want to request a large range of blocks (say 100K, just as an example), but in order to not use a lot of memory at a time it might make sense for the downloader to send smaller batches (e.g. 1K blocks or so). I guess we should accommodate that? In that case, we would get more than one message per request.

Comment on lines 86 to 89
// Iterate the headers in reverse
out.reserve_exact(headers.len());
let mut headers_rev = headers.into_iter().rev();
while let Some(parent) = headers_rev.next() {
@gakonst (Member):
can we add some docs explaining this? i also don't really like the loop appending to out; can we instead make this return the headers and do out.extend_from_slice(&self.download_batch(..)) or something

Comment on lines 23 to 28
/// Strategy for downloading the headers
pub downloader: Arc<dyn Downloader>,
/// Consensus client implementation
pub consensus: Arc<dyn Consensus>,
/// Downloader client implementation
pub client: Arc<dyn HeadersClient>,
@gakonst (Member):
I think these can be generics instead of dyns?
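A sketch of the trade-off, with illustrative trait and struct names: the dyn version erases the concrete types behind Arc<dyn Trait> (dynamic dispatch), while the generic version monomorphizes the stage per implementation (static dispatch):

```rust
use std::sync::Arc;

// Sketch of dyn vs. generics for the stage's dependencies;
// trait and struct names are illustrative, not reth's actual types.
trait Downloader {
    fn name(&self) -> &'static str;
}
trait Consensus {
    fn name(&self) -> &'static str;
}

struct LinearDownloader;
impl Downloader for LinearDownloader {
    fn name(&self) -> &'static str { "linear" }
}

struct TestConsensus;
impl Consensus for TestConsensus {
    fn name(&self) -> &'static str { "test" }
}

// dyn version: a single concrete stage type, dynamic dispatch through vtables.
struct DynHeaderStage {
    downloader: Arc<dyn Downloader>,
    consensus: Arc<dyn Consensus>,
}

// Generic version: monomorphized per (D, C) pair, static dispatch.
struct HeaderStage<D: Downloader, C: Consensus> {
    downloader: D,
    consensus: C,
}

impl<D: Downloader, C: Consensus> HeaderStage<D, C> {
    fn describe(&self) -> String {
        format!("{} + {}", self.downloader.name(), self.consensus.name())
    }
}

fn main() {
    let dyn_stage = DynHeaderStage {
        downloader: Arc::new(LinearDownloader),
        consensus: Arc::new(TestConsensus),
    };
    let generic_stage = HeaderStage { downloader: LinearDownloader, consensus: TestConsensus };
    println!("{} / {}", dyn_stage.downloader.name(), generic_stage.describe());
}
```

The generic version also makes it easy to plug in test implementations without boxing.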


let mut out = Vec::<HeaderLocked>::new();
loop {
    match self.download_batch(head, tip, &mut stream, &mut out).await {
@gakonst (Member):
ideally this should be let headers = stream.download(head, tip) or something similar (pass the consensus by ref to Stream::download)


Resolved threads: crates/stages/Cargo.toml · crates/stages/src/stages/headers/downloader.rs (outdated) · crates/stages/src/stages/headers/stage.rs · crates/stages/src/stages/headers/linear.rs (outdated)
@@ -0,0 +1,466 @@
use super::downloader::{DownloadError, Downloader};
@gakonst (Member):
@mattsse PTAL this feels a bit convoluted but I could be wrong

gakonst and others added 4 commits October 19, 2022 18:37
* feat(interfaces): auto impl for ref/arc/box

* feat(downloader): make consensus part of the downloader and a generic

* impl generic for linear dl

* impl generic for parallel dl

* test(headers): make it work with generics

* chore: rm dead code

Co-authored-by: Roman Krasiuk <rokrassyuk@gmail.com>
@rkrasiuk (Member Author):
closing in favor of #126

@rkrasiuk closed this Oct 24, 2022
@gakonst deleted the rkrasiuk/headers-stage branch October 24, 2022 13:59
clabby added a commit to clabby/reth that referenced this pull request Aug 13, 2023
♻️ remove optimism config, replace with boolean flag
yutianwu pushed a commit to yutianwu/reth that referenced this pull request Jul 24, 2024
…net (paradigmxyz#58)

* chore: fix system account issue and hertz storage patch issue on testnet

* fix CI issues

* fix review comments

* fix CI issues