Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add ETL and use it on HeaderStage #6154

Merged
merged 39 commits into from
Jan 26, 2024
Merged

Conversation

joshieDo
Copy link
Collaborator

@joshieDo joshieDo commented Jan 22, 2024

PR into feat/static-files.

Built on top of cherry-picked #5812 from @onbjerg . This is a stepping stone into pushing Headers into static files.

This PR adds:

  • reth-etl : temporary file space where we push unsorted data to disk, that can be processed later on in a sorted manner. s/o to https://github.com/akula-bft/akula from where the implementation was inspired from.
  • Refactor HeaderStage into using ETL.
    • This is necessary for static files, since they can only append sorted data. However HeaderStage actually receives headers in a reverse order. ETL allows us to store all the missing headers on disk, and later on iterate on them in ascending order.
    • This also allows us to remove the TotalDifficultyStage since now we iterate the headers in ascending order, and can easily calculate the total difficulty as we append to other tables.
    • HeaderHash -> BlockNumber table can be populated with tx.append on first sync. (similar on how TxLookup is going to look like when we use ETL there.)
    • This means that there is no scenario where checkpoint.done = false. We always download all headers to fill the gap into the ETL collector before executing the stage. Since ETL uses temporary files, if there is a shutdown all this data is lost and the stage will have to execute from scratch as if nothing had happened.

@joshieDo joshieDo added the A-staged-sync Related to staged sync (pipelines and stages) label Jan 22, 2024
@joshieDo joshieDo marked this pull request as ready for review January 23, 2024 16:41
crates/primitives/src/header.rs Outdated Show resolved Hide resolved
"Buffer is not empty"
);
// Return if stage has already completed the gap
if self.is_etl_ready {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't this check the same as if gap.is_closed()?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added additional context on the doc but no:

  • gap checks the DB to see if the gap is closed
  • is_etl_ready is true only when the ETL has all headers that will be inserted onto the DB

crates/stages/src/stages/headers.rs Outdated Show resolved Hide resolved
crates/stages/src/stages/headers.rs Outdated Show resolved Hide resolved
// iterate them in the reverse order
for header in headers.into_iter().rev() {
) -> Result<BlockNumber, StageError> {
trace!(target: "sync::stages::headers", len = self.header_collector.len(), "writing headers");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think we need some kind of info logs here, because we only write them during the download phase, and being silent during the ETL phase

crates/stages/src/sets.rs Outdated Show resolved Hide resolved
crates/stages/src/sets.rs Outdated Show resolved Hide resolved
Copy link
Collaborator

@shekhirin shekhirin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nits

crates/consensus/beacon/src/engine/sync.rs Outdated Show resolved Hide resolved
crates/consensus/beacon/src/engine/test_utils.rs Outdated Show resolved Hide resolved
crates/consensus/beacon/src/engine/test_utils.rs Outdated Show resolved Hide resolved
crates/stages/src/lib.rs Outdated Show resolved Hide resolved
crates/stages/src/sets.rs Outdated Show resolved Hide resolved
crates/stages/src/stages/headers.rs Outdated Show resolved Hide resolved
crates/stages/src/stages/headers.rs Outdated Show resolved Hide resolved
crates/stages/src/stages/headers.rs Outdated Show resolved Hide resolved
Comment on lines +163 to +171
// If we only have the genesis block hash, then we are at first sync, and we can remove it,
// add it to the collector and use tx.append on all hashes.
if let Some((hash, block_number)) = cursor_header_numbers.last()? {
if block_number.value()? == 0 {
self.hash_collector.insert(hash.key()?, 0);
cursor_header_numbers.delete_current()?;
first_sync = true;
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why remove it if we immediately re-insert it?

Copy link
Collaborator Author

@joshieDo joshieDo Jan 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the hash "sort position" is in the middle of all sorted hashes it forces to tx.insert half of the hashes. This way we can use tx.append on all

crates/stages/src/stages/headers.rs Outdated Show resolved Hide resolved
Copy link
Member

@onbjerg onbjerg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just one comment, think @shekhirin got it all

@@ -143,7 +143,7 @@ fn should_use_alt_impl(ftype: &String, segment: &syn::PathSegment) -> bool {
if let (Some(path), 1) =
(arg_path.path.segments.first(), arg_path.path.segments.len())
{
if ["B256", "Address", "Address", "Bloom", "TxHash"]
if ["B256", "Address", "Address", "Bloom", "TxHash", "BlockHash"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this breaking/will it change the codec?

Copy link
Collaborator Author

@joshieDo joshieDo Jan 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, this is actually the first type where use BlockHash with Compact (I'm sure we use B256 in some)

Copy link
Collaborator

@mattsse mattsse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

few more nits, otherwise lgtm

crates/stages/src/stages/total_difficulty.rs Outdated Show resolved Hide resolved
crates/etl/src/lib.rs Outdated Show resolved Hide resolved
crates/etl/src/lib.rs Show resolved Hide resolved
crates/etl/src/lib.rs Show resolved Hide resolved
crates/etl/src/lib.rs Show resolved Hide resolved
@joshieDo joshieDo merged commit bd26c69 into feat/static-files Jan 26, 2024
24 checks passed
@joshieDo joshieDo deleted the feat/etl-headers branch January 26, 2024 15:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-staged-sync Related to staged sync (pipelines and stages)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants