feat: add ETL and use it on HeaderStage
#6154
"Buffer is not empty"
);
// Return if stage has already completed the gap
if self.is_etl_ready {
isn't this check the same as if gap.is_closed()?
added additional context on the doc but no:
- gap checks the DB to see if the gap is closed
- is_etl_ready is true only when the ETL has all headers that will be inserted into the DB
crates/stages/src/stages/headers.rs
Outdated
// iterate them in the reverse order
for header in headers.into_iter().rev() {
) -> Result<BlockNumber, StageError> {
trace!(target: "sync::stages::headers", len = self.header_collector.len(), "writing headers");
i think we need some kind of info logs here, because we only write them during the download phase and are silent during the ETL phase
Co-authored-by: Alexey Shekhirin <a.shekhirin@gmail.com>
nits
// If we only have the genesis block hash, then we are at first sync, and we can remove it,
// add it to the collector and use tx.append on all hashes.
if let Some((hash, block_number)) = cursor_header_numbers.last()? {
    if block_number.value()? == 0 {
        self.hash_collector.insert(hash.key()?, 0);
        cursor_header_numbers.delete_current()?;
        first_sync = true;
    }
}
why remove it if we immediately re-insert it?
if the genesis hash's "sort position" is in the middle of all sorted hashes, it forces us to use tx.insert for half of the hashes. By removing it and adding it to the collector, we can use tx.append on all of them.
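The append-versus-insert point can be illustrated with a toy model. This is only a sketch: `SortedTable`, its 4-byte keys, and both helper functions are hypothetical stand-ins and not reth's database API. The only assumption taken from the thread is that an append is valid only for a key that sorts after everything already in the table.

```rust
/// Toy append-only table (hypothetical, not reth's API): keys must arrive in
/// strictly ascending order for `append` to succeed.
#[derive(Default)]
struct SortedTable {
    rows: Vec<([u8; 4], u64)>,
}

impl SortedTable {
    /// Cheap path: only valid when `key` sorts after every existing key.
    fn append(&mut self, key: [u8; 4], value: u64) -> Result<(), String> {
        if let Some((last, _)) = self.rows.last() {
            if key <= *last {
                return Err(format!("key {key:?} is not greater than last key {last:?}"));
            }
        }
        self.rows.push((key, value));
        Ok(())
    }

    /// Slow path: binary search, then shift every later row to make room.
    fn insert(&mut self, key: [u8; 4], value: u64) {
        match self.rows.binary_search_by(|(k, _)| k.cmp(&key)) {
            Ok(pos) => self.rows[pos].1 = value,
            Err(pos) => self.rows.insert(pos, (key, value)),
        }
    }
}

/// Counts how many of the already sorted `new` entries can still be appended
/// when the genesis hash is left in the table.
fn appendable_with_genesis_present(genesis: [u8; 4], new: &[([u8; 4], u64)]) -> usize {
    let mut table = SortedTable::default();
    table.insert(genesis, 0);
    new.iter().filter(|(k, v)| table.append(*k, *v).is_ok()).count()
}

/// Removes the genesis entry from the picture, sorts it together with the new
/// hashes, and appends everything in one ascending pass.
fn append_all_after_removing_genesis(
    genesis: [u8; 4],
    new: &[([u8; 4], u64)],
) -> Result<SortedTable, String> {
    let mut all = new.to_vec();
    all.push((genesis, 0));
    all.sort();
    let mut table = SortedTable::default();
    for (key, value) in all {
        table.append(key, value)?;
    }
    Ok(table)
}
```

With a genesis key that sorts in the middle of the new keys, every new key below it is rejected by `append` in the first helper, while the second helper appends all entries, genesis included.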
just one comment, think @shekhirin got it all
@@ -143,7 +143,7 @@ fn should_use_alt_impl(ftype: &String, segment: &syn::PathSegment) -> bool {
 if let (Some(path), 1) =
     (arg_path.path.segments.first(), arg_path.path.segments.len())
 {
-    if ["B256", "Address", "Address", "Bloom", "TxHash"]
+    if ["B256", "Address", "Address", "Bloom", "TxHash", "BlockHash"]
is this breaking/will it change the codec?
no, this is actually the first type where we use BlockHash with Compact (I'm sure we use B256 in some)
This reverts commit e218e52.
few more nits, otherwise lgtm
PR into feat/static-files. Built on top of cherry-picked #5812 from @onbjerg. This is a stepping stone into pushing Headers into static files.
This PR adds:
- reth-etl: temporary file space where we push unsorted data to disk, so that it can be processed later on in a sorted manner. s/o to https://github.com/akula-bft/akula, which inspired the implementation.
- HeaderStage: now uses ETL. HeaderStage actually receives headers in reverse order. ETL allows us to store all the missing headers on disk and later iterate over them in ascending order.
- TotalDifficultyStage: since we now iterate the headers in ascending order, we can easily calculate the total difficulty as we append to other tables.
- HeaderHash -> BlockNumber: the table can be populated with tx.append on first sync (similar to how TxLookup is going to look when we use ETL there).
- checkpoint.done = false: we always download all headers to fill the gap into the ETL collector before executing the stage. Since ETL uses temporary files, if there is a shutdown all this data is lost and the stage will have to execute from scratch as if nothing had happened.
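The collect-then-iterate-sorted idea can be sketched as follows. This is not the reth-etl implementation: `Collector`, `into_sorted`, and `total_difficulties` are hypothetical names, plain `u64` values stand in for headers, and in-memory sorted runs stand in for the temporary files described above. It shows entries arriving in reverse order, being flushed as sorted runs, merged back in ascending key order, and then used to accumulate total difficulty in a single pass.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Hypothetical collector: buffers (block number, difficulty) pairs and
/// flushes them as sorted runs. A real ETL would spill each run to a
/// temporary file; here a Vec stands in for the file.
struct Collector {
    buffer: Vec<(u64, u64)>,
    runs: Vec<Vec<(u64, u64)>>,
    capacity: usize,
}

impl Collector {
    fn new(capacity: usize) -> Self {
        Self { buffer: Vec::new(), runs: Vec::new(), capacity }
    }

    /// Accepts entries in any order; flushes once the buffer is full.
    fn insert(&mut self, number: u64, difficulty: u64) {
        self.buffer.push((number, difficulty));
        if self.buffer.len() >= self.capacity {
            self.flush();
        }
    }

    /// Sorts the current buffer and stores it as one run.
    fn flush(&mut self) {
        if self.buffer.is_empty() {
            return;
        }
        let mut run = std::mem::take(&mut self.buffer);
        run.sort();
        self.runs.push(run);
    }

    /// K-way merge over all runs, yielding entries in ascending key order.
    fn into_sorted(mut self) -> Vec<(u64, u64)> {
        self.flush();
        let mut iters: Vec<_> = self.runs.into_iter().map(|run| run.into_iter()).collect();
        let mut heap = BinaryHeap::new();
        for (idx, iter) in iters.iter_mut().enumerate() {
            if let Some(entry) = iter.next() {
                heap.push(Reverse((entry, idx)));
            }
        }
        let mut out = Vec::new();
        while let Some(Reverse((entry, idx))) = heap.pop() {
            out.push(entry);
            if let Some(next) = iters[idx].next() {
                heap.push(Reverse((next, idx)));
            }
        }
        out
    }
}

/// With entries in ascending order, total difficulty is a running sum that
/// starts from the parent's total difficulty.
fn total_difficulties(sorted: &[(u64, u64)], parent_td: u128) -> Vec<(u64, u128)> {
    let mut td = parent_td;
    sorted
        .iter()
        .map(|&(number, difficulty)| {
            td += u128::from(difficulty);
            (number, td)
        })
        .collect()
}
```

Because the runs in this sketch live only in memory (or, in the real design, in temporary files), nothing survives a shutdown, which matches the note above that the stage has to start over after an interruption.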