feat: add ETL to TxLookup stage #6655
Conversation
Co-authored-by: Matthias Seitz <matthias.seitz@outlook.de>
txhash_cursor.append(
    RawKey::<TxHash>::from_vec(hash),
    RawValue::<TxNumber>::from_vec(number),
)?;
} else {
    txhash_cursor.insert(
        RawKey::<TxHash>::from_vec(hash),
        RawValue::<TxNumber>::from_vec(number),
nice, wonder how much the usage of RawKey can impact perf — are we missing out on any compression due to its usage?
let total_hashes = hash_collector.len();
let interval = (total_hashes / 10).max(1);
for (index, hash_to_number) in hash_collector.iter()?.enumerate() {
The ETL API is remarkably clean: loop and insert into the collector, then at the end of the range iterate over the collector and write to the DB. We should document this general pattern.
info!(target: "sync::stages::transaction_lookup", ?append_only, progress = %format!("{:.2}%", (index as f64 / total_hashes as f64) * 100.0), "Writing transaction hash index");
}

if append_only {
does this do anything meaningful anymore?
yes, first run will be append_only, subsequent runs won't be, although they will still benefit from ETL when dealing with large sets
Co-authored-by: Alexey Shekhirin <a.shekhirin@gmail.com>
let mut hash_collector: Collector<TxHash, TxNumber> =
    Collector::new(Arc::new(TempDir::new()?), 500 * (1024 * 1024));
any reason to make buffer capacity configurable here?
related to #6696, but it should make things slightly faster on bigger chunks
@@ -99,7 +100,7 @@ where

    fn flush(&mut self) {
        self.buffer_size_bytes = 0;
-       self.buffer.sort_unstable_by(|a, b| a.0.cmp(&b.0));
+       self.buffer.par_sort_unstable_by(|a, b| a.0.cmp(&b.0));
how large do we expect this to be?
we might not benefit from par_sort here if it's not large — if small buffers are possible, we could do a length check and only use par_sort when the buffer is large
given it's only used in the pipeline, I'd say pretty large. Max at 100MB for headers, and 500MB worth for txlookup
    }
}

impl TransactionLookupStage {
    /// Create new instance of [TransactionLookupStage].
-   pub fn new(commit_threshold: u64, prune_mode: Option<PruneMode>) -> Self {
-       Self { commit_threshold, prune_mode }
+   pub fn new(chunk_size: u64, prune_mode: Option<PruneMode>) -> Self {
config and documentation need to be changed to reflect this
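For illustration, the kind of config rename implied might look like the fragment below — the section and key names are assumptions for this sketch, not a confirmed excerpt of reth's actual `reth.toml` schema:

```toml
# hypothetical stage config fragment
[stages.transaction_lookup]
# before this PR: commit_threshold = 5000000
chunk_size = 5000000
```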
built on top of #6654

Changes

Adds etl to the TxLookup stage, since inserting hashes (vs appending them) is very, very expensive.

Benches

maxperf holesky @ block 925555:
main: 56 - 59s
maxperf mainnet @ tip: