
feat: add ETL to TxLookup stage #6655

Merged 43 commits into feat/static-files on Feb 22, 2024

Conversation

@joshieDo (Collaborator) commented Feb 18, 2024

built on top of #6654

Changes

Adds ETL to the TxLookup stage, since inserting hashes (as opposed to appending them) is very, very expensive.

Benches

maxperf holesky @ block 925555:

  • PR_6654 without ETL: 47s-50s
  • PR_6654 with ETL: 13s
  • main: 56-59s

maxperf mainnet @ tip:

  • ETL @ tip: 22min
  • main @ 23rd Nov 2023: 5h

@joshieDo added the C-enhancement (New feature or request), A-staged-sync (Related to staged sync (pipelines and stages)), and A-static-files (Related to static files) labels on Feb 18, 2024
@joshieDo changed the base branch from feat/static-files to joshie/concurrent-range-fetch on February 18, 2024 15:28
joshieDo and others added 4 commits February 18, 2024 15:30
Co-authored-by: Matthias Seitz <matthias.seitz@outlook.de>
Co-authored-by: Matthias Seitz <matthias.seitz@outlook.de>
crates/stages/src/stages/tx_lookup.rs (review thread resolved)
Comment on lines +134 to +141
    txhash_cursor.append(
        RawKey::<TxHash>::from_vec(hash),
        RawValue::<TxNumber>::from_vec(number),
    )?;
} else {
    txhash_cursor.insert(
        RawKey::<TxHash>::from_vec(hash),
        RawValue::<TxNumber>::from_vec(number),

Member:

Nice. I wonder how much the usage of RawKey can impact perf; are we missing out on any compression due to its usage?

Comment on lines +125 to +127
let total_hashes = hash_collector.len();
let interval = (total_hashes / 10).max(1);
for (index, hash_to_number) in hash_collector.iter()?.enumerate() {

Member:

The ETL API is remarkably clean: loop over the range inserting into the collector, then at the end of the range iterate over the collector and write to the DB. We should document this general pattern.
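
A minimal, self-contained sketch of that collect-then-write flow, for whenever this gets documented. The `EtlCollector` type, its methods, and the `BTreeMap` standing in for the database table are illustrative assumptions; the collector in this PR is constructed with a temp directory and a byte budget and presumably spills sorted buffers to disk, while this sketch keeps everything in memory to show only the control flow.

```rust
use std::collections::BTreeMap;

/// Hypothetical collector: accumulate unsorted pairs cheaply, then hand them
/// back in key order so the consumer can write sequentially.
struct EtlCollector {
    buffer: Vec<(Vec<u8>, u64)>,
}

impl EtlCollector {
    fn new() -> Self {
        Self { buffer: Vec::new() }
    }

    /// Extract phase: just push, no ordering work yet.
    fn insert(&mut self, key: Vec<u8>, value: u64) {
        self.buffer.push((key, value));
    }

    fn len(&self) -> usize {
        self.buffer.len()
    }

    /// Load phase: yield entries sorted by key.
    fn iter_sorted(mut self) -> impl Iterator<Item = (Vec<u8>, u64)> {
        self.buffer.sort_unstable_by(|a, b| a.0.cmp(&b.0));
        self.buffer.into_iter()
    }
}

fn main() {
    // Stand-in for a table that is cheap to append to in key order.
    let mut table: BTreeMap<Vec<u8>, u64> = BTreeMap::new();
    let mut collector = EtlCollector::new();

    // 1) Walk the source data in whatever order it arrives (tx number order
    //    here), inserting into the collector instead of the table.
    for tx_number in 0u64..1_000 {
        let tx_hash = tx_number.to_be_bytes().to_vec(); // placeholder "hash"
        collector.insert(tx_hash, tx_number);
    }

    // 2) At the end of the range, drain the collector in sorted key order and
    //    write to the table in one sequential pass.
    let total = collector.len();
    for (index, (hash, number)) in collector.iter_sorted().enumerate() {
        table.insert(hash, number);
        if index % (total / 10).max(1) == 0 {
            println!("progress: {:.2}%", (index as f64 / total as f64) * 100.0);
        }
    }
}
```

The sorting cost is paid once, up front, so the final database write is a single ordered pass instead of many random-order inserts.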

info!(target: "sync::stages::transaction_lookup", ?append_only, progress = %format!("{:.2}%", (index as f64 / total_hashes as f64) * 100.0), "Writing transaction hash index");
}

if append_only {

Member:

does this do anything meaningful anymore?

@joshieDo (Collaborator, Author) Feb 19, 2024:

Yes, the first run will be append_only; subsequent runs won't be, although they will still benefit from ETL when dealing with large sets.

crates/stages/src/stages/tx_lookup.rs (outdated review thread, resolved)
crates/stages/src/stages/tx_lookup.rs (review thread resolved)
Co-authored-by: Alexey Shekhirin <a.shekhirin@gmail.com>
Base automatically changed from joshie/concurrent-range-fetch to feat/static-files February 20, 2024 18:47
Comment on lines +100 to +101
let mut hash_collector: Collector<TxHash, TxNumber> =
    Collector::new(Arc::new(TempDir::new()?), 500 * (1024 * 1024));

Member:

any reason to make buffer capacity configurable here?

@joshieDo (Collaborator, Author):

Related to #6696, but it should make things slightly faster with bigger chunk sizes.
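
A hypothetical sketch of what a configurable buffer capacity could look like; the `EtlConfig` name and `file_size` field are assumptions for illustration, not existing reth config.

```rust
struct EtlConfig {
    /// Maximum bytes buffered in memory before the collector spills to disk.
    file_size: usize,
}

impl Default for EtlConfig {
    fn default() -> Self {
        // Matches the value hard-coded in this PR: 500 MiB.
        Self { file_size: 500 * 1024 * 1024 }
    }
}

// The stage would then build its collector along the lines of:
//     Collector::new(Arc::new(TempDir::new()?), etl_config.file_size)
```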

crates/stages/src/stages/tx_lookup.rs Show resolved Hide resolved
@@ -99,7 +100,7 @@ where

     fn flush(&mut self) {
         self.buffer_size_bytes = 0;
-        self.buffer.sort_unstable_by(|a, b| a.0.cmp(&b.0));
+        self.buffer.par_sort_unstable_by(|a, b| a.0.cmp(&b.0));

@mattsse (Collaborator) Feb 22, 2024:

how large do we expect this to be?

We might not benefit from par_sort here if the buffer is not large; if small buffers are possible, we could do a length check and only use par_sort when the buffer is large.
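
A minimal sketch of that length check, assuming rayon's `par_sort_unstable_by`; the `PAR_SORT_THRESHOLD` value and the standalone `flush` shape are assumptions for illustration, not the PR's code.

```rust
use rayon::slice::ParallelSliceMut;

/// Below this many entries a parallel sort is unlikely to beat the
/// single-threaded one because of thread coordination overhead.
const PAR_SORT_THRESHOLD: usize = 10_000;

fn flush(buffer: &mut Vec<(Vec<u8>, Vec<u8>)>) {
    if buffer.len() >= PAR_SORT_THRESHOLD {
        // Large buffer: spread the comparison work across rayon's thread pool.
        buffer.par_sort_unstable_by(|a, b| a.0.cmp(&b.0));
    } else {
        // Small buffer: the plain unstable sort has less overhead.
        buffer.sort_unstable_by(|a, b| a.0.cmp(&b.0));
    }
}
```

Whether the check is worth it depends on how small the buffers can actually get, which is the question above.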

@joshieDo (Collaborator, Author):

Given it's only used in the pipeline, I'd say pretty large: the buffer maxes out at 100MB for headers and 500MB for txlookup.

     }
 }

 impl TransactionLookupStage {
     /// Create new instance of [TransactionLookupStage].
-    pub fn new(commit_threshold: u64, prune_mode: Option<PruneMode>) -> Self {
-        Self { commit_threshold, prune_mode }
+    pub fn new(chunk_size: u64, prune_mode: Option<PruneMode>) -> Self {

Collaborator:

The config and documentation need to be changed to reflect this.

@joshieDo merged commit aeaabfb into feat/static-files on Feb 22, 2024 (23 of 25 checks passed)
@joshieDo deleted the joshie/txlookup-etl branch on February 22, 2024 18:30