-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(stages): recover senders for both DB and static file transactions #6089
feat(stages): recover senders for both DB and static file transactions #6089
Conversation
Okay, it performs worse... I need to return the raw query functionality ubuntu@reth4:~/reth-snaps-alexey$ hyperfine './reth_main stage --chain holesky run --datadir ~/.local/share/reth/holesky --from 0 --to 808004 senders'
Benchmark 1: ./reth_main stage --chain holesky run --datadir ~/.local/share/reth/holesky --from 0 --to 808004 senders
Time (mean ± σ): 48.451 s ± 0.335 s [User: 1696.042 s, System: 6.879 s]
Range (min … max): 47.847 s … 48.946 s 10 runs
ubuntu@reth4:~/reth-snaps-alexey$ hyperfine './reth_6089 stage --chain holesky run --datadir ~/.local/share/reth/holesky --from 0 --to 808004 senders'
Benchmark 1: ./reth_6089 stage --chain holesky run --datadir ~/.local/share/reth/holesky --from 0 --to 808004 senders
Time (mean ± σ): 64.176 s ± 0.451 s [User: 1700.714 s, System: 15.671 s]
Range (min … max): 63.374 s … 64.920 s 10 runs |
ed39c9a
to
e9e217d
Compare
Raw query looks good ubuntu@reth4:~/reth-snaps-alexey$ hyperfine './reth_main stage --chain holesky run --datadir ~/.local/share/reth/holesky --from 0 --to 808004 senders'
Benchmark 1: ./reth_main stage --chain holesky run --datadir ~/.local/share/reth/holesky --from 0 --to 808004 senders
Time (mean ± σ): 48.451 s ± 0.335 s [User: 1696.042 s, System: 6.879 s]
Range (min … max): 47.847 s … 48.946 s 10 runs
ubuntu@reth4:~/reth-snaps-alexey$ hyperfine './reth_6089_raw stage --chain holesky run --datadir ~/.local/share/reth/holesky --from 0 --to 808004 senders'
Benchmark 1: ./reth_6089_raw stage --chain holesky run --datadir ~/.local/share/reth/holesky --from 0 --to 808004 senders
Time (mean ± σ): 52.559 s ± 0.146 s [User: 1762.202 s, System: 8.185 s]
Range (min … max): 52.248 s … 52.708 s 10 runs |
7fd3598
to
f394358
Compare
f394358
to
8394742
Compare
This reverts commit e9e217d.
8394742
to
019d1bf
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
setup::unwind_hashes, | ||
stage, | ||
1..DEFAULT_NUM_BLOCKS, | ||
1..=DEFAULT_NUM_BLOCKS, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
was this a bug before?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or well "bug", i guess this is in a benchmark, so doesn't rly matter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the behavior didn't change: even though we were passing an exclusive range, we constructed StageRange
struct with range.start
and range.end
…ender-recovery-snapshot
7d6bb56
to
68c5296
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like that you don't constraint generics before it's necessary, it improves readability of the code. it also makes the code more flexible, and therefore easier to test since generic types without constraints can easily be mocked.
in some places the code can panic by calling expect
. in these situations, are you sure you want the node to stop execution? I'd suggest using debug_assert
and dumping as much state as could be helpful to debug to log in the message. instead in run mode, consider if it's better to conditionally continue execution by if let Some(..) ..
and if let Ok(..) ..
. in some cases it may indeed be best for the node to stop.
let (tx_id, transaction) = | ||
entry.map_err(|e| Box::new(SenderRecoveryStageError::StageError(e.into())))?; | ||
let tx_id = tx_id.key().expect("key to be formated"); | ||
let tx = tx.value().expect("value to be formated"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for example here, could it be possible that one tx value is not formatted but other tx values are? in that case, it's better not to panic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not really, we expect the values both in the database and static files always be decodable, so if it fails, it's a critical unrecoverable error and something went wrong: either user interfered with the database, or we have a bug.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
alright, in case of a bug, would be nice to get more context, same advice you gave me last week
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw, here, and in tracing, it's nice if the variable names match the log message keys exactly. makes it easier to debug. so,
let my_var = 1;
debug!(target: "..",
my_var=my_var,
""
);
…ender-recovery-snapshot
Okay there's some sort of a bug with ETA, probably related to 2024-01-29T19:03:16.582026Z INFO reth_node_core::events: Stage finished executing pipeline_stages=4/13 stage=SenderRecovery checkpoint=836683 target=836683 stage_progress=100.00% stage_eta=4341721years 8months 26days 23h 27m 58s |
ETA bug is unrelated to these changes, the progress is calculated correctly |
With this PR, when recovering senders in the
SenderRecovery
pipeline stage, we will look into both database tableTransactions
and static files forTransactions
segment. It becomes relevant when #5733 is merged.One thing to note with the current implementation is that we no longer retrieve
RawKey
andRawValue
from the database when querying transactions for sender recovery, but instead decode them before passing the transactions to threads for recovery. The bench showed there's no significant difference in performance between the two approaches.Holesky benchmark (
main
vs this PR):In addition to that, we need to keep in mind that now transactions, headers, and receipts can be in static files, so when calculating the stage checkpoint, we need to look in both database and static files. I introduced a trait
StatsReader
for that, and replaced all calls totx.entries::<T>()
with it.