fix duplicate confirmed rollup detection for descendants #34014
        bank.hash(),
    ));
} else if bank.is_frozen()
    && tower.is_slot_duplicate_confirmed(*slot, voted_stakes, total_stake)
Previously we only called mark_slots_confirmed if tower.is_slot_confirmed returned true. This change lets us notify the fsm as soon as tower.is_slot_duplicate_confirmed returns true.
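The two-tier check described in this comment can be sketched as below. This is a minimal illustration, not the code from the PR: the `Confirmation` enum and the `classify` function are hypothetical names, and the thresholds are assumptions (2/3 of stake for the supermajority case, and the >52% figure quoted in the PR description for the duplicate confirmed case).

```rust
/// Hypothetical labels for the two signals discussed in this thread.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Confirmation {
    /// More than 2/3 of stake has voted (assumed threshold).
    SupermajorityVoted,
    /// More than 52% of stake has voted (threshold taken from the PR summary).
    DuplicateConfirmed,
}

// Assumed threshold values, for illustration only.
const SUPERMAJORITY_THRESHOLD: f64 = 2.0 / 3.0;
const DUPLICATE_THRESHOLD: f64 = 0.52;

/// Returns the strongest signal a frozen bank qualifies for, if any.
/// Mirrors the `if ... else if ...` shape of the diff: the higher
/// threshold is tested first, the lower one is the new fallback branch.
fn classify(is_frozen: bool, voted_stake: u64, total_stake: u64) -> Option<Confirmation> {
    if !is_frozen || total_stake == 0 {
        return None;
    }
    let ratio = voted_stake as f64 / total_stake as f64;
    if ratio > SUPERMAJORITY_THRESHOLD {
        Some(Confirmation::SupermajorityVoted)
    } else if ratio > DUPLICATE_THRESHOLD {
        Some(Confirmation::DuplicateConfirmed)
    } else {
        None
    }
}
```

Under these assumptions, a frozen slot with 60% of stake is reported as duplicate confirmed right away instead of waiting until it crosses 2/3.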
Codecov Report
Additional details and impacted files
@@           Coverage Diff           @@
##           master   #34014   +/-   ##
=======================================
  Coverage    81.8%    81.8%
=======================================
  Files         822      822
  Lines      221573   221623   +50
=======================================
+ Hits       181341   181416   +75
+ Misses      40232    40207   -25
69254ba to 1b4a697
core/src/replay_stage.rs
Outdated
@@ -3877,7 +3925,25 @@ impl ReplayStage {
    if bank.is_frozen() && tower.is_slot_confirmed(*slot, voted_stakes, total_stake) {
        info!("validator fork confirmed {} {}ms", *slot, duration);
        datapoint_info!("validator-confirmation", ("duration_ms", duration, i64));
        confirmed_forks.push((*slot, bank.hash()));
        confirmed_forks.push(ConfirmedSlot::new_optimistic_confirmed_slot(
I don't know if we should call this optimistically confirmed, because optimistic confirmation doesn't roll up.
Open to ideas about naming, but I also think we should fix the plumbing and make it roll up while we're looking at this.
The plumbing right now seems to be a mess:
- cluster_info_vote_listener tracks gossip votes but also receives replay votes through crossbeam from replay_stage
- cluster_info_vote_listener tracks gossip votes for the propagated check in vote_tracker
- replay_stage also tracks replay votes in vote_tracker and rolls up in fork_stats in compute_bank_stats
- cluster_info_vote_listener also separately sends verified gossip votes to replay_stage through crossbeam
- replay_stage tracks DuplicateConfirmed through fork_stats but also gets notified through crossbeam from cluster_info_vote_listener
There's a lot of back-and-forth propagation of votes and stake info in multiple places, which seems inefficient. My vote would be to share one vote tracker/fork stats between cluster_info_vote_listener and replay_stage to avoid the extra work. This would also let us roll up replay votes and include gossip votes that would get us over the key threshold.
If we don't want to do a big refactor, I think there's a simpler solution for rollup:
solana/ledger/src/blockstore_processor.rs, lines 171 to 175 in d58db6e:
    bank_utils::find_and_send_votes(
        batch.sanitized_transactions(),
        &tx_results,
        replay_vote_sender,
    );
When sending replayed votes from replay_stage to cluster_info_vote_listener, send the slot hashes as well.
Then, at solana/core/src/cluster_info_vote_listener.rs, lines 677 to 680 in d58db6e:
    // We cannot count any other slots in this vote toward optimistic confirmation because:
    // 1) There may have been a switch between the earlier vote and the last vote
    // 2) We do not know the hash of the earlier slot
    if slot == last_vote_slot {
If we have !is_gossip, we don't have to filter on only the last slot here; we can track the whole tower, since we have the accompanying hashes in slot hashes.
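A rough sketch of the filtering this suggestion describes follows. It assumes a hypothetical `ObservedVote` struct carrying optional slot hashes for replay votes; none of these names or fields come from the codebase. It only shows which (slot, hash) pairs would become countable, and does not address the first reason in the quoted code comment (a possible switch between the earlier votes and the last vote).

```rust
type Slot = u64;
type Hash = [u8; 32];

/// Hypothetical view of a vote as seen by the vote listener.
struct ObservedVote {
    /// Tower slots, oldest to newest.
    slots: Vec<Slot>,
    /// Hash of the last voted slot, which every vote carries.
    last_vote_hash: Hash,
    /// Assumed extra payload: hashes for the earlier slots,
    /// sent along with replayed votes only.
    slot_hashes: Option<Vec<(Slot, Hash)>>,
    /// True if the vote arrived over gossip rather than replay.
    is_gossip: bool,
}

/// Returns the (slot, hash) pairs that may be counted for this vote.
/// Gossip votes, or votes without slot hashes, contribute only the last slot.
/// Replay votes with slot hashes contribute every tower slot whose hash is known.
fn countable_slots(vote: &ObservedVote) -> Vec<(Slot, Hash)> {
    let Some(&last_vote_slot) = vote.slots.last() else {
        return Vec::new();
    };
    match (&vote.slot_hashes, vote.is_gossip) {
        (Some(hashes), false) => vote
            .slots
            .iter()
            .filter_map(|slot| hashes.iter().find(|(s, _)| s == slot).copied())
            .collect(),
        _ => vec![(last_vote_slot, vote.last_vote_hash)],
    }
}
```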
To clarify, the above is out of scope for this PR and should not be a blocker for merging; we can discuss it in #34279.
I agree that optimistically confirmed is an overloaded name and am happy to rename if you have a suggestion.
Optimistically confirmed actually can't be safely rolled up; it has to be all votes on the same slot.
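To make the distinction concrete, here is a small sketch of what "rolling up" means: stake voted directly on a slot is also credited to each of its ancestors. The `rolled_up_stake` function and its inputs are hypothetical, and the sketch assumes each validator's stake appears on exactly one slot (its latest vote), so nothing is double counted. A check that requires all votes on the same slot would read the direct map; a rolled-up check would read the result.

```rust
use std::collections::HashMap;

type Slot = u64;

/// Credits the stake voted directly on each slot to that slot and to all of
/// its ancestors. `parents` maps a slot to its parent slot.
/// Assumption: each validator's stake appears in `direct` at most once.
fn rolled_up_stake(
    parents: &HashMap<Slot, Slot>,
    direct: &HashMap<Slot, u64>,
) -> HashMap<Slot, u64> {
    let mut rolled: HashMap<Slot, u64> = HashMap::new();
    for (&slot, &stake) in direct {
        let mut current = Some(slot);
        while let Some(s) = current {
            *rolled.entry(s).or_insert(0) += stake;
            // Parents must have a smaller slot number; this also guards
            // against a malformed map that would otherwise loop forever.
            current = parents.get(&s).copied().filter(|p| *p < s);
        }
    }
    rolled
}
```

For a chain 1 <- 2 <- 3 with direct stake 10, 20 and 30, slot 1 has only 10 units voted on it directly but 60 units once descendants are rolled up.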
I think we could just call it supermajority_voted?
core/src/replay_stage.rs
Outdated
if *slot <= root_slot {
    continue;
} else if let Some(prev_hash) = duplicate_confirmed_slots.insert(*slot, *frozen_hash) {
It would be good to add a check here for ConfirmationType::DuplicateConfirmed, in case we add any other cases that accidentally fall into this branch.
we want ConfirmationType::OptimisticallyConfirmed to fall through here, but if you prefer I can make it explicit.
yeah, I think an explicit check would be good
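One way to make the check explicit is an exhaustive `match` with no wildcard arm, sketched below. The variant names are the ones mentioned in this thread; the enum definition and the helper function are illustrative assumptions, not the PR's code. Because every variant is listed, adding a new variant later becomes a compile error here rather than a silent fall-through.

```rust
/// Variant names as mentioned in the review thread; the definition itself
/// is an assumption made for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ConfirmationType {
    OptimisticallyConfirmed,
    DuplicateConfirmed,
}

/// Hypothetical helper: decides whether a confirmation should be recorded
/// as duplicate confirmed. Both current variants qualify, per the thread.
fn counts_as_duplicate_confirmed(confirmation_type: ConfirmationType) -> bool {
    match confirmation_type {
        // A slot past the higher threshold has necessarily passed the lower one.
        ConfirmationType::OptimisticallyConfirmed => true,
        ConfirmationType::DuplicateConfirmed => true,
        // No `_` arm on purpose: a new variant must be handled explicitly.
    }
}
```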
if *slot <= root_slot {
    continue;
This is a deviation from current behavior, i.e. some supermajority confirmed slots at or below the root may no longer be added as duplicate confirmed and passed to the state machine. That seems OK, since whatever the latest root slot was should already have notified the state machine that all of its ancestors are confirmed?
yeah exactly, since we have the threshold check we must have already processed this signal. But it's also probably not a huge deal if we remove this check.
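The root filter and the dedup discussed here can be sketched together as follows. It relies on the fact that `BTreeMap::insert` returns the previous value for a key, as in the diff hunk above; the `Outcome` enum and the `record_duplicate_confirmed` function are hypothetical names, and how a hash mismatch should be handled is left open in this sketch.

```rust
use std::collections::BTreeMap;

type Slot = u64;
type Hash = [u8; 32];

/// Hypothetical result of trying to record a duplicate confirmed slot.
#[derive(Debug, PartialEq, Eq)]
enum Outcome {
    /// Slot is at or below the root, so the signal is assumed already handled.
    SkippedAtOrBelowRoot,
    /// Same slot and hash were recorded before; no need to notify again.
    AlreadySent,
    /// First time this slot is seen; the state machine should be notified.
    NewlyRecorded,
    /// A different hash was recorded earlier for this slot.
    HashMismatch { previous: Hash },
}

fn record_duplicate_confirmed(
    slot: Slot,
    frozen_hash: Hash,
    root_slot: Slot,
    sent: &mut BTreeMap<Slot, Hash>,
) -> Outcome {
    if slot <= root_slot {
        return Outcome::SkippedAtOrBelowRoot;
    }
    // `insert` returns the old value if the key was present. Note that it
    // also overwrites it, so after a mismatch the map holds the new hash.
    match sent.insert(slot, frozen_hash) {
        None => Outcome::NewlyRecorded,
        Some(previous) if previous == frozen_hash => Outcome::AlreadySent,
        Some(previous) => Outcome::HashMismatch { previous },
    }
}
```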
Backports to the stable branch are to be avoided unless absolutely necessary for fixing bugs, security issues, and perf regressions. Changes intended for backport should be structured such that a minimum effective diff can be committed separately from any refactoring, plumbing, cleanup, etc that are not strictly necessary to achieve the goal. Any of the latter should go only into master and ride the normal stabilization schedule.
Backports to the beta branch are to be avoided unless absolutely necessary for fixing bugs, security issues, and perf regressions. Changes intended for backport should be structured such that a minimum effective diff can be committed separately from any refactoring, plumbing, cleanup, etc that are not strictly necessary to achieve the goal. Any of the latter should go only into master and ride the normal stabilization schedule. Exceptions include CI/metrics changes, CLI improvements and documentation updates on a case by case basis.
just some nits
2fab75a to be77751
Problem
The state machine is only notified of duplicate confirmed rollup once a slot is optimistically confirmed.
Summary of Changes
Make the check granular: notify when >52% of stake has voted.
Dedup requests against duplicate confirmed slots already sent to the fsm.
Fixes #