validator: remove optional remote accounts hash consistency check #31279
Codecov Report
@@ Coverage Diff @@
## master #31279 +/- ##
=======================================
Coverage 81.5% 81.5%
=======================================
Files 733 733
Lines 207009 206941 -68
=======================================
+ Hits 168731 168735 +4
+ Misses 38278 38206 -72
I'm OK with removing the `--halt-on-trusted-validators-accounts-hash-mismatch` CLI flag.
Without the flag, if a node calculates the accounts hash incorrectly, it'll find out later (basically whenever the account is next used, which has a maximum of rent collection duration). Maybe this is fine? I'm guessing most validators do not have this flag set anyway, so there's no change of behavior for them.
When the Epoch Accounts Hash feature is enabled, then the accounts hash will be part of consensus directly, since it'll be part of the bank hash once per epoch. That'll be the proper way to ensure safety for the whole cluster.
One interesting possibility is w.r.t. snapshot download in bootstrap. If a known validator calculates the accounts hash wrong due to a disk issue and an accounts storage file is bad, then it would be possible for a new validator to download this bad snapshot with the bad account. Again, it'll find out once that account is accessed next.
@HaoranYi, requesting your review here too, since you've recently been interacting with this code, specifically around the `accounts_hash_fault_injector`. Do you rely on `--halt-on-trusted-validators-accounts-hash-mismatch` for any testing? If not, can we also remove `accounts_hash_fault_injector`? (That would be for a different PR.)
No, we don't rely on this CLI argument for the fault injection test.
Given Brooks' insight about accounts hashes becoming part of consensus, that makes me feel better about ripping this check out altogether.
I thought about whether keeping a warning in place would be useful, but I don't think it would be.
- Suppose a node `N` is running with this flag for a set of known validators `{K1, K2, ..., Kn}`
- If one of the known validators `Ki` deviates, `N` would get a warning.
- But `N`'s operator can't do anything to fix `Ki` directly, so seemingly not super helpful
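To make the scenario above concrete, here is a minimal sketch of such a check, assuming simplified stand-in types. `NodeId`, `AccountsHash`, `Check`, and `check_against_known` are hypothetical names for illustration and are not taken from the validator code.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins: the real validator has its own pubkey, slot and hash types.
type NodeId = &'static str;
type Slot = u64;
type AccountsHash = [u8; 4];

#[derive(Debug, PartialEq, Eq)]
enum Check {
    /// No known validator disagreed with us on any slot we both hashed.
    Consistent,
    /// A known validator published a different hash for `slot`.
    Mismatch { peer: NodeId, slot: Slot },
}

/// Compare the hashes node `N` computed (`ours`) with the hashes published
/// by the known validators `K1..Kn` (`known`). Slots we have not hashed
/// ourselves are skipped. Returns the first disagreement found.
fn check_against_known(
    ours: &HashMap<Slot, AccountsHash>,
    known: &[(NodeId, Slot, AccountsHash)],
) -> Check {
    for &(peer, slot, hash) in known {
        if let Some(our_hash) = ours.get(&slot) {
            if *our_hash != hash {
                return Check::Mismatch { peer, slot };
            }
        }
    }
    Check::Consistent
}
```

Note that `Check::Mismatch` only says that `N` and `Ki` disagree on a slot. Nothing in the comparison tells `N` which of the two computed the wrong hash, so any reaction (a warning or a halt) is a guess.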
validator/src/main.rs
if matches.is_present("halt_on_known_validators_accounts_hash_mismatch") {
    validator_config.halt_on_known_validators_accounts_hash_mismatch = true;
    warn!("the `--halt-on-known-validators-accounts-hash-mismatch` argument is deprecated. please remove it from the command line");
Check out this struct and the code following it for deprecated args. It
- Allows for consistent warning messages across deprecated args
- Gets deprecated arg handling out of the way of actual logic
Unfortunately, it's not immediately obvious that this handling belongs there unless you're already aware of it.
Line 1636 in 04bbf3b
struct DeprecatedArg {
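As a rough illustration of why a central table helps, here is a minimal sketch of the pattern. The fields and the `deprecation_warnings` helper are assumptions made for this example; the actual `DeprecatedArg` struct in `validator/src/main.rs` may look different and is built on the argument parser rather than on raw strings.

```rust
// Hypothetical shape: the real struct may carry different fields.
struct DeprecatedArg {
    /// Long flag name without the leading dashes.
    name: &'static str,
    /// Optional hint naming the flag to use instead.
    replaced_by: Option<&'static str>,
}

/// Build one uniformly worded warning for every deprecated flag that
/// appears in `argv`, so no call site has to hand-write its own message.
fn deprecation_warnings(deprecated: &[DeprecatedArg], argv: &[&str]) -> Vec<String> {
    deprecated
        .iter()
        .filter(|arg| argv.iter().any(|a| a.strip_prefix("--") == Some(arg.name)))
        .map(|arg| {
            let mut msg = format!("`--{}` is deprecated", arg.name);
            match arg.replaced_by {
                Some(new) => msg.push_str(&format!(", please use `--{}`", new)),
                None => msg.push_str(", please remove it from the command line"),
            }
            msg
        })
        .collect()
}
```

With a table like this, deprecating a flag means adding one entry instead of another `if matches.is_present(...)` block with its own `warn!` call in the main logic.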
I think we still want this. I just had one minor request, and it looks like things need a merge resolution now.
LGTM.
Looking at this again after a few weeks, I still think ripping this out is the right move. If this flag had wide adoption, a single node experiencing a bug or fault could cause a domino effect.
Additionally, we can't know whether we or the node we're checking against deviated on a slot, yet this code makes only our node panic. Hypothetically, we could do some sampling of `N` nodes, but to make this robust we'd basically be trying to implement a stripped-down consensus. Better to kill this altogether and let the feature Brooks previously mentioned (accounts hash becoming part of consensus) take effect.
Problem
old debug code lying around doing old debug code things
Summary of Changes
remove it