Skip to content
This repository has been archived by the owner on Jan 22, 2025. It is now read-only.

validator: remove optional remote accounts hash consistency check #31279

Merged
merged 1 commit into from
May 16, 2023

Conversation

t-nelson
Copy link
Contributor

Problem

old debug code lying around doing old debug code things

Summary of Changes

remove it

@codecov
Copy link

codecov bot commented Apr 20, 2023

Codecov Report

Merging #31279 (02e6ee2) into master (7a393e4) will increase coverage by 0.0%.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master   #31279   +/-   ##
=======================================
  Coverage    81.5%    81.5%           
=======================================
  Files         733      733           
  Lines      207009   206941   -68     
=======================================
+ Hits       168731   168735    +4     
+ Misses      38278    38206   -72     

Copy link
Contributor

@brooksprumo brooksprumo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm OK with removing the --halt-on-trusted-validators-accounts-hash-mismatch CLI flag.

Without the flag, if a node calculates the accounts hash incorrectly, it'll find out later (basically whenever the account is next used, which has a maximum of rent collection duration). Maybe this is fine? I'm guessing most validators do not have this flag set anyway, so there's no change of behavior for them.

When the Epoch Accounts Hash feature is enabled, then the accounts hash will be part of consensus directly, since it'll be part of the bank hash once per epoch. That'll be the proper way to ensure safety for the whole cluster.

One interesting possibility is w.r.t. snapshot download in bootstrap. If a known validator calculates the accounts hash wrong due to a disk issue and an accounts storage file is bad, then it would be possible for a new validator to download this bad snapshot with the bad account. Again, it'll find out once that account is accessed next.

@HaoranYi, requesting your review here too, since you've recently been interacting with this code. Specifically around the accounts_hash_fault_injector. Do you rely on --halt-on-trusted-validators-accounts-hash-mismatch for any testing? If not, can we also remove ``accounts_hash_fault_injector`? (that would be for a different PR)

@brooksprumo brooksprumo requested a review from HaoranYi April 20, 2023 12:14
@HaoranYi
Copy link
Contributor

No, we don't rely on this cli argument for the fault injection test.

Copy link
Contributor

@steviez steviez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given Brook's insight about accounts hashes becoming part of consensus, that makes me feel better about ripping this check out altogether.

I thought about whether keeping a warning in place would be useful, but I don't think it would be.

  • Suppose a node N is running with this flag for a set of known validators {K1, K2, ..., Kn}
  • If one of the known validators Ki deviates, N would get a warning.
  • But, N's operator can't do anything to fix Ki directly, so seemingly not super helpful

Comment on lines 1607 to 1608
if matches.is_present("halt_on_known_validators_accounts_hash_mismatch") {
validator_config.halt_on_known_validators_accounts_hash_mismatch = true;
warn!("the `--halt-on-known-validators-accounts-hash-mismatch` argument is deprecated. please remove it from the command line");
Copy link
Contributor

@steviez steviez Apr 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checkout this struct and following for deprecated args. It

  • Allows for consistent warning messages across deprecated args
  • Gets deprecated arg handling out of the way of actual logic

But unfortunately, not immediately obvious to move stuff there unless you're already aware of it.

struct DeprecatedArg {

@github-actions github-actions bot added the stale [bot only] Added to stale content; results in auto-close after a week. label May 5, 2023
@github-actions github-actions bot closed this May 15, 2023
@steviez steviez reopened this May 15, 2023
@steviez
Copy link
Contributor

steviez commented May 15, 2023

I think we still want this - I just had one minor request and looks like things need a merge resolution now

@t-nelson t-nelson removed the stale [bot only] Added to stale content; results in auto-close after a week. label May 16, 2023
steviez
steviez previously approved these changes May 16, 2023
Copy link
Contributor

@steviez steviez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Looking at this again after a few weeks, I still think ripping this out is the right move. If this flag had wide adoption, a single node experiencing a bug or fault could cause a domino effect.

Additionally, we can't know if the node that we're checking against or we deviated on a slot, yet, this code makes only our node panic. Hypothetically, we could do some sampling of N nodes, but to make this robust we're basically trying to implement a stripped down consensus. Better to kill this altogether and let the feature Brooks previously mentioned (accounts hash becoming part of consensus) take effect.

@t-nelson
Copy link
Contributor Author

Screenshot from 2023-05-16 13-20-00

no idea why we're wasting ci/dev resources to keep this list sorted...

@t-nelson t-nelson merged commit ad67fd5 into solana-labs:master May 16, 2023
@t-nelson t-nelson deleted the avhp branch May 16, 2023 20:23
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants