Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

R4R: Ensure Tendermint Validator Update Invariants - Part II #2298

Merged

Conversation

alexanderbez
Copy link
Contributor

@alexanderbez alexanderbez commented Sep 10, 2018

During the debugging and implementation of #2238, another case of invalid Tendermint updates arose but with a different cause. I narrowed it down to what I believe was the trace of events. Can be generalized as such:

Given ValidatorA,

  • Height 1: Bonded/enters validator set (bondHeight: 1, unbondingHeight: 0)
  • Height 10: Unbonded for whatever reason (bondHeight: 1, unbondingHeight: 10)
  • Height 11: Receives a delegation, however, not enough to enter validator set. Note, due to the way Keeper#UpdateValidator is implemented, the bond height and intra counter will be updated (see: here). (bondHeight: 11, unbondingHeight: 10)
  • Height 15: Due to some series of txs in the same block, ValidatorA bonds and unbonds causing an update to the Tendermint updates store (unbond). However, because it's bondHeight is 10, Keeper#GetValidTendermintUpdates believes it was previously bonded and as such sends an update to remove ValidatorA (unbonds/power zero). However, ValidatorA was never in the bonded set to begin with (removed at height 10).

tl;dr when a previously unbonded validator goes from unbonded, to bonded, back to unbonded in the same block as a result of some other change (e.g. cliff validator losing enough power and then gaining it back), the previously unbonded validator never has it's bond height updated.

Not sure if implemented solution is the correct/best one, but I simply set the bondHeight in bondValidator. Also, always setting the bondHeight for an unbonded validator in Keeper#UpdateValidator is a bit confusing to me. Why do we do this?

closes: #2277
closes by proxy: #2189
ref: #2238, #2189


  • Targeted PR against correct branch (see CONTRIBUTING.md)

  • Linked to github-issue with discussion and accepted design OR link to spec that describes this work.

  • Wrote tests

  • Updated relevant documentation (docs/)

  • Added entries in PENDING.md with issue #

  • rereviewed Files changed in the github PR explorer


For Admin Use:

  • Added appropriate labels to PR (ex. wip, ready-for-review, docs)
  • Reviewers Assigned
  • Squashed all commits, uses message "Merge pull request #XYZ: [title]" (coding standards)

@codecov
Copy link

codecov bot commented Sep 10, 2018

Codecov Report

❗ No coverage uploaded for pull request base (bez/2189-del-non-exist-val-bug@ebb2b2e). Click here to learn what that means.
The diff coverage is 50%.

@@                        Coverage Diff                        @@
##             bez/2189-del-non-exist-val-bug    #2298   +/-   ##
=================================================================
  Coverage                                  ?   63.75%           
=================================================================
  Files                                     ?      140           
  Lines                                     ?     8652           
  Branches                                  ?        0           
=================================================================
  Hits                                      ?     5516           
  Misses                                    ?     2756           
  Partials                                  ?      380

@alexanderbez alexanderbez changed the title WIP: Ensure Tendermint Validator Update Invariants - Part II R4R: Ensure Tendermint Validator Update Invariants - Part II Sep 10, 2018
x/stake/keeper/validator.go Outdated Show resolved Hide resolved
@@ -482,10 +481,15 @@ func (k Keeper) UpdateBondedValidators(

// if we've reached jailed validators no further bonded validators exist
if validator.Jailed {
if validator.Status == sdk.Bonded {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional defensive coding 👍

@@ -494,13 +498,6 @@ func (k Keeper) UpdateBondedValidators(
newValidatorBonded = true
}

// increment the total number of bonded validators and potentially mark
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did we delete these lines?

Copy link
Contributor Author

@alexanderbez alexanderbez Sep 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if validator.Status != sdk.Bonded { // ... } was done twice and the internal logic was essentially duplicated. I put it into a single conditional/block.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

x/stake/keeper/validator.go Show resolved Hide resolved
@@ -683,6 +679,8 @@ func (k Keeper) bondValidator(ctx sdk.Context, validator types.Validator) types.
panic(fmt.Sprintf("should not already be bonded, validator: %v\n", validator))
}

validator.BondHeight = ctx.BlockHeight()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this is right. The original intent was to copy the bondHeight from the old validator in bondIncrement if already bonded, else set it to the current block height. Can we figure out why that isn't happening first?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cwgoes the bond height never gets set for a validator that enters the set (temporarily) as a result of another leaving...at least in the case I saw.

Copy link
Contributor

@cwgoes cwgoes Sep 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be the bug then, makes sense.

We just need to ensure that this doesn't break the existing logic - which it might not, but it does overlap I think.

@cwgoes
Copy link
Contributor

cwgoes commented Sep 10, 2018

Using BondHeight to determine whether a validator was previously bonded definitely won't work if we don't update the bond height when the validator is unbonded, bonded, then unbonded again. We should have done that, though, in bondIncrement, since the old validator wouldn't have been bonded. Let's see why that didn't happen.

@alexanderbez
Copy link
Contributor Author

@cwgoes I'll address most of your comments shortly, but maybe we should sync up -- perhaps I'm missing something.

@cwgoes
Copy link
Contributor

cwgoes commented Sep 11, 2018

@cwgoes I'll address most of your comments shortly, but maybe we should sync up -- perhaps I'm missing something.

I'm not sure (replied re: sync up on Slack) - this fix might be OK but let's definitely read over the logic again, and I think we're at minimum duplicating a case that previously worked (in addition to the case that didn't).

Setting the bond height in bondValidator seems like the right approach - at least to me - I would suggest we get rid of the previous logic, which is now redundant, and also ensure that we set the intra-tx counter etc. correctly in bondValidator.

@alexanderbez
Copy link
Contributor Author

alexanderbez commented Sep 11, 2018

@cwgoes indeed, I agree we should have one place that does this -- bondValidator.

However, when I tried this, it screwed up iteration in other methods such as UpdateBondedValidators and updateCliffValidator because the ValidatorsByPowerIndexKey is used in conjunction with BondHeight and BondIntraTxCounter which are currently set before any iterator is created.

See: getValidatorPowerRank.

@alexanderbez
Copy link
Contributor Author

@cwgoes getting pretty deep into trying to figure out another way to set bond height and still have valid iteration.

I've come to the conclusion that in order to move setting the bond height, UpdateBondedValidators will need significant refactor. I think we should merge this as is and create an issue for a cleaner approach to setting the bond height and by proxy, UpdateBondedValidators.

I think this is a safe approach for now.

Copy link
Contributor

@cwgoes cwgoes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested ACK

I think this PR is safe and does fix the immediate issue, but we should absolutely simplify this logic, #prelaunch.

@cwgoes cwgoes merged commit f7f20f1 into bez/2189-del-non-exist-val-bug Sep 12, 2018
@cwgoes cwgoes deleted the bez/2277-tendermint-val-update-fix branch September 12, 2018 06:45
@cwgoes
Copy link
Contributor

cwgoes commented Sep 12, 2018

See #2312.

chillyvee pushed a commit to chillyvee/cosmos-sdk that referenced this pull request Mar 1, 2024
* Update statesync.bash

* Update statesync.bash

* Update statesync.bash

Removed evmos comments, added a few other comments. 
Commented out go install

---------

Co-authored-by: Milan Mulji <98309852+mmulji-ic@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants