R4R: Ensure Tendermint Validator Update Invariants #2238

alexanderbez · 2018-09-04T22:55:37Z

Update GetTendermintUpdates to enforce the following invariants:
- If the validator is found in the staking store, only validators with non-zero power or zero-power that were bonded at the previous block height are returned to Tendermint.
- Otherwise, the validator is simply returned to Tendermint (meaning it was removed from the staking store).

closes: #2189, #2241
ref: #2188

Targeted PR against correct branch (see CONTRIBUTING.md)
Linked to github-issue with discussion and accepted design OR link to spec that describes this work.
Wrote tests
Updated relevant documentation (docs/)
Added entries in PENDING.md with issue #
rereviewed Files changed in the github PR explorer

For Admin Use:

Added appropriate labels to PR (ex. wip, ready-for-review, docs)
Reviewers Assigned
Squashed all commits, uses message "Merge pull request #XYZ: [title]" (coding standards)

Only return ABCI validators that match certain criteria

alexanderbez · 2018-09-04T23:00:22Z

x/stake/keeper/validator.go

+ if found {
+ // The validator is new or already exists in the store and must adhere to
+ // Tendermint invariants.
+ prevBonded := val.BondHeight < ctx.BlockHeight()


I'm not sure if this check is sound, but on the surface is seems correct.

codecov · 2018-09-05T13:30:08Z

Codecov Report

❗ No coverage uploaded for pull request base (develop@fb0cc0b). Click here to learn what that means.
The diff coverage is 78.26%.

@@            Coverage Diff             @@
##             develop    #2238   +/-   ##
==========================================
  Coverage           ?   63.75%           
==========================================
  Files              ?      140           
  Lines              ?     8652           
  Branches           ?        0           
==========================================
  Hits               ?     5516           
  Misses             ?     2756           
  Partials           ?      380

x/stake/keeper/validator.go

ValarDragon · 2018-09-05T15:56:36Z

Can you disable governance txs (so it runs fast) and run test sim Gaia slow locally

alexanderbez · 2018-09-05T16:02:12Z

@ValarDragon sure thing!

alexanderbez · 2018-09-05T16:07:21Z

Too good to be true! Still seems like an issue. Will look into it.

Update: Was able to reproduce with 500 blocks and a seed of 1536164987760853304 running ^TestStakeWithRandomMessages$ only.

alexanderbez · 2018-09-05T17:15:28Z

Ok, so what I have implemented does indeed fix and address the issue at hand, but this may be something else. Perhaps related to #2241.

I can follow the events as such, given validatorA:

height 16: bonded (bondedHeight: 6, unbondingHeight: 0)
height 18: unbonded (bondedHeight: 6, unbondingHeight: 18)
height 56: bonded (bondedHeight: 56, unbondingHeight: 18)
height 68: unbonded (bondedHeight: 56, unbondingHeight: 68)
height 75: ??? (validatorA not sent in GetTendermintUpdates -- should be since next line indicates it's unbonded)
height 88: unbonded (bondedHeight: 75, unbondingHeight: 88) -> simulation fails because TM does not have validatorA.

So validatorA does not go from zero, to non-zero, back to zero as discussed in #2188 it seems, but something else seems to be happening. Correct me if I'm wrong.

/cc @cwgoes @rigelrozanski

ValarDragon · 2018-09-05T17:27:11Z

Since the test_sim_gaia_fast now passes, this implies that this did fix a bug! Lets leave the issue open though until we solve the second issue as well.

alexanderbez · 2018-09-05T17:30:14Z

Ok! Should we update that issue with my notes?

ValarDragon · 2018-09-05T18:39:06Z

Please do!

rigelrozanski

I find the design of this fix to be counterintuitive. What's been implemented here is a fix where at the time of retrieving the Tendermint Updates we ad-hoc perform a check on whether it should actually be sent to tendermint for an update or not. This logic is tucked away in the get method (there are no other Get methods with custom filter logic like this)

A better fix is one where the Tendermint updates simply do not get added to the TendermintUpdates store if it's not appropriate to do so (see this method

cosmos-sdk/x/stake/keeper/validator.go

Line 637 in 53253aa

bzABCI := k.cdc.MustMarshalBinary(validator.ABCIValidatorZero())

) -

To know whether or not a particular validator was bonded in the previous block (and therefore should in fact be removed from tendermint) can be determined using the current height (from the ctx) and the validator's fields BondHeight and UnbondingHeight - before the unbonding height has been updated for this transaction if BondHeight > UnbondingHeight && BondHeight > CurrentHeight then the Tendermint update to 0 should be sent, all other situations it should not be sent (I think)

alexanderbez · 2018-09-06T13:12:42Z

@rigelrozanski thanks for the review!

To be quite honest, I don't really follow what you're suggesting in terms of how to change this, but I'll try to respond based on my understanding as is:

I find the design of this fix to be counterintuitive... This logic is bizarre and tucked away in the get method...

Mhmmm perhaps, but I'd have to respectfully disagree in the sense that it is "bizarre". While it might not be the optimal solution in terms of placement, I think there are a handful solutions that would get this job done and the one provided is one of them. I'm certainly open to more optimal, rather more clear and concise, solutions.

A better fix is one where the Tendermint updates simply do not get added to the Tendermint Updates store if it's not appropriate to do so...

Actually, this was my first inclination and attempt -- I didn't even want to modify GetTendermintUpdates (as you're suggesting). However, it's not that simple. There are 5 occurrences where store.Set(GetTendermintUpdatesKey(...), bz) is invoked, all with different and unique bits of branching logic.

e.g. a validator going from zero to non-zero power would trigger this update procedure. Then when it goes back to zero, it's already in the update store, so having this logic there would do us no good.

Please clarify if this is what you had in mind? Maybe I'm completely misunderstanding and I could very well be totally wrong here and a more concise solution exists.

To know whether or not a particular validator was bonded in the previous block ... - after before the unbonding height has been updated for this transaction ...

Hmm, wording is a bit off I think and it's confusing me. Are you suggesting the following?

When unbonding a validator, check if: BondHeight > UnbondingHeight && BondHeight > CurrentHeight. If so, do not update the Tendermint updates store.

rigelrozanski · 2018-09-07T17:23:29Z

you're right - the logic is not bizarre (bad use of language) - it's really good! (my bad!) However let me attempt to explain my idea clearer.

RE: confusing wording sorry there was a typo, just edited after before -> before

When unbonding a validator, check if: BondHeight > UnbondingHeight && BondHeight > CurrentHeight. If so, do not update the Tendermint updates store.

not quite, my pseudocode in the next comment should clear things up though

There are 5 occurrences where store.Set(GetTendermintUpdatesKey(...), bz) is invoked, all with different and unique bits of branching logic.

yes but there is only one occurrence where it should be updating for a zero validator (the one which I linked too)... It's possible that there are edge cases on the other 4 occurrences where the validator is also being updated as a zero validator (maybe not?) - either way what I think is a more optimal design is to create a special function which contains all the set logic for the tendermint store. so throughout the code rather than calling store.Set(GetTendermintUpdatesKey(...), bz) directly we would call this function instead. Something like (rough code):

func SetTendermintStore(ctx sdk.Context, validator stake.Validator, abciVal stake.AbciVal) {
   store := ctx.GetStore()

   if abciVal.Amount.IsZero() && 
      validator.BondHeight < UnbondingHeight{
      // already unbonded
      return   
   }  
   bz := marshal(abciVal) 
   store.Set(GetTendermintUpdatesKey(valAddr), bz)
   return
}

alexanderbez · 2018-09-07T19:32:36Z

Per discussion with @jaekwon @cwgoes and @rigelrozanski, we've concluded it's best and safer to keep the new logic as is but change the naming of GetTendermintUpdates to be more indicative of it's underlying logic.

cwgoes

utACK

Discussed in person.

cwgoes · 2018-09-08T09:09:58Z

@alexanderbez Let's separate #2238 (comment) into a separate issue.

cwgoes · 2018-09-08T09:11:13Z

Ref #2277

cwgoes · 2018-09-08T09:14:13Z

Should we merge this despite it failing the simulation? I guess not, but it does definitely fix a bug.

Let's work on the simulation fix in a separate PR targeted to this one to separate concerns.

rigelrozanski

Good work on the design :)

alexanderbez · 2018-09-10T12:47:41Z

Will do @cwgoes

* Add a unit test showing invalid TM validator updates with bonded vals * Re-add defensive check in UpdateBondedValidators for jailed validators * Cleanup and set bond height in keeper#bondValidator * Get pool after getting context with new block height in unit test * Fix broken unit tests * Update prevBonded value * Rename validator to oldValidator in bondIncrement

cwgoes

Tested ACK

Ideas for future simplification: #2312

Aleksandr Bezobchuk added 3 commits September 4, 2018 17:08

Update GetTendermintUpdates

8e79209

Only return ABCI validators that match certain criteria

Fix previous bug on missing ABCI validator updates and fix unit tests

a04c52b

Add fatal to enforce validator update invariant to simulation

fd26187

alexanderbez added T:Bug C:x/staking prelaunch labels Sep 4, 2018

alexanderbez requested review from cwgoes, ebuchman and rigelrozanski as code owners September 4, 2018 22:55

alexanderbez commented Sep 4, 2018

View reviewed changes

alexanderbez mentioned this pull request Sep 5, 2018

Gaiad network crash in executing redelegation transactions #2241

Closed

Update GetTendermintUpdates contract

540bae4

Aleksandr Bezobchuk added 3 commits September 5, 2018 09:52

Implement TestGetTendermintUpdatesBonded unit test

cdc288a

Add additional case to TestGetTendermintUpdatesBonded

9dc4930

Update pending log

b0d25af

alexanderbez changed the title ~~WIP: Ensure Tendermint Validator Update Invariants~~ R4R: Ensure Tendermint Validator Update Invariants Sep 5, 2018

alexanderbez added ready-for-review and removed wip labels Sep 5, 2018

abelliumnt reviewed Sep 5, 2018

View reviewed changes

x/stake/keeper/validator.go Outdated Show resolved Hide resolved

alexanderbez mentioned this pull request Sep 5, 2018

WIP: Demonstrate staking bug #2188

Closed

rigelrozanski previously requested changes Sep 5, 2018

View reviewed changes

Rename GetTendermintUpdates

05de308

alexanderbez requested a review from zramsay as a code owner September 7, 2018 19:20

Aleksandr Bezobchuk added 2 commits September 7, 2018 15:22

Merge branch 'develop' into bez/2189-del-non-exist-val-bug

03de311

Fix linting

ff4ffff

Merge branch 'develop' into bez/2189-del-non-exist-val-bug

ebb2b2e

cwgoes approved these changes Sep 8, 2018

View reviewed changes

cwgoes mentioned this pull request Sep 8, 2018

SDK 0.25 / Gaia-9000 release checklist #2220

Closed

23 tasks

cwgoes mentioned this pull request Sep 8, 2018

Invalid validator state in simulation #2277

Closed

rigelrozanski approved these changes Sep 9, 2018

View reviewed changes

alexanderbez mentioned this pull request Sep 10, 2018

R4R: Ensure Tendermint Validator Update Invariants - Part II #2298

Merged

5 tasks

alexanderbez and others added 3 commits September 12, 2018 14:45

Merge branch 'develop' into bez/2189-del-non-exist-val-bug

c749791

Fix tiny x/mock issue from merge

c16f33f

cwgoes force-pushed the bez/2189-del-non-exist-val-bug branch from 602e626 to c16f33f Compare September 12, 2018 07:08

cwgoes approved these changes Sep 12, 2018

View reviewed changes

cwgoes mentioned this pull request Sep 12, 2018

simulation: Add weighted operation #2303

Merged

2 tasks

cwgoes merged commit 854aca2 into develop Sep 12, 2018

cwgoes deleted the bez/2189-del-non-exist-val-bug branch September 12, 2018 07:16

cwgoes mentioned this pull request Sep 12, 2018

Randomized Fuzzer: should not already be unbonded or unbonding, validator: #2290

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

R4R: Ensure Tendermint Validator Update Invariants #2238

R4R: Ensure Tendermint Validator Update Invariants #2238

alexanderbez commented Sep 4, 2018 •

edited

Loading

alexanderbez Sep 4, 2018

codecov bot commented Sep 5, 2018 •

edited

Loading

ValarDragon commented Sep 5, 2018 •

edited

Loading

alexanderbez commented Sep 5, 2018

alexanderbez commented Sep 5, 2018 •

edited

Loading

alexanderbez commented Sep 5, 2018 •

edited

Loading

ValarDragon commented Sep 5, 2018

alexanderbez commented Sep 5, 2018

ValarDragon commented Sep 5, 2018

rigelrozanski left a comment •

edited

Loading

alexanderbez commented Sep 6, 2018

rigelrozanski commented Sep 7, 2018

alexanderbez commented Sep 7, 2018

cwgoes left a comment

cwgoes commented Sep 8, 2018

cwgoes commented Sep 8, 2018

cwgoes commented Sep 8, 2018

rigelrozanski left a comment

alexanderbez commented Sep 10, 2018

cwgoes left a comment

R4R: Ensure Tendermint Validator Update Invariants #2238

R4R: Ensure Tendermint Validator Update Invariants #2238

Conversation

alexanderbez commented Sep 4, 2018 • edited Loading

alexanderbez Sep 4, 2018

Choose a reason for hiding this comment

codecov bot commented Sep 5, 2018 • edited Loading

Codecov Report

ValarDragon commented Sep 5, 2018 • edited Loading

alexanderbez commented Sep 5, 2018

alexanderbez commented Sep 5, 2018 • edited Loading

alexanderbez commented Sep 5, 2018 • edited Loading

ValarDragon commented Sep 5, 2018

alexanderbez commented Sep 5, 2018

ValarDragon commented Sep 5, 2018

rigelrozanski left a comment • edited Loading

Choose a reason for hiding this comment

alexanderbez commented Sep 6, 2018

rigelrozanski commented Sep 7, 2018

alexanderbez commented Sep 7, 2018

cwgoes left a comment

Choose a reason for hiding this comment

cwgoes commented Sep 8, 2018

cwgoes commented Sep 8, 2018

cwgoes commented Sep 8, 2018

rigelrozanski left a comment

Choose a reason for hiding this comment

alexanderbez commented Sep 10, 2018

cwgoes left a comment

Choose a reason for hiding this comment

alexanderbez commented Sep 4, 2018 •

edited

Loading

codecov bot commented Sep 5, 2018 •

edited

Loading

ValarDragon commented Sep 5, 2018 •

edited

Loading

alexanderbez commented Sep 5, 2018 •

edited

Loading

alexanderbez commented Sep 5, 2018 •

edited

Loading

rigelrozanski left a comment •

edited

Loading