Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

R4R: Ensure Tendermint Validator Update Invariants #2238

Merged
merged 14 commits into from
Sep 12, 2018

Conversation

alexanderbez
Copy link
Contributor

@alexanderbez alexanderbez commented Sep 4, 2018

  • Update GetTendermintUpdates to enforce the following invariants:
    • If the validator is found in the staking store, only validators with non-zero power or zero-power that were bonded at the previous block height are returned to Tendermint.
    • Otherwise, the validator is simply returned to Tendermint (meaning it was removed from the staking store).

closes: #2189, #2241
ref: #2188


  • Targeted PR against correct branch (see CONTRIBUTING.md)

  • Linked to github-issue with discussion and accepted design OR link to spec that describes this work.

  • Wrote tests

  • Updated relevant documentation (docs/)

  • Added entries in PENDING.md with issue #

  • rereviewed Files changed in the github PR explorer


For Admin Use:

  • Added appropriate labels to PR (ex. wip, ready-for-review, docs)
  • Reviewers Assigned
  • Squashed all commits, uses message "Merge pull request #XYZ: [title]" (coding standards)

if found {
// The validator is new or already exists in the store and must adhere to
// Tendermint invariants.
prevBonded := val.BondHeight < ctx.BlockHeight()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if this check is sound, but on the surface is seems correct.

@codecov
Copy link

codecov bot commented Sep 5, 2018

Codecov Report

❗ No coverage uploaded for pull request base (develop@fb0cc0b). Click here to learn what that means.
The diff coverage is 78.26%.

@@            Coverage Diff             @@
##             develop    #2238   +/-   ##
==========================================
  Coverage           ?   63.75%           
==========================================
  Files              ?      140           
  Lines              ?     8652           
  Branches           ?        0           
==========================================
  Hits               ?     5516           
  Misses             ?     2756           
  Partials           ?      380

@alexanderbez alexanderbez changed the title WIP: Ensure Tendermint Validator Update Invariants R4R: Ensure Tendermint Validator Update Invariants Sep 5, 2018
@ValarDragon
Copy link
Contributor

ValarDragon commented Sep 5, 2018

Can you disable governance txs (so it runs fast) and run test sim Gaia slow locally

@alexanderbez
Copy link
Contributor Author

@ValarDragon sure thing!

@alexanderbez
Copy link
Contributor Author

alexanderbez commented Sep 5, 2018

Too good to be true! Still seems like an issue. Will look into it.

Update: Was able to reproduce with 500 blocks and a seed of 1536164987760853304 running ^TestStakeWithRandomMessages$ only.

@alexanderbez
Copy link
Contributor Author

alexanderbez commented Sep 5, 2018

Ok, so what I have implemented does indeed fix and address the issue at hand, but this may be something else. Perhaps related to #2241.

I can follow the events as such, given validatorA:

  • height 16: bonded (bondedHeight: 6, unbondingHeight: 0)
  • height 18: unbonded (bondedHeight: 6, unbondingHeight: 18)
  • height 56: bonded (bondedHeight: 56, unbondingHeight: 18)
  • height 68: unbonded (bondedHeight: 56, unbondingHeight: 68)
  • height 75: ??? (validatorA not sent in GetTendermintUpdates -- should be since next line indicates it's unbonded)
  • height 88: unbonded (bondedHeight: 75, unbondingHeight: 88) -> simulation fails because TM does not have validatorA.

So validatorA does not go from zero, to non-zero, back to zero as discussed in #2188 it seems, but something else seems to be happening. Correct me if I'm wrong.

/cc @cwgoes @rigelrozanski

@ValarDragon
Copy link
Contributor

Since the test_sim_gaia_fast now passes, this implies that this did fix a bug! Lets leave the issue open though until we solve the second issue as well.

@alexanderbez
Copy link
Contributor Author

Ok! Should we update that issue with my notes?

@ValarDragon
Copy link
Contributor

Please do!

Copy link
Contributor

@rigelrozanski rigelrozanski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find the design of this fix to be counterintuitive. What's been implemented here is a fix where at the time of retrieving the Tendermint Updates we ad-hoc perform a check on whether it should actually be sent to tendermint for an update or not. This logic is tucked away in the get method (there are no other Get methods with custom filter logic like this)

A better fix is one where the Tendermint updates simply do not get added to the TendermintUpdates store if it's not appropriate to do so (see this method

bzABCI := k.cdc.MustMarshalBinary(validator.ABCIValidatorZero())
) -

To know whether or not a particular validator was bonded in the previous block (and therefore should in fact be removed from tendermint) can be determined using the current height (from the ctx) and the validator's fields BondHeight and UnbondingHeight - before the unbonding height has been updated for this transaction if BondHeight > UnbondingHeight && BondHeight > CurrentHeight then the Tendermint update to 0 should be sent, all other situations it should not be sent (I think)

@alexanderbez
Copy link
Contributor Author

@rigelrozanski thanks for the review!

To be quite honest, I don't really follow what you're suggesting in terms of how to change this, but I'll try to respond based on my understanding as is:


I find the design of this fix to be counterintuitive... This logic is bizarre and tucked away in the get method...

Mhmmm perhaps, but I'd have to respectfully disagree in the sense that it is "bizarre". While it might not be the optimal solution in terms of placement, I think there are a handful solutions that would get this job done and the one provided is one of them. I'm certainly open to more optimal, rather more clear and concise, solutions.

A better fix is one where the Tendermint updates simply do not get added to the Tendermint Updates store if it's not appropriate to do so...

Actually, this was my first inclination and attempt -- I didn't even want to modify GetTendermintUpdates (as you're suggesting). However, it's not that simple. There are 5 occurrences where store.Set(GetTendermintUpdatesKey(...), bz) is invoked, all with different and unique bits of branching logic.

e.g. a validator going from zero to non-zero power would trigger this update procedure. Then when it goes back to zero, it's already in the update store, so having this logic there would do us no good.

Please clarify if this is what you had in mind? Maybe I'm completely misunderstanding and I could very well be totally wrong here and a more concise solution exists.

To know whether or not a particular validator was bonded in the previous block ... - after before the unbonding height has been updated for this transaction ...

Hmm, wording is a bit off I think and it's confusing me. Are you suggesting the following?

When unbonding a validator, check if: BondHeight > UnbondingHeight && BondHeight > CurrentHeight. If so, do not update the Tendermint updates store.

@rigelrozanski
Copy link
Contributor

you're right - the logic is not bizarre (bad use of language) - it's really good! (my bad!) However let me attempt to explain my idea clearer.

  • RE: confusing wording sorry there was a typo, just edited after before -> before

When unbonding a validator, check if: BondHeight > UnbondingHeight && BondHeight > CurrentHeight. If so, do not update the Tendermint updates store.

not quite, my pseudocode in the next comment should clear things up though

There are 5 occurrences where store.Set(GetTendermintUpdatesKey(...), bz) is invoked, all with different and unique bits of branching logic.

yes but there is only one occurrence where it should be updating for a zero validator (the one which I linked too)... It's possible that there are edge cases on the other 4 occurrences where the validator is also being updated as a zero validator (maybe not?) - either way what I think is a more optimal design is to create a special function which contains all the set logic for the tendermint store. so throughout the code rather than calling store.Set(GetTendermintUpdatesKey(...), bz) directly we would call this function instead. Something like (rough code):

func SetTendermintStore(ctx sdk.Context, validator stake.Validator, abciVal stake.AbciVal) {
   store := ctx.GetStore()

   if abciVal.Amount.IsZero() && 
      validator.BondHeight < UnbondingHeight{
      // already unbonded
      return   
   }  
   bz := marshal(abciVal) 
   store.Set(GetTendermintUpdatesKey(valAddr), bz)
   return
}

@alexanderbez
Copy link
Contributor Author

Per discussion with @jaekwon @cwgoes and @rigelrozanski, we've concluded it's best and safer to keep the new logic as is but change the naming of GetTendermintUpdates to be more indicative of it's underlying logic.

Copy link
Contributor

@cwgoes cwgoes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK

@cwgoes cwgoes dismissed rigelrozanski’s stale review September 8, 2018 09:05

Discussed in person.

@cwgoes cwgoes mentioned this pull request Sep 8, 2018
23 tasks
@cwgoes
Copy link
Contributor

cwgoes commented Sep 8, 2018

@alexanderbez Let's separate #2238 (comment) into a separate issue.

@cwgoes
Copy link
Contributor

cwgoes commented Sep 8, 2018

Ref #2277

@cwgoes
Copy link
Contributor

cwgoes commented Sep 8, 2018

Should we merge this despite it failing the simulation? I guess not, but it does definitely fix a bug.

Let's work on the simulation fix in a separate PR targeted to this one to separate concerns.

Copy link
Contributor

@rigelrozanski rigelrozanski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good work on the design :)

@alexanderbez
Copy link
Contributor Author

Will do @cwgoes

alexanderbez and others added 3 commits September 12, 2018 14:45
* Add a unit test showing invalid TM validator updates with bonded vals

* Re-add defensive check in UpdateBondedValidators for jailed validators

* Cleanup and set bond height in keeper#bondValidator

* Get pool after getting context with new block height in unit test

* Fix broken unit tests

* Update prevBonded value

* Rename validator to oldValidator in bondIncrement
@cwgoes cwgoes force-pushed the bez/2189-del-non-exist-val-bug branch from 602e626 to c16f33f Compare September 12, 2018 07:08
Copy link
Contributor

@cwgoes cwgoes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested ACK

Ideas for future simplification: #2312

@cwgoes cwgoes mentioned this pull request Sep 12, 2018
2 tasks
@cwgoes cwgoes merged commit 854aca2 into develop Sep 12, 2018
@cwgoes cwgoes deleted the bez/2189-del-non-exist-val-bug branch September 12, 2018 07:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

x/staking: A sequence of transactions causes one to delete a non-existent validator
5 participants