Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

epoching: event and hook upon a certain threshold amount of slashed voting power #35

Merged
merged 57 commits into from
Jul 8, 2022

Conversation

SebastianElvis
Copy link
Member

@SebastianElvis SebastianElvis commented Jun 29, 2022

Fixes BM-44

This PR introduces the event and hook upon a certain threshold (at the moment 1/3 and 2/3) of voting power is slashed in a single epoch. The epoching module counts slashed validators by subscribing the BeforeValidatorSlashed hook of the staking module.

Copy link
Contributor

@aakoshh aakoshh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good progress! We just need to nail down the fraction and power handling.


epochNumber := h.k.GetEpochNumber(ctx)
numSlashedVals := h.k.GetSlashedValidatorSetSize(ctx, epochNumber)
numMaxVals := h.k.stk.GetParams(ctx).MaxValidators
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think MaxValidators just shows how many there can't be more of, perhaps 100 by default. But if we only have, say, 20, we should use 20.

There is an API available externally to return the validator set by block height. Would be nice if it was available internally as well, and we could call it with the first block height where the new validator set is active in the epoch.

In any case, as discussed, it should be based on total power.

Comment on lines 52 to 53
// increment set size
size := k.GetSlashedValidatorSetSize(ctx, epochNumber)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my other comment about fractionals and double entries.

prefix := append(types.SlashedValidatorKey, epochNumberBytes...)

// add each valAddr, which is the key
// the prefix `SlashedValidatorKey || epochNumber` has been excluded
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh really, what does the exclusion?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will be more specific in the code comment. Namely, let SlashedValidatorKey || epochNumber || valAddr be the key, if we create an iterator with prefix SlashedValidatorKey || epochNumber, then each iterated item's key will be valAddr. Perhaps "stripped" will be a better word here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could be wrong but I don't believe this to be true. Looking at how KVStorePrefixIterator is implemented, it just calls the underlying kvs.Iterate function which goes from a start to and end.

I don't see any code that would do any stripping, while prefix.NewStore does exactly that.

defer iterator.Close()
for ; iterator.Valid(); iterator.Next() {
key := iterator.Key()
store.Delete(key)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously you commented that this key will have been trimmed of its prefix and it can be parsed as a validator address. It doesn't sound like it can be used to delete a value from store.

@vitsalis and @gitferry use prefix.NewStore to derive versions of the store that can I think handle this sort of thing, but this looks like a vanilla store that expects the full key.

Copy link
Member Author

@SebastianElvis SebastianElvis Jul 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vitsalis and @gitferry use prefix.NewStore to derive versions of the store that can I think handle this sort of thing, but this looks like a vanilla store that expects the full key.

This is doable and has the same effect as the current one. Osmosis does the same as here (e.g., https://github.com/osmosis-labs/osmosis/blob/v10.0.0/x/epochs/keeper/epoch.go#L13-L26) while Cosmos SDK uses NewStore and PrefixStore more often.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code you are pointing at in Osmosis is adding a prefix to the epoch and uses that as a key. That is perfectly fine, and we rigthfully use the same technique.

What I was trying to highlight is that store doesn't know about prefixes unless you put them in the key, or you use prefix.NewStore. If what you said about iterating with KVStorePrefixIterator was true, that it strips the prefix, then surely the key it returns would _not_ work with store.Delete` because it would be just the tail of the key, and it would need to be prefixed again.

That would be a poor API, and that's why they have the prefix.NewStore wrapper to take care of this.

So, this code is fine, I'm just saying that what you said sounds like a contradiction.

@SebastianElvis
Copy link
Member Author

SebastianElvis commented Jul 8, 2022

Now this PR counts voting power rather than number of slashed validators. It does so by maintaining a new store on the validator set in the beginning of each epoch. This PR also replaces the append approach with the prefix one as per yesterday's discussion. Feel free to have a look again!

NOTE: although the validator set size is not usued at the moment, I kept it since it may be useful in the future.
After a second thought I changed maintaining the slashed validator set size to maintaining the slashed voting power. This way we don't need to look up the entire validator set everytime upon slashing a validator.

@SebastianElvis SebastianElvis changed the title epoching: event and hook upon a certain threshold number of slashed validators epoching: event and hook upon a certain threshold amont of slashed voting power Jul 8, 2022
@SebastianElvis SebastianElvis changed the title epoching: event and hook upon a certain threshold amont of slashed voting power epoching: event and hook upon a certain threshold amount of slashed voting power Jul 8, 2022
store := ctx.KVStore(k.storeKey)

// remove all epoch msgs
iterator := sdk.KVStorePrefixIterator(ctx.KVStore(k.storeKey), types.QueuedMsgKey)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
iterator := sdk.KVStorePrefixIterator(ctx.KVStore(k.storeKey), types.QueuedMsgKey)
iterator := sdk.KVStorePrefixIterator(store, types.QueuedMsgKey)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An alternative would be to use the "state" approach @gitferry adopted and create a prefix.NewStore(store, types.QueuedMsgKey) to be available in the Keeper/State for its whole lifetime, and then you can just call k.queuedMsgs.Iterate(nil, nil)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will work, but it seems that Cosmos SDK and Osmosis create a new KVStore handle for each keeper function. Not sure about the pitfalls or concerns here, so perhaps let's follow their approach in this PR at the moment.

sdkerrors "github.com/cosmos/cosmos-sdk/types/errors"
)

// setSlashedVotingPower sets the slashed voting power
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// setSlashedVotingPower sets the slashed voting power
// setSlashedVotingPower sets the total sum of voting power that has been slashed in the epoch

I assume that's what it is. I'd try to avoid docstrings that literally repeat what the function name says, just using space instead of camelCase, without explaining what it really means.

Comment on lines 63 to 65
// insert KV pair, where
// - key: valAddr
// - value: empty
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// insert KV pair, where
// - key: valAddr
// - value: empty
// insert into "set of addresses" as KV pair, where
// - key: valAddr
// - value: empty

Comment on lines +89 to +90
// ClearSlashedValidators removes all slashed validators in the set
// TODO: This is called upon the epoch is checkpointed
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this exists, there should probably be one for the validator set power as well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean GetSlashedVotingPower here or GetTotalVotingPower in epoch_val_set.go?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I mean if you delete the slashed, it's probably because you also don't need the initial set any more.

I just wanted to mention it, no need to add it right away.

)

// GetValidatorSet returns the set of validators of a given epoch
func (k Keeper) GetValidatorSet(ctx sdk.Context, epochNumber sdk.Uint) map[string]int64 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
func (k Keeper) GetValidatorSet(ctx sdk.Context, epochNumber sdk.Uint) map[string]int64 {
func (k Keeper) GetValidatorSet(ctx sdk.Context, epochNumber sdk.Uint) map[Address]int64 {

If we have a newtype for this, we should use it. We discussed with @gitferry but I'm not sure it's available here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that the staking module uses sdk.ValAddress

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's ideal if we can use sdk.ValAddress. The problem is that sdk.ValAddress is basically []byte and thus cannot be used as a key of a map.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I defined a string type called ValidatorAddress in checkpointing/types but I would prefer to use sdk.ValAddress. In this case, maybe it would be better to define a newtype string in the epoching module

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be better to return an array of validators instead of a map? the checkpointing module needs to find the index of each validator to build bitmap. If the returned set is a map, how to know the index of each validator?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps next PR I will wrap a function for this purpose (e.g., return a slice of addresses ordered by their voting power). At the moment this function aims at getting the voting power distribution at the beginning of each epoch.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably ordering by address is more stable. Not sure you can rely on Go to deterministically order validators with equal power across different machines.

Comment on lines 148 to 159
iterator := k.stk.LastValidatorsIterator(ctx)
defer iterator.Close()

for ; iterator.Valid(); iterator.Next() {
// extract the validator address from the key (prefix is 1-byte, addrLen is 1-byte)
valAddr := stakingtypes.AddressFromLastValidatorPowerKey(iterator.Key())
valAddrStr := string(valAddr)
powerBytes := iterator.Value()
var power sdk.Int
if err := power.Unmarshal(powerBytes); err != nil {
return nil, sdkerrors.Wrap(types.ErrUnmarshal, err.Error())
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could use IterateLastValidatorPowers to let the stakign module do all the parsing.

// voting power of this validator
thisVotingPower, ok := validatorSet[valAddr.String()]
if !ok {
panic(types.ErrUnknownValidator)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might actually happen.

We discussed that inactivity slashing can push a validator off he edge, where its stake becomes so small that it swaps places with the 101st validator. Then that validator could be slashed for inactivity, swapping places again.

In that scenario, the 101st validator was not in the validator set in the beginning of the epoch, so you won't now what power it had.

We should just return if we can't find a validator, treat it like its power was 0. Maybe issue a warning if you believe this might be happening for the wrong reasons, or just do enough testing to convince yourself that it won't.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch! The immediate slashing will lead to such scenarios. Agree with the approach of considering this new validator in the edge to have no voting power, as Babylon considers the validator set in the epoch beginning to be the validator set throughout this epoch.

Comment on lines 46 to 48
thresholds := []float64{}
thresholds = append(thresholds, float64(1)/float64(3))
thresholds = append(thresholds, float64(2)/float64(3))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
thresholds := []float64{}
thresholds = append(thresholds, float64(1)/float64(3))
thresholds = append(thresholds, float64(2)/float64(3))
thresholds := []float64{float64(1)/float64(3), float64(2)/float64(3)}

I thought it can do this.

Copy link
Member Author

@SebastianElvis SebastianElvis Jul 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I used to write this but there were too many float64 in a single line...

Copy link
Contributor

@aakoshh aakoshh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking very good, well done figuring out the power storing stuff!

I see no blockers, except that panic which I think should be anticipated.

@SebastianElvis
Copy link
Member Author

I have fixed or replied the follow-up comments in this PR. Feel free to review again, thanks Akosh!

Copy link
Contributor

@aakoshh aakoshh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is good now, although it seems to have grown, I'm sure it's fine 👍

@SebastianElvis SebastianElvis merged commit 8667398 into main Jul 8, 2022
@SebastianElvis SebastianElvis deleted the epoching-slash-panicing branch July 8, 2022 13:30
@SebastianElvis
Copy link
Member Author

Wow, a small ticket leads to such a huge PR 🤣

Comment on lines 27 to +33
if uint64(ctx.BlockHeight())-1 == epochBoundary.Uint64() {
// increase epoch number
incEpochNumber := k.IncEpochNumber(ctx)
// init the slashed voting power of this new epoch
k.InitSlashedVotingPower(ctx)
// store the current validator set
k.InitValidatorSet(ctx)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you certain that epoch boundary + 1 is the right time to take a validator snapshot? Not boundary + 2 by any chance?

I'll have to read https://babylon-chain.atlassian.net/wiki/spaces/BABYLON/pages/10518656/4.+Validator+Power+Transfer+Design again to remind myself.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on what I wrote in that document, the block at boundary+1 is still produced by the old validator set, but already reflects the next one in the header. Boundary + 2 is the first produced by the new guys.

If it was the case that it's still the old guard at this point, we just have to make sure that if anyone asks based on height then the correct validator set is still returned, ie. we are in epoch N+1 already, but at boundary+1 the validator set is still the one for epoch N. Also for slashing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh but I forgot that we aren't asking Tendermint here, we're asking the staking module, so hopefully it's fine!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah boundary+1 looks to be the right choice to me. The laziness of Tendermint consensus does not affect our logic to me, since Cosmos SDK has abstracted most things about Tendermint consensus away. For example, here the (boundary+1)-th block (which takes a validator set snapshot) in Cosmos' view will be executed at height (boundary+2) in Tendermint's view.

KonradStaniec added a commit that referenced this pull request Oct 25, 2023
* Add query for header main chain depth
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants