Gaiad network crash in executing redelegation transactions #2241
Comments
Interesting. Thanks for submitting this @HaoyangLiu. We can take a look at this ASAP. /cc @rigelrozanski |
Btw @HaoyangLiu, you can use the Was able to reproduce this on a local testnet (4 nodes) by simply calling |
@alexanderbez |
@HaoyangLiu I'd like to look through the staking code more to understand this. |
I'm thinking this should ideally be handled in the SDK state machine. I don't think the second tx should be valid and make it through under these circumstances. Seems like we might need to take a look at Side question, is it valid to create a bunch of |
Thanks for the submission - I'll look further into this issue. This should be fixed on the SDK side not the tendermint side - Related to #2188 |
Thanks @HaoyangLiu - #2238 will fix this, just as you suggest (the latter option): checking whether a validator is bonded or not before we tell Tendermint to remove it. We still want Tendermint to only delete validators previously in the validator set because it serves as an additional sanity check on the SDK staking state machine. |
Awesome! Hope the PR can be merged soon. |
Oh wow - pretty crazy that this crashes the blockchain - Redelegation is not feature complete in that it only supports a single redelegation per delegator-sourceValidator pair (see #1402). I've got a strong feeling it's related to this ^ but it's a bug nonetheless, as this shouldn't cause a crash; it should just be a failed transaction for the time being... @bez or @HaoyangLiu does somebody have output logs they could post here? |
Yes, I will paste logs shortly...and try to tackle this. |
Log:
|
That log looks like the error that we're attempting to remove a cliff validator which is not part of the validator set (aka a cliff bug) |
@rigelrozanski Are you sure? I think the analysis of @HaoyangLiu is correct -- this has to do with rounding of power. In any case, I'll try to replicate this via a unit test. |
hmm well maybe not a cliff validator issue (could very well be rounding!) - but it definitely looks like we're sending a 0 validator update for a non-existent validator, which is killing Tendermint. |
@rigelrozanski yes indeed! I can send two REDs:
return abci.Validator{
PubKey: tmtypes.TM2PB.PubKey(v.ConsPubKey),
Address: v.ConsPubKey.Address(),
Power: v.BondedTokens().RoundInt64(),
}
So I suppose we should not allow a redelegation if your rounded power is zero? |
If a validator's rounded voting power is zero, Tendermint will remove it but the staking module will still keep it. As a result, the validator sets of Tendermint and staking conflict. So how about just removing the validator in staking if its rounded voting power is zero? |
This is a really extreme edge case; I hope we never see a validator with one voting power in production. That said, the validator still persists on the staking side of things, so all staking-related functionality for a bonded validator should be available. REALLY the only difference should be that the Tendermint update is not sent; redelegation should still be possible (as the validator is still there and real! - just not in Tendermint). As well, the validator should NOT simply be removed; those funds should be kept and maintained until withdrawn. (Of course you cannot withdraw less than one token, however it can be redelegated to another validator and then withdrawn.) |
We can't do this as the validator is still bonded and rightfully so. What @rigelrozanski is suggesting is cleaner and better approach. I'll amend my PR shortly. Thanks all 👍 |
Although I had a PR out that fixed this, we'll be addressing this and other issues via #2312. |
Actually, re-opening this to keep track of the issue. |
This will be fixed in #2394, which checks the rounded power before determining whether or not to send Tendermint a validator update. Thanks @HaoyangLiu. |
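A minimal sketch of the idea in the comment above, not the actual #2394 change: the staking side remembers which validators Tendermint currently knows about and only emits a zero-power update for those. The `tracker` type, its `lastSent` map, and `maybeUpdate` are hypothetical names invented for illustration.

```go
package main

import "fmt"

// update is a toy stand-in for a validator update sent to Tendermint.
type update struct {
	Name  string
	Power int64
}

// tracker (hypothetical) remembers the last rounded power reported to
// Tendermint for each validator that Tendermint currently holds.
type tracker struct {
	lastSent map[string]int64
}

// maybeUpdate decides whether a rounded power needs to be reported.
// A zero-power (removal) update is only produced for validators that
// Tendermint still knows about; an unchanged power produces no update.
func (t *tracker) maybeUpdate(name string, power int64) (update, bool) {
	prev, known := t.lastSent[name]
	if power == 0 {
		if !known {
			// Already absent from Tendermint's set: nothing to remove.
			return update{}, false
		}
		delete(t.lastSent, name)
		return update{Name: name, Power: 0}, true
	}
	if known && prev == power {
		return update{}, false
	}
	t.lastSent[name] = power
	return update{Name: name, Power: power}, true
}

func main() {
	t := &tracker{lastSent: map[string]int64{"validatorB": 1}}

	// First redelegation: rounded power drops to zero, removal is sent.
	_, sent := t.maybeUpdate("validatorB", 0)
	fmt.Println("first update sent: ", sent)

	// Second redelegation: rounded power is still zero, nothing is sent.
	_, sent = t.maybeUpdate("validatorB", 0)
	fmt.Println("second update sent:", sent)
}
```

In this sketch the validator can stay in staking state with its fractional tokens while Tendermint never sees a second removal.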
Summary of Bug
When I tried to send redelegate (maybe unbond can also reproduce this issue?) transactions twice between two given validators, the blockchain network encountered a consensus failure and failed to produce new blocks.
The error log above comes from tendermint/state/execution.go
Code to reproduce
Steps to Reproduce
Analysis of Bug
Currently, in the staking module, we use `sdk.Dec` as the data type for token amounts and shares. When calculating voting power, we convert it to an `int64`.
After the first redelegation is done, the remaining bonded tokens on validatorB are `0.025`, and the equivalent voting power is zero. So once the first redelegation is done, validatorB is removed from the validator set in Tendermint.
When the second redelegation transaction is executed, `EndBlock` produces a new validator set change: set validatorB's voting power to zero again.
However, validatorB has already been removed from the validator set. The second remove operation causes the fatal error.
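The sequence described in this analysis can be reproduced in a toy model. This is only a sketch under stated assumptions: `milliTokens` is a hypothetical fixed-point stand-in for `sdk.Dec`, `roundedPower` rounds half up as a simplification of `RoundInt64`, and `validatorSet` imitates a strict validator set that refuses to remove an unknown validator. None of these are the real SDK or Tendermint types.

```go
package main

import (
	"errors"
	"fmt"
)

// milliTokens is a hypothetical fixed-point amount in thousandths of a
// token, standing in for sdk.Dec.
type milliTokens int64

// roundedPower rounds a non-negative amount to the nearest whole token
// (half up; a simplification of the real rounding).
func roundedPower(t milliTokens) int64 {
	return (int64(t) + 500) / 1000
}

// validatorSet is a toy stand-in for Tendermint's validator set.
type validatorSet map[string]int64

var errUnknownValidator = errors.New("cannot remove a validator that is not in the set")

// applyUpdate treats power 0 as a removal and rejects removing a
// validator that is not present, mirroring the strict behaviour
// described in the analysis.
func (vs validatorSet) applyUpdate(name string, power int64) error {
	if power == 0 {
		if _, ok := vs[name]; !ok {
			return errUnknownValidator
		}
		delete(vs, name)
		return nil
	}
	vs[name] = power
	return nil
}

func main() {
	vs := validatorSet{"validatorA": 10, "validatorB": 1}
	remaining := milliTokens(25) // 0.025 tokens left on validatorB

	// First redelegation: power rounds to 0, validatorB is removed.
	fmt.Println("first update: ", vs.applyUpdate("validatorB", roundedPower(remaining)))

	// Second redelegation: power is 0 again, but validatorB is gone.
	fmt.Println("second update:", vs.applyUpdate("validatorB", roundedPower(remaining)))
}
```

The first call succeeds and the second returns the error, which corresponds to the point where the real network halts.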
Ideas about bug fix
The simplest way to fix this issue is to change the code in Tendermint: check whether the validator exists before executing the remove operation.
Maybe we can also fix this bug in the staking module. Currently in staking, a validator is removed only when its bonded tokens are zero. Maybe here we should take its voting power into consideration.
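As a rough illustration of this first idea, a removal that checks for existence first becomes a no-op for unknown validators. The `validatorSet` map and `removeIfPresent` below are hypothetical and are not Tendermint's actual API.

```go
package main

import "fmt"

// validatorSet is a toy stand-in for Tendermint's validator set.
type validatorSet map[string]int64

// removeIfPresent deletes the validator if it exists and reports
// whether anything was removed. Unknown validators are ignored
// instead of causing a fatal error.
func (vs validatorSet) removeIfPresent(name string) bool {
	if _, ok := vs[name]; !ok {
		return false
	}
	delete(vs, name)
	return true
}

func main() {
	vs := validatorSet{"validatorA": 10, "validatorB": 1}
	fmt.Println(vs.removeIfPresent("validatorB")) // validatorB was present
	fmt.Println(vs.removeIfPresent("validatorB")) // second call is a no-op
}
```

The trade-off is that a tolerant removal hides inconsistencies between the two validator sets, which is why the maintainers in this thread prefer to keep Tendermint strict and fix the SDK side.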