chain reaches consensus failure on binary rebuild and restart #543
Comments
Interesting. @toteki does anything stick out here? |
@adamewozniak when you rebuilt, did you change any code? |
We were talking about it. and the only way for this to happen is for the The error message posted, by the way, comes out to be a 3 minute time reversal, longer than a block time, and might be consistent with the time the chain was stopped before relaunch. Still can't figure out why |
Nope, you should be able to repro this just using those console commands & no change |
@adamewozniak This still happen? I couldn't reproduce |
@RafilxTenfen I'm now able to repro it by running starport, making a change to force starport to rebuild, and then restarting starport. Let me know if you can repro that way, same issue:
cc : @toteki |
I suspect this is just blockTime behaving weirdly when starport restarts, but we should see if we run into it during our chain upgrade tests. If that happens, it might be worth treating unexpected elapsed times differently. |
@toteki I think we should be worried about upgrading v2 -> v3, since the chain would be required to stop and restart similarly to this. Our upgrade tests should include v2.0.0 -> v2.0.1 (when we cut it), I think that would replicate this better - but agreed, it's probably just blocktime behaving weirdly.
Edit: Looks like it also only happens once, e.g., if you change the code block out to this:
    // calculate time elapsed since last interest accrual (measured in years for APR math)
    yearsElapsed := sdk.NewDec(currentTime - prevInterestTime).QuoInt64(types.SecondsPerYear)
    if yearsElapsed.IsNegative() {
        // Because this action is not caused by a message, logging and
        // events are here instead of msg_server.go
        k.Logger(ctx).Debug(
            "negative time elapsed",
            "block_height", fmt.Sprintf("%d", ctx.BlockHeight()),
            "unix_time", fmt.Sprintf("%d", currentTime),
            "yearsElapsed", yearsElapsed.String(),
        )
        return nil
    }
You get the "negative time elapsed" message, and then it continues as normal |
We could do that - but negative time elapsed is a panic because if it happens for any amount of time (worst case: if block time were ever zero for some reason) then unjustified interest (worst case: 50 years worth) will accrue instantly on the next block with no error message. I want to be sure that it's not a starport-only problem before considering removing the current safety. Has anyone run into the bug without starport during this week's gov upgrade testing? |
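To illustrate the worst case described above, here is a minimal Go sketch, not the module's actual code. It uses plain `float64` instead of `sdk.Dec`, assumes a 365-day year for `secondsPerYear` (the real `types.SecondsPerYear` value is not shown in this thread), and uses a hypothetical timestamp. It shows why a block time of zero, if silently accepted and recorded as the last interest time, would make the next normal block see decades of elapsed time.

```go
package main

import (
	"errors"
	"fmt"
)

// secondsPerYear assumes a 365-day year; the module's real constant
// (types.SecondsPerYear) is not shown in this thread.
const secondsPerYear = 31536000

var errNegativeElapsed = errors.New("negative time elapsed since last interest time")

// yearsElapsed returns the time between two unix timestamps in years,
// and an error if the clock appears to have moved backwards.
func yearsElapsed(currentTime, prevInterestTime int64) (float64, error) {
	years := float64(currentTime-prevInterestTime) / secondsPerYear
	if years < 0 {
		return years, errNegativeElapsed
	}
	return years, nil
}

func main() {
	// Hypothetical unix time of a normal block (early 2022).
	const lastAccrual = int64(1646000000)

	// Block time goes backwards (here all the way to zero): the guard
	// reports an error instead of continuing.
	_, err := yearsElapsed(0, lastAccrual)
	fmt.Println("backwards block:", err)

	// If that zero time had instead been recorded as the last interest
	// time, the next normal block would compute roughly 52 years elapsed.
	years, _ := yearsElapsed(lastAccrual+5, 0)
	fmt.Printf("next block: %.2f years elapsed\n", years)
}
```

Whether the guard should panic, return an error, or skip the block is the design question being debated here; the sketch only shows the arithmetic behind the concern.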
I don't believe this has been seen in a while. Please confirm / close, either @dreamcodez, @RafilxTenfen, or @toteki |
Been a while - I suspect it's because our local testing has evolved beyond relaunching with starport (e.g. noob scripts and multinode now). That would mean that blocktime moving backwards was just a starport edge case and we don't have a problem. |
Reopening this since @RafilxTenfen has seen it while trying to do a fork of our testnet |
Closing this back up, @RafilxTenfen was able to get the fork past this just by updating the genesis file :^) |
Hey, I was able to reproduce this with a single node
|
So this is basically happening because, when we export a genesis, it saves a JSON file as the genesis state, and this genesis state has the time at which the genesis was generated, e.g.: |
In my opinion, tendermint should save the last block time in the state as well... One idea to solve this would be for us to use the block height difference instead of time. Do you see a way to use this instead of the time elapsed? @toteki |
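A rough sketch of what a height-based estimate could look like is below. Everything in it is hypothetical: `assumedBlockSeconds`, `secondsPerYear`, and `yearsFromHeights` are illustrative names and values, not taken from the umee codebase.

```go
package main

import "fmt"

// Hypothetical constants, not taken from the umee codebase.
const (
	assumedBlockSeconds = 5        // assumed average block time in seconds
	secondsPerYear      = 31536000 // assumes a 365-day year
)

// yearsFromHeights estimates elapsed years from the number of blocks
// produced since the last interest accrual. A negative difference is
// clamped to zero so that no interest accrues in that case.
func yearsFromHeights(currentHeight, prevInterestHeight int64) float64 {
	blocks := currentHeight - prevInterestHeight
	if blocks < 0 {
		blocks = 0
	}
	return float64(blocks*assumedBlockSeconds) / secondsPerYear
}

func main() {
	// 100 blocks at the assumed 5-second block time.
	fmt.Printf("%.10f years\n", yearsFromHeights(1100, 1000))
}
```

The trade-off is that the estimate drifts whenever real block times differ from the assumed average, and heights can also restart if a chain is relaunched from a new genesis, so this would need its own handling for that case.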
Possible approaches:
|
I liked the |
Related to #1533 |
So our genesis is not consistent. Instead of adding an exception logic in
|
But the exporting / importing of state from blocks is done in the tendermint/cosmos-sdk layer, right? |
I'm unsure that will do it. We're having consensus failures in other places. Can you post the stack trace and error message? As text (not an image)? I would like to see it. |
No. We control how to handle import / export. https://github.com/umee-network/umee/blob/main/x/leverage/genesis.go |
The inaccurate time isn't in leverage export - it's coming from BlockTime() itself going backwards. That's also why Rafael was able to reproduce with |
oh, I thought we have our own |
On gaiad there is not going to be a consensus failure, but the next block after the exported genesis will have the genesis block time; you can reproduce this. You can reproduce the consensus failure by running these steps |
I see, I didn't look at cosmos/gaia#1533. In that case I agree, let's:
|
|
Summary of Bug
When running the beta version of the chain locally, rebuilding the binary in beta mode, stopping the chain, and then restarting, it reaches consensus failure via an error from the x/leverage module:
ERR CONSENSUS FAILURE!!! err="-0.000002092846270928 years: negative time elapsed since last interest time" module=consensus ...
Not sure if this is a major issue, but still worth jotting down
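As a rough check on the size of the reversal in the error above, the reported value can be converted back to seconds. This sketch assumes `types.SecondsPerYear` is a 365-day year (31,536,000 seconds); the constant's actual value is not shown in this issue.

```go
package main

import (
	"fmt"
	"math"
)

// secondsPerYear assumes a 365-day year; the actual value of
// types.SecondsPerYear is not shown in this issue.
const secondsPerYear = 31536000

// yearsToWholeSeconds converts a duration in years to the nearest second.
func yearsToWholeSeconds(years float64) int64 {
	return int64(math.Round(years * secondsPerYear))
}

func main() {
	// Value copied from the CONSENSUS FAILURE message above.
	fmt.Println(yearsToWholeSeconds(-0.000002092846270928), "seconds")
}
```

Under that assumption, the block time in this particular error moved backwards by about 66 seconds.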
Version
Please provide the output of the following commands:
$ umeed version
price-feeder/v0.1.0-c66f8961
$ go version
go version go1.17.3 darwin/arm64
$ uname -a
Darwin Adams-MacBook-Pro-2.local 21.3.0 Darwin Kernel Version 21.3.0: Wed Jan 5 21:37:58 PST 2022; root:xnu-8019.80.24~20/RELEASE_ARM64_T6000 arm64
Steps to Reproduce
Steps to reproduce the behavior: