Add upgrade module #3979
Conversation
Codecov Report
```
@@           Coverage Diff            @@
##           develop   #3979   +/-   ##
========================================
  Coverage         ?   60.03%
========================================
  Files            ?      215
  Lines            ?    15248
  Branches         ?        0
========================================
  Hits             ?     9154
  Misses           ?     5450
  Partials         ?      644
```
@jackzampolin so this doesn't really integrate with the
x/upgrade/client/cli/query.go
Outdated
```go
// GetQueryCmd creates a query sub-command for the upgrade module using cmdName as the name of the sub-command.
func GetQueryCmd(cmdName string, storeName string, cdc *codec.Codec) *cobra.Command {
	return &cobra.Command{
		Use: cmdName,
```
Maybe we call this `list` or `show`? The command would be `gaiacli q upgrade show`.
So in Regen Ledger it's just `xrncli query upgrade-plan` with no separate sub-command, but happy to add one if that makes sense. `show` would work.
Well, here on the SDK we use the module client interface to export and automatically register CLI functionality. See gov for an example: https://github.com/cosmos/cosmos-sdk/blob/develop/x/gov/client/module_client.go#L17
I was going to do that initially but I'm not exporting any tx command for this module because it should be handled by gov. I guess clients could filter nil values here: https://github.com/cosmos/cosmos-sdk/blob/develop/cmd/gaia/cmd/gaiacli/main.go#L150
x/upgrade/keeper.go
Outdated
```go
blockTime := ctx.BlockHeader().Time
blockHeight := ctx.BlockHeight()

plan, havePlan := keeper.GetUpgradePlan(ctx)
```
I was originally caching this result on the `Keeper` to avoid having to decode from the store each block, but I'm not sure that's best practice. Is there a standard way to cache frequently read values from the store in memory?
I think there are a couple of modules that maintain their own caches, but there is no standard way this is done currently. It's a TODO.
Interesting ideas, I'll dive into this some more - p.s. test_cover is failing.
Noticed that @rigelrozanski. Appears to be some floating point issue unrelated to any code this PR touches:
@aaronc that bug is due to other non-determinism within the lcd test (floats are not used). That bug has since been fixed on develop.
Because a cache is being kept on the keeper itself currently, it must get passed around as a pointer so that the cache remains consistent.
Okay, rebased this against develop and now the tests pass. By the way, I am trying to do a few tests of this version of the module on a testnet before moving this out of draft state. I had successfully tested a previous version of this module on an older testnet, but I've been running into some issues with my current testnet that are slowing me down...
Cool - yeah, I think this feature requires further conversation (over a call) with @jaekwon on the line. There are many design considerations for this type of feature to be on mainnet and we obviously don't want to mess this up.
… keeper

Upon testing, halting the chain by methods other than a panic (such as os.Exit) actually causes issues between nodes: instead of processing their own doShutdowners to trigger upgrades, the other nodes just hang trying to connect after enough nodes exit. The onUpgrader allows nodes to trigger some process before the panic, and the willUpgrader allows nodes to prepare for upgrades before they actually need to be applied. Still needs more testing.
In the last community dev call @jackzampolin requested that I do a little write-up on how Regen Ledger is using this upgrade module, including the devops side of things. So first of all, our chain is using

Now, in addition to that, we are defining a default upgrade process for nodes using NixOS. For those of you who aren't familiar, NixOS is a "purely functional Linux distribution". What this means is that given the same set of configuration files, the same exact system configuration/Linux build should be generated. This allows for an easy deterministic upgrade process and also easy rollbacks. All parts of the system configuration are identified by hashes. So if you have a git repo that points to your configuration files, you can pass NixOS a specific git commit and have it predictably build that configuration. So basically our

One thing that this takes care of is the issues new nodes will run into when they start to sync with the existing network. If the app's state machine changes substantially in an upgrade (which is to be expected), somebody who wants to spin up a new node and replay state starting from zero can't just do that with the latest binary. That latest binary will do things that didn't happen with the initial binary and produce a different app state. So you need some sort of "meta-process" that applies upgrades on the new node just like they were applied on a node that had been running since genesis and was upgraded sequentially. I'm sure there are other ways of doing this, but this NixOS approach should theoretically be able to handle this without problems. Basically, if you want to create a new node, you use the "genesis" NixOS config files, start your node, and then this upgrade process will automatically apply all of the system config changes since genesis.
Conclusions based on a phone conversation with @aaronc. The final product of this intended design would have:
@rigelrozanski So the way this currently works in this PR is a bit different. There's no need to export a genesis file and create a new genesis file. Am I missing something about why this is needed, or are we just imagining different procedures? Maybe it would be helpful if I write out the sequence of operations I've been envisioning, which is currently implemented:
```go
app.upgradeKeeper.SetUpgradeHandler("gaia-3", func(ctx sdk.Context, plan upgrade.Plan) {
	// perform state migrations
})
```

```
gaiacli tx gov submit-upgrade-proposal --name gaia-3 --height 1234567 --info '{"commit":"abcdef12345678"}' --from abc
```
Interesting. I wonder how this fits into the model of param change proposals. The gist from that is that governance now has an internal router for proposals. Each proposal has a
@aaronc the current thinking behind upgrades is to have all types held in state upgraded by dumping state to genesis json, making changes, then loading from the next height... However, if I understand you correctly, rather than dumping json, the legacy upgrade handler, which would need to be executed at initialization of the second binary, could simply load the old state then overwrite all state elements which require a type change, aka do the conversion without a full dump - I like this... is this what's being proposed?
@alexanderbez so if governance approved an upgrade, the Handler would call
@rigelrozanski Exactly. You could of course use this halt behavior to dump state too. But if you have the ability to coordinate halts and also to coordinate a "migration callback", then I don't see any reason why you can't just continue with the same state and transaction history from
Upgrading live chains has been previously discussed in #1079 and there is a WIP spec in #2116. Neither of these provide an actual implementation of how to coordinate a live chain upgrade on the software level. My understanding and experience with Tendermint chains is that without a software coordination mechanism, validators can easily get into inconsistent state because they all need to be stopped at precisely the same point in the state machine cycle.
This PR provides a module for performing live chain upgrades that has been developed for Regen Ledger and tested against our testnets. It may or may not be what Cosmos SDK wants, but just sharing it in case it is...
This module attempts to take a minimalist approach to coordinating a live chain upgrade and can be integrated with any governance mechanism. Here are a few of its features:

- Halts the chain in `BeginBlock` when an upgrade is required and doesn't allow it to restart until new software with the expected upgrade is started

This PR doesn't currently include any integration with the Cosmos gov module, but that could be easily done if this upgrade method works for Cosmos hub.