-
Notifications
You must be signed in to change notification settings - Fork 418
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add WASM files to state sync #816
Conversation
This helps state sync to track the WASM files too, as they're not in IAVL
@@ -682,6 +682,11 @@ func NewWasmApp( | |||
app.scopedIBCKeeper = scopedIBCKeeper | |||
app.scopedTransferKeeper = scopedTransferKeeper | |||
app.scopedWasmKeeper = scopedWasmKeeper | |||
|
|||
app.SnapshotManager().RegisterExtensions( | |||
wasm.NewWasmSnapshotter(filepath.Join(wasmDir, "wasm", "state", "wasm")), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's probably a nicer way of doing this, but honestly this directory structure is buried so deep inside https://github.com/CosmWasm/cosmwasm/blob/be4d94a8d73ceab359226b8ea0a5b4be032a07e6/packages/vm/src/cache.rs#L105 so maybe it's better to leave it that way for now. Just a thought.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that top level wasm comes from wasmd from long ago.
I have thought of changing it, but it would be a very painful breaking change to all running nodes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, notice wasmDir := filepath.Join(homePath, "wasm")
on line 448 or so. That already contains the first wasm.
It should be filepath.Join(wasmDir, "state", "wasm")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The file structure is private and does not provide any stability guarantees. There is not even a guarantee Wasm is stored as file on disk in the next version of cosmwasm-vm. The file system should not be accessed from wasmd. Instead there is GetCode
and Create
in wasmvm to load and store a Wasm blob.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's another "wasm" along the way
Line 99 in 3fb7371
wasmer, err := wasmvm.NewVM(filepath.Join(homeDir, "wasm"), supportedFeatures, contractMemoryLimit, wasmConfig.ContractDebugMode, wasmConfig.MemoryCacheSize) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The file system should not be accessed from wasmd. Instead there is GetCode and Create in wasmvm to load and store a Wasm blob.
This is a good point, we can iterate over the values in the iavl tree rather than filenames. That is cleaner. And it will also "force" the node restoring from snapshotting to do the precompiling of the wasm files upon syncing, not later, providing more stable execution times later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds legit!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your contribution.
It looks solid in general. Added a few comments, also an idea how to make a simple unit test just copying arbitrary files based on names between two dirs. That would be a nice sanity check before manual testing.
What do you think of a custom SnapshotItem payload format (not just appending bytes)?
@@ -682,6 +682,11 @@ func NewWasmApp( | |||
app.scopedIBCKeeper = scopedIBCKeeper | |||
app.scopedTransferKeeper = scopedTransferKeeper | |||
app.scopedWasmKeeper = scopedWasmKeeper | |||
|
|||
app.SnapshotManager().RegisterExtensions( | |||
wasm.NewWasmSnapshotter(filepath.Join(wasmDir, "wasm", "state", "wasm")), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, notice wasmDir := filepath.Join(homePath, "wasm")
on line 448 or so. That already contains the first wasm.
It should be filepath.Join(wasmDir, "state", "wasm")
return err | ||
} | ||
|
||
// In case snapshotting needs to be deterministic |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah although I'm still not sure whether that's needed.
return err | ||
} | ||
|
||
// snapshotItem is 64 bytes of the file name, then the actual WASM bytes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This works. But maybe we want a real (eg. more future-proof) format.
Eg. like a Protobuf object with two fields and a type. Especially if we ever want to sync anything else
return nil | ||
} | ||
|
||
func (ws *WasmSnapshotter) Restore( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to do a unit test with some simple data, by feeding the output of snapshot into restore (or another instance with different directory) and ensuring the proper files are copied verbatim (and misnamed files ignored)?
|
||
func (ws *WasmSnapshotter) Restore( | ||
height uint64, format uint32, protoReader protoio.Reader, | ||
) (snapshot.SnapshotItem, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to read the entire data and return nothing ever. Why does it return (snapshot.SnapshotItem, error)
rather than error
? (eg. Is there something I/we are missing?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this is not your design, but wonder why it is there
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I took this from the sdk tests. Not sure if the return value goes anywhere here, but I can confirm that this works without any issue (tested manually on Secret). Also might be irrelevant for this case as we're dealing with files and not IAVL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All this logic is in a loop where it seems like its the extension responsibility to handle the next SnapshotItem
to the next one, which doesn't make sense because a plugin can break another one without knowing.
My guess is that is possible that in the protobuf stream there might be more items and if the format is not the one that wasmd
can understand then it should return it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My guess is that is possible that in the protobuf stream there might be more items and if the format is not the one that wasmd can understand then it should return it.
Yes, very good idea. It seems like we should not error on some unknown extension, but just return it, as there may in the future be multiple extensions than need to cooperate somehow. Not sure how this all works together.
Maybe @yihuang could clarify how we should handle this properly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, very good idea. It seems like we should not error on some unknown extension, but just return it, as there may in the future be multiple extensions than need to cooperate somehow. Not sure how this all works together.
exactly, we should return the item if it's unknown to us.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
which doesn't make sense because a plugin can break another one without knowing.
yeah, that's possible unfortunately, maybe we can create some helpers at module side to retain the last item automatically, without changing the api in cosmos-sdk.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, I got it... we either end on EOF, or when we hit something where GetExtensionPayload()
returns nil (another type we don't expect)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
which doesn't make sense because a plugin can break another one without knowing.
yeah, that's possible unfortunately, maybe we can create some helpers at module side to retain the last item automatically, without changing the api in cosmos-sdk.
// ExtensionPayloadReader only reads the payloads of current extension.
type ExtensionPayloadReader struct {
protoReader protoio.Reader
LastItem types.SnapshotItem
}
func NewExtensionPayloadReader(protoReader protoio.Reader) *ExtensionPayloadReader {
return &ExtensionPayloadReader{
protoReader: protoReader,
LastItem: types.SnapshotItem{},
}
}
// ReadPayload reads a payload item from the stream.
func (epr *ExtensionPayloadReader) ReadPayload() ([]byte, error) {
epr.LastItem = types.SnapshotItem{}
if err := epr.protoReader.ReadMsg(&epr.LastItem); err != nil {
return nil, err
}
payload := epr.LastItem.GetExtensionPayload()
if payload == nil {
// end of payload stream of current extension
return nil, io.EOF
}
return payload.Payload, nil
}
I think of sth like this could help, the manager pass the ExtensionPayloadReader
extension's Restore
method, but it's api breaking, so maybe change this in next version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you Assaf
@@ -682,6 +682,11 @@ func NewWasmApp( | |||
app.scopedIBCKeeper = scopedIBCKeeper | |||
app.scopedTransferKeeper = scopedTransferKeeper | |||
app.scopedWasmKeeper = scopedWasmKeeper | |||
|
|||
app.SnapshotManager().RegisterExtensions( | |||
wasm.NewWasmSnapshotter(filepath.Join(wasmDir, "wasm", "state", "wasm")), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The file structure is private and does not provide any stability guarantees. There is not even a guarantee Wasm is stored as file on disk in the next version of cosmwasm-vm. The file system should not be accessed from wasmd. Instead there is GetCode
and Create
in wasmvm to load and store a Wasm blob.
var wasmFileNameRegex = regexp.MustCompile(`^[a-f0-9]{64}$`) | ||
|
||
func (ws *WasmSnapshotter) Snapshot(height uint64, protoWriter protoio.Writer) error { | ||
wasmFiles, err := ioutil.ReadDir(ws.wasmDirectory) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wasmFiles
, i.e. the list of checksums needs to come from the main Tendermint database. Otherwise a node can inject all kind of data here bypassing consensus.
return []uint32{1} | ||
} | ||
|
||
var wasmFileNameRegex = regexp.MustCompile(`^[a-f0-9]{64}$`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those file names can change any time. I'm actually thinking about adding a .wasm file extension for node operator friendlyness.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good point, but for a given wasmd version, there will be only one storage format (and realistically, changing this will be a major state-breaking change needing an internal wasmvm migration, so unlikely to happen, or at least very infrequently).
If it does change, we make a new wasmd (eg. 2.0) with snapshot format 2. Upon receiving that the data is parsed in a different way (eg the new internal format). The snapshotter itself (they one reading), will just need to implement the new format when we break this internal storage format.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That said, I do agree with using the GetCode and Create calls
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that it's better to do it through the proper API, and let future migrations less to worry about WRT snapshotting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This also "solves" the wasm/wasm/state/wasm
issue here.
@webmaster128 is right, it's better to go through the existing API instead of accessing the filesystem directly. |
Thank you. Let me know if you need additional APIs or more advances features than the two mentioned above. But those are the two basic operations. |
snapshotItem := append([]byte(wasmFile.Name()), wasmBytes...) | ||
|
||
snapshot.WriteExtensionItem(protoWriter, snapshotItem) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you considered compressing the Wasm before transfer? Or is this taken care of at a different level of the state sync stack?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't consider it, not sure if it's a problem. The wasm files are usually far smaller than the rest of the blockchain's state.
It can be a nice addition though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also not sure if it's taken care of by the sdk.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, no priority or blocker for sure. I was just curious since Wasm blobs can be compressed nicely. But we can always make that an optional extension later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting observation nonetheless. 💪
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wasmd already gzips wasm blobs when you submit a tx or through governance.
Line 88 in d5ef3ba
if wasmUtils.IsWasm(wasm) { |
wasmd/x/wasm/client/utils/utils.go
Line 24 in d5ef3ba
func GzipIt(input []byte) ([]byte, error) { |
and then uncompressed when the keeper.Create
is called
Line 18 in d5ef3ba
func uncompress(src []byte, limit uint64) ([]byte, error) { |
maybe some of this can be reused it really reduces the size but it will depend on each chain on how many contracts they have. Although as previously mentioned smaller than the actual chain data.
I was working on this feature, but tests are pending... I have updated with master branch, maybe it can be used as a reference. Snapshot:
Restore:
https://github.com/disperze/wasmd/blob/state-sync-sdk-45.2/x/wasm/keeper/snapshot.go |
@giansalex Thank you! Could you add your work as a second PR to provide better visibility and being able to comment inline? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR and all the good comments here. Nice collaboration!
I think @assafmo solution was a very pragmatic approach and automates what people already do with sharing zipped wasm folders.
I completely agree with @webmaster128 suggestions for decoupling and a more resilient solution that uses existing apis. Upstream implementations may change and can break this easily otherwise.
Proper tests can be built with our mock
@giansalex Please do a draft PR, so that we can comment already. 🙏
I peeked at the code and it seems to address all the raised points. Good work
I got a anon review via DM: When restoring all the wasm files, we must ensure to tell the wasmvm to pin the proper codes. If we use the Create method, this will just be one more check on CodeInfo and possibly another call to wasmvm |
InitializePinnedCodes can be used to pin all codes again. |
I'm not sure it that's necessary. Codes info are already stored in the IAVL. |
InitializePinnedCodes is basically the answer to that DM request. It's really about lifting contracts into cache after a node restart or state sync. |
copy-pasting from Discord:
|
@assafmo Very helpful PR showing how to implement such an extension. There have been a number of wonderful comments explaining how to improve the process, and add more features (pinning, compression, etc). I would like to get a version of this in incorporating all the input. Do you want to update this PR with all feedback? Or shall I write one? I'm happy either day. I would try to write it or review it tomorrow (Thursday) to get this in soon, and let people test manually on some tag. |
I'm not sure if I can get to it that soon, so go for it, no ego here. ❤️ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Building on this, I have made a new PR that attempts to incorporate all the discussion.
Please check out #823
It should be complete now, "just" missing tests, but good for a first review.
|
||
func (ws *WasmSnapshotter) Restore( | ||
height uint64, format uint32, protoReader protoio.Reader, | ||
) (snapshot.SnapshotItem, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, I got it... we either end on EOF, or when we hit something where GetExtensionPayload()
returns nil (another type we don't expect)
@giansalex sorry I didn't see your code earlier. I was looking at open PRs... even draft. I will look now, definitely would have saved me time yesterday and some things I can learn from it still |
Closing in favor of #823. |
This helps state sync to track the WASM files too, as they're not in the IAVL.