-
Notifications
You must be signed in to change notification settings - Fork 226
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(vstorage): Disambiguate empty strings and deletions #6971
Conversation
db9df6e
to
0887c2c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few questions around the Golang code. JS looks good to me!
@gibson042 feel free to abstain if that's good enough for you, otherwise I won't approve until you've weighed in.
023f68b
to
75c211d
Compare
After a discussion with @warner and @gibson042 this morning, I realized that this change relied too much on JS producing entry tuples of the right length. For example in JS I have thus changed the Unmarshal path to explicitly check the type of the value, and added a couple constructors for I am still very uncertain whether any of this if colloquial golang, and I'm looking for experts opinions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the overall purpose of differentiating empty-string values and nil values? I'm not against it, but I'm not clear on what such a change supports—what clients care about the difference. Is this for something like cascading deletion?
case "set": | ||
keeper.SetStorageAndNotify(cctx.Context, msg.Path, msg.Value) | ||
for _, arg := range msg.Args { | ||
var entry types.StorageEntry | ||
entry, err = types.UnmarshalStorageEntry(arg) | ||
if err != nil { | ||
return | ||
} | ||
keeper.SetStorageAndNotify(cctx.Context, entry) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I'm reading correctly, "set" and all the other write methods now affect an arbitrary number of entries. Would it instead make sense to separate single-entry vs. multi-entry methods?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't see a reason to have different methods for single vs multiple entry updates if the length of the args indicates that intent.
golang/cosmos/x/vstorage/vstorage.go
Outdated
return "null", nil | ||
} | ||
//fmt.Printf("Keeper.GetStorage gave us %bz\n", value) | ||
bz, err := json.Marshal(value) | ||
//fmt.Printf("Keeper.GetStorage gave us %bz\n", entry.Value()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this line has been carried forward long enough.
//fmt.Printf("Keeper.GetStorage gave us %bz\n", entry.Value()) |
DataPrefixBytes: string(keeper.GetDataPrefix()), | ||
NoDataValue: string(keeper.GetNoDataValue()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oof, it stinks to leak these implementation details. What would be the consequence of not exporting NoDataValue here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The encoding details were unfortunately already leaking to anyone that reads vstorage data directly from the chain, both in the path encoding, and in the data prefixing. It was a very unfortunate realization I couldn't get away with an internal only change. I don't fully grok how external clients like the casting package fetch data, and maybe we could have an adapter on the RPC server side to avoid leaking these details to clients? However I'd consider this out of scope for this PR. Ping @michaelfig
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only way an external client can verify a Merkle proof for some on-chain data from an RPC node (trusting the validators but not the integrity of the RPC node) is by querying the exact IAVL key/value representation in bytes, and then knowing how they correspond to the application layer. This verification process necessarily crosses abstractions in an uncomfortable way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you're proposing to remove getStoreKey
and nothing in the SDK uses it, that's fine with me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This verification process necessarily crosses abstractions in an uncomfortable way.
Makes sense
If you're proposing to remove
getStoreKey
and nothing in the SDK uses it, that's fine with me.
I remember seeing some potential uses. I didn't feel comfortable enough changing that wiring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the overall purpose of differentiating empty-string values and nil values? I'm not against it, but I'm not clear on what such a change supports—what clients care about the difference. Is this for something like cascading deletion?
The Swingstore replication into IAVL / the MultiStore will be done through a new vstorage path, and it requires being able to store empty strings. An empty string is a valid string value that is simply different than null
in JS :)
Path string `json:"key"` // TODO: rename JSON to "path" | ||
Value string `json:"value"` | ||
Method string `json:"method"` | ||
Args []json.RawMessage `json:"args"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not backwards compatible, but the JS layer is updated in the same commit. There are no external clients.
case "set": | ||
keeper.SetStorageAndNotify(cctx.Context, msg.Path, msg.Value) | ||
for _, arg := range msg.Args { | ||
var entry types.StorageEntry | ||
entry, err = types.UnmarshalStorageEntry(arg) | ||
if err != nil { | ||
return | ||
} | ||
keeper.SetStorageAndNotify(cctx.Context, entry) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't see a reason to have different methods for single vs multiple entry updates if the length of the args indicates that intent.
DataPrefixBytes: string(keeper.GetDataPrefix()), | ||
NoDataValue: string(keeper.GetNoDataValue()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The encoding details were unfortunately already leaking to anyone that reads vstorage data directly from the chain, both in the path encoding, and in the data prefixing. It was a very unfortunate realization I couldn't get away with an internal only change. I don't fully grok how external clients like the casting package fetch data, and maybe we could have an adapter on the RPC server side to avoid leaking these details to clients? However I'd consider this out of scope for this PR. Ping @michaelfig
This PR has grown to a LOT of fixups to address review feedback. Here is what the rebased with squashing looks like: https://github.com/Agoric/agoric-sdk/compare/mhofman/5542-refactor-vstorage-rebase Happy to push that (ultimately identical) version to this branch if all reviewers prefer. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM after resolving remaining comments.
Ugh I forgot to push the changes addressing feedback yesterday. PTAL @gibson042 |
cd1cf14
to
14ed493
Compare
Clean squash/fixup of all the review feedback: cd1cf14..14ed493 |
Use method dependent args in vstorage messages to allow expressing updates (set and append) as entries (list of [key, value] tuples). Enables explicit deletion through setting an entry without value
14ed493
to
3f32351
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One potential future cleanuup.
var generic [2]interface{} | ||
err = json.Unmarshal(msg, &generic) | ||
|
||
if err != nil { | ||
return | ||
} | ||
|
||
path, ok := generic[0].(string) | ||
if !ok { | ||
err = fmt.Errorf("invalid storage entry path: %q", generic[0]) | ||
return | ||
} | ||
|
||
switch generic[1].(type) { | ||
case string: | ||
entry = NewStorageEntry(path, generic[1].(string)) | ||
case nil: | ||
entry = NewStorageEntryWithNoData(path) | ||
default: | ||
err = fmt.Errorf("invalid storage entry value: %q", generic[1]) | ||
} | ||
return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can do this instead:
var generic [2]*string{}
err = json.Unmarshal(msg, &generic)
if err != nil {
return
}
if generic[0] == nil {
err = fmt.Errorf("storage entry path cannot be null")
return
}
if generic[1] == nil {
entry = NewStorageEntry(generic[0], *generic[1])
} else {
entry = NewStorageEntryWithNoData(generic[9])
}
return
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh json.Unmarshal
understands string pointers? Yeah that'd have been simpler.
refs: #5542
As usual, best reviewed commit-by-commit
Description
As preliminary changes for #5542, we need to support empty strings as actual values in vstorage, potentially for both leaf and ancestors. golang non-nullable values leaked through the design of vstorage, and setting a value of
''
previously meant deletion.To support empty strings as actual values, we need to:
Security Considerations
None
Scaling Considerations
An alternative would have been to encode or hilbert the state-sync values to avoid empty strings, but that would have been costly as state sync export is likely to become the biggest data generator, and would have leaked further a golang design choice.
Furthermore changing the interface allows batch setting multiple entries to avoid repeated call across the JS -> golang boundary. (Current vstorage users do not yet take advantage of this change, the mailbox probably could with some further refactoring)
Documentation Considerations
vstorage clients need to be updated if they iterate over keys, as they may encounter a value with an unexpected prefix. The casting library is updated as part of this PR.
A migration (or export/import) will be needed at the next upgrade containing this change. The migration was implemented as part of this change, and a note left in the current upgrade handler highlighting the need to call migration at the next upgrade.
Testing Considerations
Added unit tests, updated existing ones.