feat: Add prehash_compare_key to allow proving nonexistence in sparse trees #136

Merged on Apr 10, 2023 (14 commits)

plaidfinch
Contributor

Motivation

There are several important merkle trees that hash keys and sparsely store values at positions indicated by the value of the hash, such as the Cosmos Sparse Merkle Tree and Libra/Diem/Aptos/Penumbra's Jellyfish Merkle Tree. We would like this type of tree to be compatible with ICS23.

For existence proofs, this is already the case; however, currently, when verifying an ICS23 non-existence proof, keys are compared based on the preimage of the hash function used to prehash them, even when prehash_key is set in the ProofSpec. This means that when a ProofSpec uses prehash_key, nonexistence proofs cannot be verified even if a client correctly generates them as per the "spirit" of the ICS23 nonexistence proof format. We believe that this is a bug, but this change is framed backwards-compatibly, as noted below, so that it can be applied universally without friction.

See here for other discussion of the impact of this issue:

  • Support removal of key/value pairs penumbra-zone/jmt#24 (comment) (Note: We think that ibc-go can't make any assumption about the keys being hashed or unhashed, since that's not part of its domain; that's the domain of the proof spec. However, this description of the issue is pointing at the same problem that this PR resolves.)
  • Key comparison in proof generated from SMT #83 is the corresponding issue for the SMT (Note: In the current SMT implementation, keys are prehashed entirely externally to the proof and its spec, and the relationship between key and hashed key is proven externally to the proof spec. We believe that this ends up adding implementation burden for counterparty chains, who have to check the additional constraint outside of ICS23 verification that the key being proven matches the hash claimed. Instead of this, the JMT proof spec could be adapted to the SMT, allowing the SMT to internalize the prehashing and avoid the difficulty of managing prehashing externally to the proof spec. @cwgoes: does this seem like something you would want to do?)

Summary of changes

This PR introduces a single new boolean parameter to the top-level ProofSpec: prehash_compare_key. When set to true, this flag causes keys to be consistently compared lexicographically according to their hashes within nonexistence proof verification, using the same hash function as specified by the existing prehash_key field.
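For illustration only, here is a minimal Go sketch of the ordering check this flag is meant to enable. It is not the code in this PR: `proofSpec`, `comparisonKey`, `provesAbsence`, and `sortByHash` are hypothetical names, SHA-256 stands in for whatever prehash_key specifies, and the real verifier also checks the neighbors' existence proofs, which is omitted here.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
	"sort"
)

// proofSpec is a hypothetical, stripped-down stand-in for ProofSpec.
type proofSpec struct {
	prehashKey        func([]byte) []byte // nil means "no prehash"
	prehashCompareKey bool                // the opt-in flag described above
}

// sha256Key hashes a key with SHA-256 (assumed prehash function for this sketch).
func sha256Key(key []byte) []byte {
	sum := sha256.Sum256(key)
	return sum[:]
}

// comparisonKey returns the bytes that ordering checks operate on:
// the hashed key when the flag is set, the raw key otherwise.
func comparisonKey(spec proofSpec, key []byte) []byte {
	if !spec.prehashCompareKey || spec.prehashKey == nil {
		return key
	}
	return spec.prehashKey(key)
}

// provesAbsence reports whether key sorts strictly between its neighbors.
// A nil neighbor means the key lies beyond that edge of the tree.
func provesAbsence(spec proofSpec, left, key, right []byte) bool {
	k := comparisonKey(spec, key)
	if left != nil && bytes.Compare(comparisonKey(spec, left), k) >= 0 {
		return false
	}
	if right != nil && bytes.Compare(k, comparisonKey(spec, right)) >= 0 {
		return false
	}
	return true
}

// sortByHash orders keys the way a hash-keyed tree would store them.
func sortByHash(keys [][]byte) [][]byte {
	out := append([][]byte(nil), keys...)
	sort.Slice(out, func(i, j int) bool {
		return bytes.Compare(sha256Key(out[i]), sha256Key(out[j])) < 0
	})
	return out
}

func main() {
	hashed := proofSpec{prehashKey: sha256Key, prehashCompareKey: true}
	legacy := proofSpec{prehashKey: sha256Key, prehashCompareKey: false}

	// Neighbors as they would appear in a tree ordered by hashed key.
	s := sortByHash([][]byte{[]byte("apple"), []byte("banana"), []byte("cherry")})
	fmt.Println("flag set:  ", provesAbsence(hashed, s[0], s[1], s[2]))
	fmt.Println("flag unset:", provesAbsence(legacy, s[0], s[1], s[2]))
}
```

With the flag set, the middle key by hash always sits between its neighbors. With the flag unset, the result depends on how the raw keys happen to sort, which is the failure mode described in the Motivation section.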

This is intended as an alternative to #88

We believe this change is the minimum necessary to unblock nonexistence proofs on key-prehashed structures. #88 also attempts to solve this problem, namely that for nonexistence proofs only, any tree that uses prehashing cannot prove nonexistence because the nonexistence proof verifier compares the keys lexically by their preimage, ignoring the prehash_key field of the specification.

However, as currently implemented, #88 does not accomplish this goal. There are 3 issues with #88 as a solution to this problem, which this PR addresses:

  1. It does not actually change the verification procedure for nonexistence proofs to lexically compare all keys by their hash; rather, it lexically compares only the hash of the input key to unhashed neighbor keys. This will not work correctly for any general cryptographic hash function, because the result of comparing a cryptographic hash against an unhashed key is effectively random.
  2. It doesn't merely supply a flag opting into prehashed key comparison; it allows you to specify an entirely different hash function to use for the comparison. We believe this is not representative of any known use case: if keys are prehashed using hash function H, then a different hash function H' can't be used to compute a meaningful comparison on keys, so this is a more complex specification than necessary.
  3. It also introduces a prehash_compared_value field, which is not necessary to fix this specific issue, and like its prehash_compared_key field, this too is a specification of an arbitrary hash function, which for the same reasons above, we believe is overly general.

Backwards-compatibility

This is a backwards-compatible change, as it requires opt-in via setting the prehash_compare_key flag to true in the ProofSpec. All existing ProofSpecs will continue to behave identically.

Contents

This PR includes implementation for Rust, Go, and Typescript. We have tested it against our own implementation of the Jellyfish Merkle Tree and we believe it should be effective at addressing this issue across the ecosystem.

Thanks

We would like to thank everyone who participated in discussion and development of #88 and other work towards solutions to this issue. We feel very confident that this is the right way forward, but we want to ensure that our contribution does not make anyone feel like their work is unappreciated; to the contrary, the discussion and work leading up to this PR, by everyone, has been necessary to clarify our understanding of the issue. Thanks everyone for your help!

prehash_compare_key indicates whether to compare the keys lexicographically according to their _hashed_ values (implied by the hash function given by prehash_key). This is required for nonexistence proofs in proof specs that use prehashing.
@plaidfinch changed the title from "Add prehash_compare_key to allow proving nonexistence in sparse trees" to "feat: Add prehash_compare_key to allow proving nonexistence in sparse trees" on Feb 22, 2023
@AdityaSripal
Member

Thank you for including the justifications to include this change over #88, all of them make sense to me.

Will review today!

Member

@AdityaSripal left a comment

Agreed with this direction!! But it looks like this PR is incomplete?

My understanding is that the key in ExistenceProof.Key will still be the preimage, hence why you also hash the rightKey and leftKey before comparison.

However, you are not hashing the passed-in key for existence proofs before comparison?

There is a check on ics23.go that is changed in #88 that seems like it would also need to be changed here.

473d9a5#diff-39a55415fc38a90b85c05989f293fda3b7ee126010cfe63838fd9b8441e47ed1R39

473d9a5#diff-39a55415fc38a90b85c05989f293fda3b7ee126010cfe63838fd9b8441e47ed1R59

I've also suggested a different name for the field that at least Colin and I found more intuitive

// prehash_compare_key is a flag that indicates whether to use the
// prehash_key specified by LeafOp to compare lexical ordering of keys for
// non-existence proofs.
bool prehash_compare_key = 5;
Member

I think a clearer name might be helpful here similar to our review of #88.

So effectively the database is providing an interface to the application: store.Set(key, value)

and underneath the hood it is merklizing this pair by storing the hash of the application-provided key as the key in the merkle tree.

Thus, we liked the nomenclature: appKey and treeKey.

The appKey is the key known to the application, the treeKey is the key stored in the tree.

In SMT, the treeKey is the hash of the appKey, while in iAVL they are the same.

So I think prehash_app_key would be clearer here as a name. And we can use the docs here to explain there may be a difference between appKey and treeKey for some trees
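As a rough illustration of the appKey/treeKey distinction described above, here is a minimal Go sketch. The `store` type and its methods are hypothetical, a plain map stands in for the merkle tree, and SHA-256 is assumed as the key hash.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// store is a hypothetical key-value store that keeps entries under hashed keys.
// A map stands in for the merkle tree; only the key derivation matters here.
type store struct {
	leaves map[[32]byte][]byte // treeKey -> value
}

func newStore() *store {
	return &store{leaves: make(map[[32]byte][]byte)}
}

// treeKey derives the key stored in the tree from the application's key.
func treeKey(appKey []byte) [32]byte {
	return sha256.Sum256(appKey)
}

// Set is the interface the application sees: it only ever supplies appKey.
func (s *store) Set(appKey, value []byte) {
	s.leaves[treeKey(appKey)] = value
}

// Get looks a value up by appKey, hashing it to find the tree position.
func (s *store) Get(appKey []byte) ([]byte, bool) {
	v, ok := s.leaves[treeKey(appKey)]
	return v, ok
}

func main() {
	s := newStore()
	s.Set([]byte("balance/alice"), []byte("42"))

	v, ok := s.Get([]byte("balance/alice"))
	fmt.Println(string(v), ok)
	fmt.Printf("treeKey: %x\n", treeKey([]byte("balance/alice")))
}
```

In a tree such as IAVL, `treeKey` would simply return the appKey unchanged, so the two notions coincide.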

Collaborator

This is a bit of a bikeshed, but if there's going to be a distinction between app_key and tree_key, wouldn't it be even clearer to call the field compare_tree_key? Then compare_tree_key = false means it compares the app key, and compare_tree_key = true means it compares the tree key.

Member

Ok now that I understand more of this field.

Perhaps we can call it: prehash_key_before_comparison

The way I read the current field name at the moment, I expect it to be a HashOp type not a boolean.

cc: @colin-axner

@@ -204,6 +204,14 @@ func (p *ExistenceProof) CheckAgainstSpec(spec *ProofSpec) error {
	return nil
}

func keyForComparison(spec *ProofSpec, key []byte) []byte {
Member

You also will have to use this function on getExistProofForKey and getNonexistProofForKey correct?

473d9a5#diff-39a55415fc38a90b85c05989f293fda3b7ee126010cfe63838fd9b8441e47ed1R39

Collaborator

Yes, good catch

Collaborator

but only on getNonexistProof, because the lexical comparison is only relevant for non-existence proofs

@plaidfinch
Contributor Author

Thanks for the careful review @AdityaSripal!

My understanding is that the key in ExistenceProof.Key will still be the preimage, hence why you also hash the rightKey and leftKey before comparison.

However, you are not hashing the passed-in key for existence proofs before comparison?

In existence proofs, the only comparison is for equality, not for lexical ordering, which means that comparing either the preimage of the hashed key, or the hash, will work equally well (because we can assume that H(x) == H(y) implies x == y up to computational intractability). To ensure that this change is minimal, we do not alter anything about existence proof verification, because it is not necessary.

By contrast, nonexistence proofs require lexical ordering comparison on keys (or their hashes), which this PR implements, for nonexistence proofs only.
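To make the equality-versus-ordering point concrete, here is a small Go sketch (helper names are hypothetical, and SHA-256 is assumed as the prehash function). Equality gives the same answer on preimages and on hashes, barring collisions, while ordering on hashes is unrelated to ordering on preimages.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
)

// equalRaw compares two keys by their preimages.
func equalRaw(a, b []byte) bool {
	return bytes.Equal(a, b)
}

// equalHashed compares two keys by their SHA-256 digests.
func equalHashed(a, b []byte) bool {
	return sha256.Sum256(a) == sha256.Sum256(b)
}

// lessRaw orders two keys lexicographically by their preimages.
func lessRaw(a, b []byte) bool {
	return bytes.Compare(a, b) < 0
}

// lessHashed orders two keys lexicographically by their SHA-256 digests.
func lessHashed(a, b []byte) bool {
	ha, hb := sha256.Sum256(a), sha256.Sum256(b)
	return bytes.Compare(ha[:], hb[:]) < 0
}

func main() {
	keys := [][]byte{[]byte("alpha"), []byte("beta"), []byte("gamma")}
	for _, a := range keys {
		for _, b := range keys {
			// The two "equal" columns always agree; the two "less" columns need not.
			fmt.Printf("%-5s %-5s equal: raw=%-5t hashed=%-5t less: raw=%-5t hashed=%t\n",
				a, b, equalRaw(a, b), equalHashed(a, b), lessRaw(a, b), lessHashed(a, b))
		}
	}
}
```

This is why existence verification can be left untouched, while the ordering checks in nonexistence verification have to pick one representation and use it for every key involved.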

There is a check on ics23.go that is changed in #88 that seems like it would also need to be changed here.

Only one of the two links is relevant to the changes we propose here: we do not change anything about existence proofs, so the change noted from #88 to existence proofs is not relevant:

473d9a5#diff-39a55415fc38a90b85c05989f293fda3b7ee126010cfe63838fd9b8441e47ed1R39

With regard to the second link, the line of code below adds an extra layer of hashing to the key at the top level, which is neither necessary nor sufficient to verify nonexistence proofs:

473d9a5#diff-39a55415fc38a90b85c05989f293fda3b7ee126010cfe63838fd9b8441e47ed1R59

Instead, our implementation exhibits the same prehashing behavior in Go as the original Rust implementation: it changes the lexical ordering comparisons isLeft and isRight to operate according to the hashing described by the prehash_key field, if and only if the opt-in flag in the top-level proof spec is set. Good catch noticing that we missed this part of the Go implementation: this is now fixed, per the diff linked above in this paragraph.

I've also suggested a different name for the field that at least Colin and I found more intuitive

We're fine with whatever naming works well for you, provided it is well-documented. We are partial to compare_tree_key, as suggested by @hdevalence, but de gustibus non est disputandum.

A high-level note on where some of this confusion may originate: there are two notions of "prehashed key comparison", as below:

  1. As in Add new parameters to compare the given key and value for SMT proof #88, you assume that the top-level key in the proof has an extra layer of hashing already applied to it externally to the proof and proof spec. This is not actually useful for making nonexistence proofs work correctly for sparse merkle trees, for the reasons noted in the original issue.
  2. As in this PR, you don't make any exogenous assumptions about the hashing that has been already applied to the top-level key outside of the proof and proof spec; rather, you incorporate the already-specified prehashing function into the verification of nonexistence proofs, by causing lexical comparison of keys within verifying those proofs to operate on the hash of keys rather than the keys themselves. The fact that this wasn't already the case for nonexistence proofs is, we posit, a bug: this change can be seen not as a true piece of new functionality for ICS23, but a rectification of a gap between the intent of the system design and its implementation. For example, existence proofs correctly respect the prehash_key field; for this to be inconsistently applied in different places throughout verification is not useful in any context.

It's worth keeping in mind that you can still add something like #88 on top of this change, but it's not necessary to do so in order to unblock the ability to use ICS23 nonexistence proofs with sparse merkle trees, just as you already can correctly use ICS23 existence proofs with sparse merkle trees. We suspect that #88 is not as useful once this PR is merged, because we think (but are not certain) that its originating motivation came from an attempt to work around the very bug that this PR resolves.

if !spec.PrehashCompareKey {
	return key
}
hash, _ := doHashOrNoop(spec.LeafSpec.PrehashKey, key)
Member

Since this is using the PrehashKey defined in the leafspec and the key in the existenceProof is supposed to now be unhashed for SMT proofs, I think the SMT spec needs to be changed to have PrehashKey=SHA_256 instead of NO_HASH

Collaborator

Member

Right, so the SMT proof spec in this repo needs to be changed

Collaborator

The code that generates the proofs will also need to be updated following the spec change

@ghost

ghost commented Feb 28, 2023

Backwards compatibility notwithstanding, it seems like applications (if they can) should always set prehash_key_before_comparison to true (even in the absence of actual prehashing) because the false behavior is (as described) inconsistent/not intuitive and/or "buggy"; if so, perhaps this can be documented in ProofSpec.

@cwgoes

cwgoes commented Feb 28, 2023

@plaidfinch Thanks for this! I can tell you guys have thought this through. I think that this solution should work for us as well.

@AdityaSripal
Member

Backwards compatibility notwithstanding, it seems like applications (if they can) should always set prehash_key_before_comparison to true (even in the absence of actual prehashing) because the false behavior is (as described) inconsistent/not intuitive and/or "buggy"; if so, perhaps this can be documented in ProofSpec.

I don't see how this is true. If the key is hashed in order to create the LeafHash, but keys are ordered lexicographically by the key itself (IAVL, Trie, etc.), then prehash_key_before_comparison should be false.

Member

@AdityaSripal left a comment

Ack pending test fixes.

Great work to everyone involved!!

@avahowell
Collaborator

avahowell commented Feb 28, 2023

Backwards compatibility notwithstanding, it seems like applications (if they can) should always set prehash_key_before_comparison to true (even in the absence of actual prehashing) because the false behavior is (as described) inconsistent/not intuitive and/or "buggy"; if so, perhaps this can be documented in ProofSpec.

Yes, the only reason to have a flag in this case is to maintain strict backwards-compatibility in my opinion. If you have prehash_key_before_comparison: true on a tree with no prehashing, it's the same thing as having prehash_key_before_comparison: false, since this change just inherits the hashing specified by prehash_key.

Where the change would be breaking without the flag is in the case where you use prehash_key, but don't want to compare keys according to the hashing algorithm specified by prehash_key. As we mentioned in the OP, this condition seems like a bug to us, but we kept the flag to maintain strict backwards compatibility.
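The four combinations discussed here can be sketched in Go as follows. This is a simplified illustration under stated assumptions, not the library code: `hashOp`, `spec`, `applyHash`, `keyForComparison`, and `comparisonUnchanged` are hypothetical stand-ins, and only "no hash" and SHA-256 are modelled.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
)

// hashOp is a hypothetical two-value stand-in for the spec's hash operation enum.
type hashOp int

const (
	noHash hashOp = iota
	sha256Op
)

// spec is a hypothetical, flattened stand-in for the relevant ProofSpec fields.
type spec struct {
	prehashKey                 hashOp
	prehashKeyBeforeComparison bool
}

// applyHash hashes data with op, or returns it unchanged for noHash.
func applyHash(op hashOp, data []byte) []byte {
	if op == sha256Op {
		sum := sha256.Sum256(data)
		return sum[:]
	}
	return data
}

// keyForComparison inherits whatever hashing prehashKey specifies,
// but only when the opt-in flag is set.
func keyForComparison(s spec, key []byte) []byte {
	if !s.prehashKeyBeforeComparison {
		return key
	}
	return applyHash(s.prehashKey, key)
}

// comparisonUnchanged reports whether ordering still operates on the raw key.
func comparisonUnchanged(s spec, key []byte) bool {
	return bytes.Equal(keyForComparison(s, key), key)
}

func main() {
	key := []byte("some/app/key")
	for _, s := range []spec{
		{prehashKey: noHash, prehashKeyBeforeComparison: false},
		{prehashKey: noHash, prehashKeyBeforeComparison: true},
		{prehashKey: sha256Op, prehashKeyBeforeComparison: false},
		{prehashKey: sha256Op, prehashKeyBeforeComparison: true},
	} {
		fmt.Printf("prehash=%d flag=%-5t raw-key comparison=%t\n",
			s.prehashKey, s.prehashKeyBeforeComparison, comparisonUnchanged(s, key))
	}
}
```

Only the last combination changes which bytes are compared, which is why setting the flag on a tree without prehashing is equivalent to leaving it unset.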

@avahowell
Collaborator

I've updated this code to include the changes for the smt_spec required for the SMT. The next step is to update the SMT's proof-generating code itself to be compatible with this new spec.

@codecov

codecov bot commented Mar 27, 2023

Codecov Report

Patch coverage: 56.56% and project coverage change: +11.19 🎉

Comparison is base (f4deb05) 39.35% compared to head (64a5c0e) 50.54%.

Additional details and impacted files
@@             Coverage Diff             @@
##           master     #136       +/-   ##
===========================================
+ Coverage   39.35%   50.54%   +11.19%     
===========================================
  Files          16       23        +7     
  Lines        6286     8034     +1748     
  Branches       85       86        +1     
===========================================
+ Hits         2474     4061     +1587     
- Misses       3456     3616      +160     
- Partials      356      357        +1     
Flag Coverage Δ
go 38.26% <27.45%> (+0.15%) ⬆️
rust 92.15% <65.43%> (?)
typescript 42.03% <68.18%> (+0.21%) ⬆️


Impacted Files Coverage Δ
go/proofs.pb.go 31.65% <0.00%> (+0.13%) ⬆️
rust/src/cosmos.ics23.v1.rs 17.18% <0.00%> (ø)
js/src/generated/codecimpl.js 25.29% <58.00%> (-0.02%) ⬇️
go/proof.go 59.34% <75.00%> (+0.91%) ⬆️
rust/src/verify.rs 94.88% <86.66%> (ø)
rust/src/api.rs 97.11% <95.12%> (ø)
go/ics23.go 88.11% <100.00%> (ø)
js/src/ics23.ts 71.42% <100.00%> (+0.59%) ⬆️
js/src/proofs.ts 78.50% <100.00%> (+1.05%) ⬆️
js/src/testvectors.spec.ts 99.04% <100.00%> (+0.05%) ⬆️
... and 1 more

... and 3 files with indirect coverage changes


Contributor

@colin-axner left a comment

ACK for the concept and the go code changes. I like this solution a lot! Excellent work y'all :)

@plaidfinch
Contributor Author

plaidfinch commented Mar 28, 2023

@AdityaSripal We've fixed the SMT proof spec in this PR, and we are happy to fix the SMT test vectors, but we are not sure how these test vectors are being generated. Could you shed some light on that so we can finish fixing the tests and push this PR over the finish line?

As of now, we think we've tracked down how they're being generated to here. Can you confirm that this was the method used?

@avahowell
Collaborator

avahowell commented Mar 28, 2023

I updated the SMT test vectors so that they are correctly generated per this change, and they should verify now. I generated the test vectors using a modified version of the SMT in the store/v2 branch of the Cosmos SDK, which I can PR if desired. Here's the branch; the change required for the SMT on the generation side is very small, since it already stores a PreimageMap:

avahowell/cosmos-sdk@a3a049b

I believe that with the most recent changes, all the review comments have been addressed

@AdityaSripal
Member

Hi yes, please make a PR to the SDK to update the proofgen code there

Contributor

@colin-axner left a comment

Wahoo! Fantastic work to everyone involved! 🎉

Nice job updating the test vectors! I wasn't sure either how the test data was generated

(approving for go changes)

@colin-axner
Contributor

Would it be possible to eventually have test data vectors from Penumbra's JMT? It appears the referenced code generation has been removed from the SDK and I'm not sure the SMT implementation is being maintained

@plaidfinch
Contributor Author

Would it be possible to eventually have test data vectors from Penumbra's JMT? It appears the referenced code generation has been removed from the SDK and I'm not sure the SMT implementation is being maintained.

We could make some test vectors, yeah. If we match the ad-hoc JSON serialization format used by the SMT generation code, then I think it'd be as simple as checking proofs against them using the JMT spec instead of the SMT spec.

Would you want these test vectors included in this PR before merging it, or should we make a separate PR with them? It'd require a bit of implementation work to add the code to the JMT to make it spit out the test vectors, so my personal preference would be to merge this PR and make a separate PR later to swap in the JMT test vectors for the SMT ones.

Member

@romac left a comment

Rust version looks good! I just left one question regarding the removal of the derived Eq instance and a small nit, feel free to ignore.

@colin-axner
Contributor

Would you want these test vectors included in this PR before merging it, or should we make a separate PR with them? It'd require a bit of implementation work to add the code to the JMT to make it spit out the test vectors, so my personal preference would be to merge this PR and make a separate PR later to swap in the JMT test vectors for the SMT ones.

Let's do a separate PR 👍 I don't see a rush in adding the test vectors; I just want to make sure that, long term, ics23 has an up-to-date testing framework/tests 😄

@plaidfinch
Contributor Author

The failing check appears to be a spurious service error for the Go code coverage tool. Could someone re-run it please? If I understand correctly, this branch is ready to be merged, and a new release cut, at this point. Anything else we can help out with?

@romac
Member

romac commented Apr 8, 2023

All good on the Rust side! I can do a release of the Rust crate on Tuesday if nobody beats me to it.

@plaidfinch
Contributor Author

All good on the Rust side! I can do a release of the Rust crate on Tuesday if nobody beats me to it.

Fantastic! Thanks all for your help bringing this to the finish line! 🎉

@hdevalence
Collaborator

Hey, just checking in on this -- are we still good to merge this PR and cut a release?

@crodriguezvega merged commit cea74ba into cosmos:master on Apr 10, 2023
@Olshansk

Olshansk commented Apr 13, 2023

Support for SMTs is going to go a long way. Thanks to everyone involved here!

9 participants