-
Notifications
You must be signed in to change notification settings - Fork 15
Expose node hashes #105
base: master
Are you sure you want to change the base?
Expose node hashes #105
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 4 of 4 files at r1.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @bzz and @dennwc)
src/libuast.hpp, line 63 at r1 (raw file):
// NodeHash is a hash of a node subtree. struct NodeHash { uint8_t data[UAST_HASH_SIZE];
Maybe this is already defined elsewhere, but what is the format of these data? A binary sha1 or something?
src/uast.h, line 93 at r1 (raw file):
typedef enum { UAST_BINARY = 0, UAST_YAML = 1 } UastFormat; typedef enum {
Can you please add some comments here, to define what these mean?
Is this meant to be a bitwise-or of flags (e.g., all minus whatever is set?), or will these be mutually exclusive?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 4 of 4 files at r1.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @dennwc)
tests/nodes_test.h, line 656 at r1 (raw file):
0x37, 0xcc, 0xfc, 0x42, 0xf, 0x52, 0x65, 0x48, 0x1d, 0x59, 0x18, 0xce, 0x1, 0xad, 0xda, 0xa2, 0x82, 0x4c, 0x74, 0x77, 0xae, 0xa1, 0x26, 0xb5};
May be something obvious, but what is the best way to update this value, in case one would need to re-generate it later?
I.e is there some way to verify correctness of the expected hash value with some external tool e.g piping something to sha256sum
, etc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @bzz, @creachadair, and @dennwc)
src/libuast.hpp, line 63 at r1 (raw file):
Previously, creachadair (M. J. Fromberger) wrote…
Maybe this is already defined elsewhere, but what is the format of these data? A binary sha1 or something?
Yes, it's raw bytes of some hash. I hoped the uint8_t
type will give an insight into it (as opposed to char
that may mean text data).
src/uast.h, line 93 at r1 (raw file):
Previously, creachadair (M. J. Fromberger) wrote…
Can you please add some comments here, to define what these mean?
Is this meant to be a bitwise-or of flags (e.g., all minus whatever is set?), or will these be mutually exclusive?
Good point! Will document that.
tests/nodes_test.h, line 656 at r1 (raw file):
Previously, bzz (Alexander) wrote…
May be something obvious, but what is the best way to update this value, in case one would need to re-generate it later?
I.e is there some way to verify correctness of the expected hash value with some external tool e.g piping something to
sha256sum
, etc
The way to regenerate is to call the UastHash
itself mostly, which will call into the same function in SDK. Manually piping won't work, because SDK constructs a special (fast, non-reversible) serialization of the tree for the hash, whatever it is.
It's not ideal of course, because the only way to check the hash is to trust the current implementation. But at least it will notify us if the binding breaks, or if SDK breaks the hash.
Also, it doesn't seem we have the same test case in SDK for some reason. Adding it will fix the issues you pointed out, at least partially.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @bzz and @dennwc)
src/libuast.hpp, line 63 at r1 (raw file):
Previously, dennwc (Denys Smirnov) wrote…
Yes, it's raw bytes of some hash. I hoped the
uint8_t
type will give an insight into it (as opposed tochar
that may mean text data).
I was hoping we might document which hash somewhere. Of course it could be anything, and we can change it later if we need to, but to verify it someone will need to know which it is 🙂
tests/nodes_test.h, line 656 at r1 (raw file):
Previously, dennwc (Denys Smirnov) wrote…
The way to regenerate is to call the
UastHash
itself mostly, which will call into the same function in SDK. Manually piping won't work, because SDK constructs a special (fast, non-reversible) serialization of the tree for the hash, whatever it is.It's not ideal of course, because the only way to check the hash is to trust the current implementation. But at least it will notify us if the binding breaks, or if SDK breaks the hash.
Also, it doesn't seem we have the same test case in SDK for some reason. Adding it will fix the issues you pointed out, at least partially.
That's still useful to know, and might be worth a mention in the comment:
// This value must be updated if the algorithm for structural hashing is changed,
// or if the tree structure for the test module changes in a way that modifies the
// structural hash. To update it, copy the actual value from the test failure and
// update this array.
or words to that effect. I suggest also making sure the test failure will print the actual value so that someone doesn't have to write a special program to update the test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 2 of 4 files reviewed, all discussions resolved (waiting on @bzz and @creachadair)
src/libuast.hpp, line 63 at r1 (raw file):
Previously, creachadair (M. J. Fromberger) wrote…
I was hoping we might document which hash somewhere. Of course it could be anything, and we can change it later if we need to, but to verify it someone will need to know which it is 🙂
I'd rather not to. It's nearly useless for the client since it cannot verify the hash anyway because of custom node serialization.
tests/nodes_test.h, line 656 at r1 (raw file):
Previously, creachadair (M. J. Fromberger) wrote…
That's still useful to know, and might be worth a mention in the comment:
// This value must be updated if the algorithm for structural hashing is changed, // or if the tree structure for the test module changes in a way that modifies the // structural hash. To update it, copy the actual value from the test failure and // update this array.or words to that effect. I suggest also making sure the test failure will print the actual value so that someone doesn't have to write a special program to update the test.
Done!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 2 files at r2.
Reviewable status: complete! all files reviewed, all discussions resolved
src/libuast.hpp, line 63 at r1 (raw file):
Previously, dennwc (Denys Smirnov) wrote…
I'd rather not to. It's nearly useless for the client since it cannot verify the hash anyway because of custom node serialization.
That makes sense. It's probably still helpful to document (somewhere) what properties the hash requires. For example maybe:
// The data are a cryptographic-quality fingerprint of the tree structure.
// The specific algorithm is not specified and may change across library versions.
// Clients may compare hash values for equality to match equivalent nodes, but cannot
// recompute hash values directly, as the input depends on details of serialization.
or words to that effect.
If we say this, though, we're also implicitly saying that the client may not safely persist node hashes if they will be read later by a different library version. So maybe we don't want to allow it to change.
Signed-off-by: Denys Smirnov <denys@sourced.tech>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 3 of 4 files reviewed, all discussions resolved (waiting on @bzz and @creachadair)
src/libuast.hpp, line 63 at r1 (raw file):
Previously, creachadair (M. J. Fromberger) wrote…
That makes sense. It's probably still helpful to document (somewhere) what properties the hash requires. For example maybe:
// The data are a cryptographic-quality fingerprint of the tree structure. // The specific algorithm is not specified and may change across library versions. // Clients may compare hash values for equality to match equivalent nodes, but cannot // recompute hash values directly, as the input depends on details of serialization.
or words to that effect.
If we say this, though, we're also implicitly saying that the client may not safely persist node hashes if they will be read later by a different library version. So maybe we don't want to allow it to change.
Added the comment. Also opened an issue in SDK for persistent hashes: bblfsh/sdk#410.
Node hashes were already available in
libuast
, but only on C level. This PR adds the same functionality to the C++ layer used by clients. Also adds a test case.Fixes #78
Signed-off-by: Denys Smirnov denys@sourced.tech
This change is