-
Notifications
You must be signed in to change notification settings - Fork 720
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rework JSON conversions for transaction metadata #1797
Conversation
420da2d
to
5b59ed7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These comments are just my first impressions. I still want to pull this branch and play with it a bit.
md <- Hedgehog.forAll genMetaData | ||
Hedgehog.tripping md jsonFromMetadata jsonToMetadata | ||
json <- jsonFromMetadata <$> Hedgehog.forAll genMetaData | ||
Hedgehog.tripping json identity (fmap jsonFromMetadata . jsonToMetadata) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What? Why is identity
being used there? That defeats the prurpose of the tripping
function's ability to pretty print failed test cases. And why do we convert the generated metadata to JSON before round trip testing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed - we need to convert the generated metadata to JSON before round-trip testing to "wash out" ambiguous metadatums. For example, MD.S "0x"
becomes MD.B ""
after a round-trip, causing the property to fail. I don't really like this though.
genText :: Gen Text | ||
genText = Gen.choice | ||
[ Gen.ensure (not . Text.isPrefixOf bytesPrefix) | ||
(Text.pack <$> Gen.list (Range.linear 0 64) Gen.alphaNum) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks suspiciously like it is generating a subset of the real world so that is can pass round tripping.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
Reading the PR comments and and looking at the code, I still have no clue what this PR is actually intended to achieve or why. |
Aeson.String txt -> | ||
case Text.stripPrefix bytesPrefix txt of | ||
Nothing -> jsonToMetadataString txt | ||
Just hex -> case Base16.decode (Text.encodeUtf8 hex) of | ||
(bytes, "") -> jsonToMetadataBytes bytes | ||
(_, err) -> Left $ ConversionErrBadBytes err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Strings prefixed by "0x"
are not necessarily to be interpreted as byte strings. E.g. of such a string: "0xFFFFFF is a nice number"
.
Are we okay with strings like this just being interpreted as "bad bytes"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like this kind of check should be done by a separate linter than be rejected here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@intricate yes it is fine. Metadata aren't meant to be a place of self expression but rather a compact way to embed custom information in transactions. So rather than free-form text and verbose sentence, one should put the minimal information possible in metadata and have more verbose details stored off chain.
-- - Or they are something else, and we render them as a serialized JSON string | ||
-- | ||
-- So, for metadata coming from JSON in the first place, this will be pretty much invisible. | ||
-- And for more elaborated metadata crafted by other mean (via a CBOR library for instance), | ||
-- the key in the JSON-rendered metadata will look at bit funky. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it seems pretty odd to render a key as escaped JSON. 🤷♂️
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/other mean/other means/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a little funky indeed. Cf the slack conversation.
It gives however some nice debugging and inspection capabilities which is in the end, what we really care about for the Json view.
Another option would be to use a Json schema that can represent any CBOR data, like a diagnostic view but it'll be much more verbose.
In practice, most metadata would originally come from Json so it won't be that weird.
I changed the usage of However I would prefer that the JSON representation here could perfectly map one-to-one with the on-chain metadata. |
I still do not understand what specific problem this PR is trying to address and how it is trying to address it. |
The specific problem - according to me - is that if cardano-wallet uses these json conversion functions, wallet users will submit their JSON metadata with a transaction, and may well find that the JSON is different when they view their metadata in the tx history. This could break naive apps. Also if users submit JSON metadata like:
it will appear in the cardano explorer as:
|
Ok, that is a problem statement, but how does this PR address it? |
Hedgehog.tripping md jsonFromMetadata jsonToMetadata | ||
else do | ||
Hedgehog.label "invalid" | ||
Hedgehog.assert $ isLeft $ jsonToMetadata $ jsonFromMetadata md |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do we know that all metadata values that fail to roundtrip are invalid?
Ok, isValidMetadata
is defined below.
-- ----------------------------------------------------------------------------- | ||
-- pre-wash metadata to remove ambiguous metadatums before | ||
-- testing the round-trip property. | ||
let md' = fromRight md $ jsonToMetadata $ jsonFromMetadata md |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On second thoughts, this is going to make the round-trip test much less effective...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally, I'd rather generate arbitrary JSON values and see that they roundtrip fine using the metadata conversion function, it'll better capture what is the actual property we want. By re-using jsonToMetadata and jsonFromMetadata to construct the input I am afraid we are indeed creating a bias in the generator which will fail to capture some edge cases.
This needs a lot more thought. Rushing to wrong solutions would be a bad idea. |
@erikd I believe the problem is well stated on the Slack thread I opened some days ago. It is as Rodney pointed out above: We want metadata to round-trip when coming from JSON. So you're right that the generator only generate a subset of possible values, because this is exactly what we want. Ideally, the generator should generate a Json Value and round-trip from that. That PR addresses the problem because it makes sure that regardless of the what Json is given, the user get the exact same one after converting to CBOR and to JSON back. It wasn't the case before because of the special transformation done on lists and maps. The trade-off is that we have some funky keys when translating CBOR maps that have non string keys. That is perhaps a little ugly, yet see the slack thread for rationale. |
The Slack thread was huge, and making sense of it is far from trivial. Please summarize, preferably giving examples of:
|
5edd19e
to
b90d45f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I know what the problem is that you are trying to solve and I think I have a better solutuon, but until you actually tell me what the problem is, I can't actually tell.
The main problem with the current behavior is that is converts list to maps under some conditions, and does not convert them back to their original value when doing a roundtrip from JSON (see last example). This PR changes this to avoid doing this sort of list/map conversions, such that:
There's actually no issue in that direction. The main pain point is when translating an arbitrary CBOR into JSON. My original intent was to simply fail if a CBOR map with non-string keys was encountered, but after discussing this with @dcoutts we came to an agreement: render non-string keys as string using their corresponding encoded JSON. Keys look a bit funky, but at least, it's easy to introspect their content. Most users won't face this anyway as most would probably start off a JSON object in the first place. To be consistent and avoid structural transformation, the PR also changed back the convention for bytestrings to what was the original idea: base16-encoded strings prefixed with "0x" are encoded as CBOR bytes |
-- | JSON strings that are base16 encoded and prefixed with 'bytesPrefix' will | ||
-- be encoded as CBOR bytestrings. | ||
bytesPrefix :: Text | ||
bytesPrefix = "0x" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bad idea. We had a solution to this bad idea, but that has been dropped in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May I ask why "it's a bad idea".
Also, when "requesting changes" on a PR, it's better to tell the author what changes are expected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We used the 0x
and it made it difficult to distingush ByteString
from Text
. Property testing immediately found all the holes and the only way to fix it was to represent ByteString
s an object like { "hex" : "ffffff" }
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😎 property testing demonstrating things working fine.
How about this JSON encoding scheme? It's round-trippable in both directions. There is no tricky stuff like json-in-json or This is a fragment of OpenAPI yaml: x-transactionMetadata: &transactionMetadata
type: object
nullable: true
additionalProperties:
$ref: "#/components/schemas/TransactionMetadatum"
propertyNames:
pattern: '^[0-9]+$'
example:
0: "cardano"
1: 14
2: { "hex": "2512a00e9653fe49a44a5886202e24d77eeb998f" }
3: { "list": [14, 42, 1337] }
4: { "map": [["key", "value"], ["14", 42]] }
components:
schemas:
TransactionMetadatum: &TransactionMetadatum
oneOf:
- title: String
type: string
- title: ByteString
type: object
required:
- hex
properties:
hex:
type: string
pattern: "^[0-9a-fA-F]$"
- title: Integer
type: integer
- title: List
type: object
required:
- list
properties:
list:
type: array
items:
$ref: "#/components/schemas/TransactionMetadatum"
- title: Map
type: object
required:
- map
properties:
map:
type: array
items:
type: array
minItems: 2
maxItems: 2
items:
$ref: "#/components/schemas/TransactionMetadatum"
|
@rvl that would solve problems indeed. May I suggest to call "hex" simply "bytes" for transparency and consistency with others? |
Yes I think that would be better. I was also wondering whether to add "string" and "integer" tags to string and integer values, also for the sake of consistency. |
rebased on top of 1.19.1 |
eac2908
to
4b9bdf1
Compare
Provide conversion for two different JSON <-> TxMetadata mappings:
Still TODO:
|
Original revision disappeared due to forced git push. IntersectMBO/cardano-node#1797
6aec2ae
to
b371ba1
Compare
I don't see any mention of alternate encodings (e.g. strings) for too-big ints (metadata can have u64 uint and u64 nint as it uses CDDL int) in that OpenAPI schema. Granted I am not familiar with it, but I quickly scanned the docs and saw 2 optional format specifiers for |
Can you elaborate on what you mean, since it's not clear to me. Yes the tx metadata can have up to 64bit negative and positive integers. The code checks these limits when converting from JSON. There is no alternative in the tx metadata for integers bigger than 64bit. That is simply the maximum. |
tl;dr: If we have a large number like 2^64 - 1 in the metadata CBOR, how does this convert to JSON? How are numbers like 2^32 or 2^40 represented too? The 65 bit comment was if we were using signed 64-bit to represent both then we would effectively be losing one bit of information, no? [ TxMetadataNumberOutOfRange n
| n > fromIntegral (maxBound :: Word64)
|| n < negate (fromIntegral (maxBound :: Word64))
] Are you referring to this? This is only checking the metadata limits, not related to JSON. My concern was not how the Haskell/CBOR part is doing this, but how we are handling this in JSON. I thought that JSON integers are signed 32-bytes. And beyond that, even what I saw in the OpenAPI docs it just mentioned an optional signed 64-bit integer format tag, which still doesn't represent half (although a half that is not likely to ever be used as they'd be very large/very small) of the possible values of a CDDL |
JSON itself supports arbitrary precision numbers in scientific format, so we're not limited there at all. Some JSON libs in some programming languages have tighter limitations, but that's the problem on their side. We can produce and consume valid JSON with large integers no problem at all. |
cc49ae0
to
6d4efe6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is totally cheating now, for me to review what is now my own PR 😁 (having taken it over).
bf8a544
to
43d4589
Compare
Provide conversion for two different JSON <-> TxMetadata mappings: 1. A mapping that allows almost any JSON value to be converted into tx metadata. This does not require a specific JSON schema for the input. It does not expose the full representation capability of tx metadata. 2. A mapping that exposes the full representation capability of tx metadata, but relies on a specific JSON schema for the input JSON.
43d4589
to
a75c650
Compare
bors merge |
Build succeeded: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
pTxMetadataJsonSchema :: Parser TxMetadataJsonSchema | ||
pTxMetadataJsonSchema = | ||
( Opt.flag' () | ||
( Opt.long "--json-metadata-no-schema" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess that two hyphens in Opt.long "--json-metadata-no-schema"
and on line 922 Opt.long "--json-metadata-detailed-schema"
are not needed.
…dgen dependency for non-wasm builds (#85) * Implement JSON schema CDDL for metadata Based on: IntersectMBO/cardano-node#1797 * fix DetailedSchmea + add more tests + fix documentation + remove wasm-bindgen for non-wasm builds * remove debug panic from json_encoding_basic test
a. Encode JSON as CBOR considering only the following "optimizations"
b. Hexadecimal sequences starting with 0x are encoded as CBOR bytestring.
c. JSON keys that are numbers are encoded as CBOR numbers.
d. CBOR is shown as JSON when possible, or as a string representing JSON-encoded data when not (i.e. when map keys aren't numbers or strings).
e. CBOR bytestrings are represented as hexadecimal strings prefixed with 0x