rework JSON conversions for transaction metadata #1797

Merged
merged 1 commit into from
Sep 16, 2020

KtorZ
Contributor

@KtorZ KtorZ commented Sep 2, 2020

a. Encode JSON as CBOR, considering only the following "optimizations":
b. Hexadecimal sequences starting with 0x are encoded as CBOR bytestrings.
c. JSON keys that are numbers are encoded as CBOR numbers.
d. CBOR is shown as JSON when possible, or as a string representing JSON-encoded data when not (i.e. when map keys aren't numbers or strings).
e. CBOR bytestrings are represented as hexadecimal strings prefixed with 0x.

  • We expect that we can round-trip any JSON without constraints (modulo null and boolean values).
  • For CBOR, we should be able to round-trip a subset with some simple-ish constraints.

Contributor

@erikd erikd left a comment

These comments are just my first impressions. I still want to pull this branch and play with it a bit.

md <- Hedgehog.forAll genMetaData
Hedgehog.tripping md jsonFromMetadata jsonToMetadata
json <- jsonFromMetadata <$> Hedgehog.forAll genMetaData
Hedgehog.tripping json identity (fmap jsonFromMetadata . jsonToMetadata)
Contributor

@erikd erikd Sep 2, 2020

What? Why is identity being used there? That defeats the purpose of the tripping function's ability to pretty-print failed test cases. And why do we convert the generated metadata to JSON before round-trip testing?

Contributor

Fixed - we need to convert the generated metadata to JSON before round-trip testing to "wash out" ambiguous metadatums. For example, MD.S "0x" becomes MD.B "" after a round-trip, causing the property to fail. I don't really like this though.
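
The ambiguity described above can be illustrated with a minimal sketch. The `Metadatum` type and the `toJsonString`/`fromJsonString` helpers below are hypothetical stand-ins (they are not the functions from this PR), and bytes are kept as their hex digits for brevity:

```haskell
import Data.Char (isHexDigit)
import Data.List (stripPrefix)

-- Hypothetical, simplified stand-in for the metadata value type.
data Metadatum
  = MS String -- text
  | MB String -- bytes, kept here as their hex digits for simplicity
  deriving (Eq, Show)

-- Render a metadatum as a JSON string: bytes get the "0x" prefix.
toJsonString :: Metadatum -> String
toJsonString (MS s) = s
toJsonString (MB h) = "0x" ++ h

-- Parse a JSON string back: anything starting with "0x" is treated as bytes.
fromJsonString :: String -> Either String Metadatum
fromJsonString s = case stripPrefix "0x" s of
  Nothing -> Right (MS s)
  Just h
    | all isHexDigit h && even (length h) -> Right (MB h)
    | otherwise -> Left ("bad bytes: " ++ h)

-- Metadata -> JSON -> metadata.
roundTrip :: Metadatum -> Either String Metadatum
roundTrip = fromJsonString . toJsonString
```

Under these assumptions, `roundTrip (MS "0x")` comes back as `Right (MB "")` rather than `Right (MS "0x")`, which is the kind of value the "pre-wash" step removes from the generated input.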

genText :: Gen Text
genText = Gen.choice
[ Gen.ensure (not . Text.isPrefixOf bytesPrefix)
(Text.pack <$> Gen.list (Range.linear 0 64) Gen.alphaNum)
Contributor

This looks suspiciously like it is generating a subset of the real world so that it can pass round-tripping.

Contributor

Fixed.

@erikd
Contributor

erikd commented Sep 2, 2020

Reading the PR comments and looking at the code, I still have no clue what this PR is actually intended to achieve or why.

Comment on lines 132 to 216
Aeson.String txt ->
case Text.stripPrefix bytesPrefix txt of
Nothing -> jsonToMetadataString txt
Just hex -> case Base16.decode (Text.encodeUtf8 hex) of
(bytes, "") -> jsonToMetadataBytes bytes
(_, err) -> Left $ ConversionErrBadBytes err
Contributor

Strings prefixed by "0x" are not necessarily to be interpreted as byte strings. E.g. of such a string: "0xFFFFFF is a nice number".

Are we okay with strings like this just being interpreted as "bad bytes"?
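
As a rough illustration of the behaviour being questioned, here is a self-contained sketch that uses only `base`. `Parsed`, `decodeHex` and `parseString` are hypothetical names, not the PR's code; the point is only that a "0x"-prefixed string has no fallback to text:

```haskell
import Data.Char (digitToInt, isHexDigit)
import Data.List (stripPrefix)
import Data.Word (Word8)

-- Hypothetical result type: either plain text or decoded bytes.
data Parsed = PText String | PBytes [Word8]
  deriving (Eq, Show)

-- Decode an even-length run of hex digits; anything else is an error.
decodeHex :: String -> Either String [Word8]
decodeHex (a : b : rest)
  | isHexDigit a && isHexDigit b =
      (fromIntegral (16 * digitToInt a + digitToInt b) :) <$> decodeHex rest
decodeHex [] = Right []
decodeHex bad = Left ("bad bytes: " ++ show bad)

-- Strings with the "0x" prefix must be valid hex; there is no fallback to text.
parseString :: String -> Either String Parsed
parseString s = case stripPrefix "0x" s of
  Nothing -> Right (PText s)
  Just hex -> PBytes <$> decodeHex hex
```

With this sketch, `parseString "0xff"` yields bytes, while `parseString "0xFFFFFF is a nice number"` is rejected with a `Left` instead of being kept as text.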

Contributor

Seems like this kind of check should be done by a separate linter rather than be rejected here.

Contributor Author

@intricate yes, it is fine. Metadata isn't meant to be a place for self-expression but rather a compact way to embed custom information in transactions. So rather than free-form text and verbose sentences, one should put the minimal information possible in metadata and store more verbose details off-chain.

Comment on lines 70 to 153
-- - Or they are something else, and we render them as a serialized JSON string
--
-- So, for metadata coming from JSON in the first place, this will be pretty much invisible.
-- And for more elaborated metadata crafted by other mean (via a CBOR library for instance),
-- the key in the JSON-rendered metadata will look at bit funky.
Contributor

Yeah, it seems pretty odd to render a key as escaped JSON. 🤷‍♂️

Contributor

s/other mean/other means/

Contributor Author

It is a little funky indeed. Cf the slack conversation.

It does, however, give some nice debugging and inspection capabilities, which is, in the end, what we really care about for the JSON view.

Another option would be to use a Json schema that can represent any CBOR data, like a diagnostic view but it'll be much more verbose.

In practice, most metadata would originally come from Json so it won't be that weird.

@rvl
Contributor

rvl commented Sep 3, 2020

I changed the usage of Hedgehog.tripping as suggested, and added invalid cases to the generators.

However I would prefer that the JSON representation here could perfectly map one-to-one with the on-chain metadata.

@erikd
Contributor

erikd commented Sep 3, 2020

I still do not understand what specific problem this PR is trying to address and how it is trying to address it.

@rvl
Contributor

rvl commented Sep 3, 2020

The specific problem - according to me - is that if cardano-wallet uses these json conversion functions, wallet users will submit their JSON metadata with a transaction, and may well find that the JSON is different when they view their metadata in the tx history. This could break naive apps.

Also if users submit JSON metadata like:

"metadata": { "42": [["ada", 1], ["ada", 2]] }

it will appear in the cardano explorer as:

"metadata": { "42": { "ada": 1 /* or 2 */ } }

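The loss of information in this example can be reproduced with a small sketch using `Data.Map` from `containers`. The names `pairs`, `asMap` and `backToPairs` are illustrative only, and the old conversion code may have resolved duplicate keys differently:

```haskell
import qualified Data.Map.Strict as Map

-- A JSON list of [key, value] pairs, as in the example above.
pairs :: [(String, Int)]
pairs = [("ada", 1), ("ada", 2)]

-- Turning the pairs into a map keeps only one entry per key ...
asMap :: Map.Map String Int
asMap = Map.fromList pairs

-- ... so converting back cannot recover the original list.
backToPairs :: [(String, Int)]
backToPairs = Map.toList asMap
```

Here `backToPairs` has a single entry, so the original two-element list is not recoverable once the list has been turned into a map.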
@erikd
Contributor

erikd commented Sep 3, 2020

Ok, that is a problem statement, but how does this PR address it?

Hedgehog.tripping md jsonFromMetadata jsonToMetadata
else do
Hedgehog.label "invalid"
Hedgehog.assert $ isLeft $ jsonToMetadata $ jsonFromMetadata md
Contributor

@erikd erikd Sep 3, 2020

How do we know that all metadata values that fail to roundtrip are invalid?

Ok, isValidMetadata is defined below.

-- -----------------------------------------------------------------------------
-- pre-wash metadata to remove ambiguous metadatums before
-- testing the round-trip property.
let md' = fromRight md $ jsonToMetadata $ jsonFromMetadata md
Contributor

On second thoughts, this is going to make the round-trip test much less effective...

Contributor Author

Ideally, I'd rather generate arbitrary JSON values and check that they round-trip through the metadata conversion functions; that would better capture the actual property we want. By re-using jsonToMetadata and jsonFromMetadata to construct the input, I am afraid we are indeed creating a bias in the generator which will fail to capture some edge cases.

@erikd
Contributor

erikd commented Sep 3, 2020

This needs a lot more thought. Rushing to wrong solutions would be a bad idea.

@KtorZ
Contributor Author

KtorZ commented Sep 3, 2020

@erikd I believe the problem is well stated on the Slack thread I opened some days ago. It is as Rodney pointed out above:

We want metadata to round-trip when coming from JSON.

So you're right that the generator only generates a subset of possible values, because this is exactly what we want. Ideally, the generator should generate a JSON Value and round-trip from that.

That PR addresses the problem because it makes sure that, regardless of what JSON is given, the user gets the exact same JSON back after converting to CBOR and back to JSON. It wasn't the case before because of the special transformation done on lists and maps.

The trade-off is that we have some funky keys when translating CBOR maps that have non string keys. That is perhaps a little ugly, yet see the slack thread for rationale.

@erikd
Contributor

erikd commented Sep 3, 2020

The Slack thread was huge, and making sense of it is far from trivial.

Please summarize, preferably giving examples of:

  • How the new serialisation differs from the old
  • What the new serialisation handles that the old one does not.

@KtorZ KtorZ force-pushed the KtorZ/2098/adjust-metadata-json branch from 5edd19e to b90d45f Compare September 3, 2020 11:15
Contributor

@erikd erikd left a comment

I think I know what the problem is that you are trying to solve and I think I have a better solution, but until you actually tell me what the problem is, I can't actually tell.

@KtorZ
Contributor Author

KtorZ commented Sep 3, 2020

@erikd

| JSON | CBOR (Before) | CBOR (Now) | re-JSON (Before) | re-JSON (Now) |
| --- | --- | --- | --- | --- |
| "0xFF" | 6430786666 (text) | 41FF (bytes) | "0xFF" | "0xFF" |
| { "hex": "ff"} | 41FF (bytes) | A163686578626666 (map of bytes) | { "hex": "ff"} | { "hex": "ff"} |
| [] | A0 (empty map) | 80 (empty list) | {} | [] |
| { "a": 1 } | A1616101 (map of int) | A1616101 (map of int) | { "a": 1 } | { "a": 1 } |
| [["a", 1], ["a", 2]] | A2616102616101 (map) | 828261610182616102 (list of list) | { "a": 1 } OR { "a": 2 } | [["a", 1], ["a", 2]] |

The main problem with the current behavior is that it converts lists to maps under some conditions, and does not convert them back to their original value when doing a round trip from JSON (see last example). This PR changes this to avoid doing this sort of list/map conversion, such that:

  • JSON maps are converted to CBOR maps
  • JSON lists are converted to CBOR lists

There's actually no issue in that direction. The main pain point is when translating arbitrary CBOR into JSON. My original intent was to simply fail if a CBOR map with non-string keys was encountered, but after discussing this with @dcoutts we came to an agreement: render non-string keys as strings using their corresponding encoded JSON. Keys look a bit funky, but at least it's easy to introspect their content. Most users won't face this anyway, as most would probably start from a JSON object in the first place.

To be consistent and avoid structural transformation, the PR also changes the convention for bytestrings back to the original idea: base16-encoded strings prefixed with "0x" are encoded as CBOR bytes.
This is slightly simpler from a user perspective and also leads to a simpler implementation.
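
The key-rendering rule described in this comment might look roughly like the sketch below. The trimmed-down `Metadatum` type and the `renderJson`/`renderKey` helpers are assumptions made for illustration, and string escaping is deliberately omitted:

```haskell
import Data.List (intercalate)

-- Hypothetical, trimmed-down metadata value type.
data Metadatum = MText String | MInt Integer | MList [Metadatum]
  deriving (Eq, Show)

-- Render a value as compact JSON text (no string escaping, for brevity).
renderJson :: Metadatum -> String
renderJson (MText s) = "\"" ++ s ++ "\""
renderJson (MInt n) = show n
renderJson (MList xs) = "[" ++ intercalate "," (map renderJson xs) ++ "]"

-- Text and number keys are used directly; any other key becomes its JSON rendering.
renderKey :: Metadatum -> String
renderKey (MText s) = s
renderKey (MInt n) = show n
renderKey other = renderJson other
```

A key such as `MList [MInt 1, MText "a"]` would therefore appear in the JSON view as the string `[1,"a"]`, which is the "funky" but inspectable form mentioned above.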

erikd
erikd previously requested changes Sep 3, 2020
-- | JSON strings that are base16 encoded and prefixed with 'bytesPrefix' will
-- be encoded as CBOR bytestrings.
bytesPrefix :: Text
bytesPrefix = "0x"
Contributor

@erikd erikd Sep 3, 2020

This is a bad idea. We had a solution to this bad idea, but that has been dropped in this PR.

Contributor Author

May I ask why "it's a bad idea"?

Also, when "requesting changes" on a PR, it's better to tell the author what changes are expected.

Contributor

@erikd erikd Sep 7, 2020

We used the 0x prefix and it made it difficult to distinguish ByteString from Text. Property testing immediately found all the holes, and the only way to fix it was to represent ByteStrings as an object like { "hex" : "ffffff" }.

Contributor

😎 property testing demonstrating things working fine.

@rvl
Contributor

rvl commented Sep 4, 2020

How about this JSON encoding scheme? It's round-trippable in both directions. There is no tricky stuff like json-in-json or 0x prefixes.

This is a fragment of OpenAPI yaml:

x-transactionMetadata: &transactionMetadata
  type: object
  nullable: true
  additionalProperties:
    $ref: "#/components/schemas/TransactionMetadatum"
  propertyNames:
    pattern: '^[0-9]+$'
  example:
    0: "cardano"
    1: 14
    2: { "hex": "2512a00e9653fe49a44a5886202e24d77eeb998f" }
    3: { "list": [14, 42, 1337] }
    4: { "map": [["key", "value"], ["14", 42]] }

components:
  schemas:
    TransactionMetadatum: &TransactionMetadatum
      oneOf:
        - title: String
          type: string
        - title: ByteString
          type: object
          required:
            - hex
          properties:
            hex:
              type: string
              pattern: "^[0-9a-fA-F]*$"
        - title: Integer
          type: integer
        - title: List
          type: object
          required:
            - list
          properties:
            list:
              type: array
              items:
                $ref: "#/components/schemas/TransactionMetadatum"
        - title: Map
          type: object
          required:
            - map
          properties:
            map:
              type: array
              items:
                type: array
                minItems: 2
                maxItems: 2
                items:
                  $ref: "#/components/schemas/TransactionMetadatum"
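
To make the proposed scheme concrete, here is a hedged Haskell sketch of the same five alternatives. The `Json` and `Metadatum` types and the `toJson`/`fromJson` functions are hypothetical (a real implementation would use a JSON library and validate the hex digits); the sketch only aims to show why the tagged encoding round-trips in both directions:

```haskell
-- Minimal JSON model (hypothetical; a real implementation would use a JSON library).
data Json
  = JString String
  | JNumber Integer
  | JArray [Json]
  | JObject [(String, Json)]
  deriving (Eq, Show)

-- Hypothetical metadata value type mirroring the five schema alternatives.
data Metadatum
  = MText String
  | MBytes String -- hex digits, kept as text for brevity
  | MInt Integer
  | MList [Metadatum]
  | MMap [(Metadatum, Metadatum)]
  deriving (Eq, Show)

-- Strings and integers are bare; everything else is a single-key tagged object.
toJson :: Metadatum -> Json
toJson (MText s) = JString s
toJson (MInt n) = JNumber n
toJson (MBytes h) = JObject [("hex", JString h)]
toJson (MList xs) = JObject [("list", JArray (map toJson xs))]
toJson (MMap kvs) =
  JObject [("map", JArray [JArray [toJson k, toJson v] | (k, v) <- kvs])]

-- Inverse of toJson; anything outside the schema is rejected.
fromJson :: Json -> Either String Metadatum
fromJson (JString s) = Right (MText s)
fromJson (JNumber n) = Right (MInt n)
fromJson (JObject [("hex", JString h)]) = Right (MBytes h)
fromJson (JObject [("list", JArray xs)]) = MList <$> traverse fromJson xs
fromJson (JObject [("map", JArray ps)]) = MMap <$> traverse pair ps
  where
    pair (JArray [k, v]) = (,) <$> fromJson k <*> fromJson v
    pair other = Left ("expected a [key, value] pair, got: " ++ show other)
fromJson other = Left ("not valid under the schema: " ++ show other)
```

Because every non-scalar value carries an explicit tag, a text value such as `MText "0x"` and a map with duplicate or non-string keys both survive `fromJson . toJson` unchanged in this sketch.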

@intricate
Contributor

@rvl: I'd much prefer something like that.

@erikd @dcoutts: Thoughts?

@KtorZ
Contributor Author

KtorZ commented Sep 4, 2020

@rvl that would solve problems indeed. May I suggest to call "hex" simply "bytes" for transparency and consistency with others?

@rvl
Contributor

rvl commented Sep 4, 2020

May I suggest to call "hex" simply "bytes"

Yes I think that would be better. I was also wondering whether to add "string" and "integer" tags to string and integer values, also for the sake of consistency.

@KtorZ
Contributor Author

KtorZ commented Sep 9, 2020

rebased on top of 1.19.1

@dcoutts
Contributor

dcoutts commented Sep 14, 2020

Provide conversion for two different JSON <-> TxMetadata mappings:

  1. A mapping that allows almost any JSON value to be converted into
    tx metadata. This does not require a specific JSON schema for the
    input. It does not expose the full representation capability of tx
    metadata.

  2. A mapping that exposes the full representation capability of tx
    metadata, but relies on a specific JSON schema for the input JSON.

Still TODO:

  • update the cli
  • simplify the schema error reporting data type

rvl added a commit to input-output-hk/cardano-haskell that referenced this pull request Sep 15, 2020
Original revision disappeared due to forced git push.

IntersectMBO/cardano-node#1797
@dcoutts dcoutts force-pushed the KtorZ/2098/adjust-metadata-json branch from 6aec2ae to b371ba1 Compare September 15, 2020 10:15
@rooooooooob

I don't see any mention of alternate encodings (e.g. strings) for too-big ints (metadata can have u64 uint and u64 nint as it uses CDDL int) in that OpenAPI schema. Granted I am not familiar with it, but I quickly scanned the docs and saw 2 optional format specifiers for int32 and int64 for the integer used there but even int64 would only include half of the possible values of a CDDL int since those are almost like a 65-bit integer. The code in this PR also mentions 64 bit integers.

@dcoutts
Contributor

dcoutts commented Sep 15, 2020

I don't see any mention of alternate encodings (e.g. strings) for too-big ints (metadata can have u64 uint and u64 nint as it uses CDDL int) in that OpenAPI schema.

Can you elaborate on what you mean? It's not clear to me. Yes, the tx metadata can have up to 64-bit negative and positive integers. The code checks these limits when converting from JSON.

There is no alternative in the tx metadata for integers bigger than 64-bit. That is simply the maximum.

@rooooooooob

rooooooooob commented Sep 15, 2020

tl;dr: If we have a large number like 2^64 - 1 in the metadata CBOR, how does this convert to JSON? How are numbers like 2^32 or 2^40 represented too?

The 65 bit comment was if we were using signed 64-bit to represent both then we would effectively be losing one bit of information, no? int = uint / nint which is almost basically a 65-bit signed int since it's unsigned 64 in both signs. My concern was just when it came to how we would be representing numbers big enough to be in a 64-bit uint/nint but not enough to be in a 64-bit signed integer. For example representing 2^63 + n, 0 <= n < 2^63 or the same but negative.

   [ TxMetadataNumberOutOfRange n
    |    n >         fromIntegral (maxBound :: Word64)
      || n < negate (fromIntegral (maxBound :: Word64))
    ]

Are you referring to this? This is only checking the metadata limits, not related to JSON.

My concern was not how the Haskell/CBOR part is doing this, but how we are handling this in JSON. I thought that JSON integers were signed 32-bit. Beyond that, what I saw in the OpenAPI docs only mentioned an optional signed 64-bit integer format tag, which still doesn't represent half of the possible values of a CDDL int (although a half that is unlikely ever to be used, as the values would be very large or very small). I didn't see that optional format tag used, so I wasn't sure whether it was even using that, or how we would represent >32-bit numbers in JSON when converting from the Haskell metadata, which allows up to 64-bit unsigned magnitudes, positive or negative.

@dcoutts
Contributor

dcoutts commented Sep 15, 2020

JSON itself supports arbitrary precision numbers in scientific format, so we're not limited there at all.

Some JSON libs in some programming languages have tighter limitations, but that's the problem on their side. We can produce and consume valid JSON with large integers no problem at all.
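
The range question in this exchange can be sketched with arbitrary-precision `Integer` values. The function names below are hypothetical; the bounds mirror the check quoted earlier in the thread (plus or minus the maximum `Word64`), not necessarily the full CBOR range:

```haskell
import Data.Int (Int64)
import Data.Word (Word64)

-- Largest magnitude allowed for a metadata integer: 2^64 - 1.
maxMagnitude :: Integer
maxMagnitude = fromIntegral (maxBound :: Word64)

-- True when an arbitrary-precision integer fits the metadata range.
inMetadataRange :: Integer -> Bool
inMetadataRange n = n >= negate maxMagnitude && n <= maxMagnitude

-- True when the value would also fit a signed 64-bit integer.
fitsInt64 :: Integer -> Bool
fitsInt64 n =
  n >= fromIntegral (minBound :: Int64) && n <= fromIntegral (maxBound :: Int64)
```

A value such as 2^63 is accepted by `inMetadataRange` but not by `fitsInt64`, which is the gap a consumer limited to signed 64-bit integers would hit when reading the JSON.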

@dcoutts dcoutts force-pushed the KtorZ/2098/adjust-metadata-json branch 3 times, most recently from cc49ae0 to 6d4efe6 Compare September 15, 2020 22:06
Contributor

@dcoutts dcoutts left a comment

This is totally cheating now, for me to review what is now my own PR 😁 (having taken it over).

@dcoutts dcoutts dismissed erikd’s stale review September 15, 2020 22:46

rewritten since review

@dcoutts dcoutts force-pushed the KtorZ/2098/adjust-metadata-json branch from bf8a544 to 43d4589 Compare September 15, 2020 23:45
@dcoutts
Contributor

dcoutts commented Sep 16, 2020

bors merge

@iohk-bors
Contributor

iohk-bors bot commented Sep 16, 2020

@iohk-bors iohk-bors bot merged commit 2ef9e7a into master Sep 16, 2020
@iohk-bors iohk-bors bot deleted the KtorZ/2098/adjust-metadata-json branch September 16, 2020 08:27
Contributor

@Jimbo4350 Jimbo4350 left a comment

LGTM!

pTxMetadataJsonSchema :: Parser TxMetadataJsonSchema
pTxMetadataJsonSchema =
( Opt.flag' ()
( Opt.long "--json-metadata-no-schema"
Contributor

I guess that two hyphens in Opt.long "--json-metadata-no-schema" and on line 922 Opt.long "--json-metadata-detailed-schema" are not needed.

rooooooooob added a commit to Emurgo/cardano-serialization-lib that referenced this pull request Sep 19, 2020
SebastienGllmt pushed a commit to Emurgo/cardano-serialization-lib that referenced this pull request Oct 1, 2020
…dgen dependency for non-wasm builds (#85)

* Implement JSON schema CDDL for metadata

Based on: IntersectMBO/cardano-node#1797

* fix DetailedSchmea + add more tests + fix documentation + remove wasm-bindgen for non-wasm builds

* remove debug panic from json_encoding_basic test

9 participants