Skip to content
This repository has been archived by the owner on Nov 6, 2020. It is now read-only.

duplicate key-able fields in JSON configurations #11285

Closed
meowsbits opened this issue Nov 24, 2019 · 8 comments
Closed

duplicate key-able fields in JSON configurations #11285

meowsbits opened this issue Nov 24, 2019 · 8 comments
Labels
A3-stale 🍃 Pull request did not receive any updates in a long time. No review needed at this stage. Close it. F3-annoyance 💩 The client behaves within expectations, however this “expected behaviour” itself is at issue. M2-config 📂 Chain specifications and node configurations.
Milestone

Comments

@meowsbits
Copy link
Contributor

meowsbits commented Nov 24, 2019

  • Parity Ethereum version: 13729a0
  • Network: Meta, eg. istanbul_test.json

A commit about a month ago introduced a new JSON data structure for chain configurations, modifying builtin parameter fields, eg. 13729a0#diff-775e1f6974b3c14259839d210872eb12L68

The new structure uses a map where <activation_block_number>: { <pricing data> }.

https://github.com/paritytech/parity-ethereum/blob/13729a0f7fbfa5b31de17f68f7a05db1d7da4353/ethcore/res/ethereum/istanbul_test.json#L71-L76

While technically valid JSON, duplicate keys are not recommended.

https://stackoverflow.com/a/23195243

Taken from the above SO answer, the citation at ECMA-262 is of further note:

In the case where there are duplicate name Strings within an object, lexically preceding values for the same key shall be overwritten.

since with the U64 encoding format in use which allows both hex- and nonhex-encoded read formats (I'm assuming is for this field given AFAIU it is for many others), there may be unexpected outcomes for this, eg "0x7fffffffffffff" vs. "1".

This is an edge case issue for most "canonical" (non-test) configurations so far (since builtin activation blocks are not equal), but for new or development chains this may be a more pressing issue.

Rel etclabscore/core-geth#64
Rel ethereum/go-ethereum#20241

@jam10o-new jam10o-new added F3-annoyance 💩 The client behaves within expectations, however this “expected behaviour” itself is at issue. M2-config 📂 Chain specifications and node configurations. labels Nov 24, 2019
@jam10o-new jam10o-new added this to the 2.7 milestone Nov 24, 2019
@niklasad1
Copy link
Collaborator

niklasad1 commented Dec 23, 2019

Hey, nice observation thanks for writing this up.

since with the U64 encoding format in use which allows both hex- and nonhex-encoded read formats (I'm assuming is for this field given AFAIU it is for many others), there may be unexpected outcomes for this, eg "0x7fffffffffffff" vs. "1".

Ok, so you mean for example 0x1 and 1 might get interpreted as two different values?

In parity-eth that is not an issue so far because we deserialize the values first and then insert them into a BTreeMap. Thus, the preceding value will overwrite previous value(s).

This is an edge case issue for most "canonical" (non-test) configurations so far (since builtin activation blocks are not equal), but for new or development chains this may be a more pressing issue.

I think we should remove duplicate keys for new or dev chains because only the last value is used but it was more to test that it worked. It is more explicit and easier to reason about

@meowsbits
Copy link
Contributor Author

so you mean for example 0x1 and 1 might get interpreted as two different values?

Mmm.. no. Not sure exactly what I meant there. May have been misguided... let's ignore that comment.

the lexically preceding value will overwrite previous value(s).

I'm guessing you mean "the lexically succeeding value will overwrite previous value", or? If so, that's what I mean. Also, I'm assuming that "lexical ordering" means alphabetization. (https://en.wikipedia.org/wiki/Lexicographical_order)

So let's imagine a builtin as the following:

"0000000000000000000000000000000000000007": {
  "builtin": {
    "name": "alt_bn128_mul",
    "pricing": {
      "42": {
        "price": { "alt_bn128_const_operations": { "price": 40000 }}
      },
      "42": {
        "price": { "alt_bn128_const_operations": { "price": 35000 }}
      },
      "0x2a": {
        "price": { "alt_bn128_const_operations": { "price": 30000 }}
      },
      "0x02a": {
        "price": { "alt_bn128_const_operations": { "price": 10000 }}
      },
      "2a": {
        "price": { "alt_bn128_const_operations": { "price": 5000 }}
      }
    }
  }
}

What would the price be at block 43?

Lexicographical order:

  • 0x02a
  • 0x2a
  • 2a
  • 42, 42 (==)

So price would be either 40000 or 35000, right? Maybe 35000 because it's lower on the page?

Understood on this being a non issue for parity-eth w/r/t reading and deserialization patterns. How does the logic handle (if ever) writing these values? Is it all order of operations? (So you need to write fields in desired activation order so that would-be duplicate keyed fields get overwritten by their later counterparts?)

I feel I should mention that while removing the offending keys from the configurations in the library may workaround the issue for the cases presented, this pattern in general introduces an annoyance for anyone, cough like myself 😹, wanting to work with the configs in other contexts because we have to use custom unmarshaling methods.

@niklasad1
Copy link
Collaborator

I'm guessing you mean "the lexically succeeding value will overwrite previous value", or?

I might have misinterpreted lexically ordering, in this context:

In the case where there are duplicate name Strings within an object, lexically preceding values for the same key shall be overwritten.

(in other words last-value-wins).

Text fetched from https://stackoverflow.com/questions/21832701/does-json-syntax-allow-duplicate-keys-in-an-object

So, what I meant is that last value wins (but it assumes that the JSON is parsed top-to-bottom)

@meowsbits
Copy link
Contributor Author

meowsbits commented Dec 23, 2019

Gotcha, where I was thinking "lexicographical ordering" was alphabetical, and then in case of collision alphabetically, then "bottom-wins" by on-page ordering.

Here's an implementation and test I made for our actual use case: etclabscore/core-geth#109. I use sort.Strings to sort lexicographically. In case of duplicate literal values (eg. "42" is a dupe of "42"), then with Go I really have no way of determinig order, since "42" == "42" and Golang doesn't guarantee map iteration order.

@adria0 adria0 added the A3-stale 🍃 Pull request did not receive any updates in a long time. No review needed at this stage. Close it. label Jul 27, 2020
@adria0 adria0 closed this as completed Jul 27, 2020
@meowsbits
Copy link
Contributor Author

Just noting that this, albeit stale, is still unresolved as a function of the schema. Or? Are there any plans to resolve it?

@adria0
Copy link

adria0 commented Jul 27, 2020

It seems that nobody has been working on this @meowsbits

@meowsbits
Copy link
Contributor Author

Is this a non-issue in the implementation? Or how are duplicate-keyed configuration fields handled?... Order of insertion? Some other comparison operation?

Given the question above, what would be expected?

Asking because this seems like an edge case that has the potential to grow from an annoyance to a bug.

@adria0
Copy link

adria0 commented Jul 28, 2020

@meowsbits, the chainspecs embeeded in openethereum are dual reviewed and later tested with the current ethereum tests, so this is not a problem for us at this moment or for users that are running in a well-known networks.

Do you talk about an annoyance or a bug. May you describe the scenario(s) where this becomes a real problem? This will be easier to understand the specific cases you are talking about.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A3-stale 🍃 Pull request did not receive any updates in a long time. No review needed at this stage. Close it. F3-annoyance 💩 The client behaves within expectations, however this “expected behaviour” itself is at issue. M2-config 📂 Chain specifications and node configurations.
Projects
None yet
Development

No branches or pull requests

4 participants