Canonical JSON is inadequately specified #1232
Comments
I believe both of these only apply to room versions < 6 (before strict canonical JSON was enforced in Synapse).
Allowing those used to be a bug in Synapse, but it has already been fixed (matrix-org/synapse#8106).
The spec and Synapse both mention/use
So is the only remaining unspecified thing here the handling of duplicate keys? @neilalexander, if you agree, it might be good to close this issue and open a clearer one.
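One option for duplicate keys, rather than relying on whichever value a given parser happens to keep, is to reject them at parse time. The following is a minimal sketch assuming Python's standard json module; strict_loads and its hook functions are hypothetical names (not Synapse or spec APIs), and also rejecting NaN/Infinity is an assumed policy.

```python
import json

# Hypothetical helpers: not part of Synapse or the Matrix spec.


def _reject_duplicates(pairs):
    """object_pairs_hook: build the dict, but refuse any repeated key instead
    of silently keeping the first or the last value."""
    result = {}
    for key, value in pairs:
        if key in result:
            raise ValueError(f"duplicate JSON key: {key!r}")
        result[key] = value
    return result


def _reject_constant(name):
    """parse_constant hook: refuse NaN, Infinity and -Infinity, which JSON
    itself does not define."""
    raise ValueError(f"non-standard JSON constant: {name}")


def strict_loads(raw: str):
    """Parse JSON, treating duplicate keys and non-finite constants as errors."""
    return json.loads(
        raw,
        object_pairs_hook=_reject_duplicates,
        parse_constant=_reject_constant,
    )


print(strict_loads('{"a": 1, "b": [2, 3]}'))  # {'a': 1, 'b': [2, 3]}
try:
    strict_loads('{"a": 1, "a": 2}')
except ValueError as err:
    print("rejected:", err)
```

The hook runs for every object, including nested ones, so the same key may still appear in two different objects.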
The IEEE 754 problems (floating precision, …
I think the root of the problem here is that round-tripping through arbitrary JSON implementations to canonicalise is inherently dangerous, because different implementations can behave unpredictably. For example, if you round-trip in Go and there are duplicate keys, you'll get the last value. If you do it in Swift, you get the first value. If you round-trip your over-precise float in Python, you get scientific notation. If you do it in Go, you don't. Ultimately we want to strip whitespace and ensure a consistent key ordering, but I'm not convinced there's a good reason to round-trip the actual values at all. Otherwise, two implementations that disagree on how something should appear on the wire will not be able to compare signatures or hashes properly against those from another implementation.
"Do what Python's …
Not even the Python documentation describes all of those behaviours precisely, so what is anyone implementing in another language supposed to do? What happens if Python changes something? "Shortest UTF-8 JSON encoding" isn't much more useful either: for example, in the
I think Synapse made a breaking change when disabling them; they aren't allowed in old room versions either (unlike floats, which are allowed in old versions).
Those aren't allowed in canonical JSON, but it's true that the non-canonical JSON in pre-v6 rooms is currently extremely inadequately specified (not specified at all, other than stating it may not be canonical)
I don't think there's any way to avoid round-tripping other than inventing a new out-of-band signature mechanism, since you need to remove the
They aren't allowed in "today's canonical JSON", but it's worth noting that there are really two "canonical JSONs" here: the one that applies to room versions <= 5 and the one that applies to room versions >= 6. We were using the term long before room version 6 came out. For what it's worth, federation request signing suffers from similar problems, as the JSON request body needs to be canonicalised in order to be signed too. It uses exactly the same mechanism. If we're sending events in federation request bodies, as we do with …
Cross-signing is another thing that comes to mind. It matters anywhere we try to sign a blob of JSON in this way.
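The signing preparation being discussed can be sketched roughly as follows. Following the Signing JSON section of the spec's appendix, the signatures and unsigned keys are removed before the object is encoded as canonical JSON. Because keys are removed, the remaining values get re-serialised, which is where the round-tripping concerns come from. This is a minimal sketch assuming Python's stdlib json module stands in for a canonical JSON encoder; signing_payload is a hypothetical name, and the signature computation itself is omitted.

```python
import json

# Keys stripped before signing, per the spec's Signing JSON section.
_EXCLUDED_KEYS = ("signatures", "unsigned")


def signing_payload(obj: dict) -> bytes:
    """Hypothetical helper: the bytes a signer would sign for a JSON object."""
    stripped = {k: v for k, v in obj.items() if k not in _EXCLUDED_KEYS}
    # Re-serialising the remaining values is the round trip the thread debates.
    return json.dumps(
        stripped,
        ensure_ascii=False,     # emit non-ASCII as UTF-8 rather than \u escapes
        separators=(",", ":"),  # no insignificant whitespace
        sort_keys=True,         # keys ordered by code point, recursively
        allow_nan=False,        # refuse NaN/Infinity instead of emitting them
    ).encode("utf-8")


event = {
    "type": "m.room.message",
    "content": {"body": "hi"},
    "unsigned": {"age": 1234},
    "signatures": {"example.org": {}},
}
print(signing_payload(event))  # b'{"content":{"body":"hi"},"type":"m.room.message"}'
```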
At the moment, gomatrixserverlib's canonicalisation works by parsing only the structure of the JSON and the keys: it does not round-trip the values themselves. It's a bit more manual, but far more precise. In the absence of out-of-band signing, that seems like probably the best option we have for the future, since it leaves no room for implementations to disagree on value formatting, as long as we specify some simple rules for the keys.
OK, even if there are outstanding issues here, I think at least four different issues are being conflated, and I'd much rather we discussed them separately:
Basically: each of those four things has different solutions. I don't think that conflating them all into one issue is helpful.
If we already have solutions for points 1 and 2, then there's not much reason to split 3 out into its own issue: it'd just get fixed as part of the trio anyway. 4 is probably worthy of its own issue, though in fairness it's unlikely ever to get looked at given its large scope; it'd be nice to attach a roadmap to that issue if we plan to open it separately.
1 and 2 are a matter of clarifying the spec. 3 needs discussion about how best to approach it. So no, I don't agree that 3 will automatically get fixed as part of 1 and 2.
Closing in favour of the following issues:
thanks!
Link to problem area:
https://spec.matrix.org/v1.3/appendices/#canonical-json
Issue
The spec does not adequately and precisely specify what is required of a Canonical JSON implementation. It is therefore impossible to create fully interoperable implementations for the purposes of event signing or hashing.
For example, when Synapse canonicalises JSON, it round-trips it (unmarshals it and then re-marshals it) using sort_keys=True. The complete list of behaviours is not documented, but this round-tripping can affect values in the JSON in surprising ways, e.g.:
- integers that appear in scientific notation on the wire are reformatted into floats:
  - 1e6 -> 1000000.0
  - 1e+2 -> 100.0
- floats more precise than an IEEE 754 double can be rounded or reformatted (probably only affects older room versions):
  - 1.0101010101010101010101010101010108 -> 1.0101010101010102
  - 12345678901234567890.0 -> 1.2345678901234567e+19
- unicode characters are always escaped, even though JSON only mandates escaping for certain characters:
  - 💅 -> \xf0\x9f\x92\x85
  - 世界 -> \xe4\xb8\x96\xe7\x95\x8c
- the numeric values NaN, Infinity and -Infinity are accepted, seemingly because of IEEE 754, but nothing requires a JSON implementation to use IEEE 754 (and indeed many JSON implementations won't allow these values at all)
- duplicate JSON keys are removed, preserving only the last value, although this is an implementation-specific detail (and some JSON implementations will not pick the last value)
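Several of the behaviours listed above can be reproduced with a plain parse-and-re-serialise round trip. This is a sketch assuming Python's standard json module with sort_keys=True, compact separators and ensure_ascii=False; roundtrip_canonicalise is a hypothetical helper, and Synapse's actual code path may differ in detail. The last line also shows that the byte sequences quoted in the list are the UTF-8 encoding of those characters.

```python
import json


def roundtrip_canonicalise(raw: str) -> bytes:
    """Hypothetical helper: parse the JSON, then re-serialise it with sorted
    keys and no insignificant whitespace. Values pass through Python objects,
    so their original spelling on the wire is lost."""
    value = json.loads(raw)
    return json.dumps(
        value, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


# Scientific notation becomes a float with a trailing ".0".
print(roundtrip_canonicalise('{"n": 1e6, "m": 1e+2}'))  # b'{"m":100.0,"n":1000000.0}'
# Duplicate keys: Python's json module keeps the last value.
print(roundtrip_canonicalise('{"a": 1, "a": 2}'))  # b'{"a":2}'
# NaN/-Infinity are accepted on input and emitted again (allow_nan defaults to True).
print(roundtrip_canonicalise('[NaN, -Infinity]'))  # b'[NaN,-Infinity]'
# Non-ASCII characters come out as their UTF-8 bytes.
print(roundtrip_canonicalise('["💅"]'))  # b'["\xf0\x9f\x92\x85"]'
```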
Dendrite/gomatrixserverlib currently does not touch the values at all: excess whitespace is removed and keys are sorted, but the values are not round-tripped during canonicalisation and therefore remain exactly as they appear on the wire (whether valid or invalid, admittedly).
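For contrast, a value-preserving approach can be sketched as a small tokenizer that re-orders object keys and drops whitespace but copies every value token verbatim. This is a minimal Python sketch, not gomatrixserverlib's actual code. canonicalise_preserving_values and its helpers are hypothetical, and keys are decoded only to sort them by code point. Unlike the behaviour described above, it also rejects malformed tokens, duplicate keys and NaN/Infinity; that is an assumed policy, not something the issue specifies.

```python
import json
import re

# A JSON string token, quotes and escape sequences included.
_STRING = re.compile(r'"(?:[^"\\\x00-\x1f]|\\(?:["\\/bfnrt]|u[0-9a-fA-F]{4}))*"')
# A JSON number or literal token; NaN/Infinity deliberately do not match.
_SCALAR = re.compile(r"-?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+-]?\d+)?|true|false|null")
_WS = " \t\n\r"


def _skip_ws(s: str, i: int) -> int:
    while i < len(s) and s[i] in _WS:
        i += 1
    return i


def _value(s: str, i: int) -> tuple[str, int]:
    """Return (compact text of the value at or after s[i], next index)."""
    i = _skip_ws(s, i)
    if i < len(s) and s[i] == "{":
        return _object(s, i + 1)
    if i < len(s) and s[i] == "[":
        return _array(s, i + 1)
    pattern = _STRING if i < len(s) and s[i] == '"' else _SCALAR
    m = pattern.match(s, i)
    if not m:
        raise ValueError(f"invalid token at offset {i}")
    return m.group(0), m.end()  # copied verbatim, never converted


def _object(s: str, i: int) -> tuple[str, int]:
    members: dict[str, tuple[str, str]] = {}  # decoded key -> (raw key, value text)
    i = _skip_ws(s, i)
    if i < len(s) and s[i] == "}":
        return "{}", i + 1
    while True:
        i = _skip_ws(s, i)
        m = _STRING.match(s, i)
        if not m:
            raise ValueError(f"expected a key at offset {i}")
        key = json.loads(m.group(0))  # decode the key only, for sorting
        if key in members:
            raise ValueError(f"duplicate key {key!r}")
        i = _skip_ws(s, m.end())
        if i >= len(s) or s[i] != ":":
            raise ValueError(f"expected ':' at offset {i}")
        text, i = _value(s, i + 1)
        members[key] = (m.group(0), text)
        i = _skip_ws(s, i)
        if i < len(s) and s[i] == ",":
            i += 1
        elif i < len(s) and s[i] == "}":
            break
        else:
            raise ValueError(f"expected ',' or '}}' at offset {i}")
    # Python compares str by code point, matching the spec's key ordering.
    body = ",".join(f"{raw}:{text}" for _, (raw, text) in sorted(members.items()))
    return "{" + body + "}", i + 1


def _array(s: str, i: int) -> tuple[str, int]:
    items: list[str] = []
    i = _skip_ws(s, i)
    if i < len(s) and s[i] == "]":
        return "[]", i + 1
    while True:
        text, i = _value(s, i)
        items.append(text)
        i = _skip_ws(s, i)
        if i < len(s) and s[i] == ",":
            i += 1
        elif i < len(s) and s[i] == "]":
            return "[" + ",".join(items) + "]", i + 1
        else:
            raise ValueError(f"expected ',' or ']' at offset {i}")


def canonicalise_preserving_values(s: str) -> str:
    text, i = _value(s, 0)
    if _skip_ws(s, i) != len(s):
        raise ValueError("trailing data after the JSON value")
    return text


print(canonicalise_preserving_values(
    r'{ "b": 1e6, "a": [1.0101010101010101010101010101010108, "\u00e9"] }'
))  # {"a":[1.0101010101010101010101010101010108,"\u00e9"],"b":1e6}
```

Because values are never re-encoded, 1e6 and the over-precise float survive byte for byte. The trade-off is that two spellings of the same value (for example "\u00e9" and "é") still produce different canonical forms, so the spec would need to say which spelling is allowed.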
cc @jplatte