-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
protojson output is non deterministic #1373
Comments
This is operating as intended:
|
I don't really understand how this library is intended to be used in the real world then. Is it just for debugging, and not production usage? Storing protos as json is pretty useful for a wide variety of use cases. Inserting non determinism makes this extremely hard to do, even returning random spaces in a JSON response blob on a website is a bit sloppy. It is not something that is present in the go ecosystem in the legacy protobuf library, gogo protobuf, or Running output through a json formatter is not viable as we experience an incredibly high runtime cost. In the current state this will block our adoption of the library. We rely on outputting protos to json and triggering events when that response changes. If each build may have completely different results, then that becomes very tricky. Its one thing if its a per-protobuf-version thing - then it will just switch on upgrade which is NBD. But randomly switching for every different binary is much harder to handle. Its also impossible to unit test. This is clear from protobuf's own tests putting a magic Is it not reasonable to add an option to be stable? Just because the spec doesn't declare that the format MUST be exactly formatted one way, that doesn't mean the library couldn't pick one way and stick to it. Or at the very least making the change boundary a protobuf version. |
Most of the "real world" just marshals and unmarshals data through gRPC and don't care about the exact serialized form. Non-determinism doesn't interfere with storing protos as JSON since you can still unmarshal the data when reading it out of a database.
It's appropriate for a module to disable it's own non-determinism since it's in full control of the implementation. It's similar to how the tests for the Go runtime itself may disable non-determinism of maps and other runtime artifacts for the sake of testing the Go runtime. The Go runtime tests are allowed to do that since it fully controls the Go runtime implementation.
Not unless there is a well-specified approach for how each language should serialize a stable version of a protobuf message. I recommend asking the protobuf team to prioritize having that behavior be specified. When I was a Google, I wrote a 30 page specification (with examples) and a reference implementation for canonically serializing a message. However, that effort didn't really go anywhere as it wasn't a priority for the protobuf team. The best way to get this to be available is to go to the protobuf team and help them see this a priority.
If every language did that, then we would just end up with:
And none of them would be compatible with each other. If you depended on stability and then migrated your application from C++ to Go or something, then you're stuck since you depending on an artifact of a specific implementation. Also, you should be able to derive a stable database key from the canonical output and use it to index into a database regardless of which programming language was used to serialize the key. The canonical specifications that I wrote are trying to solve this problem in a way that all the language implementations can have the exact same semantics. |
Also, this is probably a duplicate of #1121 |
Unfortunately, protobuf offers no story whatsoever for canonicalization of protobuf or protojson output. It seems the best we can get is to just make it deterministic within each implementation. Related issues: * golang/protobuf#1121 * golang/protobuf#1373
What version of protobuf and what language are you using?
v1.27.1, golang (tried 1.15, 1.16, and 1.17)
What did you do?
Wrote a small code snippet using protojson to marshal some protobuf to json.
What did you expect to see?
Output is consistent.
What did you see instead?
Output is not consistent based on compilation. For a given set of code + compiler settings, the output seems deterministic. However, making random changes (literally like
_ = 1
) will, seemingly randomly, result in a different marshalling output.I am testing like:
Test code:
Example outcome:
Note the 4th result has an extra space. Which of the tests adds space is not deterministic either and changes as random changes to the code are made.
jsonpb results are always constant.
This can be reproduced at howardjohn/istio@5d29413. I am able to reproduce it on two different linux amd64 machines, as well as in docker.
To me, this suggests a compiler bug but opening it here first since it seems to only impact protojson.
The text was updated successfully, but these errors were encountered: