Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

encoding/protojson: opt out of whitespace randomization #1082

Closed
kevinburke1 opened this issue Apr 13, 2020 · 7 comments
Closed

encoding/protojson: opt out of whitespace randomization #1082

kevinburke1 opened this issue Apr 13, 2020 · 7 comments

Comments

@kevinburke1
Copy link

Commit protocolbuffers/protobuf-go@582ab3d added randomization to the multi-line json output.

I understand there are good reasons for doing this. I hope you will understand I also have good reasons for not doing this. Among others, generating a protobuf with fake data, re-rendering it on every test run is one of the best ways to detect drift in the configuration, accidental or purposeful.

I can handle the output changing across multiple versions of the protobuf library. It's much easier for me to sync the library version than it is to sync Go versions. If there are breaking changes that generate a diff, I can manage them all at once, instead of

Is there any way to opt out? A UnsafeDisableRandomization flag, or something. Right now I am probably going to make a second pass with a marshaler that does not generate random whitespace.

@dsnet
Copy link
Member

dsnet commented Apr 13, 2020

The reason why we were unable to make improvements to both text and JSON was because of the many hard-coded tests that expected the output to be byte-for-byte identical. We're not going to repeat the mistake again with the new prototext and protojson implementations.

I understand that stable output has benefits, but unless there is a canonical definition (which there isn't) for how to serialize protobufs in text and JSON, we're not going to pretend like one exists.

If you want a frozen output, then use github.com/golang/jsonpb. The output will be stable, but will also continue to have buggy behavior in some cases and it won't received any improvements. If you want a decent amount of stability, then pass the output through a formatter. For example, just parsing the JSON with encoding/json and formatting it again should provide some degree of stability.

@dsnet dsnet changed the title protojson: opt out of whitespace randomization encoding/protojson: opt out of whitespace randomization Apr 13, 2020
@kevinburke1
Copy link
Author

kevinburke1 commented Apr 13, 2020

This is what I'm doing, for the moment

	marshaler = &protojson.MarshalOptions{
		// no point turning this on if we're reformatting immediately
		// Multiline:       true,
		EmitUnpopulated: true,
		Indent:          "",
		UseProtoNames:   true,
	}
	data, err := marshaler.Marshal(cfg)
	if err != nil {
		return err
	}
	var rm json.RawMessage = data
	data2, err := json.MarshalIndent(rm, "", " ")
	if err != nil {
		return err
	}
	data2 = append(data2, '\n')
	return w.Write(data2)

@dsnet
Copy link
Member

dsnet commented Apr 13, 2020

That seems like a reasonable workaround?

@dsnet
Copy link
Member

dsnet commented May 1, 2020

Closing as we're not going to be changing the randomization. Analysis of all tests at Google (we have a lot) for text and JSON usages indicates that 89.5% do not care about the exact output. Of the remaining, 10.4% were brittle tests that really should have been written to perform a semantic comparison on the structured messages (rather than a byte-for-byte comparison on the serialized output). The remaining 0.1% were cases like generated configurations and/or code where a degree of stability is actually needed. For that 0.1%, the workaround is to pass the output to a JSON formatter or text formatter to produce more stable output.

If randomization pushes the 10.4% to properly perform semantic comparisons, then I'll be really happy. Since a workaround exists for the 0.1% with a legitimate need for some degree of stability, I don't think a opt-out is justified.

@brandonchinn178
Copy link

brandonchinn178 commented Apr 19, 2024

I don't think a opt-out is justified.

It's slightly amusing that your own tests disable randomization, and yet you don't want to expose that to users 😅

https://github.com/protocolbuffers/protobuf-go/blob/c2a26e757ed57746693fca53de0b6136a8e46b74/encoding/protojson/encode_test.go#L32-L33

@dsnet
Copy link
Member

dsnet commented Apr 19, 2024

Those frustrated by the randomization should be focusing on the development of a stable output format: #1121

@puellanivis
Copy link
Collaborator

I don't think a opt-out is justified.

It's slightly amusing that your own tests disable randomization, and yet you don't want to expose that to users 😅

The protobuf library is subject to the contract of the standardization. However, certain things (e.g. ordering of fields in a message) are undefined in the contract, but this library is still an authority on what its own self should be producing. Knowing how we have chosen those implementation details, we can generate golden tests dependent upon our own implementation details. We remain, after all, the authority on what we expect our code to generate, even if the contract is silent on certain details. We’re allowed to make stronger assumptions about the output than the standard provides. After all, if we change our output, we can also change our tests at the same time, and then the deltas of golden examples even documents how our output has changed.

Meanwhile, users of this package are not in that same situation. If this package changed the ordering of fields, it would break anyone else using golden test, even though the two variations are semantically identical. Their tests would break, even though the output itself remains valid and within the standard. So their golden tests fail, even though their code has not broken. It’s generally bad form to break people’s tests for non-functional issues.

Trying to keep Hyrum’s Law at bay is part of us looking forward to a potentially future formal clarification of deterministic output. We should hope that even if that standard requires us to make drastic changes, that no one using this package now should break tests one we implement it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants