Add option to include fields that are equal to the default value in the JSON encoding #1526

Open
wants to merge 1 commit into base: main

mrabiciu

This adds the ability to include fields that are equal to the default value in the json encoding.

This is done by adding a traversalOptions var to Visitor and an includeDefaultValues option to JSONEncodingOptions, which controls the JSONEncodingVisitor's traversalOptions.

I've modified the generation of traverse to check traversalOptions before calling visit on each individual field.
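As a rough illustration of the change described above, here is a minimal, self-contained sketch of a generated traverse that consults the visitor's options before skipping default-valued fields. Every type below (TraversalOptions, SketchVisitor, SketchMessage, CountingVisitor) is a simplified, hypothetical stand-in and not the actual SwiftProtobuf or generated code; only the idea of checking traversalOptions before each visit call comes from the description.

```swift
// Hypothetical, simplified stand-ins for the types discussed in this PR.
struct TraversalOptions {
    /// When true, traversal also visits fields equal to their default value.
    var visitDefaultValues = false
}

protocol SketchVisitor {
    var traversalOptions: TraversalOptions { get }
    mutating func visitSingularInt32Field(value: Int32, fieldNumber: Int)
    mutating func visitRepeatedInt32Field(value: [Int32], fieldNumber: Int)
}

struct SketchMessage {
    var count: Int32 = 0
    var values: [Int32] = []

    // Roughly what a generated traverse could look like: the usual
    // "skip if default" check is bypassed when the option is set.
    func traverse<V: SketchVisitor>(visitor: inout V) {
        let visitDefaults = visitor.traversalOptions.visitDefaultValues
        if count != 0 || visitDefaults {
            visitor.visitSingularInt32Field(value: count, fieldNumber: 1)
        }
        if !values.isEmpty || visitDefaults {
            visitor.visitRepeatedInt32Field(value: values, fieldNumber: 2)
        }
    }
}

// Minimal visitor that only remembers which field numbers it saw.
struct CountingVisitor: SketchVisitor {
    var traversalOptions = TraversalOptions()
    var visited: [Int] = []

    mutating func visitSingularInt32Field(value: Int32, fieldNumber: Int) {
        visited.append(fieldNumber)
    }
    mutating func visitRepeatedInt32Field(value: [Int32], fieldNumber: Int) {
        visited.append(fieldNumber)
    }
}
```

With the option left at its default, traversing an empty SketchMessage visits nothing; with visitDefaultValues set to true, both fields are visited.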

@mrabiciu
Author

mrabiciu commented Dec 13, 2023

This commit has all of the functional changes: 59a6212

@thomasvl
Collaborator

Do the other languages end up encoding the empty arrays when visiting default values? i.e. - does it make sense to visit on those or just skip the visit calls still? We probably want to make sure our JSON support ends up doing the same things the other upstream languages do.

In any case, since this is public api, we should spell out in the comments what the expected behavior is for all cases:

  • repeated fields
  • map fields
  • extension fields (all types)
  • proto2 syntax files (all non-repeated field types have presence)
  • proto3 syntax files where the field isn't optional (expect the visit)
  • proto3 syntax files where a field is optional (likely should be the same as proto2 syntax files)

And then use a custom listener in the test to ensure all these behaviors for all the different message types.
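One way to read the "custom listener" suggestion is a visitor that records every visit call, so a test can assert exactly which methods were and were not invoked. The sketch below assumes that reading; VisitCall, CallRecorder, ProbeMessage, and recordedCalls are hypothetical names invented for illustration, and the optional array merely stands in for a singular message field.

```swift
// All names here are hypothetical stand-ins, not SwiftProtobuf API.
enum VisitCall: Equatable {
    case singularInt32(fieldNumber: Int)
    case repeatedInt32(fieldNumber: Int)
    case singularMessage(fieldNumber: Int)
}

/// Records every visit call so a test can compare the exact sequence.
struct CallRecorder {
    var visitDefaultValues = false
    var calls: [VisitCall] = []

    mutating func record(_ call: VisitCall) {
        calls.append(call)
    }
}

/// Stand-in for a generated message with one field of each kind.
struct ProbeMessage {
    var count: Int32 = 0        // singular scalar without presence
    var values: [Int32] = []    // repeated field
    var child: [Int32]? = nil   // stands in for a singular message field

    func traverse(recorder: inout CallRecorder) {
        if count != 0 || recorder.visitDefaultValues {
            recorder.record(.singularInt32(fieldNumber: 1))
        }
        if !values.isEmpty || recorder.visitDefaultValues {
            recorder.record(.repeatedInt32(fieldNumber: 2))
        }
        // Fields with presence are visited only when actually set.
        if child != nil {
            recorder.record(.singularMessage(fieldNumber: 3))
        }
    }
}

/// Runs one traversal and returns the recorded call sequence.
func recordedCalls(for message: ProbeMessage, visitDefaultValues: Bool) -> [VisitCall] {
    var recorder = CallRecorder(visitDefaultValues: visitDefaultValues)
    message.traverse(recorder: &recorder)
    return recorder.calls
}
```

A test built this way can check the visitor contract directly, independent of how any particular encoder formats its output.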

@thomasvl
Collaborator

For the comments spelling out the behaviors, I guess we should use the terms from the upstream pages:

@thomasvl thomasvl requested a review from tbkka December 13, 2023 20:39
@mrabiciu
Author

Do the other languages end up encoding the empty arrays when visiting default values?

Yes. I'm using the Python package for comparison, and if you set including_default_value_fields=True it includes empty lists and empty maps.

https://googleapis.dev/python/protobuf/latest/google/protobuf/json_format.html

including_default_value_fields – If True, singular primitive fields, repeated fields, and map fields will always be serialized. If False, only serialize non-empty fields. Singular message fields and oneof fields are not affected by this option.

@mrabiciu
Author

And then use a custom listener in the test to ensure all these behaviors for all the different message types

I'm not sure what you mean by that. Is an encoding test that asserts the encoded json is as expected not enough?

@mrabiciu mrabiciu force-pushed the simple-default-traversal branch from 7319183 to 6d44684 Compare December 14, 2023 00:15
@thomasvl
Collaborator

And then use a custom listener in the test to ensure all these behaviors for all the different message types

I'm not sure what you mean by that. Is an encoding test that asserts the encoded json is as expected not enough?

I haven't looked in detail; it might be. I wasn't sure whether JSON had any specific logic of its own, versus ensuring that all the expected methods did or didn't get called.

@thomasvl
Collaborator

I finally got a chance to start looking at this, and I'm going to suggest we step back on something that hopefully won't be a bike shed:

As drafted, the PR calls the JSONEncodingOptions addition includeDefaultValues. And Visitor gets TraversalOptions which has visitDefaultValues.

Looking at the C++ API, their JSON option is always_print_primitive_fields.

The Java API has alwaysOutputDefaultValueFields and includingDefaultValueFields.

I don't think we want to eventually try to do the level of the api that Java has (the Set support would really complicate the visitor pattern).

So on the naming side, my 2¢ would be:

I think default gets a little confusing around the behavior with repeated fields and map<> fields. So I think I sorta like the C++ naming because it is about primitive fields. There is still a minor stretch in calling a repeated or map field primitive (it's an array or a map), but a single message field clearly isn't a primitive field. So…

  • Visitor gets TraversalOptions with alwaysVisitPrimitiveFields.
  • JSONEncodingOptions gets alwaysPrintPrimitiveFields

The JSON value lines up with the two other names we already have that start alwaysPrint…, and the naming generally lines up with the C++ API (which is what I think we normally look to).
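To make the proposed naming concrete, here is a small sketch of how the two options could relate. The Sketch-prefixed types are hypothetical, simplified stand-ins rather than the library's real declarations; only the two property names, alwaysVisitPrimitiveFields and alwaysPrintPrimitiveFields, come from the proposal above.

```swift
// Hypothetical, simplified stand-ins; only the two property names
// come from the naming proposal.
struct SketchTraversalOptions {
    /// Visit primitive, repeated, and map fields even when they hold
    /// their default value.
    var alwaysVisitPrimitiveFields = false
}

struct SketchJSONEncodingOptions {
    /// Public-facing JSON option, named to sit alongside the existing
    /// options that start with "alwaysPrint".
    var alwaysPrintPrimitiveFields = false
}

struct SketchJSONEncodingVisitor {
    let traversalOptions: SketchTraversalOptions

    init(options: SketchJSONEncodingOptions) {
        // The JSON-level option is translated into the generic
        // traversal-level option that generated code would consult.
        traversalOptions = SketchTraversalOptions(
            alwaysVisitPrimitiveFields: options.alwaysPrintPrimitiveFields)
    }
}
```

Keeping the JSON option separate from the traversal option means other visitors could opt in to the same traversal behavior without depending on JSON-specific names.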

Thoughts?

@tbkka what makes sense to you?

@tbkka
Collaborator

tbkka commented Dec 14, 2023

I like Thomas' naming ideas here.

Collaborator

@thomasvl left a comment

Also the test case added is in the regenerate commit, so I almost missed that addition.

@@ -47,6 +47,10 @@ class MessageFieldGenerator: FieldGeneratorBase, FieldGenerator {
return false
}
}

var usesDefaultValueFlagForTraversal: Bool {

Collaborator

Generally, we don't want to have to check syntax any more (working towards Editions). Would just looking at, say, hasFieldPresence and whether the field is repeated/map vs. singular work here?

Author

There is a difference in behavior between proto2 and proto3, though, at least in the Python lib, which I was using for reference. In proto3, unset optional primitives are omitted from JSON; in proto2, unset optional primitives are included. I'm not sure if there's a better way to do this.

Collaborator

The Descriptor support should already be tracking most of that: https://github.com/apple/swift-protobuf/blob/main/Sources/SwiftProtobufPluginLibrary/Descriptor.swift#L638-L642 What's the python code you're looking at for reference?

Author

mrabiciu commented Dec 15, 2023

I didn't look at python code, just observed how the python lib behaves when serializing a message with every kind of field type.

Author

Here is an example of what I mean

Proto3 behavior

syntax = "proto3";

message Proto3Message {
    int32 singularInt32 = 1;
    repeated int32 repeatedInt32 = 2;
    optional int32 optionalInt32 = 3;
}
print(MessageToJson(Proto3Message(), including_default_value_fields=True))
{
  "singularInt32": 0,
  "repeatedInt32": []
}

Proto2 behavior

message Proto2Message {
    required int32 singularInt32 = 1;
    repeated int32 repeatedInt32 = 2;
    optional int32 optionalInt32 = 3;
}
print(MessageToJson(Proto2Message(), including_default_value_fields=True))
{
  "singularInt32": 0,
  "repeatedInt32": [],
  "optionalInt32": 0
}

hasPresence will be true for optionalInt32 in both proto2 and proto3, but we want to print it in proto2 and omit it in proto3.

Author

Sounds good I can work on that. Thank you!

Collaborator

Looks like a docs rollout is happening:

https://protobuf.dev/programming-guides/proto2/#json-options

Always emit fields without presence: Fields that don’t support presence and that have their default value are omitted by default in JSON output (for example, an implicit presence integer with a 0 value, implicit presence string fields that are empty strings, and empty repeated and map fields). An implementation may provide an option to override this behavior and output fields with their default values.

As of v25.x, the C++, Java, and Python implementations are nonconformant, as this flag affects proto2 optional fields but not proto3 optional fields. A fix is planned for a future release.

https://protobuf.dev/programming-guides/proto3/#json-options

Always emit fields without presence: Fields that don’t support presence and that have their default value are omitted by default in JSON output (for example, an implicit presence integer with a 0 value, implicit presence string fields that are empty strings, and empty repeated and map fields). An implementation may provide an option to override this behavior and output fields with their default values.

Author

mrabiciu commented Feb 11, 2024

I'm confused by those docs for proto2

(for example, an implicit presence integer with a 0 value, implicit presence string fields that are empty strings, and empty repeated and map fields). An implementation may provide an option to override this behavior and output fields with their default values.

Don't integer and string fields always have presence under proto2? Under what scenario would we actually emit these in proto2?

Author

I've updated the logic to just use hasPresence for proto2 the same way as in proto3. But I think those docs could use some clarification.
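The presence-based rule from the quoted upstream docs can be sketched as a single predicate. This is an illustration under assumptions: SketchField and shouldEmit are hypothetical names, and hasPresence is modeled as a plain Bool that is false for implicit-presence scalars, repeated fields, and map fields, rather than read from the plugin's real descriptor type.

```swift
// Hypothetical field description; `hasPresence` mirrors the descriptor
// property mentioned in the thread, but this type is not the plugin's API.
struct SketchField {
    let name: String
    /// Assumed false for implicit-presence scalars, repeated, and map fields.
    let hasPresence: Bool
    /// True when the field is unset (or, without presence, equal to its default).
    let isDefault: Bool
}

/// Rule from the upstream JSON docs: the option only affects fields
/// that do not support presence.
func shouldEmit(_ field: SketchField, alwaysEmitFieldsWithoutPresence: Bool) -> Bool {
    if !field.isDefault {
        return true  // set or non-default values are always emitted
    }
    return alwaysEmitFieldsWithoutPresence && !field.hasPresence
}

// The three fields from the example messages above, all left at defaults.
let fields = [
    SketchField(name: "singularInt32", hasPresence: false, isDefault: true),  // proto3 implicit presence
    SketchField(name: "repeatedInt32", hasPresence: false, isDefault: true),
    SketchField(name: "optionalInt32", hasPresence: true, isDefault: true),   // optional in proto2 or proto3
]

let emitted = fields
    .filter { shouldEmit($0, alwaysEmitFieldsWithoutPresence: true) }
    .map { $0.name }
```

Under this rule, emitted contains singularInt32 and repeatedInt32 but not optionalInt32, regardless of syntax. That matches the proto3 output shown earlier and differs from the proto2 output observed in the Python library, which the docs describe as nonconformant as of v25.x.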

Collaborator

Please feel free to open an issue upstream with the feedback. Sometimes it helps to hear directly from devs.

@thomasvl
Collaborator

@mrabiciu looks like you might also need to sync to resolve some conflicts.

@thomasvl
Collaborator

@tbkka can you take a look? I think, given the new docs from the protobuf team, we're likely pretty good to go on this.

@mrabiciu mrabiciu force-pushed the simple-default-traversal branch from 8b366c4 to 49c7175 Compare February 21, 2024 20:55
@mrabiciu
Author

Rebased on main and squashed all my commits

@mrabiciu mrabiciu force-pushed the simple-default-traversal branch from 49c7175 to ba9708f Compare February 22, 2024 23:46