Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

jsonpb: change Marshal/Unmarshal to return error if any required field is not set #472

Merged
merged 7 commits into from
Jan 5, 2018
Merged

Conversation

cybrcodr
Copy link
Contributor

Change Marshal/Unmarshal to return error if any required field is not set.

For Unmarshal, this means JSON is either missing any required field or has required field set to null.

This is an initial attempt at fixing #440. While I've modified Marshal to produce an error for required fields that are not set, I simply reuse this in Unmarshal to check for required fields which is probably not as efficient.

Anyways, I won't have time to improve on this this week and am simply putting this PR out in case someone would like to try it out and/or feel free to pick it up from here.

I've also yet to test the impact of this change on our existing codebase.

is not set.

For Unmarshal, this means JSON is either missing any required field or
has required field set to null.
Copy link
Collaborator

@puellanivis puellanivis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Marshaling code looks good, but the Unmarshal is a no-go. :(

jsonpb/jsonpb.go Outdated
}

// Marshal it back to check for missing required field error.
// TODO: Parse through the message and check for any missing required field w/o having to
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This TODO will need to be resolved before this could be considered for merging. Marshaling the JSON just after Unmarshaling is going to build in a lot of overhead that wouldn’t be a good idea to pass around. (Especially for everyone using proto3, where required fields have been entirely removed.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree that this needs to be improved. Sorry if I wasn't clear in my initial posting. I didn't intend to merge this yet. Since I didn't have time this week to continue doing this, I decided to post this PR for others to try it out and see if anyone may be interested in improving it further.

Note though that a proto3 message can contain a proto2 message and vice-versa. So, as far as I know, the code will still need to traverse all the fields and subfields of the message looking for required ones that are not set.

@puellanivis
Copy link
Collaborator

Hm… you are correct, the best kind of correct.

Though, I think anyone using proto3 might have entirely forgotten that fields were even allowed to be required… or maybe just those of us who got the Google “required considered harmful” meme drilled forcefully into our head by SREs…

This change is to improve upon PR #472.  Instead of simply calling
Marshal within Unmarshal to check for required fields, traverse
the proto message instead looking only for required fields.
Copy link
Contributor Author

@cybrcodr cybrcodr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated Unmarshal's logic for checking required fields to traverse the message looking for those instead of simply calling Marshal as latter is most likely more inefficient.

The code assumes the use of field tags in proto messages as with most of the existing codebase. I'm not sure if this is good enough though.

I've added more tests as well for the use cases that I'm aware of.

jsonpb/jsonpb.go Outdated
// IsNil will panic on most value kinds.
switch value.Kind() {
case reflect.Chan, reflect.Func, reflect.Interface:
if value.IsNil() {
continue
}
case reflect.Ptr:
if prop.Required && value.IsNil() {
return fmt.Errorf("required field %q is not set", prop.Name)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious, for error messages like this, should I use Name or OrigName? Or is it dependent on whether message is for marshaling or unmarshaling?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Properties.Name is fine. It's even documented as such: "name of the field, for error messages"

@cybrcodr cybrcodr requested a review from dsnet December 20, 2017 23:22
jsonpb/jsonpb.go Outdated
// This function is called from Unmarshal and assumes certain actions are done ahead.
func checkRequiredFields(pb proto.Message) error {
// Most well-known type messages do not contain any required field. The "Any" type may have
// required fields, however the Unmarshaler should have invoked proto.Marshal on the value field
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean proto.Unmarshal here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope. During unmarshaling of an "Any" JSON to a Go proto message, after it resolves and constructs the embedded message, it will actual do a proto marshal to a serialized form and store the bytes into Any.Value field. See https://github.com/golang/protobuf/blob/dev/jsonpb/jsonpb.go#L751.

I've updated the comment to explain this better.

Copy link
Collaborator

@puellanivis puellanivis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Much better, but I still have concerns… they maybe just require a code comment or two explaining things though.

jsonpb/jsonpb.go Outdated

func checkRequiredFieldsInValue(v reflect.Value) error {
if pm, ok := v.Interface().(proto.Message); ok {
if err := checkRequiredFields(pm); err != nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this isn’t in a loop, then it’s not necessary to check the error value before returning it. i.e. you can just use return checkRequiredFields(pm)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. I previously have this block inlined above multiple times and then pulled it out to this helper func but forgot that this added check is no longer required. Fixed.

jsonpb/jsonpb.go Outdated

// IsNil will panic on most value kinds.
switch field.Kind() {
case reflect.Map:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we discussed this before, and you told me how a proto3 could contain a proto2 message, but maps are exclusive to proto3, so do they really need to be checked?

Consider adding some code comments explaining the rational of checking maps despite their appearance being expected to be mutually exclusive with required fields.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, maps are allowed in proto2. I have a test case for this.

jsonpb/jsonpb.go Outdated
return err
}

return checkRequiredFields(pb)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to work the code into UnmarshalNext, rather than iterate over all the fields a second time? (albeit, yes, fairly fast code.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly. The current unmarshaling logic does rely on json.Unmarshal to populate fields. To embed the check for required in there means that it will still need to check for all fields that have been unmarshaled.

But you are correct that doing the check w/in the unmarshaling code may be more efficient. The unmarshaling code is quite complicated though that I prefer to avoid mixing this logic in right now. A refactor/rewrite of the unmarshaling code should account for this logic.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The jsonpb logic is pretty non-performant to begin with. We should aim for correctness in this PR.
This package is in need a major re-factoring (really, the entire protobuf repo). Let's revisit this when that refactoring occurs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even though UnmarshalNext is used by almost nobody, why is this check in Unmarshal as opposed to UnmarshalNext?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, thanks for catching. Fixed.

jsonpb/jsonpb.go Outdated
// Oneof fields need special handling.
if sfield.Tag.Get("protobuf_oneof") != "" {
// field is an interface containing &T{real_value}.
v := field.Elem().Elem() // interface -> *T -> T
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you considered the case where a oneof contains a simple scalar (string, bytes, varint), which in proto3 is not a pointer?

What will this code do in that case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Current proto compiler will always produce an interface field for a oneof field. It will also produce wrapper structs, one for each oneof field. And hence, this block will grab the oneof field inside the wrapper struct which can be any type.

Added more comments and clean up based on feedback from PR#472.
Copy link
Contributor Author

@cybrcodr cybrcodr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review. I've updated the comments and did the clean up you suggested.

jsonpb/jsonpb.go Outdated
// Oneof fields need special handling.
if sfield.Tag.Get("protobuf_oneof") != "" {
// field is an interface containing &T{real_value}.
v := field.Elem().Elem() // interface -> *T -> T
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Current proto compiler will always produce an interface field for a oneof field. It will also produce wrapper structs, one for each oneof field. And hence, this block will grab the oneof field inside the wrapper struct which can be any type.

jsonpb/jsonpb.go Outdated

// IsNil will panic on most value kinds.
switch field.Kind() {
case reflect.Map:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, maps are allowed in proto2. I have a test case for this.

jsonpb/jsonpb.go Outdated
return err
}

return checkRequiredFields(pb)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly. The current unmarshaling logic does rely on json.Unmarshal to populate fields. To embed the check for required in there means that it will still need to check for all fields that have been unmarshaled.

But you are correct that doing the check w/in the unmarshaling code may be more efficient. The unmarshaling code is quite complicated though that I prefer to avoid mixing this logic in right now. A refactor/rewrite of the unmarshaling code should account for this logic.

jsonpb/jsonpb.go Outdated
// This function is called from Unmarshal and assumes certain actions are done ahead.
func checkRequiredFields(pb proto.Message) error {
// Most well-known type messages do not contain any required field. The "Any" type may have
// required fields, however the Unmarshaler should have invoked proto.Marshal on the value field
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope. During unmarshaling of an "Any" JSON to a Go proto message, after it resolves and constructs the embedded message, it will actual do a proto marshal to a serialized form and store the bytes into Any.Value field. See https://github.com/golang/protobuf/blob/dev/jsonpb/jsonpb.go#L751.

I've updated the comment to explain this better.

jsonpb/jsonpb.go Outdated

func checkRequiredFieldsInValue(v reflect.Value) error {
if pm, ok := v.Interface().(proto.Message); ok {
if err := checkRequiredFields(pm); err != nil {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. I previously have this block inlined above multiple times and then pulled it out to this helper func but forgot that this added check is no longer required. Fixed.

jsonpb/jsonpb.go Outdated
// IsNil will panic on most value kinds.
switch value.Kind() {
case reflect.Chan, reflect.Func, reflect.Interface:
if value.IsNil() {
continue
}
case reflect.Ptr:
if prop.Required && value.IsNil() {
return fmt.Errorf("required field %q is not set", prop.Name)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Properties.Name is fine. It's even documented as such: "name of the field, for error messages"

jsonpb/jsonpb.go Outdated
return err
}

return checkRequiredFields(pb)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The jsonpb logic is pretty non-performant to begin with. We should aim for correctness in this PR.
This package is in need a major re-factoring (really, the entire protobuf repo). Let's revisit this when that refactoring occurs.

jsonpb/jsonpb.go Outdated
return err
}

return checkRequiredFields(pb)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even though UnmarshalNext is used by almost nobody, why is this check in Unmarshal as opposed to UnmarshalNext?

jsonpb/jsonpb.go Outdated
return nil
}

v := reflect.ValueOf(pb).Elem()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You will need to verify that pb is a pointer to a struct. If it isn't, then return.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. I'm guessing we'll simply ignore message if not pointer to struct.

jsonpb/jsonpb.go Outdated
// A oneof field is an interface implemented by wrapper structs containing the actual oneof
// fields. Field is an interface containing &T{real_value}.
if sfield.Tag.Get("protobuf_oneof") != "" {
v := field.Elem().Elem() // interface -> *T -> T
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to check:

  1. for nil on both the interface.
  2. that the interface element is a pointer to a struct
  3. that the pointer is not nil

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

return err
}
}
case reflect.Slice:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "bytes" scalar for proto will fall under this case and will need to be specially handled.
IIRC, []byte{} means that a zero-length bytes is set, while []byte(nil) means that it was never set.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, I forgot about that. Added handling of non-repeated slice.

Yes, I'm going with []byte{} to mean it is set.

Make checking in Marshal to be done up front in order to account for
custom marshalers, i.e. do not rely on custom marshalers to do the
checking.

Reuse the same checkRequiredFields func on both Marshal and Unmarshal.

Added more check logic per feedback from PR#472.
Copy link
Contributor Author

@cybrcodr cybrcodr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review.

Per conversation with @dsnet, I've also changed Marshal to call checkRequiredFields before it marshals the given message. This is for consistency in ensuring that the required field checking logic is done by jsonpb and not rely on any custom marshaler/unmarshaler.

jsonpb/jsonpb.go Outdated
return err
}

return checkRequiredFields(pb)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, thanks for catching. Fixed.

jsonpb/jsonpb.go Outdated
return nil
}

v := reflect.ValueOf(pb).Elem()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. I'm guessing we'll simply ignore message if not pointer to struct.

return err
}
}
case reflect.Slice:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, I forgot about that. Added handling of non-repeated slice.

Yes, I'm going with []byte{} to mean it is set.

jsonpb/jsonpb.go Outdated
// A oneof field is an interface implemented by wrapper structs containing the actual oneof
// fields. Field is an interface containing &T{real_value}.
if sfield.Tag.Get("protobuf_oneof") != "" {
v := field.Elem().Elem() // interface -> *T -> T
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@cybrcodr
Copy link
Contributor Author

cybrcodr commented Jan 4, 2018

Thanks for the review, @dsnet.

@puellanivis do you have more feedback/comments on this PR before I merged it in? Thanks.

Copy link
Collaborator

@puellanivis puellanivis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Entirely superficial/nitpicky concerns.

jsonpb/jsonpb.go Outdated
continue
}
v = v.Elem()
if v.Kind() != reflect.Struct {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

&& v.NumFields() > 0 maybe, just to be sure?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you meant to check for NumField() == 0 so that Field(0) below won't panic. That would be a bug in protoc-gen-go, but I went ahead and added the check.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, but any value of NumFields() less than 0 would be an error state. And checking for greater than zero, and “is not zero” is either: identical with unsigned ints, or just extra CYA for signed ints.

Sure NumField shouldn’t ever return a negative number, but iI find it to generally be a good idea to remember that technically it might… even if only erroneously. So, I typically prefer to avoid weak/narrow tests, when a stronger/broader test is available for the same cost. (I also usually test if len(array) < 1 rather than just len(array) == 0 for the same reasons.)

It could be viewed as an example of “overengineering” but I prefer to cover all load-scenarios rather than only the specific load-scenarios that one thinks might cause a problem.

I mean, if you can order 100 nuts for $100 in either 3 lbs load-hanlding, or 1 lb load-handling, even if you expect that you will only need at most 1 lb of load-handling, it doesn’t cost anything extra to get the higher load handling ones.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Contributor Author

@cybrcodr cybrcodr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

jsonpb/jsonpb.go Outdated
continue
}
v = v.Elem()
if v.Kind() != reflect.Struct {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you meant to check for NumField() == 0 so that Field(0) below won't panic. That would be a bug in protoc-gen-go, but I went ahead and added the check.

Copy link
Contributor Author

@cybrcodr cybrcodr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I'm going to merge this now. If there are other concerns, we can always create another PR to fix.

jsonpb/jsonpb.go Outdated
continue
}
v = v.Elem()
if v.Kind() != reflect.Struct {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@jhump
Copy link
Contributor

jhump commented Feb 2, 2018

FYI, this change effectively breaks the support for dynamic/custom messages that was added in #325. The way in which required fields are checked herein assumes that the message is normal generated message, without providing a way to override/customize.

In particular, if a given struct has any unexported field that contains a pointer value (including a slice or map with an element that contains a pointer value) and that pointer value is not nil, then this will cause a panic here:

panic: reflect.Value.Interface: cannot return value obtained from unexported field or method 

I'll open an issue for it.

dsnet pushed a commit that referenced this pull request Feb 3, 2018
As mentioned in #509, jsonpb panics when marshaling/unmarshaling a non-generated message that has an unexported pointer field. This change will restore dynamic messages to working with jsonpb (they work fine in master; just broken in this dev branch as of #472).
@golang golang locked and limited conversation to collaborators Jun 26, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants