Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds safeyaml.JSONtoYAML for safer marshaling #139

Merged
merged 6 commits into from
Jan 18, 2019
Merged

Conversation

bmoylan
Copy link
Contributor

@bmoylan bmoylan commented Jan 16, 2019

This change is Reviewable


// JSONtoYAML converts json data to yaml while preserving order of object fields.
// Invoke yaml.Marshal on the returned interface{}.
func JSONtoYAML(jsonBytes []byte) (interface{}, error) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Open to feedback on this API; other options are

func JSONtoYAML(jsonBytes []byte) (yamlBytes []byte, err error) {}
func JSONtoYAML(jsonMarshaler interface{}) (yamlBytes []byte, err error) {}
func JSONtoYAML(jsonMarshaler interface{}) (yamlMarshaler interface{}, err error) {}

@bmoylan bmoylan requested a review from nmiyake January 16, 2019 20:05
Copy link
Contributor

@nmiyake nmiyake left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 2 files at r1.
Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @bmoylan)


safeyaml/jsontoyaml.go, line 17 at r1 (raw file):

Previously, bmoylan (Brad Moylan) wrote…

Open to feedback on this API; other options are

func JSONtoYAML(jsonBytes []byte) (yamlBytes []byte, err error) {}
func JSONtoYAML(jsonMarshaler interface{}) (yamlBytes []byte, err error) {}
func JSONtoYAML(jsonMarshaler interface{}) (yamlMarshaler interface{}, err error) {}

My preference is with the first signature/approach -- with that one, it's explicit that JSON bytes are being converted to YAML bytes. It also has the advantage that the use of yaml.v2 becomes an internal implementation detail -- the function could later be modified/altered to use a completely different library (or an internal shaded version of yaml.v2 etc.) and would still behave as described.

It does mean that the function will be making its own decision about decoders/encoders etc., but I think that's a pretty acceptable trade-off (especially if the function is modeled as a transform of []byte input from JSON to YAML representation)


safeyaml/jsontoyaml.go, line 23 at r1 (raw file):

}

var errClosingArrayDelim = fmt.Errorf("unexpected ']' delimiter")

Since these errors are not exported, may make sense to simply construct and return them directly (see other comment)


safeyaml/jsontoyaml.go, line 26 at r1 (raw file):

var errClosingObjectDelim = fmt.Errorf("unexpected '}' delimiter")

func tokenizerToYaml(dec *json.Decoder) (interface{}, error) {

Should be tokenizerToYAML


safeyaml/jsontoyaml.go, line 36 at r1 (raw file):

	switch v := tok.(type) {
	case string:
		return v, nil

Can unify the common cases:

case string, bool, float64:
  return v, nil

safeyaml/jsontoyaml.go, line 83 at r1 (raw file):

		case ']':
			return nil, errClosingArrayDelim
		case '}':

Can unify the cases as:

		case ']', '}':
			return nil, fmt.Errorf("unexpected '%v' delimiter", v)

Copy link
Contributor Author

@bmoylan bmoylan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @nmiyake)


safeyaml/jsontoyaml.go, line 17 at r1 (raw file):

Previously, nmiyake (Nick Miyake) wrote…

My preference is with the first signature/approach -- with that one, it's explicit that JSON bytes are being converted to YAML bytes. It also has the advantage that the use of yaml.v2 becomes an internal implementation detail -- the function could later be modified/altered to use a completely different library (or an internal shaded version of yaml.v2 etc.) and would still behave as described.

It does mean that the function will be making its own decision about decoders/encoders etc., but I think that's a pretty acceptable trade-off (especially if the function is modeled as a transform of []byte input from JSON to YAML representation)

I added JSONtoYAMLBytes but left the existing JSONtoYAML so it can be used in MarshalYAML implementations.


safeyaml/jsontoyaml.go, line 23 at r1 (raw file):

Previously, nmiyake (Nick Miyake) wrote…

Since these errors are not exported, may make sense to simply construct and return them directly (see other comment)

I made the top-level variables because I check equality on them to break out of the for loops iterating over arrays and objects below


safeyaml/jsontoyaml.go, line 26 at r1 (raw file):

Previously, nmiyake (Nick Miyake) wrote…

Should be tokenizerToYAML

Done.


safeyaml/jsontoyaml.go, line 36 at r1 (raw file):

Previously, nmiyake (Nick Miyake) wrote…

Can unify the common cases:

case string, bool, float64:
  return v, nil

Done.


safeyaml/jsontoyaml.go, line 83 at r1 (raw file):

Previously, nmiyake (Nick Miyake) wrote…

Can unify the cases as:

		case ']', '}':
			return nil, fmt.Errorf("unexpected '%v' delimiter", v)

No, we need to know which type of collection we are leaving (hence the two error variables, one for each for loop above)

Copy link
Contributor

@nmiyake nmiyake left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 2 files at r2.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @bmoylan)


safeyaml/jsontoyaml.go, line 17 at r1 (raw file):

Previously, bmoylan (Brad Moylan) wrote…

I added JSONtoYAMLBytes but left the existing JSONtoYAML so it can be used in MarshalYAML implementations.

Would DecodeJSON (fully qualified as safeyaml.DecodeJSON) or JSONtoMapSlice be more precise?

Maybe this is just my particular bias, but when I see "YAML" I generally think of YAML bytes/string, so one of the naming schemes described here would be more intuitive to me. However, don't feel super strongly about this, so also fine to proceed with current naming if you strongly prefer that.


safeyaml/jsontoyaml.go, line 83 at r1 (raw file):

Previously, bmoylan (Brad Moylan) wrote…

No, we need to know which type of collection we are leaving (hence the two error variables, one for each for loop above)

Ah I see, so there are valid cases where the error is returned but the error is just interpreted as control flow instead. Got it.


safeyaml/jsontoyaml.go, line 15 at r2 (raw file):

)

// JSONtoYAML converts json data to yaml while preserving order of object fields.

Top-line summary should be updated to more precisely reflect functionality. Something like:

JSONtoYAML decodes JSON bytes into an interface where all objects and maps are decoded as a yaml.MapSlice to preserve key ordering. This can be useful if a type defines custom JSON marshal logic and translating the JSON representation to a YAML representation while preserving key ordering is sufficient for serialization. For example: ...

You don't have to use this exact wording, but do think the current wording (particularly, "converts json data to yaml") should be changed to be more precise ("decoding JSON data using yaml.MapSlice for maps and objects").


safeyaml/jsontoyaml.go, line 30 at r2 (raw file):

	dec := json.NewDecoder(bytes.NewReader(jsonBytes))
	dec.UseNumber()
	return tokenizerToYAML(dec)

After doing some local testing, it looks like there's one class of cases that isn't caught here, which is the case where the input contains values after a valid JSON block.

Simplest example is {}{. The built-in JSON functions will error with invalid character '{' after top-level value, while the tokenizer-based function will ignore everything after the valid JSON and thus successfully return an empty object (see https://stackoverflow.com/a/25187779 for link to source code where this is checked using scanner).

After playing around locally, I think it's sufficient to check whether dec.More returns true after successful parsing. Would recommend something like:

        val, err := tokenizerToYAML(dec)
	if err != nil {
		return val, err
	}
	if dec.More() {
		errText := "invalid input after top-level value"
		if buffered, err := ioutil.ReadAll(dec.Buffered()); err == nil {
			errText += fmt.Sprintf(": %q", string(buffered))
		}
		return nil, fmt.Errorf(errText)
	}
	return val, nil

(We could also just skip reading the buffered portion and just say that there's input after the top-level value without specifying what, but I think the information is helpful).

Should also add a test case to test for this.

Based on my testing I don't think the addition of the test for More breaks anything, but we should verify.

Copy link
Contributor Author

@bmoylan bmoylan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @nmiyake)


safeyaml/jsontoyaml.go, line 17 at r1 (raw file):

Previously, nmiyake (Nick Miyake) wrote…

Would DecodeJSON (fully qualified as safeyaml.DecodeJSON) or JSONtoMapSlice be more precise?

Maybe this is just my particular bias, but when I see "YAML" I generally think of YAML bytes/string, so one of the naming schemes described here would be more intuitive to me. However, don't feel super strongly about this, so also fine to proceed with current naming if you strongly prefer that.

Went with safeyaml.JSONtoYAMLMapSlice


safeyaml/jsontoyaml.go, line 15 at r2 (raw file):

Previously, nmiyake (Nick Miyake) wrote…

Top-line summary should be updated to more precisely reflect functionality. Something like:

JSONtoYAML decodes JSON bytes into an interface where all objects and maps are decoded as a yaml.MapSlice to preserve key ordering. This can be useful if a type defines custom JSON marshal logic and translating the JSON representation to a YAML representation while preserving key ordering is sufficient for serialization. For example: ...

You don't have to use this exact wording, but do think the current wording (particularly, "converts json data to yaml") should be changed to be more precise ("decoding JSON data using yaml.MapSlice for maps and objects").

Done.


safeyaml/jsontoyaml.go, line 30 at r2 (raw file):

Previously, nmiyake (Nick Miyake) wrote…

After doing some local testing, it looks like there's one class of cases that isn't caught here, which is the case where the input contains values after a valid JSON block.

Simplest example is {}{. The built-in JSON functions will error with invalid character '{' after top-level value, while the tokenizer-based function will ignore everything after the valid JSON and thus successfully return an empty object (see https://stackoverflow.com/a/25187779 for link to source code where this is checked using scanner).

After playing around locally, I think it's sufficient to check whether dec.More returns true after successful parsing. Would recommend something like:

        val, err := tokenizerToYAML(dec)
	if err != nil {
		return val, err
	}
	if dec.More() {
		errText := "invalid input after top-level value"
		if buffered, err := ioutil.ReadAll(dec.Buffered()); err == nil {
			errText += fmt.Sprintf(": %q", string(buffered))
		}
		return nil, fmt.Errorf(errText)
	}
	return val, nil

(We could also just skip reading the buffered portion and just say that there's input after the top-level value without specifying what, but I think the information is helpful).

Should also add a test case to test for this.

Based on my testing I don't think the addition of the test for More breaks anything, but we should verify.

Done, though I decided not to include the extra content just to ensure we do not leak anything sensitive.

Copy link
Contributor

@nmiyake nmiyake left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Some minor feedback.

Reviewed 2 of 2 files at r3.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @nmiyake and @bmoylan)


safeyaml/jsontoyaml.go, line 30 at r2 (raw file):

Previously, bmoylan (Brad Moylan) wrote…

Done, though I decided not to include the extra content just to ensure we do not leak anything sensitive.

Yup, makes sense


safeyaml/jsontoyaml.go, line 35 at r3 (raw file):

	}
	if dec.More() {
		return nil, fmt.Errorf("failed to convert json to yaml: invalid input after top-level json value")

Since this is modeled as a library function, would prefer to keep this more compact and analogous to the native output: "invalid character input after top-level value”

Client libraries can add further context (like the fact that the operation is semantically performing a JSON -> YAML conversion) in their error message if they so choose


safeyaml/jsontoyaml_test.go, line 58 at r3 (raw file):

			} else {
				require.Error(t, err)
				require.Contains(t, err.Error(), test.Err)

Since the content isn't dynamic, can we just assert equality rather than "Contains"?

Copy link
Contributor Author

@bmoylan bmoylan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @nmiyake)


safeyaml/jsontoyaml.go, line 35 at r3 (raw file):

Previously, nmiyake (Nick Miyake) wrote…

Since this is modeled as a library function, would prefer to keep this more compact and analogous to the native output: "invalid character input after top-level value”

Client libraries can add further context (like the fact that the operation is semantically performing a JSON -> YAML conversion) in their error message if they so choose

Done.


safeyaml/jsontoyaml_test.go, line 58 at r3 (raw file):

Previously, nmiyake (Nick Miyake) wrote…

Since the content isn't dynamic, can we just assert equality rather than "Contains"?

Done.

Copy link
Contributor

@nmiyake nmiyake left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 2 files at r4.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @bmoylan)


safeyaml/jsontoyaml.go, line 35 at r4 (raw file):

	}
	if dec.More() {
		return nil, fmt.Errorf("invalid input after top-level json value")

Capitalize "JSON"

Copy link
Contributor Author

@bmoylan bmoylan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 2 files reviewed, 1 unresolved discussion (waiting on @nmiyake)


safeyaml/jsontoyaml.go, line 35 at r4 (raw file):

Previously, nmiyake (Nick Miyake) wrote…

Capitalize "JSON"

Done.

@nmiyake
Copy link
Contributor

nmiyake commented Jan 18, 2019

Reviewable isn't allowing submission of review, but lgtm -- thanks!

@nmiyake nmiyake merged commit 4862f9d into master Jan 18, 2019
@nmiyake nmiyake deleted the bm/jsontoyaml branch January 18, 2019 17:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants