Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

yaml unmarshal for OpenAPIv2 types #279

Closed

Conversation

alexzielenski
Copy link
Contributor

@alexzielenski alexzielenski commented Feb 19, 2022

There is no fast way to deserialize protobuf into kube-openapi types. The obvious choice is to follow this sequence:

PB -> google/gnostic -> JSON -> kube-openapi

However in CLI tools this is slow. On mine and others' systems it takes over half a second to just do the conversion, let alone the operation the user requested. Too slow for interactivity.

This PR facilitates the intermediate step necessary to perform the following faster conversion:

PB -> google/gnostic -> yaml.v3.Node -> kube-openapi

This method compares favorably to the choice of using JSON as an intermediary.

OpenAPI V2 Conversion Benchmarks:
These benchmarks start with swagger.json pulled from a running k8s cluster and measure conversions between various representations of the openapi spec. In the below tests swagger refers to kube-openapi's pkg/validation/spec.Swagger

Ran a benchmark to compare different conversion:
https://github.com/alexzielenski/kube-openapi-gnostic-benchmark

Using JSON:

BenchmarkSlowConversion/json->swagger-8         	       2	 557015227 ns/op	95739288 B/op	 1381635 allocs/op
BenchmarkSlowConversion/swagger->json-8         	       8	 128882670 ns/op	71913162 B/op	  274072 allocs/op
BenchmarkSlowConversion/json->gnostic-8         	       5	 216769974 ns/op	81453435 B/op	 1248390 allocs/op
BenchmarkSlowConversion/gnostic->pb-8           	     121	   9842726 ns/op	 2899970 B/op	       1 allocs/op
BenchmarkSlowConversion/pb->gnostic-8           	     100	  15442423 ns/op	 9480707 B/op	  123829 allocs/op
BenchmarkSlowConversion/gnostic->yaml-8         	      42	  28809063 ns/op	32855155 B/op	  264562 allocs/op

BenchmarkSlowConversion/yaml->json-8            	      20	  61439852 ns/op	25325099 B/op	  463786 allocs/op
BenchmarkSlowConversion/json->swagger#01-8      	       2	 524846406 ns/op	95787616 B/op	 1381624 allocs/op

Using YAML:

BenchmarkFastConversion/json->swagger-8         	       2	 548771326 ns/op	95736376 B/op	 1381630 allocs/op
BenchmarkFastConversion/swagger->json-8         	       8	 147762185 ns/op	67593998 B/op	  274060 allocs/op
BenchmarkFastConversion/json->gnostic-8         	       5	 228517622 ns/op	81446896 B/op	 1248395 allocs/op
BenchmarkFastConversion/gnostic->pb-8           	     100	  10217647 ns/op	 2899968 B/op	       1 allocs/op
BenchmarkFastConversion/pb->gnostic-8           	     100	  12665889 ns/op	 9480707 B/op	  123829 allocs/op
BenchmarkFastConversion/gnostic->yaml-8         	      37	  27832678 ns/op	32855294 B/op	  264562 allocs/op

BenchmarkFastConversion/yaml->swagger-8         	      21	  58512404 ns/op	23687182 B/op	  510369 allocs/op

yaml->swagger at 58.5ms is 10x faster than yaml->json + json->swagger#01 at 586.2ms

OpenAPI V3 yaml patch is forthcoming in another PR

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Feb 19, 2022
@k8s-ci-robot
Copy link
Contributor

Welcome @alexzielenski!

It looks like this is your first PR to kubernetes/kube-openapi 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/kube-openapi has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Feb 19, 2022
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: alexzielenski
To complete the pull request process, please assign sttts after the PR has been reviewed.
You can assign the PR to them by writing /assign @sttts in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@alexzielenski alexzielenski changed the title [WIP] yaml unmarshal for OpenAPI types yaml unmarshal for OpenAPI types Feb 25, 2022
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 25, 2022
@alexzielenski alexzielenski changed the title yaml unmarshal for OpenAPI types yaml unmarshal for OpenAPIv2 types Feb 25, 2022
@alexzielenski
Copy link
Contributor Author

/assign @Jefftree

@mengqiy
Copy link
Member

mengqiy commented Feb 28, 2022

json->spec.swagger 550 ms
yaml->spec.swagger 58 ms
pd->spec.swagger = PB -> google/gnostic -> yaml.v3.Node -> spec.swagger 99 ms

It seems the fastest way to get a spec.swagger is to deserialize yaml, though the apiserver doesn't serve openapi in yaml format. I guess when using it in a CLI tool, we can convert json or pd to yaml once and reuse it by deserializing yaml when needed to get faster performance.

yamlNode := &yaml.Node{
	Kind:        yaml.DocumentNode,
	Content:     []*yaml.Node{rawInfo},
	HeadComment: "",
}

var decodedSwagger spec.Swagger
err := yamlNode.Decode(&decodedSwagger)
// check error

Users will need to use something like above to convert yaml to spec.Swagger.
I guess if you can provide a method like PraseYAMLDocument(in string) (*spec.Swagger, error) to wrap the logic above, it would be more accessible to the users. @alexzielenski WDYT?

Copy link
Member

@Jefftree Jefftree left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also try replacing the swagger -> json -> proto marshalling to use YAML instead? (https://github.com/kubernetes/kube-openapi/blob/master/pkg/handler/handler.go#L109)

@@ -133,14 +136,43 @@ func (v *VendorExtensible) UnmarshalJSON(data []byte) error {
return nil
}

func (v *VendorExtensible) UnmarshalYAML(value *yaml.Node) error {
// var d map[string]interface{}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this line needed?

// Provides a fast path for decoding YAML scalar node as a string
// If the node's value can be simply returned directly, then it is. Otherwise,
// the yaml.v3.Node.Decode slow path is taken
func DecodeYAMLString(n *yaml.Node, s *string) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the performance difference with this change? I'm curious if this might be better committed in the YAML library rather than here since there isn't any k8s specific logic.

/cc @apelisse

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 on not putting general-purpose yaml decoding in this lib

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree that it is not desirable long-term. I will prepare a PR for upstream. However, I do not see an alternative in the short term? It will likely take a while for the change to be merged and appear in a tagged release.

Copy link
Contributor Author

@alexzielenski alexzielenski Mar 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Jefftree The difference with this change is significant. Below are benchmark results for openapi v2 with this optimization removed:

goos: darwin
goarch: amd64
pkg: github.com/alexzielenski/parsebench
cpu: Intel(R) Core(TM) i5-1038NG7 CPU @ 2.00GHz
BenchmarkFastConversion
BenchmarkFastConversion/json->swagger
BenchmarkFastConversion/json->swagger-8         	       2	 503133747 ns/op	95739432 B/op	 1381637 allocs/op
BenchmarkFastConversion/swagger->json
BenchmarkFastConversion/swagger->json-8         	       9	 122539288 ns/op	66432463 B/op	  274050 allocs/op
BenchmarkFastConversion/json->gnostic
BenchmarkFastConversion/json->gnostic-8         	       5	 225056310 ns/op	81462545 B/op	 1248402 allocs/op
BenchmarkFastConversion/gnostic->pb
BenchmarkFastConversion/gnostic->pb-8           	     100	  11495301 ns/op	 2899970 B/op	       1 allocs/op
BenchmarkFastConversion/pb->gnostic
BenchmarkFastConversion/pb->gnostic-8           	     100	  13719325 ns/op	 9480698 B/op	  123829 allocs/op
BenchmarkFastConversion/gnostic->yaml
BenchmarkFastConversion/gnostic->yaml-8         	      40	  30027913 ns/op	32855169 B/op	  264562 allocs/op
BenchmarkFastConversion/yaml->swagger
BenchmarkFastConversion/yaml->swagger-8         	      13	  92594648 ns/op	33392264 B/op	  675799 allocs/op

~93ms in this run on my machine. Over a number of runs i saw a range of 80-100ms for yaml->swagger.

This is compared to ~60ms with the optimization enabled.

return err
}

if strings.HasPrefix(keyStr, "x-") || strings.HasPrefix(keyStr, "X-") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the capitalization check capturing an edge case or do we already have places where X- is passed in?

Copy link
Contributor Author

@alexzielenski alexzielenski Mar 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This extra check for "X-" is here only to mirror the UnmarshalJSON behavior. I think you wrote that ;)

(Unmarshal JSON calls lk := strings.ToLower(k) before making the comparison)

@@ -163,6 +196,20 @@ func (s *SchemaOrStringArray) UnmarshalJSON(data []byte) error {
return nil
}

func (s *SchemaOrStringArray) UnmarhsalYAML(value *yaml.Node) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/Unmarhsal/Unmarshal

if assert.NoError(t, err) {
assert.EqualValues(t, actual, spec)
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a comment for this PR, but we should add a roundtrip test in k/k for converting to YAML and back

Comment on lines +123 to +124
if (n.Tag == "!!str" || n.Tag == "tag:yaml.org,2002:!!string") ||
(n.Tag == "" || n.Tag == "!") && n.Style&(yaml.SingleQuotedStyle|yaml.DoubleQuotedStyle|yaml.LiteralStyle|yaml.FoldedStyle) != 0 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this copied from something in the yaml library? I'm not familiar with what this means or if it is correct

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +21 to +23
Name string `json:"name,omitempty" yaml:"name,omitempty"`
URL string `json:"url,omitempty" yaml:"url,omitempty"`
Email string `json:"email,omitempty" yaml:"email,omitempty"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really don't think we should start adding yaml tags to our API types. Having benchmarked the yaml decoding in the past, I'm also really surprised it is faster than the json decoding (it was generally significantly slower in the past). Do we know which bits the json decoding is super slow on?

Copy link
Contributor Author

@alexzielenski alexzielenski Mar 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yaml package since v3 released 2019 operates on an AST; did you look before then?

My benchmarks measured unmarshalling from the AST type instead of text. This is sufficient for my use case of providing a path to convert from protobuf: protoubf -> google/gnostic -> kube-openapi. For me it is important that this conversion runs at interactive speeds (for a CLI tool)

I haven't benchmarked it, but I expect going from YAML text -> AST -> Kube-Openapi would be comparable/slower than JSON.

Copy link
Member

@liggitt liggitt Mar 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want a google/gnostic → kube-openapi path, it would make more sense to me to build that transformation and round-trip test it. Using yaml AST bits is clever, but I don't think we should decorate API types with yaml tags and push consumers in that direction

Copy link

@natasha41575 natasha41575 Mar 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should decorate API types with yaml tags and push consumers in that direction

I'm curious why you're against having yaml tags here? The proto->gnostic->kube-openapi conversion that this facilitates is a critical performance improvement that kpt and kustomize would like to have as soon as possible, and I'm not sure that I understand the disadvantages of adding yaml tags.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have an alternative solution that is probably going to be even faster, but it will take a few weeks to implement. This is also a blocker for next code-freeze so we're trying to move fast on that!

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much for the update!

@alexzielenski
Copy link
Contributor Author

alexzielenski commented Mar 1, 2022

json->spec.swagger 550 ms yaml->spec.swagger 58 ms pd->spec.swagger = PB -> google/gnostic -> yaml.v3.Node -> spec.swagger 99 ms

It seems the fastest way to get a spec.swagger is to deserialize yaml, though the apiserver doesn't serve openapi in yaml format. I guess when using it in a CLI tool, we can convert json or pd to yaml once and reuse it by deserializing yaml when needed to get faster performance.

yamlNode := &yaml.Node{
	Kind:        yaml.DocumentNode,
	Content:     []*yaml.Node{rawInfo},
	HeadComment: "",
}

var decodedSwagger spec.Swagger
err := yamlNode.Decode(&decodedSwagger)
// check error

Users will need to use something like above to convert yaml to spec.Swagger. I guess if you can provide a method like PraseYAMLDocument(in string) (*spec.Swagger, error) to wrap the logic above, it would be more accessible to the users. @alexzielenski WDYT?

@mengqiy
Agreed, the stanza to set up the DocumentNode is not obvious, and such a function is required. Ill add it in tomorrow :)

@alexzielenski
Copy link
Contributor Author

Given consensus that directing users to use YAML is not desirable, I am closing this PR in favor of a better alternative: direct conversion method without using YAML: #283

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants