yaml unmarshal for OpenAPIv2 types #279
Welcome @alexzielenski!
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: alexzielenski. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
/assign @Jefftree
json->spec.swagger: 550 ms

It seems the fastest way to get a spec.Swagger is to deserialize YAML, though the apiserver doesn't serve OpenAPI in YAML format. I guess when using it in a CLI tool, we can convert the JSON or pb to YAML once and reuse it, deserializing the YAML whenever it is needed, to get faster performance.

```go
// rawInfo is assumed to be the *yaml.Node for the document root.
yamlNode := &yaml.Node{
	Kind:        yaml.DocumentNode,
	Content:     []*yaml.Node{rawInfo},
	HeadComment: "",
}
var decodedSwagger spec.Swagger
err := yamlNode.Decode(&decodedSwagger)
if err != nil {
	// handle the decoding error
}
```

Users will need to use something like the above to convert yaml to
Should we also try replacing the swagger -> json -> proto marshalling to use YAML instead? (https://github.com/kubernetes/kube-openapi/blob/master/pkg/handler/handler.go#L109)
```
@@ -133,14 +136,43 @@ func (v *VendorExtensible) UnmarshalJSON(data []byte) error {
	return nil
}

func (v *VendorExtensible) UnmarshalYAML(value *yaml.Node) error {
	// var d map[string]interface{}
```
Is this line needed?
```go
// Provides a fast path for decoding YAML scalar node as a string
// If the node's value can be simply returned directly, then it is. Otherwise,
// the yaml.v3.Node.Decode slow path is taken
func DecodeYAMLString(n *yaml.Node, s *string) error {
```
What's the performance difference with this change? I'm curious if this might be better committed in the YAML library rather than here since there isn't any k8s specific logic.
/cc @apelisse
+1 on not putting general-purpose yaml decoding in this lib
Agree that it is not desirable long-term. I will prepare a PR for upstream. However, I do not see a short-term alternative: it will likely take a while for the change to be merged and appear in a tagged release.
@Jefftree The difference with this change is significant. Below are benchmark results for openapi v2 with this optimization removed:
```
goos: darwin
goarch: amd64
pkg: github.com/alexzielenski/parsebench
cpu: Intel(R) Core(TM) i5-1038NG7 CPU @ 2.00GHz
BenchmarkFastConversion
BenchmarkFastConversion/json->swagger
BenchmarkFastConversion/json->swagger-8 2 503133747 ns/op 95739432 B/op 1381637 allocs/op
BenchmarkFastConversion/swagger->json
BenchmarkFastConversion/swagger->json-8 9 122539288 ns/op 66432463 B/op 274050 allocs/op
BenchmarkFastConversion/json->gnostic
BenchmarkFastConversion/json->gnostic-8 5 225056310 ns/op 81462545 B/op 1248402 allocs/op
BenchmarkFastConversion/gnostic->pb
BenchmarkFastConversion/gnostic->pb-8 100 11495301 ns/op 2899970 B/op 1 allocs/op
BenchmarkFastConversion/pb->gnostic
BenchmarkFastConversion/pb->gnostic-8 100 13719325 ns/op 9480698 B/op 123829 allocs/op
BenchmarkFastConversion/gnostic->yaml
BenchmarkFastConversion/gnostic->yaml-8 40 30027913 ns/op 32855169 B/op 264562 allocs/op
BenchmarkFastConversion/yaml->swagger
BenchmarkFastConversion/yaml->swagger-8 13 92594648 ns/op 33392264 B/op 675799 allocs/op
```
yaml->swagger took ~93ms in this run on my machine; over a number of runs I saw a range of 80-100ms.
This compares to ~60ms with the optimization enabled.
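Turning the raw `go test -bench` lines above into the millisecond figures quoted here only takes a small parser. A minimal sketch using the standard library; `parseBenchLine` and `benchResult` are hypothetical names, and only the `ns/op`, `B/op` and `allocs/op` units are recognized.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// benchResult holds the fields of one `go test -bench` result line.
type benchResult struct {
	Name        string
	Iterations  int
	NsPerOp     float64
	BytesPerOp  int64
	AllocsPerOp int64
}

// parseBenchLine parses a line such as
// "BenchmarkFoo-8 13 92594648 ns/op 33392264 B/op 675799 allocs/op".
// Units other than ns/op, B/op and allocs/op are skipped.
func parseBenchLine(line string) (benchResult, error) {
	fields := strings.Fields(line)
	if len(fields) < 4 {
		return benchResult{}, fmt.Errorf("not a result line: %q", line)
	}
	n, err := strconv.Atoi(fields[1])
	if err != nil {
		return benchResult{}, fmt.Errorf("bad iteration count: %w", err)
	}
	r := benchResult{Name: fields[0], Iterations: n}
	// The remaining fields come in (value, unit) pairs.
	for i := 2; i+1 < len(fields); i += 2 {
		v, unit := fields[i], fields[i+1]
		switch unit {
		case "ns/op":
			if r.NsPerOp, err = strconv.ParseFloat(v, 64); err != nil {
				return benchResult{}, err
			}
		case "B/op":
			if r.BytesPerOp, err = strconv.ParseInt(v, 10, 64); err != nil {
				return benchResult{}, err
			}
		case "allocs/op":
			if r.AllocsPerOp, err = strconv.ParseInt(v, 10, 64); err != nil {
				return benchResult{}, err
			}
		}
	}
	return r, nil
}

func main() {
	line := "BenchmarkFastConversion/yaml->swagger-8 13 92594648 ns/op 33392264 B/op 675799 allocs/op"
	r, err := parseBenchLine(line)
	if err != nil {
		panic(err)
	}
	// Convert nanoseconds per operation to milliseconds.
	fmt.Printf("%s: %.1f ms/op\n", r.Name, r.NsPerOp/1e6)
}
```

For the yaml->swagger line this prints 92.6 ms/op, the ~93ms figure mentioned in the comment.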
```go
	return err
}

if strings.HasPrefix(keyStr, "x-") || strings.HasPrefix(keyStr, "X-") {
```
Is the capitalization check capturing an edge case, or do we already have places where "X-" is passed in?
This extra check for "X-" is here only to mirror the UnmarshalJSON behavior. I think you wrote that ;)
(UnmarshalJSON calls lk := strings.ToLower(k) before making the comparison.)
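For this two-character prefix the two approaches should agree: lowercasing the key and testing for "x-", as UnmarshalJSON does, versus testing "x-" and "X-" directly, which skips lowercasing the key. A small sketch with hypothetical helper names that compares them on a few sample keys.

```go
package main

import (
	"fmt"
	"strings"
)

// isExtensionKeyLower follows the UnmarshalJSON approach described above:
// lowercase the key, then compare against "x-".
func isExtensionKeyLower(k string) bool {
	return strings.HasPrefix(strings.ToLower(k), "x-")
}

// isExtensionKeyTwoPrefixes follows the check in this diff: test both
// spellings of the prefix without lowercasing the whole key.
func isExtensionKeyTwoPrefixes(k string) bool {
	return strings.HasPrefix(k, "x-") || strings.HasPrefix(k, "X-")
}

func main() {
	keys := []string{"x-kubernetes-group-version-kind", "X-Custom", "description", "xy-"}
	for _, k := range keys {
		fmt.Println(k, isExtensionKeyLower(k), isExtensionKeyTwoPrefixes(k))
	}
}
```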
```
@@ -163,6 +196,20 @@ func (s *SchemaOrStringArray) UnmarshalJSON(data []byte) error {
	return nil
}

func (s *SchemaOrStringArray) UnmarhsalYAML(value *yaml.Node) error {
```
s/Unmarhsal/Unmarshal
```go
	if assert.NoError(t, err) {
		assert.EqualValues(t, actual, spec)
	}
}
```
Not a comment for this PR, but we should add a roundtrip test in k/k for converting to YAML and back
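A round-trip test of that kind marshals a value, decodes the bytes back, and checks that the result is deeply equal to the input. A minimal sketch, assuming a hypothetical `ContactInfo` type; it uses `encoding/json` as a stand-in so the example needs only the standard library, and a YAML version would swap in the YAML library's marshal and unmarshal calls.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
)

// ContactInfo is a hypothetical stand-in for a spec type; it is not the
// kube-openapi definition.
type ContactInfo struct {
	Name  string `json:"name,omitempty"`
	URL   string `json:"url,omitempty"`
	Email string `json:"email,omitempty"`
}

// roundTrip marshals in (a value), unmarshals the bytes into out (a pointer
// to the same type), and reports whether the decoded value equals in.
func roundTrip(in, out interface{}) (bool, error) {
	data, err := json.Marshal(in)
	if err != nil {
		return false, err
	}
	if err := json.Unmarshal(data, out); err != nil {
		return false, err
	}
	// out is a pointer, so compare the value it points to.
	return reflect.DeepEqual(in, reflect.ValueOf(out).Elem().Interface()), nil
}

func main() {
	orig := ContactInfo{Name: "api-team", Email: "api@example.com"}
	var decoded ContactInfo
	ok, err := roundTrip(orig, &decoded)
	fmt.Println(ok, err) // prints: true <nil>
}
```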
```go
if (n.Tag == "!!str" || n.Tag == "tag:yaml.org,2002:!!string") ||
	(n.Tag == "" || n.Tag == "!") && n.Style&(yaml.SingleQuotedStyle|yaml.DoubleQuotedStyle|yaml.LiteralStyle|yaml.FoldedStyle) != 0 {
```
Is this copied from something in the yaml library? I'm not familiar with what this means or whether it is correct.
I copied it from the yaml library's decoding logic for the string path:
https://github.com/go-yaml/yaml/blob/496545a6307b2a7d7a710fd516e5e16e8ab62dbc/yaml.go#L463
https://github.com/go-yaml/yaml/blob/496545a6307b2a7d7a710fd516e5e16e8ab62dbc/decode.go#L565
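The condition treats a scalar as a string without running tag resolution when it carries an explicit string tag, or when it is untagged (or tagged "!") and written in a quoted or block style. A minimal sketch of that fast path, assuming a hypothetical `scalarNode` type and local `Style` constants in place of `*yaml.Node` and yaml.v3's styles. For the long-form tag it uses `tag:yaml.org,2002:str`, the standard YAML string tag; anything not indicated as a string falls through to the `slowPath` callback, which stands in for `yaml.Node.Decode`.

```go
package main

import "fmt"

// Style is a local stand-in for yaml.v3's bit-flag scalar styles; the
// values here are not the library's.
type Style uint32

const (
	DoubleQuotedStyle Style = 1 << iota
	SingleQuotedStyle
	LiteralStyle
	FoldedStyle
)

// scalarNode is a hypothetical stand-in for the *yaml.Node fields the check reads.
type scalarNode struct {
	Tag   string
	Style Style
	Value string
}

// isIndicatedString reports whether the node is known to be a string:
// explicitly tagged as one, or untagged/"!" and quoted or block-styled.
func isIndicatedString(n scalarNode) bool {
	quotedOrBlock := SingleQuotedStyle | DoubleQuotedStyle | LiteralStyle | FoldedStyle
	return n.Tag == "!!str" || n.Tag == "tag:yaml.org,2002:str" ||
		((n.Tag == "" || n.Tag == "!") && n.Style&quotedOrBlock != 0)
}

// decodeString returns the raw value on the fast path and otherwise defers
// to slowPath (standing in for full decoding).
func decodeString(n scalarNode, s *string, slowPath func(scalarNode, *string) error) error {
	if isIndicatedString(n) {
		*s = n.Value
		return nil
	}
	return slowPath(n, s)
}

func main() {
	slow := func(n scalarNode, out *string) error {
		return fmt.Errorf("slow path needed for %q", n.Value)
	}
	var s string
	err := decodeString(scalarNode{Tag: "!!str", Value: "v1"}, &s, slow)
	fmt.Println(s, err)
	err = decodeString(scalarNode{Tag: "!!int", Value: "42"}, &s, slow)
	fmt.Println(err)
}
```

A plain, untagged scalar such as `true` or `42` is deliberately excluded from the fast path, since YAML may resolve it to a non-string type.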
```go
Name  string `json:"name,omitempty" yaml:"name,omitempty"`
URL   string `json:"url,omitempty" yaml:"url,omitempty"`
Email string `json:"email,omitempty" yaml:"email,omitempty"`
```
I really don't think we should start adding yaml tags to our API types. Having benchmarked the yaml decoding in the past, I'm also really surprised it is faster than the json decoding (it was generally significantly slower in the past). Do we know which bits the json decoding is super slow on?
Since v3 (released in 2019), the yaml package operates on an AST; did you benchmark before then?
My benchmarks measured unmarshalling from the AST type instead of text. This is sufficient for my use case of providing a path to convert from protobuf: protobuf -> google/gnostic -> kube-openapi. For me it is important that this conversion runs at interactive speeds (for a CLI tool).
I haven't benchmarked it, but I expect going from YAML text -> AST -> kube-openapi would be comparable to or slower than JSON.
If we want a google/gnostic → kube-openapi path, it would make more sense to me to build that transformation and round-trip test it. Using yaml AST bits is clever, but I don't think we should decorate API types with yaml tags and push consumers in that direction
> I don't think we should decorate API types with yaml tags and push consumers in that direction
I'm curious why you're against having yaml tags here? The proto->gnostic->kube-openapi conversion that this facilitates is a critical performance improvement that kpt and kustomize would like to have as soon as possible, and I'm not sure that I understand the disadvantages of adding yaml tags.
We have an alternative solution that is probably going to be even faster, but it will take a few weeks to implement. This is also a blocker for the next code freeze, so we're trying to move fast on that!
Thank you so much for the update!
@mengqiy
Given the consensus that directing users to YAML is not desirable, I am closing this PR in favor of a better alternative, a direct conversion method that does not use YAML: #283
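A direct conversion of this kind can be pictured as plain field-by-field copying between the protobuf-generated types and the target types, with no serialization step in between. A minimal sketch; `srcContact`, `srcExtension`, `dstContact`, and `convertContact` are hypothetical stand-ins, not gnostic's or kube-openapi's actual definitions, and the code in #283 may be structured differently.

```go
package main

import "fmt"

// srcExtension and srcContact are hypothetical stand-ins for
// protobuf-generated document types.
type srcExtension struct {
	Name  string
	Value string // raw text of the extension value
}

type srcContact struct {
	Name                   string
	Url                    string
	Email                  string
	SpecificationExtension []*srcExtension
}

// dstContact is a hypothetical stand-in for a kube-openapi-style type.
type dstContact struct {
	Name       string
	URL        string
	Email      string
	Extensions map[string]string
}

// convertContact copies fields directly from source to target, without an
// intermediate JSON or YAML representation.
func convertContact(in *srcContact) *dstContact {
	if in == nil {
		return nil
	}
	out := &dstContact{Name: in.Name, URL: in.Url, Email: in.Email}
	for _, ext := range in.SpecificationExtension {
		if ext == nil {
			continue
		}
		if out.Extensions == nil {
			out.Extensions = make(map[string]string)
		}
		out.Extensions[ext.Name] = ext.Value
	}
	return out
}

func main() {
	in := &srcContact{
		Name:  "API team",
		Email: "api@example.com",
		SpecificationExtension: []*srcExtension{
			{Name: "x-team-channel", Value: "#api"},
		},
	}
	fmt.Printf("%+v\n", *convertContact(in))
}
```

The trade-off is more hand-written (or generated) mapping code in exchange for skipping the encode and re-parse that an intermediate format costs.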
There is no fast way to deserialize protobuf into kube-openapi types. The obvious choice is to follow this sequence:

PB -> google/gnostic -> JSON -> kube-openapi

However, this is slow in CLI tools. On my system and others', the conversion alone takes over half a second, before the operation the user requested even runs. That is too slow for interactive use.
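The JSON step in this sequence amounts to a generic encode-then-decode bridge between two types whose JSON field names line up. A minimal sketch with `encoding/json`; `convertViaJSON`, `source`, and `destination` are hypothetical names, not gnostic or kube-openapi code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// convertViaJSON converts src into dst (a pointer) by serializing src to
// JSON text and parsing it back. Every call pays for a full encode and a
// full decode of the document, which is the cost the PR aims to avoid.
func convertViaJSON(src, dst interface{}) error {
	data, err := json.Marshal(src)
	if err != nil {
		return err
	}
	return json.Unmarshal(data, dst)
}

// source and destination are hypothetical types with matching JSON names.
type source struct {
	Title   string `json:"title"`
	Version string `json:"version"`
}

type destination struct {
	Title   string `json:"title"`
	Version string `json:"version"`
}

func main() {
	var d destination
	if err := convertViaJSON(source{Title: "Kubernetes", Version: "test"}, &d); err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", d)
}
```

The bridge is convenient because it needs no per-type mapping code, but for a spec the size of Kubernetes' swagger.json the extra text round trip dominates the runtime.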
This PR provides the intermediate step needed for the following, faster conversion:

PB -> google/gnostic -> yaml.v3.Node -> kube-openapi

This method compares favorably with using JSON as the intermediary.
OpenAPI V2 Conversion Benchmarks:

These benchmarks start with swagger.json pulled from a running k8s cluster and measure conversions between various representations of the OpenAPI spec. In the tests below, swagger refers to kube-openapi's pkg/validation/spec.Swagger.

I ran a benchmark to compare the different conversions:
https://github.com/alexzielenski/kube-openapi-gnostic-benchmark

Using JSON:

Using YAML:

yaml->swagger, at 58.5ms, is 10x faster than yaml->json + json->swagger#01 at 586.2ms.

The OpenAPI V3 yaml patch is forthcoming in another PR.