Add direct conversion from Gnostic v2 types to spec.Swagger #283

alexzielenski · 2022-03-09T18:46:32Z

This PR continues the conversation from #279, and a private one with @liggitt, in which we decided the best path for converting gnostic to kube-openapi was to write a direct conversion.

This PR adds a direct conversion from gnostic openapiv2 types to kube-openapi's spec.Swagger. Below are benchmark results from the same benchmark used in #279:

BenchmarkGnosticConversion/json->swagger
BenchmarkGnosticConversion/json->swagger-8                  2     537177100 ns/op    95737132 B/op    1381637 allocs/op
BenchmarkGnosticConversion/swagger->json
BenchmarkGnosticConversion/swagger->json-8                  9     125024798 ns/op    71075921 B/op     274071 allocs/op
BenchmarkGnosticConversion/json->gnostic
BenchmarkGnosticConversion/json->gnostic-8                  5     229392000 ns/op    81446859 B/op    1248395 allocs/op
BenchmarkGnosticConversion/gnostic->pb
BenchmarkGnosticConversion/gnostic->pb-8                  100      11095008 ns/op     2899968 B/op          1 allocs/op
BenchmarkGnosticConversion/gnostic->yaml
BenchmarkGnosticConversion/gnostic->yaml-8                 42      28717820 ns/op    32855169 B/op     264562 allocs/op

BenchmarkGnosticConversion/pb->gnostic
BenchmarkGnosticConversion/pb->gnostic-8                   97      13012220 ns/op     9480701 B/op     123829 allocs/op
BenchmarkGnosticConversion/gnostic->swagger
BenchmarkGnosticConversion/gnostic->swagger-8              50      25910391 ns/op    22287938 B/op     164414 allocs/op

pb->gnostic->swagger using this method takes ~39ms

this compares favorably to pb->gnostic->yaml->swagger from the last PR in ~97ms

which compared favorably to pb->gnostic->yaml->json->swagger, today's existing solution, which clocked this benchmark at ~628ms
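For reference, the ~39ms figure is just the sum of the two stages in the benchmark output above. A minimal sketch of that arithmetic, with the ns/op values copied from the output; the `stage` type and `totalMillis` helper are illustrative names, not part of this PR:

```go
package main

import "fmt"

// stage pairs a benchmark name with its measured cost in ns/op.
// (Hypothetical helper type, used only for this illustration.)
type stage struct {
	name    string
	nsPerOp int64
}

// totalMillis sums the per-stage cost and converts nanoseconds to milliseconds.
func totalMillis(stages []stage) float64 {
	var total int64
	for _, s := range stages {
		total += s.nsPerOp
	}
	return float64(total) / 1e6
}

func main() {
	// Values taken from the benchmark output in this PR description.
	direct := []stage{
		{"pb->gnostic", 13012220},
		{"gnostic->swagger", 25910391},
	}
	fmt.Printf("pb->gnostic->swagger: %.1f ms\n", totalMillis(direct))
}
```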

Caveats:

1. Fields like Maximum, Minimum, MaxItems, etc. in gnostic are not implemented as pointers. This means the conversion has no way to differentiate between one of these being 0 and being unset.
2. The kube-openapi Swagger type has a number of fields added in OpenAPI v3, despite the fact that it represents an OpenAPI v2 object. These fields cannot be populated by the unmarshaler, since gnostic errors on or ignores unrecognized keys. Thus, if one is roundtripping with gnostic via kube->json->gnostic->kube, those fields would be missing, since json->gnostic would ignore them.
3. spec.SchemaOrArray in gnostic's implementation only scans for one element, so successive schemas are ignored.
4. A number of our kube-openapi Swagger types do not have VendorExtensible embedded even when the OpenAPI spec permits "x-" extensions. This means any extensions used in gnostic for a few types are not carried over. This can be corrected in another PR.
5. Response.Description and Parameter.Description are missing from our type definitions despite being available in openapi v2, so the information is dropped when converted from gnostic. This can be corrected in another PR.
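To make the first caveat concrete, here is a minimal sketch of why a non-pointer field loses the "unset" state. The `valueSchema` and `pointerSchema` types are simplified, hypothetical stand-ins for the gnostic and kube-openapi schema types, and treating 0 as "unset" is only one possible policy, not necessarily what the conversion in this PR does in every case:

```go
package main

import "fmt"

// valueSchema stands in for the gnostic style: a plain float64 cannot
// distinguish "maximum: 0" from "maximum not set".
type valueSchema struct {
	Maximum float64
}

// pointerSchema stands in for the kube-openapi style: nil means "not set".
type pointerSchema struct {
	Maximum *float64
}

// convert copies Maximum only when it is non-zero. With no presence
// information in the source, an explicit 0 is indistinguishable from unset.
func convert(in valueSchema) pointerSchema {
	if in.Maximum == 0 {
		return pointerSchema{} // an explicit "maximum: 0" would be lost here
	}
	m := in.Maximum
	return pointerSchema{Maximum: &m}
}

func main() {
	zero := convert(valueSchema{Maximum: 0})
	ten := convert(valueSchema{Maximum: 10})
	fmt.Println(zero.Maximum == nil) // true: 0 and unset collapse to nil
	fmt.Println(*ten.Maximum)        // 10
}
```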

For the use case of kubernetes' swagger.json, most of these caveats do not apply:

1. The Maximum/Minimum/etc. fields may be an issue. I have not looked into whether the OpenAPI spec treats a value of 0 for these as "unset", the way gnostic does.
2. The openapi v3 validation fields like anyOf, not, etc. are stripped from the kubernetes openapi v2 published description. (https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apiextensions-apiserver/pkg/apiserver/schema/skeleton.go)
3. Practical (and kubernetes') usages of Schema.Items (the only usage of spec.SchemaOrArray) always have one element. This is because the JSON schema spec does not specify it as an array in more recent versions. (prefixItems is used as the array version of the old Items): http://json-schema.org/draft/2020-12/json-schema-core.html#items
4. This is only a problem if the information existed in the first place. Since the kubernetes use case is kube->gnostic->kube, there is no risk of losing fields that are available in gnostic but not in kube-openapi.
5. Missing fields can be added, but also see the point above.

@natasha41575
/cc @apelisse

liggitt · 2022-03-09T18:58:27Z

numbers look really promising, thanks

is this ready for review? I see a lot of question marks and commented out code

alexzielenski · 2022-03-09T19:09:59Z

@liggitt It is ready for review as of my latest force push. You looked too quickly!

apelisse

Minor nits, the conversion code seems good to me, especially as it round-trips properly.

apelisse · 2022-03-09T21:33:12Z

pkg/validation/spec/gnostic_test.go

+func TestGnosticConversionSmallDeterministic2(t *testing.T) {
+	// A failed case of TestGnosticConversionSmallRandom
+	// which failed during development/testing loop
+	gnosticCommonTest(
+		t,
+		fuzz.
+			NewWithSeed(1646770841).
+			NilChance(0.8).
+			MaxDepth(10).
+			NumElements(1, 2),
+	)
+}
+
+func TestGnosticConversionSmallDeterministic3(t *testing.T) {
+	// A failed case of TestGnosticConversionSmallRandom
+	// which failed during development/testing loop
+	gnosticCommonTest(
+		t,
+		fuzz.
+			NewWithSeed(1646772024).
+			NilChance(0.8).
+			MaxDepth(10).
+			NumElements(1, 2),
+	)
+}
+
+func TestGnosticConversionSmallDeterministic4(t *testing.T) {
+	// A failed case of TestGnosticConversionSmallRandom
+	// which failed during development/testing loop
+	gnosticCommonTest(
+		t,
+		fuzz.
+			NewWithSeed(1646791953).
+			NilChance(0.8).
+			MaxDepth(10).
+			NumElements(1, 2),
+	)
+}
+
+
+func TestGnosticConversionSmallRandom(t *testing.T) {
+	seed := time.Now().Unix()
+	t.Log("Using seed: ", seed)
+
+	gnosticCommonTest(
+		t,
+		fuzz.
+			NewWithSeed(seed).
+			NilChance(0.8).
+			MaxDepth(10).


I don't know if we need all these variants of the test. If it were me, I'd have kept it in a single test and run a loop a few times (100? depending on how long each invocation takes), re-fuzzing every time rather than sticking to either a pre-determined seed or even changing parameters like this. I don't think the "size" really has an impact for these.

I disagree that the "deterministic" tests add no value. The SmallRandom test case fuzzes for new failing cases as you described (though it should perhaps be modified to fuzz more than once). Once the random fuzzer discovers a failing case, it is easy to take the seed, reproduce the failure, fix it, and add a regression test. Do this enough times and you have a corpus of tricky cases the code is known to have failed on in the past, which guards against regression.

To fuzz in a loop 100 times without printing the random seed or fuzzed object used for each case offers no value to a developer other than to say "it's broken", since the exact case can't be reproduced.

even changing parameters like this. I don't think the "size" really has an impact for these.

I was forced to change the parameters since the fuzz to generate the object would take too long otherwise. The different "sizes" were useful for me during development to work on smaller cases and then larger and larger ones. The distinction may not be useful as part of the test suite, but I see no need to remove them.
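The workflow argued for above (explore with fresh seeds, report each failing seed, then pin it as a regression case) can be sketched as follows. This is only an illustration using the standard library's `math/rand`; `checkSeed` and `failingSeeds` are hypothetical names, and `checkSeed` is an arbitrary stand-in for the real round-trip check in `gnosticCommonTest`:

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// checkSeed is a hypothetical stand-in for the real round-trip check. It
// builds a pseudo-random input from the seed and reports whether it "passes".
// The rule (multiples of 7 fail) is arbitrary and only here for illustration.
func checkSeed(seed int64) bool {
	r := rand.New(rand.NewSource(seed))
	return r.Intn(1000)%7 != 0
}

// failingSeeds runs check once per seed and returns every seed that failed,
// so each failure can be replayed exactly and promoted to a regression test.
func failingSeeds(seeds []int64, check func(int64) bool) []int64 {
	var failed []int64
	for _, s := range seeds {
		if !check(s) {
			failed = append(failed, s)
		}
	}
	return failed
}

func main() {
	// Regression corpus: the three seeds pinned by the deterministic tests.
	regression := []int64{1646770841, 1646772024, 1646791953}

	// Exploration: fresh seeds on every run, each one reported on failure.
	base := time.Now().Unix()
	explore := make([]int64, 100)
	for i := range explore {
		explore[i] = base + int64(i)
	}

	fmt.Println("regression failures:", failingSeeds(regression, checkSeed))
	fmt.Println("new failing seeds:", failingSeeds(explore, checkSeed))
}
```

Because every failing seed is printed, a loop of 100 iterations stays reproducible, which addresses the objection to fuzzing in a loop without logging.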

liggitt · 2022-03-10T14:21:49Z

Had a few comments:

zero-value handling for scalars (might be a limitation of gnostic, which is unfortunate... is that causing problems elsewhere?)
null vs empty handling of lists/maps
null pointer checks when ranging over lists of pointers
tracking/propagating whether unexpected data loss occurred

I didn't review the specific gnostic structs carefully to check all the fields were caught... I'm assuming fuzzing will catch simple field copies.

For the interface types we're doing type switches on, do you have the unit test coverage handy to see how well the tests are exercising all the cases?

alexzielenski · 2022-03-10T21:28:35Z

@liggitt With my latest updates I think I have addressed most of your comments:

Zero-handling: Unfortunately there is no way to differentiate 0 from unspecified here. I also looked into the protoreflect api, and even that has logic for treating 0 as not being present. A shame. I haven't seen mention of this causing problems for anyone else, but I also haven't looked very hard.
I've reviewed the creation of lists/maps to make the handling of empty lists/maps consistent: if gnostic has nil, kube-openapi will also use nil. And vice versa.
I've also reviewed pointer usage especially regarding ranges over lists/maps of pointers. Appropriate guards are now in place.
A new ok result has been added to track whether data loss was detected.
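As a rough illustration of the last three points (nil vs. empty preservation, nil guards when ranging over pointers, and an ok result), here is a minimal sketch. The `namedValue` type and `toMap` function are hypothetical and are not the code in gnostic.go; in particular, flagging a nil entry via ok=false is an assumption made for this example and may differ from what the PR counts as data loss:

```go
package main

import "fmt"

// namedValue is a hypothetical stand-in for a gnostic-style named entry.
type namedValue struct {
	Name  string
	Value string
}

// toMap converts a list of pointers into a map.
//   - A nil input yields a nil map; an empty input yields an empty map.
//   - Nil entries are skipped instead of dereferenced, and are reported
//     through ok=false (an assumption made for this sketch).
func toMap(in []*namedValue) (out map[string]string, ok bool) {
	ok = true
	if in == nil {
		return nil, ok
	}
	out = make(map[string]string, len(in))
	for _, e := range in {
		if e == nil {
			ok = false // nothing to copy; report rather than panic
			continue
		}
		out[e.Name] = e.Value
	}
	return out, ok
}

func main() {
	m, ok := toMap(nil)
	fmt.Println(m == nil, ok) // true true

	m, ok = toMap([]*namedValue{{Name: "a", Value: "1"}, nil})
	fmt.Println(m, ok) // map[a:1] false
}
```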

I gathered test coverage info for the gnostic tests. gnostic.go is currently 68.8% covered. I'm looking into increasing this. Upon a quick glance, at least not all of the type cases of Parameter are being exercised.

liggitt · 2022-03-10T21:54:27Z

Zero-handling: Unfortunately there is no way to differentiate 0 from unspecified here. I also looked into the protoreflect api, and even that has logic for treating 0 as not being present. A shame. I haven't seen mention of this causing problems for anyone else, but I also haven't looked very hard.

huh, that's a pretty sharp edge. I can easily imagine someone using maximum: 0. not sure what to do about that other than document it, and make sure authoritative validators don't use this method?

I've reviewed the creation of lists/maps to make the handling of empty lists/maps consistent: if gnostic has nil, kube-openapi will also use nil. And vice versa.

👍

I've also reviewed pointer usage especially regarding ranges over lists/maps of pointers. Appropriate guards are now in place.

👍

A new ok result has been added to track whether data loss was detected.

👍

I gathered test coverage info for the gnostic tests. gnostic.go is currently 68.8% covered. I'm looking into increasing this. Upon a quick glance, at least not all of the type cases of Parameter are being exercised.

👍

apelisse · 2022-03-14T16:53:23Z

huh, that's a pretty sharp edge. I can easily imagine someone using maximum: 0. not sure what to do about that other than document it, and make sure authoritative validators don't use this method?

Yeah, I think that's our only option right now. That's a little unfortunate but hopefully that'll be limited to OpenAPI v2.

alexzielenski · 2022-03-15T00:43:07Z

With my latest commit all supported gnostic types and significant LOC are now being exercised within the tests.

alexzielenski · 2022-03-15T00:44:05Z

/retest

k8s-ci-robot · 2022-03-15T00:44:18Z

@alexzielenski: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

liggitt · 2022-03-19T19:46:12Z

looking really solid, thanks. noted a few places we want to make sure we exercise plumbing with tests

liggitt

one way to improve coverage of tested code paths is to remove unused ones... I noted several places where we have no use of the ok data loss return indicator... trimming that to only places that are propagating errors from License/ContactInfo/ExternalDocumentation will help clarify where we could actually exercise those branches in tests

alexzielenski · 2022-03-29T21:37:40Z

/lgtm
/approve

liggitt

a couple last nits, then lgtm

also add benchmark for gnostic conversion with swagger

liggitt · 2022-04-01T20:55:23Z

/lgtm
/approve

apelisse · 2022-04-01T21:22:50Z

/approve

Thanks, that's awesome! Can't wait to see the opposite direction 😂

k8s-ci-robot · 2022-04-01T21:23:17Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alexzielenski, apelisse, liggitt

Needs approval from an approver in each of these files:

~~OWNERS~~ [apelisse]
~~pkg/validation/OWNERS~~ [liggitt]


k8s-ci-robot requested a review from apelisse March 9, 2022 18:46

k8s-ci-robot added the cncf-cla: yes and size/XXL labels Mar 9, 2022

alexzielenski mentioned this pull request Mar 9, 2022

yaml unmarshal for OpenAPIv2 types #279

Closed

alexzielenski changed the title ~~Add FromGnostic to spec.* types to convert gnostic openapiv2 Document to spec.Swagger~~ Add direct conversion from Gnostic v2 types to spec.Swagger Mar 9, 2022

alexzielenski force-pushed the gnostic_conversion branch from 8d2f9d0 to 0e8d80a Compare March 9, 2022 19:09

alexzielenski force-pushed the gnostic_conversion branch 2 times, most recently from f98efd5 to f9d96bf Compare March 9, 2022 21:36

apelisse reviewed Mar 10, 2022
