perf: limit initSchema calls from openapi.IsNamespaceScoped #5076
Add a benchmark test for IsNamespaceScoped performance when the default schema is in use.
Avoid calling initSchema from openapi.IsNamespaceScoped when possible. Work done in kubernetes-sigs#4152 introduced a precomputed namespace scope map based on the default built-in schema. This commit extends that work by avoiding calls to initSchema when a resource is not found in the precomputed map and the default built-in schema is in use. In those cases, there is no benefit to calling initSchema since the precomputed map is exactly what will be calculated by parsing the default built-in schema.
This PR has multiple commits, and the default merge method is: merge. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Hi @ephesused. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
LGTM. Just one nit
@@ -5,6 +5,7 @@ package openapi

import (
	"path/filepath"
	"sigs.k8s.io/kustomize/kyaml/yaml"
nit: move this import down with the second group of imports
Done - thanks for the catch.
/ok-to-test
It turns out there's more complexity in the processing, and I want to take a little time to think about this one before coding further. Thanks to the test coverage - particularly api/krusty's TestCustomOpenApiFieldFromComponentWithOverlays - it's clear this optimization introduces a bug. What's not clear is how to retain the current behavior and provide the optimization.
At first, it would seem simple - just delay the initSchema loads until the point where the schema truly is needed. The problem is that by the time the schema truly is needed, the earlier state has been lost. In effect, the question centers around what the initSchema calls would have done for earlier calls to NewGvk when the initSchema invocations were skipped. Reconstructing that is certainly possible, but it's more complex than I had thought it would be.
Explanation...
For the code at master, the TestCustomOpenApiFieldFromComponentWithOverlays flow does something like so:
1. (api/krusty/kustomizer.go) Kustomizer.Run calls openapi.SetSchema
2. Later, during accumulation: (api/resmap/factory.go) Factory.FromFile chains down to...
3. Later, still during accumulation: (api/internal/target/kusttarget.go) KustTarget.accumulateDirectory calls openapi.SetSchema based on the defined openapi path
4. And a little later: (kyaml/resid/gvk.go) NewGvk leads to...
The result is that the stored data now has both the default and the defined openapi path.
The change I proposed eliminates the initSchema call from certain NewGvk calls. As a result, the default schema might not load. That absence leads to the testcase failure. Currently, NewGvk has a side effect of loading the default schema when the Gvk is not known. It turns out that the default schema load is a requirement for certain cases - and TestCustomOpenApiFieldFromComponentWithOverlays is one of them.
I believe the reason TestCustomOpenApiFieldFromComponentWithOverlays is affected is that com.github.openshift.api.apps.v1.DeploymentConfigSpec is defined via a reference to io.k8s.api.core.v1.PodTemplateSpec. In the absence of the default schema load, the reference is not present. In turn, that affects the results of the patch.
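To illustrate why a missing default schema load can matter, here is a minimal sketch of reference resolution across schema definitions. It is not kustomize's code: the definition type, the resolve function, and the flattened layout (a whole definition pointing at another, rather than a field carrying the reference) are assumptions made for illustration. Only the two definition names come from the discussion above.

```go
package main

import "fmt"

// definition is a hypothetical, heavily simplified stand-in for an OpenAPI
// schema definition: it either refers to another definition or lists fields.
type definition struct {
	ref    string   // name of the referenced definition, if any
	fields []string // field names defined directly
}

// resolve follows ref links until it reaches a definition without a ref.
// It fails when a referenced definition has not been loaded.
func resolve(defs map[string]definition, name string) ([]string, error) {
	seen := map[string]bool{}
	for {
		if seen[name] {
			return nil, fmt.Errorf("reference cycle at %q", name)
		}
		seen[name] = true
		d, ok := defs[name]
		if !ok {
			return nil, fmt.Errorf("definition %q is not loaded", name)
		}
		if d.ref == "" {
			return d.fields, nil
		}
		name = d.ref
	}
}

func main() {
	const custom = "com.github.openshift.api.apps.v1.DeploymentConfigSpec"
	const builtIn = "io.k8s.api.core.v1.PodTemplateSpec"

	// Only the custom schema is loaded: the reference dangles.
	defs := map[string]definition{custom: {ref: builtIn}}
	_, err := resolve(defs, custom)
	fmt.Println("custom only:", err)

	// Once the default definitions are also present, resolution succeeds.
	defs[builtIn] = definition{fields: []string{"metadata", "spec"}}
	fields, err := resolve(defs, custom)
	fmt.Println("custom + default:", fields, err)
}
```

In this sketch the first lookup fails because the referenced definition was never loaded, which mirrors the situation described above where skipping the default schema load changes the patch result.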
When namespace scope can be determined by the precomputed map but the type is not present in the precomputed map, delay the parsing of the default built-in schema. If the schema to be initialized is the default built-in schema and the type is not in the precomputed map, then the type will not be found in the default built-in schema. There is no need to parse the default built-in schema for that answer; its parsing may be delayed until it is needed for some other purpose. In cases where the schema is used solely for namespace scope checks, the schema might not ever be parsed. Skipping the parsing reduces both execution time and memory use.
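The idea in this commit message can be sketched roughly as follows. Everything here is a hypothetical stand-in rather than the actual kustomize implementation: schemaStore, precomputedScope, the string keys, and the parseCount counter exist only to show that a miss in the precomputed map need not trigger a parse while the default built-in schema is in use. The sketch deliberately ignores the ordering problem discussed earlier in the thread.

```go
package main

import "fmt"

// precomputedScope mimics a map generated ahead of time from the default
// built-in schema (hypothetical keys; true means namespace scoped).
var precomputedScope = map[string]bool{
	"v1/Pod":       true,
	"v1/Namespace": false,
}

// schemaStore is a hypothetical stand-in for the package-level schema state.
type schemaStore struct {
	customSchemaSet bool            // true once a non-default schema is supplied
	parsed          bool            // whether the expensive parse has run
	parseCount      int             // number of parses, for demonstration only
	scopeFromSchema map[string]bool // populated by parse
}

// parse simulates the expensive schema parse; it runs at most once.
func (s *schemaStore) parse() {
	if s.parsed {
		return
	}
	s.parsed = true
	s.parseCount++
	s.scopeFromSchema = map[string]bool{}
	for k, v := range precomputedScope {
		s.scopeFromSchema[k] = v
	}
	if s.customSchemaSet {
		// Pretend the custom schema defines one extra namespaced type.
		s.scopeFromSchema["example.com/v1/Widget"] = true
	}
}

// isNamespaceScoped reports (scoped, known) for the given type key.
func (s *schemaStore) isNamespaceScoped(key string) (bool, bool) {
	if scoped, ok := precomputedScope[key]; ok {
		return scoped, true
	}
	if !s.customSchemaSet {
		// Default schema in use: parsing could only reproduce the
		// precomputed map, so the answer is already "unknown".
		return false, false
	}
	s.parse()
	scoped, ok := s.scopeFromSchema[key]
	return scoped, ok
}

func main() {
	def := &schemaStore{}
	_, known := def.isNamespaceScoped("example.com/v1/Widget")
	fmt.Println("default schema: known =", known, "parses =", def.parseCount)

	custom := &schemaStore{customSchemaSet: true}
	_, known = custom.isNamespaceScoped("example.com/v1/Widget")
	fmt.Println("custom schema: known =", known, "parses =", custom.parseCount)
}
```

With the default schema, the unknown type is answered without any parse; with a custom schema, the parse still happens because the precomputed map cannot speak for it.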
For my previous attempt, the only schema whose parsing was affected was the default built-in schema. That means I don't have to address a general solution - I can craft a simple one targeted at the one affected schema. That's addressed in the latest commit.
/lgtm
@ephesused I'm not sure why the tests aren't triggering (and I can't manually trigger them) - could you try pushing a dummy commit and see if it runs the tests?
I simply merged to make this PR current. I see the workflows waiting to run, pending approval from a maintainer. Thanks!
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: ephesused, natasha41575 The full list of commands accepted by this bot can be found here. The pull request process is described here
Looks like a test failure with Happy to re-approve once that is fixed.
Thanks - I'll try to take a look at that later today.
openapiData initializes with defaultBuiltInSchemaParseStatus set to 0, so schemaNotParsed should have 0 as its value.
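The point about the zero value can be shown with a small sketch. The names openapiData, defaultBuiltInSchemaParseStatus, and schemaNotParsed come from the commit message above; the parseStatus type and the other two constants are assumptions added for illustration and may not match the real code.

```go
package main

import "fmt"

// parseStatus is a hypothetical type name for the status field.
type parseStatus uint16

const (
	// schemaNotParsed is listed first so iota gives it the value 0, which
	// is what an uninitialized struct field holds.
	schemaNotParsed parseStatus = iota
	schemaParseDelayed // hypothetical name
	schemaParsed       // hypothetical name
)

type openapiData struct {
	defaultBuiltInSchemaParseStatus parseStatus
}

func main() {
	var data openapiData // zero-valued, as at package initialization
	fmt.Println(data.defaultBuiltInSchemaParseStatus == schemaNotParsed) // prints "true"
}
```

If schemaNotParsed were given any non-zero value, a freshly initialized openapiData would start in one of the other states, which is the kind of mismatch this commit corrects.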
That was a bit embarrassing - I don't know how I managed to mess that up four months ago. @natasha41575, I believe the last commit should resolve the issue. I'm never able to run all the tests cleanly in my environment, but the ones I checked look good. For example:
/lgtm
Avoid calling initSchema from openapi.IsNamespaceScoped when possible. Work done in #4152 introduced a precomputed namespace scope map based on the default built-in schema. This commit extends that work by avoiding calls to initSchema when a resource is not found in the precomputed map and the default built-in schema is in use. In those cases, there is no benefit to calling initSchema since the precomputed map is exactly what will be calculated by parsing the default built-in schema.
This PR resolves some of the concern in #4569, but the PR does not fix the issue completely. Specifically, this PR does not address this earlier comment from @KnVerey:
The PR is split into two commits, one for a new benchmark and one for the performance improvement. Using the benchmark for analysis...
Before:
After: