Precompute IsNamespaceScoped to avoid expensive schema reads #4152
See kptdev/kpt#2469. For the `gcr.io/kpt-fn/set-namespace:v0.1` function, over 50% of CPU time is spent in IsNamespaceScoped. Rather than unmarshalling 100k lines of JSON to determine this, precompute the result. The precomputed result cannot silently become inaccurate, because the test verifies that it is up to date. In real-world kpt pipelines this cuts execution of set-namespace (just one example of a trivial function; similar functions benefit too) from 2.0s to 1.0s. Because these functions run in long pipelines over many resources, the savings add up.
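The approach described above can be sketched as a map lookup that falls back to the expensive path only on a miss. This is a minimal illustration, not the code from this PR: `TypeMeta` is a simplified stand-in for `yaml.TypeMeta`, and `slowIsNamespaceScoped`, `schemaParses`, and the `example.com/v1` Widget type are hypothetical names used to simulate schema parsing.

```go
package main

import "fmt"

// TypeMeta is a simplified stand-in for kyaml's yaml.TypeMeta (assumption).
type TypeMeta struct {
	APIVersion string
	Kind       string
}

// precomputed holds answers known ahead of time. Illustrative entries only.
var precomputed = map[TypeMeta]bool{
	{APIVersion: "v1", Kind: "Pod"}:       true,
	{APIVersion: "v1", Kind: "Namespace"}: false,
}

// schemaParses counts how often the simulated slow path runs.
var schemaParses int

// slowIsNamespaceScoped is a hypothetical stand-in for parsing the full
// OpenAPI document. It pretends the schema knows one extra custom type.
func slowIsNamespaceScoped(t TypeMeta) (scoped, found bool) {
	schemaParses++
	if t == (TypeMeta{APIVersion: "example.com/v1", Kind: "Widget"}) {
		return true, true
	}
	return false, false
}

// IsNamespaceScoped consults the precomputed map first and only falls
// back to the slow path when the type is not listed there.
func IsNamespaceScoped(t TypeMeta) (scoped, found bool) {
	if scoped, ok := precomputed[t]; ok {
		return scoped, true
	}
	return slowIsNamespaceScoped(t)
}

func main() {
	for _, t := range []TypeMeta{
		{APIVersion: "v1", Kind: "Pod"},
		{APIVersion: "example.com/v1", Kind: "Widget"},
	} {
		scoped, found := IsNamespaceScoped(t)
		fmt.Println(t.Kind, scoped, found, "slow-path calls so far:", schemaParses)
	}
}
```

Known built-in types never touch the slow path here, which is where the reported saving would come from.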
Hi @howardjohn. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test` on its own line. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/ok-to-test
@@ -50,6 +50,93 @@ type openapiData struct {
	schemaInit bool
}

// precomputedIsNamespaceScoped precomputes IsNamespaceScoped for known types. This avoids Schema creation,
// which is expensive
var precomputedIsNamespaceScoped = map[yaml.TypeMeta]bool{
How was this computed? Did you have a script that parses the OpenAPI document?
Suggestion: if you had a script to do this, add its contents to `kyaml/openapi/scripts/makeOpenApiInfoDotGo.sh`, so that `makeOpenApiInfoDotGo.sh` adds the var `precomputedIsNamespaceScoped` to `kyaml/openapi/kubernetesapi/openapiinfo.go` automatically. That way, we can recompute this variable every time we update the OpenAPI data.
The test output shows the exact map in Go syntax, so I just copied and pasted it from the failure. If it changes, the test will output the diff that needs to be applied.
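The workflow described in this reply could look roughly like the sketch below: a check compares the freshly computed map with the checked-in one and, on mismatch, renders the computed map as a Go literal ready to paste. This is an assumption-laden illustration, not the test from this PR; `TypeMeta`, `renderGoMap`, and `staleMessage` are hypothetical names.

```go
package main

import (
	"fmt"
	"reflect"
	"sort"
	"strings"
)

// TypeMeta is a simplified stand-in for kyaml's yaml.TypeMeta (assumption).
type TypeMeta struct{ APIVersion, Kind string }

// renderGoMap prints the map as a Go literal with deterministic ordering,
// so the output can be pasted over the stale precomputed variable.
func renderGoMap(m map[TypeMeta]bool) string {
	keys := make([]TypeMeta, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Slice(keys, func(i, j int) bool {
		if keys[i].APIVersion != keys[j].APIVersion {
			return keys[i].APIVersion < keys[j].APIVersion
		}
		return keys[i].Kind < keys[j].Kind
	})
	var b strings.Builder
	b.WriteString("map[TypeMeta]bool{\n")
	for _, k := range keys {
		fmt.Fprintf(&b, "\t{APIVersion: %q, Kind: %q}: %t,\n", k.APIVersion, k.Kind, m[k])
	}
	b.WriteString("}")
	return b.String()
}

// staleMessage returns "" when the precomputed map matches what the schema
// yields, and otherwise a message containing the replacement literal.
func staleMessage(computed, precomputed map[TypeMeta]bool) string {
	if reflect.DeepEqual(computed, precomputed) {
		return ""
	}
	return "precomputed map is stale; replace it with:\n" + renderGoMap(computed)
}

func main() {
	computed := map[TypeMeta]bool{
		{APIVersion: "v1", Kind: "Pod"}:       true,
		{APIVersion: "v1", Kind: "Namespace"}: false,
	}
	checkedIn := map[TypeMeta]bool{
		{APIVersion: "v1", Kind: "Pod"}: true,
	}
	fmt.Println(staleMessage(computed, checkedIn))
}
```

In a real test the message would be passed to `t.Fatal`, so a schema update fails loudly and tells the maintainer exactly what to paste.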
Ack, thanks for the explanation.
> The test output shows the exact map in Go syntax, so I just copied and pasted it from the failure. If it changes, the test will output the diff that needs to be applied.

Can you please add this as a comment above the test, and also add instructions for how to update the var to https://github.com/kubernetes-sigs/kustomize/blob/master/kyaml/openapi/README.md, perhaps as a new subsection under "Update the built-in schema to a new version"?
Adding as a note that this will also resolve #4100.
@howardjohn Please let me know if you have time to add the requested documentation. I'd love to get this in :)
@howardjohn: This PR has multiple commits, and the default merge method is: merge. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Should be good to go now
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: howardjohn, natasha41575. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Avoid calling initSchema from openapi.IsNamespaceScoped when possible. Work done in kubernetes-sigs#4152 introduced a precomputed namespace scope map based on the default built-in schema. This commit extends that work by avoiding calls to initSchema when a resource is not found in the precomputed map and the default built-in schema is in use. In those cases, there is no benefit to calling initSchema since the precomputed map is exactly what will be calculated by parsing the default built-in schema.
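The reasoning in this commit message can be sketched as follows: when the default built-in schema is in use, a miss in the precomputed map is already the final answer, so parsing can be skipped. This is a simplified model under stated assumptions, not the kustomize implementation; `schemaState`, `customSchema`, and `parseCount` are hypothetical, and `initSchema` here only simulates the expensive parse.

```go
package main

import "fmt"

// TypeMeta is a simplified stand-in for kyaml's yaml.TypeMeta (assumption).
type TypeMeta struct{ APIVersion, Kind string }

// precomputed mirrors what parsing the default built-in schema would yield.
var precomputed = map[TypeMeta]bool{
	{APIVersion: "v1", Kind: "Pod"}: true,
}

// schemaState is a hypothetical model of the global OpenAPI state.
type schemaState struct {
	customSchema map[TypeMeta]bool // nil means the default built-in schema is in use
	parsed       map[TypeMeta]bool
	parseCount   int // how many times the (simulated) expensive parse ran
}

// initSchema simulates the expensive schema parse; it runs at most once.
func (s *schemaState) initSchema() {
	if s.parsed != nil {
		return
	}
	s.parseCount++
	source := precomputed
	if s.customSchema != nil {
		source = s.customSchema
	}
	s.parsed = make(map[TypeMeta]bool, len(source))
	for k, v := range source {
		s.parsed[k] = v
	}
}

// IsNamespaceScoped skips initSchema whenever parsing could not change
// the answer: a precomputed hit, or a miss under the default schema.
func (s *schemaState) IsNamespaceScoped(t TypeMeta) (scoped, found bool) {
	if scoped, ok := precomputed[t]; ok {
		return scoped, true
	}
	if s.customSchema == nil {
		return false, false // the default schema would not contain t either
	}
	s.initSchema()
	scoped, found = s.parsed[t]
	return scoped, found
}

func main() {
	widget := TypeMeta{APIVersion: "example.com/v1", Kind: "Widget"}

	def := &schemaState{}
	_, found := def.IsNamespaceScoped(widget)
	fmt.Println("default schema:", found, "parses:", def.parseCount)

	custom := &schemaState{customSchema: map[TypeMeta]bool{widget: true}}
	_, found = custom.IsNamespaceScoped(widget)
	fmt.Println("custom schema:", found, "parses:", custom.parseCount)
}
```

Only the custom-schema case pays for a parse in this model, which matches the stated goal of parsing the default schema only when it is needed for some other purpose.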
* test: add openapi.IsNamespaceScoped benchmark

  Add a benchmark test for IsNamespaceScoped performance when the default schema is in use.

* perf: limit initSchema calls from openapi.IsNamespaceScoped

  Avoid calling initSchema from openapi.IsNamespaceScoped when possible. Work done in #4152 introduced a precomputed namespace scope map based on the default built-in schema. This commit extends that work by avoiding calls to initSchema when a resource is not found in the precomputed map and the default built-in schema is in use. In those cases, there is no benefit to calling initSchema since the precomputed map is exactly what will be calculated by parsing the default built-in schema.

* fix: delay parsing of default built-in schema

  When namespace scope can be determined by the precomputed map but the type is not present in the precomputed map, delay the parsing of the default built-in schema. If the schema to be initialized is the default built-in schema and the type is not in the precomputed map, then the type will not be found in the default built-in schema. There is no need to parse the default built-in schema for that answer; its parsing may be delayed until it is needed for some other purpose. In cases where the schema is used solely for namespace scope checks, the schema might not ever be parsed. Skipping the parsing reduces both execution time and memory use.

* fix: correct openapi.go's schemaNotParsed value

  openapiData initializes with defaultBuiltInSchemaParseStatus set to 0, so schemaNotParsed should have 0 as its value.
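The last fix in the list above relies on Go's zero-value rule: an integer field in a freshly created struct is 0, so the "not parsed" state must be the constant with value 0. A minimal sketch, assuming an `iota`-based enumeration: `schemaNotParsed`, `openapiData`, and `defaultBuiltInSchemaParseStatus` are named in the commit message, while `parseStatus`, `schemaParseDelayed`, and `schemaParsed` are hypothetical here.

```go
package main

import "fmt"

// parseStatus is a hypothetical type name for the parse-state enumeration.
type parseStatus int

const (
	// schemaNotParsed must be 0 so that it equals the zero value of the
	// status field in a newly created openapiData.
	schemaNotParsed parseStatus = iota
	schemaParseDelayed // assumed intermediate state
	schemaParsed       // assumed final state
)

// openapiData is reduced to the one field relevant to this sketch.
type openapiData struct {
	defaultBuiltInSchemaParseStatus parseStatus
}

func main() {
	var d openapiData // no explicit initialization
	fmt.Println(d.defaultBuiltInSchemaParseStatus == schemaNotParsed) // prints "true"
}
```

If `schemaNotParsed` had any other value, a fresh `openapiData` would start in whichever state happened to be 0, which is the bug the commit describes.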
See kptdev/kpt#2469

For the gcr.io/kpt-fn/set-namespace:v0.1 function, over 50% of CPU time is spent in IsNamespaceScoped. Rather than unmarshalling 100k lines of JSON to determine this, precompute the result. The precomputed result cannot silently become inaccurate, because the test verifies that it is up to date.

In real-world kpt pipelines this cuts execution of set-namespace (just one example of a trivial function; similar functions benefit too) from 2.0s to 1.0s. Because these functions run in long pipelines over many resources, the savings add up.
Profile before: