Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Precompute IsNamespaceScoped to avoid expensive schema reads #4152

Merged

Conversation

howardjohn
Copy link
Contributor

@howardjohn howardjohn commented Aug 29, 2021

See kptdev/kpt#2469

For the gcr.io/kpt-fn/set-namespace:v0.1 function, over 50% of CPU
time is spent on IsNamespaceScoped. Instead of unmarshalling 100k lines
of JSON to determine this, instead just precompute it. We can ensure
this never is inaccurate as the test verifies the precomputed result is
up to date.

In real world kpt pipelines this cuts execution of set-namespace (and
similar functions, just an example of a trivial function) from 2.0s to
1.0s. Because these functions are run in long pipelines over many
resources, this adds up a lot.

Profile before:
2021-29-08_16-05-25

See kptdev/kpt#2469

For the `gcr.io/kpt-fn/set-namespace:v0.1` function, over 50% of CPU
time is spent on IsNamespaceScoped. Instead of unmarshalling 100k lines
of JSON to determine this, instead just precompute it. We can ensure
this never is inaccurate as the test verifies the precomputed result is
up to date.

In real world kpt pipelines this cuts execution of set-namespace (and
similar functions, just an example of a trivial function) from 2.0s to
1.0s. Because these functions are run in long pipelines over many
resources, this adds up a lot.
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Aug 29, 2021
@k8s-ci-robot
Copy link
Contributor

Hi @howardjohn. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Aug 29, 2021
@natasha41575
Copy link
Contributor

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Aug 30, 2021
@@ -50,6 +50,93 @@ type openapiData struct {
schemaInit bool
}

// precomputedIsNamespaceScoped precomputes IsNamespaceScoped for known types. This avoids Schema creation,
// which is expensive
var precomputedIsNamespaceScoped = map[yaml.TypeMeta]bool{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how was this computed? did you have a script that parses the openapi document?

Copy link
Contributor

@natasha41575 natasha41575 Aug 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: if you had a script to do this, add the contents of the script to to kyaml/openapi/scripts/makeOpenApiInfoDotGo.sh, so that makeOpenApiInfoDotGo.sh adds the var precomputedIsNamespaceScoped to kyaml/openapi/kubernetesapi/openapiinfo.go, automatically. That way, we can recompute this variable every time we update the openapi data.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test output shows the exact map in go syntax, so I just copy and pasted from the failure. If it changes, the test will output the diff that needs to be updated

Copy link
Contributor

@natasha41575 natasha41575 Aug 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack, thanks for the explanation.

The test output shows the exact map in go syntax, so I just copy and pasted from the failure. If it changes, the test will output the diff that needs to be updated

Can you please add this as a comment above the test, and also instructions for how to update the var in https://github.com/kubernetes-sigs/kustomize/blob/master/kyaml/openapi/README.md, perhaps a new subsection under Update the built-in schema to a new version?

@natasha41575
Copy link
Contributor

Adding as a note that this will also resolve #4100.

@natasha41575
Copy link
Contributor

@howardjohn Please let me know if you have time to add the requested documentation. I'd love to get this in :)

@k8s-ci-robot
Copy link
Contributor

@howardjohn: This PR has multiple commits, and the default merge method is: merge.
You can request commits to be squashed using the label: tide/merge-method-squash

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@howardjohn
Copy link
Contributor Author

@howardjohn Please let me know if you have time to add the requested documentation. I'd love to get this in :)

Should be good to go now

@natasha41575 natasha41575 added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Sep 7, 2021
@natasha41575
Copy link
Contributor

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 9, 2021
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: howardjohn, natasha41575

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 9, 2021
@k8s-ci-robot k8s-ci-robot merged commit 402f6ca into kubernetes-sigs:master Sep 9, 2021
ephesused added a commit to ephesused/kustomize that referenced this pull request Mar 6, 2023
Avoid calling initSchema from openapi.IsNamespaceScoped when possible.
Work done in kubernetes-sigs#4152 introduced a precomputed namespace scope map based on
the default built-in schema. This commit extends that work by avoiding
calls to initSchema when a resource is not found in the precomputed map
and the default built-in schema is in use. In those cases, there is no
benefit to calling initSchema since the precomputed map is exactly what
will be calculated by parsing the default built-in schema.
k8s-ci-robot pushed a commit that referenced this pull request Sep 8, 2023
* test: add openapi.IsNamespaceScoped benchmark

Add a benchmark test for IsNamespaceScoped performance when the default
schema is in use.

* perf: limit initSchema calls from openapi.IsNamespaceScoped

Avoid calling initSchema from openapi.IsNamespaceScoped when possible.
Work done in #4152 introduced a precomputed namespace scope map based on
the default built-in schema. This commit extends that work by avoiding
calls to initSchema when a resource is not found in the precomputed map
and the default built-in schema is in use. In those cases, there is no
benefit to calling initSchema since the precomputed map is exactly what
will be calculated by parsing the default built-in schema.

* fix: delay parsing of default built-in schema

When namespace scope can be determined by the precomputed map but the
type is not present in the precomputed map, delay the parsing of the
default built-in schema.

If the schema to be initialized is the default built-in schema and the
type is not in the precomputed map, then the type will not be found in
the default built-in schema. There is no need to parse the default
built-in schema for that answer; its parsing may be delayed until it
is needed for some other purpose.

In cases where the schema is used solely for namespace scope checks, the
schema might not ever be parsed. Skipping the parsing reduces both
execution time and memory use.

* fix: correct openapi.go's schemaNotParsed value

openapiData initializes with defaultBuiltInSchemaParseStatus set to 0,
so schemaNotParsed should have 0 as its value.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants