perf: Add full program benchmark for kustomize build #5425

chlunde · 2023-10-30T11:37:59Z

This change introduces a benchmarking test that constructs a complete kustomization tree using various features of Kustomize.

This update aims to address several objectives:

* Demonstrating current performance challenges in Kustomize in a reproducible manner.
* Evaluating the effects of performance enhancements.
* Guarding against potential performance setbacks and inadvertent quadratic behavior in the future.
* Considering the possibility of incorporating profile-guided optimization (PGO) in future iterations.

Usage:

go test -run=x -bench=BenchmarkBuild ./kustomize/commands/build

# sigs.k8s.io/kustomize/kustomize/v5/commands/build.test
pkg: sigs.k8s.io/kustomize/kustomize/v5/commands/build
BenchmarkBuild-8   	       1	8523677542 ns/op
PASS
ok  	sigs.k8s.io/kustomize/kustomize/v5/commands/build	8.798s

Currently, this benchmark requires 3000 seconds to run on my machine. In order to run it on master today, you need to add -timeout=30m to the go test command.

The dataset size was chosen because I believe it represents a real workload for which we could get a runtime of less than 10 seconds.

Updates #5084

Notes on PGO:

Real-life profiles would be better, but creating one based on a benchmark should not hurt:

https://go.dev/doc/pgo#collecting-profiles

> Will PGO with an unrepresentative profile make my program slower than no PGO?
> It should not. While a profile that is not representative of production behavior will result in optimizations in cold parts of the application, it should not make hot parts of the application slower. If you encounter a program where PGO results in worse performance than disabling PGO, please file an issue at https://go.dev/issue/new.

Collecting a profile:

go test -cpuprofile cpu1.pprof -run=^$ -bench ^BenchmarkBuild$ sigs.k8s.io/kustomize/kustomize/v5/commands/build

go build -pgo=./cpu1.pprof  -o kust-pgo ./kustomize
go build -o kust-nopgo ./kustomize

Compare PGO and non-PGO-builds:

./kust-pgo build -o /dev/null testdata/  21.88s user 2.00s system 176% cpu 13.505 total
./kust-nopgo build -o /dev/null testdata/  22.76s user 1.98s system 174% cpu 14.170 total
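
To put the two timings above in relative terms, here is a small sketch of the arithmetic. The function name `relativeSaving` is made up for illustration. The inputs are the wall-clock and user-CPU figures from the two lines above, each from a single run, so the result should be read as indicative rather than statistically solid.

```go
package main

import "fmt"

// relativeSaving returns the fraction by which the PGO build's time is
// lower than the baseline (non-PGO) time. Hypothetical helper name.
func relativeSaving(baseline, pgo float64) float64 {
	return (baseline - pgo) / baseline
}

func main() {
	// Figures copied from the two timing lines above (seconds).
	fmt.Printf("wall clock saving: %.1f%%\n", 100*relativeSaving(14.170, 13.505))
	fmt.Printf("user CPU saving:   %.1f%%\n", 100*relativeSaving(22.76, 21.88))
}
```

With these inputs the saving comes out at roughly 4.7% wall clock and 3.9% user CPU.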

k8s-ci-robot · 2023-10-30T11:38:09Z

Hi @chlunde. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

shapirus · 2023-10-30T13:13:27Z

One important aspect that I'd like to point out here is that the benchmarks should cover the following cases:

* single invocation of kustomize to process a big and complex resource tree
* multiple invocations of kustomize to process a small set of manifests on each run (simulates e.g. argocd handling many microservices, or a validation CI job running on the repository hosting a large set of manifests)

These cases are very different in terms of the performance impact that specific parts of code create, yet each of them is quite valid as far as practical usage is concerned.

natasha41575 · 2023-10-30T15:14:04Z

/ok-to-test
/triage accepted

chlunde · 2023-10-30T21:06:56Z

In hindsight, it makes no sense to merge this with the current performance.
Either we wait for a few of them to be merged or we adjust some of the sizing parameters for reasonable numbers (runtime <10m in CI).

I was a bit surprised at how slow the baseline was, as I had quite a few performance patches applied locally at the time I wrote the benchmark.

@shapirus yeah, we should have some benchmarks emulating argo/flux too. Do you think this affects how this PR is structured?
I think we should look at the API surface used by argo/flux so we use the right entry points, and we should probably avoid using the file system, so I think it would be a different benchmark.

kustomize/commands/build/build_benchmark_test.go

ncapps · 2023-10-30T21:59:49Z

> In hindsight, it makes no sense to merge this with the current performance.
> Either we wait for a few of them to be merged or we adjust some of the sizing parameters for reasonable numbers (runtime <10m in CI).

I agree that it would be beneficial to limit the duration and scope of these performance tests. I really like the suggestion from @shapirus. Ideally, we can run meaningful benchmark tests as a PR check. This would help us understand if a specific commit significantly changed the performance.

shapirus · 2023-10-31T09:26:39Z

> @shapirus yeah, we should have some benchmarks emulating argo/flux too. Do you think this affects how this PR is structured?

@chlunde As far as I can see, it is basically a separate Go program that imports kustomize as a library and then runs tests in its own context. This only allows benchmarking the "single invocation, big resource tree" scenario, because the startup overhead is incurred only once (see #5422 (comment)).

To emulate argocd/fluxcd we need to, well, do what they do: invoke an actual kustomize binary (which, in the CI flow, should be already built by the time the benchmarks run, if I understand correctly) a certain number of times and measure the total run time.

My set of benchmarks at https://github.com/shapirus/kustomize-benchmark-suite (Dockerfile serves as a readme there) does exactly that. Probably some core code could be reused for the kustomize CI. Yes it's somewhat crude, but also small enough to be understood and reused.
If we're limited to Go (and e.g. can't use shell), that's no problem either: Go can spawn processes to run commands and measure their execution time just as well, and with cleaner code.
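
As a rough illustration of the "spawn the binary N times and measure the total" idea, here is a minimal Go sketch. It is not taken from the benchmark suite linked above. The function name `timeInvocations` and the command-line shape are assumptions made for this example, and it uses only the standard library.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strconv"
	"time"
)

// timeInvocations starts the given command n times, one fresh process per
// run, and returns the total wall-clock time. Process startup cost is paid
// on every run, which is the point of this scenario.
func timeInvocations(n int, name string, args ...string) (time.Duration, error) {
	start := time.Now()
	for i := 0; i < n; i++ {
		cmd := exec.Command(name, args...)
		// Stdout and Stderr stay nil, so os/exec sends output to the null device.
		if err := cmd.Run(); err != nil {
			return 0, fmt.Errorf("run %d: %w", i, err)
		}
	}
	return time.Since(start), nil
}

func main() {
	if len(os.Args) < 3 {
		fmt.Println("usage: bench <iterations> <command> [args...]")
		return
	}
	n, err := strconv.Atoi(os.Args[1])
	if err != nil || n < 1 {
		fmt.Fprintln(os.Stderr, "iteration count must be a positive integer")
		os.Exit(2)
	}
	total, err := timeInvocations(n, os.Args[2], os.Args[3:]...)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("%d runs in %s (%s per run)\n", n, total, total/time.Duration(n))
}
```

A hypothetical invocation would pass an iteration count followed by the kustomize command line to measure, for example `100 kustomize build <some small manifest directory>`.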

Unfortunately I lack the knowledge required to convert them into a proper CI workflow step for kustomize, but it should be easy, if not trivial, for those who don't.

> I think we should look at the API surface used by argo/flux so we use the right entry points

There is really no need to do this, I think. Can't tell about fluxcd, but argocd simply runs a standalone kustomize executable binary to build the resulting manifest for each Application that it has been requested for. This is easily simulated with a simple shell script. Let's try to avoid overengineering :).

> and we should probably avoid using the file system, so I think it would be a different benchmark.

In an ideal world, yes, but in practice it doesn't make any discernible difference unless the underlying block device is very slow and/or has high latency. On the other hand, avoiding the file system adds extra complications.

To reduce the effects of the disk access overhead sufficiently for our specific practical use case, we can run find ./manifests-tree -type f -exec cat {} > /dev/null \; before running the benchmark to pre-heat page cache by reading all the files we're going to need.

My bigger concern is not the file system, but the shared CPU on the machine where the benchmark runs. If there are other CPU-intensive jobs running on the same machine, they can affect the benchmark execution times quite significantly. There are possible workarounds however:

* run the benchmark process with nice -19 (not always possible, requires privileges)
* use a sufficiently high number of iterations to make the benchmark run at least a few tens of seconds to reduce the effects of small lags and fluctuations (i.e. you can't reliably benchmark when your test takes, say, a second or less, unless you have hardware dedicated to this test alone)
* run the same benchmark several times and use the results of the one that completed in the shortest time (aka had most of the CPU available to itself alone)
* use a sufficiently high threshold to mark the test failed upon detection of a regression against a previous version (say > 40%) -- however, this makes it necessary to test not only against that one version, but several previous versions to detect the cumulative effect of small incremental regressions, and implementing this logic properly to detect regressions reliably and without false positives may be non-trivial. There are probably existing tools, someone with a QA background might give more ideas.

kustomize/commands/build/build_benchmark_test.go

ephesused · 2023-11-02T19:14:23Z

In poking around at this benchmark, I did notice some behavior that surprised me. I adjusted the value of genconfig[1].resources and the run time jump between 1 and 2 stands out dramatically. @chlunde (or others), do you see the same behavior?

Admittedly, I haven't taken the time to understand the test case well.

$ DEPTH_ONE_RESOURCE_COUNT=1 go test ./kustomize/commands/build -run nope -bench BenchmarkBuild
goos: windows
goarch: amd64                                         
pkg: sigs.k8s.io/kustomize/kustomize/v5/commands/build
cpu: Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz          
BenchmarkBuild-8        1000000000               0.4780 ns/op
--- BENCH: BenchmarkBuild-8                                           
    build_benchmark_test.go:162: genconfig[1].resources redefined as 1
PASS                                                                  
ok      sigs.k8s.io/kustomize/kustomize/v5/commands/build       12.829s

$ DEPTH_ONE_RESOURCE_COUNT=2 go test ./kustomize/commands/build -run nope -bench BenchmarkBuild
goos: windows
goarch: amd64                                         
pkg: sigs.k8s.io/kustomize/kustomize/v5/commands/build
cpu: Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz          
BenchmarkBuild-8               1        1639273100 ns/op
--- BENCH: BenchmarkBuild-8
    build_benchmark_test.go:162: genconfig[1].resources redefined as 2
PASS
ok      sigs.k8s.io/kustomize/kustomize/v5/commands/build       3.119s

$ DEPTH_ONE_RESOURCE_COUNT=3 go test ./kustomize/commands/build -run nope -bench BenchmarkBuild
goos: windows
goarch: amd64
pkg: sigs.k8s.io/kustomize/kustomize/v5/commands/build
cpu: Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz
BenchmarkBuild-8               1        2591509400 ns/op
--- BENCH: BenchmarkBuild-8
    build_benchmark_test.go:162: genconfig[1].resources redefined as 3
PASS
ok      sigs.k8s.io/kustomize/kustomize/v5/commands/build       3.841s

$ DEPTH_ONE_RESOURCE_COUNT=4 go test ./kustomize/commands/build -run nope -bench BenchmarkBuild
goos: windows
goarch: amd64
pkg: sigs.k8s.io/kustomize/kustomize/v5/commands/build
cpu: Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz
BenchmarkBuild-8               1        3785031200 ns/op
--- BENCH: BenchmarkBuild-8
    build_benchmark_test.go:162: genconfig[1].resources redefined as 4
PASS
ok      sigs.k8s.io/kustomize/kustomize/v5/commands/build       5.099s

$

k8s-triage-robot · 2024-02-01T15:52:22Z

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

Mark this PR as fresh with /remove-lifecycle stale
Close this PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

chlunde · 2024-02-01T21:07:47Z

/remove-lifecycle stale

This change introduces a benchmarking test that constructs a complete kustomization tree using various features of Kustomize.

This update aims to address several objectives:

* Demonstrating current performance challenges in Kustomize in a reproducible manner.
* Evaluating the effects of performance enhancements.
* Guarding against potential performance setbacks and inadvertent quadratic behavior in the future.
* Considering the possibility of incorporating profile-guided optimization (PGO) in future iterations.

Usage:

$ make run-benchmarks
go test -run=x -bench=BenchmarkBuild ./kustomize/commands/build

pkg: sigs.k8s.io/kustomize/kustomize/v5/commands/build
BenchmarkBuild-8 1 48035946042 ns/op
PASS
ok sigs.k8s.io/kustomize/kustomize/v5/commands/build 48.357s

*Currently*, this benchmark requires 48 seconds to run on my machine.

Updates kubernetes-sigs#5084

Notes on PGO:

Real-life profiles would be better, but creating one based on a benchmark should not hurt:

https://go.dev/doc/pgo#collecting-profiles

> Will PGO with an unrepresentative profile make my program slower than no PGO?
> It should not. While a profile that is not representative of production behavior will result in optimizations in cold parts of the application, it should not make hot parts of the application slower. If you encounter a program where PGO results in worse performance than disabling PGO, please file an issue at https://go.dev/issue/new.

Collecting a profile:

go test -cpuprofile cpu1.pprof -run=^$ -bench ^BenchmarkBuild$ sigs.k8s.io/kustomize/kustomize/v5/commands/build

go build -pgo=./cpu1.pprof -o kust-pgo ./kustomize
go build -o kust-nopgo ./kustomize

Compare PGO and non-PGO-builds:

./kust-pgo build -o /dev/null testdata/ 21.88s user 2.00s system 176% cpu 13.505 total
./kust-nopgo build -o /dev/null testdata/ 22.76s user 1.98s system 174% cpu 14.170 total

k8s-ci-robot · 2024-04-16T19:36:40Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: chlunde
Once this PR has been reviewed and has the lgtm label, please assign koba1t for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

chlunde · 2024-04-16T19:37:51Z

I've added a few more code comments as requested, rebased on master and updated the code to use labels instead of commonLabels.

I've also reduced the dataset here to run in less than one minute on my machine (48s).
On top of the PR #5427 it takes around 2.4 seconds. @ncapps is there anything else you would like?

stormqueen1990 · 2024-04-21T23:26:34Z

kustomize/commands/build/build_benchmark_test.go

+			fn := fmt.Sprintf("res%d", res)
+			fmt.Fprintf(&buf, " - %v\n", fn)


This is meant to be a list of resources to be rendered in the kustomization, is that correct?
Could this be missing a .yaml extension?
I'm also curious if you need the %v formatter here -- since these are all strings I imagine %s is enough.
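
On the formatter question, a quick standalone check shows that `%v` and `%s` render a string operand identically, so the choice between them does not affect the generated kustomization. The helper names `entryV` and `entryS` are made up for this sketch, and only the two format strings mirror the snippet under review.

```go
package main

import "fmt"

// entryV formats a resource entry with %v, as in the snippet under review.
func entryV(res int) string {
	fn := fmt.Sprintf("res%d", res)
	return fmt.Sprintf(" - %v\n", fn)
}

// entryS formats the same entry with %s.
func entryS(res int) string {
	fn := fmt.Sprintf("res%d", res)
	return fmt.Sprintf(" - %s\n", fn)
}

func main() {
	// For a string operand both verbs produce the same bytes.
	fmt.Printf("%q\n%q\n", entryV(3), entryS(3))
	fmt.Println("identical:", entryV(3) == entryS(3))
}
```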

stormqueen1990 · 2024-04-21T23:31:03Z

kustomize/commands/build/build_benchmark_test.go

+// as an index into the given configs slice.
+//
+// The function is recursive and will call itself for config as long as resources > 0.
+func makeKustomization(configs []GenConfig, fSys filesys.FileSystem, path, id string, depth int) error {


A couple of questions on this function:

Is this being built with string concatenation to avoid the overhead of YAML serialization?

Is there a reason why passing the *testing.B parameter here and marking it as a helper with b.Helper() would be undesirable?

k8s-triage-robot · 2024-07-21T00:18:45Z

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

Mark this PR as fresh with /remove-lifecycle stale
Close this PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

stormqueen1990 · 2024-07-31T21:53:01Z

Hi again, @chlunde!

Are you still interested in contributing these changes? I had a few open questions in my previous review that are still awaiting an answer.

/remove-lifecycle stale

k8s-triage-robot · 2024-12-14T06:51:48Z

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

Mark this PR as fresh with /remove-lifecycle stale
Close this PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Oct 30, 2023

k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Oct 30, 2023

k8s-ci-robot requested review from annasong20 and natasha41575 October 30, 2023 11:38

This was referenced Oct 30, 2023

Performance improvements for kustomize build #5084

Open

Researching performance degradation (and improvement) over a number of releases #5422

Open

chlunde marked this pull request as draft October 30, 2023 21:40

k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 30, 2023

ncapps reviewed Oct 30, 2023

ephesused reviewed Nov 2, 2023

chlunde force-pushed the benchmark branch from 2c2d808 to 4ae2545 Compare November 2, 2023 20:31

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 1, 2024

k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 1, 2024

chlunde force-pushed the benchmark branch from 4ae2545 to 7e01c97 Compare April 16, 2024 19:13

chlunde force-pushed the benchmark branch from 7e01c97 to 3fe3dfd Compare April 16, 2024 19:36

chlunde marked this pull request as ready for review April 16, 2024 19:37

k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 16, 2024

k8s-ci-robot requested review from stormqueen1990 and varshaprasad96 April 16, 2024 19:37

chlunde requested review from ncapps and ephesused April 16, 2024 19:37

stormqueen1990 reviewed Apr 21, 2024

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 21, 2024

koba1t closed this Jul 23, 2024

koba1t reopened this Jul 23, 2024

koba1t mentioned this pull request Jul 23, 2024

perf: Improve resmap performance with AppendAll and Transform functions #5427

Draft

k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 31, 2024

koba1t assigned koba1t and stormqueen1990 and unassigned koba1t Sep 15, 2024

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 14, 2024
