[Failing Test] Strange ci-benchmark-scheduler-perf-master behavior #127245

Closed
macsko opened this issue Sep 9, 2024 · 10 comments · Fixed by #128834
Labels
kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@macsko
Member

macsko commented Sep 9, 2024

Which jobs are failing?

ci-benchmark-scheduler-perf-master

Which tests are failing?

Interestingly, we get benchmark results for all of the test cases, yet the job is still reported as failing (see testgrid).

Since when has it been failing?

Since 6 September. There have been no changes to scheduler_perf since 26 August.

Testgrid link

https://testgrid.k8s.io/sig-scalability-benchmarks#scheduler-perf

Reason for failure (if possible)

I have no idea what's going on with the test. Tests give the correct results and populate the perf-dash dashboard.

Anything else we need to know?

No response

Relevant SIG(s)

/sig scheduling

@macsko macsko added the kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test. label Sep 9, 2024
@k8s-ci-robot k8s-ci-robot added sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Sep 9, 2024
@macsko
Member Author

macsko commented Sep 9, 2024

cc: @pohly @sanposhiho

@pohly
Contributor

pohly commented Sep 9, 2024

/assign

The first failed test case is:

=== FAIL: test/integration/scheduler_perf BenchmarkPerfScheduling/SchedulingCSIPVs/5000Nodes_5000Pods (unknown)

That should have caused the per-test log output, which might contain more information, to be preserved, but it is not among the artifacts.

Something isn't quite right with output handling - will check.

@pohly
Contributor

pohly commented Sep 10, 2024

SchedulingCSIPVs/5000Nodes_5000Pods has 48 as threshold. In https://storage.googleapis.com/kubernetes-jenkins/logs/ci-benchmark-scheduler-perf-master/1833184953530060800/build-log.txt, it has:

name                                                  SchedulingThroughput/Average
PerfScheduling/SchedulingBasic/5000Nodes_10000Pods-6                                                 351 ± 0%

name                                                  SchedulingThroughput/Perc50
PerfScheduling/SchedulingBasic/5000Nodes_10000Pods-6                                                 383 ± 0%

name                                                  SchedulingThroughput/Perc90
PerfScheduling/SchedulingBasic/5000Nodes_10000Pods-6                                                 509 ± 0%

name                                                  SchedulingThroughput/Perc95
PerfScheduling/SchedulingBasic/5000Nodes_10000Pods-6                                                 515 ± 0%

name                                                  SchedulingThroughput/Perc99
PerfScheduling/SchedulingBasic/5000Nodes_10000Pods-6                                                 518 ± 0%

name                                                  runtime_seconds
PerfScheduling/SchedulingBasic/5000Nodes_10000Pods-6                                                44.1 ± 0%

Why is the threshold so low?

... never mind, wrong test.
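To make the threshold question concrete, here is a hypothetical Go sketch that reads one summary line in the format shown above and compares the value against a minimum. It assumes the threshold acts as a lower bound on SchedulingThroughput/Average and that the value is the field just before "±"; the helper names are made up for illustration and are not taken from scheduler_perf's own code.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseSummaryValue extracts the numeric value from a benchstat-style line such as
// "PerfScheduling/SchedulingBasic/5000Nodes_10000Pods-6    351 ± 0%".
// Assumption: the value is the field immediately before the "±" sign.
func parseSummaryValue(line string) (float64, error) {
	fields := strings.Fields(line)
	for i, f := range fields {
		if f == "±" && i > 0 {
			return strconv.ParseFloat(fields[i-1], 64)
		}
	}
	return 0, fmt.Errorf("no value found in %q", line)
}

// belowThreshold reports whether a throughput value misses a minimum threshold.
// Hypothetical helper: it assumes higher throughput is better, so only values
// strictly below the threshold count as failures.
func belowThreshold(value, threshold float64) bool {
	return value < threshold
}

func main() {
	line := "PerfScheduling/SchedulingBasic/5000Nodes_10000Pods-6    351 ± 0%"
	v, err := parseSummaryValue(line)
	if err != nil {
		panic(err)
	}
	// 48 is the threshold quoted for SchedulingCSIPVs/5000Nodes_5000Pods;
	// it is reused here purely for illustration.
	fmt.Printf("value=%.0f below threshold: %v\n", v, belowThreshold(v, 48))
}
```

Run as-is, this prints `value=351 below threshold: false`, which matches the "wrong test" conclusion: the SchedulingBasic numbers say nothing about the CSIPVs threshold.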

@pohly
Contributor

pohly commented Sep 10, 2024

#125534 changed how go test output is processed. The commit range 0e4cf67..2049360 (last working build to first failing build) includes that change.

I think we are now hitting gotestyourself/gotestsum#413 (see the comment there).

pohly added a commit to pohly/test-infra that referenced this issue Sep 10, 2024
There are issues with gotestsum processing the JSON output of
benchmarks. Collecting the original output may help with debugging this.

If the files are not too large, then it may be worthwhile to keep this enabled
even after fixing the current issue (kubernetes/kubernetes#127245).
@pohly
Contributor

pohly commented Sep 13, 2024

I submitted gotestyourself/gotestsum#438. If this doesn't get merged soonish, it's probably better to revert #125534 and re-apply it later with a new gotestsum.

@pohly
Contributor

pohly commented Sep 19, 2024

@macsko: how much longer are you willing to wait before we revert?

There have been some discussions around how to fix this in gotestsum, but no conclusion yet.

@macsko
Member Author

macsko commented Sep 19, 2024

It's not that urgent, unless something bad happens to scheduling performance and we don't get an alert.

@pohly
Contributor

pohly commented Sep 19, 2024

/triage accepted
/priority important-longterm

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Sep 19, 2024
@sanposhiho
Member

Welcome back, green lights! :)

[Screenshot taken 2024-11-22 at 10:55]

@pohly
Contributor

pohly commented Nov 22, 2024

I know, it took a while... 😅
