Make sure that performance dashboards work. #44003

Closed
Random-Liu opened this issue Apr 3, 2017 · 13 comments
Labels
area/test, sig/node, sig/scalability

Comments

@Random-Liu
Member

Now we have several performance tests:

All of them use https://github.com/kubernetes/kubernetes/blob/master/test/e2e/framework/perf_util.go and print the benchmark metrics directly into the test output.

http://perf-dash.k8s.io/ and http://node-perf-dash.k8s.io/ parse the test output, extract the benchmark metrics, and generate the performance dashboards.

However, both dashboards are broken now, because:

  1. All job names were changed in a recent test-infra refactoring, e.g. kubernetes-e2e-gce-scalability => ci-kubernetes-e2e-gce-scalability. However, the dashboard configurations were not updated to match: https://github.com/kubernetes/contrib/blob/master/perfdash/config.go and https://github.com/kubernetes/contrib/blob/master/node-perf-dash/node-perf-dash-deployment.yaml.
  2. Each line of the test log is now prefixed with an extra timestamp, which breaks the log parsing logic, e.g. https://storage.googleapis.com/kubernetes-jenkins/logs/ci-kubernetes-e2e-gce-scalability/1677/build-log.txt.

This whole approach is brittle. We should generate a dedicated file for performance metrics and let the dashboards consume that file.
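
To illustrate the second failure mode, here is a minimal sketch of how a parser that expects a result tag at the start of a line stops matching once a timestamp is prepended. The tag `[PerfResult]`, the timestamp format, and both function names are made up for illustration; the real output format of perf_util.go is not quoted in this issue.

```go
package main

import (
	"fmt"
	"strings"
)

// resultTag is a hypothetical marker; the real tag printed by the tests
// is not quoted in this issue.
const resultTag = "[PerfResult]"

// parseAnchored mimics a parser that expects the tag at the start of a line.
func parseAnchored(line string) (string, bool) {
	if !strings.HasPrefix(line, resultTag) {
		return "", false
	}
	return strings.TrimSpace(strings.TrimPrefix(line, resultTag)), true
}

// parseAnywhere looks for the tag at any position, so a prefix added by the
// log infrastructure does not hide the payload. It is still only a heuristic.
func parseAnywhere(line string) (string, bool) {
	i := strings.Index(line, resultTag)
	if i < 0 {
		return "", false
	}
	return strings.TrimSpace(line[i+len(resultTag):]), true
}

func main() {
	plain := `[PerfResult] {"latency_ms": 12}`
	// The timestamp format below is invented for this example.
	stamped := "2017-04-03T10:15:00Z " + plain

	for _, line := range []string{plain, stamped} {
		_, okAnchored := parseAnchored(line)
		_, okAnywhere := parseAnywhere(line)
		fmt.Printf("anchored=%v anywhere=%v line=%q\n", okAnchored, okAnywhere, line)
	}
}
```

Even the more tolerant variant depends on the log layout staying stable, which is the argument for a dedicated metrics file.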

@dchen1107 @ixdy @krzyzacy @gmarek @wojtek-t
/cc @kubernetes/sig-scalability-misc @kubernetes/sig-node-bugs

@Random-Liu Random-Liu added the area/test, sig/node, and sig/scalability labels Apr 3, 2017
@ixdy
Member

ixdy commented Apr 3, 2017

In particular, there's a variable set for all CI e2e runs that tells you where to put files - framework.TestContext.ReportDir. Any files saved there will be automatically uploaded to GCS.

(This variable is empty by default, so you'd need to handle that case appropriately.)
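
A rough sketch of handling the empty case might look like the following. Here `reportDir` is a plain string standing in for framework.TestContext.ReportDir, and `writeSummary` and the file name are hypothetical, not part of the e2e framework.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// writeSummary stores data under reportDir when it is set and falls back to
// stdout when it is empty. reportDir stands in for
// framework.TestContext.ReportDir; the function itself is hypothetical.
func writeSummary(reportDir, fileName string, data []byte) (string, error) {
	if reportDir == "" {
		// No report directory configured: print to stdout instead.
		fmt.Println(string(data))
		return "", nil
	}
	if err := os.MkdirAll(reportDir, 0o755); err != nil {
		return "", err
	}
	path := filepath.Join(reportDir, fileName)
	if err := os.WriteFile(path, data, 0o644); err != nil {
		return "", err
	}
	return path, nil
}

func main() {
	dir, err := os.MkdirTemp("", "report")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	// First pass exercises the empty case, second pass writes a file.
	for _, reportDir := range []string{"", dir} {
		path, err := writeSummary(reportDir, "summary.json", []byte(`{"ok":true}`))
		if err != nil {
			panic(err)
		}
		fmt.Printf("reportDir=%q wrote=%q\n", reportDir, path)
	}
}
```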

@wojtek-t
Member

wojtek-t commented Apr 4, 2017

I think there is already an issue for this (though I can't find it right now).
@shyamjvs wanted to look into it (but he is on vacation now).

@jeremyeder

@sjug @mffiedler

@shyamjvs
Member

@gmarek is working on it now.

k8s-github-robot pushed a commit that referenced this issue Apr 19, 2017
Automatic merge from submit-queue (batch tested with PRs 44667, 44673)

Allow summaries to be printed out to ReportDir instead of stdout

Fix #44003

cc @shyamjvs
@Random-Liu
Member Author

Is this fixed? Both performance dashboards are still not working.

@gmarek
Contributor

gmarek commented Apr 25, 2017

Oh sorry, the data-storing part is done (at least on our side, for our data). Perf-dash was knowingly left broken, as we didn't redirect it to read the dedicated files (sadly, we don't have people to do it :/). I'm reopening the bug with a different title.

@wojtek-t @shyamjvs

@gmarek gmarek reopened this Apr 25, 2017
@gmarek gmarek changed the title from "Performance Tests (density/load) should store dedicated files with metrics" to "Make sure that performance dashboards work." Apr 25, 2017
@Random-Liu
Member Author

/cc @vishh

@vishh
Contributor

vishh commented Jun 1, 2017

cc @derekwaynecarr

@gmarek
Contributor

gmarek commented Jun 14, 2017

On our side (perfdash.k8s.io) everything's fine.

k8s-github-robot pushed a commit that referenced this issue Jun 14, 2017
Automatic merge from submit-queue (batch tested with PRs 47470, 47260, 47411, 46852, 46135)

Logs node e2e perf data to standalone json files

Fixes the node-perf-dash issue in #44003.

- Moves the perf data types to `test/e2e/perftype/perftype.go` so that node-perf-dash can depend on them.
- Logs the perf data to standalone json files so that node-perf-dash can consume it easily. A sample run of `ci-kubernetes-node-kubelet-benchmark` is at https://console.cloud.google.com/storage/browser/ygg-gke-dev-bucket/e2e-node-test/ci-kubernetes-node-kubelet-benchmark/1.

The corresponding change in node-perf-dash is at kubernetes-retired/contrib#2628.
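
As a rough illustration of the producer/consumer split described above, the sketch below encodes a perf record to JSON (the test side) and decodes it again (the dashboard side). The `PerfData` and `DataItem` types here are simplified stand-ins; their field names and JSON keys are assumptions and are not copied from `test/e2e/perftype/perftype.go`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// DataItem and PerfData are simplified stand-ins for the shared perf types;
// the field names and JSON keys are assumptions made for this example.
type DataItem struct {
	Data   map[string]float64 `json:"data"`
	Unit   string             `json:"unit"`
	Labels map[string]string  `json:"labels,omitempty"`
}

type PerfData struct {
	Version   string     `json:"version"`
	DataItems []DataItem `json:"dataItems"`
}

// encode is what the test side would do before writing the standalone file.
func encode(p PerfData) ([]byte, error) {
	return json.MarshalIndent(p, "", "  ")
}

// decode is what the dashboard side would do after reading the file.
func decode(raw []byte) (PerfData, error) {
	var p PerfData
	err := json.Unmarshal(raw, &p)
	return p, err
}

func main() {
	in := PerfData{
		Version: "v1",
		DataItems: []DataItem{{
			Data:   map[string]float64{"Perc50": 10, "Perc99": 42},
			Unit:   "ms",
			Labels: map[string]string{"test": "example"},
		}},
	}
	raw, err := encode(in)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(raw))

	out, err := decode(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(out.DataItems[0].Data["Perc99"], out.DataItems[0].Unit)
}
```

Because both sides share one type definition, a format change shows up as a compile-time or decode error rather than as silently missing dashboard data.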

**Release note**:
`None`

/sig node
/area node-e2e
/assign @Random-Liu
@yguo0905
Contributor

http://node-perf-dash.k8s.io has been fixed. This can be closed.

@derekwaynecarr
Member

Agreed, it's working for me as well.

@yujuhong
Contributor

Thanks @yguo0905 for fixing this!

@gmarek
Contributor

gmarek commented Jun 16, 2017

Perfect. Thanks a lot @yguo0905!

k8s-github-robot pushed a commit that referenced this issue Jun 27, 2017
Automatic merge from submit-queue (batch tested with PRs 47675, 48001)

Encodes ReportPrefix into the generated metrics file names

Ref: #44003

Adds the test prefix to the generated file name. Otherwise, the results of the same test case running on different images would overwrite each other. Nothing needs to change on the node-perf-dash side.
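
A minimal sketch of the collision and the fix, assuming a hypothetical naming scheme (`metricsFileName`, the underscore separator, and the image names are illustrative only, not the actual implementation):

```go
package main

import (
	"fmt"
	"strings"
)

// metricsFileName builds a file name from a test name and an optional prefix.
// The naming scheme is hypothetical; only the idea of including the prefix
// comes from the commit message.
func metricsFileName(prefix, testName string) string {
	name := strings.ReplaceAll(testName, " ", "_")
	if prefix == "" {
		return name + ".json"
	}
	return prefix + "_" + name + ".json"
}

func main() {
	// Hypothetical image names standing in for ReportPrefix values.
	for _, prefix := range []string{"image-a", "image-b"} {
		// Without a prefix both runs map to the same name; with it they differ.
		fmt.Println(metricsFileName("", "density test"), metricsFileName(prefix, "density test"))
	}
}
```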

See test run at https://console.cloud.google.com/storage/browser/ygg-gke-dev-bucket/e2e-node-test/ci-kubernetes-node-kubelet-benchmark/10.


**Release note**:
```
None
```

/sig node
/area node-e2e
/assign @Random-Liu