Skip to content
This repository has been archived by the owner on Sep 5, 2019. It is now read-only.

Emit metric to track null build latency #337

Merged
merged 5 commits into from
Oct 5, 2018

Conversation

imjasonh
Copy link
Member

@imjasonh imjasonh commented Sep 5, 2018

Proposed Changes

  • Optionally emit a metric that tracks how long a simple no-op build takes to schedule, execute and finish watching. This should help us identify serious latency regressions that might affect all builds.

TODO

I'm pretty sure there's more I need to do here, to get these metrics displayed at http://testgrid.knative.dev/knative-build -- like knative/serving has. I'm not sure if that change happens in this repo somehow/somewhere, or in whatever config powers testgrid. /cc @jessiezcc

Release Note

NONE

/hold

@knative-prow-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ImJasonH

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@adrcunha
Copy link
Contributor

adrcunha commented Sep 5, 2018

Once this is in it's just a matter of creating an aggregation job and a testgrid config. ;)

@imjasonh
Copy link
Member Author

imjasonh commented Sep 6, 2018

Once this is in it's just a matter of creating an aggregation job and a testgrid config. ;)

@adrcunha does this mean this PR can be submitted as-is? Is this really all that's necessary to start emitting metrics? 🤞

@adrcunha
Copy link
Contributor

adrcunha commented Sep 6, 2018

Yes, this PR is mostly done. But I would run e2e-tests.sh --emit-metrics locally to make sure it's logging the metric you want. Then, like I said, it's just a matter of adding a job to Prow and a display config to testgrid (file an issue and assign it to me).

@imjasonh
Copy link
Member Author

imjasonh commented Sep 7, 2018

@adrcunha Am I supposed to see the metric data in logs?

$ EMIT_METRICS=1 ./test/e2e-tests.sh
...snip...
==============================
==== RUNNING GO E2E TESTS ====
==============================
ok  	github.com/knative/build/test/e2e	38.923s
================================
==== RUNNING YAML E2E TESTS ====
================================
...snip...

I don't see metrics data in logs when I run knative/serving's e2e-tests.sh either, but it's possible I'm just missing something.

@adrcunha
Copy link
Contributor

adrcunha commented Sep 7, 2018

Yes. But don't set the environment variable, use the --emit-metrics flag I mentioned.

@imjasonh
Copy link
Member Author

imjasonh commented Sep 11, 2018

@adrcunha I'm apparently unable to run e2e tests locally now, with or without the flag. Getting:

Error from server (Forbidden): clusterrolebindings.rbac.authorization.k8s.io is forbidden: User "system:anonymous" cannot create clusterrolebindings.rbac.authorization.k8s.io at the cluster scope: Unknown user "system:anonymous"

(This happens on this branch, and on a fresh checkout from master)

Any idea what I may have done to get into this state? I thought the script started from scratch each time, is it possible some of my local config has crept in somehow? Prow seems to be able to run the tests just fine.

@adrcunha
Copy link
Contributor

I thought #286 was closed, let me take a look.

@imjasonh
Copy link
Member Author

After rebasing to pick up the fix for #286 I ran ./test/e2e-tests.sh --emit-metrics and saw:

...
==============================
==== RUNNING GO E2E TESTS ====
==============================
ok          github.com/knative/build/test/e2e       303.314s
================================
==== RUNNING YAML E2E TESTS ====
================================
>> Starting tests
...

Am I supposed to see metrics in the logs? Or do metrics get sent somewhere during the test that I don't see?

(Also, wow, the tests take 5 minutes now, that's kinda annoying)

@adrcunha
Copy link
Contributor

adrcunha commented Sep 14, 2018

(Also, wow, the tests take 5 minutes now, that's kinda annoying)

Hummm... They took 9s on an execution on 9/7... With #335 they took 190s... And then with #344 it went up to 300s.

Am I supposed to see metrics in the logs?

My apologies, local runs shortcut the test execution, so --emit-metrics is ignored. Unfortunately the workaround is hackish, you need to comment lines 196-199 in vendor/github.com/knative/test-infra/scripts/library.sh.

@adrcunha
Copy link
Contributor

The ugly hack above will be gone once knative/test-infra#120 is merged (and the vendored test-infra bumped again, of course).

@adrcunha
Copy link
Contributor

adrcunha commented Sep 17, 2018 via email

@adrcunha
Copy link
Contributor

I patched your PR and ran the tests:

$ sed -i -e '196,199d' vendor/github.com/knative/test-infra/scripts/library.sh
$ export PROJECT_ID=adrcunha-flex-test
$ ./test/e2e-tests.sh --emit-metrics 
===============================
==== CREATING TEST CLUSTER ====
===============================
...
==============================
==== RUNNING GO E2E TESTS ====
==============================
Running tests with 'go test -tags e2e ./test/e2e/... -count=1 -emitmetrics'
Finished run, return code is 0
Start parsing results, summary:
- TestSimpleBuild :PASS:
- TestFailingBuild :PASS:
- TestBuildLowTimeout :PASS:
- TestPendingBuild :PASS:
- TestPodAffinity :PASS:
- ok github.com/knative/build/test/e2e
Done parsing 5 tests
2018/09/16 22:11:35 Created namespace "build-tests"
=== RUN   TestSimpleBuild
--- PASS: TestSimpleBuild (6.17s)
=== RUN   TestFailingBuild
--- PASS: TestFailingBuild (5.37s)
=== RUN   TestBuildLowTimeout
--- PASS: TestBuildLowTimeout (78.85s)
=== RUN   TestPendingBuild
--- PASS: TestPendingBuild (121.24s)
=== RUN   TestPodAffinity
--- PASS: TestPodAffinity (120.33s)
PASS
2018/09/16 22:17:07 Deleted namespace "build-tests"
ok      github.com/knative/build/test/e2e       333.162s
================================
==== RUNNING YAML E2E TESTS ====
================================
...

Your test is not emitting metrics. From what I understand, the tests are not setting up logging in https://github.com/knative/serving/blob/2291c782bc3ef9f84a2b7eb37db4f745cfac26fa/test/e2e_flags.go#L59 like in

flag.Parse()

@imjasonh
Copy link
Member Author

imjasonh commented Oct 5, 2018

I got this to work 🎉

info	TestSimpleBuild	trace/trace.go:261	metric NullBuildLatency 1538772072387509233 1538772080954654257 8.567145024s

/hold cancel

@imjasonh imjasonh changed the title [WIP] Emit metric to track null build latency Emit metric to track null build latency Oct 5, 2018
@imjasonh
Copy link
Member Author

imjasonh commented Oct 5, 2018

/assign adrcunha

@jessiezcc
Copy link

/lgtm

@imjasonh
Copy link
Member Author

imjasonh commented Oct 5, 2018

@adrcunha Looks like all that's left is testgrid configs, as mentioned in #337 (comment)

Can you let me know when those are in and we'll have pretty graphs to 😍 over?

@adrcunha
Copy link
Contributor

Metrics are now live, details in knative/test-infra#186.

vdemeester pushed a commit to vdemeester/knative-build that referenced this pull request Apr 3, 2019
* Emit metric to track null build latency

* Emit metric to track null build latency

* Support EMIT_METRICS in e2e-tests.sh

* Attempt to set up metrics flags

* Fix metric setup
vdemeester pushed a commit to vdemeester/knative-build that referenced this pull request Apr 3, 2019
* Emit metric to track null build latency

* Emit metric to track null build latency

* Support EMIT_METRICS in e2e-tests.sh

* Attempt to set up metrics flags

* Fix metric setup
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants