Skip to content
This repository has been archived by the owner on Jul 11, 2023. It is now read-only.

pkg/injector: Add webhook time tracking facility #1852

Merged
merged 7 commits into from
Oct 23, 2020

Conversation

eduser25
Copy link
Contributor

Adding a small piece to help track time spent by individual routines
handling webhooks.

It parses the timeout value from the webhook call, and based on some
static thresholds will log differently depending on how much time is
taking vs. the timeout value given by the url.

Affected area:

  • Control Plane [X]

  • Performance [X]

  • Does this change contain code from or inspired by another project? If so, did you notify the maintainers and provide attribution?
    No

@eduser25 eduser25 requested a review from a team as a code owner October 15, 2020 20:00
pkg/injector/errors.go Outdated Show resolved Hide resolved
@eduser25
Copy link
Contributor Author

log example:

{"level":"debug","component":"sidecar-injector","time":"2020-10-15T02:55:38Z","file":"webhook.go:166","message":"Mutate Webhook for pod server/server took 1.841792726s time to execute (0.06 of it's timeout, 30s)"}

@codecov-io
Copy link

codecov-io commented Oct 15, 2020

Codecov Report

Merging #1852 into main will decrease coverage by 0.36%.
The diff coverage is 28.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #1852      +/-   ##
==========================================
- Coverage   59.19%   58.83%   -0.37%     
==========================================
  Files         125      126       +1     
  Lines        5171     5196      +25     
==========================================
- Hits         3061     3057       -4     
- Misses       2107     2135      +28     
- Partials        3        4       +1     
Impacted Files Coverage Δ
pkg/injector/profiling.go 20.00% <20.00%> (ø)
pkg/injector/webhook.go 73.11% <60.00%> (-0.37%) ⬇️
...ertificate/providers/tresor/certificate_manager.go 64.04% <0.00%> (-12.36%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 622a23e...cedd099. Read the comment docs.

Copy link
Member

@shashankram shashankram left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the change.

I'd like to see a more concrete description of the time tracking facility being introduced since its hard to reason from the code exactly what's going on.

pkg/injector/webhook.go Show resolved Hide resolved
pkg/injector/webhook.go Outdated Show resolved Hide resolved
pkg/injector/webhook.go Outdated Show resolved Hide resolved
pkg/injector/webhook_test.go Outdated Show resolved Hide resolved
pkg/injector/webhook.go Outdated Show resolved Hide resolved
@eduser25
Copy link
Contributor Author

eduser25 commented Oct 15, 2020

its hard to reason from the code exactly what's going on.

Let's see if I can help.

  • on mutate handler, we read timeout from the url.
  • if reading the url succeeded, we defer a time-tracking function that will run when mutateHandler is done and will print how much time we spent in the handler itself.

Most of the ... ugly code is getting the podname. We could obviate this part of the code and it'd look way cleaner, but the log message would be less relevant without knowing which resource came triggered the webhook.

pkg/injector/webhook.go Outdated Show resolved Hide resolved
Comment on lines 177 to 179
timeoutValue, err := readTimeout(req)
if err != nil {
log.Error().Msgf("Could not read timeout from request url: %v", err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Under what circumstances will this error occur? Is there a scenario where the request will not have a timeout?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one I do not know. I'm being cautions and just avoiding the call If it can't read it.

pkg/injector/webhook.go Outdated Show resolved Hide resolved
pkg/injector/webhook.go Outdated Show resolved Hide resolved
pkg/injector/webhook.go Outdated Show resolved Hide resolved
Comment on lines 163 to 166
// Error logging when going beyond timeout value to process a webhook
logEv = log.Error()
}
logEv.Msgf("Mutate Webhook for %s took %v to execute (%.2f of it's timeout, %v)",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are these custom logs necessary? K8s events should be logging timeouts anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

synced offlined. We agreed timeout messaging/events are going to be seen in K8s events to begin with, so arguably we do not need to strictly log error on those from this side of the fence.
We however might want to keep at least debug log for this messages to debug/profile timings without requiring to plug in pprof in the picture.

pkg/injector/webhook.go Outdated Show resolved Hide resolved
@shashankram
Copy link
Member

its hard to reason from the code exactly what's going on.

Let's see if I can help.

  • on mutate handler, we read timeout from the url.
  • if reading the url succeeded, we defer a time-tracking function that will run when mutateHandler is done and will print how much time we spent in the handler itself.

Most of the ... ugly code is getting the podname. We could obviate this part of the code and it'd look way cleaner, but the log message would be less relevant without knowing which resource came triggered the webhook.

I am wondering why this tracking facility is even required. The PR does not describe why this functionality is required.
When the webhook experiences a timeout, k8s API server will log an event regarding the timeout, so why is this even needed.

draychev
draychev previously approved these changes Oct 22, 2020
Copy link
Contributor

@draychev draychev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems very useful!

@shashankram
Copy link
Member

@eduser25 are you planning to iterate on the changes we discussed and agreed upon or do you prefer to merge this as is?

shashankram
shashankram previously approved these changes Oct 22, 2020
Adding a small piece to help track time spent by individual routines
handling webhooks.

It parses the timeout value from the webhook call, and based on some
static thresholds will log differently depending on how much time is
taking vs. the timeout value given by the url.
@eduser25 eduser25 merged commit d3ade1e into openservicemesh:main Oct 23, 2020
@eduser25 eduser25 deleted the webhook-time branch October 23, 2020 19:43
draychev pushed a commit to draychev/osm that referenced this pull request Oct 28, 2020
* pkg/injector: Add webhook time tracking facility

Adding a small piece to help track time spent by individual routines
handling webhooks.

It parses the timeout value from the webhook call, and based on some
static thresholds will log differently depending on how much time is
taking vs. the timeout value given by the url.

* Update pkg/injector/errors.go

* Addressing CR

* Simplifying code

* Simplifying code

* Fix webhookTimeoutStr

* Add tests for TimeoutParser and WebhokLogging
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants