pkg/injector: Add webhook time tracking facility #1852
log example:
Codecov Report
Coverage diff, main vs. #1852:
            main      #1852     +/-
Coverage    59.19%    58.83%    -0.37%
Files       125       126       +1
Lines       5171      5196      +25
Hits        3061      3057      -4
Misses      2107      2135      +28
Partials    3         4         +1
Thanks for the change.
I'd like to see a more concrete description of the time tracking facility being introduced, since it's hard to reason from the code exactly what's going on.
Let's see if I can help.
Most of the ... ugly code is for getting the pod name. We could drop this part of the code and it would look much cleaner, but the log message would be less relevant without knowing which resource triggered the webhook.
pkg/injector/webhook.go
Outdated
timeoutValue, err := readTimeout(req)
if err != nil {
log.Error().Msgf("Could not read timeout from request url: %v", err)
Under what circumstances will this error occur? Is there a scenario where the request will not have a timeout?
This one I do not know. I'm being cautious and just avoiding the call if it can't read it.
pkg/injector/webhook.go
Outdated
// Error logging when going beyond timeout value to process a webhook
logEv = log.Error()
}
logEv.Msgf("Mutate Webhook for %s took %v to execute (%.2f of it's timeout, %v)",
Why are these custom logs necessary? K8s events should be logging timeouts anyway.
Synced offline. We agreed that timeout messages/events will show up in K8s events to begin with, so arguably we do not strictly need to log errors for them from this side of the fence.
However, we might want to keep at least a debug log for these messages, to debug/profile timings without having to bring pprof into the picture.
I am wondering why this tracking facility is needed at all. The PR does not describe why this functionality is required.
e88518e to 165b878
This seems very useful!
@eduser25, are you planning to iterate on the changes we discussed and agreed upon, or do you prefer to merge this as is?
Adding a small piece to help track time spent by individual routines handling webhooks. It parses the timeout value from the webhook call and, based on some static thresholds, logs differently depending on how long handling takes relative to the timeout value given by the URL.
d00c37c to 2bf7427
* pkg/injector: Add webhook time tracking facility
  Adding a small piece to help track time spent by individual routines handling webhooks. It parses the timeout value from the webhook call and, based on some static thresholds, logs differently depending on how long handling takes relative to the timeout value given by the URL.
* Update pkg/injector/errors.go
* Addressing CR
* Simplifying code
* Simplifying code
* Fix webhookTimeoutStr
* Add tests for TimeoutParser and WebhokLogging
Adding a small piece to help track time spent by individual routines
handling webhooks.
It parses the timeout value from the webhook call and, based on some
static thresholds, logs differently depending on how long handling
takes relative to the timeout value given by the URL.
Affected area:
Control Plane [X]
Performance [X]
Does this change contain code from or inspired by another project? If so, did you notify the maintainers and provide attribution?
No