Missing backend metrics #3365
Ping @marco-jantke, any idea on this possible wrong sum behavior?
The metrics can diverge: all requests that are not "routable" to any of the backends (no matching frontend) increase the counter for the entrypoint metric but, naturally, not for the backend metric. These should basically be only 404s, though. Are you sure that the requests match a backend, i.e. that a matching frontend is present?
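One way to sanity-check that explanation against a running Prometheus (a sketch using the metric names from this thread) is to look at the per-code gap between the two counters; if only unroutable requests account for it, the difference should be concentrated on 404s:

```
# Requests counted at the entrypoint but not attributed to any backend,
# broken down by status code. If the explanation above holds, the gap
# should sit almost entirely under code="404" and stay near 0 for 500s.
# Note: codes present on only one side drop out of the vector match.
sum by (code) (traefik_entrypoint_requests_total)
  - sum by (code) (traefik_backend_requests_total)
```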
Yes, quite sure; coincidentally I noticed this discrepancy after looking into the different metric values for 404s. One specific backend was responsible for the bulk of the 500s, with well over 8000 log lines but the metric showing a value of about 5 if I recall correctly, and without restarts or redeploys of the pod. The logs show the requests as having both a FrontendName and BackendName value. Additionally, but perhaps this should be a different issue, I noticed that metrics do not exist for backends that haven't handled any requests, where I would expect the metrics to exist with a value of 0.
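A query-side workaround for the missing series, until they are pre-populated, is to fall back to zero explicitly. This is a generic Prometheus pattern rather than anything Traefik-specific, and the backend label value below is just a placeholder:

```
# Treat "no data for this backend yet" as 0 instead of an absent result.
# "my-backend" is a placeholder label value.
sum(traefik_backend_requests_total{backend="my-backend"}) or vector(0)
```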
Can you paste me one of the log lines for a request that wasn't counted in the metrics? Regarding pre-population of metrics: we pre-populate the …
Here is one; I had to redact some things though:

{"BackendAddr":"redacted","BackendName":"redacted","BackendURL":{"Scheme":"http","Opaque":"","User":null,"Host":"redacted","Path":"","RawPath":"","ForceQuery":false,"RawQuery":"","Fragment":""},"ClientHost":"redacted","ClientUsername":"-","DownstreamContentSize":108,"DownstreamStatus":500,"DownstreamStatusLine":"500 Internal Server Error","Duration":12766684,"FrontendName":"redacted","OriginContentSize":108,"OriginDuration":12478122,"OriginStatus":500,"OriginStatusLine":"500 Internal Server Error","Overhead":288562,"RequestAddr":"redacted","RequestContentSize":0,"RequestCount":33668,"RequestHost":"redacted","RequestLine":"GET /redacted HTTP/1.1","RequestMethod":"GET","RequestPort":"-","RetryAttempts":0,"StartUTC":"2018-05-22T08:42:04.086828695Z","downstream_Content-Type":"application/json;charset=UTF-8","downstream_Date":"Tue, 22 May 2018 08:42:04 GMT","downstream_Vary":"Accept-Encoding","level":"info","msg":"","origin_Content-Type":"application/json;charset=UTF-8","origin_Date":"Tue, 22 May 2018 08:42:04 GMT","origin_Vary":"Accept-Encoding","request_Accept":"*/*","request_Accept-Encoding":"gzip, deflate","request_Accept-Language":"en-us","request_Cache-Control":"no-cache","request_Connection":"keep-alive","request_Pragma":"no-cache","request_User-Agent":"redacted","request_X-Forwarded-For":"redacted","request_X-Forwarded-Host":"redacted","request_X-Forwarded-Port":"443","request_X-Forwarded-Proto":"https","request_X-Forwarded-Server":"redacted","request_X-Real-Ip":"redacted","time":"2018-05-22T08:42:04Z"}

I've also added some screenshots. The entrypoint metric has magically reappeared, but the backend metrics are still missing since running traefik bug.
Thanks for the report. I'm going to have a deeper look into this and check whether we have similar problems elsewhere. About the re-appearing metric: it may be that you are affected by something that will be fixed with #3287.
After digging into the issue I figured out that resetting metrics was not properly implemented. I pushed another commit to my PR, acce4d9e533f2d7931a5b3738196a835f20b34d8, that will fix that. I'll also try to change the target of my PR to the 1.6 branch, in order to get this officially released sooner. A bit of background on what the issue is here: as the configuration of Traefik is dynamic, we have to make sure to remove metrics that belong to those dynamic configurations (e.g. …)
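To verify that cleanup behaviour against a live instance, a simple check (a sketch, not part of the fix itself) is to list which backend label values currently expose request series; backends removed from the dynamic configuration should drop out of the result:

```
# Backend label values that currently have request series.
# Backends removed from the dynamic configuration should
# eventually disappear from this list.
count by (backend) (traefik_backend_requests_total)
```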
The bug should be fixed in the latest release: https://github.com/containous/traefik/releases/tag/v1.6.3. Can you try it out and see whether it fixes your problems as well?
Thanks for your work on this! The metrics no longer disappear. One minor note: Prometheus appears to have reset the counters, where I would have expected it to take the 'restart' into account. Have the metrics themselves changed a bit as well, or is this expected behaviour?
No, the counter reset is totally expected; this is how Prometheus works. Functions like rate() and increase() take counter resets into account when deriving values from counters. I'm therefore closing the issue. Thanks for reporting, and in case you run into more problems, let us know :-)
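For example, either of the following reset-aware queries (sketches with arbitrary 5m/1h windows) is unaffected by the counter drop caused by a restart:

```
# Per-second rate of 500 responses; rate() compensates for counter resets.
rate(traefik_backend_requests_total{code="500"}[5m])

# Total 500 responses over the last hour, also reset-aware.
increase(traefik_backend_requests_total{code="500"}[1h])
```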
Do you want to request a feature or report a bug?
Bug
What did you do?
Roll out traefik 1.6.1 in a Kubernetes cluster with Prometheus metrics scraping, using ingress resources for dynamic configuration.
What did you expect to see?
The value of traefik_entrypoint_requests_total{code="500"} equal to the value of sum(traefik_backend_requests_total{code="500"}).

What did you see instead?
traefik_entrypoint_requests_total{code="500"} had a value of around 8000, whereas the total for traefik_backend_requests_total{code="500"} was around 8. The value of the entrypoints metric was reflected in the access log. Additionally, running traefik bug as suggested completely removed these stats, which prevented me from providing more exact values. Oddly, stats for other status codes (200, 302) are still present.

Output of traefik version: (What version of Traefik are you using?)

What is your environment & configuration (arguments, toml, provider, platform, ...)?