Summary of errors in logs that are not yet monitored #3
Comments
This is an initial step in adding metrics for the errors identified in rust-lang#3.
While logging certainly makes sense, I'm wondering if it would be better to use Sentry more for these things 🤔
I think we should do both where possible. My original motivation for investigating was to make sure we capture Heroku platform-level error codes, where the request may not make it to the backend, or where the backend completes successfully but for some reason the user still sees an error. Then, by adopting the existing prefix, we can ensure that errors from all levels end up together in at least one place.
Here is a summary of error="" entries in our logs that we may want to monitor more closely in our metrics. We may want to follow Heroku's approach and assign code values to these error cases. We should ensure these all have an at=error prefix so that they can be easily ingested from logs.

error="canceling statement due to statement timeout"
error="unhealthy database pool"
error="there is no unique or exclusion constraint matching the ON CONFLICT specification"
error="end of file reached" (on crate publish endpoint)
downloads_counter error: unhealthy database pool
at=error mod=downloads_counter error="unhealthy database pool"
Error: error sending request for url (https://events.pagerduty.com/generic/2010-04-15/create_event.json): operation timed out
error="failed to upload crate: error sending request for url (https://crates-io.s3-us-west-1.amazonaws.com/crates/xyz/xyz-0.2.0.crate): connection closed before message completed"
error="missing user {private.inviter_id}", error="missing crate with id {invitation.crate_id}"

Additionally, we may want to add an at=warn prefix that could be used to flag slow requests and other operationally interesting events that aren't strictly errors.
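
To illustrate why a consistent at=error prefix makes these entries easy to ingest, here is a minimal sketch of a log consumer that parses key=value lines and counts errors by message. The helper names `parse_logfmt` and `count_errors`, and the "unknown" fallback label, are hypothetical and chosen for illustration only; they are not taken from the crates.io codebase or from any Heroku tooling.

```rust
use std::collections::HashMap;

/// Parse one `key=value key="quoted value"` log line into a map.
/// Hypothetical helper for illustration; not part of any existing codebase.
fn parse_logfmt(line: &str) -> HashMap<String, String> {
    let mut fields = HashMap::new();
    let mut chars = line.chars().peekable();

    loop {
        // Skip whitespace between pairs.
        while matches!(chars.peek(), Some(c) if c.is_whitespace()) {
            chars.next();
        }
        if chars.peek().is_none() {
            break;
        }

        // Read the key up to `=` or whitespace.
        let mut key = String::new();
        while let Some(&c) = chars.peek() {
            if c == '=' || c.is_whitespace() {
                break;
            }
            key.push(c);
            chars.next();
        }

        // A bare word without `=` is stored with an empty value.
        let mut value = String::new();
        if chars.peek() == Some(&'=') {
            chars.next(); // consume '='
            if chars.peek() == Some(&'"') {
                chars.next(); // consume the opening quote
                while let Some(c) = chars.next() {
                    match c {
                        // Keep the character after a backslash verbatim.
                        '\\' => {
                            if let Some(escaped) = chars.next() {
                                value.push(escaped);
                            }
                        }
                        '"' => break,
                        _ => value.push(c),
                    }
                }
            } else {
                // Unquoted value: read until the next whitespace.
                while let Some(&c) = chars.peek() {
                    if c.is_whitespace() {
                        break;
                    }
                    value.push(c);
                    chars.next();
                }
            }
        }

        if !key.is_empty() {
            fields.insert(key, value);
        }
    }
    fields
}

/// Count lines tagged `at=error`, grouped by their `error` message.
/// Lines without the prefix are ignored, which is why untagged entries
/// would be invisible to a consumer like this one.
fn count_errors<'a, I>(lines: I) -> HashMap<String, u64>
where
    I: IntoIterator<Item = &'a str>,
{
    let mut counts = HashMap::new();
    for line in lines {
        let fields = parse_logfmt(line);
        if fields.get("at").map(String::as_str) == Some("error") {
            let message = fields
                .get("error")
                .cloned()
                .unwrap_or_else(|| "unknown".to_string());
            *counts.entry(message).or_insert(0) += 1;
        }
    }
    counts
}
```

With this sketch, a line such as at=error mod=downloads_counter error="unhealthy database pool" is counted under its message, while the unprefixed form downloads_counter error: unhealthy database pool is skipped because it has no at field. An at=warn line could be handled the same way by matching on a different value of at.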