Customizing Sentry issues via fingerprinting
Courtlistener and other FLP applications use Sentry to catch errors.
Sentry automatically groups errors into Issues, which makes them easier to monitor and analyze. For example, each Issue shows how many times its events occurred, along with the first and last dates they were seen.
However, sometimes the automated Sentry groups/issues are not granular enough. For example, Sentry may mix together all the ConnectionErrors for several different websites scraped by Juriscraper, when these errors should be analyzed separately for each scraper, since each one connects to a different webpage. Because the resulting issue has too many events, the person monitoring it will usually "Archive" it, effectively silencing it.
This is not desirable. For these cases, we can override Sentry's automated groups from the backend, using what Sentry calls "fingerprinting".
A Sentry event is an instance of an error in our application. These events will be grouped into Issues.
Events/errors can be of 2 kinds: logged errors and uncontrolled exceptions.
Logged errors are explicit logger.error calls placed by the developer. Usually, they represent "expected" errors or data quality problems.
We can pass a fingerprint using the extra argument:
court_id = "nysupct"
...
logger.error("No citations found", extra={"fingerprint":[f"{court_id}-no-citation-found"]})
This fingerprint will be attached to the Sentry event before it is sent to the Sentry server.
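One way to attach the fingerprint before the event leaves the application is a before_send hook registered with sentry_sdk.init. The following is a minimal sketch, not necessarily how Courtlistener implements it: the function name attach_fingerprint is hypothetical, and it assumes that the extra dict from the logger call is available under event["extra"] when the hook runs.

```python
from typing import Any, Optional


def attach_fingerprint(
    event: dict[str, Any], hint: dict[str, Any]
) -> Optional[dict[str, Any]]:
    """Copy a fingerprint passed via `extra` onto the event itself.

    Assumption: the `extra` dict from the logger call is available
    under event["extra"] when this hook runs.
    """
    fingerprint = (event.get("extra") or {}).get("fingerprint")
    if fingerprint:
        # Sentry expects a list; wrap a bare string defensively.
        if isinstance(fingerprint, str):
            fingerprint = [fingerprint]
        event["fingerprint"] = list(fingerprint)
    # Returning the event lets it be sent; returning None would drop it.
    return event


# Hypothetical registration (requires the sentry_sdk package and a real DSN):
# sentry_sdk.init(dsn="...", before_send=attach_fingerprint)
```

Events logged without a fingerprint pass through unchanged, so Sentry's default grouping still applies to them.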
In general, the more custom data that the logger.error message contains, the better Sentry will group the events. In the example above, without the explicit fingerprint, all "No citations found" errors were being grouped in the same Sentry Issue. A better error message would add the opinion and court id: logger.error(f"No citations found for {opinion.id} and {court_id}", extra=...)
Uncontrolled exceptions are exceptions that were not inside a try/except block.
For example, a standard library IndexError, or a Django model's IntegrityError.
They are sent to Sentry explicitly by using sentry_sdk.capture_exception.
For example, from courtlistener's cl/scrapers/management/commands/cl_scrape_opinions.py:
module_string = mod.Site().court_id
try:
    self.parse_and_scrape_site(mod, options["full_crawl"])
except Exception as e:
    capture_exception(
        e, fingerprint=[module_string, "{{ default }}"]
    )
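The same pattern can be wrapped in a small helper so that every scraper run reports under its own fingerprint. This is a sketch under stated assumptions: run_with_fingerprint and its capture parameter are hypothetical names introduced for illustration, and capture is expected to behave like sentry_sdk.capture_exception, which accepts a fingerprint keyword argument.

```python
from typing import Any, Callable


def run_with_fingerprint(
    key: str,
    task: Callable[[], Any],
    capture: Callable[..., Any],
) -> Any:
    """Run `task`; on failure, report the exception grouped by `key`.

    `capture` is injected so the helper can be exercised without a
    Sentry client; in an application it would be
    sentry_sdk.capture_exception.
    """
    try:
        return task()
    except Exception as exc:
        # The differentiating key goes first; "{{ default }}" keeps
        # Sentry's own grouping as the second component.
        capture(exc, fingerprint=[key, "{{ default }}"])
        return None


# Example with a stand-in for sentry_sdk.capture_exception:
reported = []


def fake_capture(exc: BaseException, **kwargs: Any) -> None:
    reported.append((type(exc).__name__, kwargs["fingerprint"]))


def failing_scraper() -> None:
    raise ConnectionError("site unreachable")


run_with_fingerprint("nysupct", failing_scraper, fake_capture)
```

Injecting the capture function keeps the helper testable without network access or a configured DSN.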
Sentry expects a list as the value of fingerprint. The order of that list matters.
From the previous example,
capture_exception(
    e, fingerprint=[module_string, "{{ default }}"]
)
will have a different grouping than
capture_exception(
    e, fingerprint=["{{ default }}", module_string]
)
for the same error. The differentiating key should come first.
The same applies for fingerprints that consist of a single string inside the list:
- [f"{court_id} - {logged_error}"] will separate issues by court_id
- [f"{logged_error} - {court_id}"] will mix different courts into the same issue
Some code that tests this can be seen here
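To keep the ordering consistent across call sites, the fingerprint can be built by small helpers that always put the differentiating key first. A minimal sketch follows; keyed_fingerprint and keyed_message_fingerprint are hypothetical names, not part of Sentry's SDK or of Courtlistener.

```python
def keyed_fingerprint(key: str, *rest: str) -> list[str]:
    """Return a fingerprint list with the differentiating key first.

    With no extra components, Sentry's "{{ default }}" placeholder is
    appended so the default grouping still applies within each key.
    """
    return [key, *(rest or ("{{ default }}",))]


def keyed_message_fingerprint(key: str, logged_error: str) -> list[str]:
    """Single-string variant for logged errors: key first, then the error."""
    return [f"{key} - {logged_error}"]
```

For example, keyed_fingerprint(court_id) could be passed as the fingerprint argument of capture_exception, and keyed_message_fingerprint(court_id, "No citations found") could be passed under the "fingerprint" key of extra in a logger.error call.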
At the bottom of a Sentry issue, there is an "Event Grouping Information" section.
If the fingerprinting worked, it should say Grouped by: custom fingerprint.
Sentry may also take the custom fingerprint into account without giving it 100% weight. This can be checked by expanding the "Event Grouping Information" section, which may also show some of these values:
- Grouped by: exception stack-trace, in-app exception stack-trace. This message appears for issues that group uncontrolled exceptions.
- Grouped by: message. This appears on logged errors.