Skip to content

Introduce internal event "quality score" for outgoing events #10882

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
HazAT opened this issue Mar 1, 2024 · 17 comments
Closed

Introduce internal event "quality score" for outgoing events #10882

HazAT opened this issue Mar 1, 2024 · 17 comments
Assignees

Comments

@HazAT
Copy link
Member

HazAT commented Mar 1, 2024

Problem Statement

Javascript errors/events can be very noisy - and our SDK today send a lot of data.
We have simple mechanisms in place like InboundFilters

const DEFAULT_IGNORE_ERRORS = [
/^Script error\.?$/,
/^Javascript error: Script error\.? on line 0$/,
/^ResizeObserver loop completed with undelivered notifications.$/,
];

to remove specific errors that and not worth sending.

Tho this is not enough and still a lot of clutter reaches Sentry, from weird chrome extension errors to other edge cases of low quality events that are just noise (no good message / no stacktrace)

Solution Brainstorm

What if we add a very simple internal "Quality Score" for an error/event in the SDK and only if it reaches a certain value, we actually send it.

For example:

new Error(undefined) & no message & no stack trace
-> Drop

No exception, no stack trace, no message
-> Drop

Weird edge cases we identified ourselves by looking at Sentry Events
-> Drop

A stack trace with no message and only one single frame with <anonymous>
-> Drop

It could operate on the payload/JSON level to make it relatable to what enters Sentry.

@HazAT HazAT added this to the 8.0.0 milestone Mar 1, 2024
@antonpirker
Copy link
Member

Could this be done in relay so it works for all SDKs?

@HazAT
Copy link
Member Author

HazAT commented Mar 1, 2024

In theory, yes BUT
A lot of things depend on errors in the SDK.
For example: We trigger Replays for errors in SDKs - if it's just a worthless error, we also shouldn't trigger a Replay.

The same is true for transactions; we set the status to interal_error in transactions if an error happens - and again, if the error is not actually a relevant error, we mark perfectly fine transactions as "errored".

So this could be technically done in Relay to then also drop a Replay/transaction when there is a crappy error - tho it's super complex and not easy to implement and might even require tail-based sampling.

@Lms24
Copy link
Member

Lms24 commented Mar 5, 2024

I think this is a good idea and something we should explore a bit more.

new Error(undefined) & no message & no stack trace
-> Drop

No exception, no stack trace, no message
-> Drop

Big +1, probably easy to do in the SDK with little code (and yes probably SDK specific how such low quality errors can be identified) 👍

Weird edge cases we identified ourselves by looking at Sentry Events
-> Drop

Generally yes but I'd say these depend a lot on how we'd identify such individual edge cases. For instance, only by error message? we should consider adding them as a server-side inbound filter. checking against strings (or RegExp) could become bundle size intensive depending on how many such cases we handle. IIRC we already handle the infamous Script Error this way. Might be worth exploring if we could move this to the server.

If there's a deeper pattern (I dunno think something like, only anonymous stack frames or a certain amount of frames in combination with another factor) it probably makes sense to keep this in the SDK.

@HazAT
Copy link
Member Author

HazAT commented Mar 5, 2024

Yeah there is a balance to strike - but the goal is not to send them from the SDK because so much already is connected to errors (what I mentioned, Replays/Transactions etc.)

@scefali
Copy link
Contributor

scefali commented Mar 5, 2024

I may not have standing here, but the SDK seems like the wrong place to do it, especially for browser JS since this code will add to bundle size. We also require folks to update SDKs to get these changes (obviously) which means that doing this in replay is a higher leverage change by the number of users impacted immediately. I see there are some downsides doing this in Relay but I think we should do more investigation before we decide.

@mydea
Copy link
Member

mydea commented Mar 6, 2024

So just to add a perspective for Replay: Replay's error sampling is based on a 200 response when an error event is sent. So in order to keep this working "as expected", either we need to drop errors in the SDK, or they need to be dropped early enough that we actually get a non-200 status code. If errors are dropped anywhere later (e.g. as currently for inbound filters) the SDK does not know about it and thus replay sampling cannot take it into consideration.

@Lms24
Copy link
Member

Lms24 commented Apr 3, 2024

The googletag error should already be filtered out since 7.108/8.0.0-alpha.5 (via #11210 and #11208).

The other one (Cannot read properties of null (reading 'content')) is very generic. It has a stack trace, a generic filename and an error message that's similar to actual "valid" errors. We'd probably need to check for error kind + message + 1 stack frame from app:///popup.js to avoid filtering false positves.

@HazAT
Copy link
Member Author

HazAT commented Apr 3, 2024

app:///popup.js indicates that is is probably happening within a Chrome extension and for some reason our SDKs picks it up.

@HazAT
Copy link
Member Author

HazAT commented Apr 16, 2024

Another one
image
Permission denied to access property "correspondingUseElement"
ref: bitwarden/clients#7837

@HazAT
Copy link
Member Author

HazAT commented Apr 18, 2024

Consider: getsentry/sentry#69096

HazAT added a commit to getsentry/relay that referenced this issue Apr 18, 2024
…nd filter (#3442)

There are some errors in Sentry that throw this:


https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Errors/Property_access_denied

Often these errors are caused by extensions that do things that are not
allowed.
The samples we found in Sentry are thrown in minified React bundles.

ref:
getsentry/sentry-javascript#10882 (comment)
@Lms24
Copy link
Member

Lms24 commented Apr 18, 2024

RE getsentry/sentry#69096 and related issues where filtering on the message isn't enough. I think in theses cases we want to make filtering decisions on the original stack trace, meaning ideally, before our SDK integrations and event processors come in and potentially modify the stack frames (e.g. rewriteFramesIntegration). Perhaps we can use the new preprocessEvent integration hook to filter which should run before other event processors.

@lforst
Copy link
Member

lforst commented Apr 18, 2024

BIG BRAIN idea we just had: What if users would be able to tell Sentry "this is a list of all the static JS files I am deploying, please ignore everything that doesn't come from those files".

We have all the building blocks already: sentry-cli sourcemaps inject, bundler plugins, ...

@HazAT
Copy link
Member Author

HazAT commented May 28, 2024

image

@lforst
Copy link
Member

lforst commented May 29, 2024

Dumping a few thoughts for posterity:

  • Assuming we drop all events that do not have a stack trace, what happens to events captured through captureMessage which often don't have a stack trace?
  • What if somebody wants to manually and very intentionally capture an error but we drop it because of its quality? That would be very intransparent.
  • I looked around in the docs repo and 90% of the noise came from browser extensions or similar.
  • We could add an option to filter js-in-html errors.
  • We can probably throw away "failed to fetch" errors without stack trace
  • Screenshot 2024-05-29 at 14 50 28 Stuff like this looks generally looks unactionable. If you think it further though, the error message could have also been myVeryWellKnownFunction is not a function, and you would instantly know that you are accessing that function even though it's not defined. So we likely shouldn't throw those errors away (?).

Immediate plan of action based on my thoughts and research: Since the vast majority of noisy issues in the internal Sentry Sentry project or the Sentry Docs project come from browser extensions or similar, it makes most sense to a) work on getting the third party code filter integration out b) improve documentation around allowUrls and make it more prominent in the docs.

Next, we can filter out:

  • errors with handled: false, that neither have a message nor a stack trace - those can always go
  • "failed to fetch" errors without a stacktrace (maybe we can look into patching fetch to attach a stack trace as a next step?)
  • js-in-html errors (if the user has opted in)

@lforst
Copy link
Member

lforst commented Jun 3, 2024

Thread to revisit when we ship: https://www.reddit.com/r/nextjs/comments/1d5fxx8/anyone_using_sentry_while_not_getting_overwhelmed/

@AbhiPrasad AbhiPrasad removed this from the 8.0.0 milestone Jun 4, 2024
@lforst
Copy link
Member

lforst commented Jun 25, 2024

Summing up the changes we did:

  • Drop errors that are missing message and stack trace
  • Attach stack trace to fetch errors if they dont have one
  • Extend our default filters with ones that users get a lot but are unactionable
  • Add a new way of dropping errors unrelated code: https://sentry.io/changelog/ignore-errors-that-dont-come-from-your-code/
  • Educate people via docs, changelog and blog posts

@lforst lforst closed this as completed Jun 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants