-
-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
LocalVariables integration - race condition processing of cached frames #13415
Comments
I'm hoping with the digging I've done to root cause these it shouldn't require a high effort repro example -- but if it's needed I can create one (it'll just take me some time that I don't immediately have). |
For some clarity/context: This happens inconsistently but often in my application - I notice when on first startup, the first time an error is captured it will sometimes NOT have local variables attached to it. Some extra logging I added shows that this is because the worker sends the callFrames after the exception is processed - a race condition. To make this more easy to test/reproduce, you can add a temporary You can see the result in our sentry instance here:
Personally, I'm thinking there should be some mechanism where the exception capturing half will wait for the worker to report on the exception. There's some problems with that though:
So, maybe an even more ideal mechanism comes if we have a way of somehow linking the sentry exceptions with the exact debugger frame related to it, instead of just hoping they match. Then we don't have this weird additional behaviour with frames colliding. Thoughts? |
Hey @Bruno-DaSilva thanks for writing in! We'll look into your issue next week as this week is Hackweek at Sentry (see #13421). |
I think you're right, the simplest solution is to delay before checking for local variables and we can include a timeout. |
@timfish one wrench in that I'm thinking about is it may still have a race condition, albeit a smaller one: what if two similar requests/exceptions come in at once and both hit the debugger? Both exceptions will save in the LRU in the same key so only one will persist, and then the delay will finish and it will be attached to just one of the two What do you think? It's all rooted in the fact we can't match the thrown exceptions to their exact parent request from |
Yeah it's tricky. We could put the exception variables in a stack and only read from one end and we'd be more sure that we're getting them in the correct order. This would likely be reliable if we only had to consider uncaught exceptions. However, for caught exceptions there's nothing stopping users from calling |
I wonder if we could generate+inject a unique uuid on exception creation or exception throw that could then be used to match? I can't immediately think of a way of doing that for every possible exception without patching node internals though... EDIT: could we use the memory address for this (or perhaps some other very unique, unchanging id that might exist in an exception) for the exception? |
From the debugger we can modify almost anything in the app state so I wonder if we could write a unique ID property the error object which we could then read off later in runtime as it passes through the Sentry SDK 🤔 |
So I don't think we can't do this at the point when the debugger is initially paused. We'd need to step further on in execution until the error object is part of the scope. This would again open us up to race conditions with multiple exceptions occurring closely together. |
One (terrible) option is to monkeypatch |
@timfish I think I found a way to do what you were looking for. Check this out:
await session.post('Runtime.callFunctionOn', { functionDeclaration: 'function() { this.brunosId = "asfg"; }', objectId: data.objectId });
What do you think? |
Is there an existing issue for this?
How do you use Sentry?
Sentry Saas (sentry.io)
Which SDK are you using?
@sentry/node
SDK Version
8.26.0
Framework Version
Node v20.12.2
Link to Sentry event
First error's local variables (endpoint
fees/33/fee
) are blank: https://clearbanc.sentry.io/issues/5724968061/events/13ac4e69d73e4cb6b79160830ef972f7/First error's local variables are on the SECOND error's call (endpoint
fees/44/fee
) https://clearbanc.sentry.io/issues/5724968061/events/f697d4fe012e408d8fea4241d6d3c786/Reproduction Example/SDK Setup
SDK setup:
Unfortunately because it's a race condition it may be hard to reproduce without very specific code, error behaviour, and hardware. Hopefully you can trust that it does happen as an edge case and we can find some way to gate frame checks to wait for the debugger or something?
Steps to Reproduce
includeLocalVariables: true
Expected Result
Sentry UI should show exception #1 with state A and exception #2 with state B
Actual Result
Sentry UI shows exception #1 with NO state, and exception #2 with state A.
I dug into this and it appears there is a race condition with cachedFrames where we check the LRU cache for frames before the worker sends the frame info message to populate the cache. So then the second time the error is captured, it will receive the previous frame's information that was never consumed the first time. And then the worker sends the second frame's late, repeating ad nauseum.
The text was updated successfully, but these errors were encountered: