-
-
Notifications
You must be signed in to change notification settings - Fork 206
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Backpressure Management #2410
Comments
What about using TPL Dataflow? |
The python implementation appears to use a combination of whether the queue is full or whether requests have been rate limited to adjust a downsample factor. This is then used to reduce the number of traces that get captured. Curiously, it looks like rate limits across any category would cause downsampling (not just rate limits applied to traces). It would be good to understand if that was deliberate and, if so, what the thinking was behind that design choice. @bitsandfoxes are we looking to mimic what the Python SDK is doing in .NET or are we also considering strategies like the one Matt originally proposed (some way to implement priority queues). |
We do have some docs on backpressure.
If traces get rate limited errors get downsampled? |
No the other way around... Downsampling only affects traces in the Python implementation. However if Metrics or Errors get rate limited, this would cause traces to be downsampled, if I've understood that code correctly. |
After talking to the other teams I think we should mimic the other SDKs.
|
Problem Statement
The internal
BackgroundWorker
class manages back-pressure for the SDK. When events, transactions, sessions, etc. are captured, theSentryClient
adds them to aConcurrentQueue
within the background worker. The background worker then has a separate task that is responsible for reading from the queue and sending events to theITransport
implementation (typically that's theHttpTransport
).This is generally a good design. However, it suffers from the limitation that there's only one queue for all envelope types, non-prioritized, with a single limit set by
SentryOptions.MaxQueueItems
- defaulting to 30. If the queue is full, envelopes are dropped when captured, never being added to the queue. Therefore, it's quite possible (especially in a server environment) that a large flood of one type of envelope (such as transactions) can prevent another more important type of envelope (such as error events) from being sent to Sentry.Solution Brainstorm
We need to support concurrency on the producer side, so the built-in
PriortyQueue
class is out. We need to be non-blocking on the consumer side, soBlockingCollection
is also out. There is no built-inPriortyConcurrentQueue
class. Thus, to implement prioritization, we will probably need to use multipleConcurrentQueue
instances in the background worker - perhaps in a dictionary where the key is the envelope type.We also should have more than one option for controlling the maximum queue size (this might be separate properties, or a dictionary). Currently we just have
MaxQueueItems
- which is undocumented.We should probably set higher default queue sizes for ASP.NET and ASP.NET Core. Currently,
MaxQueueSize
defaults to 30. We do show setting it higher in some of the sample apps, but with no explanation.Whatever is decided in getsentry/team-sdks#10 should be taken into consideration also.
We potentially have a solution in this spike (although it may need tweaking):
References
The text was updated successfully, but these errors were encountered: