ref(metrics): Partition and split metrics buckets just before sending #2682
Conversation
Any chance we can split this PR? 2,000+ LOC is complicated to follow in a review.
I tried to make a visualization.

Before:

```mermaid
graph TD
    Project -->|RateLimitBuckets| EnvelopeProcessorService
    Project -->|MergeBuckets| AggregatorService
    EnvelopeProcessorService -->|MergeBuckets| AggregatorService
    AggregatorService -->|FlushBuckets| ProjectCacheService
    ProjectCacheService -->|SendMetrics| EnvelopeManagerService
    EnvelopeManagerService -->|SendRequest| UpstreamRelay
```

After:

```mermaid
graph TD
    Project -->|RateLimitBuckets| EnvelopeProcessorService
    Project -->|MergeBuckets| AggregatorService
    EnvelopeProcessorService -->|MergeBuckets| AggregatorService
    AggregatorService -->|FlushBuckets| ProjectCacheService
    ProjectCacheService -->|SendMetrics| EnvelopeManagerService
    EnvelopeManagerService -->|EncodeMetrics| EnvelopeProcessor
    EnvelopeProcessor -->|SubmitEnvelope| EnvelopeManagerService
```

Edit, your newest one:

```mermaid
graph TD
    Project -->|RateLimitBuckets| EnvelopeProcessorService
    Project -->|MergeBuckets| AggregatorService
    EnvelopeProcessorService -->|MergeBuckets| AggregatorService
    AggregatorService -->|FlushBuckets| ProjectCacheService
    ProjectCacheService -->|EncodeMetrics| EnvelopeProcessor
    EnvelopeProcessor -->|SubmitEnvelope| EnvelopeManagerService
```
As Iker said, if you're able to split this up, that would be greatly appreciated. It's hard to confidently approve a PR with so many changes.
@TBS1996 @iker-barriocanal the PR is now split into 2 commits.
I think the PR is looking very good; a lot of the code looks much cleaner now. It's difficult to keep everything in my working memory, though, which makes it hard to approve. By the way, it helps if you leave some comments on the GitHub diff explaining which functions you've changed and which you've only moved, or noting when you've both moved and renamed a variable. That isn't really necessary in smaller PRs, but here it would have been very helpful. I'll probably approve soon; in the meantime you could ping someone else to take a look too, since there should hopefully be more than one reviewer.
This PR introduces some interesting logic, but I think it's trying to do too many things at once.
Slicing metrics without additional allocations is a good performance improvement. However:
- It introduces a case that may result in panics. We should not introduce panics, and we should make our best effort to identify and mitigate them.
- It adds a significant amount of additional complexity to fix a problem I'm not sure we have -- do we actually have a performance issue with metric bucket splitting?
- I believe the previous implementation produces full metric buckets (except the last one), which seems to no longer be the case. I'm not sure what the impact of this change is, but sending more buckets over the wire may have higher performance implications than splitting buckets in memory.
- The complexity (not just size) makes providing a meaningful review difficult.
Let me know if I'm missing or misunderstanding something.
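To make the allocation-free slicing discussed above concrete, here is a minimal sketch under simplified assumptions: size-limited chunks are produced as sub-slices over the original bucket list, so no bucket data is cloned. The `Bucket` type, the `estimated_size` heuristic, and `split_by_size` are illustrative names, not Relay's actual types or API.

```rust
// Sketch only: a simplified stand-in for a metric bucket.
#[derive(Debug)]
struct Bucket {
    name: String,
    values: Vec<f64>,
}

impl Bucket {
    /// Rough cost estimate used to enforce a per-request size budget.
    fn estimated_size(&self) -> usize {
        self.name.len() + self.values.len() * std::mem::size_of::<f64>()
    }
}

/// Yields consecutive sub-slices of `buckets`, each staying under `max_bytes`
/// where possible (a chunk always contains at least one bucket). Only slice
/// indices are tracked, so no bucket data is cloned or re-allocated.
fn split_by_size<'a>(
    buckets: &'a [Bucket],
    max_bytes: usize,
) -> impl Iterator<Item = &'a [Bucket]> + 'a {
    let mut start = 0;
    std::iter::from_fn(move || {
        if start >= buckets.len() {
            return None;
        }
        let mut end = start;
        let mut size = 0;
        while end < buckets.len() {
            let next = buckets[end].estimated_size();
            if end > start && size + next > max_bytes {
                break;
            }
            size += next;
            end += 1;
        }
        let chunk = &buckets[start..end];
        start = end;
        Some(chunk)
    })
}

fn main() {
    let buckets = vec![
        Bucket { name: "c:transactions/count".into(), values: vec![1.0; 100] },
        Bucket { name: "d:transactions/duration".into(), values: vec![0.5; 500] },
        Bucket { name: "g:custom/queue_depth".into(), values: vec![42.0] },
    ];
    // Chunk the buckets under an illustrative 2 KiB budget per request.
    for (i, chunk) in split_by_size(&buckets, 2048).enumerate() {
        println!("request {i}: {} bucket(s)", chunk.len());
    }
}
```

Note that with a purely size-based split like this, a chunk boundary can fall anywhere, which is where the concern about no longer sending "full" buckets comes from.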
Overall this looks good to me.
I would suggest avoiding the code that could panic; with that addressed, I think we are good to go and can test it in production.
@iker-barriocanal sorry for the late response, some points addressed:
It's 2 things, basically: there is a way to keep the splitting in the aggregator by reference counting the split buckets, but I feel this introduces a bigger mental overhead, and it also does not logically belong in the aggregator.
Panics are gone now.
I do not have actual numbers on this; it all depends on the volume of data. This means that with fewer Relay instances the problem actually becomes worse.
I am not sure what you mean by this. The logic should have stayed the same: if you split on the last one, there will also be a split bucket at the start of the next batch.
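A purely illustrative sketch of the reference-counting alternative mentioned above (not Relay's code): the split halves share one reference-counted bucket instead of cloning its values, at the cost of tracking shared ownership. `Bucket` and `BucketSlice` are hypothetical names.

```rust
use std::ops::Range;
use std::sync::Arc;

// Stand-in bucket type for the sketch.
#[derive(Debug)]
struct Bucket {
    name: String,
    values: Vec<f64>,
}

/// A "split" that copies nothing: it shares ownership of the bucket and
/// remembers which range of values it covers.
struct BucketSlice {
    bucket: Arc<Bucket>,
    range: Range<usize>,
}

impl BucketSlice {
    fn values(&self) -> &[f64] {
        &self.bucket.values[self.range.clone()]
    }
}

fn main() {
    let bucket = Arc::new(Bucket {
        name: "d:transactions/duration".into(),
        values: vec![0.5; 1000],
    });

    // Both halves point at the same allocation; only the reference count grows.
    let first = BucketSlice { bucket: Arc::clone(&bucket), range: 0..500 };
    let second = BucketSlice { bucket: Arc::clone(&bucket), range: 500..1000 };

    println!(
        "{} ({} refs): halves of {} and {} values",
        bucket.name,
        Arc::strong_count(&bucket),
        first.values().len(),
        second.values().len(),
    );
}
```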
The roundtrip through the envelope manager is gone now. Metrics are sent directly to the processor instead.
* master: (27 commits)
  * ref(metric-meta): Add metric for total incoming metric meta (#2784)
  * feat(server): Return global config status for downstream requests (#2765)
  * ref(processor): Create event processor sub-module with related code (#2779)
  * fix(metrics): Temporarily restore previous configuration keys for bucket splitting (#2780)
  * feat(metrics): Add source context to code locations (#2781)
  * ref(processor): Split off profile processor code into separate sub-module (#2778)
  * ref(processor): Split off replay processing code into separate sub-module (#2776)
  * ref(metrics): Partition and split metrics buckets just before sending (#2682)
  * feat(spans): Allow well-known path segments in resource URLs (#2770)
  * ref(processor): Move user and client reports processing into separate submodule (#2772)
  * ref(crons): Include message_type in kafka message (#2723)
  * ref(spans): Split tag mapping in specific configs (#2773)
  * release: 23.11.2
  * feat(metric-meta): Normalize invalid metric names (#2769)
  * feat(spans): Extract main_thread tag for spans (#2761)
  * Add DE Deployments (#2746)
  * ref(processor): Move sessions related code into separate sub-module (#2768)
  * ref(metric-meta): Capture envelope payload in a sentry issue (#2767)
  * release: 0.8.38
  * ref(normalization): Restore span processing to transactionprocessor (#2764)
  ...
Moves metric partitioning and splitting into the envelope processor. Metrics are now sliced without any allocations.
Metric outcomes are also tracked now instead of merging buckets back into the aggregator.
Tracking outcomes for metrics has the side effect that all Relay-generated envelopes for metrics are now properly tracked, which impacts the accepted metric for envelopes (relay.event.accepted). This will cause a considerable bump in that metric, but it does not actually indicate a change in the overall number of envelopes accepted by Relay.
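For illustration, here is a hedged sketch of what partitioning just before sending can look like: buckets are grouped by a stable hash of an identifying key modulo the partition count, so the same series always lands in the same partition. The `Bucket` type, the choice of the bucket name as the partition key, and the use of `DefaultHasher` are assumptions for this sketch and do not reflect Relay's exact keying (in practice a hash that is stable across processes would be required).

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

// Stand-in bucket type for the sketch.
#[derive(Debug)]
struct Bucket {
    name: String,
    values: Vec<f64>,
}

/// Groups buckets into `partitions` groups so that the same bucket name
/// consistently lands in the same partition.
fn partition_buckets(buckets: Vec<Bucket>, partitions: u64) -> BTreeMap<u64, Vec<Bucket>> {
    let mut grouped: BTreeMap<u64, Vec<Bucket>> = BTreeMap::new();
    for bucket in buckets {
        let mut hasher = DefaultHasher::new();
        bucket.name.hash(&mut hasher);
        let key = hasher.finish() % partitions;
        grouped.entry(key).or_default().push(bucket);
    }
    grouped
}

fn main() {
    let buckets = vec![
        Bucket { name: "c:transactions/count".into(), values: vec![1.0, 2.0] },
        Bucket { name: "d:transactions/duration".into(), values: vec![10.0] },
        Bucket { name: "c:transactions/count".into(), values: vec![4.0] },
    ];
    // Each partition's group could then be size-split and sent as its own request.
    for (partition, group) in partition_buckets(buckets, 4) {
        println!("partition {partition}: {} bucket(s)", group.len());
    }
}
```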