-
Notifications
You must be signed in to change notification settings - Fork 527
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement queue to forward traces to metrics-generator #1331
Conversation
00b61b7
to
7848ea9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work. I feel strongly that a go channel approach would be preferred. Let's discuss.
Edit:
Just thought of another question. How do we make sure these queues are drained before during shutdown?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few thoughts. I'm concerned about two things
- Each queue getting its own set of workers. It would be far simpler for all queues to share the same worker pool.
- Managing the overrides in the queueManager. It feels like there's a simpler pattern where the forwarder sees that an update occurred, creates a new queueManager for the tenant and adds it to the map with the new configuration, and then just tells the existing queueManager to drain itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd be very interested to see how this performs in a bigger cluster! This will reduce the concurrency with which we process requests, so I'm curious to see what amount of workers works well.
4f85511
to
e3afa0b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some comments about how we manage the queueManager
's, I think we can improve locking a bit so writes don't block each other.
c05c36a
to
df8c71d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I've left some more comments. Thanks for the replies, the structure is making more sense now 🙂
I think these are the essential metrics for monitoring monitoring distributor -> metrics-generator traffic:
tempo_distributor_forwarder_queue_length
tempo_distributor_forwarder_pushes
tempo_distributor_forwarder_dropped_pushes
tempo_distributor_metrics_generator_pushes_total
tempo_distributor_metrics_generator_pushes_failures_total
Should we rename tempo_distributor_forwarder_dropped_pushes
to tempo_distributor_forwarder_pushes_failures_total
?
Use eviciting queue to buffer and send push requests from the distributor to the generator. This queue will drop the oldest item in the queue if it's full and a new request is pushed.
df8c71d
to
ffd1033
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, this is looking good. The experiments in our internal environments also showed good results :)
I'm ready to approve and merge this PR, just left a comment about the metrics we are recording.
What this PR does:
The distributor uses a forwarder to queue and forward requests to the metrics generators.
This queue will drop the oldest item in the queue if it's full and a new request is pushed.
Which issue(s) this PR fixes:
Contributes to #1303
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]