Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement queue to forward traces to metrics-generator #1331

Merged
merged 26 commits into from
Apr 22, 2022

Conversation

mapno
Copy link
Member

@mapno mapno commented Mar 8, 2022

What this PR does:

The distributor uses a forwarder to queue and forward requests to the metrics generators.

This queue will drop the oldest item in the queue if it's full and a new request is pushed.

Which issue(s) this PR fixes:
Contributes to #1303

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@mapno mapno force-pushed the distributor-generator-queue branch 4 times, most recently from 00b61b7 to 7848ea9 Compare March 15, 2022 10:04
@mapno mapno marked this pull request as ready for review March 15, 2022 11:39
Copy link
Member

@joe-elliott joe-elliott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work. I feel strongly that a go channel approach would be preferred. Let's discuss.

Edit:
Just thought of another question. How do we make sure these queues are drained before during shutdown?

modules/distributor/distributor.go Show resolved Hide resolved
modules/distributor/forwarder.go Outdated Show resolved Hide resolved
modules/distributor/forwarder.go Outdated Show resolved Hide resolved
modules/distributor/forwarder.go Outdated Show resolved Hide resolved
modules/distributor/forwarder.go Outdated Show resolved Hide resolved
modules/distributor/forwarder.go Outdated Show resolved Hide resolved
Copy link
Member

@joe-elliott joe-elliott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few thoughts. I'm concerned about two things

  1. Each queue getting its own set of workers. It would be far simpler for all queues to share the same worker pool.
  2. Managing the overrides in the queueManager. It feels like there's a simpler pattern where the forwarder sees that an update occurred, creates a new queueManager for the tenant and adds it to the map with the new configuration, and then just tells the existing queueManager to drain itself.

modules/distributor/distributor.go Outdated Show resolved Hide resolved
modules/distributor/forwarder.go Outdated Show resolved Hide resolved
Copy link
Member

@yvrhdn yvrhdn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be very interested to see how this performs in a bigger cluster! This will reduce the concurrency with which we process requests, so I'm curious to see what amount of workers works well.

modules/distributor/distributor.go Outdated Show resolved Hide resolved
modules/distributor/distributor.go Outdated Show resolved Hide resolved
modules/distributor/distributor.go Outdated Show resolved Hide resolved
modules/distributor/distributor.go Outdated Show resolved Hide resolved
modules/distributor/distributor.go Outdated Show resolved Hide resolved
modules/overrides/overrides.go Outdated Show resolved Hide resolved
modules/distributor/forwarder.go Outdated Show resolved Hide resolved
modules/distributor/forwarder.go Outdated Show resolved Hide resolved
modules/distributor/forwarder.go Outdated Show resolved Hide resolved
modules/distributor/forwarder_test.go Show resolved Hide resolved
@mapno mapno force-pushed the distributor-generator-queue branch 2 times, most recently from 4f85511 to e3afa0b Compare March 30, 2022 14:43
Copy link
Member

@yvrhdn yvrhdn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments about how we manage the queueManager's, I think we can improve locking a bit so writes don't block each other.

modules/distributor/forwarder.go Outdated Show resolved Hide resolved
modules/distributor/forwarder.go Outdated Show resolved Hide resolved
modules/distributor/forwarder.go Show resolved Hide resolved
modules/distributor/forwarder.go Show resolved Hide resolved
modules/distributor/forwarder.go Outdated Show resolved Hide resolved
modules/distributor/forwarder.go Outdated Show resolved Hide resolved
modules/distributor/forwarder.go Outdated Show resolved Hide resolved
modules/distributor/forwarder.go Show resolved Hide resolved
modules/distributor/forwarder.go Outdated Show resolved Hide resolved
modules/distributor/forwarder.go Outdated Show resolved Hide resolved
@mapno mapno force-pushed the distributor-generator-queue branch from c05c36a to df8c71d Compare April 5, 2022 16:17
Copy link
Member

@yvrhdn yvrhdn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I've left some more comments. Thanks for the replies, the structure is making more sense now 🙂

I think these are the essential metrics for monitoring monitoring distributor -> metrics-generator traffic:

tempo_distributor_forwarder_queue_length
tempo_distributor_forwarder_pushes
tempo_distributor_forwarder_dropped_pushes
tempo_distributor_metrics_generator_pushes_total
tempo_distributor_metrics_generator_pushes_failures_total

Should we rename tempo_distributor_forwarder_dropped_pushes to tempo_distributor_forwarder_pushes_failures_total?

modules/overrides/limits.go Outdated Show resolved Hide resolved
modules/distributor/forwarder.go Outdated Show resolved Hide resolved
modules/distributor/forwarder.go Show resolved Hide resolved
modules/distributor/forwarder.go Outdated Show resolved Hide resolved
modules/distributor/forwarder.go Outdated Show resolved Hide resolved
modules/distributor/forwarder.go Show resolved Hide resolved
@mapno mapno force-pushed the distributor-generator-queue branch from df8c71d to ffd1033 Compare April 18, 2022 08:40
Copy link
Member

@yvrhdn yvrhdn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, this is looking good. The experiments in our internal environments also showed good results :)
I'm ready to approve and merge this PR, just left a comment about the metrics we are recording.

modules/distributor/forwarder.go Outdated Show resolved Hide resolved
modules/overrides/limits.go Outdated Show resolved Hide resolved
@mapno mapno merged commit f2406df into grafana:main Apr 22, 2022
@mapno mapno deleted the distributor-generator-queue branch April 22, 2022 14:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants