
receive: Implement head series limits per tenant #5333

Closed
wants to merge 2 commits

Conversation

@onprem (Member) commented May 5, 2022

  • I added CHANGELOG entry for this change.
  • Change is not relevant to the end user.

Changes

Add the ability to limit the total number of series in head per tenant per receive replica. The limit can be specified as an additional HTTP header (X-Thanos-Series-Limit by default) on the remote write request.

As mentioned above, this limit is per tenant, per replica, which means that with the current implementation a tenant can write more series in total than the specified limit. For example, in a hashring with 3 Receive instances and a series limit of 100, the tenant can write at most 300 active series (assuming equal distribution of series across all 3 nodes). The effective total can be less than 300 if the load is not equally distributed and one node hits the limit earlier, but it will never be less than 100.
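
To make the mechanism concrete, here is a minimal sketch in Go, not the PR's actual code, of how a replica could read the limit from the request header and refuse ingestion once the tenant's local head series count reaches it. parseSeriesLimit, admitWrite and the headSeries argument are illustrative names; the real change enforces the limit inside the writer and only when a new series would be created, as the diff excerpts further down show.

package receive

import (
	"net/http"
	"strconv"
)

// Default header name from the PR description.
const seriesLimitHeader = "X-Thanos-Series-Limit"

// parseSeriesLimit reads the optional limit header; 0 means "no limit".
func parseSeriesLimit(r *http.Request) (uint64, error) {
	v := r.Header.Get(seriesLimitHeader)
	if v == "" {
		return 0, nil
	}
	return strconv.ParseUint(v, 10, 64)
}

// admitWrite rejects a remote-write request once the tenant's local head
// series count has reached the limit; headSeries would come from the
// tenant's TSDB head on this replica.
func admitWrite(w http.ResponseWriter, r *http.Request, headSeries uint64) bool {
	limit, err := parseSeriesLimit(r)
	if err != nil {
		http.Error(w, "invalid "+seriesLimitHeader+" header", http.StatusBadRequest)
		return false
	}
	if limit > 0 && headSeries >= limit {
		http.Error(w, "per-tenant head series limit reached", http.StatusTooManyRequests)
		return false
	}
	return true
}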

Alternative

An alternative approach is to calculate an effective limit using the replication factor and the number of nodes in the hashring. The limit actually used for local TSDB writes would then be defined_limit * replication_factor / num_nodes.

So essentially, if the overall limit we want is 150 and we have 3 nodes with replication factor 1, we get a per-node local limit of 50 (150 * 1 / 3).
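
For illustration only, a sketch of that calculation (the function name and the integer arithmetic are mine, not from the PR):

// localSeriesLimit derives the per-node limit from the desired overall limit.
func localSeriesLimit(definedLimit, replicationFactor, numNodes uint64) uint64 {
	if numNodes == 0 {
		return definedLimit
	}
	return definedLimit * replicationFactor / numNodes
}

// Example: localSeriesLimit(150, 1, 3) == 50, matching the case above.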

However, this assumes that the data is equally distributed among all nodes. In practice, one node can hit the 50-series limit earlier than the others, effectively denying service before the tenant actually reaches the 150-series limit.

Verification

Tested locally with multiple configurations of Receive including split Router and Ingester mode.

onprem added 2 commits May 5, 2022 17:49
Add the ability to limit the total number of series in head per tenant
per receive replica. The limit can be specified as an additional HTTP
header on the remote write request.

Signed-off-by: Prem Saraswat <prmsrswt@gmail.com>
Signed-off-by: Prem Saraswat <prmsrswt@gmail.com>
@wiardvanrij (Member):
I think this is awesome, and we really need to have more of these options 👍 From a technical view, your two possible solutions make sense. From a user perspective, it becomes a bit harder: in a dynamic environment, we end up with a dynamic limit. Also, the limit will always be roughly best-effort.

Just thinking out loud here: since all the components are capable of talking to each other, why not keep a small shared state of the active series? This would remove a lot of guessing and let us be very precise about the limit.

That said, I don't know if that's even possible in a timely manner, and this solution works for me as a starting point as well.

@douglascamata (Contributor):
Please, do not forget to add some documentation about this. 🙏

Also, I couldn't find a good definition in the Prometheus or Thanos docs of what is considered an "active series". This hides key information from the reader, and I believe we should link to one in the docs of this feature.

I found the one below in the Prometheus 1.8 storage documentation, but it's not easy to understand (when is a chunk closed?).

A series with an open head chunk is called an active series

This blog post from Grafana seems to have an easier-to-understand definition:

For Prometheus, a time series is considered active if new data points have been generated within the last 15 to 30 minutes.

Maybe we should open an issue for Prometheus to define what's considered an "active series"?

@onprem (Member, Author) commented May 8, 2022

@wiardvanrij I think what you said makes sense, and we might not even need a shared state if we can somehow utilise the replication step to share this info. I'll explore that a bit and report back here if it doesn't work out.

@douglascamata Yes! We need good documentation around it, especially regarding the tradeoffs we are making. The definition of "active" series in this PR's context is simple: it's the number of series currently in the head block of the tenant's TSDB instance, i.e. the number of unique series ingested since the last block was cut (a 2-hour window with default settings). I think the Grafana definition might be the same as well; it might just be that they cut a new block every 15 to 30 minutes?
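
For reference, a rough sketch of that definition against the Prometheus TSDB API (assuming a *tsdb.DB handle for the tenant; this is not code from the PR):

package receive

import "github.com/prometheus/prometheus/tsdb"

// activeSeries counts the series currently in the in-memory head block,
// i.e. the unique series ingested since the last block was cut.
func activeSeries(db *tsdb.DB) uint64 {
	return db.Head().NumSeries()
}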

The definition from Prometheus you mentioned is a bit old, from the Prometheus 1.8 days. The TSDB was completely rewritten in Prometheus 2.0, and with it the definitions changed.

// If the ref is 0, it indicates that inserting the samples will create a new time series.
// We do the seriesLimit check before actually creating the series in head because even after
// a rollback, the series will stay in head, defeating the whole purpose of series limit.
if ref == 0 && seriesLimit > 0 && s.NumSeries() >= seriesLimit {
Contributor:
Can we move this to the for loop on line 78? I think we don't need to allocate memory for labels in this case.

// We do the seriesLimit check before actually creating the series in head because even after
// a rollback, the series will stay in head, defeating the whole purpose of series limit.
if ref == 0 && seriesLimit > 0 && s.NumSeries() >= seriesLimit {
_ = app.Rollback()
Contributor:
What are we trying to roll back in this case? We only check whether the series is new in the TSDB head; we don't append anything. Do we still need this line?

Member Author:
Because we don't want to do partial writes, I am explicitly rolling back the previous appends. Due to the early return, app.Commit() won't be called, so there would be no partial writes anyway, but calling app.Rollback() still looks like a good idea to me, as it will close the Appender.
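
To spell out the contract being discussed, a minimal sketch of the usual Appender pattern (assuming the Prometheus storage.Appender interface; the sample type and appendAll are illustrative, not the PR's code):

package receive

import (
	"github.com/prometheus/prometheus/model/labels"
	"github.com/prometheus/prometheus/storage"
)

type sample struct {
	lset labels.Labels
	t    int64
	v    float64
}

// appendAll either commits every sample or rolls back; in both cases the
// Appender is closed, so no partial write is left behind.
func appendAll(app storage.Appender, samples []sample) error {
	for _, s := range samples {
		if _, err := app.Append(0, s.lset, s.t, s.v); err != nil {
			_ = app.Rollback() // discard everything appended so far
			return err
		}
	}
	return app.Commit()
}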

@fpetkovski (Contributor) left a comment:
This would be an amazing feature for any multi-tenant setup. I wonder what the status of the PR is and whether any help is needed to push it forward :)

@@ -160,7 +160,7 @@ func TestWriter(t *testing.T) {
w := NewWriter(logger, m)

for idx, req := range testData.reqs {
err = w.Write(context.Background(), DefaultTenant, req)
err = w.Write(context.Background(), 0, DefaultTenant, req)
Contributor:
Would be nice to have a test with a non-zero limit.

@@ -258,7 +262,7 @@ type replica struct {
replicated bool
}

func (h *Handler) handleRequest(ctx context.Context, rep uint64, tenant string, wreq *prompb.WriteRequest) error {
func (h *Handler) handleRequest(ctx context.Context, rep, seriesLimit uint64, tenant string, wreq *prompb.WriteRequest) error {
Contributor:
The seriesLimit field seems to be added to the protobuf, so can we add it once to the request to avoid propagating it to every function?


stale bot commented Sep 21, 2022

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

stale bot added the stale label Sep 21, 2022
stale bot closed this Oct 1, 2022