receive: Implement head series limits per tenant #5333
Conversation
Add the ability to limit the total number of series in head per tenant per receive replica. The limit can be specified as an additional HTTP header on the remote write request.

Signed-off-by: Prem Saraswat <prmsrswt@gmail.com>
I think this is awesome, and we really need to have more of these options 👍 From a technical view, your two possible solutions make sense. From a user perspective, it becomes a bit harder: in a dynamic environment, we end up with a dynamic limit, and the limit will always be roughly best effort. Just thinking out loud here; since all the components are capable of talking to each other, why not keep a small state of the active series? This would remove a lot of guessing, and we could be very precise on the limit. That said, I don't know if that's even possible in a timely manner, and this solution works for me as well to start with.
Please, do not forget to add some documentation about this. 🙏 Also, I couldn't find a good definition in the Prometheus or Thanos docs of what is considered an "active series". This hides key information from the reader, and I believe we should add a link to it in the docs of this feature. I found the one below in the Prometheus 1.8 storage documentation, but it's not easy to understand (when is a chunk closed?).
This blog post from Grafana seems to have an easier to understand definition:
Maybe we should open an issue for Prometheus to define what's considered an "active series"?
@wiardvanrij I think what you said makes sense, and we might not even need shared state if we can somehow utilise the replication step to share this info. I'll explore that a bit and report back here if it doesn't work out. @douglascamata Yes! We need good documentation around it, especially about the trade-offs that we are making. The definition of "active" series in this PR's context is simple: it's the number of series currently in the head block of the tenant's TSDB. The definition from Prometheus you mentioned is a bit old, from the Prometheus 1.8 days.
```go
// If the ref is 0, it indicates that inserting the samples will create a new time series.
// We do the seriesLimit check before actually creating the series in head because even after
// a rollback, the series will stay in head, defeating the whole purpose of series limit.
if ref == 0 && seriesLimit > 0 && s.NumSeries() >= seriesLimit {
```
Can we move this to the for loop on line 78? I think we don't need to allocate memory for labels in this case.
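To illustrate the suggestion, here is a minimal, self-contained sketch of doing the limit check at the top of the per-series loop, before any label conversion or allocation happens. The types and names (`head`, `timeseries`, `writeSeries`) are illustrative stand-ins, not the actual Thanos receive code:

```go
package main

import (
	"errors"
	"fmt"
)

// timeseries and head are hypothetical stand-ins for the real types.
type timeseries struct {
	labels  map[string]string
	samples []float64
}

type head struct {
	numSeries   uint64
	seriesLimit uint64
}

var errSeriesLimit = errors.New("head series limit reached")

// writeSeries checks the limit first in each loop iteration, so a
// rejected request never pays for per-series label allocation.
func writeSeries(h *head, req []timeseries) error {
	for _, ts := range req {
		// Limit check before doing any per-series work.
		if h.seriesLimit > 0 && h.numSeries >= h.seriesLimit {
			return errSeriesLimit
		}
		_ = ts.labels // label conversion would happen here
		h.numSeries++ // the append created a new head series
	}
	return nil
}

func main() {
	h := &head{seriesLimit: 2}
	err := writeSeries(h, make([]timeseries, 3))
	fmt.Println(err, h.numSeries)
}
```

With a limit of 2 and 3 incoming series, the third series is rejected before its labels would be built.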
```go
// We do the seriesLimit check before actually creating the series in head because even after
// a rollback, the series will stay in head, defeating the whole purpose of series limit.
if ref == 0 && seriesLimit > 0 && s.NumSeries() >= seriesLimit {
	_ = app.Rollback()
```
What are we trying to roll back in this case? We only check whether the series is new in the TSDB head, and we don't append anything. Do we still need this line?
Because we don't want partial writes, I am explicitly rolling back the previous appends. Due to the early return, `app.Commit()` won't be called, so there would be no partial writes anyway, but explicitly calling `app.Rollback()` still looks like a good idea to me, as it closes the `Appender`.
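The commit/rollback contract being discussed can be sketched with a toy appender. `fakeAppender` below is a stand-in for TSDB's `storage.Appender`, modeling only the behavior relevant here (pending appends are discarded and the appender is closed on either `Commit` or `Rollback`); the limit semantics are illustrative, not the PR's exact code:

```go
package main

import (
	"errors"
	"fmt"
)

// fakeAppender models just enough of the Appender contract:
// Commit persists pending appends; Rollback discards them; both close.
type fakeAppender struct {
	pending   int
	committed int
	closed    bool
}

func (a *fakeAppender) Append() { a.pending++ }

func (a *fakeAppender) Commit() error {
	a.committed += a.pending
	a.pending, a.closed = 0, true
	return nil
}

func (a *fakeAppender) Rollback() error {
	a.pending, a.closed = 0, true // discard pending appends and close
	return nil
}

var errSeriesLimit = errors.New("head series limit reached")

// write appends until a (hypothetical) limit trips, then rolls back
// explicitly so the appender is closed and no partial write survives.
func write(app *fakeAppender, n, limit int) error {
	for i := 0; i < n; i++ {
		if i >= limit {
			_ = app.Rollback() // explicit rollback on early return
			return errSeriesLimit
		}
		app.Append()
	}
	return app.Commit()
}

func main() {
	app := &fakeAppender{}
	err := write(app, 5, 3)
	fmt.Println(err, app.closed, app.committed)
}
```

After the limit trips, nothing is committed and the appender is closed, which is the point of the explicit rollback even though `Commit()` was never reached.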
This would be an amazing feature for any multi-tenant setup. I wonder what is the status of the PR and if any help is needed to push it forward :)
```diff
@@ -160,7 +160,7 @@ func TestWriter(t *testing.T) {
 			w := NewWriter(logger, m)

 			for idx, req := range testData.reqs {
-				err = w.Write(context.Background(), DefaultTenant, req)
+				err = w.Write(context.Background(), 0, DefaultTenant, req)
```
Would be nice to have a test with a non-zero limit.
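A non-zero-limit test could be table-driven in the usual Go style. The sketch below uses a hypothetical `limitedWrite` helper as a stand-in for `Writer.Write` with a limit argument; its signature and the "0 means unlimited" convention are assumptions for illustration, not the PR's actual API:

```go
package main

import (
	"errors"
	"fmt"
)

var errSeriesLimit = errors.New("head series limit reached")

// limitedWrite models only the series accounting: reject the write if it
// would push the head series count past a non-zero limit.
func limitedWrite(existing, incoming, limit uint64) error {
	if limit > 0 && existing+incoming > limit {
		return errSeriesLimit
	}
	return nil
}

func main() {
	cases := []struct {
		existing, incoming, limit uint64
		wantErr                   bool
	}{
		{existing: 0, incoming: 10, limit: 0, wantErr: false}, // 0 = unlimited
		{existing: 5, incoming: 10, limit: 20, wantErr: false},
		{existing: 15, incoming: 10, limit: 20, wantErr: true}, // would exceed
	}
	for i, c := range cases {
		err := limitedWrite(c.existing, c.incoming, c.limit)
		if (err != nil) != c.wantErr {
			fmt.Println("case", i, "failed")
		}
	}
	fmt.Println("done")
}
```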
```diff
@@ -258,7 +262,7 @@ type replica struct {
 	replicated bool
 }

-func (h *Handler) handleRequest(ctx context.Context, rep uint64, tenant string, wreq *prompb.WriteRequest) error {
+func (h *Handler) handleRequest(ctx context.Context, rep, seriesLimit uint64, tenant string, wreq *prompb.WriteRequest) error {
```
The `seriesLimit` field seems to be added to the protobuf, so can we add it once to the request to avoid propagating it to every function?
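The shape of that suggestion is simply to hang the limit off the request value instead of threading it as an extra parameter. A minimal sketch, where `writeRequest` and `handleRequest` are illustrative stand-ins rather than the actual `prompb`/`Handler` types:

```go
package main

import "fmt"

// writeRequest carries the limit parsed once from the incoming HTTP
// header, so downstream functions need no extra parameter.
type writeRequest struct {
	Tenant      string
	SeriesLimit uint64 // set once when the request is decoded
}

// handleRequest reads the limit off the request instead of taking
// `rep, seriesLimit uint64, ...` as separate arguments.
func handleRequest(req *writeRequest) uint64 {
	return req.SeriesLimit
}

func main() {
	req := &writeRequest{Tenant: "team-a", SeriesLimit: 100}
	fmt.Println(handleRequest(req))
}
```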
Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward? This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
Changes

Add the ability to limit the total number of series in head per tenant per receive replica. The limit can be specified as an additional HTTP header (`X-Thanos-Series-Limit` by default) on the remote write request.

As mentioned above, this limit is per tenant, per replica, which means that with the current implementation the tenant can actually write more series in total than specified in the limit. For example, in a hashring with 3 instances of Receive and a series limit of `100`, the tenant can write at most `300` active series (assuming equal distribution of series between all 3 nodes). The actual limit can be less than `300` if the load is not equally distributed and one node hits its limit earlier, but it will never be less than `100`.

Alternative

An alternative approach is to calculate an effective limit using the replication factor and the number of nodes in the hashring. The limit actually used for local TSDB writes would be `defined_limit * replication_factor / num_nodes`.

So essentially, if the overall limit we want is `150` and we have `3` nodes with replication factor `1`, we will have a per-node local limit of `50` (`150 * 1 / 3`).

But this assumes that the data is equally distributed among all nodes; in practice, one node can hit its `50` series limit earlier than the others, effectively denying service before the user actually hits the `150` series limit.

Verification

Tested locally with multiple configurations of Receive, including split Router and Ingester mode.
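The effective-limit arithmetic from the Alternative section above can be written down directly; the function name here is illustrative:

```go
package main

import "fmt"

// effectiveLocalLimit implements the Alternative's formula:
// local limit = defined_limit * replication_factor / num_nodes.
// Note that integer division can make the per-node limits sum to
// slightly less than the defined limit.
func effectiveLocalLimit(definedLimit, replicationFactor, numNodes uint64) uint64 {
	return definedLimit * replicationFactor / numNodes
}

func main() {
	// The example from the text: overall limit 150, 3 nodes, RF 1.
	fmt.Println(effectiveLocalLimit(150, 1, 3))
}
```

With replication factor 3 on 3 nodes, every node holds a copy of every series, so the local limit equals the defined limit, which is the sanity check for the formula.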