receive: Implement head series limits per tenant #5333
Conversation
Add the ability to limit the total number of series in head per tenant per receive replica. The limit can be specified as an additional HTTP header on the remote write request.

Signed-off-by: Prem Saraswat <prmsrswt@gmail.com>
I think this is awesome, and we really need to have more of these options 👍 From a technical view, your two possible solutions make sense. From a user perspective, it becomes a bit harder: in a dynamic environment, we end up with a dynamic limit, and the limit will always be roughly best effort. Just thinking out loud here; since all the components are capable of talking to each other, why not keep a small state of the active series? This would remove a lot of guessing, and we could be very precise on the limit. That said, I don't know if that's even possible in a timely manner, and this solution works for me as well to start with.
Please, do not forget to add some documentation about this. 🙏 Also, I couldn't find a good definition in the Prometheus or Thanos docs of what is considered an "active series". This hides key information from the reader, and I believe we should add a link to it in the docs of this feature. I found the one below in the Prometheus 1.8 storage documentation, but it's not easy to understand (when is a chunk closed?).
This blog post from Grafana seems to have an easier to understand definition:
Maybe we should open an issue for Prometheus to define what's considered an "active series"?
@wiardvanrij I think what you said makes sense, and we might not even need shared state if we can somehow utilise the replication step to share this info. I'll explore that a bit and report back here if it doesn't work out. @douglascamata Yes! We need good documentation around it, especially about the trade-offs that we are making. The definition of "active" series in this PR's context is simple: it's the number of series currently in the head block of the tenant's TSDB. The definition from Prometheus you mentioned is a bit old, from the Prometheus 1.8 days.
```go
// If the ref is 0, it indicates that inserting the samples will create a new time series.
// We do the seriesLimit check before actually creating the series in head because even after
// a rollback, the series will stay in head, defeating the whole purpose of series limit.
if ref == 0 && seriesLimit > 0 && s.NumSeries() >= seriesLimit {
```
Can we move this to the for loop on line 78? I think we don't need to allocate memory for labels in this case.
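To illustrate the suggestion, here is a minimal, self-contained sketch of doing the limit check at the top of the per-series loop, before any label conversion or allocation happens. The types and names (`head`, `timeseries`, `writeSeries`) are illustrative stand-ins, not the actual Thanos receive code:

```go
package main

import (
	"errors"
	"fmt"
)

// timeseries and head are hypothetical stand-ins for the real types.
type timeseries struct {
	labels  map[string]string
	samples []float64
}

type head struct {
	numSeries   uint64
	seriesLimit uint64
}

var errSeriesLimit = errors.New("head series limit reached")

// writeSeries checks the limit first in each loop iteration, so a
// rejected request never pays for per-series label allocation.
func writeSeries(h *head, req []timeseries) error {
	for _, ts := range req {
		// Limit check before doing any per-series work.
		if h.seriesLimit > 0 && h.numSeries >= h.seriesLimit {
			return errSeriesLimit
		}
		_ = ts.labels // label conversion would happen here
		h.numSeries++ // the append created a new head series
	}
	return nil
}

func main() {
	h := &head{seriesLimit: 2}
	err := writeSeries(h, make([]timeseries, 3))
	fmt.Println(err, h.numSeries)
}
```

With a limit of 2 and 3 incoming series, the third series is rejected before its labels would be built.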
```go
// We do the seriesLimit check before actually creating the series in head because even after
// a rollback, the series will stay in head, defeating the whole purpose of series limit.
if ref == 0 && seriesLimit > 0 && s.NumSeries() >= seriesLimit {
	_ = app.Rollback()
```
What are we trying to roll back in this case? We only check whether the series is new in the TSDB head, and we don't append anything. Do we still need this line?
Because we don't want partial writes, I am explicitly rolling back the previous appends. Due to the early return, `app.Commit()` won't be called, so there would be no partial writes anyway, but explicitly calling `app.Rollback()` still looks like a good idea to me, as it closes the `Appender`.
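The commit/rollback contract being discussed can be sketched with a toy appender. `fakeAppender` below is a stand-in for TSDB's `storage.Appender`, modeling only the behavior relevant here (pending appends are discarded and the appender is closed on either `Commit` or `Rollback`); the limit semantics are illustrative, not the PR's exact code:

```go
package main

import (
	"errors"
	"fmt"
)

// fakeAppender models just enough of the Appender contract:
// Commit persists pending appends; Rollback discards them; both close.
type fakeAppender struct {
	pending   int
	committed int
	closed    bool
}

func (a *fakeAppender) Append() { a.pending++ }

func (a *fakeAppender) Commit() error {
	a.committed += a.pending
	a.pending, a.closed = 0, true
	return nil
}

func (a *fakeAppender) Rollback() error {
	a.pending, a.closed = 0, true // discard pending appends and close
	return nil
}

var errSeriesLimit = errors.New("head series limit reached")

// write appends until a (hypothetical) limit trips, then rolls back
// explicitly so the appender is closed and no partial write survives.
func write(app *fakeAppender, n, limit int) error {
	for i := 0; i < n; i++ {
		if i >= limit {
			_ = app.Rollback() // explicit rollback on early return
			return errSeriesLimit
		}
		app.Append()
	}
	return app.Commit()
}

func main() {
	app := &fakeAppender{}
	err := write(app, 5, 3)
	fmt.Println(err, app.closed, app.committed)
}
```

After the limit trips, nothing is committed and the appender is closed, which is the point of the explicit rollback even though `Commit()` was never reached.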
This would be an amazing feature for any multi-tenant setup. I wonder what is the status of the PR and if any help is needed to push it forward :)
```diff
@@ -160,7 +160,7 @@ func TestWriter(t *testing.T) {
 			w := NewWriter(logger, m)

 			for idx, req := range testData.reqs {
-				err = w.Write(context.Background(), DefaultTenant, req)
+				err = w.Write(context.Background(), 0, DefaultTenant, req)
```
Would be nice to have a test with a non-zero limit.
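A non-zero-limit test could be table-driven in the usual Go style. The sketch below uses a hypothetical `limitedWrite` helper as a stand-in for `Writer.Write` with a limit argument; its signature and the "0 means unlimited" convention are assumptions for illustration, not the PR's actual API:

```go
package main

import (
	"errors"
	"fmt"
)

var errSeriesLimit = errors.New("head series limit reached")

// limitedWrite models only the series accounting: reject the write if it
// would push the head series count past a non-zero limit.
func limitedWrite(existing, incoming, limit uint64) error {
	if limit > 0 && existing+incoming > limit {
		return errSeriesLimit
	}
	return nil
}

func main() {
	cases := []struct {
		existing, incoming, limit uint64
		wantErr                   bool
	}{
		{existing: 0, incoming: 10, limit: 0, wantErr: false}, // 0 = unlimited
		{existing: 5, incoming: 10, limit: 20, wantErr: false},
		{existing: 15, incoming: 10, limit: 20, wantErr: true}, // would exceed
	}
	for i, c := range cases {
		err := limitedWrite(c.existing, c.incoming, c.limit)
		if (err != nil) != c.wantErr {
			fmt.Println("case", i, "failed")
		}
	}
	fmt.Println("done")
}
```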
```diff
@@ -258,7 +262,7 @@ type replica struct {
 	replicated bool
 }

-func (h *Handler) handleRequest(ctx context.Context, rep uint64, tenant string, wreq *prompb.WriteRequest) error {
+func (h *Handler) handleRequest(ctx context.Context, rep, seriesLimit uint64, tenant string, wreq *prompb.WriteRequest) error {
```
The `seriesLimit` field seems to be added to the protobuf, so can we add it once to the request to avoid propagating it to every function?
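The shape of that suggestion is simply to hang the limit off the request value instead of threading it as an extra parameter. A minimal sketch, where `writeRequest` and `handleRequest` are illustrative stand-ins rather than the actual `prompb`/`Handler` types:

```go
package main

import "fmt"

// writeRequest carries the limit parsed once from the incoming HTTP
// header, so downstream functions need no extra parameter.
type writeRequest struct {
	Tenant      string
	SeriesLimit uint64 // set once when the request is decoded
}

// handleRequest reads the limit off the request instead of taking
// `rep, seriesLimit uint64, ...` as separate arguments.
func handleRequest(req *writeRequest) uint64 {
	return req.SeriesLimit
}

func main() {
	req := &writeRequest{Tenant: "team-a", SeriesLimit: 100}
	fmt.Println(handleRequest(req))
}
```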
Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward? This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
Changes

Add the ability to limit the total number of series in head per tenant per receive replica. The limit can be specified as an additional HTTP header (`X-Thanos-Series-Limit` by default) on the remote write request.

As mentioned above, this limit is per tenant, per replica, which means that with the current implementation the tenant can actually write more series in total than specified in the limit. For example, in a hashring with 3 instances of Receive and a series limit of `100`, the tenant can write at most `300` active series (assuming equal distribution of series between all 3 nodes). The actual limit can be less than `300` if the load is not equally distributed and one node hits its limit earlier, but it will never be less than `100`.

Alternative

An alternative approach is to calculate an effective limit using the replication factor and the number of nodes in the hashring. The limit actually used for local TSDB writes would be `defined_limit * replication_factor / num_nodes`.

So essentially, if the overall limit we want is `150` and we have `3` nodes with replication factor `1`, we will have a per-node local limit of `50` (`150 * 1 / 3`).

But this assumes that the data is equally distributed among all nodes; in practice, one node can hit its `50` series limit earlier than the others, effectively denying service before the user actually hits the `150` series limit.

Verification

Tested locally with multiple configurations of Receive, including split Router and Ingester mode.
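The effective-limit arithmetic from the Alternative section above can be written down directly; the function name here is illustrative:

```go
package main

import "fmt"

// effectiveLocalLimit implements the Alternative's formula:
// local limit = defined_limit * replication_factor / num_nodes.
// Note that integer division can make the per-node limits sum to
// slightly less than the defined limit.
func effectiveLocalLimit(definedLimit, replicationFactor, numNodes uint64) uint64 {
	return definedLimit * replicationFactor / numNodes
}

func main() {
	// The example from the text: overall limit 150, 3 nodes, RF 1.
	fmt.Println(effectiveLocalLimit(150, 1, 3))
}
```

With replication factor 3 on 3 nodes, every node holds a copy of every series, so the local limit equals the defined limit, which is the sanity check for the formula.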