
operator: Expose per_stream_rate_limit & burst #9630

Merged · 4 commits merged into grafana:main on Jun 7, 2023

Conversation

@jpinsonneau (Contributor) commented Jun 5, 2023

What this PR does / why we need it:
These changes allow users to configure PerStreamRateLimit and PerStreamRateLimitBurst in IngestionLimitSpec.
This is needed for Netobserv to increase the default values in some large-cluster scenarios.
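
For illustration, a minimal sketch of how this could look on a LokiStack resource; the field names are inferred from the spec names above, so the exact CRD spelling may differ, and required fields such as storage are omitted:

```yaml
# Hypothetical excerpt; field names inferred from IngestionLimitSpec above.
apiVersion: loki.grafana.com/v1
kind: LokiStack
metadata:
  name: lokistack-sample
spec:
  size: 1x.small
  limits:
    global:
      ingestion:
        perStreamRateLimit: 5        # MB per second, per stream
        perStreamRateLimitBurst: 15  # MB burst allowance, per stream
```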

Which issue(s) this PR fixes:
None

Special notes for your reviewer:
I have forced the MB unit for consistency with ingestion_rate_mb.
Please let me know if you would prefer to keep it flexible and accept any human-readable form (1MB, 256KB, etc.).

Checklist

  • Reviewed the CONTRIBUTING.md guide (required)
  • Documentation added
  • Tests updated
  • CHANGELOG.md updated
  • Changes that require user attention or interaction to upgrade are documented in docs/sources/upgrading/_index.md
  • For Helm chart changes bump the Helm chart version in production/helm/loki/Chart.yaml and update production/helm/loki/CHANGELOG.md and production/helm/loki/README.md. Example PR

@jpinsonneau changed the title NETOBSERV-717 per_stream_rate_limit & burst limits configurable → NETOBSERV-717 Expose per_stream_rate_limit & burst limits on Jun 5, 2023
@jpinsonneau changed the title NETOBSERV-717 Expose per_stream_rate_limit & burst limits → NETOBSERV-717 Expose per_stream_rate_limit & burst on Jun 5, 2023
@periklis changed the title NETOBSERV-717 Expose per_stream_rate_limit & burst → operator: Expose per_stream_rate_limit & burst on Jun 5, 2023
@jpinsonneau marked this pull request as ready for review June 5, 2023 13:21
@jpinsonneau requested a review from a team as a code owner June 5, 2023 13:22
@MasslessParticle (Contributor) commented:

If you're exposing these, it's probably also worthwhile to expose stream sharding. This feature works in conjunction with the per_stream_rate_limit so that Loki should never hit the limit. For example, you'd configure your Loki with a per_stream_rate_limit of 5MB and a desired_rate of 1MB and only bump into the limit in extreme cases.
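
A rough sketch of that combination at the Loki limits_config level, assuming the shard_streams block documented for Loki; the values are purely illustrative:

```yaml
# Illustrative only: the per-stream limit acts as a hard ceiling, while stream
# sharding keeps individual shards well below it via desired_rate.
limits_config:
  per_stream_rate_limit: 5MB
  per_stream_rate_limit_burst: 15MB
  shard_streams:
    enabled: true
    desired_rate: 1MB
```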

@JStickler (Contributor) left a comment:


[docs team] LGTM, one question...

Inline review comment on operator/docs/lokistack/sop.md (outdated, resolved).
@jpinsonneau (Contributor, author) commented:

> If you're exposing these, it's probably also worthwhile to expose stream sharding. This feature works in conjunction with the per_stream_rate_limit so that Loki should never hit the limit. For example, you'd configure your Loki with a per_stream_rate_limit of 5MB and a desired_rate of 1MB and only bump into the limit in extreme cases.

Thanks for sharing that! We need to try this for our use case and see the impact on performance before coming back with a follow-up.
cc @periklis

@periklis (Collaborator) left a comment:


Looks overall good to me. I am just not sure whether we should introduce some validation to ensure people understand that their inputs are in MB and not MiB or bytes or whatever. It is a minor thing, but let's give it a spin and discuss it. What's your opinion?

@periklis (Collaborator) commented Jun 6, 2023

> If you're exposing these, it's probably also worthwhile to expose stream sharding. This feature works in conjunction with the per_stream_rate_limit so that Loki should never hit the limit. For example, you'd configure your Loki with a per_stream_rate_limit of 5MB and a desired_rate of 1MB and only bump into the limit in extreme cases.

@MasslessParticle This is something we aim for, but I am still not 100% sure how stable this feature is. I have been tracking the PRs on this topic since last year and noticed that pieces went into the code base in a disjointed way and not based on a LID. Don't get me wrong, it is an excellent idea, but the implementation went into Loki a bit experimentally. It started with the distributor/ingester pair and then trickled down PR by PR into the queriers, etc. In addition, I am not 100% sure we did not overlook any impact on the ruler side. If you have the full picture of how far along this feature is, I am glad to be wrong :)

@jpinsonneau Yes, let's leave stream sharding for a follow-up PR.

@jpinsonneau (Contributor, author) commented:

> Looks overall good to me. I am just not sure whether we should introduce some validation to ensure people understand that their inputs are in MB and not MiB or bytes or whatever. It is a minor thing, but let's give it a spin and discuss it. What's your opinion?

For me, the best would be to move everything to flagext.ByteSize and improve it.
Currently IngestionRateMB & IngestionBurstSizeMB are in MB using float64,
whereas MaxLineSize, PerStreamRateLimit, PerStreamRateLimitBurst, MaxQueryBytesRead & MaxQuerierBytesRead are in human-readable format using flagext.ByteSize.
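
At the Loki configuration level, the same inconsistency shows up in limits_config, where some options take plain numbers interpreted as MB while others accept human-readable byte sizes; the values below are examples only:

```yaml
limits_config:
  # float64 fields, interpreted as MB
  ingestion_rate_mb: 4
  ingestion_burst_size_mb: 6
  # flagext.ByteSize fields, accepting human-readable suffixes
  per_stream_rate_limit: 3MB
  per_stream_rate_limit_burst: 15MB
  max_line_size: 256KB
```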

@periklis (Collaborator) commented Jun 6, 2023

> Looks overall good to me. I am just not sure whether we should introduce some validation to ensure people understand that their inputs are in MB and not MiB or bytes or whatever. It is a minor thing, but let's give it a spin and discuss it. What's your opinion?

> For me, the best would be to move everything to flagext.ByteSize and improve it. Currently IngestionRateMB & IngestionBurstSizeMB are in MB using float64, whereas MaxLineSize, PerStreamRateLimit, PerStreamRateLimitBurst, MaxQueryBytesRead & MaxQuerierBytesRead are in human-readable format using flagext.ByteSize.

Yeah, IngestionRateMB and IngestionBurstSizeMB are still around and remind me of the early Loki days when flagext.ByteSize was not used yet. They will stay as they are for now because our API is already GA. OTOH, I am not sure whether we can use flagext.ByteSize in Kubernetes CRDs. Would you mind researching this a bit?

@MasslessParticle (Contributor) commented:

> It started with the distributor/ingester pair and then trickled down PR by PR into the queriers, etc. In addition, I am not 100% sure we did not overlook any impact on the ruler side.

@periklis The implementation for this only exists in the distributor/ingester. AFAIK, stream sharding has nothing to do with the queriers. We're running it in all our production environments and consider it to be stable.

We have overloaded the word "sharding" too much, so maybe something got conflated?

@periklis (Collaborator) commented Jun 6, 2023

> It started with the distributor/ingester pair and then trickled down PR by PR into the queriers, etc. In addition, I am not 100% sure we did not overlook any impact on the ruler side.

> @periklis The implementation for this only exists in the distributor/ingester. AFAIK, stream sharding has nothing to do with the queriers. We're running it in all our production environments and consider it to be stable.
>
> We have overloaded the word "sharding" too much, so maybe something got conflated?

IIRC we had some PR merging the stream shards back together in the querier? Maybe I am wrong, but didn't we put the individual shards into the chunks and thus need to merge them back together at query time?

@MasslessParticle (Contributor) commented:

There isn't anything explicitly merging sharded streams together. That happens automatically as part of the query process.

If we have the sharded streams:

{foo="bar", __stream_shard__=1}
{foo="bar", __stream_shard__=2}

Then both are merged at query time when querying for {foo="bar"}, like any other streams.

@periklis (Collaborator) commented Jun 6, 2023

> There isn't anything explicitly merging sharded streams together. That happens automatically as part of the query process.
>
> If we have the sharded streams:
>
> {foo="bar", __stream_shard__=1}
> {foo="bar", __stream_shard__=2}
>
> Then both are merged at query time when querying for {foo="bar"}, like any other streams.

Ok! I was under the wrong impression that __stream_shard__ is not visible to the user and eliminated at query time.

On a further note, since you have been running this in production for quite some time: how painful was the series churn after enabling this? Or was it negligible?

@MasslessParticle (Contributor) commented:

We had some issues with series churn and over-sharding. We addressed those and now we're not seeing any issues from series churn.

@periklis (Collaborator) commented Jun 6, 2023

> We had some issues with series churn and over-sharding. We addressed those and now we're not seeing any issues from series churn.

You don't happen to have pointers to the PR addressing these issues?

@MasslessParticle (Contributor) commented:

It was this one: #9414.

We saw some issues with resource consumption due to over-sharding. Additionally, we saw the first few payloads cause a ton of stream churn. That PR addressed both by looking at the stream rate over a larger period of time.

It has the interesting consequence that the per_stream_rate_limit becomes a "max_payload_size" limit in the case where the stream hasn't been seen before.

@periklis merged commit e680239 into grafana:main on Jun 7, 2023
periklis added a commit to periklis/loki that referenced this pull request on Jun 30, 2023
periklis added a commit to openshift/loki that referenced this pull request on Jun 30, 2023