
operator: Expose per_stream_rate_limit & burst #9630

Merged · 4 commits merged into grafana:main on Jun 7, 2023

Conversation

@jpinsonneau (Contributor) commented Jun 5, 2023

What this PR does / why we need it:
These changes allow users to configure PerStreamRateLimit and PerStreamRateLimitBurst in IngestionLimitSpec.
This is needed for Netobserv to increase the default values in some large-cluster scenarios.
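
For illustration, a minimal sketch of how this could look on a LokiStack resource; the field names are inferred from the spec names above, so the exact CRD spelling may differ, and required fields such as storage are omitted:

```yaml
# Hypothetical excerpt; field names inferred from IngestionLimitSpec above.
apiVersion: loki.grafana.com/v1
kind: LokiStack
metadata:
  name: lokistack-sample
spec:
  size: 1x.small
  limits:
    global:
      ingestion:
        perStreamRateLimit: 5        # MB per second, per stream
        perStreamRateLimitBurst: 15  # MB burst allowance, per stream
```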

Which issue(s) this PR fixes:
None

Special notes for your reviewer:
I have forced the MB unit for consistency with ingestion_rate_mb.
Please let me know if you would prefer to keep it flexible and accept any human-readable form (1MB, 256KB, etc.).

Checklist

  • Reviewed the CONTRIBUTING.md guide (required)
  • Documentation added
  • Tests updated
  • CHANGELOG.md updated
  • Changes that require user attention or interaction to upgrade are documented in docs/sources/upgrading/_index.md
  • For Helm chart changes bump the Helm chart version in production/helm/loki/Chart.yaml and update production/helm/loki/CHANGELOG.md and production/helm/loki/README.md. Example PR

@jpinsonneau changed the title NETOBSERV-717 per_stream_rate_limit & burst limits configurable → NETOBSERV-717 Expose per_stream_rate_limit & burst limits on Jun 5, 2023
@jpinsonneau changed the title NETOBSERV-717 Expose per_stream_rate_limit & burst limits → NETOBSERV-717 Expose per_stream_rate_limit & burst on Jun 5, 2023
@periklis changed the title NETOBSERV-717 Expose per_stream_rate_limit & burst → operator: Expose per_stream_rate_limit & burst on Jun 5, 2023
@jpinsonneau marked this pull request as ready for review June 5, 2023 13:21
@jpinsonneau requested a review from a team as a code owner June 5, 2023 13:22
@MasslessParticle (Contributor) commented:

If you're exposing these, it's probably also worthwhile to expose stream sharding. This feature works in conjunction with the per_stream_rate_limit so that Loki should never hit the limit. For example, you'd configure your Loki with a per_stream_rate_limit of 5MB and a desired_rate of 1MB and only bump into the limit in extreme cases.
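
A rough sketch of that combination at the Loki limits_config level, assuming the shard_streams block documented for Loki; the values are purely illustrative:

```yaml
# Illustrative only: the per-stream limit acts as a hard ceiling, while stream
# sharding keeps individual shards well below it via desired_rate.
limits_config:
  per_stream_rate_limit: 5MB
  per_stream_rate_limit_burst: 15MB
  shard_streams:
    enabled: true
    desired_rate: 1MB
```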

@JStickler (Contributor) left a comment:


[docs team] LGTM, one question...

Inline review comment on operator/docs/lokistack/sop.md (outdated, resolved).
@jpinsonneau (Contributor, author) commented:

> If you're exposing these, it's probably also worthwhile to expose stream sharding. This feature works in conjunction with the per_stream_rate_limit so that Loki should never hit the limit. For example, you'd configure your Loki with a per_stream_rate_limit of 5MB and a desired_rate of 1MB and only bump into the limit in extreme cases.

Thanks for sharing that! We need to try this for our use case and see the impact on performance before coming back with a follow-up.
cc @periklis

@periklis (Collaborator) left a comment:


Looks overall good to me. I am just not sure whether we should introduce some validation to ensure people understand that their inputs are in MB and not MiB or bytes or whatever. It is a minor thing, but let's give it a spin and discuss it. What's your opinion?

@periklis (Collaborator) commented Jun 6, 2023

> If you're exposing these, it's probably also worthwhile to expose stream sharding. This feature works in conjunction with the per_stream_rate_limit so that Loki should never hit the limit. For example, you'd configure your Loki with a per_stream_rate_limit of 5MB and a desired_rate of 1MB and only bump into the limit in extreme cases.

@MasslessParticle This is something we aim for, but I am still not 100% sure how stable this feature is. I have been tracking the PRs on this topic since last year and noticed that pieces went into the code base in a disjointed way and not based on a LID. Don't get me wrong, it is an excellent idea, but the implementation went into Loki a bit experimentally. It started with the distributor/ingester pair and then trickled down PR by PR into the queriers, etc. In addition, I am not 100% sure we did not overlook any impact on the ruler side. If you have the full picture of how far along this feature is, I am glad to be wrong :)

@jpinsonneau Yes, let's leave stream sharding for a follow-up PR.

@jpinsonneau (Contributor, author) commented:

> Looks overall good to me. I am just not sure whether we should introduce some validation to ensure people understand that their inputs are in MB and not MiB or bytes or whatever. It is a minor thing, but let's give it a spin and discuss it. What's your opinion?

For me, the best would be to move everything to flagext.ByteSize and improve it.
Currently IngestionRateMB & IngestionBurstSizeMB are in MB using float64,
whereas MaxLineSize, PerStreamRateLimit, PerStreamRateLimitBurst, MaxQueryBytesRead & MaxQuerierBytesRead are in human-readable format using flagext.ByteSize.
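
At the Loki configuration level, the same inconsistency shows up in limits_config, where some options take plain numbers interpreted as MB while others accept human-readable byte sizes; the values below are examples only:

```yaml
limits_config:
  # float64 fields, interpreted as MB
  ingestion_rate_mb: 4
  ingestion_burst_size_mb: 6
  # flagext.ByteSize fields, accepting human-readable suffixes
  per_stream_rate_limit: 3MB
  per_stream_rate_limit_burst: 15MB
  max_line_size: 256KB
```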

@periklis (Collaborator) commented Jun 6, 2023

> Looks overall good to me. I am just not sure whether we should introduce some validation to ensure people understand that their inputs are in MB and not MiB or bytes or whatever. It is a minor thing, but let's give it a spin and discuss it. What's your opinion?

> For me, the best would be to move everything to flagext.ByteSize and improve it. Currently IngestionRateMB & IngestionBurstSizeMB are in MB using float64, whereas MaxLineSize, PerStreamRateLimit, PerStreamRateLimitBurst, MaxQueryBytesRead & MaxQuerierBytesRead are in human-readable format using flagext.ByteSize.

Yeah, IngestionRateMB and IngestionBurstSizeMB are still around and remind me of the early Loki days when flagext.ByteSize was not used yet. They will stay as they are for now because our API is already GA. OTOH, I am not sure whether we can use flagext.ByteSize in Kubernetes CRDs. Would you mind researching this a bit?

@MasslessParticle (Contributor) commented:

> It started with the distributor/ingester pair and then trickled down PR by PR into the queriers, etc. In addition, I am not 100% sure we did not overlook any impact on the ruler side.

@periklis The implementation for this only exists in the distributor/ingester. AFAIK, stream sharding has nothing to do with the queriers. We're running it in all our production environments and consider it to be stable.

We have overloaded the word "sharding" too much, so maybe something got conflated?

@periklis (Collaborator) commented Jun 6, 2023

> It started with the distributor/ingester pair and then trickled down PR by PR into the queriers, etc. In addition, I am not 100% sure we did not overlook any impact on the ruler side.

> @periklis The implementation for this only exists in the distributor/ingester. AFAIK, stream sharding has nothing to do with the queriers. We're running it in all our production environments and consider it to be stable.
>
> We have overloaded the word "sharding" too much, so maybe something got conflated?

IIRC we had some PR merging the stream shards back together in the querier? Maybe I am wrong, but didn't we put the individual shards into the chunks and thus need to merge them back together at query time?

@MasslessParticle (Contributor) commented:

There isn't anything explicitly merging sharded streams together. That happens automatically as part of the query process.

If we have the sharded streams:

{foo="bar", __stream_shard__=1}
{foo="bar", __stream_shard__=2}

Then both are merged at query time when querying for {foo="bar"}, like any other streams.

@periklis (Collaborator) commented Jun 6, 2023

> There isn't anything explicitly merging sharded streams together. That happens automatically as part of the query process.
>
> If we have the sharded streams:
>
> {foo="bar", __stream_shard__=1}
> {foo="bar", __stream_shard__=2}
>
> Then both are merged at query time when querying for {foo="bar"}, like any other streams.

Ok! I was under the wrong impression that __stream_shard__ is not visible to the user and eliminated at query time.

On a further note, since you have been running this in production for quite some time: how painful was the series churn after enabling this? Or was it negligible?

@MasslessParticle (Contributor) commented:

We had some issues with series churn and over-sharding. We addressed those and now we're not seeing any issues from series churn.

@periklis (Collaborator) commented Jun 6, 2023

> We had some issues with series churn and over-sharding. We addressed those and now we're not seeing any issues from series churn.

You don't happen to have pointers to the PR addressing these issues?

@MasslessParticle (Contributor) commented:

It was this one: #9414.

We saw some issues with resource consumption due to over-sharding. Additionally, we saw the first few payloads cause a ton of stream churn. That PR addressed both by looking at the stream rate over a larger period of time.

It has the interesting consequence that the per_stream_rate_limit becomes a "max_payload_size" limit in the case where the stream hasn't been seen before.

@periklis merged commit e680239 into grafana:main on Jun 7, 2023
periklis added a commit to periklis/loki that referenced this pull request on Jun 30, 2023
periklis added a commit to openshift/loki that referenced this pull request on Jun 30, 2023