Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: new stream count limiter #13006

Merged
merged 5 commits into from
May 23, 2024

Conversation

vlad-diachenko
Copy link
Contributor

@vlad-diachenko vlad-diachenko commented May 21, 2024

What this PR does / why we need it:
implemented a new stream count limiter that uses the values from ownedStreamService if the feature flag is enabled. This functionality is needed to take into account only the owned streams while asserting the limit.

Special notes for your reviewer:

  • until the feature flag is enabled, these changes should not affect the users
  • previously we had a RWLock near the value in the limiter, and that lock was used to disable all the limit while recovering the WAL. However, this lock is not needed because when we replay the WAL we do not assert if we reached the stream count limit: https://github.com/grafana/loki/blob/main/pkg/ingester/instance.go#L288-L290 (record is nil while replaying the WAL). So, I removed this RWLock to simplify the implementation.

Checklist

  • Reviewed the CONTRIBUTING.md guide (required)
  • Documentation added
  • Tests updated
  • Title matches the required conventional commits format, see here
    • Note that Promtail is considered to be feature complete, and future development for logs collection will be in Grafana Alloy. As such, feat PRs are unlikely to be accepted unless a case can be made for the feature actually being a bug fix to existing behavior.
  • Changes that require user attention or interaction to upgrade are documented in docs/sources/setup/upgrade/_index.md
  • For Helm chart changes bump the Helm chart version in production/helm/loki/Chart.yaml and update production/helm/loki/CHANGELOG.md and production/helm/loki/README.md. Example PR
  • If the change is deprecating or removing a configuration option, update the deprecated-config.yaml and deleted-config.yaml files respectively in the tools/deprecated-config-checker directory. Example PR

…dStreamService if the feature flag is enabled.

Signed-off-by: Vladyslav Diachenko <vlad.diachenko@grafana.com>
Signed-off-by: Vladyslav Diachenko <vlad.diachenko@grafana.com>
@vlad-diachenko vlad-diachenko requested a review from a team as a code owner May 21, 2024 14:24
@github-actions github-actions bot added the type/docs Issues related to technical documentation; the Docs Squad uses this label across many repositories label May 21, 2024
@vlad-diachenko vlad-diachenko changed the title new stream count limiter feat: new stream count limiter May 21, 2024
Signed-off-by: Vladyslav Diachenko <vlad.diachenko@grafana.com>
memoryStreams.WithLabelValues(i.instanceID).Inc()
memoryStreamsLabelsBytes.Add(float64(len(s.labels.String())))
i.streamsCreatedTotal.Inc()
i.addTailersToNewStream(s)
streamsCountStats.Add(1)

i.ownedStreamsSvc.incOwnedStreamCount()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this go in the LogStreamCreation conditional?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, we need to do it all the time... btw, even if UseOwnedStreamCount is set to false, because this option is in runtime config and can be turned on at any moment, the service should always know the count of the streams.

}

func (l *Limiter) convertGlobalToLocalLimit(globalLimit int) int {
if globalLimit == 0 {
return 0
}

// todo: change to healthyInstancesInZoneCount() once
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

@JordanRushing JordanRushing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🎉 May I suggest adding a benchmark for using owned streams vs not?

Co-authored-by: JordanRushing <rushing.jordan@gmail.com>
@vlad-diachenko
Copy link
Contributor Author

May I suggest adding a benchmark for using owned streams vs not?

Stream creation is not executed on every push, only when we create a new stream. Also, the implementation just adds another source of values, which will be just atomic ints. So, the new changes are unlikely to affect the performance, and knowing that this validation is not executed on every push request, makes me think that we could skip it;)))
WDYT?

Signed-off-by: Vladyslav Diachenko <vlad.diachenko@grafana.com>
@vlad-diachenko vlad-diachenko merged commit 1111595 into main May 23, 2024
60 checks passed
@vlad-diachenko vlad-diachenko deleted the vlad.diachenko/owned-stream-count-limiter branch May 23, 2024 10:18
trevorwhitney added a commit that referenced this pull request May 24, 2024
commit 0bfd0ad
Merge: 68aa188 efdae3d
Author: Trevor Whitney <trevorjwhitney@gmail.com>
Date:   Thu May 23 17:04:32 2024 -0600

    Merge branch 'main' into sample-count-and-bytes

commit 68aa188
Author: Trevor Whitney <trevorjwhitney@gmail.com>
Date:   Thu May 23 17:03:32 2024 -0600

    feat: guard aggregation behavior behind a feature flag

commit efdae3d
Author: hayden <haydenfuss@gmail.com>
Date:   Thu May 23 16:25:50 2024 -0400

    feat(helm): Support for PVC Annotations for Non-Distributed Modes (#12023)

    Signed-off-by: hfuss <hayden.fuss@kaleido.io>
    Co-authored-by: J Stickler <julie.stickler@grafana.com>
    Co-authored-by: Trevor Whitney <trevorjwhitney@gmail.com>

commit f0d6a92
Author: Trevor Whitney <trevorjwhitney@gmail.com>
Date:   Thu May 23 14:03:32 2024 -0600

    feat: reject filter queries to /patterns endpoint

commit dc620e7
Author: Trevor Whitney <trevorjwhitney@gmail.com>
Date:   Wed May 8 14:08:44 2024 -0600

    feat: collect and serve pre-agg bytes and count

    * pre-aggregate bytes and count per stream in the pattern ingester
    * serve bytes_over_time and count_over_time queries from the patterns
      endpoint

commit 97212ea
Author: Jay Clifford <45856600+Jayclifford345@users.noreply.github.com>
Date:   Thu May 23 12:10:48 2024 -0400

    feat: Added Interactive Sandbox to Quickstart tutorial (#12701)

commit 1111595
Author: Vladyslav Diachenko <82767850+vlad-diachenko@users.noreply.github.com>
Date:   Thu May 23 13:18:16 2024 +0300

    feat: new stream count limiter (#13006)

    Signed-off-by: Vladyslav Diachenko <vlad.diachenko@grafana.com>
    Co-authored-by: JordanRushing <rushing.jordan@gmail.com>

commit 987e551
Author: Quentin Bisson <quentin@giantswarm.io>
Date:   Thu May 23 02:15:52 2024 +0200

    fix: allow cluster label override in bloom dashboards (#13012)

    Signed-off-by: QuentinBisson <quentin@giantswarm.io>

commit d3c9cec
Author: Quentin Bisson <quentin@giantswarm.io>
Date:   Thu May 23 01:59:28 2024 +0200

    fix: upgrade old plugin for the loki-operational dashboard. (#13016)

    Signed-off-by: QuentinBisson <quentin@giantswarm.io>

commit 8d9fb68
Author: Quentin Bisson <quentin@giantswarm.io>
Date:   Wed May 22 22:00:08 2024 +0200

    fix: remove unneccessary disk panels for ssd read path (#13014)

    Signed-off-by: QuentinBisson <quentin@giantswarm.io>

commit 1948899
Author: Quentin Bisson <quentin@giantswarm.io>
Date:   Wed May 22 15:16:29 2024 +0200

    fix: Mixins - Add missing log datasource on loki-deletion (#13011)

commit efd8f5d
Author: Salva Corts <salva.corts@grafana.com>
Date:   Wed May 22 10:43:32 2024 +0200

    refactor(blooms): Add queue to bloom planner and enqueue tasks (#13005)

commit d6f29fc
Author: Vitor Gomes <41302394+vitoorgomes@users.noreply.github.com>
Date:   Wed May 22 04:34:42 2024 +1200

    docs: update otlp ingestion with correct endpoint and add endpoint to reference api docs (#12996)

commit 3195036
Author: Salva Corts <salva.corts@grafana.com>
Date:   Tue May 21 13:12:24 2024 +0200

    refactor(bloom planner): Compute gaps and build tasks from metas and TSDBs  (#12994)

commit 7a3338e
Author: Jonathan Davies <jpds@protonmail.com>
Date:   Tue May 21 10:41:42 2024 +0100

    feat: loki/main.go: Log which config file path is used on startup (#12985)

    Co-authored-by: Michel Hollands <42814411+MichelHollands@users.noreply.github.com>

commit bf8a278
Author: Ashwanth <iamashwanth@gmail.com>
Date:   Tue May 21 12:56:07 2024 +0530

    chore: remove duplicate imports (#13001)

commit 1f5291a
Author: Ashwanth <iamashwanth@gmail.com>
Date:   Tue May 21 12:38:02 2024 +0530

    fix(indexstats): do not collect stats from "IndexStats" lookups for other query types (#12978)

commit 8442dca
Author: Jay Clifford <45856600+Jayclifford345@users.noreply.github.com>
Date:   Mon May 20 17:52:17 2024 -0400

    feat: Added getting started video (#12975)

commit 75ccf21
Author: Christian Haudum <christian.haudum@gmail.com>
Date:   Mon May 20 17:14:40 2024 +0200

    feat(blooms): Separate page buffer pools for series pages and bloom pages (#12992)

    Series pages are much smaller than bloom pages and therefore can make use of a separate buffer pool with different buckets.

    The second commit fixes a possible panic.

    Signed-off-by: Christian Haudum <christian.haudum@gmail.com>

commit 94d610e
Author: Yarden Shoham <git@yardenshoham.com>
Date:   Mon May 20 18:05:50 2024 +0300

    docs: Fix broken link in the release notes (#12990)

    Co-authored-by: J Stickler <julie.stickler@grafana.com>

commit 31a1314
Author: choeffer <christian.hoeffer@maibornwolff.de>
Date:   Mon May 20 16:39:25 2024 +0200

    docs(install-monolithic): add quotation marks (#12982)

    Co-authored-by: Michel Hollands <42814411+MichelHollands@users.noreply.github.com>

commit 8978ecf
Author: Salva Corts <salva.corts@grafana.com>
Date:   Mon May 20 12:36:22 2024 +0200

    feat: Boilerplate for new bloom build planner and worker components. (#12989)
JordanRushing added a commit to JordanRushing/loki that referenced this pull request Jul 15, 2024
Ingester stream limits now take into account "owned streams" and periodically
update when the Ingester ring is changed. Non-owned streams are now also
flushed when this update takes place. The stream limit calculation has also been updated for improved
accuracy in multi-zone ingester deployments.

Relevant PRs:
- grafana#13006
- grafana#13030
- grafana#13232
- grafana#13103
- grafana#13231
- grafana#13254
- grafana#13314
- grafana#13321
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
size/L type/docs Issues related to technical documentation; the Docs Squad uses this label across many repositories
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants