query-scheduler: fix query distribution in SSD mode #9471

sandeepsukhani · 2023-05-17T11:42:13Z

What this PR does / why we need it:
When we run the query-scheduler in ring mode, queriers and query-frontend discover the available query-scheduler instances using the ring. However, we have a problem when query-schedulers are not running in the same process as queriers and query-frontend since we try to get the ring client interface from the scheduler instance.

As a result, queries are not spread across all the available queriers when running in SSD mode, because we point querier workers to the query frontend when neither a ring client nor a scheduler address is configured.
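The fallback described above can be sketched roughly as follows. This is a minimal illustration, not the code in `worker_service.go`: the `chooseWorkerTarget` helper, its parameters, and the precedence of a configured address over the ring are all assumptions made for the example.

```go
package main

import "fmt"

// workerTarget names where a querier worker pulls queries from.
type workerTarget string

const (
	targetSchedulerAddress workerTarget = "scheduler-address"
	targetSchedulerRing    workerTarget = "scheduler-ring"
	targetQueryFrontend    workerTarget = "query-frontend"
)

// chooseWorkerTarget is a hypothetical helper that mirrors the fallback
// described in the PR: with no scheduler address and no ring client,
// workers end up pointed at the query frontend.
func chooseWorkerTarget(schedulerAddress string, hasRingClient bool) workerTarget {
	switch {
	case schedulerAddress != "":
		return targetSchedulerAddress
	case hasRingClient:
		return targetSchedulerRing
	default:
		return targetQueryFrontend
	}
}

func main() {
	// A read process without an in-process scheduler and without a ring client.
	fmt.Println(chooseWorkerTarget("", false))
	// The same process once a ring client is available to it.
	fmt.Println(chooseWorkerTarget("", true))
}
```

In this sketch, the bug corresponds to the first call: the process has no ring client of its own, so it falls through to the query frontend.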

I have fixed this issue by adding a new hidden target to initialize the ring client in reader/member mode based on which service is initializing it. reader mode will be used by queriers and query-frontend for discovering query-scheduler instances from the ring. member mode will be used by query-schedulers for registering themselves in the ring.
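To make the two modes concrete, here is a small sketch of what each mode would need to start. The type, constant, and function names are illustrative only and are not taken from the PR's `RingManager`; the split itself (ring client in both modes, lifecycler only for members) follows the description above.

```go
package main

import "fmt"

// ringManagerMode distinguishes processes that only read the ring from
// processes that also register themselves in it.
type ringManagerMode int

const (
	ringManagerModeReader ringManagerMode = iota
	ringManagerModeMember
)

func (m ringManagerMode) String() string {
	if m == ringManagerModeMember {
		return "member"
	}
	return "reader"
}

// subservicesFor lists the hypothetical subservices per mode: every mode
// needs a ring client, only members need a lifecycler to register.
func subservicesFor(mode ringManagerMode) []string {
	subservices := []string{"ring client"}
	if mode == ringManagerModeMember {
		subservices = append(subservices, "lifecycler")
	}
	return subservices
}

func main() {
	for _, mode := range []ringManagerMode{ringManagerModeReader, ringManagerModeMember} {
		fmt.Printf("%s mode starts: %v\n", mode, subservicesFor(mode))
	}
}
```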

I have also made a couple of changes that are not directly related to the issue but fix some problems:

* reset metric registry for each integration test - Previously we were reusing the same registry for all the tests and just ignored the attempts to register the same metrics. This caused the registry to have metrics registered only from the first test, so any updates from subsequent tests would not be reflected in the metrics. Metrics were the only reliable way for me to verify that query-schedulers were connected to queriers and query-frontend when running in ring mode in the integration test that I added to test my changes. This should also help with other tests where it was previously hard to reliably check the metrics.
* load config from cli as well before applying dynamic config - Previously we were applying dynamic config considering just the config from the config file. This resulted in unexpected config changes, for example, this config change was getting ignored and dynamic config tuning was unexpectedly turning on ring mode in the config. It is better to do any config tuning based on both file and cli args configs.
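The ordering problem in the second item can be illustrated with the standard `flag` package. Everything here is a sketch under assumptions: the `config` fields, the `-scheduler-address` flag, and the tuning rule in `applyDynamicConfig` are invented for the example and are not Loki's actual config wrapper.

```go
package main

import (
	"flag"
	"fmt"
)

// config is a hypothetical, heavily reduced configuration.
type config struct {
	SchedulerAddress string
	UseSchedulerRing bool
}

// applyDynamicConfig is an assumed tuning rule: turn on ring mode when no
// scheduler address has been set.
func applyDynamicConfig(c *config) {
	if c.SchedulerAddress == "" {
		c.UseSchedulerRing = true
	}
}

// load starts from the file config, then either parses CLI args before
// tuning (new order) or after tuning (old order).
func load(fileCfg config, args []string, cliBeforeTuning bool) (config, error) {
	c := fileCfg
	fs := flag.NewFlagSet("sketch", flag.ContinueOnError)
	fs.StringVar(&c.SchedulerAddress, "scheduler-address", c.SchedulerAddress, "scheduler address")

	if cliBeforeTuning {
		if err := fs.Parse(args); err != nil {
			return c, err
		}
		applyDynamicConfig(&c)
		return c, nil
	}
	applyDynamicConfig(&c)
	err := fs.Parse(args)
	return c, err
}

func main() {
	args := []string{"-scheduler-address=scheduler:9095"}
	oldOrder, _ := load(config{}, args, false)
	newOrder, _ := load(config{}, args, true)
	fmt.Println("tuning before cli: ring enabled =", oldOrder.UseSchedulerRing)
	fmt.Println("tuning after cli:  ring enabled =", newOrder.UseSchedulerRing)
}
```

With tuning applied first, the rule never sees the CLI value and switches ring mode on; with the CLI parsed first, the tuning decision is based on the complete configuration.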

Which issue(s) this PR fixes:
Fixes #9195

Special notes for your reviewer:
I have copied most of the ring manager code from indexgateway RingManager. I will open a follow-up PR to refactor and share the code between the two since most of the code is the same.

Checklist

Tests updated
CHANGELOG.md updated


trevorwhitney

This looks great, thanks for fixing this! I think we're just missing one small check for legacy read mode, and then LGTM!

trevorwhitney · 2023-05-18T17:24:45Z

pkg/scheduler/lifecycle.go

+)
+
+func (rm *RingManager) OnRingInstanceRegister(_ *ring.BasicLifecycler, ringDesc ring.Desc, instanceExists bool, instanceID string, instanceDesc ring.InstanceDesc) (ring.InstanceState, ring.Tokens) {
+	// When we initialize the index gateway instance in the ring we want to start from


I think this comment should say scheduler?

trevorwhitney · 2023-05-18T17:31:39Z

pkg/loki/modules.go

+	t.Cfg.QueryScheduler.SchedulerRing.ListenPort = t.Cfg.Server.GRPCListenPort
+
+	managerMode := scheduler.RingManagerModeReader
+	if t.Cfg.isModuleEnabled(QueryScheduler) || t.Cfg.isModuleEnabled(Backend) || t.Cfg.isModuleEnabled(All) {


we are missing a check for legacy read mode here. in legacy read mode (before backend was introduced), the scheduler was part of the read target, so we should be a member when t.Cfg.LegacyReadTarget && t.Cfg.isModuleEnabled(Read)
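A sketch of the check with the legacy read case included might look like the following. The `config` struct and `schedulerRingMode` function are simplified stand-ins written for this example; only the shape of the condition comes from the review comment and the diff above.

```go
package main

import "fmt"

// Target names used in this sketch.
const (
	targetAll            = "all"
	targetRead           = "read"
	targetBackend        = "backend"
	targetQueryScheduler = "query-scheduler"
)

// config is a simplified stand-in for the real configuration.
type config struct {
	targets          map[string]bool
	legacyReadTarget bool
}

func (c config) isModuleEnabled(name string) bool {
	return c.targets[name]
}

// schedulerRingMode returns "member" for processes that run a scheduler
// and "reader" for everything else. In legacy read mode the scheduler is
// part of the read target, so that case must be a member as well.
func schedulerRingMode(c config) string {
	if c.isModuleEnabled(targetQueryScheduler) ||
		c.isModuleEnabled(targetBackend) ||
		c.isModuleEnabled(targetAll) ||
		(c.legacyReadTarget && c.isModuleEnabled(targetRead)) {
		return "member"
	}
	return "reader"
}

func main() {
	read := config{targets: map[string]bool{targetRead: true}}
	legacyRead := config{targets: map[string]bool{targetRead: true}, legacyReadTarget: true}
	fmt.Println("read:", schedulerRingMode(read))
	fmt.Println("legacy read:", schedulerRingMode(legacyRead))
}
```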

trevorwhitney · 2023-05-18T17:37:25Z

pkg/scheduler/ringmanager.go

+
+	// instantiate ring for both mode modes.
+	ringCfg := rm.cfg.SchedulerRing.ToRingConfig(ringReplicationFactor)
+	rm.Ring, err = ring.NewWithStoreClientAndStrategy(ringCfg, ringNameForServer, ringKey, ringStore, ring.NewIgnoreUnhealthyInstancesReplicationStrategy(), prometheus.WrapRegistererWithPrefix("cortex_", registerer), rm.log)


nit: can we break this long function call up over multiple lines?

trevorwhitney · 2023-05-18T17:48:17Z

pkg/scheduler/ringmanager.go

+	rm.subservicesWatcher = services.NewFailureWatcher()
+	rm.subservicesWatcher.WatchManager(rm.subservices)
+
+	rm.Service = services.NewIdleService(func(ctx context.Context) error {


sorry for the lack of understanding, but I'm not quite following how this IdleService works, and how it is able to read the ring without adding itself to it? Is it because it only has a Ring service and not a RingLifecycler service?

sandeepsukhani · 2023-05-22

When running RingManager in reader mode, we only use the ring client for reading the ring. We do not have to register the service in the ring to be able to read it. RingLifecycler is required only when we want to register tokens in the ring.
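The distinction can be shown with a toy in-memory model. This is not the dskit ring API: `kvStore`, `lifecycler`, and `ringClient` are hypothetical types that only imitate the idea that writing to the shared store and reading from it are separate roles.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// kvStore stands in for the shared key-value store that backs a ring.
type kvStore struct {
	mu        sync.RWMutex
	instances map[string]string // instance ID -> address
}

func newKVStore() *kvStore {
	return &kvStore{instances: map[string]string{}}
}

// lifecycler is the member-mode role: it writes its own entry to the store.
type lifecycler struct {
	store    *kvStore
	id, addr string
}

func (l lifecycler) register() {
	l.store.mu.Lock()
	defer l.store.mu.Unlock()
	l.store.instances[l.id] = l.addr
}

// ringClient is the reader-mode role: it only reads the store.
type ringClient struct {
	store *kvStore
}

func (r ringClient) addresses() []string {
	r.store.mu.RLock()
	defer r.store.mu.RUnlock()
	addrs := make([]string, 0, len(r.store.instances))
	for _, addr := range r.store.instances {
		addrs = append(addrs, addr)
	}
	sort.Strings(addrs)
	return addrs
}

func main() {
	store := newKVStore()
	// Two schedulers register themselves (member mode).
	lifecycler{store: store, id: "scheduler-1", addr: "10.0.0.1:9095"}.register()
	lifecycler{store: store, id: "scheduler-2", addr: "10.0.0.2:9095"}.register()

	// A querier discovers them without ever registering (reader mode).
	querier := ringClient{store: store}
	fmt.Println(querier.addresses())
}
```

The reader never calls `register`, so the set of instances in the store contains schedulers only, which is the property the reply describes.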

…-scheduler replicas (#9477) **What this PR does / why we need it**: Currently, we have a bug in our code when running Loki in SSD mode and using the ring for query-scheduler discovery. It causes queries to not be distributed to all the available read pods. I have explained the issue in detail in [the PR which fixes the code](#9471). Since this bug causes a major query performance impact and code release might take time, in this PR we are doing a new helm release which fixes the issue by using the k8s service for discovering `query-scheduler` replicas. **Which issue(s) this PR fixes**: Fixes #9195


trevorwhitney

LGTM!

grafanabot · 2023-06-06T06:34:40Z

Hello @sandeepsukhani!
Backport pull requests need to be either:

Pull requests which address bugs,
Urgent fixes which need product approval, in order to get merged,
Docs changes.

Please, if the current pull request addresses a bug fix, label it with the type/bug label.
If it already has the product approval, please add the product-approved label. For docs changes, please add the type/docs label.
If the pull request modifies CI behaviour, please add the type/ci label.
If none of the above applies, please consider removing the backport label and target the next major/minor release.
Thanks!

grafanabot · 2023-06-06T06:37:33Z

The backport to release-2.8.x failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new branch
git switch --create backport-9471-to-release-2.8.x origin/release-2.8.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x 0a5e149ea540d9b034ff7023ca6f95ce09805080
# Push it to GitHub
git push --set-upstream origin backport-9471-to-release-2.8.x
git switch main
# Remove the local backport branch
git branch -D backport-9471-to-release-2.8.x

Then, create a pull request where the base branch is release-2.8.x and the compare/head branch is backport-9471-to-release-2.8.x.

**What this PR does / why we need it**:
When we run the `query-scheduler` in `ring` mode, `queriers` and `query-frontend` discover the available `query-scheduler` instances using the ring. However, we have a problem when `query-schedulers` are not running in the same process as queriers and query-frontend since [we try to get the ring client interface from the scheduler instance](https://github.com/grafana/loki/blob/abd6131bba18db7f3575241c5e6dc4eed879fbc0/pkg/loki/modules.go#L358).

This causes queries not to be spread across all the available queriers when running in SSD mode because [we point querier workers to query frontend when there is no ring client and scheduler address configured](https://github.com/grafana/loki/blob/b05f4fced305800b32641ae84e3bed5f1794fa7d/pkg/querier/worker_service.go#L115).

I have fixed this issue by adding a new hidden target to initialize the ring client in `reader`/`member` mode based on which service is initializing it. `reader` mode will be used by `queriers` and `query-frontend` for discovering `query-scheduler` instances from the ring. `member` mode will be used by `query-schedulers` for registering themselves in the ring.

I have also made a couple of changes not directly related to the issue but it fixes some problems:

* [reset metric registry for each integration test](grafana@18c4fe5) - Previously we were reusing the same registry for all the tests and just [ignored the attempts to register same metrics](https://github.com/grafana/loki/blob/01f0ded7fcb57e3a7b26ffc1e8e3abf04a403825/integration/cluster/cluster.go#L113). This causes the registry to have metrics registered only from the first test so any updates from subsequent tests won't reflect in the metrics. metrics was the only reliable way for me to verify that `query-schedulers` were connected to `queriers` and `query-frontend` when running in ring mode in the integration test that I added to test my changes. This should also help with other tests where earlier it was hard to reliably check the metrics.
* [load config from cli as well before applying dynamic config](grafana@f9e2448) - Previously we were applying dynamic config considering just the config from config file. This results in unexpected config changes, for example, [this config change](https://github.com/grafana/loki/blob/4148dd2c51cb827ec3889298508b95ec7731e7fd/integration/loki_micro_services_test.go#L66) was getting ignored and [dynamic config tuning was unexpectedly turning on ring mode](https://github.com/grafana/loki/blob/52cd0a39b8266564352c61ab9b845ab597008770/pkg/loki/config_wrapper.go#L94) in the config. It is better to do any config tuning based on both file and cli args configs.

**Which issue(s) this PR fixes**:
Fixes grafana#9195

(cherry picked from commit 0a5e149)

(cherry-picked from commit 0a5e149 / #9471)


sandeepsukhani added 3 commits May 17, 2023 16:26

fix issue in distributing queries when running query schedulers in ring mode

4148dd2

reset metric registry when creating a new cluster to have metrics registered just for the running test

18c4fe5

load config from cli as well before applying dynamic config

f9e2448

sandeepsukhani requested a review from a team as a code owner May 17, 2023 11:42

pull-request-size bot added the size/XL label May 17, 2023

lint

d3ee182

sandeepsukhani mentioned this pull request May 18, 2023

helm: release a new helm charts to use k8s service for discover query-scheduler replicas #9477

Merged

sandeepsukhani changed the title ~~query-scheduler: fix query distribution for SSD mode~~ query-scheduler: fix query distribution in SSD mode May 18, 2023

update changelog

731b191

trevorwhitney requested changes May 18, 2023


changes suggested from PR review

4a81823

sandeepsukhani requested a review from trevorwhitney May 29, 2023 05:40

trevorwhitney approved these changes Jun 1, 2023


Merge branch 'main' into query-scheduler-ring-mode-fix

0d62fe4

sandeepsukhani added the backport release-2.8.x label Jun 6, 2023

grafanabot added the missing-labels label Jun 6, 2023

sandeepsukhani added the type/bug Somehing is not working as expected label Jun 6, 2023

grafanabot removed the missing-labels label Jun 6, 2023

sandeepsukhani merged commit 0a5e149 into grafana:main Jun 6, 2023

grafanabot added the backport-failed label Jun 6, 2023

sandeepsukhani mentioned this pull request Jun 6, 2023

query-scheduler: fix query distribution in SSD mode (#9471) #9637

Merged

MasslessParticle pushed a commit that referenced this pull request Jun 16, 2023

query-scheduler: fix query distribution in SSD mode (#9471) (#9637)

6416631

(cherry-picked from commit 0a5e149 / #9471)
