
query-scheduler: fix query distribution in SSD mode #9471

Merged

Conversation

Contributor

@sandeepsukhani sandeepsukhani commented May 17, 2023

What this PR does / why we need it:
When we run the query-scheduler in ring mode, queriers and query-frontend discover the available query-scheduler instances using the ring. However, we have a problem when query-schedulers are not running in the same process as queriers and query-frontend since we try to get the ring client interface from the scheduler instance.

This causes queries not to be spread across all the available queriers when running in SSD mode, because we point querier workers at the query-frontend when there is no ring client and no scheduler address configured.

I have fixed this issue by adding a new hidden target that initializes the ring client in reader/member mode, depending on which service initializes it. The reader mode is used by queriers and the query-frontend to discover query-scheduler instances from the ring, while the member mode is used by query-schedulers to register themselves in the ring.
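
To make the split concrete, here is a minimal sketch of what the two ring-manager modes could look like. The constant names follow the snippets quoted later in this review; the exact declarations in pkg/scheduler may differ.

```go
package scheduler

// RingManagerMode controls whether the ring manager only reads the ring
// (service discovery) or also registers the local instance in it.
type RingManagerMode int

const (
	// RingManagerModeReader is used by queriers and query-frontends to
	// discover query-scheduler instances from the ring.
	RingManagerModeReader RingManagerMode = iota
	// RingManagerModeMember is used by query-schedulers to register
	// themselves (their tokens) in the ring.
	RingManagerModeMember
)
```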

I have also made a couple of changes that are not directly related to the issue but fix some problems: resetting the metric registry for each integration test, and loading the config from CLI args as well as the config file before applying dynamic config tuning (both are described in the commit message quoted further down).

Which issue(s) this PR fixes:
Fixes #9195

Special notes for your reviewer:
I have copied most of the ring manager code from the indexgateway RingManager. I will open a follow-up PR to refactor and share the code between the two, since most of it is the same.

Checklist

  • Tests updated
  • CHANGELOG.md updated

@sandeepsukhani sandeepsukhani requested a review from a team as a code owner May 17, 2023 11:42
@sandeepsukhani sandeepsukhani changed the title query-scheduler: fix query distribution for SSD mode query-scheduler: fix query distribution in SSD mode May 18, 2023
Collaborator

@trevorwhitney trevorwhitney left a comment

This looks great, thanks for fixing this! I think we're just missing one small check for legacy read mode, and then LGTM!

)

func (rm *RingManager) OnRingInstanceRegister(_ *ring.BasicLifecycler, ringDesc ring.Desc, instanceExists bool, instanceID string, instanceDesc ring.InstanceDesc) (ring.InstanceState, ring.Tokens) {
// When we initialize the index gateway instance in the ring we want to start from
Collaborator

I think this comment should say scheduler?

t.Cfg.QueryScheduler.SchedulerRing.ListenPort = t.Cfg.Server.GRPCListenPort

managerMode := scheduler.RingManagerModeReader
if t.Cfg.isModuleEnabled(QueryScheduler) || t.Cfg.isModuleEnabled(Backend) || t.Cfg.isModuleEnabled(All) {
Collaborator

we are missing a check for legacy read mode here. in legacy read mode (before backend was introduced), the scheduler was part of the read target, so we should be a member when t.Cfg.LegacyReadTarget && t.Cfg.isModuleEnabled(Read)
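
For clarity, one possible shape of the check being described is sketched below. This only extends the snippet above and assumes a RingManagerModeMember counterpart to RingManagerModeReader; the final code in the PR may differ.

```go
managerMode := scheduler.RingManagerModeReader
if t.Cfg.isModuleEnabled(QueryScheduler) || t.Cfg.isModuleEnabled(Backend) || t.Cfg.isModuleEnabled(All) ||
	(t.Cfg.LegacyReadTarget && t.Cfg.isModuleEnabled(Read)) {
	// This process also runs a query-scheduler, so it should join the ring as a member.
	managerMode = scheduler.RingManagerModeMember
}
```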


// instantiate ring for both modes.
ringCfg := rm.cfg.SchedulerRing.ToRingConfig(ringReplicationFactor)
rm.Ring, err = ring.NewWithStoreClientAndStrategy(ringCfg, ringNameForServer, ringKey, ringStore, ring.NewIgnoreUnhealthyInstancesReplicationStrategy(), prometheus.WrapRegistererWithPrefix("cortex_", registerer), rm.log)
Collaborator

nit: can we break this long function call up over multiple lines?
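
For illustration, the call from the snippet above could be split over multiple lines like this, with no change in behaviour:

```go
rm.Ring, err = ring.NewWithStoreClientAndStrategy(
	ringCfg,
	ringNameForServer,
	ringKey,
	ringStore,
	ring.NewIgnoreUnhealthyInstancesReplicationStrategy(),
	prometheus.WrapRegistererWithPrefix("cortex_", registerer),
	rm.log,
)
```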

rm.subservicesWatcher = services.NewFailureWatcher()
rm.subservicesWatcher.WatchManager(rm.subservices)

rm.Service = services.NewIdleService(func(ctx context.Context) error {
Collaborator

sorry for the lack of understanding, but I'm not quite following how this IdleService works, and how it is able to read the ring without adding itself to it? Is it because it only has a Ring service and not a RingLifecycler service?

Contributor Author

When running RingManager in reader mode, we only use the ring client for reading the ring. We do not have to register the service in the ring to be able to read the ring. RingLifecycler is required only when we want to register tokens in the ring.
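
A rough sketch of that distinction, assuming the field names used by the indexgateway-style RingManager (Ring, RingLifecycler, subservices); the actual code may differ:

```go
// Reader mode: only the ring client is managed as a subservice, so this
// process can read the ring for discovery without ever registering in it.
svcs := []services.Service{rm.Ring}
// In member mode the query-scheduler would also append its lifecycler here
// (e.g. svcs = append(svcs, rm.RingLifecycler)), which is what actually
// writes this instance's tokens into the ring.
rm.subservices, err = services.NewManager(svcs...)
if err != nil {
	return errors.Wrap(err, "failed to create ring manager subservices")
}
```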

sandeepsukhani added a commit that referenced this pull request May 22, 2023
…-scheduler replicas (#9477)

**What this PR does / why we need it**:
Currently, we have a bug in our code when running Loki in SSD mode and
using the ring for query-scheduler discovery. It causes queries to not
be distributed to all the available read pods. I have explained the
issue in detail in [the PR which fixes the
code](#9471).

Since this bug causes a major query performance impact and code release
might take time, in this PR we are doing a new helm release which fixes
the issue by using the k8s service for discovering `query-scheduler`
replicas.

**Which issue(s) this PR fixes**:
Fixes #9195
rgarcia6520 pushed a commit to rgarcia6520/grafana-loki that referenced this pull request May 26, 2023
…-scheduler replicas (grafana#9477)

Collaborator

@trevorwhitney trevorwhitney left a comment

LGTM!

@grafanabot
Collaborator

Hello @sandeepsukhani!
Backport pull requests need to be either:

  • Pull requests which address bugs,
  • Urgent fixes which need product approval, in order to get merged,
  • Docs changes.

Please, if the current pull request addresses a bug fix, label it with the type/bug label.
If it already has the product approval, please add the product-approved label. For docs changes, please add the type/docs label.
If the pull request modifies CI behaviour, please add the type/ci label.
If none of the above applies, please consider removing the backport label and target the next major/minor release.
Thanks!

@sandeepsukhani sandeepsukhani added the type/bug Something is not working as expected label Jun 6, 2023
@sandeepsukhani sandeepsukhani merged commit 0a5e149 into grafana:main Jun 6, 2023
@grafanabot
Collaborator

The backport to release-2.8.x failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new branch
git switch --create backport-9471-to-release-2.8.x origin/release-2.8.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x 0a5e149ea540d9b034ff7023ca6f95ce09805080
# Push it to GitHub
git push --set-upstream origin backport-9471-to-release-2.8.x
git switch main
# Remove the local backport branch
git branch -D backport-9471-to-release-2.8.x

Then, create a pull request where the base branch is release-2.8.x and the compare/head branch is backport-9471-to-release-2.8.x.

sandeepsukhani added a commit to sandeepsukhani/loki that referenced this pull request Jun 6, 2023
**What this PR does / why we need it**:
When we run the `query-scheduler` in `ring` mode, `queriers` and
`query-frontend` discover the available `query-scheduler` instances
using the ring. However, we have a problem when `query-schedulers` are
not running in the same process as queriers and query-frontend since [we
try to get the ring client interface from the scheduler
instance](https://github.com/grafana/loki/blob/abd6131bba18db7f3575241c5e6dc4eed879fbc0/pkg/loki/modules.go#L358).

This causes queries not to be spread across all the available queriers
when running in SSD mode because [we point querier workers to query
frontend when there is no ring client and scheduler address
configured](https://github.com/grafana/loki/blob/b05f4fced305800b32641ae84e3bed5f1794fa7d/pkg/querier/worker_service.go#L115).

I have fixed this issue by adding a new hidden target to initialize the
ring client in `reader`/`member` mode based on which service is
initializing it. `reader` mode will be used by `queriers` and
`query-frontend` for discovering `query-scheduler` instances from the
ring. `member` mode will be used by `query-schedulers` for registering
themselves in the ring.

I have also made a couple of changes that are not directly related to the
issue but fix some problems:
* [reset metric registry for each integration
test](grafana@18c4fe5)
- Previously we were reusing the same registry for all the tests and
just [ignored the attempts to register the same
metrics](https://github.com/grafana/loki/blob/01f0ded7fcb57e3a7b26ffc1e8e3abf04a403825/integration/cluster/cluster.go#L113).
This caused the registry to have metrics registered only from the first
test, so updates from subsequent tests were not reflected in the metrics.
Metrics were the only reliable way for me to verify that
`query-schedulers` were connected to `queriers` and `query-frontend`
when running in ring mode in the integration test that I added to test
my changes. This should also help with other tests where it was
previously hard to reliably check the metrics.
* [load config from cli as well before applying dynamic
config](grafana@f9e2448)
- Previously we were applying dynamic config considering just the config
from the config file. This resulted in unexpected config changes; for
example, [this config
change](https://github.com/grafana/loki/blob/4148dd2c51cb827ec3889298508b95ec7731e7fd/integration/loki_micro_services_test.go#L66)
was being ignored and [dynamic config tuning was unexpectedly turning
on ring
mode](https://github.com/grafana/loki/blob/52cd0a39b8266564352c61ab9b845ab597008770/pkg/loki/config_wrapper.go#L94)
in the config. It is better to do any config tuning based on both the
config file and CLI args.

**Which issue(s) this PR fixes**:
Fixes grafana#9195

(cherry picked from commit 0a5e149)
MasslessParticle pushed a commit that referenced this pull request Jun 16, 2023
mraboosk pushed a commit to mraboosk/loki that referenced this pull request Oct 7, 2024
…-scheduler replicas (grafana#9477)

Successfully merging this pull request may close these issues.

Loki queries not split across queriers