Ruler querier service option #5081

gonzalez · 2023-05-25T16:43:26Z

What this PR does

This PR adds the option to deploy a read path dedicated to the ruler.
This lets you scale user and dashboard queries independently of ruler usage.

Logiraptor

Overall this looks pretty good. I should have some time tomorrow to give it a try locally. In the meantime I've left a few comments.

operations/helm/charts/mimir-distributed/CHANGELOG.md

operations/helm/charts/mimir-distributed/values.yaml

operations/helm/charts/mimir-distributed/templates/ruler/ruler-dep.yaml

operations/helm/charts/mimir-distributed/values.yaml

Logiraptor · 2023-05-25T21:08:24Z

operations/helm/charts/mimir-distributed/values.yaml

+ruler_querier_service:
+  enabled: false


Suggested change

-ruler_querier_service:
-  enabled: false
+remote_rule_evaluation:
+  enabled: false

I think this name better aligns with the terminology we use elsewhere, WDYT?

Obviously, we'd need to update all the references in this PR as well.

Is it possible to wire this in when the ruler-querier replicas are more than 0?

With this feature we would have 2 evaluation modes, but with 2 implementations of remote_rule_evaluation (shared vs dedicated ruler query paths). We should probably distinguish those in the name, unless we simply have remote evaluation always deploy a dedicated query path? @Logiraptor @dimitarvdimitrov

In your current changes, only the dedicated path is used when remote_rule_evaluation.enabled is true:

mimir/operations/helm/charts/mimir-distributed/values.yaml

Lines 337 to 340 in 8de84b0

{{- if .Values.ruler_querier_service.enabled }}
query_frontend:
  address: dns:///{{ template "mimir.fullname" . }}-ruler-query-frontend.{{ .Release.Namespace }}.svc:{{ include "mimir.serverHttpListenPort" . }}
{{- end }}

I agree that it should be a decision the user makes: shared vs dedicated. In that case the conditional linked above should work for both cases. We should also document, for each of remote_rule_evaluation, ruler_query_frontend, ruler_querier, and ruler_query_scheduler, how they interact and how to configure them.

Is it possible to wire this in when the ruler-querier replicas are more than 0?

@dimitarvdimitrov the default values for ruler-querier will always have the replicas > 0

But the operator can choose to set the dedicated rulers to 0 replicas and use the shared query path. This will give them e.g. query sharding and query scheduling, which the ruler itself doesn't support; all without having to deploy more resources.

I suggest making it possible to deploy both ways, dedicated and shared: dedicated when the dedicated query-frontend replicas are more than 0, and shared when they are 0.

If you agree, then I'd also suggest making the dedicated ruler path deploy with 0 replicas by default. The reason is that we aim to keep the chart compatible with smaller deployments while still enabling larger ones. A dedicated ruler path is a somewhat advanced feature that many regular operators might not need. Once we do that, it would be great to document how to set up remote rule evaluation with both a dedicated query path and a shared query path.
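For illustration, a rough sketch of how the replica-based switch could look in the ruler config template. It assumes the `remote_rule_evaluation.enabled` flag discussed above and a hypothetical `ruler_query_frontend.replicas` value; neither name is settled in this PR. The dedicated address line is adapted from the snippet quoted earlier in this thread, and the shared service name (`-query-frontend`) is also an assumption.

```yaml
# Sketch only: value names and the shared service name are assumptions.
{{- if .Values.remote_rule_evaluation.enabled }}
query_frontend:
  {{- if gt (int .Values.ruler_query_frontend.replicas) 0 }}
  # Dedicated path: the ruler talks to its own query-frontend.
  address: dns:///{{ template "mimir.fullname" . }}-ruler-query-frontend.{{ .Release.Namespace }}.svc:{{ include "mimir.serverHttpListenPort" . }}
  {{- else }}
  # Shared path: the ruler reuses the regular query-frontend.
  address: dns:///{{ template "mimir.fullname" . }}-query-frontend.{{ .Release.Namespace }}.svc:{{ include "mimir.serverHttpListenPort" . }}
  {{- end }}
{{- end }}
```

With 0 dedicated replicas as the default, enabling remote evaluation alone would give the shared path, and raising the dedicated replica count would opt into the dedicated one.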

thoughts on this @gonzalez?

dimitarvdimitrov

Thanks for the contribution. I reviewed it on a high level and haven't looked at the manifests in detail yet.

I think we should also add docs for using this. We have a page for running the chart in production. This is a good candidate for providing a bit more isolation between rules and the user queries for more sizable clusters. This is the doc I am referring to https://github.com/grafana/mimir/blob/335c64a12f3102fc0399fd9666f7d238ab857168/docs/sources/helm-charts/mimir-distributed/run-production-environment-with-helm/_index.md

operations/helm/charts/mimir-distributed/values.yaml

dimitarvdimitrov · 2023-05-26T09:59:54Z

...s/helm/charts/mimir-distributed/templates/ruler-query-frontend/ruler-query-frontend-dep.yaml

@@ -0,0 +1,131 @@
+{{- if .Values.ruler_querier_service.enabled -}}
+apiVersion: apps/v1
+kind: Deployment


We're having issues with the duplicated nginx deployment because its manifests are copied. WDYT about extracting this as a separate named template and invoking it for the regular querier and the ruler-querier (same for query-frontend...)?

I don't understand - where are you seeing the duplicate nginx deployments - thanks.

I meant this nginx deployment

mimir/operations/helm/charts/mimir-distributed/templates/nginx/nginx-dep.yaml

Line 52 in 18f1dd0

image: {{ .Values.nginx.image.registry }}/{{ .Values.nginx.image.repository }}:{{ .Values.nginx.image.tag }}

and this one

mimir/operations/helm/charts/mimir-distributed/templates/gateway/gateway-dep.yaml

Line 78 in 18f1dd0

image: {{ .nginx.image.registry }}/{{ .nginx.image.repository }}:{{ .nginx.image.tag }}

I think it will make it easier to change these deployments in the future, for example changing a default value in one deployment and making sure it's not lost in the other.

If the two templates are distinct, then when someone opens a PR to update one, nothing immediately tells either the reviewers or the author that there's another querier deployment that needs the same change. At that stage the two drift apart. I think the same applies to the query-frontend and query-scheduler.

This increases the effort for this PR, but I believe it makes the chart slightly more maintainable in the long run.
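A minimal sketch of what such a shared named template might look like. The helper name `mimir.querierDeployment`, the `ctx`/`component`/`values` dict keys, and the `ruler_querier` values key are all hypothetical and used only for illustration; `define`, `include`, and `dict` are standard Helm template functions.

```yaml
# templates/_helpers.tpl (hypothetical helper; pod spec elided)
{{- define "mimir.querierDeployment" -}}
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ include "mimir.fullname" .ctx }}-{{ .component }}
spec:
  replicas: {{ .values.replicas }}
  # The pod spec shared by both queriers would live here, written once.
{{- end -}}

# templates/querier/querier-dep.yaml
{{ include "mimir.querierDeployment" (dict "ctx" . "component" "querier" "values" .Values.querier) }}

# templates/ruler-querier/ruler-querier-dep.yaml
{{- if .Values.ruler_querier_service.enabled }}
{{ include "mimir.querierDeployment" (dict "ctx" . "component" "ruler-querier" "values" .Values.ruler_querier) }}
{{- end }}
```

Any later change to the shared body then applies to both deployments, which is the drift protection described above.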

operations/helm/charts/mimir-distributed/CHANGELOG.md

dimitarvdimitrov · 2023-05-26T10:01:46Z

operations/helm/charts/mimir-distributed/values.yaml

+ruler_querier_service:
+  enabled: false


Is it possible to wire this in when the ruler-querier replicas are more than 0?


lasermoth · 2024-04-02T02:01:38Z

I'm currently looking for this functionality, but it doesn't look like this PR has been touched in a year.
Are we able to progress it?

dimitarvdimitrov · 2024-04-08T09:40:38Z

If @gonzalez doesn't have the bandwidth to take care of this, I think anyone else should be able to complete this building on top of his work.

I haven't looked at the PR in a while, but IIRC the comments I left were the only major blockers to merging this.

It looks like there are a lot of conflicts, but they seem minor to me: the majority of files are just autogenerated manifests, so running make build-helm-tests after rebasing should resolve them.

alex5517 · 2024-04-25T07:31:55Z

@dimitarvdimitrov,

I created a new PR which tries to add this feature: #7964

gonzalez added 5 commits May 19, 2023 09:54

adding ruler query and query-frontend components

fa45afe

disabling ruler-query-frontend caching

8002196

adding ruler config logic and enable option for the service

8e4847e

updating ruler-query-frontend and ruler-query scheduler-address

5d86e33

updated scheduler and frontend address

0d57e52

gonzalez requested a review from a team as a code owner May 25, 2023 16:43

gonzalez added 8 commits May 25, 2023 12:43

updating changelog

853677d

updating values file - default to disabled

251212a

make build-helm-tests changes

66b09d2

updating max_chunk_pool_bytes to match jsonnet

5ef684e

updating test values to match jssonnet updated values

8101754

updating make build-helm-tests

2358f30

fixing trailing spaces

08f9990

make build-helm-tests one more time

84b85c6

Logiraptor reviewed May 25, 2023

dimitarvdimitrov reviewed May 26, 2023

gonzalez and others added 6 commits May 26, 2023 08:48

Update operations/helm/charts/mimir-distributed/CHANGELOG.md

7fa86ac

Co-authored-by: Patrick Oyarzun <patrick.oyarzun@grafana.com>

updated to feature in changelog

46c55a1

removing deprecated max_chunk_pool_bytes option

9526b45

make build-helm-tests update

c2b9c77

cleaning up commented ifs

998f9ba

updated test

8de84b0

Logiraptor requested review from dimitarvdimitrov and Logiraptor May 31, 2023 13:54

dreamlibrarian reviewed May 31, 2023

operations/helm/charts/mimir-distributed/templates/_helpers.tpl

adding ruler-query-scheduler templates

e0f9d56

Logiraptor reviewed Jun 1, 2023

operations/helm/charts/mimir-distributed/values.yaml (outdated)

Logiraptor reviewed Jun 1, 2023

...helm/charts/mimir-distributed/templates/ruler-query-scheduler/ruler-query-scheduler-dep.yaml

gonzalez and others added 2 commits June 1, 2023 08:56

Update operations/helm/charts/mimir-distributed/values.yaml

9b5a85c

Co-authored-by: Patrick Oyarzun <patrick.oyarzun@grafana.com>

updating querier.scheduler-address query-frontend.scheduler-address t…

c04caa8

…o use the headless service

alex5517 mentioned this pull request Apr 25, 2024

Helm: Add support for dedicated ruler query path #7964

Merged

4 tasks

dimitarvdimitrov closed this in #7964 May 28, 2024
