-
Notifications
You must be signed in to change notification settings - Fork 544
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(helm): Adding KEDA autoscaling support #7282
Conversation
b9fd5d2
to
ce04869
Compare
@dimitarvdimitrov I have opened this PR and took #4687 as a base. I have ported the changes from your PR #6971 into here, as you have mentioned in this comment. I know that there is still documentation missing for this PR. Could you, nevertheless, have a look over the changes? |
many thanks for picking this up. I will set some time aside this week to review it |
Have you had the chance to test this out on a mimir cell? It would be nice to do that before we merge the PR |
I did tried it in a minikube setup, but not yet in a real world scenario. Do you feel confident in merging it as experimental feature? This would give me the option to test it in our production cluster. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First of all, great work @beatkind! Thank you for picking this up.
Review
The PR is in a really good shape. I left a few minor comments and I think they'll be quick to address. Can you take a look? I'm happy to merge this as an experimental feature after addressing them.
I apologise for the delayed review, but I couldn't find a large-enough block of time to review this (it's a hefty PR). I'll try to be more responsive for the next iterations.
Docs
Perhaps the major reason for writing docs is to cover the migration procedure for enabling autoscaling. At Grafana Labs we had to go through a migration procedure to make sure that there aren't any disruptions. The main problem was that removing the replicas
field causes the deployment to scale to 1 replica if the HPA hasn't had the time to take control, which is very likely if you're removing replicas and creating the HPA in one go.
I think a reasonable migration procedure would be to add a config flag to kedaAutoscaling which preserves replicas
(e.g. *.kedaAutoscaling.keepReplicas
). If you'd like to add that in this PR you're welcome, but this PR is already quite large and is already a beneficial addition. I opened this issue to track the migration docs and configuration #7367
Do you agree with this? Maybe I'm missing something about the migration.
Future work
There some bits which are missing from this PR which I think we should add before calling this work complete. I've added them in another issue #7368. As the one opening this PR, would you like to take ownership of these follow-up actions? It's ok to say no, just decided to offer you the chance.
also cc @krajorama in case you want to review this too
operations/helm/charts/mimir-distributed/templates/query-frontend/query-frontend-so.yaml
Outdated
Show resolved
Hide resolved
operations/helm/charts/mimir-distributed/templates/querier/querier-so.yaml
Outdated
Show resolved
Hide resolved
operations/helm/charts/mimir-distributed/templates/metamonitoring/_helpers.tpl
Show resolved
Hide resolved
@dimitarvdimitrov many thanks for the review! I am happy to add missing parts that you have mentioned. I also totally agree with the migration path. You are right, this is already a big PR. I will add the things you have mentioned, let then merge it as experimental. I will also add a comment regarding to be careful regarding the mentioned replica situation. I am happy to take ownership of the issues you have created and I plan to open another PR once that is merged introducing the migration to autoscaler and adding docs :) Thanks you very much! |
…perimental state of the feature, using the same autoscaling rules & behaviours as jsonnet, only supporting newest version of KEDA
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just one comment on the querier SO and a suggestion to satisfy the linter
operations/helm/charts/mimir-distributed/templates/query-frontend/query-frontend-so.yaml
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thank you for your work on this @beatkind 😄
because of some drift with |
@dimitarvdimitrov already on my way |
…afana#7269) * helm: align grpc server connection lifetime settings with jsonnet Co-authored-by: Marco Pracucci <marco@pracucci.com> Signed-off-by: Vladimir Varankin <vladimir.varankin@grafana.com> * helm: rebuild tests Signed-off-by: Vladimir Varankin <vladimir.varankin@grafana.com> --------- Signed-off-by: Vladimir Varankin <vladimir.varankin@grafana.com> Co-authored-by: Marco Pracucci <marco@pracucci.com>
…rafana#7290) * querymiddleware: race condition in shardActiveSeriesMiddleware s2.Writer is not goroutine-safe to reuse between concurrent requests. Signed-off-by: Vladimir Varankin <vladimir.varankin@grafana.com> * querymiddleware: remove flaky ResponseBodyStreamed test from shardActiveSeriesMiddleware --------- Signed-off-by: Vladimir Varankin <vladimir.varankin@grafana.com>
Update pkg/util/version/info.go Update pkg/mimirtool/client/client.go Signed-off-by: Vladimir Varankin <vladimir.varankin@grafana.com> Co-authored-by: Andy Asp <90626759+andyasp@users.noreply.github.com>
…onfig (grafana#7298) Signed-off-by: Vladimir Varankin <vladimir.varankin@grafana.com>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
* Jsonnet / Helm: improve distributors graceful shutdown Signed-off-by: Marco Pracucci <marco@pracucci.com> * Fix Helm distributor termination grace period Signed-off-by: Marco Pracucci <marco@pracucci.com> --------- Signed-off-by: Marco Pracucci <marco@pracucci.com>
Co-authored-by: grafanabot <grafanabot@grafana.com>
…fault to true, and deprecate (grafana#7366) * Distributor: Change -distributor.enable-otlp-metadata-storage flag's default to true and deprecate --------- Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
…-distributor.limit-inflight-requests-using-grpc-method-limiter as stable and enable it by default (grafana#7360) * Mark -ingester.limit-inflight-requests-using-grpc-method-limiter and -distributor.limit-inflight-requests-using-grpc-method-limiter as stable and enable it by default Signed-off-by: Marco Pracucci <marco@pracucci.com> * Improved tests Signed-off-by: Marco Pracucci <marco@pracucci.com> * Improved tests Signed-off-by: Marco Pracucci <marco@pracucci.com> * Fixed unit tests Signed-off-by: Marco Pracucci <marco@pracucci.com> * Mark config as deprecated Signed-off-by: Marco Pracucci <marco@pracucci.com> --------- Signed-off-by: Marco Pracucci <marco@pracucci.com>
…rafana#7342) * Do not consider out-of-order blocks when filtering compactable jobs Co-authored-by: Marco Pracucci <marco@pracucci.com>
* mimir: inject span profiler into tracer Signed-off-by: Vladimir Varankin <vladimir.varankin@grafana.com> * Update changelog Signed-off-by: Vladimir Varankin <vladimir.varankin@grafana.com> --------- Signed-off-by: Vladimir Varankin <vladimir.varankin@grafana.com>
* Add experimental partitions ring lifecycler support Signed-off-by: Marco Pracucci <marco@pracucci.com> Co-authored-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com> Co-authored-by: Peter Štibraný <pstibrany@gmail.com> * Use http.Error() Signed-off-by: Marco Pracucci <marco@pracucci.com> * Skip deprecated linter check for code that will be soon replaced Signed-off-by: Marco Pracucci <marco@pracucci.com> * Updated dskit Signed-off-by: Marco Pracucci <marco@pracucci.com> * Add partition id to logs Signed-off-by: Marco Pracucci <marco@pracucci.com> * Added comment Signed-off-by: Marco Pracucci <marco@pracucci.com> * Added TestIngester_ShouldNotCreatePartitionIfThereIsShutdownMarker Signed-off-by: Marco Pracucci <marco@pracucci.com> --------- Signed-off-by: Marco Pracucci <marco@pracucci.com> Co-authored-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com> Co-authored-by: Peter Štibraný <pstibrany@gmail.com>
Head branch was pushed to by a user without write access
I believe this still needs one more |
What this PR does
This PR takes #4687 as base layer and tries to implement all added code comments. Please be aware that this PR does not cover the changes in the monitoring dashboards that may be needed.
Which issue(s) this PR fixes or relates to
Fixes #3430
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]
.about-versioning.md
updated with experimental features.