Scheduler autoscaler is not considering reserved replicas #6733
Labels
kind/bug
Categorizes issue or PR as related to a bug.
triage/accepted
Issues which should be fixed (post-triage)
Comments
pierDipi added a commit to pierDipi/eventing that referenced this issue on Mar 13, 2023:
The autoscaler runs in every controller replica [1]. It tries to scale down on every replica after the given refresh period, and sometimes the two replicas don't agree on the new replica count because the state is lister/cache based. This leads to overly fast scale-up or scale-down, or sometimes to never converging (also because of knative#6733).
[1] https://github.com/knative/eventing/blob/1092472f440586099d6a5cbf1d3234bb36431af4/pkg/scheduler/statefulset/autoscaler.go#L85-L103
Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>
knative-prow bot pushed a commit that referenced this issue on Mar 14, 2023:
Fixes #6732
The autoscaler runs in every controller replica [1]. It tries to scale down on every replica after the given refresh period, and sometimes the two replicas don't agree on the new replica count because the state is lister/cache based. This leads to overly fast scale-up or scale-down, or sometimes to never converging (also because of #6733).
Implementations should use knative/pkg#2675 to enable a leader-aware autoscaler (PR knative/pkg#2688).
[1] https://github.com/knative/eventing/blob/1092472f440586099d6a5cbf1d3234bb36431af4/pkg/scheduler/statefulset/autoscaler.go#L85-L103
Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>
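A minimal sketch, in Go, of the leader-aware idea from the commit message above: only the replica that currently holds leadership acts on a refresh tick, so two replicas with diverging lister/cache views cannot fight over the replica count. The `autoscaler` type and its `Promote`, `Demote` and `syncOnce` methods are hypothetical and do not reproduce the knative/pkg#2675 API; the sketch assumes leadership changes are delivered through callbacks and needs Go 1.19+ for `atomic.Bool`.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// autoscaler is a hypothetical, simplified stand-in for the scheduler's
// StatefulSet autoscaler.
type autoscaler struct {
	leader  atomic.Bool // flipped by (hypothetical) leader-election callbacks
	applied int32       // last replica count this instance applied
}

// Promote and Demote would be invoked when this replica gains or loses
// leadership.
func (a *autoscaler) Promote() { a.leader.Store(true) }
func (a *autoscaler) Demote()  { a.leader.Store(false) }

// syncOnce runs one refresh-period iteration and reports whether it acted.
// Non-leaders return early, so replicas with divergent cache views never
// race each other on the scale.
func (a *autoscaler) syncOnce(desired int32) bool {
	if !a.leader.Load() {
		return false
	}
	// A real implementation would update the StatefulSet scale here.
	a.applied = desired
	return true
}

func main() {
	leader, follower := &autoscaler{}, &autoscaler{}
	leader.Promote()
	fmt.Println(leader.syncOnce(3), follower.syncOnce(2)) // true false
}
```

Gating on leadership means a single writer decides the replica count, so disagreement between caches on different replicas no longer translates into competing scale operations.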
vishal-chdhry pushed a commit to vishal-chdhry/eventing that referenced this issue on Mar 14, 2023:
Fixes knative#6732
The autoscaler runs in every controller replica [1]. It tries to scale down on every replica after the given refresh period, and sometimes the two replicas don't agree on the new replica count because the state is lister/cache based. This leads to overly fast scale-up or scale-down, or sometimes to never converging (also because of knative#6733).
Implementations should use knative/pkg#2675 to enable a leader-aware autoscaler (PR knative/pkg#2688).
[1] https://github.com/knative/eventing/blob/1092472f440586099d6a5cbf1d3234bb36431af4/pkg/scheduler/statefulset/autoscaler.go#L85-L103
Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>
vishal-chdhry pushed a commit to vishal-chdhry/eventing that referenced this issue on Apr 25, 2023:
Fixes knative#6732
The autoscaler runs in every controller replica [1]. It tries to scale down on every replica after the given refresh period, and sometimes the two replicas don't agree on the new replica count because the state is lister/cache based. This leads to overly fast scale-up or scale-down, or sometimes to never converging (also because of knative#6733).
Implementations should use knative/pkg#2675 to enable a leader-aware autoscaler (PR knative/pkg#2688).
[1] https://github.com/knative/eventing/blob/1092472f440586099d6a5cbf1d3234bb36431af4/pkg/scheduler/statefulset/autoscaler.go#L85-L103
Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>
This issue is stale because it has been open for 90 days with no
/triage accepted
pierDipi added a commit to pierDipi/eventing that referenced this issue on Sep 22, 2023:
…ercommitted pods
The code changes contain extensive comments explaining the reason for each individual change.
- Properly handle overcommitted pods
- Don't block the scheduler on triggering the autoscaler if the autoscaler is active
- Additional fix for knative#6733
- Various logging improvements (leader, state, actions and context)
Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>
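One way to picture "overcommitted pods" from the commit message above is a pod holding more vreplicas than its capacity. The Go sketch below assumes a hypothetical placement model (pod name mapped to the number of vreplicas placed on it, with one uniform per-pod capacity) and computes the excess per pod, which is what would need to be rescheduled. It illustrates the concept only and is not the logic from the actual change.

```go
package main

import "fmt"

// overcommitted returns, for every pod holding more vreplicas than the
// per-pod capacity, how many vreplicas exceed that capacity and would need
// to be rescheduled. The map-based placement model is hypothetical.
func overcommitted(placements map[string]int32, capacity int32) map[string]int32 {
	excess := make(map[string]int32)
	for pod, n := range placements {
		if n > capacity {
			excess[pod] = n - capacity
		}
	}
	return excess
}

func main() {
	placements := map[string]int32{"pod-0": 12, "pod-1": 8, "pod-2": 10}
	fmt.Println(overcommitted(placements, 10)) // map[pod-0:2]
}
```

A pod filled exactly to capacity (pod-2 here) is not overcommitted; only the vreplicas strictly above capacity are reported.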
knative-prow bot pushed a commit that referenced this issue on Sep 22, 2023:
…ercommitted pods (#7281)
* Scheduler: fix reserved replicas handling, blocking autoscaler and overcommitted pods
The code changes contain extensive comments explaining the reason for each individual change.
- Properly handle overcommitted pods
- Don't block the scheduler on triggering the autoscaler if the autoscaler is active
- Additional fix for #6733
- Various logging improvements (leader, state, actions and context)
Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>
* Add unit tests for overcommitted pods
Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>
---------
Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>
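"Don't block the scheduler on triggering the autoscaler if the autoscaler is active" matches a common Go idiom: a one-slot buffered channel combined with a non-blocking send. The sketch below shows that idiom under the assumption that the autoscaler consumes signals from such a channel; the `trigger` type and `Request` method are hypothetical and are not necessarily how #7281 implements it.

```go
package main

import "fmt"

// trigger is a hypothetical one-slot signal used to ask the autoscaler to
// run. A buffered channel of size 1 coalesces repeated requests.
type trigger struct {
	ch chan struct{}
}

func newTrigger() *trigger { return &trigger{ch: make(chan struct{}, 1)} }

// Request signals the autoscaler without ever blocking the caller. If a
// signal is already queued, the new one is dropped: the queued run will
// observe the latest state anyway.
func (t *trigger) Request() bool {
	select {
	case t.ch <- struct{}{}:
		return true
	default:
		return false
	}
}

func main() {
	tr := newTrigger()
	fmt.Println(tr.Request(), tr.Request()) // true false
	<-tr.ch                                 // the autoscaler consumes the signal
	fmt.Println(tr.Request())               // true
}
```

Dropping a duplicate request is safe here because a run that is already queued reads the current state when it starts, so the scheduler's extra request would not change its outcome.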
knative-prow-robot pushed a commit to knative-prow-robot/eventing that referenced this issue on Oct 17, 2023:
…ercommitted pods
The code changes contain extensive comments explaining the reason for each individual change.
- Properly handle overcommitted pods
- Don't block the scheduler on triggering the autoscaler if the autoscaler is active
- Additional fix for knative#6733
- Various logging improvements (leader, state, actions and context)
Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>
knative-prow bot pushed a commit that referenced this issue on Oct 17, 2023:
…toscaler and overcommitted pods (#7374)
* Scheduler: fix reserved replicas handling, blocking autoscaler and overcommitted pods
The code changes contain extensive comments explaining the reason for each individual change.
- Properly handle overcommitted pods
- Don't block the scheduler on triggering the autoscaler if the autoscaler is active
- Additional fix for #6733
- Various logging improvements (leader, state, actions and context)
Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>
* Add unit tests for overcommitted pods
Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>
---------
Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>
Co-authored-by: Pierangelo Di Pilato <pierdipi@redhat.com>
Describe the bug
The autoscaler does not consider reserved replicas (nor pending replicas when scaling down) when deciding whether to scale up or down (see nil in [1]). This makes its behavior inconsistent: it sometimes fails to converge to a given replica count, or scales up or down too quickly.
[1] eventing/pkg/scheduler/statefulset/autoscaler.go, line 120 in 1092472
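To make the bug concrete, here is a minimal Go sketch of a replica computation that counts reserved and pending vreplicas alongside placed ones. The `scaleInput` struct, its fields and `desiredReplicas` are hypothetical and do not mirror knative/eventing's actual types; the sketch assumes every pod has the same vreplica capacity and that the desired count is simply total demand rounded up to whole pods.

```go
package main

import "fmt"

// scaleInput is a hypothetical summary of scheduler state. The field names
// are illustrative and do not mirror knative/eventing's actual types.
type scaleInput struct {
	Capacity int32 // vreplicas a single pod can hold (assumed > 0)
	Placed   int32 // vreplicas already committed to pods
	Reserved int32 // vreplicas reserved on pods but not yet committed
	Pending  int32 // vreplicas still waiting to be placed
}

// desiredReplicas returns the number of pods needed to hold every vreplica
// the scheduler knows about. Counting Reserved and Pending keeps the
// autoscaler from scaling away pods that in-flight placements depend on.
func desiredReplicas(in scaleInput) int32 {
	total := in.Placed + in.Reserved + in.Pending
	// Ceiling division: a partially filled pod still needs a whole replica.
	return (total + in.Capacity - 1) / in.Capacity
}

func main() {
	in := scaleInput{Capacity: 10, Placed: 15, Reserved: 7, Pending: 3}
	fmt.Println(desiredReplicas(in)) // 3 (25 vreplicas, 10 per pod)

	// Ignoring reserved and pending vreplicas undercounts the demand.
	fmt.Println(desiredReplicas(scaleInput{Capacity: 10, Placed: 15})) // 2
}
```

With the reserved and pending counts left out (the nil case), the same cluster looks like it needs only 2 pods, so a scale-down could remove capacity that a reservation is about to use; once that reservation lands, the autoscaler scales back up, which is one plausible source of the non-converging behavior described above.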
Expected behavior
To Reproduce
No consistent way of reproducing the issue.
Knative release version
Additional context