Scheduler autoscaler is not considering reserved replicas #6733

Closed
pierDipi opened this issue Feb 7, 2023 · 2 comments · Fixed by #7027
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug. triage/accepted Issues which should be fixed (post-triage)

Comments

@pierDipi
Member

pierDipi commented Feb 7, 2023

Describe the bug

The autoscaler does not consider reserved replicas (nor pending replicas, when scaling down) when deciding whether to scale up or down (see the nil argument in [1]). This makes its behavior inconsistent: it sometimes fails to converge to a given number of replicas, or scales up or down too fast.

[1]

state, err := a.stateAccessor.State(nil)


To Reproduce

No consistent way of reproducing the issue.


@pierDipi pierDipi added the kind/bug Categorizes issue or PR as related to a bug. label Feb 7, 2023
pierDipi added a commit to pierDipi/eventing that referenced this issue Mar 13, 2023
The autoscaler runs in every controller replica [1]. Each replica
tries to scale down after the given refresh period, and because
the state is lister/cache based, the 2 replicas sometimes disagree
on the new replica count. This leads to scaling up or down too
fast, or to never converging.
(also because of knative#6733)

[1]
https://github.com/knative/eventing/blob/1092472f440586099d6a5cbf1d3234bb36431af4/pkg/scheduler/statefulset/autoscaler.go#L85-L103

Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>
knative-prow bot pushed a commit that referenced this issue Mar 14, 2023
Fixes #6732 

The autoscaler runs in every controller replica [1]. Each replica
tries to scale down after the given refresh period, and because
the state is lister/cache based, the 2 replicas sometimes disagree
on the new replica count. This leads to scaling up or down too
fast, or to never converging.
(also because of #6733)

Implementations should use knative/pkg#2675 to enable the
leader-aware autoscaler (PR knative/pkg#2688).

[1]

https://github.com/knative/eventing/blob/1092472f440586099d6a5cbf1d3234bb36431af4/pkg/scheduler/statefulset/autoscaler.go#L85-L103

Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>
vishal-chdhry pushed a commit to vishal-chdhry/eventing that referenced this issue Mar 14, 2023
vishal-chdhry pushed a commit to vishal-chdhry/eventing that referenced this issue Apr 25, 2023
@github-actions

github-actions bot commented May 9, 2023

This issue is stale because it has been open for 90 days with no
activity. It will automatically close after 30 more days of
inactivity. Reopen the issue with /reopen. Mark the issue as
fresh by adding the comment /remove-lifecycle stale.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 9, 2023
@github-actions github-actions bot closed this as completed Jun 8, 2023
@pierDipi pierDipi reopened this Jun 19, 2023
@pierDipi
Member Author

/triage accepted

@knative-prow knative-prow bot added the triage/accepted Issues which should be fixed (post-triage) label Jun 19, 2023
@pierDipi pierDipi self-assigned this Jun 19, 2023
@github-actions github-actions bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 20, 2023
@pierDipi pierDipi moved this from ✅ Done to 🏗 In progress in Eventing Kafka Broker Scheduling and Scaling Jul 27, 2023
@github-project-automation github-project-automation bot moved this from 🏗 In progress to ✅ Done in Eventing Kafka Broker Scheduling and Scaling Aug 31, 2023
pierDipi added a commit to pierDipi/eventing that referenced this issue Sep 22, 2023
…ercommitted pods

The code changes carry extensive comments explaining the reason
for each individual change.

- Properly handle overcommitted pods
- Don't block the scheduler on triggering the autoscaler if the autoscaler
  is active
- Additional fix for knative#6733
- various logging improvements (leader, state, actions and context)

Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>
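
One common way to avoid blocking a caller when triggering a background worker, as the second bullet in the commit message above describes, is a non-blocking send on a buffered channel. The sketch below shows that general Go pattern; it is an assumption about the approach, not the code from the commit, and the `autoscaler` type, `newAutoscaler`, and `Autoscale` names are hypothetical.

```go
package main

import "fmt"

// autoscaler is a hypothetical type holding a trigger channel with room
// for exactly one pending request.
type autoscaler struct {
	trigger chan struct{}
}

func newAutoscaler() *autoscaler {
	return &autoscaler{trigger: make(chan struct{}, 1)}
}

// Autoscale requests an autoscaler run without ever blocking the caller.
// It returns false when a request is already pending, in which case the
// pending run will cover this request too.
func (a *autoscaler) Autoscale() bool {
	select {
	case a.trigger <- struct{}{}:
		return true
	default:
		return false
	}
}

func main() {
	a := newAutoscaler()
	fmt.Println(a.Autoscale()) // true: request queued
	fmt.Println(a.Autoscale()) // false: one is already pending, caller moves on
	<-a.trigger                // the autoscaler loop picks up the request
	fmt.Println(a.Autoscale()) // true: the slot is free again
}
```

The scheduler would call `Autoscale` and continue immediately, while a separate goroutine receives from `trigger` and performs the actual scaling.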
knative-prow bot pushed a commit that referenced this issue Sep 22, 2023
…ercommitted pods (#7281)

* Scheduler: fix reserved replicas handling, blocking autoscaler and overcommitted pods

The code changes carry extensive comments explaining the reason
for each individual change.

- Properly handle overcommitted pods
- Don't block the scheduler on triggering the autoscaler if the autoscaler
  is active
- Additional fix for #6733
- various logging improvements (leader, state, actions and context)

Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>

* Add unit tests for overcommitted pods

Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>

---------

Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>
knative-prow-robot pushed a commit to knative-prow-robot/eventing that referenced this issue Oct 17, 2023
knative-prow bot pushed a commit that referenced this issue Oct 17, 2023
…toscaler and overcommitted pods (#7374)

* Scheduler: fix reserved replicas handling, blocking autoscaler and overcommitted pods

Co-authored-by: Pierangelo Di Pilato <pierdipi@redhat.com>