Scheduler autoscaler is not leader aware #6732

Closed
pierDipi opened this issue Feb 7, 2023 · 1 comment · Fixed by #6814
Labels
kind/bug Categorizes issue or PR as related to a bug.

pierDipi commented Feb 7, 2023

Describe the bug

The autoscaler runs in every controller replica [1], and each replica attempts a scale-down after the configured refresh period. Because each replica derives its state from its own lister/cache, the two replicas sometimes disagree on the new replica count. This leads to scaling up or down too quickly, or sometimes to never converging (also because of #6733).

[1]

func (a *autoscaler) Start(ctx context.Context) {
	attemptScaleDown := false
	pending := int32(0)
	for {
		select {
		case <-ctx.Done():
			return
		case <-time.After(a.refreshPeriod):
			attemptScaleDown = true
		case pending = <-a.trigger:
			attemptScaleDown = false
		}
		// Retry a few times, just so that we don't have to wait for the next beat when
		// a transient error occurs
		a.syncAutoscale(ctx, attemptScaleDown, pending)
		pending = int32(0)
	}
}
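
To make the disagreement described above concrete, here is a minimal sketch of two replicas computing a target from their own cached views. The `desiredReplicas` rule (ceiling of virtual replicas over per-pod capacity), the capacity value, and the cached counts are all illustrative assumptions, not the scheduler's actual logic.

```go
package main

import "fmt"

// desiredReplicas is a hypothetical sizing rule: enough pods to hold
// every virtual replica, given a fixed capacity per pod.
func desiredReplicas(vreplicas, capacity int32) int32 {
	if capacity <= 0 {
		return 0
	}
	return (vreplicas + capacity - 1) / capacity // integer ceiling
}

func main() {
	const capacity = int32(10) // assumed per-pod capacity

	// Each controller replica reads from its own informer cache, and the
	// caches can lag the API server by different amounts.
	cachedByReplicaA := int32(25) // has already observed a recent update
	cachedByReplicaB := int32(15) // still holds the older state

	fmt.Println("replica A wants:", desiredReplicas(cachedByReplicaA, capacity)) // 3
	fmt.Println("replica B wants:", desiredReplicas(cachedByReplicaB, capacity)) // 2
}
```

If both replicas write their own answer on every refresh, the replica count can flip between the two values instead of settling.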

To Reproduce

No consistent way of reproducing the issue.

Knative release version
<= main


pierDipi added the kind/bug label Feb 7, 2023
pierDipi commented Feb 7, 2023

Blocked by knative/pkg#2675

pierDipi added a commit to pierDipi/eventing that referenced this issue Feb 7, 2023
To address knative#6732, we need a way to inject more
configuration into the scheduler and autoscaler. This PR extracts
a new function, `statefulset.New`, that creates a scheduler from a
single `Config` parameter, which lets us add configuration later
without making breaking API changes.

Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>
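
The single-`Config` constructor idea from the commit message above can be sketched as follows. This is only an illustration: the field names, the default value, and the exact shape of `New` are assumptions and do not reproduce the real `statefulset` package.

```go
package main

import (
	"fmt"
	"time"
)

// defaultRefreshPeriod is an assumed default, chosen for illustration.
const defaultRefreshPeriod = 10 * time.Second

// Config gathers everything the scheduler needs in one value.
// All field names here are hypothetical.
type Config struct {
	StatefulSetName string
	RefreshPeriod   time.Duration
	// New fields can be added here later without changing New's signature.
}

// Scheduler is a stand-in for the real scheduler type.
type Scheduler struct {
	cfg Config
}

// New builds a Scheduler from a single Config value and fills in
// defaults for fields the caller left unset.
func New(cfg Config) *Scheduler {
	if cfg.RefreshPeriod <= 0 {
		cfg.RefreshPeriod = defaultRefreshPeriod
	}
	return &Scheduler{cfg: cfg}
}

func main() {
	s := New(Config{StatefulSetName: "dispatcher"})
	fmt.Println(s.cfg.StatefulSetName, s.cfg.RefreshPeriod)
}
```

Because callers use named fields, adding a field to `Config` does not break existing call sites, whereas adding a positional parameter to a constructor would.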
knative-prow bot pushed a commit that referenced this issue Feb 9, 2023
…ers (#6736)

To address #6732, we need a way to inject more configuration into
the scheduler and autoscaler. This PR extracts a new function,
`statefulset.New`, that creates a scheduler from a single `Config`
parameter, which lets us add configuration later without making
breaking API changes.

Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>

knative-prow bot pushed a commit that referenced this issue Mar 14, 2023
Fixes #6732 

The autoscaler runs in every controller replica [1], and each
replica attempts a scale-down after the configured refresh period.
Because each replica derives its state from its own lister/cache,
the two replicas sometimes disagree on the new replica count,
which leads to scaling up or down too quickly, or sometimes to
never converging (also because of #6733).

Implementations should use knative/pkg#2675 to enable the
leader-aware autoscaler (PR knative/pkg#2688).

[1]

https://github.com/knative/eventing/blob/1092472f440586099d6a5cbf1d3234bb36431af4/pkg/scheduler/statefulset/autoscaler.go#L85-L103

Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>
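
A rough sketch of what "leader-aware" could mean for the loop quoted in the issue: every replica still runs the loop, but only the one that has been promoted acts on it. The `isLeader` flag, the `syncs` counter, the `runReplica` helper, and the simplified `Promote`/`Demote` methods are assumptions for illustration; they do not mirror the signatures in knative/pkg or the code merged for this fix.

```go
package main

import (
	"context"
	"fmt"
	"sync/atomic"
	"time"
)

// autoscaler is a simplified stand-in for the scheduler autoscaler.
type autoscaler struct {
	refreshPeriod time.Duration
	trigger       chan int32
	isLeader      atomic.Bool  // hypothetical: set by leader election callbacks
	syncs         atomic.Int32 // counts syncs that actually ran (demo only)
}

// Promote and Demote are simplified, hypothetical leader election hooks.
func (a *autoscaler) Promote() { a.isLeader.Store(true) }
func (a *autoscaler) Demote()  { a.isLeader.Store(false) }

func (a *autoscaler) syncAutoscale(ctx context.Context, attemptScaleDown bool, pending int32) {
	if !a.isLeader.Load() {
		return // non-leaders never touch the replica count
	}
	a.syncs.Add(1) // the real scaling decision would go here
}

// Start is the same loop as in the issue, minus the retry comment.
func (a *autoscaler) Start(ctx context.Context) {
	attemptScaleDown := false
	pending := int32(0)
	for {
		select {
		case <-ctx.Done():
			return
		case <-time.After(a.refreshPeriod):
			attemptScaleDown = true
		case pending = <-a.trigger:
			attemptScaleDown = false
		}
		a.syncAutoscale(ctx, attemptScaleDown, pending)
		pending = int32(0)
	}
}

// runReplica starts one replica, sends it n triggers, stops it, and
// reports how many syncs it performed.
func runReplica(leader bool, n int) int32 {
	// A long refresh period keeps the timer branch out of this demo.
	a := &autoscaler{refreshPeriod: time.Hour, trigger: make(chan int32)}
	if leader {
		a.Promote()
	}
	ctx, cancel := context.WithCancel(context.Background())
	done := make(chan struct{})
	go func() {
		defer close(done)
		a.Start(ctx)
	}()
	for i := 0; i < n; i++ {
		a.trigger <- int32(i + 1) // unbuffered: returns once the loop received it
	}
	cancel()
	<-done // Start finishes the in-flight sync before it observes cancellation
	return a.syncs.Load()
}

func main() {
	fmt.Println("leader syncs:", runReplica(true, 3))    // leader syncs: 3
	fmt.Println("follower syncs:", runReplica(false, 3)) // follower syncs: 0
}
```

Gating inside `syncAutoscale` keeps the loop itself unchanged, so a replica that is promoted later starts acting on the next trigger or refresh without any restart.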
pierDipi self-assigned this Apr 18, 2023