Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: stop scale loop for pause Scaledbject (issue #4253) #4550

Merged
merged 8 commits into from
Jun 18, 2023

Conversation

tobotg
Copy link
Contributor

@tobotg tobotg commented May 18, 2023

In the controller in the reconcile loop, when a scaledObject is paused (annotated with ), this fix stops the scaling logic (to prevent attempt to connect the to scalers), and deletes the generated HPA (so it is not querying KEDA Metric Server for metrics)

Checklist

Fixes #4253

@tobotg tobotg requested a review from a team as a code owner May 18, 2023 12:49
@tobotg
Copy link
Contributor Author

tobotg commented May 18, 2023

@zroubalik @tomkerkhove Resubmitting a new PR.

Copy link
Member

@zroubalik zroubalik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, could you please update this PR to contain newest commits? there are some improvements in e2e tests logging, thanks!

@zroubalik
Copy link
Member

zroubalik commented May 18, 2023

/run-e2e
Update: You can check the progress here

Copy link
Member

@zroubalik zroubalik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution, there are some minor nits and we should also introduce a new Condition in the ScaledObject.Status.

Copy link
Member

@zroubalik zroubalik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good, could you please add a marker to CRD to show PAUSED column and regenerate the CRD (make generate)?
Probably before the AGE column:
https://github.com/kedacore/keda/blob/main/apis/keda/v1alpha1/scaledobject_types.go#L36

@zroubalik
Copy link
Member

zroubalik commented May 31, 2023

/run-e2e
Update: You can check the progress here

@tobotg
Copy link
Contributor Author

tobotg commented May 31, 2023

Looking good, could you please add a marker to CRD to show PAUSED column and regenerate the CRD (make generate)?
Probably before the AGE column:
https://github.com/kedacore/keda/blob/main/apis/keda/v1alpha1/scaledobject_types.go#L36

Sure thing. I added it and run make manifests.

@zroubalik
Copy link
Member

zroubalik commented Jun 1, 2023

/run-e2e
Update: You can check the progress here

Copy link
Member

@zroubalik zroubalik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the e2e test for paused feature fails :(

@tobotg
Copy link
Contributor Author

tobotg commented Jun 1, 2023

the e2e test for paused feature fails :(

ok thanks @zroubalik , I'll have a look at the pause_scaling_test logs, I guess this is what failed (I can't seem to find the details of the failure in the GitHub actions link)

@tobotg
Copy link
Contributor Author

tobotg commented Jun 4, 2023

@zroubalik The e2e tests are passing in my cluster, but I suspect that the test needs more time to recover the scaling loop (loads might be inducing some more delay), so I'm increasing the wait time. Could you please trigger another run when you get a chance? Thanks

@zroubalik
Copy link
Member

zroubalik commented Jun 5, 2023

/run-e2e
Update: You can check the progress here

@zroubalik
Copy link
Member

zroubalik commented Jun 5, 2023

/run-e2e pause
Update: You can check the progress here

@tobotg
Copy link
Contributor Author

tobotg commented Jun 5, 2023

@zroubalik

/run-e2e pause
Update: You can check the progress here

That's weird. I'll need to have a deeper look later, as I'm at loss understanding the failures here.

@zroubalik
Copy link
Member

@zroubalik

/run-e2e pause
Update: You can check the progress here

That's weird. I'll need to have a deeper look later, as I'm at loss understanding the failures here.

Seems like the testing deployment is not even initally scaled out, it stays on 0 replicas.

@tobotg
Copy link
Contributor Author

tobotg commented Jun 6, 2023

@zroubalik

/run-e2e pause
Update: You can check the progress here

That's weird. I'll need to have a deeper look later, as I'm at loss understanding the failures here.

Seems like the testing deployment is not even initally scaled out, it stays on 0 replicas.

Yes, that's exactly what is puzzling me. Need to sort it out!

@tobotg
Copy link
Contributor Author

tobotg commented Jun 8, 2023

I made two changes to the test

  • use a much smaller nginx image to speed up deployment
  • make the test wait for the monitored deployment to hit its target replica count before proceeding
    @zroubalik

@zroubalik
Copy link
Member

zroubalik commented Jun 8, 2023

/run-e2e pause
Update: You can check the progress here

@zroubalik
Copy link
Member

zroubalik commented Jun 8, 2023

@tobotg I ran your branch locally and got the same error as we can see here in the PR.

@tobotg
Copy link
Contributor Author

tobotg commented Jun 8, 2023

@tobotg I ran your branch locally and got the same error as we can see here in the PR.

@zroubalik Interesting that you're able to reproduce it locally, as I'm not able to reproduce it here. Any chance you can share Keda logs and app:pause-scaling-test and app:pause-scaling-test-monitored logs? That could be helpful for debugging. Thanks in advance.

tobotg added 8 commits June 15, 2023 01:03
Signed-off-by: Tobo Atchou <tobo.atchou@gmail.com>
…or msg and new condition msg

Signed-off-by: Tobo Atchou <tobo.atchou@gmail.com>
…sed condition

Signed-off-by: Tobo Atchou <tobo.atchou@gmail.com>
… get hpa name and paused condition

Signed-off-by: Tobo Atchou <tobo.atchou@gmail.com>
…ifest for paused condition

Signed-off-by: Tobo Atchou <tobo.atchou@gmail.com>
…rease test wait timeout

Signed-off-by: Tobo Atchou <tobo.atchou@gmail.com>
…ed up test with slim container image and wait for monitored deployment

Signed-off-by: Tobo Atchou <tobo.atchou@gmail.com>
Signed-off-by: Tobo Atchou <tobo.atchou@gmail.com>
@tobotg
Copy link
Contributor Author

tobotg commented Jun 14, 2023

@zroubalik I had some time tonight to look into this issue, and submitted changes to the PR. Could you please trigger an e2e test when you get a chance? Thanks

@zroubalik
Copy link
Member

zroubalik commented Jun 15, 2023

/run-e2e pause
Update: You can check the progress here

@zroubalik
Copy link
Member

zroubalik commented Jun 18, 2023

/run-e2e
Update: You can check the progress here

Copy link
Member

@zroubalik zroubalik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

Thanks @tobotg! Awesome job, it is a great improvement!

@zroubalik zroubalik merged commit 93851e7 into kedacore:main Jun 18, 2023
SpiritZhou pushed a commit to SpiritZhou/keda that referenced this pull request Jun 30, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

The paused-replicas annotation not fully respected
2 participants