Load test: Test run-to-completion workflow for Job objects #799
Comments
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/remove-lifecycle stale
@mm4tt @wojtek-t So I am actually wondering about one thing. Let's assume we change jobs to "sleep X". I have two ideas for what we could do here:
What do you think about it?
We can get rid of the scaling phase for jobs at this point. Just run them to completion, ideally with parallelism < completions.
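For illustration, a run-to-completion Job along these lines might look like the sketch below. This is only a sketch: the name, image, sleep duration, and sizes are placeholders, not values from the actual test config.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: load-test-job        # illustrative name
spec:
  completions: 10            # total pods that must finish successfully
  parallelism: 2             # fewer pods run at once than completions required
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: sleeper
        image: busybox
        # Each pod exits after "sleep X" instead of running a pause
        # container forever, so the Job actually runs to completion.
        command: ["sleep", "300"]
```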
/assign |
Let me know if this plan makes sense:
I'm actually reluctant to change that in the current load test. We make some assumptions about the number of pods, etc. @mborsz - FYI
I agree with splitting Jobs into a separate test. Currently we depend strongly on the number of Pods, and changing that during the test can generate unstable results. As mentioned in kubernetes/enhancements#3113 and kubernetes/enhancements#3111, understanding performance should be based on monitoring the job_sync_duration_seconds metric (visible in Prometheus as the job_controller_job_sync_duration_seconds_* metrics). To achieve this, we should tweak
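As a rough illustration of watching that metric, a Prometheus recording rule for the p99 sync latency could look like the sketch below. This assumes the prometheus-operator PrometheusRule CRD is available in the monitoring setup and that the metric is exported as a histogram (the _bucket series); the rule and group names are made up.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: job-controller-latency      # illustrative name
spec:
  groups:
  - name: job-controller.rules
    rules:
    # p99 of the job controller's sync duration over the last 5 minutes,
    # computed from the histogram buckets exposed by kube-controller-manager.
    - record: job_controller:job_sync_duration_seconds:p99
      expr: |
        histogram_quantile(0.99,
          sum(rate(job_controller_job_sync_duration_seconds_bucket[5m])) by (le))
```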
As part of #704, the load test was extended to cover Jobs.
The way it was implemented was a bit of a shortcut: we used "pause" pods that never complete and exercised Jobs the same way we use/test Deployments, i.e. we create N Jobs of size X, scale them up or down, and then delete them. While this was a good start for testing the overall performance of the job-controller, we should use Jobs the way they are designed to be used: don't use pause pods, but pods that finish after some time, and test the run-to-completion workflow.
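For reference, the shortcut described above corresponds roughly to a Job whose pods run the pause image and therefore never terminate on their own; a minimal sketch, with the name, size, and image tag as placeholders rather than values from the test config:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: load-test-pause-job    # illustrative name
spec:
  parallelism: 5               # the "size X" that the test scales up and down
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: pause
        # The pause container never exits, so the Job never reaches
        # completion and behaves much like a Deployment's replicas.
        image: registry.k8s.io/pause:3.9
```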