[FEATURE]: add support for controlling maximum number of jobs per executor #1452

Closed
2 tasks done
shahzebsiddiqui opened this issue Apr 27, 2023 · 2 comments

@shahzebsiddiqui
Member

Please describe your feature

The settings schema at https://github.com/buildtesters/buildtest/blob/devel/buildtest/schemas/settings.schema.json defines a max_jobs property that can be used to control the number of jobs run per executor. We should first implement this feature for batch jobs, where max_jobs signifies the number of jobs that can be running concurrently at a given time.

We could have the following setup:

executors:
  # total across all executors in a single run
  max_jobs: 50
  local:
    # maximum number of jobs running across all local executors
    max_jobs: 20
    bash:
      shell: bash
      # maximum number of jobs running within an executor
      max_jobs: 5
  slurm:
    regular:
      qos: regular
      max_jobs: 2
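
The nesting above implies that several limits can apply to one executor at the same time. A minimal sketch of how the caps could be looked up follows. It assumes the tightest limit along the path is the upper bound for a single executor, which the proposal does not state. The function name effective_max_jobs and the in-memory dictionary are hypothetical and are not part of buildtest.

```python
from typing import Optional

# Hypothetical in-memory form of the "executors" section shown above.
executors = {
    "max_jobs": 50,
    "local": {
        "max_jobs": 20,
        "bash": {"shell": "bash", "max_jobs": 5},
    },
    "slurm": {
        "regular": {"qos": "regular", "max_jobs": 2},
    },
}


def effective_max_jobs(executors: dict, executor_type: str, name: str) -> Optional[int]:
    """Return the tightest max_jobs cap that applies to one executor.

    Assumption: the smallest value among the global, per-type and
    per-executor levels wins. Returns None when no level sets a cap.
    """
    type_section = executors.get(executor_type, {})
    executor_section = type_section.get(name, {})
    limits = [
        section.get("max_jobs")
        for section in (executors, type_section, executor_section)
    ]
    defined = [limit for limit in limits if limit is not None]
    return min(defined) if defined else None


print(effective_max_jobs(executors, "local", "bash"))     # 5
print(effective_max_jobs(executors, "slurm", "regular"))  # 2
```

The broader caps (50 in total, 20 across local executors) would still need their own counters, since they constrain groups of executors rather than a single one.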

Suggest potential solution

No response

Additional Information

No response

Post question in Slack

  • I agree that I posted my question in slack before creating this issue

Is there an existing issue

  • I confirm there is no existing issue for this issue
@shahzebsiddiqui
Member Author

Looking at the codebase, we need to make some changes. Right now we simply iterate over the builders and run each test through the executor.run method:

results.append(pool.apply_async(executor.run, args=(builder,)))

We need some logic in the base class https://github.com/buildtesters/buildtest/blob/devel/buildtest/executors/base.py that keeps track of the number of jobs run per executor so we can implement this feature.
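
One way such bookkeeping might look is a small counter keyed by executor name. This is only a sketch: the JobTracker class, its method names, and the executor name "local.bash" are hypothetical and do not come from buildtest. It also assumes the counting happens in the parent process that dispatches work, since a threading.Lock is not shared across worker processes.

```python
import threading
from collections import defaultdict


class JobTracker:
    """Hypothetical helper that counts active jobs per executor name."""

    def __init__(self, limits):
        self._limits = dict(limits)      # executor name -> max_jobs
        self._active = defaultdict(int)  # executor name -> running jobs
        self._lock = threading.Lock()

    def try_acquire(self, name):
        """Reserve a slot; return False if the executor is at capacity."""
        with self._lock:
            limit = self._limits.get(name)
            if limit is not None and self._active[name] >= limit:
                return False
            self._active[name] += 1
            return True

    def release(self, name):
        """Free a slot once a job has finished."""
        with self._lock:
            if self._active[name] > 0:
                self._active[name] -= 1

    def active(self, name):
        """Return the number of jobs currently counted for an executor."""
        with self._lock:
            return self._active[name]


tracker = JobTracker({"local.bash": 2})
print(tracker.try_acquire("local.bash"))  # True
print(tracker.try_acquire("local.bash"))  # True
print(tracker.try_acquire("local.bash"))  # False, limit of 2 reached
```

Executors without an entry in the limits mapping are treated as unlimited here, which is a design choice of the sketch.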

For instance, in the local executor's run implementation, where we dispatch the job, we should check whether dispatching would push the number of active jobs past max_jobs. If so, we should raise an exception and have the job wait in the queue until the next iteration:

try:
    command = builder.run(run_cmd, timeout=timeout)
except RuntimeFailure as err:
    builder.failed()
    self.logger.error(err)
    return
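
The "wait in queue until next iteration" idea can be illustrated with a simplified planning loop. This sketch defers tests instead of raising an exception, and it assumes every job dispatched in one round finishes before the next round starts, which a real scheduler would not guarantee. The function plan_rounds, the test names, and the executor names are all hypothetical.

```python
from collections import Counter, deque


def plan_rounds(builders, limits):
    """Group (test_name, executor_name) pairs into dispatch rounds.

    In each round at most limits[executor_name] tests are dispatched per
    executor; the remaining tests wait for the next round. Executors
    without an entry in limits are treated as unlimited.
    """
    for name, limit in limits.items():
        if limit < 1:
            raise ValueError(f"max_jobs for {name} must be at least 1")

    pending = deque(builders)
    rounds = []
    while pending:
        active = Counter()  # jobs dispatched per executor in this round
        current, deferred = [], deque()
        for test, executor in pending:
            limit = limits.get(executor)
            if limit is not None and active[executor] >= limit:
                deferred.append((test, executor))  # retry next round
                continue
            active[executor] += 1
            current.append((test, executor))
        rounds.append(current)
        pending = deferred
    return rounds


builders = [
    ("t1", "local.bash"),
    ("t2", "local.bash"),
    ("t3", "local.bash"),
    ("t4", "slurm.regular"),
]
for number, batch in enumerate(plan_rounds(builders, {"local.bash": 2}), start=1):
    print(number, [test for test, _ in batch])
# 1 ['t1', 't2', 't4']
# 2 ['t3']
```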

@shahzebsiddiqui
Member Author

This feature was added in #1629 and #1630.
