system parameter ntask_max is not honored for certain subclasses #200

Open
bch0w opened this issue Mar 9, 2024 · 1 comment

bch0w commented Mar 9, 2024

Certain System subclasses (e.g., Frontera, Wisteria) do not support array jobs. The workaround is to submit individual jobs to the system one by one. However, these modules have no mechanism for enforcing the parameter ntask_max, so they submit all jobs to the job scheduler simultaneously.

This is not the intended behavior and may lead to resource competition or upset sysadmins. These systems need their own internal ntask_max routine that submits at most ntask_max jobs at a time and monitors the queue, submitting new jobs as previous jobs complete.
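
A minimal sketch of what such a routine could look like, assuming two hypothetical callables that are not part of the existing code: `submit_job(task_id)`, which submits one job and returns its scheduler job ID, and `job_is_finished(job_id)`, which queries the queue for that job. The function name `run_throttled` and the `poll_interval` argument are also illustrative only; a real implementation would additionally need to handle failed jobs and scheduler query errors.

```python
import time
from collections import deque


def run_throttled(task_ids, submit_job, job_is_finished, ntask_max,
                  poll_interval=1.0):
    """Submit tasks so that at most `ntask_max` jobs are active at once.

    `submit_job` and `job_is_finished` are hypothetical callables standing
    in for the system-specific submit and queue-check commands.
    """
    if ntask_max < 1:
        raise ValueError("ntask_max must be at least 1")

    pending = deque(task_ids)  # tasks not yet submitted
    active = {}                # job_id -> task_id for submitted, unfinished jobs
    completed = []             # task_ids in order of completion

    while pending or active:
        # Top up the queue until ntask_max jobs are active
        while pending and len(active) < ntask_max:
            task_id = pending.popleft()
            active[submit_job(task_id)] = task_id

        # Check the queue and free the slots of finished jobs
        finished = [job_id for job_id in active if job_is_finished(job_id)]
        for job_id in finished:
            completed.append(active.pop(job_id))

        # Nothing finished this round: wait before polling again
        if active and not finished:
            time.sleep(poll_interval)

    return completed
```

Polling each active job once per pass keeps the logic simple; on a busy cluster a longer `poll_interval` would reduce load on the scheduler at the cost of slower refills.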

I think all the requisite pieces are there; it just requires implementation and testing. I think the biggest hurdle will be the live checking of the job queue and the decision to submit new jobs, which can sometimes be a finicky operation.

@bch0w bch0w added the bug label Mar 19, 2024

bch0w commented Nov 21, 2024

This is implemented for Wisteria in #227. It is still open for Frontera, although we might be able to use the same mechanism developed for Wisteria.
