Queue selection in env_batch is not adequate for blues #1255
More info on blues: you can't submit directly to the "batch" queue, and adding "-q batch" will get your job rejected. For regular jobs, the scheduler decides where to route you based on the nodes and time requested. You need the "-q" option only if you want a different queue, such as "shared".
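The routing rule above can be sketched as follows. This is a minimal illustration, not CIME code: the function name `build_submit_args` and the `default_queue` parameter are hypothetical, and it assumes "batch" is the routing queue that must never be named explicitly, while any other queue (such as "shared") is requested with "-q".

```python
from typing import List, Optional


def build_submit_args(queue: Optional[str], default_queue: str = "batch") -> List[str]:
    """Return the queue-related submit arguments for a blues-style setup.

    Hypothetical helper. Assumption: `default_queue` is the routing queue,
    so naming it with -q gets the job rejected.
    """
    if queue is None or queue == default_queue:
        # Leave routing to the scheduler, which uses the nodes and walltime requested.
        return []
    # Non-default queues such as "shared" must be requested explicitly.
    return ["-q", queue]
```

With this rule, `build_submit_args("batch")` adds nothing to the submit command, while `build_submit_args("shared")` adds `-q shared`.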
@rljacob that's been fixed on the ACME side via a Python hack.
A request came in on the ACME side, so I'll take this.
Update queue selection to take walltime into account

Adds the concept of a strict walltime. The idea is to better support machines like blues that have a "debug" queue and a "standard" queue. The "debug" queue has strict limits on both walltime and num_pes, so it should not be selected as the user's queue if they ask for a long walltime. For other machines, the maxwalltime setting is used more like a default walltime than a true maximum.

Test suite: scripts_regression_tests (melvin and skybridge) and some by-hand testing on blues
Test baseline:
Test namelist changes:
Test status: bit for bit

Fixes #1255

User interface changes?: Changes in how walltime is handled

Code review: @jedwards4b @jayeshkrishna @rljacob
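A walltime-aware queue selection with a strict-walltime flag might look roughly like the sketch below. It is not the actual `env_batch.select_best_queue` implementation. The `Queue` dataclass, its field names, and the helper `walltime_to_seconds` are hypothetical, and the queue list uses the blues limits described in this thread ("shared": 1 to 64 cores, at most one hour; "batch": 1 to thousands of cores). The "batch" upper bound of 10000 is an illustrative placeholder, not a documented value. A non-strict queue's `walltime_max` is assumed to act only as a default and never excludes the queue.

```python
from dataclasses import dataclass
from typing import List, Optional


def walltime_to_seconds(walltime: str) -> int:
    """Convert an "HH:MM:SS" or "HH:MM" walltime string to seconds."""
    parts = [int(p) for p in walltime.split(":")]
    if len(parts) == 2:
        parts.append(0)  # treat "HH:MM" as "HH:MM:00"
    hours, minutes, seconds = parts
    return hours * 3600 + minutes * 60 + seconds


@dataclass
class Queue:
    # Hypothetical queue description, loosely modeled on per-machine batch settings.
    name: str
    min_tasks: int
    max_tasks: int
    walltime_max: Optional[str] = None  # "HH:MM:SS"; None means no limit
    strict_walltime: bool = False       # True: walltime_max is a hard limit


def select_best_queue(queues: List[Queue], num_tasks: int, walltime: str) -> Optional[Queue]:
    """Return the first queue (in preference order) that accepts the job, or None."""
    requested = walltime_to_seconds(walltime)
    for queue in queues:
        if not queue.min_tasks <= num_tasks <= queue.max_tasks:
            continue  # task count outside this queue's limits
        if (queue.strict_walltime and queue.walltime_max is not None
                and requested > walltime_to_seconds(queue.walltime_max)):
            continue  # a strict queue cannot take a job that asks for more time
        # Non-strict queues are accepted regardless of walltime_max.
        return queue
    return None


# Blues-like setup: "shared" is listed first so short jobs prefer it.
BLUES_QUEUES = [
    Queue("shared", 1, 64, walltime_max="01:00:00", strict_walltime=True),
    Queue("batch", 1, 10000),  # upper bound is an illustrative placeholder
]
```

With this ordering, a 64-core job asking for one hour lands in "shared", while the same job asking for three hours skips "shared" and falls through to "batch". Without the strict check, the first queue that fits the core count ("shared") would win every time, which matches the behavior reported in this issue.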
Some background: blues has two available queues, "shared" and "batch". The "shared" queue returns an error if you submit a job that requests more than one hour of runtime. The "shared" queue accepts 1 to 64 cores, while the "batch" queue accepts anywhere from 1 to thousands. The problem is that many of our tests use 64 cores (which fits either queue), but some take three hours and some take one. env_batch.select_best_queue seems unable to choose the appropriate queue for such jobs (one hour or less should go to "shared" and longer jobs to "batch"); instead, it selects "shared" for everything.