Is your feature request related to a problem? Please describe.
Each rule has a static time requirement. However, real running times scale with the input file sizes. On top of that, the existing implementation hard-codes queues specific to the scicore SLURM system in the SLURM profile, which might cause incompatibilities with other SLURM systems.
Describe the solution you'd like
Make use of the time variable within each rule and scale the estimate with the input sizes.
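A minimal sketch of the idea, with a hypothetical rule, file names, and placeholder coefficients; the time resource is computed per job from the input size instead of being a constant:

```python
import os


def scaled_time(wildcards, input):
    # Base of 30 minutes plus ~1 minute per MB of input; both coefficients
    # are placeholders until fitted from real runs.
    size_mb = sum(os.path.getsize(f) for f in input) / 1e6
    return int(30 + 1.0 * size_mb)


rule align:  # hypothetical rule
    input:
        "results/{sample}.fasta"
    output:
        "results/{sample}.aligned.fasta"
    resources:
        time=scaled_time
    shell:
        "some_aligner {input} > {output}"
```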
Based on some existing runs, I can fit the size–runtime relationship by least squares and use the resulting coefficient for scaling. I am not sure this will always work (multiple inputs, multiprocessing optimisation), but as we gather more information from successful runs we can improve the accuracy of these estimates.
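As a sketch of how such a coefficient could be obtained (the numbers below are made-up placeholders, not real measurements):

```python
import numpy as np

# Made-up example values: input size in MB vs. observed runtime in minutes
# for past successful runs of one rule.
sizes_mb = np.array([120.0, 450.0, 900.0, 2300.0])
runtimes_min = np.array([14.0, 38.0, 71.0, 180.0])

# Ordinary least squares for runtime ≈ intercept + slope * size.
A = np.vstack([np.ones_like(sizes_mb), sizes_mb]).T
intercept, slope = np.linalg.lstsq(A, runtimes_min, rcond=None)[0]

# slope (min/MB) and intercept (min) become the coefficients plugged
# into the rule's time estimate.
print(f"time ≈ {intercept:.1f} + {slope:.3f} * size_mb")
```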
Check if the attempt keyword can be used to increase the time requirement upon restart.
It turns out there is no access to the attempt parameter in the params field. The plan is to first calculate the time estimate in resources and then interface that to the params field, so that it can be fed to the cluster.json.
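A sketch of that hand-off (rule and parameter names are hypothetical): `attempt` is available in resource callables, and, assuming params functions can take `resources` as an extra argument, the computed value can be exposed to the cluster configuration:

```python
import os


def estimate_minutes(input):
    # Same placeholder size-based estimate as above.
    size_mb = sum(os.path.getsize(f) for f in input) / 1e6
    return int(30 + 1.0 * size_mb)


rule align:  # hypothetical rule
    input:
        "results/{sample}.fasta"
    output:
        "results/{sample}.aligned.fasta"
    resources:
        # attempt is accessible here, so the estimate grows on each restart
        time=lambda wildcards, input, attempt: estimate_minutes(input) * attempt
    params:
        # expose the computed estimate (in minutes) so the cluster
        # submission side can reference it, e.g. as {params.running_time}
        running_time=lambda wildcards, resources: str(resources.time)
    shell:
        "some_aligner {input} > {output}"
```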
Check whether the time estimate is translated correctly by the queueing system.
For the 6hour queue this works fine. Jobs that require less than 30 minutes still go to the 6hour queue; this behaviour is specific to this SLURM instance and should be fine in general.
Implementation tasks:
- Add params and resources time parameters and use standard times first.
- Fix the time parameter to be scaled to the input sizes by finding the scaling coefficient for a few rules (to ensure the concept works).
- Fix the time parameter to be scaled to the input sizes by finding the scaling coefficient for each rule.
- Remove the rule-specific queue specifications from the cluster.json files (see the sketch after this list).
- Run the standard tests.
- Run a real dataset to ensure the estimates are reasonable.
- Check the efficiency score of the workflow to see whether it has increased.
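For illustration only (values are placeholders, not the actual scicore settings): assuming the submission command references the rule's computed time directly, e.g. `sbatch --mem={cluster.mem} --time={params.running_time}`, the per-rule entries in cluster.json could collapse to a generic default with no queue specification:

```json
{
    "__default__": {
        "mem": "4G"
    }
}
```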
Describe alternatives you've considered
If all of the above does not work, focus on eliminating the queue specification from the cluster.json.