Scale time resources #59

Open · 6 of 10 tasks
mkatsanto opened this issue Mar 16, 2022 · 1 comment
Labels: future will not be fixed for NOW

Comments

mkatsanto (Collaborator) commented Mar 16, 2022

Is your feature request related to a problem? Please describe.
Each rule has a static time requirement. However, real running times scale with the input file sizes. On top of that, the current Slurm profile hard-codes queues specific to the sciCORE Slurm system, which might cause incompatibilities with other Slurm systems.

Describe the solution you'd like

  • Make use of the time variable within each rule and scale the estimate with the input sizes.

Based on some existing runs, I can fit input size against runtime by least squares and use the resulting coefficient for scaling. I am not sure this will always work (multiple inputs, multiprocessing optimisation), but as we gather more information from successful runs we can improve the accuracy of these estimates. A sketch of the fit is below.
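
A minimal sketch of the fit, assuming hypothetical (input size, runtime) measurements collected from past successful runs; the numbers and the safety factor are placeholders:

```python
import numpy as np

# Hypothetical measurements from past successful runs:
# total input size (MB) vs. observed wall-clock time (minutes).
sizes_mb = np.array([120.0, 450.0, 900.0, 2100.0, 4800.0])
runtimes_min = np.array([4.0, 11.0, 20.0, 47.0, 104.0])

# Least-squares linear fit: runtime ~ slope * size + intercept.
slope, intercept = np.polyfit(sizes_mb, runtimes_min, 1)

def estimate_minutes(input_size_mb, safety=1.5):
    """Scale the fitted estimate by a safety factor and round up."""
    return int(np.ceil(safety * (slope * input_size_mb + intercept)))

print(estimate_minutes(1500))  # estimated minutes for a 1.5 GB input
```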

  • Check whether the attempt keyword can be used to increase the time requirement upon restart.

It turns out that the attempt parameter is not accessible in the params field. The plan is to first calculate the time estimate in resources (where attempt is available) and then expose it through params, so that we can feed it to the cluster.json. A sketch is below.
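
A sketch of the intended wiring on a hypothetical rule; the base time and per-MB coefficient are placeholders. In Snakemake, resources callables do receive attempt, and params callables can read resources:

```python
# Hypothetical rule; base time and per-MB coefficient are placeholders.
rule map_reads:
    input:
        "data/{sample}.fastq.gz"
    output:
        "results/{sample}.bam"
    resources:
        # attempt IS available here: scale the estimate up on each restart.
        time_min=lambda wildcards, input, attempt: attempt * (
            10 + int(0.02 * input.size_mb)
        )
    params:
        # attempt is NOT available in params, but resources are, so the
        # computed estimate is exposed here for the cluster submission.
        time=lambda wildcards, resources: f"{resources.time_min}:00"
    shell:
        "map_reads {input} > {output}"
```

With --restart-times set, a failed job would then request more time on its next attempt.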

  • Check whether the time estimate is translated correctly in terms of the queueing system.
    For the 6hour queue this works fine. Jobs that require less than 30 minutes still go to the 6hour queue; that behaviour is specific to this Slurm instance and should be fine in general. A sketch of the submission wiring is below.
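
For reference, one way the exposed parameter could reach Slurm, assuming the hypothetical params.time from the sketch above; the remaining flags are illustrative:

```sh
# Without an explicit partition, jobs fall back to the site's default queue.
snakemake --jobs 100 \
    --cluster "sbatch --time={params.time} --cpus-per-task={threads}"
```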

Implementation tasks:

  • Add params and resources time parameters, using static default times first
  • Scale the time parameter to the input sizes by finding the scaling coefficient for a few rules (to ensure the concept works)
  • Scale the time parameter to the input sizes by finding the scaling coefficient for each rule
  • Remove rule-specific entries from the cluster.json files (see the sketch after this list)
  • Run the standard tests
  • Run a real dataset to ensure the estimates are reasonable
  • Check the efficiency score of the workflow to see whether it has increased
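
For the cluster.json clean-up, the idea would be to keep only a single __default__ entry as a fallback and delete the per-rule overrides once the dynamic estimates are in place; the keys and values below are placeholders:

```json
{
    "__default__": {
        "time": "06:00:00",
        "mem": "4G",
        "threads": "1"
    }
}
```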

Describe alternatives you've considered
If all of the above does not work, focus on eliminating the queue specification from the cluster.json.

@mkatsanto mkatsanto self-assigned this Mar 16, 2022
@fgypas fgypas added the future will not be fixed for NOW label Apr 25, 2022
@mkatsanto mkatsanto added this to the submission_related_updates milestone Oct 28, 2022
@mkatsanto mkatsanto removed their assignment Oct 28, 2022
@mkatsanto mkatsanto removed this from the submission_related_updates milestone Oct 28, 2022