It would be useful to define the expected resource usage of a given stage (e.g. memory, cores) so that a pipeline does not attempt to use more than what is available. This is particularly helpful when tools have high memory usage.
The number of concurrently running threads is already managed via Shake's resource mechanism, but memory is not. This could be extended to include memory.
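For reference, a minimal sketch of how thread parallelism is capped today via the shakeThreads option; the limit of 8 and the tool name are illustrative only:

```haskell
import Development.Shake

-- Cap Shake at 8 concurrently running rules (8 is an arbitrary example;
-- 0 would mean "use the number of processors"). There is no analogous
-- built-in cap for memory, which is what this issue proposes.
main :: IO ()
main = shakeArgs shakeOptions{shakeThreads = 8} $ do
    want ["out/result.txt"]
    "out/*.txt" %> \out ->
        cmd_ "some-tool" "-o" out  -- hypothetical external tool
```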
Are there any other types of resources that would be good to capture?
Mainly RAM. For example, one step of our pipeline takes 60 GB+ of RAM, after which RAM usage drops sharply. We want to run many instances of the pipeline in parallel on a single machine, which is only possible if the memory-heavy steps are staggered.
It would also be useful to consider time (skip optional steps if there is not enough time remaining) and disk space (in our case, if the pipeline expects to run out of space it can write to a slower networked drive instead). I'd suggest these latter two be treated as optionally provided data for now, perhaps with some generic functionality for handling them.
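A rough sketch of what such optional per-stage declarations might look like; every name here is hypothetical, and nothing like this exists in Shake today:

```haskell
-- Hypothetical per-stage resource declaration (illustrative only).
data StageResources = StageResources
    { memoryGb :: Int           -- peak RAM in GB, always declared
    , diskGb   :: Maybe Int     -- optional: expected scratch space in GB
    , wallSecs :: Maybe Double  -- optional: expected runtime, to decide
                                -- whether optional steps still fit
    }

-- Example: the 60 GB step described above, with optional fields filled in.
bigStep :: StageResources
bigStep = StageResources
    { memoryGb = 60
    , diskGb   = Just 200   -- assumed scratch usage
    , wallSecs = Nothing    -- unknown runtime
    }
```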
There is some support for this in Shake already:
https://hackage.haskell.org/package/shake-0.18.3/docs/Development-Shake.html#t:Resource
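A minimal sketch of using that Resource API to model RAM, assuming a 64 GB machine and treating 1 unit = 1 GB; the capacities and tool names are illustrative, not a definitive implementation:

```haskell
import Development.Shake

main :: IO ()
main = shakeArgs shakeOptions $ do
    -- Model available RAM as a finite resource: 64 units = 64 GB (assumed).
    memory <- newResource "memory-gb" 64

    want ["out/aligned.bam", "out/summary.txt"]

    "out/aligned.bam" %> \out ->
        -- Claim 60 GB for the duration of the high-memory step; Shake
        -- delays the action until that much of the resource is free,
        -- which staggers memory-hungry steps across parallel pipelines.
        withResource memory 60 $
            cmd_ "big-memory-tool" "-o" out  -- hypothetical tool

    "out/summary.txt" %> \out ->
        -- A lightweight step claims only 2 GB, so many can run at once.
        withResource memory 2 $
            cmd_ "small-tool" "-o" out  -- hypothetical tool
```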
And here is another example:
https://github.com/fulcrumgenomics/dagr#resource-management