Description
Overview
cwltool could be made significantly faster by changing the way workflows and tools are processed. The suggestions below are based on profiling from @stain.
Flag for skipping add_schemas
For workflows with complex schemas, cwltool takes a very long time to gather, process and check them. Profile timings with and without add_schemas:
- with add_schemas 482.372s
- without add_schemas 101.739s
This was tested by adding a simple return statement at the start of the function, which cut the run time by roughly 4.7x. Exposing this as an option could benefit a lot of users and would be an easy win.
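A minimal sketch of what such a flag could look like, assuming a hypothetical `--skip-schemas` option and a simplified stand-in for `add_schemas` (this is not cwltool's actual function signature or argument parser):

```python
import argparse


def add_schemas(document, skip=False):
    """Simplified stand-in for the expensive schema-gathering step."""
    if skip:
        # Early return, mirroring the experiment described above.
        return []
    # Placeholder for the real work: collect the schemas the document lists.
    return list(document.get("$schemas", []))


def build_parser():
    # Hypothetical command-line parser; the flag name is an assumption.
    parser = argparse.ArgumentParser(prog="cwltool-sketch")
    parser.add_argument(
        "--skip-schemas",
        action="store_true",
        help="Do not gather, process or check $schemas",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    doc = {"$schemas": ["https://example.org/schema.rdf"]}
    print(add_schemas(doc, skip=args.skip_schemas))
```

Defaulting the flag to off keeps the current behaviour for everyone who does not opt in.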
Multicore file loading
A large part of the remaining time taken, as would be expected, is in downloading and parsing.
Currently cwltool spends about 0.6s per file loaded remotely. In the traces above, 36s of a total time of 86s is HTTP fetching. These files could be loaded in parallel on a multicore machine, and YAML parsing of one document could proceed while another is still downloading.
With many split tooling files or subworkflows, this would have a particularly large impact.
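A rough sketch of the idea, using only the Python standard library. The function names `fetch` and `load_documents` are hypothetical, and `json.loads` stands in for the YAML parser; none of this reflects cwltool's actual loader:

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch(url):
    """Download one document and return its text."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def load_documents(urls, fetch=fetch, parse=json.loads, max_workers=8):
    """Fetch and parse several documents concurrently.

    Each worker thread downloads one document and parses it, so parsing
    of one file can overlap with the download of another.
    """
    urls = list(urls)

    def load_one(url):
        return parse(fetch(url))

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input URLs.
        return dict(zip(urls, pool.map(load_one, urls)))
```

Threads are enough to overlap the HTTP waits, which is where the 36s goes. Parsing in pure Python is limited by the interpreter lock, so spreading the parsing itself over several cores would likely need a process pool instead.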