Performance Improvements #433

@MarkRobbo

Overview

cwltool could benefit from significant performance improvements through changes to the way workflows and tools are processed. The suggestions below are based on profiling from @stain.

Flag for skipping add_schemas

For workflows with complex schemas, cwltool spends a very large amount of time gathering, processing and checking them. Profiles from before and after removing this step can be seen below:

This was tested by adding a simple return statement at the start of add_schemas. A flag to skip this step could benefit a lot of users and would be an easy win.
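As a rough sketch of what such a flag could look like, assuming a command-line option named `--skip-schemas` and a stand-in function `gather_schemas` (both names are hypothetical and not taken from cwltool's code):

```python
import argparse


def build_parser():
    """Build a parser with a hypothetical opt-out flag for schema loading."""
    parser = argparse.ArgumentParser(prog="example-runner")
    parser.add_argument(
        "--skip-schemas",
        action="store_true",
        help="Do not gather, process or check the document's schemas.",
    )
    return parser


def gather_schemas(document, skip=False):
    """Hypothetical stand-in for the expensive schema-gathering step.

    With skip=True it returns immediately, which mirrors the early
    `return` used for the profiling experiment described above.
    """
    if skip:
        return []
    # Assumption: schemas are listed under a "$schemas" key.
    return list(document.get("$schemas", []))


if __name__ == "__main__":
    args = build_parser().parse_args()
    doc = {"$schemas": ["https://example.org/schema.rdf"]}
    print(gather_schemas(doc, skip=args.skip_schemas))
```

Keeping the default as "schemas on" would leave existing behaviour unchanged, so only users who opt in would lose the checks.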

Multicore file loading

A large part of the remaining time taken, as would be expected, is in downloading and parsing.

Currently cwltool spends about 0.6s per file loaded remotely. In the traces above, 36s of the 86s total is HTTP fetching. These files could be loaded in parallel on a multicore machine, and one document could be YAML-parsed while another is still downloading.

For workflows with many split tool files or subworkflows, this would have a particularly large impact.
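A minimal sketch of the idea using a thread pool, so that one document can be parsed while others are still downloading. The function names (`fetch_text`, `load_all`) are hypothetical, and the parser is passed in as a parameter; `json.loads` is only a standard-library placeholder for a YAML parser here. This is not how cwltool loads documents today.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch_text(url):
    """Download one document and return its body as text."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


def load_all(urls, fetch=fetch_text, parse=json.loads, max_workers=8):
    """Fetch and parse several documents concurrently.

    Each worker fetches one URL and parses it straight away, so parsing
    of an already-downloaded document overlaps with downloads that are
    still in flight. Returns a dict mapping each URL to its parsed form.
    """
    urls = list(urls)  # allow any iterable; we iterate over it twice

    def load_one(url):
        return parse(fetch(url))

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the input URLs.
        return dict(zip(urls, pool.map(load_one, urls)))
```

Threads mainly hide network latency. Parsing done in pure Python is still serialised by the interpreter lock in standard CPython, so using several cores for the parsing itself would need a process pool instead.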
