Workflow Execution: Step by Step

Assumptions

  • A globally accessible, transactional, graph database represents the workflow.
  • The graph database is initially populated by parsing a CWL file (a small population sketch follows this list).
  • One or more orchestrator daemons monitor tasks/results and update the graph database.
  • A resource manager and/or scheduler (e.g. Slurm) is used to map workflow execution decisions to the underlying platform.
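
As a rough illustration of the first two assumptions, the sketch below loads a parsed CWL workflow into a Neo4j graph database. The `Task` label, `DEPENDS_ON` relationship, the initial "waiting" state, and the connection URI/credentials are illustrative assumptions, not a prescribed schema.

```python
# Hypothetical sketch: populate the graph database from a parsed CWL workflow.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

# Stand-in for the output of a CWL parser: task name -> names of its dependencies.
parsed_steps = {
    "head": [],
    "prep": ["head"],
    "simulate": ["prep"],
    "analyze": ["simulate"],
}

def populate(tx, steps):
    # One node per task, initially in the "waiting" state.
    for name in steps:
        tx.run("MERGE (t:Task {name: $name}) ON CREATE SET t.state = 'waiting'",
               name=name)
    # One DEPENDS_ON edge per dependency.
    for name, deps in steps.items():
        for dep in deps:
            tx.run("MATCH (t:Task {name: $name}), (d:Task {name: $dep}) "
                   "MERGE (t)-[:DEPENDS_ON]->(d)",
                   name=name, dep=dep)

with driver.session() as session:
    session.write_transaction(populate, parsed_steps)
driver.close()
```

With the dependencies stored as explicit edges, "which tasks are ready to run?" becomes a single graph query (see Execute a Workflow below).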

Execute a Workflow

  1. Orchestrator starts with "head" node of workflow.
    • This node will have metadata and requirements for the entire workflow.
  2. Orchestrator asks the API for all tasks that are ready to run (a query sketch follows this list).
    • Typically this may be a single task, e.g. one big MPI job.
  3. Orchestrator, via the resource manager, sends these tasks for execution (a Slurm/asyncio sketch follows this list).
    • What is the possible feedback from this operation?
  4. Orchestrator "awaits" status changes on executing tasks.
    • Async Python operations?
    • Can these be effectively/correctly built with Python?
  5. As tasks complete, the orchestrator gets (or already knows) their dependents (a completion sketch follows this list).
  6. Orchestrator tracks dependents' dependencies (ugh).
    • When all dependencies are met, task is set to "runnable" and this process recurses.
    • How do we terminate the workflow? Is a "tail" node required in graph? Probably.
  7. The graph is modified as the workflow progresses so that, at the end of the run, it represents the "state" and "result" of the workflow.
    • Status of task execution (resources consumed, runtime, etc.)
    • All outputs and inputs are cataloged (maybe SHA of their contents?)
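
For step 2, "all tasks that are ready to run" can be phrased as a single query against the graph sketched under Assumptions: a task is runnable when it is still waiting and every task it depends on is done. The schema and state names are the same illustrative assumptions as above.

```python
# Hypothetical sketch of the "ready to run" query from step 2.
READY_QUERY = """
MATCH (t:Task {state: 'waiting'})
OPTIONAL MATCH (t)-[:DEPENDS_ON]->(d:Task)
WITH t, collect(d.state) AS dep_states
WHERE all(s IN dep_states WHERE s = 'done')
RETURN t.name AS name
"""

def ready_tasks(session):
    """Return the names of every task whose dependencies are all satisfied."""
    return [record["name"] for record in session.run(READY_QUERY)]
```

On the freshly populated graph only the head node matches, which is consistent with step 1.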
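For steps 3 and 4, assuming Slurm as the resource manager, one approach is to submit each runnable task with `sbatch --wait` (which blocks until the job finishes) and wrap the submission in an asyncio subprocess so the orchestrator can await many executing tasks concurrently. The script names below are placeholders, and polling `squeue`/`sacct` would be an alternative to `--wait`.

```python
# Hypothetical sketch of steps 3-4: submit tasks to Slurm and await their completion.
import asyncio

async def run_task(name, script):
    """Submit one task script and wait for the job to finish."""
    proc = await asyncio.create_subprocess_exec(
        "sbatch", "--wait", "--parsable", script,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    out, _ = await proc.communicate()
    job_id = out.decode().strip()
    # With --wait, sbatch's exit code reflects the job's exit code.
    state = "done" if proc.returncode == 0 else "failed"
    return name, job_id, state

async def run_ready_tasks(ready):
    """Launch every runnable task and gather (name, job_id, state) results."""
    return await asyncio.gather(*(run_task(name, script) for name, script in ready))

# Example: two tasks that became runnable at the same time.
results = asyncio.run(run_ready_tasks([("prep", "prep.sh"), ("stage", "stage.sh")]))
```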
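For steps 5 through 7, completing a task reduces to a graph update: record the task's new state and execution metadata (eventually including the catalog of inputs/outputs), then re-run the ready-task query for its dependents. Whether or not an explicit tail node is added, one possible termination check is "every task has reached a terminal state". Again a sketch over the hypothetical schema above:

```python
# Hypothetical sketch of steps 5-7: record completion and test for termination.
def complete_task(session, name, job_id, state, outputs):
    """Record a finished task's state and execution metadata in the graph."""
    session.run(
        "MATCH (t:Task {name: $name}) "
        "SET t.state = $state, t.job_id = $job_id, t.outputs = $outputs",
        name=name, state=state, job_id=job_id, outputs=outputs)

def workflow_finished(session):
    """True once every task has reached a terminal state ('done' or 'failed')."""
    record = session.run(
        "MATCH (t:Task) WHERE NOT t.state IN ['done', 'failed'] "
        "RETURN count(t) AS remaining").single()
    return record["remaining"] == 0
```

The orchestrator's main loop is then: query ready tasks, submit them, await completions, call `complete_task` for each, and repeat until `workflow_finished` returns true; the finished graph holds the state and results described in step 7.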