Workflow Execution: Step by Step
Allen McPherson edited this page Jun 24, 2019 · 1 revision
- A globally accessible, transactional, graph database represents the workflow.
- The graph database is initially populated by parsing a CWL file.
- One or more orchestrator daemons monitor tasks/results and update the graph database.
- A resource manager and/or scheduler (e.g. Slurm) is used to map workflow execution decisions to the underlying platform.
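The architecture above can be sketched with a minimal in-memory stand-in for the graph database; the `WorkflowGraph` class and task names here are illustrative assumptions, not the actual schema (a real deployment would use a transactional graph store populated by a CWL parser):

```python
# Hypothetical in-memory stand-in for the workflow graph database.
class WorkflowGraph:
    def __init__(self):
        self.deps = {}    # task -> set of tasks it depends on
        self.state = {}   # task -> "waiting" | "runnable" | "running" | "done"

    def add_task(self, name, depends_on=()):
        self.deps[name] = set(depends_on)
        # a task with no unmet dependencies is immediately runnable
        self.state[name] = "waiting" if depends_on else "runnable"

    def ready_tasks(self):
        return [t for t, s in self.state.items() if s == "runnable"]

graph = WorkflowGraph()
graph.add_task("head")                            # workflow-level metadata node
graph.add_task("simulate", depends_on=["head"])
graph.add_task("analyze", depends_on=["simulate"])
print(graph.ready_tasks())  # → ['head']
```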
- Orchestrator starts with "head" node of workflow.
- This node will have metadata and requirements for the entire workflow.
- Orchestrator asks API for all tasks that are ready to run.
- Typically this may be a single task, e.g. one large MPI job.
- Orchestrator, via the resource manager, sends these tasks for execution.
- What is the possible feedback from this operation?
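One hedged sketch of the submission step: a helper that maps a ready task to a Slurm `sbatch` command line. The helper name and script path are assumptions; only the command is built here — a real orchestrator would execute it (e.g. with `subprocess`) and capture the job id that `sbatch --parsable` prints, which is also the most immediate feedback available from this operation:

```python
# Hypothetical helper mapping a ready task to a Slurm submission command.
def slurm_command(task_name, script_path, nodes=1):
    return [
        "sbatch",
        "--parsable",               # print only the job id on stdout
        f"--job-name={task_name}",
        f"--nodes={nodes}",
        script_path,
    ]

cmd = slurm_command("simulate", "run_simulate.sh", nodes=128)
print(cmd)
```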
- Orchestrator "awaits" status changes on executing tasks.
- Async Python operations?
- Can these be effectively/correctly built with Python?
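On the "can this be built with Python?" question: `asyncio` handles this pattern well. A sketch of awaiting status changes, where `poll_status` is a hypothetical stand-in for a resource-manager query (e.g. something wrapping `squeue`):

```python
import asyncio

# Sketch: await a job's terminal state by polling, yielding between polls
# so many monitors can run concurrently in one event loop.
async def wait_for_completion(job_id, poll_status, interval=0.01):
    while True:
        status = poll_status(job_id)
        if status in ("COMPLETED", "FAILED"):
            return status
        await asyncio.sleep(interval)   # yield to other task monitors

async def main():
    # fake status source for illustration: job finishes after three polls
    calls = {"n": 0}
    def poll_status(job_id):
        calls["n"] += 1
        return "COMPLETED" if calls["n"] >= 3 else "RUNNING"
    return await asyncio.gather(wait_for_completion(1, poll_status))

print(asyncio.run(main()))  # → ['COMPLETED']
```

Many such coroutines can be gathered at once, one per executing task, which answers the concurrency need without threads.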
- As tasks complete, orchestrator gets (or already knows) their dependents.
- Orchestrator tracks dependents' dependencies (ugh).
- When all dependencies are met, task is set to "runnable" and this process recurses.
- How do we terminate the workflow? Is a "tail" node required in graph? Probably.
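The dependency bookkeeping and termination check above can be sketched as follows; the function name and the dict-based state are assumptions standing in for graph-database updates, and the "all done" return value plays the role of the "tail" node check:

```python
# Sketch: on task completion, clear it from each dependent's unmet set and
# mark newly unblocked tasks runnable; report whether the workflow is done.
def complete_task(task, deps, state):
    """deps: task -> set of unmet dependencies; state: task -> status."""
    state[task] = "done"
    for t, unmet in deps.items():
        unmet.discard(task)
        if state[t] == "waiting" and not unmet:
            state[t] = "runnable"   # this is where the process recurses
    return all(s == "done" for s in state.values())

deps = {"head": set(), "simulate": {"head"}, "analyze": {"simulate"}}
state = {"head": "runnable", "simulate": "waiting", "analyze": "waiting"}
complete_task("head", deps, state)
print(state["simulate"])  # → runnable
```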
- The graph is modified as workflow progresses so that, at the end of the run, it represents the "state" and "result" of the workflow.
- Status of task execution (resources consumed, runtime, etc.)
- All outputs and inputs are cataloged (maybe SHA of their contents?)
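A sketch of the cataloging idea, keying each output by the SHA-256 of its contents as the note suggests; the `catalog` dict is a hypothetical stand-in for per-task metadata in the graph database:

```python
import hashlib

# Sketch: record an output under the SHA-256 digest of its bytes so the
# final graph captures verifiable "state" and "result" of the workflow.
def catalog_output(catalog, task, name, data: bytes):
    digest = hashlib.sha256(data).hexdigest()
    catalog.setdefault(task, {})[name] = digest
    return digest

catalog = {}
catalog_output(catalog, "simulate", "out.dat", b"example bytes")
print(catalog["simulate"]["out.dat"][:8])
```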