A domain-specific language based on YAML is defined to express the resource requirements and other attributes of one or more programs submitted to a Flux instance for execution. This RFC describes the canonical jobspec form, which represents a request to run exactly one program.
- Name: github.com/flux-framework/rfc/spec_14.adoc
- Editor: Tom Scogland <scogland1@llnl.gov>
- State: raw
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.
Goals:

- Express the resource requirements of a program to the scheduler.
- Allow graph-oriented resource requirements to be expressed.
- Express program attributes such as arguments, run time, and task layout, to be considered by the program execution service (RFC 12).
- Express dependencies relative to other programs executing within the same Flux instance.
- Emphasize expressivity over simplicity, as this canonical form may be generated from other user-friendly forms or interfaces.
- Facilitate reproducible runs.
- Promote sharing and reuse of jobspec.
This RFC describes the canonical form of "jobspec", a domain-specific language based on YAML [1]. The canonical jobspec SHALL consist of a single YAML document representing a reusable request to run exactly one program. Hereafter, "jobspec" refers to this canonical form; any other form is referred to explicitly as "non-canonical jobspec".
Non-canonical jobspec SHALL be decomposed into jobspec before it is enqueued for the scheduler and program execution service.
User-facing tools MAY generate jobspec from non-canonical jobspec, or other sources. Such tools MAY:

- generate a batch of dependent jobspecs representing a scientific workflow,
- generate a stream of jobspecs representing a steered parameter study,
- convert simulation parameters into jobspec containing computed resource requirements, or
- convert command line arguments to jobspec, e.g. "flux mpirun".
The jobspec SHALL be submitted to a job submission service. Malformed jobspec SHALL be immediately rejected by the job submission service. A stack of plugins SHALL test the jobspec against site- or user-defined criteria and, on failure, MAY reject the jobspec or MAY warn the user and continue. The job submission service SHALL enqueue the jobspec for consideration by the scheduler.
The scheduler SHALL consider each enqueued jobspec in the context of its dependencies and the pool of available resources. When the scheduler chooses to execute a job, it allocates resources, associates them with the jobspec, and notifies the program execution service to start the program(s).
The program execution service, described in RFC 12, launches the program(s). Task slots, containment, and task layout SHALL be created within the allocated resources as described by the jobspec, or if that is not possible, the job SHALL enter a failed state and resources SHALL be returned to the scheduler.
Once a job is retired, the jobspec SHALL be retained as part of its provenance record.
Resources are represented as hierarchies or graphs, as described in RFC 4.
FIXME: describe how Flux hierarchical resource representation affects jobspec design.
A canonical jobspec YAML document SHALL consist of a dictionary defining the resources, tasks and other attributes of a single program. The dictionary MUST contain the keys `resources`, `tasks`, `attributes`, and `version`.
Each of the listed jobspec keys SHALL meet the form and requirements listed in detail in the sections below. For reference, a ruleset for compliant canonical jobspec is provided using JSON Content Rules [2] in the Content Rules at the end of this section.
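To make the overall shape concrete before the detailed sections below, here is a minimal sketch of a document containing all four required keys (the command name `app` and the single-core request are placeholders, not normative):

version: 1
resources:
  - type: core
    count: 1
tasks:
  - command: [ "app" ]
    slot:
      type: core
    count:
      per_slot: 1
attributes:
  system:
    duration: 1 hour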
The value of the `resources` key SHALL be a strict list which MUST define at least one resource. Each list element SHALL represent a resource vertex or resource descriptor object as a dictionary (described below). The list of resources defined under the `resources` key SHALL represent a composite resource request for the program defined in the jobspec.
A resource vertex SHALL contain the following keys:

type::
The `type` key for a resource SHALL indicate the type of resource to be matched. Some type names MAY be reserved for use in the jobspec language itself. The currently reserved type is `slot`, used to define task slots. Reserved types are described in the Reserved Resource Types section below.

count::
The `count` key SHALL indicate the desired number or range of resources matching the current vertex. The `count` SHALL have one of two possible values: either a single integer value representing a fixed count, or a dictionary which SHALL contain the following keys:

min::: The minimum required count or amount of this resource.
max::: The maximum required count or amount of this resource.
operator::: An operator applied between `min` and `max` which returns the next acceptable value.
operand::: The operand used in conjunction with `operator` to get the next value between `min` and `max`.
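For example, assuming the multiplication operator steps from `min` toward `max` as described above, a count accepting 4, 8, 16, 32, or 64 of a resource could be sketched as:

count:
  min: 4
  max: 64
  operator: '*'
  operand: 2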
A resource vertex MAY additionally contain one or more of the following keys:

unit::
The `unit` key, if supplied, SHALL have a string value indicating the chosen units applied to the `count` value or values.

exclusive::
The `exclusive` key SHALL be a boolean indicating, when true, that the current resource is requested to be allocated exclusively to the current program. If unset, the default value for `exclusive` SHALL be `false` for vertices that are not within a task slot. The default value for `exclusive` SHALL be `true` for task slots (`type: slot`) and their associated resources.

with::
The `with` key SHALL indicate an edge of type `out` from this resource vertex to another resource. Therefore, the value of the `with` key SHALL be a strict list of dictionaries, each conforming to the resource vertex specification.

label::
The `label` key SHALL be a string that may be used to reference this resource vertex from other locations within the same jobspec. `label` SHALL be local to the namespace of the current jobspec, and each `label` in the current jobspec MUST be unique. `label` SHALL be mandatory in resource vertices of type `slot`.

id::
The value of the `id` key SHALL be a string indicating a set of matching resource identifiers.
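Combining several of these keys, the following sketch (the label `mynodes` is illustrative, not normative) requests two exclusively allocated nodes, each with an edge to eight cores:

- type: node
  count: 2
  exclusive: true
  label: mynodes
  with:
    - type: core
      count: 8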
Reserved resource types:

slot::
A resource type of `type: slot` SHALL indicate a grouping of resources into a named task slot. A `slot` SHALL be a valid resource spec including a `label` key, the value of which may be used to reference the named task slot during tasks definition. The `label` provided SHALL be local to the namespace of the current jobspec.

A task slot SHALL have at least one edge specified using `with:`, and the resources associated with a slot SHALL be exclusively allocated to the program described in the jobspec.
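For instance, a minimal sketch of four task slots labeled `default`, each grouping two cores, which tasks may later reference by that label:

- type: slot
  count: 4
  label: default
  with:
    - type: core
      count: 2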
The value of the `tasks` key SHALL be a strict list which MUST define at least one task. Each list element SHALL be a dictionary representing a task or tasks to run as part of the program. A task descriptor SHALL contain the following keys:

command::
The value of the `command` key SHALL be a string or a list of strings representing an executable and its arguments.

slot::
The value of the `slot` key SHALL be used to indicate the task slot on which this task or tasks shall be contained and executed. The number of tasks executed per task slot SHALL be a function of the number of resource slots and the total number of tasks requested to execute.

The value of the `slot` key SHALL be a dictionary with a supported key of either `label` or `type`. The `label` key SHALL reference a `label` of a resource vertex of type `slot`, indicating an explicitly created and named task slot. The `type` key SHALL reference a real resource type, such as `core` or `node`, indicating an implicitly created task slot on which to map the defined tasks. `type` SHALL NOT be `slot`; slots must be referred to by their `label`.

count::
The value of the `count` key SHALL be a dictionary supporting at least the keys `per_slot` and `total`, with other keys reserved for future or site-specific extensions.

per_slot::: The value of `per_slot` SHALL be a number indicating the number of tasks to execute per task slot allocated to the program.
total::: The value of the `total` field SHALL indicate the total number of tasks to be run across all task slots, possibly oversubscribed.

attributes::
The `attributes` key SHALL be a free-form dictionary of keys which may be used for platform-independent or optional extensions.

distribution::
The value of the `distribution` key SHALL be a string, which MAY be used as input to the launcher's algorithm for task placement and layout among task slots.
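Putting these keys together, a sketch of a `tasks` list that maps one copy of a hypothetical executable `myapp` (with illustrative arguments) onto each task slot labeled `default`:

tasks:
  - command: [ "myapp", "--input", "data.in" ]
    slot:
      label: default
    count:
      per_slot: 1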
The value of the `attributes` key SHALL be a dictionary of dictionaries. The `attributes` dictionary MAY contain one or more of the following keys which, if present, must have dictionary values:

user::
Attributes in the `user` dictionary are unrestricted, and may be used as the application demands. Flux may provide additional tools that can identify jobs based on `user` attributes.

system::
Attributes in the `system` dictionary are additional parameters to a Flux instance that affect program execution, scheduling, etc. All attributes in `system` are reserved words; however, unrecognized words SHALL trigger no more than a warning. This permits jobspec reuse between multiple Flux instances which may be configured differently and recognize different sets of attributes.

Most system attributes are optional. Flux modules SHALL provide reasonable defaults for any system attributes that they recognize when at all possible.

Some common system attributes are:

duration:::
The value of the `duration` attribute is a string representing a time span. The scheduler will make an effort to allocate the requested resources for the time specified in `duration`.
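As a sketch, an `attributes` dictionary combining both sub-dictionaries; the `user` keys shown are hypothetical, since user attributes are unrestricted:

attributes:
  system:
    duration: 2 hours
  user:
    project: my-project
    experiment: trial-17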
Under the description above, the following is an example of a fully compliant version 1 jobspec. The example below declares a request for 4 "nodes", each of which has 1 task slot consisting of 2 cores, for a total of 4 task slots. A single copy of the command `app` will be run on each task slot for a total of 4 tasks.
version: 1
resources:
  - type: node
    count: 4
    with:
      - type: slot
        count: 1
        label: default
        with:
          - type: core
            count: 2
tasks:
  - command: app
    slot:
      label: default
    count:
      per_slot: 1
attributes:
  system:
    duration: 1 hour
A simpler example uses an implicit task slot definition to run 4 tasks across 4 nodes:
version: 1
resources:
  - type: node
    count: 4
tasks:
  - command: hostname
    slot:
      type: node
    count:
      per_slot: 1
attributes:
  system:
    duration: 1 hour
A jobspec conforming to version 1 of the language definition SHALL adhere to the following ruleset, described using JSON Content Rules [2], draft version 0.6.
# jcr-version 0.6

{
    "resources" : [ $vertex + ],
    "tasks" : [ $tasks + ],
    "attributes" : {
        ?"system" : $system_attributes,
        ?"user" : $user_attributes
    },
    "version" : 1
}

$label := string

$vertex_common = {
    "count" : ( 1.. | $complex_range ),
    ?"exclusive" : boolean,
    ?"with" : [ $vertex + ]
}

$slot_vertex = {
    "type" : "slot",
    "label" : $label,
    $vertex_common
}

$resource_vertex = {
    "type" : ( : string, +@{reject} ("slot") ),
    $vertex_common,
    ?"id" : string,
    ?"unit" : string
}

$vertex = ( $slot_vertex | $resource_vertex )

$complex_range = {
    "min" : 1..,
    "max" : 1..,
    "operator" : ( : "+" | : "*" | : "^" ),
    "operand" : 1..
}

$tasks = {
    "command" : ( string | [ string + ] ),
    "slot" : ( { "label" : string } | { "type" : string } ),
    "count" : { ?"per_slot" : 1.., ?"total" : 1.. },
    ?"distribution" : string,
    ?"attributes" : { /.*/ : any }
}

$system_attributes = {
    "duration" : string,
    /.*/ : string
}

$user_attributes = {
    /.*/ : string
}
To implement basic resource manager functionality, the following use cases SHALL be supported by the jobspec:
The following "resource only" requests are assumed to be the equivalent
of existing resource manager batch job submission or allocation
requests, i.e. equivalent to oarsub
, qsub
, and salloc
. In terms
of a canonical jobspec, these requests are assumed to be requests
to start an instance, i.e. run a single copy of flux start
per
allocated node.
Use Case 1.1::
Request Single Resource with Count

Specific Example::
Request 4 nodes

Existing Equivalents::
Slurm: `salloc -N4`
PBS: `qsub -l nodes=4`

Jobspec YAML::

version: 1
resources:
  - type: node
    count: 4
tasks:
  - command: [ "flux", "start" ]
    slot:
      type: node
    count:
      per_slot: 1
attributes:
  system:
    duration: 1 hour
Use Case 1.2::
Request a range of a type of resource

Specific Example::
Request between 3 and 30 nodes

Existing Equivalents::
Slurm: `salloc -N3-30`

Jobspec YAML::

version: 1
resources:
  - type: node
    count:
      min: 3
      max: 30
      operator: "+"
      operand: 1
tasks:
  - command: [ "flux", "start" ]
    slot:
      type: node
    count:
      per_slot: 1
attributes:
  system:
    duration: 1 hour
Use Case 1.3::
Request M nodes with a minimum number of sockets per node and cores per socket

Specific Example::
Request 4 nodes with at least 2 sockets each, and 4 cores per socket

Existing Equivalents::
Slurm (a): `srun -N4 --sockets-per-node=2 --cores-per-socket=4`
Slurm (b): `srun -N4 -B '2:4:*'`
OAR: `oarsub -l nodes=4/sockets=2/cores=4`

Jobspec YAML::

version: 1
resources:
  - type: node
    count: 4
    with:
      - type: socket
        count: 2
        with:
          - type: core
            count: 4
tasks:
  - command: [ "flux", "start" ]
    slot:
      type: node
    count:
      per_slot: 1
attributes:
  system:
    duration: 1 hour
Use Case 1.4::
Exclusively allocate nodes, while constraining cores and sockets

Specific Example::
Request an exclusive allocation of 4 nodes that have at least two sockets and 4 cores per socket

Jobspec YAML::

version: 1
resources:
  - type: slot
    count: 1
    label: default
    with:
      - type: node
        count: 4
        with:
          - type: socket
            count: 2
            with:
              - type: core
                count: 4
tasks:
  - command: [ "flux", "start" ]
    slot:
      type: node
    count:
      per_slot: 1
attributes:
  system:
    duration: 1 hour
Use Case 1.5::
Complex example from OAR

Specific Example::
Ask for 1 core on 2 nodes on the same cluster with 4096 GB of memory and Infiniband 10G + 1 cpu on 2 nodes on the same switch with bicore processors for a walltime of 4 hours

Existing Equivalents::
OAR: `oarsub -I -l "{memnode=4096 and ib10g='YES'}/cluster=1/nodes=2/core=1+{nbcore=2}/switch=1/nodes=2/cpu=1,walltime=4:0:0"`

Jobspec YAML::

version: 1
resources:
  - type: cluster
    count: 1
    with:
      - type: node
        count: 2
        with:
          - type: memory
            count: 4
            unit: GB
          - type: ib10g
            count: 4
  - type: switch
    count: 1
    with:
      - type: node
        count: 2
        with:
          - type: core
            count: 1
tasks:
  - command: [ "flux", "start" ]
    slot:
      type: node
    count:
      per_slot: 1
attributes:
  system:
    duration: 4 hours
Use Case 1.6::
Request resources across multiple clusters

Specific Example::
Ask for 1 core on 15 nodes across 2 clusters (total = 30 cores)

Existing Equivalents::
OAR: `oarsub -I -l /cluster=2/nodes=15/core=1`

Jobspec YAML::

version: 1
resources:
  - type: cluster
    count: 2
    with:
      - type: node
        count: 15
        with:
          - type: core
            count: 1
tasks:
  - command: [ "flux", "start" ]
    slot:
      type: node
    count:
      per_slot: 1
attributes:
  system:
    duration: 1 hour
Use Case 1.7::
Request N cores across M switches

Specific Example::
Request 3 cores across 3 switches

Existing Equivalents::
OAR: `oarsub -I -l /switch=3/core=1`

Jobspec YAML::

version: 1
resources:
  - type: switch
    count: 3
    with:
      - type: core
        count: 1
tasks:
  - command: [ "flux", "start" ]
    slot:
      type: node
    count:
      per_slot: 1
attributes:
  system:
    duration: 1 hour
The following use cases include task specification in addition to resource requests, demonstrating the use of task slot labels and types to relate tasks to resources.
Use Case 2.1::
Run N tasks across M nodes

Specific Example::
Run `hostname` 20 times on 4 nodes, 5 per node

Existing Equivalents::
Slurm: `srun -N4 -n20 hostname` or `srun -N4 --ntasks-per-node=5 hostname`
PBS: `qsub -l nodes=4,mppnppn=5`

Jobspec YAML::

version: 1
resources:
  - type: slot
    label: default
    count: 4
    with:
      - type: node
        count: 1
tasks:
  - command: hostname
    slot:
      label: default
    count:
      per_slot: 5
attributes:
  system:
    duration: 1 hour
Use Case 2.2::
Run N tasks across M nodes, unequal distribution

Specific Example::
Run 5 copies of `hostname` across 4 nodes, default distribution

Existing Equivalents::
Slurm: `srun -n5 -N4 hostname`

Jobspec YAML::

version: 1
resources:
  - type: node
    count: 4
tasks:
  - command: hostname
    slot:
      type: node
    count:
      total: 5
attributes:
  system:
    duration: 1 hour
or, with explicit task slots:
version: 1
resources:
  - type: slot
    count: 4
    label: myslot
    with:
      - type: node
        count: 1
tasks:
  - command: hostname
    slot:
      label: myslot
    count:
      total: 5
attributes:
  system:
    duration: 1 hour
Use Case 2.3::
Run N tasks, require M cores per task

Specific Example::
Run 10 copies of `myapp`, require 2 cores per copy, for a total of 20 cores

Existing Equivalents::
Slurm: `srun -n10 -c 2 myapp`

Jobspec YAML::

version: 1
resources:
  - type: slot
    label: default
    count: 10
    with:
      - type: core
        count: 2
tasks:
  - command: myapp
    slot:
      label: default
    count:
      per_slot: 1
attributes:
  system:
    duration: 1 hour
Use Case 2.4::
Run different binaries with differing resource requirements as a single program

Specific Example::
11 tasks, one node, first 10 using one core and 4G of RAM for `read-db`, last using 6 cores and 24G of RAM for `db`

Existing Equivalents::
None known

Jobspec YAML::

version: 1
resources:
  - type: node
    count: 1
    with:
      - type: slot
        label: read-db
        count: 10
        with:
          - type: core
            count: 1
          - type: memory
            count: 4
            unit: GB
      - type: slot
        label: db
        count: 1
        with:
          - type: core
            count: 6
          - type: memory
            count: 24
            unit: GB
tasks:
  - command: read-db
    slot:
      label: read-db
    count:
      per_slot: 1
  - command: db
    slot:
      label: db
    count:
      per_slot: 1
attributes:
  system:
    duration: 1 hour
Use Case 2.5::
Run command requesting minimum amount of RAM per core

Specific Example::
Run 10 copies of `app` across 10 cores with at least 2GB per core

Existing Equivalents::
Slurm: `srun -n 10 --mem-per-cpu=2048 app`

Jobspec YAML::

version: 1
resources:
  - type: slot
    label: default
    count: 10
    with:
      - type: memory
        count: 2
        unit: GB
      - type: core
        count: 1
tasks:
  - command: app
    slot:
      label: default
    count:
      per_slot: 1
attributes:
  system:
    duration: 1 hour
Use Case 2.6::
Run N copies of a command with a minimum amount of RAM per node

Specific Example::
Run 10 copies of `app` across 2 nodes with at least 4GB per node

Existing Equivalents::
Slurm: `srun -n10 -N2 --mem=4096 app`
OAR: `oarsub -p memnode=4096 -l nodes=2 "taktuk -c oarsh -f $OAR_FILE_NODES broadcast exec [app]"`

Jobspec YAML::

version: 1
resources:
  - type: slot
    label: 4GB-node
    count: 2
    with:
      - type: node
        count: 1
        with:
          - type: memory
            count: 4
            unit: GB
tasks:
  - command: app
    slot:
      label: 4GB-node
    count:
      total: 10
attributes:
  system:
    duration: 1 hour
[1] YAML Ain't Markup Language (YAML) Version 1.1, O. Ben-Kiki, C. Evans, B. Ingerson, 2004.
[2] JSON Content Rules (jcr-version 0.6), A. Newton, P. Cordell, 2016.
[3] Job Submission Description Language (JSDL) Specification, Version 1.0, A. Anjomshoaa, et al., 2005.