In certain scenarios, the resource set that the user requests in their jobspec may not be sufficient to execute the job. Consider the following examples:
- The user requests `slot[1]->gpu[1]`. With current technology, at least one CPU core is required to execute a task on the GPU. Rather than rejecting the job at submission, the datacenter wants to automatically add the core as a quality-of-life improvement for the user.
- The user requests `slot[1]->core[1]`, and the datacenter wants to enforce that for each core the user gets, they also get a certain amount of memory, to prevent OOM'ing other jobs on a shared node.
- The user requests 1TB of global storage on a burst buffer or IO device, but they also request redundancy (e.g., duplication, RAID). To deliver 1TB of usable capacity, the system therefore needs a larger amount of raw space to cover the redundancy overhead.
- The user requests 1TB of storage for an IO library like LABIOS or UnifyFS. In addition to over-allocating the IO resource for overheads (as in the previous use case), we also want to allocate additional compute resources for these libraries to perform their tasks (CC @Keith-Bateman).
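To make the first three examples concrete, here is a minimal sketch of the kind of rewrite rules a datacenter policy might apply. It assumes a simplified, flat representation of a slot's children rather than the real jobspec or R formats; the policy constants (`CORES_PER_GPU`, `MEM_GB_PER_CORE`), the memory unit (GB), and the function names `augment_slot` and `raw_capacity_gb` are all hypothetical.

```python
# Hypothetical site policy values, chosen for illustration only.
CORES_PER_GPU = 1
MEM_GB_PER_CORE = 4


def augment_slot(children):
    """Add the cores and memory a slot needs beyond what the user asked for.

    `children` is a simplified stand-in for the resources under a `slot`:
    a list of {"type": ..., "count": ...} dicts in which each type appears
    at most once. Memory counts are assumed to be in GB.
    """
    counts = {c["type"]: c["count"] for c in children}

    # Every GPU needs at least CORES_PER_GPU cores to drive it.
    cores = max(counts.get("core", 0), CORES_PER_GPU * counts.get("gpu", 0))
    if cores:
        counts["core"] = cores
        # Every core comes with a fixed amount of memory to avoid OOM'ing neighbors.
        counts["memory"] = max(counts.get("memory", 0), MEM_GB_PER_CORE * cores)

    return [{"type": t, "count": n} for t, n in counts.items()]


def raw_capacity_gb(usable_gb, data_units, redundancy_units):
    """Raw capacity needed so that `usable_gb` remains after redundancy overhead.

    Duplication is data_units=1, redundancy_units=1; an 8+2 RAID-6 layout is
    data_units=8, redundancy_units=2. Rounds up to a whole GB.
    """
    total_units = data_units + redundancy_units
    return -(-usable_gb * total_units // data_units)  # ceiling division


print(augment_slot([{"type": "gpu", "count": 1}]))
# [{'type': 'gpu', 'count': 1}, {'type': 'core', 'count': 1}, {'type': 'memory', 'count': 4}]
print(raw_capacity_gb(1000, 1, 1))  # 2000
print(raw_capacity_gb(1000, 8, 2))  # 1250
```

In both cases the output describes strictly more than the user wrote down, which is exactly the property that makes R differ from J.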
In all of these examples, the R (concrete resource set) that the scheduler outputs will be a superset of the resources requested in J (the jobspec).
In a more extreme example, assume the user requests some amount of storage bandwidth. The scheduler then converts that request into some other resources (e.g., X TB of capacity spread across Y storage devices) and matches based on those. In this case, R isn't even a superset of J; it will contain resources not present in J and not contain some resources that are present in J.
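A sketch of that conversion shows why R ends up neither equal to nor a superset of J. This is a hypothetical illustration: the flat `{type: count}` request, the `storage_bandwidth_gbps` key, and the per-SSD bandwidth and capacity constants are made up, and a real translation would presumably consult the system's resource graph instead of constants.

```python
import math

# Hypothetical device model: each node-local SSD sustains this much bandwidth
# (GB/s) and contributes this much capacity (GB).
SSD_BANDWIDTH_GBPS = 2
SSD_CAPACITY_GB = 1600


def bandwidth_to_devices(bandwidth_gbps):
    """Return (ssd_count, capacity_gb) needed to deliver the requested bandwidth."""
    ssds = math.ceil(bandwidth_gbps / SSD_BANDWIDTH_GBPS)
    return ssds, ssds * SSD_CAPACITY_GB


def translate(jobspec):
    """Rewrite a flat {type: count} request, replacing bandwidth with SSDs."""
    resources = dict(jobspec)
    bandwidth = resources.pop("storage_bandwidth_gbps", 0)
    if bandwidth:
        ssds, capacity = bandwidth_to_devices(bandwidth)
        resources["ssd"] = ssds
        resources["storage_capacity_gb"] = capacity
    return resources


J = {"node": 1, "storage_bandwidth_gbps": 10}
R = translate(J)
print(R)                # {'node': 1, 'ssd': 5, 'storage_capacity_gb': 8000}
print(set(R) - set(J))  # resource types in R but not in J
print(set(J) - set(R))  # resource types in J but not in R
```

Any check that assumes `set(J) <= set(R)` would fail here, since the bandwidth request is consumed by the translation.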
After discussing with @dongahn and others, it seems like this could be handled at either the job submission time via front-end tool plugins (flux-framework/flux-core#2875) or at scheduling time by a scheduler match/select plugin. This issue is to track the design and implementation progress on the latter.
Advantages of doing it at scheduling time:
- The same jobspec will be more portable across different systems (assuming the systems all have a match/select plugin that supports the relevant use case).
Complications of doing it at scheduling time:
- Since a direct 1:1 match no longer occurs, R isn't necessarily going to be strictly equal to J (and in the storage-bandwidth example it isn't even guaranteed to be a superset of J).
- If we want to avoid large, monolithic scheduler plugins that must handle every possible use case, we will need to figure out a way to compose multiple plugins together (one for each use case).
- We need to figure out how to handle the case where resources "expand" into different places in the hierarchy, as in the bandwidth-to-capacity example: the bandwidth might be requested at the global level, but it gets satisfied by allocating many node-local SSDs. If this is done at match time, it could cause issues, since we cannot control traversal of the jobspec or system resources at match time.
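One possible way to compose per-use-case plugins is to treat each one as a pure function from request to request and chain them. The sketch below is hypothetical (flat `{type: count}` requests, made-up plugin names and memory ratio) and ignores the hierarchy question entirely; it mainly illustrates that ordering matters when one plugin's output feeds another plugin's rule.

```python
from functools import reduce


def add_core_per_gpu(req):
    """Hypothetical plugin: ensure at least one core per GPU."""
    out = dict(req)
    out["core"] = max(out.get("core", 0), out.get("gpu", 0))
    return out


def add_memory_per_core(req, gb_per_core=4):
    """Hypothetical plugin: ensure a fixed amount of memory per core."""
    out = dict(req)
    out["memory_gb"] = max(out.get("memory_gb", 0), gb_per_core * out.get("core", 0))
    return out


def compose(*plugins):
    """Apply plugins left to right, each seeing the previous one's output."""
    return lambda req: reduce(lambda acc, plugin: plugin(acc), plugins, req)


def compose_to_fixed_point(*plugins, max_rounds=10):
    """Re-run the whole chain until the request stops changing."""
    chain = compose(*plugins)

    def run(req):
        for _ in range(max_rounds):
            new = chain(req)
            if new == req:
                return new
            req = new
        raise RuntimeError("plugins did not converge")

    return run


J = {"gpu": 2}
print(compose(add_core_per_gpu, add_memory_per_core)(J))  # {'gpu': 2, 'core': 2, 'memory_gb': 8}
print(compose(add_memory_per_core, add_core_per_gpu)(J))  # {'gpu': 2, 'memory_gb': 0, 'core': 2}
print(compose_to_fixed_point(add_memory_per_core, add_core_per_gpu)(J))
# {'gpu': 2, 'memory_gb': 8, 'core': 2}
```

Iterating to a fixed point removes the ordering dependence here only because both rules are idempotent and never decrease a count, so repeated application converges; plugins without those properties would need a different composition strategy.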