
Conversation

@cjh1 (Contributor) commented Dec 1, 2025

This PR adds some properties to the JobSpec to allow containerized jobs to be run.

@juztas (Contributor) commented Dec 2, 2025

Hi, just my observations and a few comments. For a container runtime, the current JobSpec is minimal (an image and volume mounts), and there may be a need to express more container runtime options, like --mpi, --nv/--gpu, --network host… (just a wild guess based on my experience with physicists, not based on AmSC requirements). Would it be more valuable to have it like this:

class VolumeMount(BaseModel):
    source: str
    target: str
    read_only: bool = True

class ContainerRuntime(BaseModel):
    image: str | None = None
    network_mode: str = "host"
    mpi: bool = False
    gpu: bool = False
    volume_mounts: list[VolumeMount] = []
    # ... expand as needed/required in the future

class JobSpec(BaseModel):
    executable: str | None = None
    container_runtime: ContainerRuntime | None = None

In short, rather than pushing these flags directly into JobSpec, the API could introduce a dedicated ContainerRuntime model and keep container-related configuration separate.

It also raises additional questions for facilities and the IRI Interface implementation (how this would work in practice across all facilities): each facility might use a different container runtime (Docker, Apptainer, Podman…), and not everyone allows fully privileged containers (just my guess). How are these capabilities exposed (container runtime, flags supported), and who does the "heavy lifting" of translating container parameters to the facility's container runtime? Is it the IRI Interface, or is it left to the end user to identify each facility's capabilities and make the changes required to run jobs?

@cjh1 (Contributor, Author) commented Dec 2, 2025

> Hi, just my observations and a few comments. For a container runtime, the current JobSpec is minimal (an image and volume mounts), and there may be a need to express more container runtime options, like --mpi, --nv/--gpu, --network host… (just a wild guess based on my experience with physicists, not based on AmSC requirements). Would it be more valuable to have it like this:
>
> class VolumeMount(BaseModel):
>     source: str
>     target: str
>     read_only: bool = True
>
> class ContainerRuntime(BaseModel):
>     image: str | None = None
>     network_mode: str = "host"
>     mpi: bool = False
>     gpu: bool = False
>     volume_mounts: list[VolumeMount] = []
>     # ... expand as needed/required in the future
>
> class JobSpec(BaseModel):
>     executable: str | None = None
>     container_runtime: ContainerRuntime | None = None
>
> In short, rather than pushing these flags directly into JobSpec, the API could introduce a dedicated ContainerRuntime model and keep container-related configuration separate.

Separating the configuration into a separate container-specific object is a good idea. However, I think we need to be careful to avoid exposing too much, as we need to allow sites to implement the interface, so it really needs to be the lowest common denominator that the container runtimes used across the different sites can support. For example, I didn't expose the network configuration, as I was thinking we should just default to the host network. For MPI and GPU configuration, I would say these options could be enabled if the job spec dictated that they were necessary, to avoid duplicating configuration.
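That idea — deriving MPI/GPU enablement from resource fields the job spec already carries, rather than repeating them in the container config — can be sketched as follows. Field names like gpus_per_node and ranks_per_node are hypothetical and not part of this PR:

```python
def derive_container_flags(job_spec: dict) -> dict:
    """Map resource fields already present in a job spec to container
    runtime needs, so GPU/MPI settings are never configured twice.

    The field names (gpus_per_node, ranks_per_node) are invented for
    illustration; a real implementation would read whatever resource
    fields the JobSpec model actually defines.
    """
    return {
        # GPU support is needed iff the job requested GPUs.
        "gpu": job_spec.get("gpus_per_node", 0) > 0,
        # MPI support is needed iff the job runs more than one rank per node.
        "mpi": job_spec.get("ranks_per_node", 1) > 1,
    }


# A multi-rank GPU job implies both capabilities without extra config.
flags = derive_container_flags({"gpus_per_node": 4, "ranks_per_node": 8})
```

A facility adapter could then translate these booleans into runtime-specific invocations (for example, a GPU flag for its particular container runtime) without the user restating them.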

> It also raises additional questions for facilities and the IRI Interface implementation (how this would work in practice across all facilities): each facility might use a different container runtime (Docker, Apptainer, Podman…), and not everyone allows fully privileged containers (just my guess). How are these capabilities exposed (container runtime, flags supported), and who does the "heavy lifting" of translating container parameters to the facility's container runtime? Is it the IRI Interface, or is it left to the end user to identify each facility's capabilities and make the changes required to run jobs?

Yes, as I said above, we need to expose a very minimal subset of container functionality so it can be implemented successfully across sites. I see this interface as a subset of container functionality rather than a superset of all container runtime options. We could also provide a site-specific "extra container options" property as an escape hatch that would allow sites to support more advanced options, but these would not necessarily be supported across all sites.
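One possible shape for that escape hatch, sketched under the assumption that each site publishes the set of pass-through options it supports (the option names and capability set below are invented examples, not a proposed standard):

```python
def apply_extra_options(base_args: list[str], extras: list[str],
                        supported: set[str]) -> list[str]:
    """Append site-specific extra container options to a runtime command,
    failing loudly on options the site does not support rather than
    passing them through or silently dropping them.

    `supported` stands in for a per-site capability set that the facility
    would publish; how that is exposed is one of the open questions above.
    """
    # Compare on the option name only, so "--bind=/a:/b" matches "--bind".
    unsupported = [opt for opt in extras if opt.split("=")[0] not in supported]
    if unsupported:
        raise ValueError(f"unsupported extra container options: {unsupported}")
    return base_args + extras


# Invented example: a site that allows GPU and bind-mount pass-throughs.
args = apply_extra_options(
    ["run", "docker://ubuntu:24.04"],
    ["--nv"],
    supported={"--nv", "--bind"},
)
```

Rejecting unknown options up front keeps the portable core of the interface honest: a job that relies on the escape hatch fails fast at the sites that cannot honor it, instead of running with different behavior.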

@cjh1 force-pushed the containers branch 2 times, most recently from afd294d to 205b127, on December 4, 2025 at 19:02