Add Openmpi using new libfabric to stackinator options #210

Open · wants to merge 17 commits into base: master
75 changes: 71 additions & 4 deletions docs/recipes.md
@@ -144,10 +144,12 @@ For example, in the recipe below, only `netcdf-fortran` will be built with the `

### MPI

Stackinator can configure cray-mpich (CUDA, ROCM, or non-GPU aware) on a per-environment basis, by setting the `mpi` field in an environment.
Stackinator can configure cray-mpich (CUDA, ROCM, or non-GPU aware) or OpenMPI (with or without CUDA) on a per-environment basis, by setting the `mpi` field in an environment.

!!! note
Future versions of Stackinator will support OpenMPI, MPICH and MVAPICH when (and if) they develop robust support for HPE SlingShot 11 interconnect.
Future versions of Stackinator will fully support OpenMPI, MPICH and MVAPICH when (and if) they develop robust support for HPE SlingShot 11 interconnect.

Current OpenMPI support has been only lightly tested and is not guaranteed to be production ready. Only OpenMPI@5.x.x is supported (the default is @5.0.6 at the time of writing, 2025.03.04). CUDA is supported; ROCM has not yet been tested.

If the `mpi` field is not set, or is set to `null`, MPI will not be configured in an environment:
```yaml title="environments.yaml: no MPI"
@@ -157,12 +159,18 @@ serial-env:
```
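The collapsed lines above contain the trivial case; based on the description, such an entry looks roughly like this sketch:
```yaml
serial-env:
  mpi: null
  # ...
```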

To configure MPI without GPU support, set the `spec` field with an optional version:
```yaml title="environments.yaml: MPI without GPU support"
```yaml title="environments.yaml: Cray MPICH without GPU support"
host-env:
  mpi:
    spec: cray-mpich@8.1.23
  # ...
```
```yaml title="environments.yaml: OpenMPI without GPU support"
host-env:
  mpi:
    spec: openmpi
  # ...
```
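A specific version can also be pinned for OpenMPI; a minimal sketch using the 5.0.6 default noted above (the version string is illustrative, any supported 5.x release can be requested):
```yaml
host-env:
  mpi:
    spec: openmpi@5.0.6
  # ...
```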

GPU-aware MPI can be configured by setting the optional `gpu` field to specify whether to support `cuda` or `rocm` GPUs:
```yaml title="environments.yaml: GPU aware MPI"
@@ -176,7 +184,66 @@ rocm-env:
    spec: cray-mpich
    gpu: rocm
  # ...
ompi-cuda-env:
  mpi:
    spec: openmpi
    gpu: cuda
  # ...
```
#### Experimental libfabric 2.x support with cray-mpich

HPE has open-sourced the libfabric/cxi provider (and related drivers), and these can be built into cray-mpich by adding a dependency on a newer version of libfabric.
The system default is libfabric@1.15.2, which can be changed by adding a `depends` field to the yaml. A non-default version (newer than 1.15.2) will trigger a build of libfabric using libcxi, cxi-driver and cassini-headers.

This syntax applies to both cray-mpich and openmpi builds.
```yaml title="environments1.yaml: cray-mpich using new libfabric/cxi stack"
mpich-cxi-env:
  mpi:
    spec: cray-mpich
    gpu: cuda
    depends: [libfabric@main]
    # on release of libfabric@2, we recommend using @2 in preference to @main ...
```
```yaml title="environments2.yaml: openmpi using new libfabric/cxi stack"
openmpi-cxi-env:
  mpi:
    spec: openmpi
    gpu: cuda
    depends: [libfabric@main]
    # on release of libfabric@2, we recommend using @2 in preference to @main ...
```
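Once a libfabric 2.x release exists, the in-line comments above suggest depending on the release rather than the development branch; a hypothetical sketch (the `@2` version only resolves after such a release is published):
```yaml
openmpi-cxi-env:
  mpi:
    spec: openmpi
    gpu: cuda
    depends: [libfabric@2]
  # ...
```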
!!! note
Currently the performance of OpenMPI on Alps clusters might not be optimal; work is ongoing to fine-tune it, especially for intra-node performance.

#### Custom MPI builds

If an experimental version of OpenMPI is required, the yaml syntax supports additional options to enable this. The `xspec` tag may be used to supply extra spack `spec` options.
To illustrate usage, consider this example: a build of openmpi from a particular git branch named `mpi-continue-5.0.6`, which in this case has a variant `+continuations` that is not available on the `main` or released branches.
The `xspec` tag allows the user to supply arbitrary spack variants and options that replace the defaults stackinator would otherwise use.
```yaml title="custom-openmpi.yaml: custom build of openmpi"
openmpi-custom-env:
  mpi:
    spec: openmpi@git.mpi-continue-5.0.6=main
    xspec: +continuations +internal-pmix fabrics=cma,ofi,xpmem schedulers=slurm +cray-xpmem
    gpu: cuda
    depends: ["libfabric@main"]
```

Contributor (review comment on the `xspec` line):
Suggested change:
xspec: +continuations +internal-pmix fabrics=cma,ofi,xpmem schedulers=slurm +cray-xpmem
variants: +continuations +internal-pmix fabrics=cma,ofi,xpmem schedulers=slurm +cray-xpmem
? C.f. https://github.com/eth-cscs/alps-cluster-config/pull/23/files#r1993297476.

Member:
xspec is a new keyword @biddisco introduced for this specific use case, not to be confused with the variants keyword used in the cluster configs.

Contributor:
Sorry, I didn't make myself clear (at all; I thought I did in another comment somewhere, but I've confused myself with the two PRs):
Can we please call it something other than xspec? I wasn't saying this is a bug. I think xspec is a strange name. It's not a spec, and x is too ambiguous in my opinion. If we want to keep the extra meaning of x, why not something like extra_variants? This is specifying additional variants for the chosen MPI package if I understand correctly?

Contributor:
The link to the comment on the other PR makes sort of the same point that I wanted to make here, but for another field...
In this example, we must tell spack to fetch our custom git branch from a repository that is different from the default openmpi GitHub repository, by adding the following to our recipe's `packages.yaml`:
```yaml title="custom-packages.yaml: custom repo for openmpi"
# mpi-continue-5.0.6 branch available from here
openmpi:
  package_attributes:
    git: https://github.com/your-username/ompi
```

!!! note
To build using a specific git commit, use the syntax
```
spec: openmpi@git.9a4079916dd13d4190fe224102b57757789c13da=main
```
It is therefore possible to build arbitrary versions of MPI with custom options, branches and dependencies by combining these settings.
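For example, a hypothetical recipe combining the commit pin from the note above with the `xspec`, `gpu` and `depends` fields (the environment name and the exact variant selection are illustrative only):
```yaml
openmpi-pinned-env:
  mpi:
    spec: openmpi@git.9a4079916dd13d4190fe224102b57757789c13da=main
    xspec: +internal-pmix fabrics=cma,ofi,xpmem schedulers=slurm
    gpu: cuda
    depends: ["libfabric@main"]
  # ...
```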

### Version info

!!! alps

@@ -207,7 +274,7 @@ The list of software packages to install is configured in the `spec:` field of a
The `deprecated: ` field controls if Spack should consider versions marked as deprecated, and can be set to `true` or `false` (for considering or not considering deprecated versions, respectively).

The `unify:` field controls the Spack concretiser, and can be set to three values `true`, `false` or `when_possible`.
The

Contributor (review comment):
Suggested change: delete the dangling `The` above.
?


```yaml
cuda-env:
```
60 changes: 58 additions & 2 deletions stackinator/builder.py
@@ -353,21 +353,77 @@ def generate(self, recipe):
        with dst.open("w") as fid:
            fid.write(cache.generate_mirrors_yaml(recipe.mirror))

        packages_data = {}
        # append network packages to packages.yaml
        network_config_path = system_config_path / "network.yaml"
        if not network_config_path.is_file():
            raise FileNotFoundError(f"The network configuration file '{network_config_path}' does not exist")
        with network_config_path.open() as fid:
            network_config = yaml.load(fid, Loader=yaml.Loader)
        packages_data.update(network_config.get("packages", {}))

        # append recipe packages to packages.yaml
        if recipe.packages:
            system_packages = system_config_path / "packages.yaml"
            packages_data = {}
            if system_packages.is_file():
                # load system yaml
                with system_packages.open() as fid:
                    raw = yaml.load(fid, Loader=yaml.Loader)
                    packages_data = raw["packages"]
                    packages_data.update(raw["packages"])
            packages_data.update(recipe.packages["packages"])
            packages_yaml = yaml.dump({"packages": packages_data})
            packages_path = config_path / "packages.yaml"
            with packages_path.open("w") as fid:
                fid.write(packages_yaml)

        # validate the recipe mpi selection
        for name, config in recipe.environments.items():
            # config[mpi] holds the user specified settings in environment.yaml
            if config["mpi"]:
                mpi = config["mpi"]
                user_mpi_spec = mpi["spec"]
                user_mpi_gpu = mpi["gpu"]
                user_mpi_xspec = mpi["xspec"] if "xspec" in mpi else None
                user_mpi_deps = mpi["depends"] if "depends" in mpi else None
                self._logger.debug(
                    f"User MPI selection: spec={user_mpi_spec}, gpu={user_mpi_gpu}, "
                    f"xspec={user_mpi_xspec}, deps={user_mpi_deps}"
                )

                if user_mpi_spec:
                    try:
                        mpi_impl, mpi_ver = user_mpi_spec.strip().split(sep="@", maxsplit=1)
                    except ValueError:
                        mpi_impl = user_mpi_spec.strip()
                        mpi_ver = None

                    # network_config holds the system specified settings in cluster config / network.yaml
                    if mpi_impl in network_config["mpi_supported"]:
                        default_ver = network_config[mpi_impl]["version"]
                        default_spec = network_config[mpi_impl]["spec"] if "spec" in network_config[mpi_impl] else ""
                        default_deps = (
                            network_config[mpi_impl]["depends"] if "depends" in network_config[mpi_impl] else ""
                        )
                        self._logger.debug(
                            f"System MPI selection: spec={mpi_impl}@{default_ver} {default_spec}, deps={default_deps}"
                        )

                        # select users versions or the default versions if user did not specify
                        version_opt = f"@{mpi_ver or default_ver}"
                        # create full spec based on user provided spec or default spec
                        spec_opt = f"{mpi_impl}{version_opt} {user_mpi_xspec or default_spec}"
                        if user_mpi_gpu and user_mpi_gpu not in spec_opt:
                            spec_opt = f"{spec_opt} +{user_mpi_gpu}"
                        deps_opt = user_mpi_deps or default_deps
                        for dep in deps_opt:
                            spec_opt = f"{spec_opt} ^{dep}"
                        self._logger.debug(f"Final MPI selection: spec={spec_opt}")

                        recipe.environments[name]["specs"].append(spec_opt)
                    else:
                        # TODO: Create a custom exception type
                        raise Exception(f"Unsupported mpi: {mpi_impl}")

        # Add custom spack package recipes, configured via Spack repos.
        # Step 1: copy Spack repos to store_path where they will be used to
        #         build the stack, and then be part of the upstream provided
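For reference, the MPI validation loop added above reads an `mpi_supported` list plus per-implementation `version`, `spec` and `depends` entries from the cluster's `network.yaml`. A minimal sketch of such a file, with purely illustrative values (the real defaults live in the cluster configuration; the openmpi variant string is borrowed from the defaults previously hard-coded in `recipe.py`):
```yaml
# hypothetical network.yaml: only the keys consulted by the builder code above
mpi_supported: [cray-mpich, openmpi]
cray-mpich:
  version: 8.1.23   # illustrative
  depends: [libfabric@1.15.2]
openmpi:
  version: 5.0.6    # illustrative; matches the documented default
  spec: "+internal-pmix +legacylaunchers +orterunprefix fabrics=cma,ofi,xpmem schedulers=slurm"
  depends: [libfabric@1.15.2]
```
With this file, an environment requesting `spec: openmpi`, `gpu: cuda` and `depends: [libfabric@main]` would have a spec of roughly `openmpi@5.0.6 +internal-pmix +legacylaunchers +orterunprefix fabrics=cma,ofi,xpmem schedulers=slurm +cuda ^libfabric@main` appended to its spec list.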
47 changes: 7 additions & 40 deletions stackinator/recipe.py
@@ -8,15 +8,6 @@


class Recipe:
    valid_mpi_specs = {
        "cray-mpich": (None, None),
        "mpich": ("4.1", "device=ch4 netmod=ofi +slurm"),
        "mvapich2": (
            "3.0a",
            "+xpmem fabrics=ch4ofi ch4_max_vcis=4 process_managers=slurm",
        ),
        "openmpi": ("5", "+internal-pmix +legacylaunchers +orterunprefix fabrics=cma,ofi,xpmem schedulers=slurm"),
    }

    @property
    def path(self):
@@ -280,43 +271,19 @@ def generate_environment_specs(self, raw):
        for _, config in environments.items():
            config["exclude_from_cache"] = ["cuda", "nvhpc", "perl"]

        # check the environment descriptions and ammend where features are missing
        # check the environment descriptions and amend where features are missing
        for name, config in environments.items():
            if ("specs" not in config) or (config["specs"] is None):
                environments[name]["specs"] = []

            if "mpi" not in config:
                environments[name]["mpi"] = {"spec": None, "gpu": None}
                environments[name]["mpi"] = {"spec": None, "gpu": None, "network": {"spec": None}}

        # complete configuration of MPI in each environment
        for name, config in environments.items():
            if config["mpi"]:
                mpi = config["mpi"]
                mpi_spec = mpi["spec"]
                mpi_gpu = mpi["gpu"]
                if mpi_spec:
                    try:
                        mpi_impl, mpi_ver = mpi_spec.strip().split(sep="@", maxsplit=1)
                    except ValueError:
                        mpi_impl = mpi_spec.strip()
                        mpi_ver = None

                    if mpi_impl in Recipe.valid_mpi_specs:
                        default_ver, options = Recipe.valid_mpi_specs[mpi_impl]
                        if mpi_ver:
                            version_opt = f"@{mpi_ver}"
                        else:
                            version_opt = f"@{default_ver}" if default_ver else ""

                        spec = f"{mpi_impl}{version_opt} {options or ''}".strip()

                        if mpi_gpu:
                            spec = f"{spec} +{mpi_gpu}"

                        environments[name]["specs"].append(spec)
                    else:
                        # TODO: Create a custom exception type
                        raise Exception(f"Unsupported mpi: {mpi_impl}")
            if "network" not in config["mpi"]:
                environments[name]["mpi"]["network"] = {"spec": ""}

        # we have not loaded the system configs yet, so mpi information will be generated
        # during the builder phase. We will validate the mpi information then.

        # set constraints that ensure the the main compiler is always used to build packages
        # that do not explicitly request a compiler.
13 changes: 9 additions & 4 deletions stackinator/schema/environments.json
@@ -47,8 +47,14 @@
"gpu": {
"enum": ["cuda", "rocm", null, false],
"default": null
}
}
},
"xspec": {"type": "string"},
"depends": {
"type": "array",
"items": {"type": "string"},
"default": []
}
}
},
{"enum": [null, false]}
],
Expand Down Expand Up @@ -87,7 +93,7 @@
    },
    "uenv": {
      "type": "object",
      "additionalPropertis": false,
      "additionalProperties": false,
      "properties": {
        "add_compilers": { "type": "boolean" },
        "prefix_paths": {
@@ -109,4 +115,3 @@
}
}
}

2 changes: 1 addition & 1 deletion unittests/test_schema.py
@@ -133,7 +133,7 @@ def test_environments_yaml(yaml_path):
    # test defaults were set correctly
    assert env["unify"] == "when_possible"
    assert env["packages"] == ["perl", "git"]
    assert env["mpi"] == {"spec": "cray-mpich", "gpu": "cuda"}
    assert env["mpi"] == {"depends": [], "spec": "cray-mpich", "gpu": "cuda"}
    assert env["variants"] == ["+mpi", "+cuda"]
    assert env["views"] == {"default": None}
