@w1ldptr
Contributor

@w1ldptr w1ldptr commented May 30, 2025

Set the default GDS path to the CUDA install directory and point lib and include at the CUDA-provided symlinks instead of hardcoding architecture-specific values.
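
As a rough illustration of the symlink-based lookup, here is a minimal sketch. It assumes a conventional CUDA layout in which `include` and `lib64` under the install root are symlinks into the architecture-specific target directory. The helper name `gds_paths` and the `/usr/local/cuda` fallback are assumptions made for this example, not names taken from the PR.

```shell
#!/bin/sh
# Hypothetical helper (not from the PR): derive the GDS include and library
# directories from the CUDA install root, without naming any architecture.
gds_paths() {
    # Use the first argument, then $CUDA_HOME, then an assumed default root.
    cuda_home="${1:-${CUDA_HOME:-/usr/local/cuda}}"
    # 'include' and 'lib64' are the symlinks CUDA provides; they resolve to
    # the directories of whichever architecture is installed.
    printf '%s\n%s\n' "${cuda_home}/include" "${cuda_home}/lib64"
}

# Print both directories for an explicitly given install root.
gds_paths /usr/local/cuda
```

Because the architecture never appears in the path, the same value works unchanged in an x86_64 and an aarch64 image.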

Currently the build scripts and Docker build files assume that the container is built on an x86_64 Linux host: they do not specify a platform (so docker defaults to the host architecture), hardcode 'x86_64' in several places, and so on. This makes it impossible to build either an x86_64 or an aarch64 container on an Arm host, or an aarch64 container on an x86 host.

Configure the target architecture via the 'ARCH' docker variable. Set it to x86 by default in the docker files (for any users that use them directly) and to the host architecture in build-container.sh and nixlbench build.sh. Allow the user to specify the ARCH value via the '--arch' CLI parameter, set the docker build platform accordingly, and pass the value to docker build as a build arg.
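
The flow described above could look roughly like the following sketch. The function names (`arch_to_platform`, `parse_arch`, `print_build_cmd`) and the image tag `nixl-example` are hypothetical and chosen only for illustration; the actual scripts in the PR may be structured differently. The sketch prints the `docker build` command instead of running it, so it can be tried without a Docker daemon.

```shell
#!/bin/sh
# Hypothetical sketch (names are illustrative, not from the PR).

# Map an ARCH value to the matching docker --platform string.
arch_to_platform() {
    case "$1" in
        x86_64)  echo "linux/amd64" ;;
        aarch64) echo "linux/arm64" ;;
        *) echo "unsupported architecture: $1" >&2; return 1 ;;
    esac
}

# Default to the host architecture; '--arch VALUE' overrides it.
parse_arch() {
    arch="$(uname -m)"
    while [ $# -gt 0 ]; do
        case "$1" in
            --arch)
                [ $# -ge 2 ] || { echo "--arch needs a value" >&2; return 1; }
                arch="$2"; shift 2 ;;
            *) shift ;;
        esac
    done
    echo "$arch"
}

# Assemble the build command: the platform selects the target, and ARCH is
# forwarded as a build arg so the Dockerfile can use it.
print_build_cmd() {
    arch="$(parse_arch "$@")" || return 1
    platform="$(arch_to_platform "$arch")" || return 1
    printf '%s\n' "docker build --platform ${platform} --build-arg ARCH=${arch} -t nixl-example ."
}

# Example: request an aarch64 image regardless of the host architecture.
print_build_cmd --arch aarch64
```

Actually running a cross-architecture build additionally requires emulation or a suitable builder on the host, which this sketch does not set up.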

Fixes NIX-14

w1ldptr added 3 commits May 30, 2025 17:38
CUDA provides symlinks to correct GDS lib and include directories so there
is no need to hardcode full path that is architecture-dependent. Set
default GDS path to point to CUDA install directory and use 'include' and
'lib64' symlinks.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>

Currently the build script and Docker build file assume that the container
is built on x86_64 Linux host (by not specifying a platform which makes
docker default to host architecture, hardcoding 'x86_64' in several places,
etc.), which makes it impossible to create either x86_64 or aarch64
container on Arm host or aarch64 container on x86 host.

Configure target architecture via 'ARCH' docker variable. Set it to x86 by
default in Dockerfile (for any users that use it directly) and to host
architecture in build-container.sh. Allow user to specify ARCH value via
'--arch' CLI parameter, set docker build platform value accordingly and
pass the value to docker build as a build arg.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>

Similar to the nixl library container infrastructure, nixlbench also does
not specify a docker build platform, hardcodes 'x86_64' in several places,
and assumes an x86 manylinux platform, which makes it impossible to build
anything on an aarch64 host or for an aarch64 target.

Configure target architecture via 'ARCH' docker variable. Set it to x86 by
default in nixlbench Dockerfile (for any users that use it directly) and to
host architecture in build-container.sh. Allow user to specify ARCH value
via '--arch' CLI parameter, set docker build platform value accordingly and
pass the value to docker build as a build arg.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>

@github-actions

👋 Hi w1ldptr! Thank you for contributing to ai-dynamo/nixl.

Your PR reviewers will review your contribution then trigger the CI to test your changes.

🚀

@iyastreb
Contributor

A few hours ago I fixed the same issue: #407

@w1ldptr
Contributor Author

w1ldptr commented May 30, 2025

A few hours ago I fixed the same issue: #407

@iyastreb @brminich this also fixes the nixl container build in addition to nixlbench, allows container cross-builds (i.e. lets the user build x86 on Arm and vice versa), and properly solves the GDS path issue by leveraging symlinks instead of continuing to rely on arch-specific paths.

@iyastreb
Contributor

@iyastreb @brminich this also fixes the nixl container build in addition to nixlbench, allows container cross-builds (i.e. lets the user build x86 on Arm and vice versa), and properly solves the GDS path issue by leveraging symlinks instead of continuing to rely on arch-specific paths.

Sure, I like your approach. Let's sync up on Monday to remove the common part.
We can maybe pick up the IB-related changes from my PR and apply them here, or commit them as 2 independent PRs.

@brminich brminich merged commit 2a88836 into ai-dynamo:main Jun 2, 2025
9 checks passed
@w1ldptr w1ldptr deleted the gds-path-fix branch June 11, 2025 05:17

Development

Successfully merging this pull request may close these issues.

Build on arm looks up GDS headers in the x86_64 system path