{2023.06}[foss/2023a] UCC-CUDA 1.2.0 w/ CUDA 12.1.1 (rebuild) #750

Merged: 1 commit merged into EESSI:2023.06-software.eessi.io on Sep 26, 2024

eessi-bot bot commented Sep 25, 2024

Instance eessi-bot-mc-aws is configured to build for:

  • architectures: x86_64/generic, x86_64/intel/haswell, x86_64/intel/skylake_avx512, x86_64/amd/zen2, x86_64/amd/zen3, aarch64/generic, aarch64/neoverse_n1, aarch64/neoverse_v1
  • repositories: eessi.io-2023.06-compat, eessi-hpc.org-2023.06-software, eessi-hpc.org-2023.06-compat, eessi.io-2023.06-software

Instance boegel-bot-deucalion is configured to build for:

  • architectures: aarch64/a64fx
  • repositories: eessi.io-2023.06-software

eessi-bot bot commented Sep 25, 2024

Instance eessi-bot-mc-azure is configured to build for:

  • architectures: x86_64/amd/zen4
  • repositories: eessi-hpc.org-2023.06-software, eessi-hpc.org-2023.06-compat, eessi.io-2023.06-software, eessi.io-2023.06-compat

casparvl added the 2023.06-software.eessi.io and accel:nvidia labels Sep 25, 2024
casparvl changed the title from "UCC CUDA rebuild now that we have an accel prefix" to "{2023.06}[foss/2023a] UCC 1.2.0 w/ CUDA 12.1.1 (rebuild)" Sep 25, 2024
boegel (Contributor) commented Sep 25, 2024

bot: build repo:eessi.io-2023.06-software arch:x86_64/amd/zen2 accel:nvidia/cc80

eessi-bot bot commented Sep 25, 2024

Updates by the bot instance eessi-bot-mc-aws
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/amd/zen2 accel:nvidia/cc80 from boegel

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80
  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80 resulted in:

eessi-build-deploy-bot-deucalion bot commented Sep 25, 2024

Updates by the bot instance boegel-bot-deucalion
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/amd/zen2 accel:nvidia/cc80 from boegel

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80
  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80 resulted in:

    • no jobs were submitted

eessi-bot bot commented Sep 25, 2024

Updates by the bot instance eessi-bot-mc-azure
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/amd/zen2 accel:nvidia/cc80 from boegel

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80
  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80 resulted in:

    • no jobs were submitted
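Each bot instance echoes the command back in an "expanded format" in which the short keys `repo:`, `arch:` and `accel:` become `repository:`, `architecture:` and `accelerator:`. A minimal Bash sketch of that mapping, assuming only the three keys seen in this thread; `expand_bot_command` is a hypothetical name, and this is not the bot's own implementation.

```bash
#!/usr/bin/env bash
# Hypothetical helper, NOT the EESSI bot's implementation: rewrites the short
# argument keys of a build command into the long form the bot echoes back.
expand_bot_command() {
    local word
    local -a out=()
    for word in "$@"; do
        case "$word" in
            repo:*)  word="repository:${word#repo:}" ;;
            arch:*)  word="architecture:${word#arch:}" ;;
            accel:*) word="accelerator:${word#accel:}" ;;
        esac
        out+=("$word")
    done
    # "${out[*]}" joins the words with single spaces (default IFS)
    printf '%s\n' "${out[*]}"
}

expand_bot_command build repo:eessi.io-2023.06-software arch:x86_64/amd/zen2 accel:nvidia/cc80
# prints: build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80
```

Words without one of the known prefixes (such as `build`) pass through unchanged.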

eessi-bot bot commented Sep 25, 2024

New job on instance eessi-bot-mc-aws for CPU micro-architecture x86_64-amd-zen2 and accelerator nvidia/cc80 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.09/pr_750/19874

date | job status | comment
Sep 25 21:32:16 UTC 2024 | submitted | job id 19874 awaits release by job manager
Sep 25 21:32:34 UTC 2024 | released | job awaits launch by Slurm scheduler
Sep 25 21:37:59 UTC 2024 | running | job 19874 is running
Sep 25 21:57:34 UTC 2024 | finished | 😁 SUCCESS
  Details
  ✅ job output file slurm-19874.out
  ✅ no message matching ERROR:
  ✅ no message matching FAILED:
  ✅ no message matching required modules missing:
  ✅ found message(s) matching No missing installations
  ✅ found message matching .tar.gz created!
  Artefacts
  eessi-2023.06-software-linux-x86_64-amd-zen2-1727300676.tar.gz size: 0 MiB (447423 bytes)
  entries: 29
  modules under 2023.06/software/linux/x86_64/amd/zen2/accel/nvidia/cc80/modules/all
  UCC-CUDA/1.2.0-GCCcore-12.3.0-CUDA-12.1.1.lua
  software under 2023.06/software/linux/x86_64/amd/zen2/accel/nvidia/cc80/software
  UCC-CUDA/1.2.0-GCCcore-12.3.0-CUDA-12.1.1
  other under 2023.06/software/linux/x86_64/amd/zen2/accel/nvidia/cc80
  no other files in tarball
Sep 25 21:57:34 UTC 2024 | test result | 😁 SUCCESS
  ReFrame Summary
  [ PASSED ] Ran 9/9 test case(s) from 9 check(s) (0 failure(s), 0 skipped, 0 aborted)
  Details
  ✅ job output file slurm-19874.out
  ✅ no message matching ERROR:
  ✅ no message matching [\s*FAILED\s*].*Ran .* test case
Sep 26 09:14:59 UTC 2024 | uploaded | transfer of eessi-2023.06-software-linux-x86_64-amd-zen2-1727300676.tar.gz to S3 bucket succeeded
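The "Details" checklist of the build job boils down to searching the job output file for messages that must be absent (`ERROR:`, `FAILED:`, `required modules missing:`) and messages that must be present (`No missing installations`, `.tar.gz created!`). A rough Bash sketch that repeats those checks by hand, assuming plain fixed-string matching; the bot's real patterns may be regular expressions configured elsewhere, and `check_job_output` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical sketch, NOT the bot's own code: repeats the checks listed under
# "Details" against a job output file such as slurm-19874.out.
check_job_output() {
    local log=$1 pattern status=0
    if [ ! -f "$log" ]; then
        echo "job output file $log not found"
        return 1
    fi
    # Messages that must NOT appear (fixed-string match)
    for pattern in 'ERROR:' 'FAILED:' 'required modules missing:'; do
        if grep -qF -- "$pattern" "$log"; then
            echo "found message matching $pattern"
            status=1
        fi
    done
    # Messages that MUST appear
    for pattern in 'No missing installations' '.tar.gz created!'; do
        if ! grep -qF -- "$pattern" "$log"; then
            echo "no message matching $pattern"
            status=1
        fi
    done
    return "$status"
}

# Usage (assumes the file is in the current directory):
# check_job_output slurm-19874.out && echo "all checks passed"
```

Using `grep -F` keeps the `.` in `.tar.gz` literal instead of matching any character.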

casparvl (Collaborator, Author) commented:

bot: build repo:eessi.io-2023.06-software arch:x86_64/amd/zen3 accel:nvidia/cc80

eessi-bot bot commented Sep 26, 2024

Updates by the bot instance eessi-bot-mc-aws
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/amd/zen3 accel:nvidia/cc80 from casparvl

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 accelerator:nvidia/cc80
  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 accelerator:nvidia/cc80 resulted in:

Updates by the bot instance boegel-bot-deucalion
  • account casparvl has NO permission to send commands to the bot

eessi-bot bot commented Sep 26, 2024

Updates by the bot instance eessi-bot-mc-azure
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/amd/zen3 accel:nvidia/cc80 from casparvl

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 accelerator:nvidia/cc80
  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 accelerator:nvidia/cc80 resulted in:

    • no jobs were submitted

eessi-bot bot commented Sep 26, 2024

New job on instance eessi-bot-mc-aws for CPU micro-architecture x86_64-amd-zen3 and accelerator nvidia/cc80 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.09/pr_750/20002

date | job status | comment
Sep 26 08:06:49 UTC 2024 | submitted | job id 20002 awaits release by job manager
Sep 26 08:07:53 UTC 2024 | released | job awaits launch by Slurm scheduler
Sep 26 08:13:34 UTC 2024 | running | job 20002 is running
Sep 26 08:30:04 UTC 2024 | finished | 😁 SUCCESS
  Details
  ✅ job output file slurm-20002.out
  ✅ no message matching ERROR:
  ✅ no message matching FAILED:
  ✅ no message matching required modules missing:
  ✅ found message(s) matching No missing installations
  ✅ found message matching .tar.gz created!
  Artefacts
  eessi-2023.06-software-linux-x86_64-amd-zen3-1727338704.tar.gz size: 0 MiB (447402 bytes)
  entries: 29
  modules under 2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80/modules/all
  UCC-CUDA/1.2.0-GCCcore-12.3.0-CUDA-12.1.1.lua
  software under 2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80/software
  UCC-CUDA/1.2.0-GCCcore-12.3.0-CUDA-12.1.1
  other under 2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80
  no other files in tarball
Sep 26 08:30:04 UTC 2024 | test result | 😁 SUCCESS
  ReFrame Summary
  [ PASSED ] Ran 9/9 test case(s) from 9 check(s) (0 failure(s), 0 skipped, 0 aborted)
  Details
  ✅ job output file slurm-20002.out
  ✅ no message matching ERROR:
  ✅ no message matching [\s*FAILED\s*].*Ran .* test case
Sep 26 09:15:19 UTC 2024 | uploaded | transfer of eessi-2023.06-software-linux-x86_64-amd-zen3-1727338704.tar.gz to S3 bucket succeeded
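The "Artefacts" section summarises what ended up in each tarball. A similar summary can be produced locally with `tar -tzf`, which lists archive members without extracting them. This is a sketch under the assumption that "entries" means the number of members `tar -tzf` lists (directories included); `summarize_tarball` is a hypothetical name and the bot may count differently.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: summarises a build tarball in the spirit of the
# "Artefacts" section (member count plus the module files it contains).
summarize_tarball() {
    local tarball=$1 listing
    # List archive members without extracting them
    listing=$(tar -tzf "$tarball") || return 1
    # grep -c '' counts lines without the padding some wc implementations add
    echo "entries: $(printf '%s\n' "$listing" | grep -c '')"
    echo "module files:"
    printf '%s\n' "$listing" | grep '/modules/all/.*\.lua$' || echo "  (none)"
}

# Usage:
# summarize_tarball eessi-2023.06-software-linux-x86_64-amd-zen3-1727338704.tar.gz
```

For this PR, the expected module file is `UCC-CUDA/1.2.0-GCCcore-12.3.0-CUDA-12.1.1.lua` under the `accel/nvidia/cc80/modules/all` prefix.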

casparvl (Collaborator, Author) commented:

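The installation in the zen2 tree looks good; all CUDA-related RPATH entries of libucc_mc_cuda.so point into the accel/nvidia/cc80 prefix: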

[casparvl@login1 1.2.0-GCCcore-12.3.0-CUDA-12.1.1]$ readelf -d lib64/ucc/libucc_mc_cuda.so | grep RPATH | grep CUDA
 0x000000000000000f (RPATH)              Library rpath: [/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen2/accel/nvidia/cc80/software/UCC-CUDA/1.2.0-GCCcore-12.3.0-CUDA-12.1.1/lib:/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen2/accel/nvidia/cc80/software/UCC-CUDA/1.2.0-GCCcore-12.3.0-CUDA-12.1.1/lib64:$ORIGIN:$ORIGIN/../lib:$ORIGIN/../lib64:/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen2/accel/nvidia/cc80/software/NCCL/2.18.3-GCCcore-12.3.0-CUDA-12.1.1/lib64:/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen2/software/UCC/1.2.0-GCCcore-12.3.0/lib64:/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen2/accel/nvidia/cc80/software/CUDA/12.1.1/lib64:/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen2/software/GCCcore/12.3.0/lib64:/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen2/software/GCCcore/12.3.0/lib:/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen2/software/numactl/2.0.16-GCCcore-12.3.0/lib:/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen2/software/pkgconf/1.9.5-GCCcore-12.3.0/lib64:/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen2/software/UCX/1.14.1-GCCcore-12.3.0/lib:/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen2/software/GDRCopy/2.3.1-GCCcore-12.3.0/lib/../lib64:/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen2/software/GCCcore/12.3.0/lib/gcc/x86_64-pc-linux-gnu/12.3.0:/cvmfs/software.eessi.io/versions/2023.06/compat/linux/x86_64/lib/../lib64:/cvmfs/software.eessi.io/versions/2023.06/compat/linux/x86_64/usr/lib/../lib64:/cvmfs/software.eessi.io/versions/2023.06/compat/linux/x86_64/lib:/cvmfs/software.eessi.io/versions/2023.06/compat/linux/x86_64/usr/lib:/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen2/software/GDRCopy/2.3.1-GCCcore-12.3.0/lib:/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen2/software/UCC/1.2.0-GCCcore-12.3.0/lib]
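The check above relies on reading the `grep CUDA` output by eye. A small Bash sketch that makes it explicit: it pulls the list out of the `Library rpath: [...]` line printed by `readelf -d`, splits it on `:`, and fails if any entry containing `CUDA` lies outside the accelerator-specific prefix. `check_cuda_rpath` is a hypothetical name; the sketch assumes the library carries an RPATH as shown here (a library with RUNPATH instead would print `Library runpath:` and need a small adjustment).

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of EESSI or binutils): reads `readelf -d`
# output on stdin and checks that every RPATH entry containing "CUDA"
# lies under the given accelerator-specific prefix.
check_cuda_rpath() {
    local accel_prefix=$1 rpath entry found=0
    local -a entries
    # Extract the colon-separated list between the brackets of "Library rpath: [...]"
    rpath=$(sed -n 's/.*Library rpath: \[\(.*\)]$/\1/p')
    IFS=':' read -r -a entries <<< "$rpath"
    for entry in "${entries[@]}"; do
        case "$entry" in
            *CUDA*)
                found=1
                if [[ "$entry" != "$accel_prefix"/* ]]; then
                    echo "CUDA entry outside $accel_prefix: $entry" >&2
                    return 1
                fi
                ;;
        esac
    done
    # Also fail if no CUDA-related entry was found at all
    [ "$found" -eq 1 ]
}

# Usage, from the installation directory as in the comment above:
# readelf -d lib64/ucc/libucc_mc_cuda.so | check_cuda_rpath \
#   /cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen2/accel/nvidia/cc80
```

With the zen2 output shown above, the CUDA-related entries (UCC-CUDA, NCCL and CUDA itself) all sit under `.../amd/zen2/accel/nvidia/cc80/software`, so the check would pass.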

boegel changed the title from "{2023.06}[foss/2023a] UCC 1.2.0 w/ CUDA 12.1.1 (rebuild)" to "{2023.06}[foss/2023a] UCC-CUDA 1.2.0 w/ CUDA 12.1.1 (rebuild)" Sep 26, 2024
casparvl (Collaborator, Author) commented:

Also looking good in zen3:

[casparvl@login1 1.2.0-GCCcore-12.3.0-CUDA-12.1.1]$ readelf -d lib64/ucc/libucc_mc_cuda.so | grep RPATH | grep CUDA
 0x000000000000000f (RPATH)              Library rpath: [/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80/software/UCC-CUDA/1.2.0-GCCcore-12.3.0-CUDA-12.1.1/lib:/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80/software/UCC-CUDA/1.2.0-GCCcore-12.3.0-CUDA-12.1.1/lib64:$ORIGIN:$ORIGIN/../lib:$ORIGIN/../lib64:/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80/software/NCCL/2.18.3-GCCcore-12.3.0-CUDA-12.1.1/lib64:/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen3/software/UCC/1.2.0-GCCcore-12.3.0/lib64:/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80/software/CUDA/12.1.1/lib64:/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen3/software/GCCcore/12.3.0/lib64:/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen3/software/GCCcore/12.3.0/lib:/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen3/software/numactl/2.0.16-GCCcore-12.3.0/lib:/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen3/software/pkgconf/1.9.5-GCCcore-12.3.0/lib64:/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen3/software/UCX/1.14.1-GCCcore-12.3.0/lib:/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen3/software/GDRCopy/2.3.1-GCCcore-12.3.0/lib/../lib64:/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen3/software/GCCcore/12.3.0/lib/gcc/x86_64-pc-linux-gnu/12.3.0:/cvmfs/software.eessi.io/versions/2023.06/compat/linux/x86_64/lib/../lib64:/cvmfs/software.eessi.io/versions/2023.06/compat/linux/x86_64/usr/lib/../lib64:/cvmfs/software.eessi.io/versions/2023.06/compat/linux/x86_64/lib:/cvmfs/software.eessi.io/versions/2023.06/compat/linux/x86_64/usr/lib:/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen3/software/GDRCopy/2.3.1-GCCcore-12.3.0/lib:/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen3/software/UCC/1.2.0-GCCcore-12.3.0/lib]

casparvl added the bot:deploy label (Ask bot to deploy missing software installations to EESSI) Sep 26, 2024

Label bot:deploy has been set by user casparvl, but this person does not have permission to trigger deployments

casparvl (Collaborator, Author) commented:

Just checked, it's been deployed, so this PR can be merged (by someone not me :D)

ocaisa merged commit 5d9d33b into EESSI:2023.06-software.eessi.io Sep 26, 2024 (35 checks passed)
eessi-bot bot commented Sep 26, 2024

PR merged! Moved ['/project/def-users/SHARED/jobs/2024.09/pr_750/19874', '/project/def-users/SHARED/jobs/2024.09/pr_750/20002'] to /project/def-users/SHARED/trash_bin/EESSI/software-layer/2024.09.26

PR merged! Moved [] to $HOME/trash_bin/EESSI/software-layer/2024.09.26

eessi-bot bot commented Sep 26, 2024

PR merged! Moved [] to /project/def-users/SHARED/trash_bin/EESSI/software-layer/2024.09.26

Labels: 2023.06-software.eessi.io (2023.06 version of software.eessi.io), accel:nvidia, bot:deploy (Ask bot to deploy missing software installations to EESSI)
3 participants