[Woptim] Shared CI 2023.12.0 and rocm 5.7.1 #864

Merged
merged 36 commits into from
Feb 2, 2024
36 commits
9524c8d
Update RADIUSS Shared CI: add poodle machine
adrienbernede Oct 30, 2023
35e079a
Increase allocated time on poodle
adrienbernede Oct 30, 2023
8d17ee2
Comment alloc command choices
adrienbernede Nov 13, 2023
ce3cbb7
Fine tune allocation duration in CI
adrienbernede Nov 13, 2023
50397d9
Fine tune allocation duration in CI
adrienbernede Nov 22, 2023
6538c86
Fix RSC version: do not use pushed packages here
adrienbernede Nov 23, 2023
254abad
Update to RADIUSS Shared CI 2023.12.0
adrienbernede Dec 8, 2023
29c41c3
Update RSC with rocm 5.7.1
adrienbernede Dec 18, 2023
ff73645
From RSC : add missing rocm compilers
adrienbernede Dec 18, 2023
3eb60db
From RSCI: Fix reproducer syntax
adrienbernede Dec 19, 2023
9e3bde4
From RSC: switch to rocm 5.7.0 on tioga
adrienbernede Jan 9, 2024
7919803
From RSC: Add and switch to rocm@6.0.0 + add and switch to cce@16.0.1…
adrienbernede Jan 9, 2024
47186a8
From RSC: Include cce@16.0.1 in test
adrienbernede Jan 10, 2024
5c3af06
Downgrade rocmcc in CI on corona (no 5.7.1 on corona)
adrienbernede Jan 10, 2024
9aead0a
From RSC: Attempt to use Spack syntax to compare version
adrienbernede Jan 10, 2024
3b76587
From RSC: revert to old school way
adrienbernede Jan 11, 2024
ab41d9c
Remove time restrictions at job level, shared allocation should suffice
adrienbernede Jan 11, 2024
4ea5d07
Update to 5.7.0 on corona only, stick with 5.6.0 on tioga
adrienbernede Jan 11, 2024
d30f410
Fix: rocm 5.6.0 -> 5.6.1
adrienbernede Jan 11, 2024
05c9aa1
Back to rocm 5.7.1 on tioga
adrienbernede Jan 12, 2024
cb747e0
From RSC: Remove amdgpu arch hicc flags
adrienbernede Jan 12, 2024
20c5097
Merge branch 'develop' into woptim/rocm-5-7-1
adrienbernede Jan 19, 2024
3319cb5
Update RSC to main
adrienbernede Jan 19, 2024
c9f5ae5
Update Spack to 0.21.1
adrienbernede Jan 19, 2024
20debc1
From RSC: use c++ 17 to build fmt
adrienbernede Jan 22, 2024
eed0cb8
From RSC: Restrict c++ 17 requirement to intel
adrienbernede Jan 22, 2024
053d968
From RSC: 3 way amd gpu arch
adrienbernede Jan 22, 2024
3a634dc
Use CI queue on Lassen
adrienbernede Jan 22, 2024
d8f7a39
Apply CI queue to custom allocs in Umpire jobs on lassen
adrienbernede Jan 22, 2024
976c4a0
Update to RSC@main
adrienbernede Jan 22, 2024
15a0758
Fix RSC: Umpire conditionally depends on fmt
adrienbernede Jan 22, 2024
442a04c
From RSC: Remove rocm 6.0.0 suite to prevent Spack from mixing it wit…
adrienbernede Jan 23, 2024
343d80e
Update Spack to develop-2024-01-21 to support hip 5.7.1 suite
adrienbernede Jan 24, 2024
183732c
Update RSC with updates in Spack config.yaml file
adrienbernede Jan 30, 2024
85582eb
Update RSC to main
adrienbernede Jan 30, 2024
2c3e830
Merge branch 'develop' into woptim/rocm-5-7-1
adrienbernede Jan 31, 2024
4 changes: 2 additions & 2 deletions .gitlab-ci.yml
@@ -71,7 +71,7 @@ stages:
include:
- local: '.gitlab/custom-jobs-and-variables.yml'
- project: 'radiuss/radiuss-shared-ci'
ref: 'v2023.09.0'
ref: 'v2023.12.1'
file: 'pipelines/${CI_MACHINE}.yml'
- artifact: '${CI_MACHINE}-jobs.yml'
job: 'generate-job-lists'
@@ -82,7 +82,7 @@ stages:
include:
# [Optional] checks preliminary to running the actual CI test
#- project: 'radiuss/radiuss-shared-ci'
# ref: 'v2023.09.0'
# ref: 'v2023.12.1'
# file: 'preliminary-ignore-draft-pr.yml'
# pipelines subscribed by the project
- local: '.gitlab/subscribed-pipelines.yml'
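
Both the active include and the commented-out preliminary check move from `v2023.09.0` to `v2023.12.1` in this file. A minimal sketch of a check that every `ref:` in the file points at the same tag is shown below. It is a hypothetical helper, not part of this PR; the sample file is an abbreviated copy of the include section, and its indentation is reconstructed because the diff view above lost it.

```shell
#!/usr/bin/env bash
# Hypothetical check (not part of the PR): all radiuss-shared-ci refs should match.

# Abbreviated stand-in for .gitlab-ci.yml; indentation is an assumption.
ci_file="$(mktemp)"
cat > "${ci_file}" <<'EOF'
include:
  - project: 'radiuss/radiuss-shared-ci'
    ref: 'v2023.12.1'
    file: 'pipelines/${CI_MACHINE}.yml'
  #- project: 'radiuss/radiuss-shared-ci'
  #  ref: 'v2023.12.1'
  #  file: 'preliminary-ignore-draft-pr.yml'
EOF

# Collect every quoted ref value, commented out or not, and de-duplicate.
refs="$(sed -n "s/.*ref: '\([^']*\)'.*/\1/p" "${ci_file}" | sort -u)"
ref_count="$(printf '%s\n' "${refs}" | wc -l | tr -d ' ')"

if [ "${ref_count}" = "1" ]; then
  echo "all refs pinned to ${refs}"
else
  echo "inconsistent refs:"
  printf '%s\n' "${refs}"
fi
```

Including the commented-out block in the check is a design choice: it keeps the optional include ready to enable without a second version bump.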
25 changes: 18 additions & 7 deletions .gitlab/custom-jobs-and-variables.yml
@@ -14,29 +14,40 @@ variables:

# Ruby
# Arguments for top level allocation
RUBY_SHARED_ALLOC: "--exclusive --reservation=ci --qos=ci_ruby --time=10 --nodes=1"
RUBY_SHARED_ALLOC: "--exclusive --reservation=ci --time=10 --nodes=2"
# Arguments for job level allocation
RUBY_JOB_ALLOC: "--overlap --reservation=ci --qos=ci_ruby --time=10 --nodes=1"
# Note: We repeat the reservation, necessary when jobs are manually re-triggered.
RUBY_JOB_ALLOC: "--overlap --reservation=ci --nodes=1"
# Project specific variants for ruby
PROJECT_RUBY_VARIANTS: "~shared +fortran +tools tests=basic "
# Project specific deps for ruby
PROJECT_RUBY_DEPS: ""

# Poodle
# Arguments for top level allocation
POODLE_SHARED_ALLOC: "--exclusive --partition=pdebug --time=8 --nodes=1"
# Arguments for job level allocation
POODLE_JOB_ALLOC: "--overlap --nodes=1"
# Project specific variants for poodle
PROJECT_POODLE_VARIANTS: "~shared +fortran +tools tests=basic"
# Project specific deps for poodle
PROJECT_POODLE_DEPS: ""

# Corona
# Arguments for top level allocation
CORONA_SHARED_ALLOC: "--exclusive --time-limit=15m --nodes=1"
CORONA_SHARED_ALLOC: "--exclusive --time-limit=10m --nodes=1"
# Arguments for job level allocation
CORONA_JOB_ALLOC: "--time-limit=10m --nodes=1 --begin-time=+5s"
CORONA_JOB_ALLOC: "--nodes=1 --begin-time=+5s"
# Project specific variants for corona
PROJECT_CORONA_VARIANTS: "~shared +fortran +device_alloc tests=basic "
# Project specific deps for corona
PROJECT_CORONA_DEPS: ""

# Tioga
# Arguments for top level allocation
TIOGA_SHARED_ALLOC: "--exclusive --time-limit=20m --nodes=1"
TIOGA_SHARED_ALLOC: "--exclusive --time-limit=15m --nodes=1"
# Arguments for job level allocation
TIOGA_JOB_ALLOC: "--time-limit=15m --nodes=1 --begin-time=+5s"
TIOGA_JOB_ALLOC: "--nodes=1 --begin-time=+5s"
# Project specific variants for tioga
PROJECT_TIOGA_VARIANTS: "~shared +fortran +device_alloc tests=basic "
# Project specific deps for tioga
@@ -45,7 +56,7 @@ variables:
# Lassen and Butte use a different job scheduler (spectrum lsf) that does not
# allow pre-allocation the same way slurm does.
# Arguments for job level allocation
LASSEN_JOB_ALLOC: "1 -W 18"
LASSEN_JOB_ALLOC: "1 -W 10 -q pci"
# Project specific variants for lassen
PROJECT_LASSEN_VARIANTS: "~shared +fortran +tools tests=basic "
# Project specific deps for lassen
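
The Ruby hunk drops `--time` (and `--qos`) from the job-level arguments, which matches the commit "Remove time restrictions at job level, shared allocation should suffice". The dry run below sketches how the two variables could end up on Slurm command lines. The use of `salloc` for the shared allocation and `srun` for each job step is an assumption about the shared CI templates, which this diff does not show, and `<build-and-test>` is a placeholder.

```shell
#!/usr/bin/env bash
# Dry run only: prints command lines, submits nothing to a scheduler.

# Values copied from the new side of the diff.
RUBY_SHARED_ALLOC="--exclusive --reservation=ci --time=10 --nodes=2"
RUBY_JOB_ALLOC="--overlap --reservation=ci --nodes=1"

# Assumption: one shared allocation per pipeline, requested with salloc.
shared_cmd="salloc ${RUBY_SHARED_ALLOC}"

# Assumption: each CI job runs as a step inside that allocation via srun.
# The step carries no --time of its own, so only the shared limit applies.
job_cmd="srun ${RUBY_JOB_ALLOC} <build-and-test>"

echo "${shared_cmd}"
echo "${job_cmd}"
```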
11 changes: 9 additions & 2 deletions .gitlab/jobs/corona.yml
@@ -5,6 +5,13 @@
# SPDX-License-Identifier: (MIT)
###############################################################################

# Override reproducer section to define Umpire specific variables.
.corona_reproducer_vars:
script:
- |
echo -e "export MODULE_LIST=\"${MODULE_LIST}\""
echo -e "export SPEC=\"${SPEC//\"/\\\"}\""

########################
# Overridden shared jobs
########################
@@ -25,8 +32,8 @@
# This job intentionally tests our umpire package.py because although this job does not
# explicitly have the ~tools, the package.py should still disable tools from being built.
###
rocmcc_5_6_1_hip_openmp_device_alloc:
rocmcc_5_7_0_hip_openmp_device_alloc:
variables:
SPEC: "~shared +fortran +openmp +rocm +device_alloc tests=basic amdgpu_target=gfx906 %rocmcc@5.6.1 ^hip@5.6.1"
SPEC: "~shared +fortran +openmp +rocm +device_alloc tests=basic amdgpu_target=gfx906 %rocmcc@5.7.0 ^hip@5.7.0"
extends: .job_on_corona
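
The `.corona_reproducer_vars` override above prints `export` lines and uses the bash expansion `${SPEC//\"/\\\"}` to put a backslash in front of every double quote in `SPEC`. The sketch below wraps the same two `echo` lines in a hypothetical function, `print_reproducer_vars`, and shows that evaluating the printed lines restores the original value. The spec containing `cxxflags="-O2 -g"` is illustrative and does not appear in this PR.

```shell
#!/usr/bin/env bash
# Sketch of the reproducer_vars echo lines; the function name is hypothetical.

print_reproducer_vars() {
  local MODULE_LIST="$1"
  local SPEC="$2"
  echo -e "export MODULE_LIST=\"${MODULE_LIST}\""
  # ${SPEC//\"/\\\"} replaces each double quote in SPEC with backslash + quote,
  # so the printed line stays a valid shell assignment.
  echo -e "export SPEC=\"${SPEC//\"/\\\"}\""
}

# Illustrative spec with embedded double quotes (not taken from the diff).
original_spec='+cuda cxxflags="-O2 -g" %gcc@10.3.1'

# Show the lines a user would paste into a shell to reproduce the job.
print_reproducer_vars "cuda/11.2.0" "${original_spec}"

# Round trip: evaluating those lines restores the original values.
eval "$(print_reproducer_vars "cuda/11.2.0" "${original_spec}")"
```

Without the escaping, the embedded quotes would end the assignment early and the reproduced `SPEC` would lose them.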

16 changes: 16 additions & 0 deletions .gitlab/jobs/lassen.yml
@@ -5,13 +5,28 @@
# SPDX-License-Identifier: (MIT)
###############################################################################

# Override reproducer section to define Umpire specific variables.
.lassen_reproducer_vars:
script:
- |
echo -e "export MODULE_LIST=\"${MODULE_LIST}\""
echo -e "export SPEC=\"${SPEC//\"/\\\"}\""

########################
# Overridden shared jobs
########################
# We duplicate the shared jobs description and add necessary changes for RAJA.
# We keep ${PROJECT_<MACHINE>_VARIANTS} and ${PROJECT_<MACHINE>_DEPS} So that
# the comparison with the original job is easier.

# Overriden to increase allocation
xl_2022_08_19_gcc_8_3_1_cuda_11_2_0:
variables:
SPEC: "${PROJECT_LASSEN_VARIANTS} +cuda %xl@16.1.1.12.gcc.8.3.1 ^cuda@11.2.0+allow-unsupported-compilers ${PROJECT_LASSEN_DEPS}"
MODULE_LIST: "cuda/11.2.0"
LASSEN_JOB_ALLOC: "1 -W 20 -q pci"
extends: .job_on_lassen


############
# Extra jobs
@@ -92,4 +107,5 @@ xl_2022_08_19_gcc_8_3_1_cuda_11_2_tpls:
variables:
SPEC: "~shared +fortran +cuda +tools tests=basic %xl@16.1.1.12.gcc.8.3.1 ^cuda@11.7.0+allow-unsupported-compilers"
MODULE_LIST: "cuda/11.7.0"
LASSEN_JOB_ALLOC: "1 -W 20 -q pci"
extends: .job_on_lassen
60 changes: 60 additions & 0 deletions .gitlab/jobs/poodle.yml
@@ -0,0 +1,60 @@
###############################################################################
# Copyright (c) 2022-23, Lawrence Livermore National Security, LLC and RADIUSS
# project contributors. See the COPYRIGHT file for details.
#
# SPDX-License-Identifier: (MIT)
###############################################################################

# Override reproducer section to define Umpire specific variables.
.poodle_reproducer_vars:
script:
- |
echo -e "export MODULE_LIST=\"${MODULE_LIST}\""
echo -e "export SPEC=\"${SPEC//\"/\\\"}\""

########################
# Overridden shared jobs
########################
# We duplicate the shared jobs description and add necessary changes for RAJA.
# We keep ${PROJECT_<MACHINE>_VARIANTS} and ${PROJECT_<MACHINE>_DEPS} So that
# the comparison with the original job is easier.

# Allow failure due to compiler internal error building wrapfumpire.f
intel_2022_1_0:
variables:
SPEC: "${PROJECT_RUBY_VARIANTS} %intel@2022.1.0 ${PROJECT_RUBY_DEPS}"
extends: .job_on_poodle
allow_failure: true

############
# Extra jobs
############
# We do not recommend using ${PROJECT_<MACHINE>_VARIANTS} and
# ${PROJECT_<MACHINE>_DEPS} in the extra jobs. There is not reason not to fully
# describe the spec here.

gcc_10_3_1_numa:
variables:
SPEC: "~shared +fortran +numa +tools tests=basic %gcc@10.3.1"
extends: .job_on_poodle

clang_14_0_6_gcc_10_3_1_sqlite_experimental:
variables:
SPEC: "~shared +sqlite_experimental +tools tests=basic %clang@14.0.6.gcc.10.3.1"
extends: .job_on_poodle

# Develop builds against specific tpl version.
clang_14_0_6_gcc_10_3_1_tpls:
variables:
SPEC: "~shared +fortran +tools tests=basic %clang@14.0.6.gcc.10.3.1"
extends: .job_on_poodle

gcc_10_3_1_tpls:
variables:
SPEC: "~shared +fortran +tools tests=basic %gcc@10.3.1"
extends: .job_on_poodle

gcc_10_3_1_ipc_no_mpi:
variables:
SPEC: "~shared +ipc_shmem tests=basic %gcc@10.3.1"
extends: .job_on_poodle
19 changes: 9 additions & 10 deletions .gitlab/jobs/ruby.yml
@@ -1,10 +1,17 @@
###############################################################################
# Copyright (c) 2022, Lawrence Livermore National Security, LLC and RADIUSS
# Copyright (c) 2022-23, Lawrence Livermore National Security, LLC and RADIUSS
# project contributors. See the COPYRIGHT file for details.
#
# SPDX-License-Identifier: (MIT)
###############################################################################

# Override reproducer section to define UMPIRE specific variables.
.ruby_reproducer_vars:
script:
- |
echo -e "export MODULE_LIST=\"${MODULE_LIST}\""
echo -e "export SPEC=\"${SPEC//\"/\\\"}\""

########################
# Overridden shared jobs
########################
@@ -15,7 +22,7 @@
# Allow failure due to compiler internal error building wrapfumpire.f
intel_2022_1_0:
variables:
SPEC: "~shared +fortran +tools tests=basic %intel@2022.1.0"
SPEC: "${PROJECT_RUBY_VARIANTS} %intel@2022.1.0 ${PROJECT_RUBY_DEPS}"
extends: .job_on_ruby
allow_failure: true

@@ -51,11 +58,3 @@ gcc_10_3_1_ipc_no_mpi:
variables:
SPEC: "~shared +ipc_shmem tests=basic %gcc@10.3.1"
extends: .job_on_ruby

# Oneapi is not available on ruby@toss4 (rhel8)
## We deactivate this job as it is known to fail with Umpire: needs gcc toolchain.
#intel_2022_1_0:
# variables:
# ON_RUBY: "OFF"
# SPEC: "${PROJECT_RUBY_VARIANTS} %intel@2022.1.0 ${PROJECT_RUBY_DEPS}"
# extends: .job_on_ruby
11 changes: 9 additions & 2 deletions .gitlab/jobs/tioga.yml
@@ -5,6 +5,13 @@
# SPDX-License-Identifier: (MIT)
###############################################################################

# Override reproducer section to define Umpire specific variables.
.tioga_reproducer_vars:
script:
- |
echo -e "export MODULE_LIST=\"${MODULE_LIST}\""
echo -e "export SPEC=\"${SPEC//\"/\\\"}\""

########################
# Overridden shared jobs
########################
@@ -29,8 +36,8 @@ cce_16_0_1:
# This job intentionally tests our umpire package.py because although this job does not
# explicitly have the ~tools, the package.py should still disable tools from being built.
###
rocmcc_5_6_1_hip_openmp_device_alloc:
rocmcc_5_7_1_hip_openmp_device_alloc:
variables:
SPEC: "~shared +fortran +openmp +rocm +device_alloc tests=basic amdgpu_target=gfx90a %rocmcc@5.6.1 ^hip@5.6.1"
SPEC: "~shared +fortran +openmp +rocm +device_alloc tests=basic amdgpu_target=gfx90a %rocmcc@5.7.1 ^hip@5.7.1"
extends: .job_on_tioga
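
In both ROCm jobs touched by this PR, the job name, the `%rocmcc@` compiler version, and the `^hip@` version move together (5.7.0 on corona, 5.7.1 on tioga). The sketch below is a hypothetical consistency check, not part of the PR: it pulls both versions out of a spec string with plain shell string handling (it is not Spack's spec parser) and derives the job-name prefix used above.

```shell
#!/usr/bin/env bash
# Hypothetical consistency check; helper names are illustrative.

# Print the version following a prefix such as "%rocmcc@" or "^hip@".
spec_version() {
  local spec="$1" prefix="$2" token
  local -a tokens
  read -r -a tokens <<< "${spec}"
  for token in "${tokens[@]}"; do
    case "${token}" in
      "${prefix}"*) echo "${token#"${prefix}"}"; return 0 ;;
    esac
  done
  return 1
}

# Turn a dotted version into the job-name prefix, e.g. 5.7.1 -> rocmcc_5_7_1.
job_prefix() {
  echo "rocmcc_${1//./_}"
}

# Spec copied from the new tioga job in the diff.
SPEC="~shared +fortran +openmp +rocm +device_alloc tests=basic amdgpu_target=gfx90a %rocmcc@5.7.1 ^hip@5.7.1"

compiler_version="$(spec_version "${SPEC}" "%rocmcc@")"
hip_version="$(spec_version "${SPEC}" "^hip@")"

if [ "${compiler_version}" = "${hip_version}" ]; then
  echo "ok: $(job_prefix "${compiler_version}")_hip_openmp_device_alloc"
else
  echo "mismatch: rocmcc ${compiler_version} vs hip ${hip_version}"
fi
```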

14 changes: 14 additions & 0 deletions .gitlab/subscribed-pipelines.yml
@@ -38,12 +38,14 @@ generate-job-lists:
LOCAL_JOBS_PATH: ".gitlab/jobs"
script:
- cat ${RADIUSS_JOBS_PATH}/ruby.yml ${LOCAL_JOBS_PATH}/ruby.yml > ruby-jobs.yml
- cat ${RADIUSS_JOBS_PATH}/poodle.yml ${LOCAL_JOBS_PATH}/poodle.yml > poodle-jobs.yml
- cat ${RADIUSS_JOBS_PATH}/lassen.yml ${LOCAL_JOBS_PATH}/lassen.yml > lassen-jobs.yml
- cat ${RADIUSS_JOBS_PATH}/corona.yml ${LOCAL_JOBS_PATH}/corona.yml > corona-jobs.yml
- cat ${RADIUSS_JOBS_PATH}/tioga.yml ${LOCAL_JOBS_PATH}/tioga.yml > tioga-jobs.yml
artifacts:
paths:
- ruby-jobs.yml
- poodle-jobs.yml
- lassen-jobs.yml
- corona-jobs.yml
- tioga-jobs.yml
@@ -60,6 +62,18 @@ ruby-build-and-test:
needs: [ruby-up-check, generate-job-lists]
extends: [.build-and-test]

# POODLE
poodle-up-check:
variables:
CI_MACHINE: "poodle"
extends: [.machine-check]

poodle-build-and-test:
variables:
CI_MACHINE: "poodle"
needs: [poodle-up-check, generate-job-lists]
extends: [.build-and-test]

# CORONA
corona-up-check:
variables:
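
The `generate-job-lists` script gains one `cat` line and one artifact entry for poodle, following the same pattern as the other machines: shared job file first, local job file second. The sketch below rewrites that pattern as a loop. The directories and file contents are placeholders created in a temporary directory so that it runs anywhere; only the `cat` ordering and the `<machine>-jobs.yml` naming come from the diff.

```shell
#!/usr/bin/env bash
# Sketch of the generate-job-lists concatenation, written as a loop.

workdir="$(mktemp -d)"
RADIUSS_JOBS_PATH="${workdir}/shared-jobs"  # placeholder for the shared job files
LOCAL_JOBS_PATH="${workdir}/local-jobs"     # placeholder for .gitlab/jobs
mkdir -p "${RADIUSS_JOBS_PATH}" "${LOCAL_JOBS_PATH}"

machines="ruby poodle lassen corona tioga"

# Create tiny placeholder job files for each machine.
for machine in ${machines}; do
  echo "# shared jobs for ${machine}" > "${RADIUSS_JOBS_PATH}/${machine}.yml"
  echo "# local jobs for ${machine}" > "${LOCAL_JOBS_PATH}/${machine}.yml"
done

# Same operation as the script lines in the diff: shared first, local second.
for machine in ${machines}; do
  cat "${RADIUSS_JOBS_PATH}/${machine}.yml" "${LOCAL_JOBS_PATH}/${machine}.yml" \
    > "${workdir}/${machine}-jobs.yml"
done

cat "${workdir}/poodle-jobs.yml"
```

Placing the local file last is presumably what lets a local job with the same name replace the shared one. That is an inference from the "Overridden shared jobs" comments in the job files, not something this diff states.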
2 changes: 1 addition & 1 deletion .uberenv_config.json
@@ -4,7 +4,7 @@
"package_final_phase" : "initconfig",
"package_source_dir" : "../..",
"spack_url": "https://github.com/spack/spack.git",
"spack_branch": "v0.20.1",
"spack_branch": "develop-2024-01-21",
"spack_activate" : {},
"spack_configs_path": "scripts/radiuss-spack-configs",
"spack_packages_path": "scripts/radiuss-spack-configs/packages",
2 changes: 1 addition & 1 deletion scripts/radiuss-spack-configs