
Nightly Trilinos Cuda build errors - various perf_test/sparse/KokkosSparse_spiluk.cpp: error: identifier ... is undefined #1366

Closed
ndellingwood opened this issue Mar 22, 2022 · 7 comments


@ndellingwood
Contributor

Nightly Cuda builds of Trilinos are failing to compile the KokkosSparse_spiluk.cpp perf test with cuda/9.2.88 and cuda/10.1.105:

Error snip:

17:55:35 /home/jenkins/jenkins-new/workspace/KokkosKernels_Trilinos_TpetraCore_Sacado_KokkosDev_CUDA92_opt/Trilinos/kokkos-kernels/perf_test/sparse/KokkosSparse_spiluk.cpp(273): error: identifier "status" is undefined
17:55:35 
17:55:35 /home/jenkins/jenkins-new/workspace/KokkosKernels_Trilinos_TpetraCore_Sacado_KokkosDev_CUDA92_opt/Trilinos/kokkos-kernels/perf_test/sparse/KokkosSparse_spiluk.cpp(274): error: identifier "handle" is undefined
17:55:35 
17:55:35 /home/jenkins/jenkins-new/workspace/KokkosKernels_Trilinos_TpetraCore_Sacado_KokkosDev_CUDA92_opt/Trilinos/kokkos-kernels/perf_test/sparse/KokkosSparse_spiluk.cpp(274): error: identifier "descr" is undefined
17:55:35 
17:55:35 /home/jenkins/jenkins-new/workspace/KokkosKernels_Trilinos_TpetraCore_Sacado_KokkosDev_CUDA92_opt/Trilinos/kokkos-kernels/perf_test/sparse/KokkosSparse_spiluk.cpp(275): error: identifier "info" is undefined
17:55:35 
17:55:35 /home/jenkins/jenkins-new/workspace/KokkosKernels_Trilinos_TpetraCore_Sacado_KokkosDev_CUDA92_opt/Trilinos/kokkos-kernels/perf_test/sparse/KokkosSparse_spiluk.cpp(275): error: identifier "policy" is undefined
17:55:35 
17:55:35 /home/jenkins/jenkins-new/workspace/KokkosKernels_Trilinos_TpetraCore_Sacado_KokkosDev_CUDA92_opt/Trilinos/kokkos-kernels/perf_test/sparse/KokkosSparse_spiluk.cpp(275): error: identifier "pBuffer" is undefined
17:55:35 
17:55:35 /home/jenkins/jenkins-new/workspace/KokkosKernels_Trilinos_TpetraCore_Sacado_KokkosDev_CUDA92_opt/Trilinos/kokkos-kernels/perf_test/sparse/KokkosSparse_spiluk.cpp(274): error: argument of type "std::size_t *" is incompatible with parameter of type "const int *"
17:55:35 
17:55:35 /home/jenkins/jenkins-new/workspace/KokkosKernels_Trilinos_TpetraCore_Sacado_KokkosDev_CUDA92_opt/Trilinos/kokkos-kernels/perf_test/sparse/KokkosSparse_spiluk.cpp(282): error: identifier "structural_zero" is undefined
17:55:35 
17:55:35 /home/jenkins/jenkins-new/workspace/KokkosKernels_Trilinos_TpetraCore_Sacado_KokkosDev_CUDA92_opt/Trilinos/kokkos-kernels/perf_test/sparse/KokkosSparse_spiluk.cpp(290): error: argument of type "std::size_t *" is incompatible with parameter of type "const int *"
17:55:35 
17:55:35 /home/jenkins/jenkins-new/workspace/KokkosKernels_Trilinos_TpetraCore_Sacado_KokkosDev_CUDA92_opt/Trilinos/kokkos-kernels/perf_test/sparse/KokkosSparse_spiluk.cpp(298): error: identifier "numerical_zero" is undefined

@jgfouca could any of your changes in #1356 have affected this perf test?

Reproducer (kokkos-dev):

git clone -b kokkos-promotion https://github.com/trilinos/Trilinos.git
# Symbolic links to your kokkos and kokkos-kernels repos in the Trilinos source directory for source override
# (kokkos-kernels should be on an up-to-date develop branch)
cd Trilinos
export TRILINOS_DIR=${PWD}
ln -s <path-to-your-repo>/kokkos kokkos
ln -s <path-to-your-repo>/kokkos-kernels kokkos-kernels

cd $HOME
mkdir -p build
cd build

# Environment and configure
export ATDM_CONFIG_REGISTER_CUSTOM_CONFIG_DIR=${TRILINOS_DIR}/cmake/std/atdm/contributed/kokkos-dev
source ${TRILINOS_DIR}/cmake/std/atdm/load-env.sh kokkos-dev-cuda-opt
export OMPI_CXX=$KOKKOS_DIR/bin/nvcc_wrapper

cmake \
 -GNinja \
 -DTrilinos_CONFIGURE_OPTIONS_FILE:STRING=cmake/std/atdm/ATDMDevEnv.cmake \
 -DCMAKE_INSTALL_PREFIX="${PWD}/install" \
 -DCMAKE_CXX_STANDARD="14" \
 -DTrilinos_ENABLE_TESTS=OFF \
 -DTrilinos_ENABLE_ALL_PACKAGES=OFF \
 -DTrilinos_ENABLE_Kokkos=ON \
  -DKokkos_ARCH_KEPLER35=ON \
 -DTrilinos_ENABLE_KokkosKernels=ON \
  -DKokkosKernels_ENABLE_TESTS=ON \
 -DKokkos_ENABLE_CUDA=ON \
 -DKokkos_SOURCE_DIR_OVERRIDE:STRING=kokkos \
 -DKokkosKernels_SOURCE_DIR_OVERRIDE:STRING=kokkos-kernels \
$TRILINOS_DIR

@jgfouca
Contributor

jgfouca commented Mar 22, 2022

@ndellingwood, I don't think so, since I did not touch the spiluk test in #1356, but I can try to confirm. Is this on weaver?

@ndellingwood
Contributor Author

@jgfouca it can be reproduced with a Cuda build on any system; the reproducer above is for kokkos-dev.

@ndellingwood
Contributor Author

It doesn't look like the source code in the cpp file changed, but some configuration option must have changed so that this guard is no longer true:

#if defined(KOKKOSKERNELS_INST_ORDINAL_INT) && \
    defined(KOKKOSKERNELS_INST_OFFSET_INT)

The cusparse types are defined inside this guard, but the cusparse-tested components later in the file are not wrapped in the same guard, so they call cusparse routines with types that were never defined. I can put in a PR that adds the guard above to each cusparse region; hopefully that is the right thing to do here.
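A minimal sketch of the fix pattern described above, assuming only the two guard macros quoted from KokkosSparse_spiluk.cpp. The helper macro `SPILUK_HAVE_CUSPARSE_TYPES`, the stub type `StubHandle` and the functions `stub_csrilu0` / `run_cusparse_path` are hypothetical stand-ins, not the real perf-test code or the cuSPARSE API. The point is that the code *using* the cusparse objects sits under the same condition as the code *declaring* them, so when the guard is false the whole region compiles out instead of producing "identifier ... is undefined".

```cpp
#include <cassert>
#include <vector>

// Same condition as the guard in the perf test; the helper macro name is hypothetical.
#if defined(KOKKOSKERNELS_INST_ORDINAL_INT) && \
    defined(KOKKOSKERNELS_INST_OFFSET_INT)
#define SPILUK_HAVE_CUSPARSE_TYPES 1
#else
#define SPILUK_HAVE_CUSPARSE_TYPES 0
#endif

#if SPILUK_HAVE_CUSPARSE_TYPES
// Hypothetical stand-in for the cusparse objects (status, handle, descr, info, ...)
// that only exist when the guard is true.
struct StubHandle {
  int calls = 0;
};

// Hypothetical stand-in for a cusparse routine that takes 32-bit row offsets.
// If the offset type were std::size_t, passing a std::size_t* here would fail
// to compile, just like the "std::size_t *" vs "const int *" errors in the log.
inline int stub_csrilu0(StubHandle& handle, const int* row_map, int n) {
  ++handle.calls;
  return row_map[n];  // last row offset == number of nonzeros
}
#endif

// Returns the nnz seen by the stub cusparse path, or -1 when that path is
// compiled out because the guard is false.
inline int run_cusparse_path(const std::vector<int>& row_map) {
#if SPILUK_HAVE_CUSPARSE_TYPES
  StubHandle handle;
  const int n = static_cast<int>(row_map.size()) - 1;
  return stub_csrilu0(handle, row_map.data(), n);
#else
  (void)row_map;  // cusparse region skipped entirely
  return -1;
#endif
}
```

Built without `-DKOKKOSKERNELS_INST_ORDINAL_INT -DKOKKOSKERNELS_INST_OFFSET_INT`, `run_cusparse_path` simply reports that the cusparse path was skipped; with both defines it exercises the stub.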

@jgfouca
Contributor

jgfouca commented Mar 23, 2022

@ndellingwood, I don't know if you fixed something, but I was not able to reproduce this build error on kokkos-dev using the steps you provided:

[ 97%] Built target KokkosKernels_KokkosBlas3_perf_test
Scanning dependencies of target KokkosKernels_blas_cuda
[ 97%] Building CXX object kokkos-kernels/unit_test/CMakeFiles/KokkosKernels_blas_cuda.dir/Test_Main.cpp.o
[ 97%] Building CXX object kokkos-kernels/unit_test/CMakeFiles/KokkosKernels_blas_cuda.dir/cuda/Test_Cuda_Blas.cpp.o
[ 97%] Linking CXX executable KokkosKernels_common_cuda.exe
[ 97%] Built target KokkosKernels_common_cuda
Scanning dependencies of target KokkosKernels_blas_serial
[ 97%] Building CXX object kokkos-kernels/unit_test/CMakeFiles/KokkosKernels_blas_serial.dir/Test_Main.cpp.o
[ 97%] Building CXX object kokkos-kernels/unit_test/CMakeFiles/KokkosKernels_blas_serial.dir/serial/Test_Serial_Blas.cpp.o
[ 97%] Linking CXX executable KokkosKernels_graph_serial.exe
[ 97%] Built target KokkosKernels_graph_serial
Scanning dependencies of target KokkosKernels_batched_sla_serial
[ 97%] Building CXX object kokkos-kernels/unit_test/CMakeFiles/KokkosKernels_batched_sla_serial.dir/Test_Main.cpp.o
[ 97%] Linking CXX executable KokkosKernels_sparse_serial.exe
[ 97%] Built target KokkosKernels_sparse_serial
[ 97%] Building CXX object kokkos-kernels/unit_test/CMakeFiles/KokkosKernels_batched_sla_serial.dir/serial/Test_Serial_Batched_Sparse.cpp.o
[ 98%] Linking CXX executable KokkosKernels_batched_sla_serial.exe
[ 98%] Built target KokkosKernels_batched_sla_serial
[ 99%] Linking CXX executable KokkosKernels_blas_cuda.exe
[ 99%] Built target KokkosKernels_blas_cuda
[ 99%] Linking CXX executable KokkosKernels_batched_dla_serial.exe
[ 99%] Built target KokkosKernels_batched_dla_serial
[100%] Linking CXX executable KokkosKernels_blas_serial.exe
[100%] Built target KokkosKernels_blas_serial
[100%] Linking CXX executable KokkosKernels_graph_cuda.exe
[100%] Linking CXX executable KokkosKernels_sparse_cuda.exe
[100%] Built target KokkosKernels_graph_cuda
[100%] Built target KokkosKernels_sparse_cuda
[100%] Linking CXX executable KokkosKernels_batched_dla_cuda.exe
[100%] Built target KokkosKernels_batched_dla_cuda

@ndellingwood
Contributor Author

@jgfouca I put in a PR with a fix, but thanks for checking. In your build, did you add the symbolic links in Trilinos to up-to-date kokkos and kokkos-kernels repos for the source override? If kokkos-kernels was a bit out of date, that may explain the failure to reproduce.

@jgfouca
Contributor

jgfouca commented Mar 24, 2022

@ndellingwood, you're right. I forgot to set my KK to develop after I cloned.

@jgfouca
Contributor

jgfouca commented Mar 24, 2022

@ndellingwood, for what it's worth, this build error was not introduced by #1356. I set my KK repo to:

commit 6bb39275f4089e65fbaa8c8deae1ebe00454f755
Merge: ec6cf57 e634bd5
Author: Luc Berger <lberge@sandia.gov>
Date:   Fri Mar 18 10:16:32 2022 -0600

    Merge pull request #1356 from jgfouca/jgfouca/minor_test_cleanup
    
    A couple newer sparse tests were not following the new testing pattern

... and the build worked fine. This is a relief to me since my PR was purely code cleanup and should not have changed semantics. If you want, I can bisect the exact KK PR that caused the problem or we can just move on.


2 participants