New STKUnit test failing in ATDM CUDA PT build on white #4551

Closed
fryeguy52 opened this issue Mar 5, 2019 · 7 comments
Labels
  • ATDM Sev: Blocker (Problems that make Trilinos unfit to be adopted by one or more ATDM APPs)
  • CLOSED_DUE_TO_INACTIVITY (Issue or PR has been closed by the GitHub Actions bot due to inactivity)
  • MARKED_FOR_CLOSURE (Issue or PR is marked for auto-closure by the GitHub Actions bot)
  • PA: Data Services (Issues that fall under the Trilinos Data Services Product Area)
  • pkg: STK
  • type: bug (The primary issue is a bug in Trilinos code or tests)

Comments

@fryeguy52
Contributor

fryeguy52 commented Mar 5, 2019

CC: @trilinos/stk, @kddevin (Trilinos Data Services Product Lead), @bartlettroscoe, @fryeguy52

Next Action Status

Description

As shown in this query, the test:

  • STKUnit_tests_stk_ngp_test_utest_MPI_4

is failing in the builds:

  • Trilinos-atdm-white-ride-cuda-9.2-gnu-7.2.0-release-debug-pt

Test Output on CDash

This appears to be a new test that was added on 2019-02-22; it has been failing ever since with:

terminate called after throwing an instance of 'std::runtime_error'
  what():  cudaDeviceSynchronize() error( cudaErrorIllegalAddress): an illegal memory access was encountered /home/jenkins/ride/workspace/Trilinos-atdm-white-ride-cuda-9.2-gnu-7.2.0-release-debug-pt/SRC_AND_BUILD/Trilinos/packages/kokkos/core/src/Cuda/Kokkos_Cuda_Instance.cpp:120

Current Status on CDash

The current status and recent history of this test are shown on CDash.

Steps to Reproduce

One should be able to reproduce this failure on ride or white as described in:

More specifically, the commands given for ride or white are provided at:

The exact commands to reproduce this issue should be:

$ cd <some_build_dir>/
$ source $TRILINOS_DIR/cmake/std/atdm/load-env.sh Trilinos-atdm-white-ride-cuda-9.2-gnu-7.2.0-release-debug-pt
$ cmake \
 -GNinja \
 -DTrilinos_CONFIGURE_OPTIONS_FILE:STRING=cmake/std/atdm/ATDMDevEnv.cmake \
 -DTrilinos_ENABLE_TESTS=ON -DTrilinos_ENABLE_STK=ON \
 $TRILINOS_DIR
$ make NP=16
$ bsub -x -Is -q rhel7F -n 16 ctest -j16
@fryeguy52 added the labels type: bug, pkg: STK, client: ATDM, ATDM Sev: Blocker, and PA: Data Services on Mar 5, 2019.
@bartlettroscoe changed the title from "new STKUnit test failing in ATDM CUDA build on white" to "New STKUnit test failing in ATDM CUDA build on white" on Mar 6, 2019.
bartlettroscoe added a commit to bartlettroscoe/Trilinos that referenced this issue Mar 9, 2019
…ilinos#4551, trilinos#2464)

It has to be disabled for the CUDA PR build. Note that before this, no STK tests
were being enabled at all.
@alanw0
Contributor

alanw0 commented Mar 9, 2019

We believe we have a fix for the failing STK unit test; we will try to get another STK update into Trilinos in the next couple of days.

jmgate pushed a commit to tcad-charon/Trilinos that referenced this issue Mar 11, 2019
…s:develop' (625e220).

* trilinos-develop:
  Temp disable some tests failing because ctest parallel level is too high (trilinos#2464)
  Ifpack2 - use KOKKOS_RESTRICT
  Ifpack2 - remove shadow warning
  Ifpack2 - add static inline to remove multiple definition of functions
  Disable known failing test STKUnit_tests_stk_ngp_test_utest_MPI_4 (trilinos#4551, trilinos#2464)
  WIP: Update CUDA PR build settings to correctly match working ATDM Trilinos config (trilinos#2464)
  No need to set new AAO features after cmake 3.10.0 upgrade (trilinos#1761)
  Ifpack2 - fix a typo trilinos#4388
  Ifpack2 - change vector loop
  Ifpack2 - check point for debugging
  Ifpack2 - put profiler stop at the beginning of test
  Ifpack2 - little bit of improvement on extract part
  Ifpack2 - remove unused impl
  KokkosBatched - ifpack2 need some new functions from updated kokkoskernels
  Ifpack2 - improvement on block spmv
  Ifpack2 - for jacobi solver, invert diagonals and solve with gemv
  Ifpack2 - improvement by using large team size
@bartlettroscoe changed the title from "New STKUnit test failing in ATDM CUDA build on white" to "New STKUnit test failing in ATDM CUDA PT build on white" on Apr 9, 2019.
@alanw0
Contributor

alanw0 commented Jul 31, 2019

I believe this was fixed.

@alanw0 closed this as completed on Jul 31, 2019.
@bartlettroscoe
Member

@fryeguy52
Contributor Author

Output from the failed test:

terminate called after throwing an instance of 'std::runtime_error'
  what():  cudaDeviceSynchronize() error( cudaErrorIllegalAddress): an illegal memory access was encountered /home/jenkins/ride/workspace/Trilinos-atdm-white-ride-cuda-9.2-gnu-7.2.0-release-debug-pt/SRC_AND_BUILD/Trilinos/packages/kokkos/core/src/Cuda/Kokkos_Cuda_Instance.cpp:120

@bartlettroscoe
Member

Defects in Primary Tested code that is not currently used (or being tested) by ATDM are not really ATDM issues, so I am removing the "client: ATDM" label to get this off our list of active issues.

@github-actions

This issue has had no activity for 365 days and is marked for closure. It will be closed after an additional 30 days of inactivity.
If you would like to keep this issue open please add a comment and/or remove the MARKED_FOR_CLOSURE label.
If this issue should be kept open even with no activity beyond the time limits you can add the label DO_NOT_AUTOCLOSE.
If it is ok for this issue to be closed, feel free to go ahead and close it. Please do not add any comments or change any labels or otherwise touch this issue unless your intention is to reset the inactivity counter for an additional year.

The github-actions bot added the MARKED_FOR_CLOSURE label on Aug 18, 2021.
@github-actions

This issue was closed due to inactivity for 395 days.

The github-actions bot added the CLOSED_DUE_TO_INACTIVITY label on Sep 25, 2021.
3 participants