Piro_UnitTests_MPI_1 test failing in Trilinos-atdm-white-ride-cuda-9.2-debug-pt build #3552

Closed
bartlettroscoe opened this issue Oct 2, 2018 · 11 comments
Labels
client: ATDM (Any issue primarily impacting the ATDM project)
PA: Nonlinear Solvers (Issues that fall under the Trilinos Nonlinear Linear Solvers Product Area)
pkg: Piro
type: bug (The primary issue is a bug in Trilinos code or tests)

Comments

@bartlettroscoe
Member

bartlettroscoe commented Oct 2, 2018

CC: @trilinos/piro , @rppawlo (Trilinos Nonlinear Solvers Product Area Lead)

Next Action Status

PR #3741, merged on 10/26/2018, replaced STEQR with PTEQR in Stokhos, and on 10/27/2018 the test Piro_UnitTests_MPI_1 passed in the Trilinos-atdm-white-ride-cuda-9.2-debug-pt build on 'ride'.

Description

The test Piro_UnitTests_MPI_1 is failing in the build Trilinos-atdm-white-ride-cuda-9.2-debug-pt on 'white' and 'ride', as shown here, with the following failing output:

Sorting tests by group name then by the order they were added ... (time = 1.17e-05)

Running unit tests ...

0. Piro_ForwardSensitivities_UnitTest ... [Passed] (0.0379 sec)
1. Piro_AdjointSensitivities_UnitTest ... [Passed] (0.0233 sec)
2. Piro_ForwardOperatorSensitivities_UnitTest ... [Passed] (0.0295 sec)
3. Piro_AdjointOperatorSensitivities_UnitTest ... [Passed] (0.0231 sec)
--------------------------------------------------------------------------
mpiexec noticed that process rank 0 with PID 0 on node white27 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
...

This is an important build because we are targeting this build on 'white' and 'ride' as a Trilinos CUDA PR testing build (see #2464). Also, the EMPIRE ATDM Trilinos build enables Piro.

It is not clear why this test does not fail in the more constrained ATDM Trilinos build 'Trilinos-atdm-white-ride-cuda-9.2-debug', where this test is shown passing here. Perhaps some packages and extra unit tests get enabled in the fuller 'Trilinos-atdm-white-ride-cuda-9.2-debug-pt' configuration?

Steps to reproduce

One should be able to reproduce this test failure on either 'white' or 'ride' by cloning the Trilinos git repo, checking out the 'develop' branch, creating a build directory, and then doing:

$ cd <some_build_dir>/

$ source $TRILINOS_DIR/cmake/std/atdm/load-env.sh cuda-9.2-debug

$ cmake \
  -GNinja \
  -DTrilinos_CONFIGURE_OPTIONS_FILE:STRING=cmake/std/atdm/ATDMDevEnvAllPtPackages.cmake \
  -DTrilinos_ENABLE_TESTS=ON -DTrilinos_ENABLE_Piro=ON \
  $TRILINOS_DIR

$ make NP=16

$ bsub -x -Is -q rhel7F -n 16 ctest -j16
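
To iterate on just the failing test rather than the whole suite, CTest's name filter can be used. The following is a minimal sketch, assuming it is run from the build directory inside whatever job allocation the machine requires. The helper name `rerun_test` and the `CTEST_CMD` override are hypothetical and are not part of Trilinos:

```shell
# Hypothetical helper (not part of Trilinos): rerun one CTest test by exact name.
# CTEST_CMD is an illustrative override for the ctest executable; it defaults to "ctest".
rerun_test() {
  test_name="$1"
  # -R selects tests whose names match the regex; the anchors force an exact match.
  # --output-on-failure prints the test's captured output only when it fails.
  "${CTEST_CMD:-ctest}" -R "^${test_name}\$" --output-on-failure
}
```

For example, `rerun_test Piro_UnitTests_MPI_1` would run only that test and print its output if it fails.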
@bartlettroscoe added the labels type: bug, pkg: Piro, and client: ATDM on Oct 2, 2018
@bartlettroscoe
Member Author

@rppawlo, who is currently maintaining Piro now that the original developer is no longer developing code?

@rppawlo
Contributor

rppawlo commented Oct 2, 2018

Currently, the Albany team (@ikalash and @mperego) is supporting it. The users include Albany and Drekar, but Drekar will be deprecating its use in the future. I can help out in the short term as well if needed.

@ikalash
Contributor

ikalash commented Oct 2, 2018

I am not familiar with the sensitivity/adjoint sensitivity tests. @mperego , are you?

@mperego
Contributor

mperego commented Oct 2, 2018

I can look into that. By the way, why does the EMPIRE ATDM Trilinos build enable Piro?

@rppawlo
Contributor

rppawlo commented Oct 2, 2018

It's currently a required dependency for Panzer.

@mperego
Contributor

mperego commented Oct 3, 2018

@bartlettroscoe If I configure Trilinos as you documented, neither Panzer nor Piro is enabled.

@mperego
Contributor

mperego commented Oct 3, 2018

@bartlettroscoe It seems it is not even reading the Trilinos_CONFIGURE_OPTIONS_FILE option. I get the same configuration if I remove that option or if I pass a nonexistent file.

@bartlettroscoe
Member Author

@mperego, it was a simple copy-and-paste failure (-DTrilinos_ENABLE_TrilinosCouplings=ON instead of -DTrilinos_ENABLE_Piro=ON). I fixed the error. Please try the "Steps to Reproduce" again.
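
A slip like this can be caught by checking which `-D` values CMake actually received, since variables set explicitly on the command line are recorded in `CMakeCache.txt` in the build directory. The following is a minimal sketch; the helper name `cache_var_is_on` is hypothetical, and it only matches the literal value `ON`. Packages that are enabled implicitly through dependency logic may not appear in the cache, so a miss here is a hint rather than proof:

```shell
# Hypothetical helper (not part of Trilinos or TriBITS): succeed if the cache
# entry NAME is set to ON in <build_dir>/CMakeCache.txt.
# CMake writes cache entries as NAME:TYPE=VALUE, one per line.
cache_var_is_on() {
  build_dir="$1"
  var_name="$2"
  # -q suppresses output; the exit status alone reports whether the entry matched.
  grep -Eq "^${var_name}:[A-Z]+=ON\$" "${build_dir}/CMakeCache.txt"
}

# Example use from the build directory:
#   cache_var_is_on . Trilinos_ENABLE_Piro && echo "Piro was explicitly enabled"
```

With the original, mistaken command line, this check would have failed for `Trilinos_ENABLE_Piro` and succeeded for `Trilinos_ENABLE_TrilinosCouplings`.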

@mperego
Contributor

mperego commented Oct 12, 2018

The code is failing when calling a LAPACK routine in Stokhos (@etphipp):

my_lapack.STEQR('I', num_points, &a[0], &b[1], eig_vectors.values(),

but I could not find anything wrong with that. Any ideas?

@bartlettroscoe
Member Author

Okay, the routine DSTEQR() is the same routine causing Stokhos tests to fail as reported by @etphipp in #3542 (comment) and by @hkthorn in #2410 (comment).

@bartlettroscoe
Member Author

As described in #3542 (comment), PR #3741, merged on 10/26/2018, replaced STEQR with PTEQR in Stokhos and fixed all of the Stokhos tests. This had the side effect of also fixing this failing Piro test, as shown here: Piro now has 0 failing tests (subscript -1) and 13 passing tests (subscript +1). The test Piro_UnitTests_MPI_1 is shown newly passing here.

Closing as complete!

@bartlettroscoe added the PA: Nonlinear Solvers label on Nov 30, 2018