Zoltan: Several tests fail with 64 bit builds of Scotch and ParMETIS #475

Closed
bartlettroscoe opened this issue Jun 30, 2016 · 11 comments

@bartlettroscoe
Member

bartlettroscoe commented Jun 30, 2016

Next Action Status:

64-bit Scotch and ParMETIS not enabled for Zoltan yet. Next: Zoltan team to fix failing tests then enable ...

CC: @trilinos/zoltan

Description:

As @kddevin predicted in this #158 comment, several of the Scotch and ParMETIS tests fail when using a 64 bit build of Scotch and ParMETIS. These 64 bit builds are the only builds of these TPLs available with the SEMS Dev Env (see a lengthy discussion in #158).

In particular, the following Zoltan tests failed with the 64 bit builds of Scotch and ParMETIS:

$ grep " Test " ctest.out | grep "Failed" | grep "Zoltan_" | grep -i "\(parmetis\|scotch\)"
182/221 Test   #3: Zoltan_ch_brack2_3_parmetis_parallel ....................***Failed  Error regular expression found in output. Regex=[FAILED]  4.54 sec
184/221 Test   #6: Zoltan_ch_bug_parmetis_parallel .........................***Failed  Error regular expression found in output. Regex=[FAILED]  2.37 sec
192/221 Test  #18: Zoltan_ch_ewgt_parmetis_parallel ........................***Failed  Error regular expression found in output. Regex=[FAILED]  3.64 sec
193/221 Test  #19: Zoltan_ch_ewgt_scotch_parallel ..........................***Failed  Error regular expression found in output. Regex=[FAILED]  0.26 sec
194/221 Test  #21: Zoltan_ch_grid20x19_parmetis_parallel ...................***Failed  Error regular expression found in output. Regex=[FAILED]  3.81 sec
196/221 Test  #24: Zoltan_ch_hammond_parmetis_parallel .....................***Failed  Error regular expression found in output. Regex=[FAILED]  7.71 sec
197/221 Test  #25: Zoltan_ch_hammond_scotch_parallel .......................***Failed  Error regular expression found in output. Regex=[FAILED]  0.43 sec
202/221 Test  #33: Zoltan_ch_nograph_parmetis_parallel .....................***Failed  Error regular expression found in output. Regex=[FAILED]  1.67 sec
204/221 Test  #36: Zoltan_ch_onedbug_parmetis_parallel .....................***Failed  Error regular expression found in output. Regex=[FAILED]  0.54 sec
208/221 Test  #42: Zoltan_ch_simple_parmetis_parallel ......................***Failed  Error regular expression found in output. Regex=[FAILED]  5.28 sec
209/221 Test  #43: Zoltan_ch_simple_scotch_parallel ........................***Failed  Error regular expression found in output. Regex=[FAILED]  0.52 sec
212/221 Test  #48: Zoltan_ch_vwgt_parmetis_parallel ........................***Failed  Error regular expression found in output. Regex=[FAILED]  3.72 sec
213/221 Test  #49: Zoltan_ch_vwgt_scotch_parallel ..........................***Failed  Error regular expression found in output. Regex=[FAILED]  0.26 sec

However, what is interesting is that several Zoltan "scotch" and "parmetis" tests also passed:

$ grep " Test " ctest.out | grep "Passed" | grep "Zoltan_" | grep -i "\(parmetis\|scotch\)"
183/221 Test   #4: Zoltan_ch_brack2_3_scotch_parallel ......................   Passed    0.11 sec
185/221 Test   #7: Zoltan_ch_bug_scotch_parallel ...........................   Passed    0.11 sec
186/221 Test   #9: Zoltan_ch_degenerate_parmetis_parallel ..................   Passed    0.11 sec
187/221 Test  #10: Zoltan_ch_degenerate_scotch_parallel ....................   Passed    0.11 sec
188/221 Test  #12: Zoltan_ch_degenerateAA_parmetis_parallel ................   Passed    0.11 sec
189/221 Test  #13: Zoltan_ch_degenerateAA_scotch_parallel ..................   Passed    0.11 sec
190/221 Test  #15: Zoltan_ch_drake_parmetis_parallel .......................   Passed    0.11 sec
191/221 Test  #16: Zoltan_ch_drake_scotch_parallel .........................   Passed    0.11 sec
195/221 Test  #22: Zoltan_ch_grid20x19_scotch_parallel .....................   Passed    0.11 sec
198/221 Test  #27: Zoltan_ch_hammond2_parmetis_parallel ....................   Passed    0.11 sec
199/221 Test  #28: Zoltan_ch_hammond2_scotch_parallel ......................   Passed    0.11 sec
200/221 Test  #30: Zoltan_ch_hughes_parmetis_parallel ......................   Passed    0.11 sec
201/221 Test  #31: Zoltan_ch_hughes_scotch_parallel ........................   Passed    0.13 sec
203/221 Test  #34: Zoltan_ch_nograph_scotch_parallel .......................   Passed    0.11 sec
205/221 Test  #37: Zoltan_ch_onedbug_scotch_parallel .......................   Passed    0.11 sec
206/221 Test  #39: Zoltan_ch_serial_parmetis_parallel ......................   Passed    0.11 sec
207/221 Test  #40: Zoltan_ch_serial_scotch_parallel ........................   Passed    0.11 sec
210/221 Test  #45: Zoltan_ch_simple3d_parmetis_parallel ....................   Passed    0.11 sec
211/221 Test  #46: Zoltan_ch_simple3d_scotch_parallel ......................   Passed    0.11 sec
214/221 Test  #51: Zoltan_ch_vwgt2_parmetis_parallel .......................   Passed    0.68 sec
215/221 Test  #52: Zoltan_ch_vwgt2_scotch_parallel .........................   Passed    0.11 sec
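
To see at a glance how the failures split between the two TPLs, the two grep pipelines above can be folded into one tally. This is only a sketch: the function name `tally_zoltan_tpl_results` is made up for illustration, and it assumes the `ctest.out` line layout shown in the listings above (test name in the fourth whitespace-separated field).

```shell
# Tally Zoltan ParMETIS/Scotch results from saved ctest output.
# Reads ctest output on stdin and prints "<tpl> <status> <count>" lines.
# Hypothetical helper; assumes the line layout shown in this issue.
tally_zoltan_tpl_results() {
  awk '
    / Test +#[0-9]+: Zoltan_/ {
      name = $4                                   # e.g. Zoltan_ch_ewgt_scotch_parallel
      if (name ~ /_parmetis_/)    tpl = "parmetis"
      else if (name ~ /_scotch_/) tpl = "scotch"
      else next                                   # not a ParMETIS/Scotch test
      if ($0 ~ /\*\*\*Failed/)    status = "failed"
      else if ($0 ~ / Passed /)   status = "passed"
      else next                                   # some other status; ignore
      count[tpl " " status]++
    }
    END {
      for (key in count) print key, count[key]
    }
  ' | sort
}

# Usage (assuming ctest output was saved to ctest.out):
#   tally_zoltan_tpl_results < ctest.out
```

Applied to the two listings above, this counts 9 ParMETIS and 4 Scotch failures against 8 ParMETIS and 13 Scotch passes.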

There are several options for addressing these failing tests that I can think of:

  1. Disable only the currently failing tests for just the SEMS Dev Env build: This could be done by setting cache vars <test_name>_DISABLE=TRUE in the SEMSDevEnv.cmake file.
    • Pro: Easy to implement by non-Zoltan developers
    • Pro: Still enables Scotch and ParMETIS TPLs and gets at least some tests run using these
    • Con: Does not exercise some functionality of Zoltan for Scotch and ParMETIS
    • Con: As Zoltan tests using Scotch and ParMETIS are changed but only tested with 32 bit builds of Scotch and ParMETIS, there is greater risk that these updated tests which are currently passing on the SEMS Dev Env may then fail with the 64 bit builds of these TPLs.
    • Summary: Easy short-term solution that yields all passing CI tests with Zoltan
  2. Disable Scotch and ParMETIS TPL support for Zoltan for just the SEMS Dev Env build: This could be done by setting Zoltan_ENABLE_Scotch=OFF and Zoltan_ENABLE_ParMETIS=OFF in the SEMSDevEnv.cmake file.
    • Pro: Easy to implement by non-Zoltan developers
    • Pro: There would never be a Scotch or ParMETIS related test failure on the SEMS Dev Env.
    • Con: The build and usage of Zoltan with Scotch and ParMETIS would not be getting tested on the SEMS Dev Env.
    • Summary: Easy short-term solution that yields all passing CI tests with Zoltan
  3. Update the Zoltan test suite to work with 64 bit Scotch and ParMETIS: This would require Zoltan developers to do the updates.
    • Pro: Would allow full Zoltan test suite to be run on the SEMS Dev Env.
    • Pro: Strengthens the Zoltan testing for 64 bit builds
    • Con: Requires Zoltan developers to update the Zoltan test suite
    • Summary: Best long-term solution but requires work from the Zoltan developers
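
For option 1, the per-test cache settings could be generated from the failing-test listing rather than typed by hand. The following is a rough sketch, not the actual SEMSDevEnv.cmake change: the function name `gen_disable_lines` and the docstring text are hypothetical, and only the `<test_name>_DISABLE` cache variable comes from the description above. Whether a given CMakeLists.txt file tolerates a disabled test is a separate question.

```shell
# Emit one "SET(<test_name>_DISABLE ON CACHE BOOL ...)" line for each failing
# Zoltan ParMETIS/Scotch test found in ctest output read from stdin.
# Hypothetical helper; the docstring text is illustrative only.
gen_disable_lines() {
  grep ' Test ' \
    | grep 'Failed' \
    | grep 'Zoltan_' \
    | grep -i -E 'parmetis|scotch' \
    | awk '{ printf "SET(%s_DISABLE ON CACHE BOOL \"Temporarily disabled for 64-bit TPLs (see #475)\")\n", $4 }'
}

# Usage (assuming ctest output was saved to ctest.out):
#   gen_disable_lines < ctest.out
```

The output is meant to be reviewed and pasted into the build configuration file, so that only the tests known to fail are switched off.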

I will provide detailed reproducibility instructions in a later comment.

Definition of Done:

  • No failing Zoltan tests in pre-push CI testing with the SEMS Dev Env
  • Zoltan developers decide on best approach to dealing with these failing tests.

Tasks:

???

@bartlettroscoe
Member Author

I think going with option-1 "Disable only the currently failing tests for just the SEMS Dev Env build" above is the best short-term solution.

NOTE: These tests will only be disabled for the SEMS Dev Env build and no other builds.

I will provide reproducibility instructions once I push the commit for the SEMSDevEnv.cmake file that disables these tests.

@trilinos/framework,

FYI: we need all clean Zoltan tests in pre-push CI testing.

@kddevin
Contributor

kddevin commented Jun 30, 2016

Please do not disable Zoltan tests.

There is a fourth option: providing 32-bit ParMETIS and Scotch builds in SEMS. Indeed, 32-bit ParMETIS builds are used by our primary SNL customer of Zoltan.

However, the Zoltan code already handles 64-bit ParMETIS correctly. We can update our test suite's answers to use the 64-bit builds. Please let us know by what date these builds are needed; no timeline was requested in #158.

@bartlettroscoe
Member Author

Please do not disable Zoltan tests.

These would only be disabled when the file SEMSDevEnv.cmake gets included, and that will currently load the 64 bit versions of these libraries. If that is not acceptable, for now, we can just go with option-2 "Disable Scotch and ParMETIS TPL support for Zoltan for just the SEMS Dev Env build". Since you can't successfully test Zoltan with the 64 bit versions of these libraries, I can't see what possible problem this could cause.

There is a fourth option: providing 32-bit ParMETIS and Scotch builds in SEMS. Indeed, 32-bit ParMETIS builds are used by our primary SNL customer of Zoltan.

That is not in my power to do on any short time-scale. But we can push the SEMS team to see if they can provide that for us. But SEACAS wants/requires 64 bit versions (see this comment). Therefore, I don't think going with 32-bit versions of these is fair to SEACAS since that would affect its testing.

However, the Zoltan code already handles 64-bit ParMETIS correctly.

We can update our test suite's answers to use the 64-bit builds.

That is the best option and this would strengthen the Zoltan testing for 64 bit builds anyway.

Please let us know by what date these builds are needed; no timeline was requested in #158.

We would like to have an effective pre-push CI build yesterday but I don't see why this one issue of 32-bit vs. 64 bit Scotch and ParMETIS needs to hold that up.

Let's discuss this at the next Trilinos Leaders Meeting.

@bartlettroscoe
Member Author

bartlettroscoe commented Jun 30, 2016

Another way to look at this issue is that if a user reads the SEACAS documentation:

https://github.com/gsjaardema/seacas/blob/master/README-PARALLEL.md

follows the instructions to build a 64 bit version of ParMETIS, and then does:

$ cmake -DTrilinos_ENABLE_SEACAS=ON -DTrilinos_ENABLE_Zoltan=ON \
  -DTrilinos_ENABLE_TESTS=ON \
  -DTPL_ENABLE_Scotch=ON -DTPL_ENABLE_ParMETIS=ON \
  $TRILINOS_HOME
$ make -j16
$ ctest -j16

they are going to see these same failing Zoltan tests. I think you can argue that this is a rational thing for a user to try.

One could argue that Zoltan (and Zoltan2 in #476) should address these failing tests with 64 bit builds of these libraries one way or another.

@kddevin
Contributor

kddevin commented Jun 30, 2016

Responding to @kddevin:

Again, please do not disable Zoltan tests.

There must be some type of misunderstanding going on here. I think you think I am suggesting something that I am not suggesting.

The disabling going on with option-1 and option-2 will only impact builds when the SEMSDevEnv.cmake is included. That file only gets included by default if you source the SEMS dev env with load_sems_dev_env.sh. Therefore, this will not affect any existing build of Trilinos that anyone has ever done or ever will do on the planet earth unless they source the script load_sems_dev_env.sh. It is not my fault that SEACAS demands 64 bit ParMETIS and that the SEMS team only installed 64 bit versions of these libraries. I am just trying to take what we have and make incremental progress.

And disabling ParMETIS/Scotch support is even more dangerous, as some code is not even compiled when this support is disabled.

Exactly, that is why option-1 is much better than option-2 in the short term. The rule of disabling failing software/tests is that you always disable as little as possible. That is, you want to (temporarily) disable with a scalpel and not a machete. Option-1 is a scalpel. Simply never enabling ParMETIS is a machete.

Clearly, I cannot respond to a deadline of "yesterday." But I will try to get 64-bit GID answer files into the repository by the end of next week.

I am not trying to create a fire-drill here; I am trying to do just the opposite. Going with option-1 lets Zoltan developers have time to fix the failing Zoltan tests with 64 bit builds of these TPLs but does not block us from being able to start pushing forward with a more effective CI process for Trilinos. The statement "don't disable tests that are already failing" is what is creating a fire-drill (for no good reason that I can see).

I will try to give you a call over the phone to try to come to a common understanding about what is actually being proposed and what the issues are.

@bartlettroscoe
Member Author

A little more info on this. Were Zoltan developers aware that Scotch and ParMETIS are not currently being enabled with the current official Trilinos post-push CI testing process? You can clearly see that by looking at the first CI iteration this morning for Zoltan at:

http://testing.sandia.gov/cdash/viewConfigure.php?buildid=2488428

which shows:

_Final set of non-enabled TPLs:_ ... Scotch ... ParMETIS ... 94

In fact, I can't find a single automated post-push build of Zoltan that ever enables Scotch or ParMETIS for Zoltan from looking at both CDash sites:

http://testing.sandia.gov/cdash/index.php?project=Trilinos&date=2016-06-29&subproject=Zoltan
http://my.cdash.org/index.php?project=Trilinos&date=2016-06-28&subproject=Zoltan

So, what I am proposing is to actually add more automated testing for Zoltan than currently exists. Therefore, can Zoltan developers please work with me on this?

@bartlettroscoe
Member Author

bartlettroscoe commented Jun 30, 2016

I locally investigated using:

SET(Zoltan_ch_brack2_3_parmetis_parallel_DISABLE  ON  CACHE BOOL
  "Disabled in SEMSDevEnv.cmake")
...

in the SEMSDevEnv.cmake file to temporarily disable these failing tests (until they can be fixed) but it fails the configure with the error:

-- Zoltan_ch_brack2_3_parmetis_parallel: NOT added test because Zoltan_ch_brack2_3_parmetis_parallel_DISABLE='ON'!
CMake Error at packages/zoltan/test/ch_brack2_3/CMakeLists.txt:66 (SET_PROPERTY):
  set_property given TEST names that do not exist:

    Zoltan_ch_brack2_3_parmetis_parallel

The issue is that these CMakeLists.txt files are not properly using the output variable ADDED_TEST_NAME_OUT <testName> documented here. This prevents TriBITS from smoothly disabling tests when it needs to.

Therefore, the most sensible thing to do is to avoid enabling (not disabling) Scotch and ParMETIS for Zoltan testing for now for the SEMS Dev Env. Testing with Scotch and ParMETIS has never been enabled for the SEMS Dev Env up to this point (I am the first person to try).
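
As a rough way to find the CMakeLists.txt files that would hit the set_property error above, one could search for files that set TEST properties but never mention ADDED_TEST_NAME_OUT. This is a text-based heuristic only, and the function name `find_unguarded_test_properties` is hypothetical. It does not parse CMake, so it can miss cases or flag files that are actually fine.

```shell
# List CMakeLists.txt files under a directory that call set_property(TEST ...)
# (any letter case) but never mention ADDED_TEST_NAME_OUT.
# Heuristic, hypothetical helper: a plain text search, not a CMake parser.
find_unguarded_test_properties() {
  find "$1" -name CMakeLists.txt -type f | sort | while IFS= read -r f; do
    if grep -qiE 'set_property[[:space:]]*\([[:space:]]*TEST' "$f" \
       && ! grep -q 'ADDED_TEST_NAME_OUT' "$f"; then
      printf '%s\n' "$f"
    fi
  done
}

# Usage (from the Trilinos source root):
#   find_unguarded_test_properties packages/zoltan/test
```

Each file it lists is a candidate for the same configure failure if one of its tests is disabled through a `<test_name>_DISABLE` cache variable.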

@kddevin
Contributor

kddevin commented Jul 1, 2016

As I said above, I will try to update the answer files by the end of next week.

bartlettroscoe added a commit that referenced this issue Jul 1, 2016
Currently if you enable 64-bit Scotch and ParMETIS with Zoltan several tests
fail.  This is a known issue and these tests will be fixed soon.  After that,
this commit can be reverted.
bartlettroscoe added a commit that referenced this issue Jul 1, 2016
This enables ParMETIS with passing tests for Amesos, Amesos2, ML, and SEACAS
(see Trilinos #158).

Currently these 64-bit TPLs are not enabled for Zoltan and Zoltan2 in this
build because the Zoltan and Zoltan2 test suites don't currently work with
64-bit libraries (see Trilinos #475 and #476).  Also, ShyLU support for
ParMETIS is also not enabled because it needs ParMETIS support from Zoltan2
which is not enabled.  Once the Zoltan and Zoltan2 test suites using 64 bit
TPLs are fixed, then these TPLs can be enabled.  Note that the ShyLU test for
ParMETIS passes for the 64 bit ParMETIS so nothing in ShyLU needs to be fixed.

Note that SEMS only provides MPI builds of these TPLs so they are disabled for
serial builds.

Build/Test Cases Summary
Enabled Packages:
Disabled Packages: PyTrilinos,Pliris,Claps,TriKota
Enabled all Packages
0) MPI_DEBUG => Test case MPI_DEBUG was not run! => Does not affect push readiness! (-1.00 min)
1) SERIAL_RELEASE => Test case SERIAL_RELEASE was not run! => Does not affect push readiness! (-1.00 min)
2) MPI_RELEASE_DEBUG_ST => passed: passed=2346,notpassed=0 (341.29 min)
3) SERIAL_RELEASE_ST => passed: passed=2163,notpassed=0 (172.52 min)
Other local commits for this build/test group: 92a1d8d, f2b3c92
@bartlettroscoe
Member Author

Zoltan Developers,

To reproduce the failing Zoltan tests building against the 64-bit ParMETIS and Scotch TPLs in the SEMS Dev Env in order to fix them, one just needs to be on a machine that provides the SEMS Dev Env and then do something like the following:

$ cd Trilinos/  # Make sure you are on the 'develop' tracking branch
$ git pull   # from origin/develop
$ mkdir BUILD/
$ echo /BUILD/ >> .git/info/exclude
$ cd BUILD/
$ source ../cmake/load_ci_sems_dev_env.sh
$ cmake \
  -DCMAKE_BUILD_TYPE=RELEASE \
  -DTrilinos_ENABLE_DEBUG=ON \
  -DTPL_ENABLE_MPI=ON \
  -DTrilinos_ENABLE_TESTS=ON \
  -DTrilinos_ENABLE_Zoltan=ON \
  -DZoltan_ENABLE_Scotch=ON \
  -DZoltan_ENABLE_ParMETIS=ON \
  ..
$ make -j16
$ ctest -j16

If that does not work to reproduce the failing tests shown above, please let me know.

Once all of the Zoltan tests are passing, then the commit f2b3c92 just needs to be reverted using:

$ cd Trilinos/
$ git revert f2b3c92 

Then, if desired, a Zoltan developer could test and push these changes on a machine with the SEMS Dev Env with:

$ cd Trilinos/
$ mkdir CHECKIN/
$ cd CHECKIN/
$ ln -s ../cmake/std/sems/checkin-test-sems.sh .
$ ./checkin-test-sems.sh --enable-all-packages=off --no-enable-fwd-packages \
   --enable-packages=Zoltan --do-all --push

Thanks,

-Ross

kddevin added a commit that referenced this issue Jul 2, 2016
32-bit local indices in TPLs ParMETIS and Scotch.
Requested by Ross B. in issue #475 for SEMS testing.
SEMS builds 64-bit TPLs and will not add 32-bit TPLs,
but the default builds of ParMETIS and Scotch are 32-bit.
Thus, we need to test both.
@kddevin
Contributor

kddevin commented Jul 2, 2016

Fixed with 3e68c54
Please re-enable Zoltan tests in #158. Thank you.

@kddevin kddevin closed this as completed Jul 2, 2016
@kddevin
Contributor

kddevin commented Jul 3, 2016

Note: The Zoltan tests pass when consistent versions of ParMETIS and Scotch are used. The SEMS builds currently have 32-bit IDs in Scotch and 64-bit IDs in ParMETIS.
I re-enabled the Zoltan tests in #158, but Zoltan's Scotch tests fail due to this inconsistency.
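
One quick way to check the ID width of a ParMETIS install is to read the IDXTYPEWIDTH macro from the `metis.h` header it was built against. The sketch below assumes a METIS 5 style header that contains a plain `#define IDXTYPEWIDTH 32` or `#define IDXTYPEWIDTH 64` line; the function name `metis_idx_width` and the header path in the usage comment are hypothetical. It does not cover Scotch, whose integer width is configured differently.

```shell
# Print the integer width (for example 32 or 64) declared by IDXTYPEWIDTH in
# the given METIS header. Prints nothing if no such #define line is found.
# Hypothetical helper; assumes a plain "#define IDXTYPEWIDTH <n>" line.
metis_idx_width() {
  awk '$1 == "#define" && $2 == "IDXTYPEWIDTH" { print $3; exit }' "$1"
}

# Usage (the header path depends on the local install):
#   metis_idx_width /path/to/parmetis/include/metis.h
```

Comparing this value with the integer width of the Scotch build would show whether the two TPLs were built consistently.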
