990 - Non-MPI tests run without MPI #998

pnstickne · 2020-08-20T06:18:50Z

Tests that use the TestHarness (not the ParallelTestHarness) don't, and shouldn't, need or use MPI.

When run under MPI, multiple processes are still spawned, which can cause unexpected conflicts that can't be gated by rank because these tests have no rank.

As a side effect of these "nompi" tests running only once, there is a slight reduction in overall test executions.

The "nompi" tests implicitly guarantee that they can't use MPI (they will fail if an attempt is made). The reverse is not true, although it's largely inconsequential to spin up VT/MPI in the ParallelTestHarness and then not use it.

pnstickne · 2020-08-20T06:24:51Z

tests/CMakeLists.txt

+      set_tests_properties(
+        ${${CUR_TEST_LIST}}
+        PROPERTIES TIMEOUT 60
+        FAIL_REGULAR_EXPRESSION "FAILED;should be deleted but never is;Segmentation fault"


I always wonder about that line..

codecov · 2020-08-20T06:36:39Z

Codecov Report

Merging #998 into develop will decrease coverage by 0.02%.
The diff coverage is 88.23%.

@@             Coverage Diff             @@
##           develop     #998      +/-   ##
===========================================
- Coverage    77.43%   77.41%   -0.03%     
===========================================
  Files          660      656       -4     
  Lines        25204    25130      -74     
===========================================
- Hits         19517    19454      -63     
+ Misses        5687     5676      -11

Impacted Files	Coverage Δ
src/vt/vrt/collection/balance/read_lb.h	`58.82% <ø> (ø)`
tests/unit/epoch/test_epoch.nompi.cc	`100.00% <ø> (ø)`
tests/unit/index/test_index.nompi.cc	`100.00% <ø> (ø)`
tests/unit/index/test_index_linearization.nompi.cc	`100.00% <ø> (ø)`
tests/unit/mapping/test_mapping.nompi.cc	`100.00% <ø> (ø)`
tests/unit/tls/test_tls.nompi.cc	`100.00% <ø> (ø)`
tests/unit/utils/test_demangler_utils.nompi.cc	`100.00% <ø> (ø)`
tests/unit/utils/test_histogram_approx.nompi.cc	`100.00% <ø> (ø)`
tests/unit/utils/test_linear_regression.nompi.cc	`100.00% <ø> (ø)`
tests/unit/utils/test_safe_union.nompi.cc	`96.92% <ø> (ø)`
... and 13 more

tests/unit/lb/test_lb_reader.nompi.cc

cz4rs · 2020-08-20T07:33:55Z

I have switched test_lb_reader.cc to the parallel harness to get proper initialization before running the test.

When you include the TestHarness header instead, you start getting segfaults, because theConfig() is used directly here, both by the tested code and in the test setup (getting the LB file name, debug printing).
This means that in order to ditch the parallel harness, we have to get the theConfig and theContext components properly initialized in some other way.

sample stacktrace:

Program received signal SIGSEGV, Segmentation fault.
0x0000555555fea37c in vt::theConfig () at ../src/vt/runtime/runtime_get.cc:136
136     vt::arguments::AppConfig*   theConfig()             { return &CUR_RT->theArgConfig->config_;      }
(gdb) bt
#0  0x0000555555fea37c in vt::theConfig () at ../src/vt/runtime/runtime_get.cc:136
#1  0x0000555555f97407 in vt::tests::unit::TestLBReader_test_lb_read_1_Test::TestBody (this=0x555556a97d50) at ../tests/unit/lb/test_lb_reader.cc:98
#2  0x0000555555fd633b in testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void> (object=0x555556a97d50, method=&virtual testing::Test::TestBody(), 
    location=0x555556660ff3 "the test body") at ../tests/extern/googletest/googletest/src/gtest.cc:2433
#3  0x0000555555fcee51 in testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void> (object=0x555556a97d50, method=&virtual testing::Test::TestBody(), 
    location=0x555556660ff3 "the test body") at ../tests/extern/googletest/googletest/src/gtest.cc:2469
#4  0x0000555555fa9c8c in testing::Test::Run (this=0x555556a97d50) at ../tests/extern/googletest/googletest/src/gtest.cc:2508
#5  0x0000555555faa677 in testing::TestInfo::Run (this=0x555556a85fa0) at ../tests/extern/googletest/googletest/src/gtest.cc:2684
#6  0x0000555555faadb2 in testing::TestSuite::Run (this=0x555556a86430) at ../tests/extern/googletest/googletest/src/gtest.cc:2816
#7  0x0000555555fb6e6e in testing::internal::UnitTestImpl::RunAllTests (this=0x555556a860f0) at ../tests/extern/googletest/googletest/src/gtest.cc:5338
#8  0x0000555555fd7864 in testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> (object=0x555556a860f0, 
    method=(bool (testing::internal::UnitTestImpl::*)(testing::internal::UnitTestImpl * const)) 0x555555fb6a66 <testing::internal::UnitTestImpl::RunAllTests()>, 
    location=0x555556661a30 "auxiliary test code (environments or event listeners)") at ../tests/extern/googletest/googletest/src/gtest.cc:2433
#9  0x0000555555fd008f in testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> (object=0x555556a860f0, 
    method=(bool (testing::internal::UnitTestImpl::*)(testing::internal::UnitTestImpl * const)) 0x555555fb6a66 <testing::internal::UnitTestImpl::RunAllTests()>, 
    location=0x555556661a30 "auxiliary test code (environments or event listeners)") at ../tests/extern/googletest/googletest/src/gtest.cc:2469
#10 0x0000555555fb56ae in testing::UnitTest::Run (this=0x555556a70e40 <testing::UnitTest::GetInstance()::instance>) at ../tests/extern/googletest/googletest/src/gtest.cc:4925
#11 0x0000555555f77623 in RUN_ALL_TESTS () at ../tests/extern/googletest/googletest/include/gtest/gtest.h:2473
#12 0x0000555555f76967 in main (argc=1, argv=0x7fffffffda98) at ../tests/unit/main.cc:70

cz4rs · 2020-08-20T10:14:38Z

@pnstickne
some more thoughts on test_lb_reader.cc:
the LB reader tests have been unstable for some time (I believe this is independent from changing the test harness, although it could have made the issue worse)

I have run them locally a couple of times and they fail quite regularly because of some file reading problems:

vt: [2] ------------------------------------------------------------------------------------------------------------------------
vt: [2] ------------------------------------------- Runtime Error: System Aborting! --------------------------------------------
vt: [2] ------------------------------------------------ Fatal Error on Node 2 -------------------------------------------------
vt: [2] ------------------------------------------------------------------------------------------------------------------------
vt: [2] 
vt: [2]              Reason: Valid LB not found: "name="

vt: [2]                Type: vtAbort() Invoked
vt: [2]                Node: 2
vt: [2]           Num Nodes: 4
vt: [2]                File: ../src/vt/vrt/collection/balance/read_lb.cc
vt: [2]                Line: 211
vt: [2]            Function: readFile

I think this could be extracted into a separate issue with 2 goals:

- remove parallel harness and make this a nompi test
- improve file handling in the test code (using unique names / temporary files as suggested by Phil during weekly meeting)
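
The unique-name suggestion could be sketched roughly as follows. This is a minimal illustration using only the C++17 standard library, not code from VT; `makeUniqueTestPath` and `ScopedFile` are hypothetical names, and the random component only makes collisions between concurrent processes unlikely rather than impossible.

```cpp
#include <atomic>
#include <cassert>
#include <filesystem>
#include <random>
#include <string>
#include <system_error>
#include <utility>

namespace fs = std::filesystem;

// Hypothetical helper: builds a path under the system temp directory that is
// unlikely to collide between test processes running at the same time.
fs::path makeUniqueTestPath(std::string const& stem) {
  assert(!stem.empty() && "a non-empty stem keeps the file recognizable");
  static std::atomic<unsigned> counter{0};  // distinguishes calls within one process
  std::random_device rd;                    // distinguishes processes (probabilistically)
  std::string name = stem + "." + std::to_string(rd()) + "." +
                     std::to_string(counter.fetch_add(1)) + ".txt";
  return fs::temp_directory_path() / name;
}

// Hypothetical RAII guard: removes the file (if it exists) when the test ends.
struct ScopedFile {
  explicit ScopedFile(fs::path p) : path(std::move(p)) {}
  ~ScopedFile() {
    std::error_code ec;    // non-throwing overload; a missing file is not an error
    fs::remove(path, ec);
  }
  ScopedFile(ScopedFile const&) = delete;
  ScopedFile& operator=(ScopedFile const&) = delete;

  fs::path path;
};
```

A test would construct a `ScopedFile` from `makeUniqueTestPath("lb_spec")`, write its spec contents to `path`, and pass that path to the code under test, so that no two ranks or test cases share a file.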

pnstickne · 2020-08-21T16:32:59Z

> I think this could be extracted into a separate issue with 2 goals:
>
> - remove parallel harness and make this a nompi test
> - improve file handling in the test code (using unique names / temporary files as suggested by Phil during weekly meeting)

Removing the dependencies is the 'proper' method. This test never has a reason to be run in parallel or under MPI.

There were no usages of theConfig on local branches. It should probably be possible to stub that in tests without a dependency.

pnstickne · 2020-08-21T22:53:53Z

> I have switched test_lb_reader.cc to parallel harness to get proper initialization before running the test

The debug_ support now uses a preConfig (much like preNode) which is guaranteed to never be invalid. This allows vtAssert, debug_print, etc. to appear in code that can be used from a pure unit test without causing a SEGV.

Also updated the LB reading to have no external theConfig dependency.
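
To illustrate the difference between an accessor that dereferences the runtime unconditionally (as in the stack trace earlier in this thread) and one that is always valid, here is a minimal self-contained sketch. All names in it (`AppConfig`, `Runtime`, `cur_rt`, `theConfigUnsafe`, `preConfigSketch`) are hypothetical stand-ins, not the actual VT types or functions.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration only; these are not the VT types.
struct AppConfig {
  bool debug_all = false;  // defaults are "flags off"
};

struct Runtime {
  AppConfig config;
};

// Stays null until a runtime has been initialized, as in a pure unit test.
static Runtime* cur_rt = nullptr;

// Unsafe accessor: requires an initialized runtime. Without the assert,
// calling this with a null cur_rt would dereference a null pointer.
AppConfig* theConfigUnsafe() {
  assert(cur_rt != nullptr && "runtime must be initialized first");
  return &cur_rt->config;
}

// Accessor in the spirit of a "pre" config: it always returns a valid object,
// falling back to defaults (everything off) when no runtime exists yet.
AppConfig const* preConfigSketch() {
  static AppConfig const defaults{};
  return cur_rt != nullptr ? &cur_rt->config : &defaults;
}
```

With this shape, debug or assertion code can call the safe accessor from a plain unit test; it simply sees every flag disabled until a runtime is installed.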

pnstickne · 2020-08-23T19:08:34Z

src/vt/runtime/runtime_get.cc

-}
+} /* end namespace vt */
+
+namespace vt { namespace debug {


This is only meant to be used from the debug/diagnostic code, hence the vt::debug namespace. It also aligns with vt::debug::preNode. Only a single accessor is exposed, which frees the individual use sites from carrying that logic themselves.

pnstickne · 2020-08-23T19:09:51Z

src/vt/runtime/runtime_get.cc

 }

-} /* end namespace runtime */
-
-#undef CUR_RT


The #undef is not relevant in a .cc file.

src/vt/vrt/collection/balance/baselb/baselb.cc

src/vt/vrt/collection/balance/read_lb.cc

src/vt/vrt/collection/balance/read_lb.h

cz4rs

the builds look fine and the PR looks good overall, only minor comments to address

- Tests that use TestHarness (as opposed to TestParallelHarness) should be in files with a '.nompi' somewhere in their extension. These tests will be invoked as direct executables without going through mpirun (and thus should not make any MPI calls). The .nompi tests are separated out in CMake.

- Move some existing tests over to this scheme.

- The spec API has changed slightly to increase encapsulation. The 'openSpec(filename)' call now replaces disparate calls to read/check the specification and avoids any internal assumptions about which file is read.

- The preConfig method is like the preNode method. It is SAFE to use without a run-time, freeing any debug print from being explicitly tied to the RT. (However, the defaults are 'flags off', so debugging is effectively entirely disabled until the VT init process starts.)

- Small changes for clarity.
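
The naming convention in the first item can be made concrete with a small classifier. This is only an illustration of the rule in C++; in the PR the separation is done in CMake, and `isNoMpiTestFile` is a hypothetical name that does not appear in the code base.

```cpp
#include <cassert>
#include <string>

// Illustrative only: returns true when "nompi" appears as a dot-separated
// component of the file's extension, e.g. "test_epoch.nompi.cc".
bool isNoMpiTestFile(std::string const& path) {
  assert(!path.empty() && "expected a file path");

  // Drop any leading directories so that directory names are ignored.
  auto slash = path.find_last_of('/');
  std::string name =
    slash == std::string::npos ? path : path.substr(slash + 1);

  // Treat everything from the first dot onward as the extension.
  auto dot = name.find('.');
  if (dot == std::string::npos) {
    return false;
  }

  // Append a trailing dot so that ".nompi" matches only as a whole component.
  std::string ext = name.substr(dot) + ".";
  return ext.find(".nompi.") != std::string::npos;
}
```

Under this rule `tests/unit/epoch/test_epoch.nompi.cc` is classified as a non-MPI test, while `tests/unit/lb/test_lb_reader.cc` is not.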

lifflander · 2020-08-25T18:48:26Z

Here is an overview of what got changed by this pull request:

Clones added
============
- tests/unit/lb/test_lb_reader.nompi.cc  2

See the complete overview on Codacy

pnstickne changed the title from "990 nompi tests" to "990 - Non-MPI tests run without MPI" Aug 20, 2020

pnstickne requested review from lifflander, bradybray, cz4rs, fnrizzi and JacobDomagala August 20, 2020 06:22

pnstickne commented Aug 20, 2020

pnstickne marked this pull request as ready for review August 20, 2020 06:47

pnstickne force-pushed the 990-nompi-tests branch from 62d2dcb to 9dedbdd Compare August 20, 2020 07:07

cz4rs reviewed Aug 20, 2020

pnstickne force-pushed the 990-nompi-tests branch 2 times, most recently from 08d6616 to 633cfe8 Compare August 21, 2020 22:33

pnstickne force-pushed the 990-nompi-tests branch 2 times, most recently from 6b65305 to 565be76 Compare August 21, 2020 23:16

pnstickne requested a review from cz4rs August 22, 2020 00:08

bradybray reviewed Aug 23, 2020

cz4rs reviewed Aug 24, 2020

cz4rs requested changes Aug 24, 2020

pnstickne requested review from cz4rs and bradybray August 25, 2020 07:11

cz4rs approved these changes Aug 25, 2020

JacobDomagala mentioned this pull request Aug 25, 2020

959 Add more tests for various load models Part 1 #1001

Merged

4 tasks

PhilMiller approved these changes Aug 25, 2020

JacobDomagala approved these changes Aug 25, 2020

pnstickne added 5 commits August 25, 2020 11:45

#990 tests- move files to 'nompi' as appropriate

74d9d9f

- Move some existing tests over to this scheme.

#990 lb reader- remove theConfig dep and avoid ParallelTestHarness

e99ca28

- The spec API has changed slightly to increase encapsulation. The 'openSpec(filename)' call now replaces disparate calls to read/check the specification and avoids any internal assumptions about which file is read.

#990 lb reader- update code per PR

83b1f75

- Small changes for clarity.

pnstickne force-pushed the 990-nompi-tests branch from 545586f to 83b1f75 Compare August 25, 2020 18:46

lifflander mentioned this pull request Aug 25, 2020

Meeting Agenda [do not close] #925

Open

PhilMiller merged commit 306d713 into develop Aug 25, 2020

PhilMiller mentioned this pull request Aug 25, 2020

Don't use MPI to launch non-parallel tests #990

Closed

PhilMiller linked an issue Aug 25, 2020 that may be closed by this pull request

Don't use MPI to launch non-parallel tests #990

Closed

cz4rs mentioned this pull request Jun 15, 2021

move vt version number into a separate file #1467

Closed

7 tasks
