Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

990 - Non-MPI tests run without MPI #998

Merged
merged 5 commits into from
Aug 25, 2020
Merged

990 - Non-MPI tests run without MPI #998

merged 5 commits into from
Aug 25, 2020

Conversation

pnstickne
Copy link
Contributor

@pnstickne pnstickne commented Aug 20, 2020

Tests that use the TestHarness (not the ParallelTestHarness) don't / shouldn't need or use MPI.

When run under MPI, multiple processes are still spawned which can result in unexpected conflicts that can't be gated via rank because.. well, there is no rank.

As a side-effect of these "nompi" tests only running once, there is a slight reduction of test executions overall.

The "nompi" tests implicitly guarantee that they can't use MPI (as they will fail if an attempt is made). The reverse is no true, although it's largely inconsequential to spin up VT/MPI in the ParallelTestHarness and then not use it.

@pnstickne pnstickne changed the title 990 nompi tests 990 - Non-MPI tests run without MPI Aug 20, 2020
set_tests_properties(
${${CUR_TEST_LIST}}
PROPERTIES TIMEOUT 60
FAIL_REGULAR_EXPRESSION "FAILED;should be deleted but never is;Segmentation fault"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I always wonder about that line..

@codecov
Copy link

codecov bot commented Aug 20, 2020

Codecov Report

Merging #998 into develop will decrease coverage by 0.02%.
The diff coverage is 88.23%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop     #998      +/-   ##
===========================================
- Coverage    77.43%   77.41%   -0.03%     
===========================================
  Files          660      656       -4     
  Lines        25204    25130      -74     
===========================================
- Hits         19517    19454      -63     
+ Misses        5687     5676      -11     
Impacted Files Coverage Δ
src/vt/vrt/collection/balance/read_lb.h 58.82% <ø> (ø)
tests/unit/epoch/test_epoch.nompi.cc 100.00% <ø> (ø)
tests/unit/index/test_index.nompi.cc 100.00% <ø> (ø)
tests/unit/index/test_index_linearization.nompi.cc 100.00% <ø> (ø)
tests/unit/mapping/test_mapping.nompi.cc 100.00% <ø> (ø)
tests/unit/tls/test_tls.nompi.cc 100.00% <ø> (ø)
tests/unit/utils/test_demangler_utils.nompi.cc 100.00% <ø> (ø)
tests/unit/utils/test_histogram_approx.nompi.cc 100.00% <ø> (ø)
tests/unit/utils/test_linear_regression.nompi.cc 100.00% <ø> (ø)
tests/unit/utils/test_safe_union.nompi.cc 96.92% <ø> (ø)
... and 13 more

@pnstickne pnstickne marked this pull request as ready for review August 20, 2020 06:47
@cz4rs
Copy link
Contributor

cz4rs commented Aug 20, 2020

I have switched test_lb_reader.cc to parallel harness to get proper initialization before running the test

  • when you include TestHarness header, you will start getting segfaults - it's because theConfig() is used directly here both by tested code and in test setup (getting LB file name, debug printing)
  • this means that in order to ditch parallel harness, we have to get theConfig and theContext components properly initialized in some other way

sample stacktrace:

Program received signal SIGSEGV, Segmentation fault.
0x0000555555fea37c in vt::theConfig () at ../src/vt/runtime/runtime_get.cc:136
136     vt::arguments::AppConfig*   theConfig()             { return &CUR_RT->theArgConfig->config_;      }
(gdb) bt
#0  0x0000555555fea37c in vt::theConfig () at ../src/vt/runtime/runtime_get.cc:136
#1  0x0000555555f97407 in vt::tests::unit::TestLBReader_test_lb_read_1_Test::TestBody (this=0x555556a97d50) at ../tests/unit/lb/test_lb_reader.cc:98
#2  0x0000555555fd633b in testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void> (object=0x555556a97d50, method=&virtual testing::Test::TestBody(), 
    location=0x555556660ff3 "the test body") at ../tests/extern/googletest/googletest/src/gtest.cc:2433
#3  0x0000555555fcee51 in testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void> (object=0x555556a97d50, method=&virtual testing::Test::TestBody(), 
    location=0x555556660ff3 "the test body") at ../tests/extern/googletest/googletest/src/gtest.cc:2469
#4  0x0000555555fa9c8c in testing::Test::Run (this=0x555556a97d50) at ../tests/extern/googletest/googletest/src/gtest.cc:2508
#5  0x0000555555faa677 in testing::TestInfo::Run (this=0x555556a85fa0) at ../tests/extern/googletest/googletest/src/gtest.cc:2684
#6  0x0000555555faadb2 in testing::TestSuite::Run (this=0x555556a86430) at ../tests/extern/googletest/googletest/src/gtest.cc:2816
#7  0x0000555555fb6e6e in testing::internal::UnitTestImpl::RunAllTests (this=0x555556a860f0) at ../tests/extern/googletest/googletest/src/gtest.cc:5338
#8  0x0000555555fd7864 in testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> (object=0x555556a860f0, 
    method=(bool (testing::internal::UnitTestImpl::*)(testing::internal::UnitTestImpl * const)) 0x555555fb6a66 <testing::internal::UnitTestImpl::RunAllTests()>, 
    location=0x555556661a30 "auxiliary test code (environments or event listeners)") at ../tests/extern/googletest/googletest/src/gtest.cc:2433
#9  0x0000555555fd008f in testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> (object=0x555556a860f0, 
    method=(bool (testing::internal::UnitTestImpl::*)(testing::internal::UnitTestImpl * const)) 0x555555fb6a66 <testing::internal::UnitTestImpl::RunAllTests()>, 
    location=0x555556661a30 "auxiliary test code (environments or event listeners)") at ../tests/extern/googletest/googletest/src/gtest.cc:2469
#10 0x0000555555fb56ae in testing::UnitTest::Run (this=0x555556a70e40 <testing::UnitTest::GetInstance()::instance>) at ../tests/extern/googletest/googletest/src/gtest.cc:4925
#11 0x0000555555f77623 in RUN_ALL_TESTS () at ../tests/extern/googletest/googletest/include/gtest/gtest.h:2473
#12 0x0000555555f76967 in main (argc=1, argv=0x7fffffffda98) at ../tests/unit/main.cc:70

@cz4rs
Copy link
Contributor

cz4rs commented Aug 20, 2020

@pnstickne
some more thoughts on test_lb_reader.cc:
the LB reader tests have been unstable for some time (I believe this is independent from changing the test harness, although it could have made the issue worse)

I have run them locally a couple of times and they fail quite regularly because of some file reading problems:

vt: [2] ------------------------------------------------------------------------------------------------------------------------
vt: [2] ------------------------------------------- Runtime Error: System Aborting! --------------------------------------------
vt: [2] ------------------------------------------------ Fatal Error on Node 2 -------------------------------------------------
vt: [2] ------------------------------------------------------------------------------------------------------------------------
vt: [2] 
vt: [2]              Reason: Valid LB not found: "name="

vt: [2]                Type: vtAbort() Invoked
vt: [2]                Node: 2
vt: [2]           Num Nodes: 4
vt: [2]                File: ../src/vt/vrt/collection/balance/read_lb.cc
vt: [2]                Line: 211
vt: [2]            Function: readFile

I think this could be extracted into a separate issue with 2 goals:

  • remove parallel harness and make this a nompi test
  • improve file handling in the test code (using unique names / temporary files as suggested by Phil during weekly meeting)

@pnstickne
Copy link
Contributor Author

pnstickne commented Aug 21, 2020

I think this could be extracted into a separate issue with 2 goals:

  • remove parallel harness and make this a nompi test
  • improve file handling in the test code (using unique names / temporary files as suggested by Phil during weekly meeting)

Removing the dependencies is the 'proper' method. This test has never a reason to be run in parallel or MPI.

Thee was not usages of theConfig on local branches. That should probably be able to be stubbed in tests without a dependency.

@pnstickne pnstickne force-pushed the 990-nompi-tests branch 2 times, most recently from 08d6616 to 633cfe8 Compare August 21, 2020 22:33
@pnstickne
Copy link
Contributor Author

pnstickne commented Aug 21, 2020

I have switched test_lb_reader.cc to parallel harness to get proper initialization before running the test

The debug_ support now uses a preConfig (much like preNode) which is guaranteed to never be invalid. This allows vtAssert, debug_print, etc. to appear in code that can be used from a pure unit test without causing a SEGV.

Also update the LB reading to have no external theConfig dependency.

@pnstickne pnstickne force-pushed the 990-nompi-tests branch 2 times, most recently from 6b65305 to 565be76 Compare August 21, 2020 23:16
@pnstickne pnstickne requested a review from cz4rs August 22, 2020 00:08
}
} /* end namespace vt */

namespace vt { namespace debug {
Copy link
Contributor Author

@pnstickne pnstickne Aug 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only meant to be used from the debug/diagnostic code, hence the vt::debug namespace. This also aligns with vt::debug::preNode. Only single access is exposed, freeing singular-use case sites of logic.

}

} /* end namespace runtime */

#undef CUR_RT
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not relevant to undef in .cc

Copy link
Contributor

@cz4rs cz4rs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the builds look fine and the PR looks good overall, only minor comments to address

@pnstickne pnstickne requested review from cz4rs and bradybray August 25, 2020 07:11
- Tests that use TestHarness (as opposed to TestParallelHarness)
  should be in files with a '.nompi' somewhere in their
  extension.

  These tests will be invoked as direct executables without going
  through mpirun (and thus should not invoke an MPI calls).

separate out .nompi tests in cmake

cmake;
- Move some exist tests over to such.
- The spec API has changed slightly to increase encapsulation.

  The 'openSpec(filename)' call now replaces disparate calls
  to read/check the specification and avoid any internal assumptions
  about which file is read.
- The preConfig method is like the preNode method.
  It is SAFE to use without a run-time, freeing any debug print
  from being explicitly tied to the RT.

  (However, the defaults are 'flags off', so debugging is
   effectively entirely disabled until the VT init process starts.)
- Small changes for clarity.
Copy link
Collaborator

Codacy Here is an overview of what got changed by this pull request:

Clones added
============
- tests/unit/lb/test_lb_reader.nompi.cc  2
         

See the complete overview on Codacy

@PhilMiller PhilMiller merged commit 306d713 into develop Aug 25, 2020
@PhilMiller PhilMiller linked an issue Aug 25, 2020 that may be closed by this pull request
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Don't use MPI to launch non-parallel tests
6 participants