Describe the bug
I've reproduced this in two contexts: in the Docker container (GNU gcc-7, debug) and on my Mac (clang-5).
Run this to reproduce:
ctest -I 157,158 --repeat-until-fail 1000 --output-on-failure .
Assertion that breaks:
vt: [0] lb: LBManager::releaseNow: finished LB, phase=3, invocations=1
vt: [0] lb: BaseLB: Statistic=P_l: max=5.10, min=4.55, sum=19.24, avg=4.81, var=0.04, stdev=0.20, nproc=4, cardinality=4 skewness=0.17, kurtosis=-1.87, npr=4, imb=0.06, num_stats=1
vt: [0] lb: BaseLB: Statistic=O_l: max=0.001, min=0.000, sum=0.02, avg=0.000, var=0.000, stdev=0.000, nproc=64, cardinality=64 skewness=0.02, kurtosis=-1.25, npr=64, imb=1.06, num_stats=2
vt: [0] lb: loadStats: load=4.55, total=19.24, avg=4.81, I=0.06,should_lb=true, auto=true, threshold=0.9390901317338556
vt: [1] ------------------------------------------------------------------------------------------------------------------------
vt: [1] ------------------------------------------- Runtime Error: System Aborting! --------------------------------------------
vt: [1] ------------------------------------------------ Fatal Error on Node 1 -------------------------------------------------
vt: [1] ------------------------------------------------------------------------------------------------------------------------
vt: [1]
vt: [1] Reason: Must have object
vt: [1] Assertion failed: (theProcStats()->hasObjectToMigrate(obj_id))
vt: [1] Node: 1
vt: [1] Num Nodes: 4
vt: [1] File: /vt/src/vt/vrt/collection/balance/baselb/baselb.cc
vt: [1] Line: 230
vt: [1] Function: transferMigrations
vt: [1] Code: 1
vt: [1] Build SHA: 181e188d3fca91bab0a2d0efc765d8366031e5da
vt: [1] Build Ref: refs/heads/develop
vt: [1] Description: heads/develop-0-g181e188d3f
vt: [1] GIT Repo: *dirty*
vt: [1] Hostname: 41fe2b81da16
vt: [1]
vt: [2] ------------------------------------------------------------------------------------------------------------------------
vt: [2] ------------------------------------------- Runtime Error: System Aborting! --------------------------------------------
vt: [2] ------------------------------------------------ Fatal Error on Node 2 -------------------------------------------------
vt: [2] ------------------------------------------------------------------------------------------------------------------------
vt: [2]
vt: [2] Reason: Must have object
vt: [2] Assertion failed: (theProcStats()->hasObjectToMigrate(obj_id))
vt: [2] Node: 2
vt: [2] Num Nodes: 4
vt: [2] File: /vt/src/vt/vrt/collection/balance/baselb/baselb.cc
vt: [2] Line: 230
vt: [2] Function: transferMigrations
vt: [2] Code: 1
vt: [2] Build SHA: 181e188d3fca91bab0a2d0efc765d8366031e5da
vt: [2] Build Ref: refs/heads/develop
vt: [2] Description: heads/develop-0-g181e188d3f
vt: [2] GIT Repo: *dirty*
vt: [2] Hostname: 41fe2b81da16
vt: [2]
vt: [3] ------------------------------------------------------------------------------------------------------------------------
vt: [3] ------------------------------------------- Runtime Error: System Aborting! --------------------------------------------
vt: [3] ------------------------------------------------ Fatal Error on Node 3 -------------------------------------------------
vt: [3] ------------------------------------------------------------------------------------------------------------------------
vt: [3]
vt: [3] Reason: Must have object
vt: [3] Assertion failed: (theProcStats()->hasObjectToMigrate(obj_id))
vt: [3] Node: 3
vt: [3] Num Nodes: 4
vt: [3] File: /vt/src/vt/vrt/collection/balance/baselb/baselb.cc
vt: [3] Line: 230
vt: [3] Function: transferMigrations
vt: [3] Code: 1
vt: [3] Build SHA: 181e188d3fca91bab0a2d0efc765d8366031e5da
vt: [3] Build Ref: refs/heads/develop
vt: [3] Description: heads/develop-0-g181e188d3f
vt: [3] GIT Repo: *dirty*
vt: [3] Hostname: 41fe2b81da16
vt: [3]
vt: [3] ------------------------------------------------------------------------------------------------------------------------
vt: [3] -------------------------------------------- Dump Stack Backtrace on Node 3 --------------------------------------------
vt: [3] ------------------------------------------------------------------------------------------------------------------------
vt: [3] 0 18 0x55be2ff00548 vt::debug::stack::dumpStack[abi:cxx11](int) + 83
vt: [3] 1 18 0x55be2fb00c98 vt::runtime::Runtime::output(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int, bool, bool, bool) + 1868
vt: [3] 2 18 0x55be2f99e3cf vt::CollectiveAnyOps<(vt::runtime::eRuntimeInstance)0>::output(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int, bool, bool, bool, bool) + 209
vt: [3] 3 18 0x55be2f99d163 vt::output(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int, bool, bool, bool, bool) + 143
vt: [3] 4 18 0x55be2f78b85f std::enable_if<std::tuple_size<std::tuple<> >::value==(0), void>::type vt::debug::assert::assertOut<>(bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int, std::tuple<>&&) + 359
vt: [3] 5 18 0x55be30102aae vt::vrt::collection::lb::BaseLB::transferMigrations(vt::vrt::collection::lb::TransferMsg<std::vector<std::tuple<unsigned long, short>, std::allocator<std::tuple<unsigned long, short> > > >*) + 682
vt: [3] 6 18 0x55be2fcfe1e6 vt::objgroup::dispatch::Dispatch<vt::vrt::collection::lb::BaseLB>::run(long, vt::messaging::BaseMsg*) + 920
vt: [3] 7 18 0x55be2fd137b6 vt::objgroup::ObjGroupManager::dispatch(vt::messaging::MsgSharedPtr<vt::messaging::ActiveMsg<vt::messaging::ActiveEnvelope> >, long) + 860
vt: [3] 8 18 0x55be2fd142c8 vt::objgroup::dispatchObjGroup(vt::messaging::MsgSharedPtr<vt::messaging::ActiveMsg<vt::messaging::ActiveEnvelope> >, long) + 150
vt: [3] 9 18 0x55be2f7fcb1f vt::runnable::Runnable<vt::messaging::ActiveMsg<vt::messaging::ActiveEnvelope> >::runObj(long, vt::messaging::ActiveMsg<vt::messaging::ActiveEnvelope>*, short) + 725
vt: [3] 10 18 0x55be2fd143ab ./test_lb_extended(+0x1d393ab) [0x55be2fd143ab] + 0
vt: [3] 11 18 0x55be2fd147bf ./test_lb_extended(+0x1d397bf) [0x55be2fd147bf] + 0
vt: [3] 12 18 0x55be2f79136b std::function<void ()>::operator()() const + 77
vt: [3] 13 18 0x55be2feaee2d vt::sched::PriorityUnit::execute() + 467
vt: [3] 14 18 0x55be2feaec4d vt::sched::PriorityUnit::operator()() + 33
vt: [3] 15 18 0x55be2fea803f vt::sched::Scheduler::runWorkUnit(vt::sched::PriorityUnit&) + 691
vt: [3] 16 18 0x55be2fea8a1e vt::sched::Scheduler::scheduler(bool) + 566
vt: [3] 17 18 0x55be2fea8f75 vt::sched::Scheduler::runSchedulerWhile(std::function<bool ()>) + 845
vt: [3] 18 18 0x55be2feaa06b vt::runSchedulerThrough(unsigned long) + 145
vt: [3] 19 18 0x55be2feaa4f1 vt::runInEpochCollective(std::function<void ()>&&) + 437
vt: [3] 20 18 0x55be2fcc379c void vt::vrt::collection::balance::LBManager::makeLB<vt::vrt::collection::lb::GreedyLB>(vt::messaging::MsgSharedPtr<vt::vrt::collection::balance::StartLBMsg>) + 702
vt: [3] 21 18 0x55be2fc98fd0 vt::vrt::collection::balance::LBManager::collectiveImpl(unsigned long, vt::vrt::collection::balance::LBType, bool, unsigned long) + 738
vt: [3] 22 18 0x55be2f86c742 void vt::vrt::collection::balance::LBManager::sysLB<vt::vrt::collection::balance::InvokeBaseMsg<vt::collective::reduce::operators::ReduceTMsg<char> > >(vt::vrt::collection::balance::InvokeBaseMsg<vt::collective::reduce::operators::ReduceTMsg<char> >*) + 214
vt: [3] 23 18 0x55be2fcfe7cc vt::objgroup::dispatch::Dispatch<vt::vrt::collection::balance::LBManager>::run(long, vt::messaging::BaseMsg*) + 920
vt: [3] 24 18 0x55be2fd137b6 vt::objgroup::ObjGroupManager::dispatch(vt::messaging::MsgSharedPtr<vt::messaging::ActiveMsg<vt::messaging::ActiveEnvelope> >, long) + 860
vt: [3] 25 18 0x55be2fd142c8 vt::objgroup::dispatchObjGroup(vt::messaging::MsgSharedPtr<vt::messaging::ActiveMsg<vt::messaging::ActiveEnvelope> >, long) + 150
vt: [3] 26 18 0x55be2f7fcb1f vt::runnable::Runnable<vt::messaging::ActiveMsg<vt::messaging::ActiveEnvelope> >::runObj(long, vt::messaging::ActiveMsg<vt::messaging::ActiveEnvelope>*, short) + 725
vt: [3] 27 18 0x55be2f7e8924 vt::runnable::Runnable<vt::messaging::ActiveMsg<vt::messaging::ActiveEnvelope> >::run(long, void (*)(vt::messaging::BaseMsg*), vt::messaging::ActiveMsg<vt::messaging::ActiveEnvelope>*, short, int) + 144
vt: [3] 28 18 0x55be2fd4e7b3 vt::messaging::ActiveMessenger::deliverActiveMsg(vt::messaging::MsgSharedPtr<vt::messaging::ActiveMsg<vt::messaging::ActiveEnvelope> > const&, short const&, bool, std::function<void ()>) + 1821
vt: [3] 29 18 0x55be2fd4dfa6 vt::messaging::ActiveMessenger::processActiveMsg(vt::messaging::MsgSharedPtr<vt::messaging::ActiveMsg<vt::messaging::ActiveEnvelope> > const&, short const&, int const&, bool, std::function<void ()>) + 476
vt: [3] 30 18 0x55be2fd4d853 ./test_lb_extended(+0x1d72853) [0x55be2fd4d853] + 0
vt: [3] 31 18 0x55be2fd51a52 ./test_lb_extended(+0x1d76a52) [0x55be2fd51a52] + 0
vt: [3] 32 18 0x55be2f79136b std::function<void ()>::operator()() const + 77
vt: [3] 33 18 0x55be2feaee2d vt::sched::PriorityUnit::execute() + 467
vt: [3] 34 18 0x55be2feaec4d vt::sched::PriorityUnit::operator()() + 33
vt: [3] 35 18 0x55be2fea803f vt::sched::Scheduler::runWorkUnit(vt::sched::PriorityUnit&) + 691
vt: [3] 36 18 0x55be2fea8a1e vt::sched::Scheduler::scheduler(bool) + 566
vt: [3] 37 18 0x55be2fea8f75 vt::sched::Scheduler::runSchedulerWhile(std::function<bool ()>) + 845
vt: [3] 38 18 0x55be2fc99abd vt::vrt::collection::balance::LBManager::waitLBCollective() + 181
vt: [3] 39 18 0x55be2fbe34f6 vt::vrt::collection::CollectionManager::startPhaseCollective(std::function<void ()>, unsigned long) + 196
vt: [3] 40 18 0x55be2f6a2ef4 ./test_lb_extended(+0x16c7ef4) [0x55be2f6a2ef4] + 0
vt: [3] 41 18 0x55be2f6a4054 ./test_lb_extended(+0x16c9054) [0x55be2f6a4054] + 0
vt: [3] 42 18 0x55be2f79136b std::function<void ()>::operator()() const + 77
vt: [3] 43 18 0x55be2feaa444 vt::runInEpochCollective(std::function<void ()>&&) + 264
vt: [3] 44 18 0x55be2f6a3232 vt::tests::unit::TestLoadBalancer_test_load_balancer_1_Test::TestBody() + 726
vt: [3] 45 18 0x55be2f90cefb void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) + 101
vt: [3] 46 18 0x55be2f906ef7 void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) + 90
vt: [3] 47 18 0x55be2f8e41d4 testing::Test::Run() + 238
vt: [3] 48 18 0x55be2f8e4b59 testing::TestInfo::Run() + 271
vt: [3] 49 18 0x55be2f8e524f testing::TestSuite::Run() + 297
vt: [3] 50 18 0x55be2f8f0c61 testing::internal::UnitTestImpl::RunAllTests() + 1029
vt: [3] 51 18 0x55be2f90dff3 bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) + 101
vt: [3] 52 18 0x55be2f907dd3 bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) + 90
vt: [3] 53 18 0x55be2f8ef55a testing::UnitTest::Run() + 192
vt: [3] 54 18 0x55be2f677c1e RUN_ALL_TESTS() + 35
vt: [3] 55 18 0x55be2f6769aa main + 109
vt: [3] 56 18 0x7fe93d5a6b97 __libc_start_main + 231
vt: [3] 57 18 0x55be2f6761aa _start + 42
vt: [3] ------------------------------------------------------------------------------------------------------------------------
This is causing test failures on develop regularly now.
Seen again in https://github.com/DARMA-tasking/vt/pull/1013/checks?check_run_id=1418863757, though the assertion output is different.
Seeing this again in https://dev.azure.com/DARMA-tasking/DARMA/_build/results?buildId=13207&view=logs&j=3dc8fd7e-4368-5a92-293e-d53cefc8c4b3&t=28db5144-7e5d-5c90-2820-8676d630d9d2&l=2376
This is fixed. YAY