From the log we can see that the leader keeps trying to append logs to the followers without success:
...
W1111 16:22:41.996908 1509746 RaftPart.cpp:944] [Port: 46162, Space: 1, Part: 1] Only 0 hosts succeeded, Need to try again
W1111 16:22:42.013818 1509746 RaftPart.cpp:944] [Port: 46162, Space: 1, Part: 1] Only 0 hosts succeeded, Need to try again
W1111 16:22:42.030689 1509746 RaftPart.cpp:944] [Port: 46162, Space: 1, Part: 1] Only 0 hosts succeeded, Need to try again
W1111 16:22:42.047605 1509746 RaftPart.cpp:944] [Port: 46162, Space: 1, Part: 1] Only 0 hosts succeeded, Need to try again
W1111 16:22:42.064508 1509746 RaftPart.cpp:944] [Port: 46162, Space: 1, Part: 1] Only 0 hosts succeeded, Need to try again
W1111 16:22:42.081408 1509746 RaftPart.cpp:944] [Port: 46162, Space: 1, Part: 1] Only 0 hosts succeeded, Need to try again
W1111 16:22:42.098253 1509746 RaftPart.cpp:944] [Port: 46162, Space: 1, Part: 1] Only 0 hosts succeeded, Need to try again
W1111 16:22:42.115130 1509746 RaftPart.cpp:944] [Port: 46162, Space: 1, Part: 1] Only 0 hosts succeeded, Need to try again
W1111 16:22:42.132002 1509746 RaftPart.cpp:944] [Port: 46162, Space: 1, Part: 1] Only 0 hosts succeeded, Need to try again
W1111 16:22:42.148905 1509746 RaftPart.cpp:944] [Port: 46162, Space: 1, Part: 1] Only 0 hosts succeeded, Need to try again
Part of the leader storage stack; it appears to be blocked at ChainResumeProcessor.cpp:57:
#8 0x0000000002e36992 in folly::Future<folly::Unit>::get()+61 in /data/src/nebula/build/bin/nebula-storaged at Future-inl.h:2323
#9 0x0000000002e362bd in nebula::storage::ChainResumeProcessor::process()+2010 in /data/src/nebula/build/bin/nebula-storaged at ChainResumeProcessor.cpp:57
#10 0x0000000002df7dd1 in nebula::storage::TransactionManager::resumeThread()+94 in /data/src/nebula/build/bin/nebula-storaged at TransactionManager.cpp:87
#11 0x0000000002e1d3e7 in std::__invoke_impl<void, void (nebula::storage::TransactionManager::*&)(), nebula::storage::TransactionManager*&>()+107 in /data/src/nebula/build/bin/nebula-storaged at invoke.h:73
#12 0x0000000002e1d346 in std::__invoke<void (nebula::storage::TransactionManager::*&)(), nebula::storage::TransactionManager*&>()+56 in /data/src/nebula/build/bin/nebula-storaged at invoke.h:95
#13 0x0000000002e1d248 in std::_Bind<void (nebula::storage::TransactionManager::*(nebula::storage::TransactionManager*))()>::__call<void, 0>()+91 in /data/src/nebula/build/bin/nebula-storaged at functional:400
#14 0x0000000002e1cfe7 in std::_Bind<void (nebula::storage::TransactionManager::*(nebula::storage::TransactionManager*))()>::operator()<>()+54 in /data/src/nebula/build/bin/nebula-storaged at functional:484
#15 0x0000000002e1bce2 in std::_Function_handler<void(), std::_Bind<void (nebula::storage::TransactionManager::*(nebula::storage::TransactionManager*))()> >::_M_invoke()+33 in /data/src/nebula/build/bin/nebula-storaged at std_function.h:300
#16 0x0000000002d7c4dc in std::function<void()>::operator()()+53 in /data/src/nebula/build/bin/nebula-storaged at std_function.h:688
#17 0x0000000002dfbbe8 in _ZZN6nebula6thread13GenericWorker12addDelayTaskIMNS_7storage18TransactionManagerEFvvEJPS4_EEENSt9enable_ifIXsrSt7is_voidINSt9result_ofIFT_DpT0_EE4typeEE5valueEN5folly10SemiFutureINSI_4UnitEEEE4typeEmOSB_DpOSC_ENKUlvE_clEv!()+51 in /data/src/nebula/build/bin/nebula-storaged at GenericWorker.h:215
#18 0x0000000002e18274 in std::__invoke_impl<void, nebula::thread::GenericWorker::addDelayTask(size_t, F&&, Args&& ...) [with F = void (nebula::storage::TransactionManager::*)(); Args = {nebula::storage::TransactionManager*}]::<lambda()>&>()+33 in /data/src/nebula/build/bin/nebula-storaged at invoke.h:60
#19 0x0000000002e151c1 in std::__invoke<nebula::thread::GenericWorker::addDelayTask(size_t, F&&, Args&& ...) [with F = void (nebula::storage::TransactionManager::*)(); Args = {nebula::storage::TransactionManager*}]::<lambda()>&>()+33 in /data/src/nebula/build/bin/nebula-storaged at invoke.h:95
#20 0x0000000002e11b80 in std::_Bind<nebula::thread::GenericWorker::addDelayTask(size_t, F&&, Args&& ...) [with F = void (nebula::storage::TransactionManager::*)(); Args = {nebula::storage::TransactionManager*}; typename std::enable_if<std::is_void<typename std::result_of<_Functor(_ArgTypes ...)>::type>::value, folly::SemiFuture<folly::Unit> >::type = folly::SemiFuture<folly::Unit>; size_t = long unsigned int]::<lambda()>()>::__call<void>()+29 in /data/src/nebula/build/bin/nebula-storaged at functional:400
#21 0x0000000002e0ed97 in std::_Bind<nebula::thread::GenericWorker::addDelayTask(size_t, F&&, Args&& ...) [with F = void (nebula::storage::TransactionManager::*)(); Args = {nebula::storage::TransactionManager*}; typename std::enable_if<std::is_void<typename std::result_of<_Functor(_ArgTypes ...)>::type>::value, folly::SemiFuture<folly::Unit> >::type = folly::SemiFuture<folly::Unit>; size_t = long unsigned int]::<lambda()>()>::operator()<>()+54 in /data/src/nebula/build/bin/nebula-storaged at functional:484
#22 0x0000000002e0aa85 in std::_Function_handler<void(), std::_Bind<nebula::thread::GenericWorker::addDelayTask(size_t, F&&, Args&& ...) [with F = void (nebula::storage::TransactionManager::*)(); Args = {nebula::storage::TransactionManager*}; typename std::enable_if<std::is_void<typename std::result_of<_Functor(_ArgTypes ...)>::type>::value, folly::SemiFuture<folly::Unit> >::type = folly::SemiFuture<folly::Unit>; size_t = long unsigned int]::<lambda()>()> >::_M_invoke()+33 in /data/src/nebula/build/bin/nebula-storaged at std_function.h:300
#23 0x0000000002d7c4dc in std::function<void()>::operator()()+53 in /data/src/nebula/build/bin/nebula-storaged at std_function.h:688
Describe the bug (required)
The leader storage instance blocked while exiting, and two follower instances blocked as well.
Your Environments (required)
uname -a
g++ --version
or clang++ --version
lscpu
How To Reproduce (required)
Deploy 5 storage + 1 meta + 1 graph, repeatedly stop and resume the leader, let it run for a while, then stop the cluster. Three storage instances fail to exit.
The leader process runs with very high CPU usage:
From the log we can see that the leader keeps trying to append logs to the followers without success:
Part of the leader storage stack; it appears to be blocked at ChainResumeProcessor.cpp:57:
ChainResumeProcessor.cpp:57
https://github.com/critical27/nebula/blob/74ceaeae356233dfdac044993f950f38c3037f5b/src/storage/transaction/ChainResumeProcessor.cpp#L45-L62
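Frame #8 in the stack above is an untimed `folly::Future<folly::Unit>::get()`. If the future it waits on never completes (for example because the raft append keeps failing), that thread cannot notice a shutdown request. The sketch below illustrates the general idea of a shutdown-aware wait. It uses `std::future` as a stand-in so it compiles on its own; it is not the Nebula code and not a proposed patch, and `waitUnlessStopping` and the `stopping` flag are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <future>

// Hypothetical helper (not part of Nebula or folly): wait for `fut` in short
// slices instead of one unbounded get(), and give up once `stopping` is set.
// Returns true if the future is ready, false if shutdown was requested first.
// Assumes `fut` comes from a std::promise or std::async(std::launch::async, ...);
// a deferred future never reports "ready" from wait_for.
template <typename T>
bool waitUnlessStopping(std::future<T>& fut,
                        const std::atomic<bool>& stopping,
                        std::chrono::milliseconds slice = std::chrono::milliseconds(50)) {
  assert(slice.count() > 0);  // a zero slice would turn the loop into a busy spin
  while (!stopping.load(std::memory_order_acquire)) {
    if (fut.wait_for(slice) == std::future_status::ready) {
      return true;
    }
  }
  // Shutdown requested: one final non-blocking check so a result that is
  // already available is not discarded.
  return fut.wait_for(std::chrono::milliseconds(0)) == std::future_status::ready;
}
```

A caller would invoke `fut.get()` only when the helper returns true, and otherwise skip the pending work and let the thread exit. Whether abandoning a pending chain resume is safe is a separate question that this sketch does not answer.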
leader storage pstack:
1509717.txt
pstack of stuck followers:
1431246.txt
1431404.txt