Drop the Space when Rebuilding Index, resulting in storaged crash #3170

Closed
Donald-Su opened this issue Oct 20, 2021 · 1 comment

Describe the bug (must be provided)

  • A cluster runs on a single machine (each service on a different port) and works well.
  • Create a space with replica_factor set to 3.
  • Dropping the space while an index rebuild is in progress crashes storaged.

Your Environments (must be provided)

  • OS:
    • CentOS release 6.6
    • CentOS Linux release 7.3.1611
  • nebula version: v2.5.1

How To Reproduce (must be provided)

Steps to reproduce the behavior:

  1. Step 1: Create the space, then rebuild the index. The job status looks as follows:
    image

  2. Step 2: Drop the space. The job status still shows QUEUE and RUNNING:
    image

  3. Step 3: Later, one of the storaged processes in the cluster crashes, and SHOW HOSTS reports it as "OFFLINE", as follows:
    image

Additional context

  • Tested many times with the same result: a core dump in RebuildEdgeIndexTask or RebuildTagIndexTask, as follows.
    image

  • Debug info (gdb backtrace) of the storaged process:

#0 0x00007f9fd86c79d9 in raise () from /lib64/libc.so.6
#1 0x00007f9fd86c90e8 in abort () from /lib64/libc.so.6
#2 0x00000000028316ad in rocksdb::port::PthreadCall(char const*, int) [clone .part.1] ()
#3 0x0000000003df24b0 in rocksdb::port::Mutex::Lock() ()
#4 0x0000000003b8290b in rocksdb::(anonymous namespace)::CleanupIteratorState(void*, void*) ()
#5 0x0000000003d6c8ec in rocksdb::Cleanable::~Cleanable() ()
#6 0x0000000003bfff52 in rocksdb::DBIter::~DBIter() ()
#7 0x0000000003e107cc in rocksdb::ArenaWrappedDBIter::~ArenaWrappedDBIter() ()
#8 0x0000000002b5298b in std::default_delete<rocksdb::Iterator>::operator() (this=0x7f9fd5e08ea8, __ptr=0x7f9fd5e35a00)
at /opt/vesoft/toolset/gcc/7.5.0/include/c++/7.5.0/bits/unique_ptr.h:78
#9 0x0000000002b516d3 in std::unique_ptr<rocksdb::Iterator, std::default_delete<rocksdb::Iterator> >::~unique_ptr (this=0x7f9fd5e08ea8, __in_chrg=<optimized out>)
at /opt/vesoft/toolset/gcc/7.5.0/include/c++/7.5.0/bits/unique_ptr.h:263
#10 0x0000000002b565ff in nebula::kvstore::RocksPrefixIter::~RocksPrefixIter (this=0x7f9fd5e08ea0, __in_chrg=<optimized out>)
at /home/00-NG-Code/V2.5.1/nebula-storage-2.5.1/src/kvstore/RocksEngine.h:59
#11 0x0000000002b56628 in nebula::kvstore::RocksPrefixIter::~RocksPrefixIter (this=0x7f9fd5e08ea0, __in_chrg=<optimized out>)
at /home/00-NG-Code/V2.5.1/nebula-storage-2.5.1/src/kvstore/RocksEngine.h:59
#12 0x00000000028fe0c7 in std::default_delete<nebula::kvstore::KVIterator>::operator() (this=0x7f9f58fef278, __ptr=0x7f9fd5e08ea0)
at /opt/vesoft/toolset/gcc/7.5.0/include/c++/7.5.0/bits/unique_ptr.h:78
#13 0x00000000028fdbfb in std::unique_ptr<nebula::kvstore::KVIterator, std::default_delete<nebula::kvstore::KVIterator> >::~unique_ptr (this=0x7f9f58fef278,
__in_chrg=<optimized out>) at /opt/vesoft/toolset/gcc/7.5.0/include/c++/7.5.0/bits/unique_ptr.h:263
#14 0x0000000002a28acc in nebula::storage::RebuildEdgeIndexTask::buildIndexGlobal (this=0x7f9f6bda3100, space=1, part=18, items=...)
at /home/00-NG-Code/V2.5.1/nebula-storage-2.5.1/src/storage/admin/RebuildEdgeIndexTask.cpp:46

#15 0x0000000002a184d7 in nebula::storage::RebuildIndexTask::invoke (this=0x7f9f6bda3100, space=1, part=18, items=...)
at /home/00-NG-Code/V2.5.1/nebula-storage-2.5.1/src/storage/admin/RebuildIndexTask.cpp:84
#16 0x0000000002a24a6c in std::__invoke_impl<nebula::cpp2::ErrorCode, nebula::cpp2::ErrorCode (nebula::storage::RebuildIndexTask::*&)(int, int, std::vector<std::shared_ptr<nebula::meta::cpp2::IndexItem>, std::allocator<std::shared_ptr<nebula::meta::cpp2::IndexItem> > > const&), nebula::storage::RebuildIndexTask*&, int&, int&, std::vector<std::shared_ptr<nebula::meta::cpp2::IndexItem>, std::allocator<std::shared_ptr<nebula::meta::cpp2::IndexItem> > >&> (__f=
@0x7f9f91016c00: (nebula::cpp2::ErrorCode (nebula::storage::RebuildIndexTask::*)(nebula::storage::RebuildIndexTask * const, int, int, const std::vector<std::shared_ptr<nebula::meta::cpp2::IndexItem>, std::allocator<std::shared_ptr<nebula::meta::cpp2::IndexItem> > > &)) 0x2a181de <nebula::storage::RebuildIndexTask::invoke(int, int, std::vector<std::shared_ptr<nebula::meta::cpp2::IndexItem>, std::allocator<std::shared_ptr<nebula::meta::cpp2::IndexItem> > > const&)>, __t=@0x7f9f91016c30: 0x7f9f6bda3100)
at /opt/vesoft/toolset/gcc/7.5.0/include/c++/7.5.0/bits/invoke.h:73
#17 0x0000000002a22c0f in std::__invoke<nebula::cpp2::ErrorCode (nebula::storage::RebuildIndexTask::*&)(int, int, std::vector<std::shared_ptr<nebula::meta::cpp2::IndexItem>, std::allocator<std::shared_ptr<nebula::meta::cpp2::IndexItem> > > const&), nebula::storage::RebuildIndexTask*&, int&, int&, std::vector<std::shared_ptr<nebula::meta::cpp2::IndexItem>, std::allocator<std::shared_ptr<nebula::meta::cpp2::IndexItem> > >&> (__fn=
@0x7f9f91016c00: (nebula::cpp2::ErrorCode (nebula::storage::RebuildIndexTask::*)(nebula::storage::RebuildIndexTask * const, int, int, const std::vector<std::shared_ptr<nebula::meta::cpp2::IndexItem>, std::allocator<std::shared_ptr<nebula::meta::cpp2::IndexItem> > > &)) 0x2a181de <nebula::storage::RebuildIndexTask::invoke(int, int, std::vector<std::shared_ptr<nebula::meta::cpp2::IndexItem>, std::allocator<std::shared_ptr<nebula::meta::cpp2::IndexItem> > > const&)>)
at /opt/vesoft/toolset/gcc/7.5.0/include/c++/7.5.0/bits/invoke.h:96
#18 0x0000000002a2145d in std::_Bind<nebula::cpp2::ErrorCode (nebula::storage::RebuildIndexTask::*(nebula::storage::RebuildIndexTask*, int, int, std::vector<std::shared_ptr<nebula::meta::cpp2::IndexItem>, std::allocator<std::shared_ptr<nebula::meta::cpp2::IndexItem> > >))(int, int, std::vector<std::shared_ptr<nebula::meta::cpp2::IndexItem>, std::allocator<std::shared_ptr<nebula::meta::cpp2::IndexItem> > > const&)>::__call<nebula::cpp2::ErrorCode, , 0ul, 1ul, 2ul, 3ul>(std::tuple<>&&, std::_Index_tuple<0ul, 1ul, 2ul, 3ul>) (this=0x7f9f91016c00, __args=<unknown type in /apps/svr/nebula-2.5.1-R0924-glog40-mod/bin/nebula-storaged, CU 0x4c6d96b, DIE 0x4f61e3f>)
at /opt/vesoft/toolset/gcc/7.5.0/include/c++/7.5.0/functional:469
#19 0x0000000002a1f7e1 in std::_Bind<nebula::cpp2::ErrorCode (nebula::storage::RebuildIndexTask::*(nebula::storage::RebuildIndexTask*, int, int, std::vector<std::shared_ptr<nebula::meta::cpp2::IndexItem>, std::allocator<std::shared_ptr<nebula::meta::cpp2::IndexItem> > >))(int, int, std::vector<std::shared_ptr<nebula::meta::cpp2::IndexItem>, std::allocator<std::shared_ptr<nebula::meta::cpp2::IndexItem> > > const&)>::operator()<, nebula::cpp2::ErrorCode>() (this=0x7f9f91016c00)
at /opt/vesoft/toolset/gcc/7.5.0/include/c++/7.5.0/functional:551
#20 0x0000000002a1d498 in std::_Function_handler<nebula::cpp2::ErrorCode (), std::_Bind<nebula::cpp2::ErrorCode (nebula::storage::RebuildIndexTask::*(nebula::storage::RebuildIndexTask*, int, int, std::vector<std::shared_ptr<nebula::meta::cpp2::IndexItem>, std::allocator<std::shared_ptr<nebula::meta::cpp2::IndexItem> > >))(int, int, std::vector<std::shared_ptr<nebula::meta::cpp2::IndexItem>, std::allocator<std::shared_ptr<nebula::meta::cpp2::IndexItem> > > const&)> >::_M_invoke(std::_Any_data const&) (
__functor=...) at /opt/vesoft/toolset/gcc/7.5.0/include/c++/7.5.0/bits/std_function.h:302
#21 0x00000000029f642e in std::function<nebula::cpp2::ErrorCode ()>::operator()() const (this=0x7f9f58fefba0)
at /opt/vesoft/toolset/gcc/7.5.0/include/c++/7.5.0/bits/std_function.h:706
#22 0x00000000029f3862 in nebula::storage::AdminSubTask::invoke (this=0x7f9f58fefba0) at /home/00-NG-Code/V2.5.1/nebula-storage-2.5.1/src/storage/admin/AdminTask.h:30
#23 0x00000000029f5a4d in nebula::storage::AdminTaskManager::runSubTask (this=0x5686280 <nebula::storage::AdminTaskManager::instance()::sAdminTaskManager>, handle=...)
at /home/00-NG-Code/V2.5.1/nebula-storage-2.5.1/src/storage/admin/AdminTaskManager.cpp:162
#24 0x0000000002a0772a in std::__invoke_impl<void, void (nebula::storage::AdminTaskManager::*&)(std::pair<int, int>), nebula::storage::AdminTaskManager*&, std::pair<int, int>&> (__f=
@0x7f9f91000160: (void (nebula::storage::AdminTaskManager::*)(nebula::storage::AdminTaskManager * const, std::pair<int, int>)) 0x29f5844 <nebula::storage::AdminTaskManager::runSubTask(std::pair<int, int>)>, __t=@0x7f9f91000178: 0x5686280 <nebula::storage::AdminTaskManager::instance()::sAdminTaskManager>)
at /opt/vesoft/toolset/gcc/7.5.0/include/c++/7.5.0/bits/invoke.h:73
#25 0x0000000002a064ad in std::__invoke<void (nebula::storage::AdminTaskManager::*&)(std::pair<int, int>), nebula::storage::AdminTaskManager*&, std::pair<int, int>&> (__fn=
@0x7f9f91000160: (void (nebula::storage::AdminTaskManager::*)(nebula::storage::AdminTaskManager * const, std::pair<int, int>)) 0x29f5844 <nebula::storage::AdminTaskManager::runSubTask(std::pair<int, int>)>) at /opt/vesoft/toolset/gcc/7.5.0/include/c++/7.5.0/bits/invoke.h:95
#26 0x0000000002a04177 in std::_Bind<void (nebula::storage::AdminTaskManager::*(nebula::storage::AdminTaskManager*, std::pair<int, int>))(std::pair<int, int>)>::__call<void, , 0ul, 1ul>(std::tuple<>&&, std::_Index_tuple<0ul, 1ul>) (this=0x7f9f91000160,
__args=<unknown type in /apps/svr/nebula-2.5.1-R0924-glog40-mod/bin/nebula-storaged, CU 0x403b630, DIE 0x434615c>)
at /opt/vesoft/toolset/gcc/7.5.0/include/c++/7.5.0/functional:467
#27 0x0000000002a00a9f in std::_Bind<void (nebula::storage::AdminTaskManager::*(nebula::storage::AdminTaskManager*, std::pair<int, int>))(std::pair<int, int>)>::operator()<, void>() (this=0x7f9f91000160) at /opt/vesoft/toolset/gcc/7.5.0/include/c++/7.5.0/functional:551
#28 0x00000000029fde11 in folly::detail::function::FunctionTraits<void ()>::callBig<std::_Bind<void (nebula::storage::AdminTaskManager::*(nebula::storage::AdminTaskManager*, std::pair<int, int>))(std::pair<int, int>)> >(folly::detail::function::Data&) (p=...) at /opt/vesoft/third-party/2.0/include/folly/Function.h:385
#29 0x0000000004289706 in folly::ThreadPoolExecutor::runTask(std::shared_ptr<folly::ThreadPoolExecutor::Thread> const&, folly::ThreadPoolExecutor::Task&&) ()
#30 0x000000000427c7cc in ?? ()
#31 0x00000000042f58bb in bool folly::AtomicNotificationQueue<folly::Function<void ()> >::drive<folly::EventBase::FuncRunner&>(folly::EventBase::FuncRunner&) ()
#32 0x00000000042f6a2d in non-virtual thunk to folly::EventBaseAtomicNotificationQueue<folly::Function<void ()>, folly::EventBase::FuncRunner>::handlerReady(unsigned short) ()
#33 0x00000000043c782f in ?? ()
#34 0x00000000043c814f in event_base_loop ()
#35 0x00000000042f177e in folly::EventBase::loopBody(int, bool) ()
#36 0x00000000042f1c26 in folly::EventBase::loop() ()
#37 0x00000000042f36d6 in folly::EventBase::loopForever() ()
#38 0x000000000427d309 in folly::IOThreadPoolExecutor::threadRun(std::shared_ptr<folly::ThreadPoolExecutor::Thread>) ()
#39 0x000000000428aed9 in void folly::detail::function::FunctionTraits<void ()>::callBig<std::_Bind<void (folly::ThreadPoolExecutor::*(folly::ThreadPoolExecutor*, std::shared_ptr<folly::ThreadPoolExecutor::Thread>))(std::shared_ptr<folly::ThreadPoolExecutor::Thread>)> >(folly::detail::function::Data&) ()
#40 0x0000000002860578 in folly::detail::function::FunctionTraits<void ()>::operator()() (this=0x7f9f91030320) at /opt/vesoft/third-party/2.0/include/folly/Function.h:400
#41 0x000000000286ffe4 in folly::NamedThreadFactory::newThread(folly::Function<void ()>&&)::{lambda()#1}::operator()() (__closure=0x7f9f91030320)
at /opt/vesoft/third-party/2.0/include/folly/executors/thread_factory/NamedThreadFactory.h:40
#42 0x000000000287f6a2 in std::__invoke_impl<void, folly::NamedThreadFactory::newThread(folly::Function<void ()>&&)::{lambda()#1}>(std::__invoke_other, folly::NamedThreadFactory::newThread(folly::Function<void ()>&&)::{lambda()#1}&&) (__f=<unknown type in /apps/svr/nebula-2.5.1-R0924-glog40-mod/bin/nebula-storaged, CU 0x34da9c, DIE 0x6ee021>)
at /opt/vesoft/toolset/gcc/7.5.0/include/c++/7.5.0/bits/invoke.h:60
#43 0x000000000287a258 in std::__invoke<folly::NamedThreadFactory::newThread(folly::Function<void ()>&&)::{lambda()#1}>(std::__invoke_result&&, (folly::NamedThreadFactory::newThread(folly::Function<void ()>&&)::{lambda()#1}&&)...) (__fn=<unknown type in /apps/svr/nebula-2.5.1-R0924-glog40-mod/bin/nebula-storaged, CU 0x34da9c, DIE 0x6f981e>)
at /opt/vesoft/toolset/gcc/7.5.0/include/c++/7.5.0/bits/invoke.h:95
#44 0x00000000028b4dde in std::thread::_Invoker<std::tuple<folly::NamedThreadFactory::newThread(folly::Function<void ()>&&)::{lambda()#1}> >::_M_invoke<0ul>(std::_Index_tuple<0ul>) (this=0x7f9f91030320) at /opt/vesoft/toolset/gcc/7.5.0/include/c++/7.5.0/thread:234
#45 0x00000000028b46c3 in std::thread::_Invoker<std::tuple<folly::NamedThreadFactory::newThread(folly::Function<void ()>&&)::{lambda()#1}> >::operator()() (
this=0x7f9f91030320) at /opt/vesoft/toolset/gcc/7.5.0/include/c++/7.5.0/thread:243
#46 0x00000000028b1de6 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<folly::NamedThreadFactory::newThread(folly::Function<void ()>&&)::{lambda()#1}> > >::_M_run() (this=0x7f9f91030310) at /opt/vesoft/toolset/gcc/7.5.0/include/c++/7.5.0/thread:186
#47 0x000000000477c03f in execute_native_thread_routine ()
#48 0x00007f9fd8a5adf3 in start_thread () from /lib64/libpthread.so.0
#49 0x00007f9fd87882cd in clone () from /lib64/libc.so.6
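
Frames #2 to #7 show the iterator's destructor aborting while it locks a RocksDB mutex. That would be consistent with the iterator being released after the engine it came from was already closed by the drop, although the trace alone does not prove it. The following is a minimal sketch of that ordering problem and of one possible guard (shared ownership), not the actual Nebula or RocksDB code. ToyEngine, ToyIterator and dropWhileIterating are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

using EventLog = std::vector<std::string>;

// Hypothetical stand-in for a storage engine (NOT the RocksDB API).
// Releasing an iterator needs the engine's mutex, as in frames #3 and #4.
class ToyEngine {
 public:
  explicit ToyEngine(std::shared_ptr<EventLog> log) : log_(std::move(log)) {}
  ~ToyEngine() { log_->push_back("engine destroyed"); }

  void releaseIterator() {
    std::lock_guard<std::mutex> guard(mutex_);  // must still be a live mutex
    log_->push_back("iterator released");
  }

 private:
  std::mutex mutex_;
  std::shared_ptr<EventLog> log_;
};

// Hypothetical iterator that co-owns the engine, so the engine cannot be
// destroyed while the iterator is still alive.
class ToyIterator {
 public:
  explicit ToyIterator(std::shared_ptr<ToyEngine> engine)
      : engine_(std::move(engine)) {
    assert(engine_ != nullptr);
  }
  ~ToyIterator() { engine_->releaseIterator(); }

 private:
  std::shared_ptr<ToyEngine> engine_;
};

// Simulates "drop the space while a rebuild task still holds an iterator".
EventLog dropWhileIterating() {
  auto log = std::make_shared<EventLog>();
  auto engine = std::make_shared<ToyEngine>(log);
  {
    ToyIterator iter(engine);  // the rebuild task is scanning
    engine.reset();            // the "drop" gives up its reference
    log->push_back("space dropped");
  }  // iterator released first; only then is the engine destroyed
  return *log;
}
```

In this sketch the recorded order is "space dropped", "iterator released", "engine destroyed". If the iterator held only a raw pointer, the engine would be destroyed at the reset and the later release would lock a dead mutex, which is the kind of failure the backtrace suggests.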
@Donald-Su Donald-Su added the type/bug Type: something is unexpected label Oct 20, 2021
@Sophie-Xie Sophie-Xie modified the milestones: v2.6.0, v3.0.0 Oct 21, 2021
@critical27 critical27 modified the milestones: v3.0.0, v3.1.0 Nov 8, 2021
@critical27 (Contributor):

Thanks. It is a bug, but fixing it involves quite a big refactor (we need to make sure RocksDB outlives every kind of job). The solution needs a mechanism that ensures similar problems won't show up. For the time being, maybe just don't do the drop.
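
One way to picture the kind of mechanism mentioned above is a per-space gate that refuses a drop while jobs are still running and refuses new jobs once a drop has begun. This is only a sketch under that assumption, not the fix that was adopted. SpaceJobGate, its methods and demoGate are hypothetical names that do not exist in Nebula.

```cpp
#include <cassert>
#include <mutex>
#include <vector>

// Hypothetical per-space gate (NOT part of Nebula): counts running jobs
// and serializes them against a drop of the space.
class SpaceJobGate {
 public:
  // Called before a job (e.g. an index rebuild) touches the engine.
  // Returns false once a drop has begun, so the job must not start.
  bool tryStartJob() {
    std::lock_guard<std::mutex> guard(mutex_);
    if (dropping_) return false;
    ++running_;
    return true;
  }

  // Called after the job has released all iterators and other engine state.
  void finishJob() {
    std::lock_guard<std::mutex> guard(mutex_);
    assert(running_ > 0);
    --running_;
  }

  // Called before the engine is closed. Returns false while any job is
  // still running, in which case the caller must not close the engine.
  bool tryBeginDrop() {
    std::lock_guard<std::mutex> guard(mutex_);
    if (running_ > 0) return false;
    dropping_ = true;
    return true;
  }

 private:
  std::mutex mutex_;
  int running_ = 0;
  bool dropping_ = false;
};

// Walks through the scenario from this issue and records each decision.
std::vector<bool> demoGate() {
  SpaceJobGate gate;
  std::vector<bool> decisions;
  decisions.push_back(gate.tryStartJob());   // rebuild starts
  decisions.push_back(gate.tryBeginDrop());  // drop refused: job running
  gate.finishJob();                          // rebuild done, iterators gone
  decisions.push_back(gate.tryBeginDrop());  // drop may proceed now
  decisions.push_back(gate.tryStartJob());   // no new jobs on a dropped space
  return decisions;
}
```

Refusing the drop keeps the sketch simple. A fuller design might instead cancel the running jobs and wait for them to finish before closing the engine.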
