Kvrocks crashed in Cluster::updateSlotsInfo() #1598

Closed
2 tasks done
mnizhq007 opened this issue Jul 20, 2023 · 6 comments · Fixed by #1615
Labels
bug, crash

Comments

@mnizhq007

mnizhq007 commented Jul 20, 2023

Search before asking

  • I had searched in the issues and found no similar issues.

Version

  • CentOS Linux release 7.9.2009 (Core)
  • kvrocks 2.4.0
ldd /usr/local/bin/kvrocks2.4.0 
        linux-vdso.so.1 =>  (0x00007fff09cb9000)
        librt.so.1 => /lib64/librt.so.1 (0x00007f98c9f80000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007f98c9d7c000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f98c9b60000)
        libm.so.6 => /lib64/libm.so.6 (0x00007f98c985e000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f98c9490000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f98ca188000)

yum list glibc
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
Installed Packages
glibc.x86_64                                                                                                   2.17-326.el7_9                                                                                                   @Nexus-Update
Available Packages
glibc.i686                                                                                                     2.17-326.el7_9                                                                                                   Nexus-Update

Minimal reproduce step

kernel: worker[45424]: segfault at 7fbea660c1 ip 0000000000aac777 sp 00007fbe9cdf27f0 error 4 in kvrocks2.4.0[406000+8cb000]

What did you expect to see?

GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-120.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/local/bin/kvrocks2.4.0...done.
[New LWP 45424]
[New LWP 48250]
[New LWP 45381]
[New LWP 45421]
[New LWP 45382]
[New LWP 45418]
[New LWP 45426]
[New LWP 45422]
[New LWP 45414]
[New LWP 45417]
[New LWP 45420]
[New LWP 45427]
[New LWP 45383]
[New LWP 45425]
[New LWP 45423]
[New LWP 45380]
[New LWP 45419]
[New LWP 45372]
[New LWP 45416]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `kvrocks2.4.0 -c /data/kvrocks/conf/kvrocks-cluster-6669.conf'.
Program terminated with signal 11, Segmentation fault.
#0  0x0000000000aac777 in evmap_io_active_ (base=base@entry=0x7fbeaa921200, fd=131, events=34) at /root/kvrocks-2.4.0/build/_deps/libevent-src/evmap.c:434

warning: Source file is more recent than executable.
434             LIST_FOREACH(ev, &ctx->events, ev_io_next) {
Missing separate debuginfos, use: debuginfo-install glibc-2.17-326.el7_9.x86_64

What did you see instead?

I want it to run stably, but I can't solve this problem.

Anything Else?

No response

Are you willing to submit a PR?

  • I'm willing to submit a PR!
@mnizhq007 mnizhq007 added the bug type bug label Jul 20, 2023
@mnizhq007
Author

mnizhq007 commented Jul 28, 2023

The following crash was encountered with kvrocks 2.5.0:

W20230727 23:30:41.603250 38680 event_listener.cc:132] [event_listener/stall_cond_changed] column family: metadata write stall condition was changed, from stop to normal
W20230727 23:30:43.232915 37553 event_listener.cc:132] [event_listener/stall_cond_changed] column family: metadata write stall condition was changed, from normal to stop
E20230727 23:30:45.543407 37663 main.cc:84] ======= Ooops! kvrocks version 2.5.0 got signal: Segmentation fault (11) =======
E20230727 23:30:45.576381 37663 main.cc:101] /lib64/libpthread.so.0(+0xf62f) [0x7f8ceedaa62f]
E20230727 23:30:45.577598 37663 main.cc:99] /lib64/libc.so.6(+0x15bbed) [0x7f8cee826bed]      __memmove_ssse3_back
E20230727 23:30:45.594018 37663 main.cc:99] kvrocks2.5.0() [0xc5dcc1]                         std::string::_M_mutate()
E20230727 23:30:45.594180 37663 main.cc:99] kvrocks2.5.0() [0x4a9025]                         Cluster::updateSlotsInfo()
E20230727 23:30:45.599767 37663 main.cc:99] kvrocks2.5.0() [0x4a99d8]                         Cluster::genNodesDescription()
E20230727 23:30:45.604859 37663 main.cc:99] kvrocks2.5.0() [0x4c53b9]                         redis::CommandCluster::Execute()
E20230727 23:30:45.609015 37663 main.cc:99] kvrocks2.5.0() [0x55990a]                         redis::Connection::ExecuteCommands()
E20230727 23:30:45.611111 37663 main.cc:99] kvrocks2.5.0() [0x4bd5e3]                         EvbufCallbackBase<>::readCB()
E20230727 23:30:45.622982 37663 main.cc:99] kvrocks2.5.0() [0xabf22d]                         bufferevent_run_deferred_callbacks_unlocked
E20230727 23:30:45.623812 37663 main.cc:99] kvrocks2.5.0() [0xac54fb]                         event_process_active_single_queue
E20230727 23:30:45.629799 37663 main.cc:99] kvrocks2.5.0() [0xac5eae]                         event_base_loop
E20230727 23:30:45.631390 37663 main.cc:99] kvrocks2.5.0() [0x5783a9]                         _ZNSt6thread11_State_implINS_8_InvokerISt5tupleIJZN4util12CreateThreadIZN12WorkerThread5StartEvEUlvE_EE8StatusOrIS_EPKcT_EUlvE_EEEEE6_M_runEv.lto_priv.0
E20230727 23:30:45.634894 37663 main.cc:99] kvrocks2.5.0() [0xcbdc83]                         execute_native_thread_routine
E20230727 23:30:45.635279 37663 main.cc:99] /lib64/libpthread.so.0(+0x7ea4) [0x7f8ceeda2ea4]  start_thread
E20230727 23:30:45.636025 37663 main.cc:99] /lib64/libc.so.6(clone+0x6c) [0x7f8cee7c9b0c]     __clone

@mnizhq007
Author

mnizhq007 commented Jul 28, 2023

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f8cee826bed in __memmove_ssse3_back () from /lib64/libc.so.6
[Current thread is 1 (Thread 0x7f8ce47f6700 (LWP 37663))]
Missing separate debuginfos, use: debuginfo-install glibc-2.17-326.el7_9.x86_64 libgcc-4.8.5-44.el7.x86_64 libunwind-1.2-2.el7.x86_64
(gdb) bt
#0  0x00007f8cee826bed in __memmove_ssse3_back () from /lib64/libc.so.6
#1  0x0000000000c5dcc2 in std::string::_M_mutate(unsigned long, unsigned long, unsigned long) ()
#2  0x00000000004a9026 in std::string::erase (__n=1, __pos=<optimized out>, this=<optimized out>) at /opt/rh/devtoolset-11/root/usr/include/c++/11/bits/basic_string.h:3424
#3  std::string::pop_back (this=<optimized out>) at /opt/rh/devtoolset-11/root/usr/include/c++/11/bits/basic_string.h:4807
#4  Cluster::updateSlotsInfo (this=0x7f8ce9e8eac0) at /root/apache-kvrocks-2.5.0-src/src/cluster/cluster.cc:539
#5  0x00000000004a99d9 in Cluster::genNodesDescription (this=0x7f8ce9e8eac0) at /root/apache-kvrocks-2.5.0-src/src/cluster/cluster.cc:475
#6  0x00000000004c53ba in Cluster::GetClusterNodes (nodes_str=0x7f8ce47ed4a0, this=<optimized out>) at /root/apache-kvrocks-2.5.0-src/src/cluster/cluster.cc:470
#7  redis::CommandCluster::Execute (this=<optimized out>, svr=<optimized out>, conn=0x7f8ce100e800, output=0x7f8ce47ed5b0) at /root/apache-kvrocks-2.5.0-src/src/commands/cmd_cluster.cc:88
#8  0x000000000055990b in redis::Connection::ExecuteCommands (this=<optimized out>, to_process_cmds=<optimized out>) at /root/apache-kvrocks-2.5.0-src/src/server/redis_connection.cc:414
#9  0x00000000004bd5e4 in redis::Connection::OnRead (bev=<optimized out>, this=0x7f8ce100e800) at /root/apache-kvrocks-2.5.0-src/src/server/redis_connection.cc:88
#10 EvbufCallbackBase<redis::Connection, true, true, true>::readCB (bev=<optimized out>, ctx=0x7f8ce100e800) at /root/apache-kvrocks-2.5.0-src/src/common/event_util.h:76
#11 0x0000000000abf22e in bufferevent_run_deferred_callbacks_unlocked (cb=<optimized out>, arg=0x7f8ce1000c80) at /root/apache-kvrocks-2.5.0-src/build/_deps/libevent-src/bufferevent.c:208
#12 0x0000000000ac54fc in event_process_active_single_queue (base=base@entry=0x7f8cee0cd400, activeq=0x7f8cee026210, max_to_process=max_to_process@entry=2147483647, endtime=endtime@entry=0x0)
    at /root/apache-kvrocks-2.5.0-src/build/_deps/libevent-src/event.c:1720
#13 0x0000000000ac5eaf in event_process_active (base=0x7f8cee0cd400) at /root/apache-kvrocks-2.5.0-src/build/_deps/libevent-src/event.c:1783
#14 event_base_loop (base=0x7f8cee0cd400, flags=0) at /root/apache-kvrocks-2.5.0-src/build/_deps/libevent-src/event.c:2006
#15 0x00000000005783aa in Worker::Run (tid=..., this=0x7f8cee08f000) at /root/apache-kvrocks-2.5.0-src/src/server/worker.cc:296
#16 operator() (__closure=<optimized out>, __closure=<optimized out>) at /root/apache-kvrocks-2.5.0-src/src/server/worker.cc:510
#17 operator() (__closure=<optimized out>) at /root/apache-kvrocks-2.5.0-src/src/common/thread_util.h:38
#18 std::__invoke_impl<void, util::CreateThread<WorkerThread::Start()::<lambda()> >(char const*, WorkerThread::Start()::<lambda()>)::<lambda()> > (__f=...) at /opt/rh/devtoolset-11/root/usr/include/c++/11/bits/invoke.h:61
#19 std::__invoke<util::CreateThread<WorkerThread::Start()::<lambda()> >(char const*, WorkerThread::Start()::<lambda()>)::<lambda()> > (__fn=...) at /opt/rh/devtoolset-11/root/usr/include/c++/11/bits/invoke.h:96
#20 std::thread::_Invoker<std::tuple<util::CreateThread<WorkerThread::Start()::<lambda()> >(char const*, WorkerThread::Start()::<lambda()>)::<lambda()> > >::_M_invoke<0> (this=<optimized out>)
    at /opt/rh/devtoolset-11/root/usr/include/c++/11/bits/std_thread.h:253
#21 std::thread::_Invoker<std::tuple<util::CreateThread<WorkerThread::Start()::<lambda()> >(char const*, WorkerThread::Start()::<lambda()>)::<lambda()> > >::operator() (this=<optimized out>)
    at /opt/rh/devtoolset-11/root/usr/include/c++/11/bits/std_thread.h:260
#22 std::thread::_State_impl<std::thread::_Invoker<std::tuple<util::CreateThread<WorkerThread::Start()::<lambda()> >(char const*, WorkerThread::Start()::<lambda()>)::<lambda()> > > >::_M_run(void) (this=<optimized out>)
    at /opt/rh/devtoolset-11/root/usr/include/c++/11/bits/std_thread.h:211
#23 0x0000000000cbdc84 in execute_native_thread_routine ()
#24 0x00007f8ceeda2ea5 in start_thread () from /lib64/libpthread.so.0
#25 0x00007f8cee7c9b0d in clone () from /lib64/libc.so.6

@mnizhq007
Author

mnizhq007 commented Jul 28, 2023

Program terminated with signal SIGSEGV, Segmentation fault.
#0  evmap_io_active_ (base=base@entry=0x7fee43f35600, fd=<optimized out>, events=34) at /root/apache-kvrocks-2.5.0-src/build/_deps/libevent-src/evmap.c:435
435                     if (ev->ev_events & (events & ~EV_ET))
[Current thread is 1 (Thread 0x7fee429fe700 (LWP 8463))]
Missing separate debuginfos, use: debuginfo-install glibc-2.17-326.el7_9.x86_64 libgcc-4.8.5-44.el7.x86_64 libunwind-1.2-2.el7.x86_64
(gdb) bt
#0  evmap_io_active_ (base=base@entry=0x7fee43f35600, fd=<optimized out>, events=34) at /root/apache-kvrocks-2.5.0-src/build/_deps/libevent-src/evmap.c:435
#1  0x0000000000acf508 in epoll_dispatch (base=0x7fee43f35600, tv=<optimized out>) at /root/apache-kvrocks-2.5.0-src/build/_deps/libevent-src/epoll.c:505
#2  0x0000000000ac5d49 in event_base_loop (base=0x7fee43f35600, flags=0) at /root/apache-kvrocks-2.5.0-src/build/_deps/libevent-src/event.c:1992
#3  0x00000000005783aa in Worker::Run (tid=..., this=0x7fee4848f1c0) at /root/apache-kvrocks-2.5.0-src/src/server/worker.cc:296
#4  operator() (__closure=<optimized out>, __closure=<optimized out>) at /root/apache-kvrocks-2.5.0-src/src/server/worker.cc:510
#5  operator() (__closure=<optimized out>) at /root/apache-kvrocks-2.5.0-src/src/common/thread_util.h:38
#6  std::__invoke_impl<void, util::CreateThread<WorkerThread::Start()::<lambda()> >(char const*, WorkerThread::Start()::<lambda()>)::<lambda()> > (__f=...) at /opt/rh/devtoolset-11/root/usr/include/c++/11/bits/invoke.h:61
#7  std::__invoke<util::CreateThread<WorkerThread::Start()::<lambda()> >(char const*, WorkerThread::Start()::<lambda()>)::<lambda()> > (__fn=...) at /opt/rh/devtoolset-11/root/usr/include/c++/11/bits/invoke.h:96
#8  std::thread::_Invoker<std::tuple<util::CreateThread<WorkerThread::Start()::<lambda()> >(char const*, WorkerThread::Start()::<lambda()>)::<lambda()> > >::_M_invoke<0> (this=<optimized out>)
    at /opt/rh/devtoolset-11/root/usr/include/c++/11/bits/std_thread.h:253
#9  std::thread::_Invoker<std::tuple<util::CreateThread<WorkerThread::Start()::<lambda()> >(char const*, WorkerThread::Start()::<lambda()>)::<lambda()> > >::operator() (this=<optimized out>)
    at /opt/rh/devtoolset-11/root/usr/include/c++/11/bits/std_thread.h:260
#10 std::thread::_State_impl<std::thread::_Invoker<std::tuple<util::CreateThread<WorkerThread::Start()::<lambda()> >(char const*, WorkerThread::Start()::<lambda()>)::<lambda()> > > >::_M_run(void) (this=<optimized out>)
    at /opt/rh/devtoolset-11/root/usr/include/c++/11/bits/std_thread.h:211
#11 0x0000000000cbdc84 in execute_native_thread_routine ()
#12 0x00007fee4916aea5 in start_thread () from /lib64/libpthread.so.0
#13 0x00007fee48b91b0d in clone () from /lib64/libc.so.6

@git-hulk
Member

git-hulk commented Jul 28, 2023

@mnizhq007 Thanks for the detailed stack trace! It looks like this is caused by concurrent access to slots_info.

@PragmaTwice PragmaTwice changed the title from "my kvrocks crashed" to "Kvrocks crashed in Cluster::updateSlotsInfo()" Jul 28, 2023
@PragmaTwice PragmaTwice added the crash type crash label Jul 28, 2023
@git-hulk
Member

This bug was most likely introduced by PR #1219: some of the subcommands do not hold the exclusive lock, so they may update the slot info at the same time. Very sorry for this bug.
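
For illustration only, here is a minimal sketch of the kind of race described above and one way to guard against it. It is not the kvrocks code: `NodeSketch`, `ClusterSketch`, `Describe`, `RebuildSlotsInfo`, and `DescribeConcurrently` are hypothetical names, and the space-separated slot format is an assumption. The point is that a call which looks read-only still rewrites a shared string member, so it needs exclusive access.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a node whose cached slot text is rebuilt on
// every "describe" request. Not the actual kvrocks ClusterNode.
struct NodeSketch {
  std::vector<int> slots;  // owned slot numbers
  std::string slots_info;  // shared cache, rewritten by RebuildSlotsInfo()
};

class ClusterSketch {
 public:
  explicit ClusterSketch(std::vector<int> slots) { node_.slots = std::move(slots); }

  // Looks read-only to callers but mutates node_.slots_info, so it takes an
  // exclusive lock. Without it, two threads could clear/append/pop_back the
  // same std::string at once, which is undefined behavior.
  std::string Describe() {
    std::lock_guard<std::mutex> guard(mu_);
    RebuildSlotsInfo();
    return node_.slots_info;
  }

 private:
  void RebuildSlotsInfo() {
    node_.slots_info.clear();
    for (int s : node_.slots) {
      node_.slots_info += std::to_string(s);
      node_.slots_info += ' ';
    }
    if (!node_.slots_info.empty()) node_.slots_info.pop_back();  // drop trailing space
  }

  std::mutex mu_;
  NodeSketch node_;
};

// Calls Describe() from several threads; returns true only if every call
// observed the fully built string.
bool DescribeConcurrently(ClusterSketch& cluster, int threads, int iterations,
                          const std::string& expected) {
  assert(threads > 0 && iterations > 0);
  std::vector<char> ok(static_cast<std::size_t>(threads), 1);  // one flag per thread
  std::vector<std::thread> workers;
  for (int t = 0; t < threads; ++t) {
    workers.emplace_back([&cluster, &ok, &expected, iterations, t] {
      for (int i = 0; i < iterations; ++i) {
        if (cluster.Describe() != expected) ok[static_cast<std::size_t>(t)] = 0;
      }
    });
  }
  for (auto& w : workers) w.join();
  for (char flag : ok) {
    if (!flag) return false;
  }
  return true;
}
```

Removing the `std::lock_guard` line in `Describe()` turns this into a data race on `slots_info`, which would be consistent with the `std::string::pop_back` frame in the backtrace above.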

@caipengbo
Contributor

It looks like caused by the concurrent access to the slots_info.

Can we move slots_info out of class ClusterNode? I noticed that it is not shared elsewhere. Cluster::genNodesInfo() and Cluster::genNodesDescription() could each use an external temporary vector and pass it into Cluster::updateSlotsInfo(). @git-hulk
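
A rough sketch of that idea, under stated assumptions: `FormatSlotRanges` and `DescribeNodes` are hypothetical helpers, not kvrocks functions, and the "start-end" range format and the `nodeN` line layout are made up for the example. The slot text for each node lives in a vector owned by the call, so concurrent callers have no shared string to corrupt.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: formats sorted slot numbers as space-separated
// ranges. It writes only to its own local string.
std::string FormatSlotRanges(const std::vector<int>& slots) {
  assert(std::is_sorted(slots.begin(), slots.end()));
  std::string out;
  std::size_t i = 0;
  while (i < slots.size()) {
    std::size_t j = i;
    // Extend j while the next slot is contiguous with the current one.
    while (j + 1 < slots.size() && slots[j + 1] == slots[j] + 1) ++j;
    out += std::to_string(slots[i]);
    if (j > i) {
      out += '-';
      out += std::to_string(slots[j]);
    }
    out += ' ';
    i = j + 1;
  }
  if (!out.empty()) out.pop_back();  // drop trailing space; safe on a local
  return out;
}

// Hypothetical analogue of a "describe all nodes" call: the per-node slot
// text is kept in a temporary vector instead of a member of each node.
std::string DescribeNodes(const std::vector<std::vector<int>>& node_slots) {
  std::vector<std::string> slots_infos;  // temporary, owned by this call
  slots_infos.reserve(node_slots.size());
  for (const auto& slots : node_slots) {
    slots_infos.push_back(FormatSlotRanges(slots));
  }
  std::string out;
  for (std::size_t n = 0; n < slots_infos.size(); ++n) {
    out += "node" + std::to_string(n) + " " + slots_infos[n] + "\n";
  }
  return out;
}
```

With this shape, the description path only reads the slot assignments and no longer writes shared state, so any lock it needs is just whatever protects the assignments themselves.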
