Program cores internally, but the core stack is incomplete #1997

Closed
vinllen opened this issue Nov 16, 2022 · 5 comments

vinllen commented Nov 16, 2022

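Is your feature request related to a problem?
Our program currently cores (not a brpc problem; it is a bug in our own code), but the core stack only goes as far as brpc. We use brpc + tcmalloc. Is there an additional option that would let us see the complete core stack?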

Core was generated by `xxx/6700/bi'.
Program terminated with signal 6, Aborted.
#0  0x00007f17ee2401f7 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.17-196.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-46.el7.x86_64 libcom_err-1.42.9-10.el7.x86_64 libgcc-4.8.5-39.el7.x86_64 libselinux-2.5-11.el7.x86_64 openssl-libs-1.0.2k-19.el7.x86_64 pcre-8.32-17.el7.x86_64 zlib-1.2.7-17.el7.x86_64
(gdb) bt
#0  0x00007f17ee2401f7 in raise () from /lib64/libc.so.6
#1  0x00007f17ee2418e8 in abort () from /lib64/libc.so.6
#2  0x00000000004b0bd7 in __gnu_cxx::__verbose_terminate_handler () at ../../../../src_tree/gcc/libstdc++-v3/libsupc++/vterminate.cc:95
#3  0x00000000012a01e6 in __cxxabiv1::__terminate(void (*)()) () at ../../../../src_tree/gcc/libstdc++-v3/libsupc++/eh_terminate.cc:47
#4  0x0000000001332e99 in __cxa_call_terminate (ue_header=ue_header@entry=0x2feaa8f90) at ../../../../src_tree/gcc/libstdc++-v3/libsupc++/eh_call.cc:54
#5  0x000000000129fc25 in __gxx_personality_v0 () at ../../../../src_tree/gcc/libstdc++-v3/libsupc++/eh_personality.cc:676
#6  0x00007f17ee5dd8a3 in ?? () from /lib64/libgcc_s.so.1
#7  0x00007f17ee5dddd7 in _Unwind_Resume () from /lib64/libgcc_s.so.1
#8  0x0000000000489888 in operator() (this=<optimized out>, obj=<optimized out>) at infra-cpp-thirdparty/brpc-096/src/brpc/socket_id.h:46
#9  ~unique_ptr (this=<optimized out>, __in_chrg=<optimized out>) at external/ks_build_tools/gcc-8.3.0/bin/../lib/gcc/x86_64-pc-linux-gnu/8.3.0/../../../../include/c++/8.3.0/bits/unique_ptr.h:274
#10 ~DestroyingPtr (this=<optimized out>, __in_chrg=<optimized out>) at infra-cpp-thirdparty/brpc-096/src/brpc/destroyable.h:42
#11 brpc::policy::ProcessRpcRequest (msg_base=<optimized out>) at infra-cpp-thirdparty/brpc-096/src/brpc/policy/baidu_rpc_protocol.cpp:306
#12 0x0000000000ce5937 in brpc::ProcessInputMessage (void_arg=void_arg@entry=0x56734080) at infra-cpp-thirdparty/brpc-096/src/brpc/input_messenger.cpp:136
#13 0x0000000000ce684a in operator() (this=<synthetic pointer>, last_msg=0x56734080) at infra-cpp-thirdparty/brpc-096/src/brpc/input_messenger.cpp:142
#14 ~unique_ptr (this=<synthetic pointer>, __in_chrg=<optimized out>) at external/ks_build_tools/gcc-8.3.0/bin/../lib/gcc/x86_64-pc-linux-gnu/8.3.0/../../../../include/c++/8.3.0/bits/unique_ptr.h:274
#15 brpc::InputMessenger::OnNewMessages(brpc::Socket*) () at external/ks_build_tools/gcc-8.3.0/bin/../lib/gcc/x86_64-pc-linux-gnu/8.3.0/../../../../include/c++/8.3.0/bits/unique_ptr.h:270
#16 0x0000000000d9a9fd in brpc::Socket::ProcessEvent(void*) () at infra-cpp-thirdparty/brpc-096/src/brpc/socket.cpp:1020
#17 0x0000000000e230cf in bthread::TaskGroup::task_runner(long) () at infra-cpp-thirdparty/brpc-096/src/bthread/task_group.cpp:297
#18 0x0000000000e0caa1 in bthread_make_fcontext () at external/ks_build_tools/gcc-8.3.0/bin/../lib/gcc/x86_64-pc-linux-gnu/8.3.0/../../../../include/c++/8.3.0/new:169
Cannot access memory at address 0x7f17914b1000
(gdb) p *((brpc::InputMessageBase *)0x56734080)
$1 = {<brpc::Destroyable> = {_vptr.Destroyable = 0x1d94590 <vtable for brpc::policy::MostCommonMessage+16>}, _received_us = 8395195236063, _base_real_us = 1660190435757806, _socket = {_M_t = {
      _M_t = {<std::_Tuple_impl<0, brpc::Socket*, brpc::SocketDeleter>> = {<std::_Tuple_impl<1, brpc::SocketDeleter>> = {<std::_Head_base<1, brpc::SocketDeleter, true>> = {<brpc::SocketDeleter> = {<No data fields>}, <No data fields>}, <No data fields>}, <std::_Head_base<0, brpc::Socket*, false>> = {_M_head_impl = 0x0}, <No data fields>}, <No data fields>}}}, _process = 0xcf03a0 <brpc::policy::ProcessRpcRequest(brpc::InputMessageBase*)>,
  _arg = 0x7ffe60596d80}

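Describe the solution you'd like
The same piece of code causes the coredump, yet sometimes the core stack reaches into our program and sometimes it does not (it stops at the brpc layer). Is there any configuration advice that would let us print the complete core stack? We have already enabled no-omit-frame-pointer, and tcmalloc is linked from the gperftools source.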

Describe alternatives you've considered

Additional context/screenshots

wwbmmm (Contributor) commented Nov 16, 2022

This is the complete core stack. A bthread's stack starts at bthread_make_fcontext.
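
To illustrate why a backtrace cannot go above the entry of a user-level thread, here is a minimal sketch that uses POSIX `ucontext` as a stand-in. This is an assumption for illustration only: brpc uses its own fcontext routines, not `ucontext`, and the names `TaskEntry` and `TaskRunsOnItsOwnStack` are hypothetical. The task runs on a separately allocated stack, so there are no caller frames above its entry function for gdb to unwind.

```cpp
#include <ucontext.h>

#include <cassert>
#include <cstdint>
#include <vector>

namespace {

ucontext_t g_caller_ctx;           // context of the code that starts the task
ucontext_t g_task_ctx;             // context of the user-level task
std::uintptr_t g_local_addr = 0;   // address of a local variable inside the task

// Entry function of the task (hypothetical name). Its frame is the
// outermost frame on the task's stack.
void TaskEntry() {
    int local = 0;
    g_local_addr = reinterpret_cast<std::uintptr_t>(&local);
    // Returning resumes the context stored in uc_link.
}

}  // namespace

// Runs TaskEntry on a heap-allocated stack and reports whether the task's
// local variable lived inside that stack rather than on the caller's stack.
bool TaskRunsOnItsOwnStack() {
    std::vector<char> stack(64 * 1024);

    int rc = getcontext(&g_task_ctx);
    assert(rc == 0);
    (void)rc;
    g_task_ctx.uc_stack.ss_sp = stack.data();
    g_task_ctx.uc_stack.ss_size = stack.size();
    g_task_ctx.uc_link = &g_caller_ctx;  // where to go when TaskEntry returns
    makecontext(&g_task_ctx, TaskEntry, 0);

    // Switch to the task; control comes back here once TaskEntry returns.
    swapcontext(&g_caller_ctx, &g_task_ctx);

    const auto lo = reinterpret_cast<std::uintptr_t>(stack.data());
    return g_local_addr >= lo && g_local_addr < lo + stack.size();
}
```

Calling `TaskRunsOnItsOwnStack()` on Linux with glibc should return `true`, which mirrors the situation in the trace above: frame #18 is the bottom of the bthread's own stack, and the memory beyond it is not part of any call chain.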

guodongxiaren (Member) commented Nov 16, 2022

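Your business code probably threw an exception that was not caught, so it escaped into the brpc framework. The root cause is in the business code, not in brpc.
I suggest declaring the service method noexcept. That way the core stack is more accurate and does not end up inside the brpc framework. You can also add noexcept to the functions your business code calls, which makes it easier to locate the problem when it cores.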
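
A minimal sketch of the suggestion above, assuming a hypothetical handler `HandleRequest` and hypothetical business logic `ParseQuantity` (neither is part of brpc). The handler is declared `noexcept` and converts exceptions into an error code. If an exception did escape a `noexcept` function, `std::terminate` would be called at that function's boundary instead of the exception propagating into the framework that called it.

```cpp
#include <stdexcept>

// Hypothetical business logic that may throw.
int ParseQuantity(int raw) {
    if (raw < 0) {
        throw std::invalid_argument("negative quantity");
    }
    return raw * 2;
}

// Hypothetical service entry point. It is declared noexcept, so no
// exception can propagate out of it into the calling framework.
// Known failures are caught here and turned into an error code.
int HandleRequest(int raw) noexcept {
    try {
        return ParseQuantity(raw);
    } catch (const std::exception&) {
        return -1;  // report the failure instead of letting it escape
    }
}

// The noexcept operator confirms the declaration at compile time.
static_assert(noexcept(HandleRequest(0)), "HandleRequest must be noexcept");
```

With this shape, `HandleRequest(3)` returns 6 and `HandleRequest(-1)` returns -1. Whether to catch inside the handler or simply let `noexcept` terminate the process is a design choice; the second option is the one that keeps the crash site close to the throwing code.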

wwbmmm (Contributor) commented Nov 17, 2022

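> Your business code threw an exception that was not caught, so it escaped into the brpc framework.

Why would an exception thrown by business code end up in brpc's SocketDeleter::operator()?
I have seen this kind of stack before, but I could never work out why it cores here.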

guodongxiaren (Member) commented Nov 19, 2022

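> > Your business code threw an exception that was not caught, so it escaped into the brpc framework.
>
> Why would an exception thrown by business code end up in brpc's SocketDeleter::operator()?
>
> I have seen this kind of stack before, but I could never work out why it cores here.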

struct SocketDeleter {
    void operator()(Socket* m) const {
        DereferenceSocket(m);
    }
};

typedef std::unique_ptr<Socket, SocketDeleter> SocketUniquePtr;

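This is because the frame is SocketDeleter's operator(), and SocketDeleter is used as the deleter of a unique_ptr, so it is called when the unique_ptr object is destroyed. Since C++11, destructors are noexcept by default, so when an exception is detected there the program terminates immediately.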
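
The two facts in this explanation can be checked with a small sketch. The types `Resource`, `ResourceDeleter`, and `ResourcePtr` below are hypothetical stand-ins for Socket, SocketDeleter, and SocketUniquePtr, not brpc code. The sketch shows that the deleter runs when the unique_ptr goes out of scope, and that the unique_ptr's destructor is non-throwing, which is why an exception passing through it would end in `std::terminate`.

```cpp
#include <memory>
#include <type_traits>

// Hypothetical resource type standing in for brpc::Socket.
struct Resource {
    int id;
};

// Counts how many times the deleter has run.
int g_release_count = 0;

// Hypothetical deleter standing in for brpc::SocketDeleter.
struct ResourceDeleter {
    void operator()(Resource* r) const {
        ++g_release_count;
        delete r;
    }
};

using ResourcePtr = std::unique_ptr<Resource, ResourceDeleter>;

// The destructor of unique_ptr is noexcept, like destructors in general
// since C++11 unless they are declared otherwise.
static_assert(std::is_nothrow_destructible<ResourcePtr>::value,
              "unique_ptr destructor is expected to be noexcept");

// Creates a ResourcePtr in an inner scope and returns how many times the
// deleter ran once that scope has ended.
int ReleaseCountAfterScopeExit() {
    g_release_count = 0;
    {
        ResourcePtr p(new Resource{42});
    }  // p is destroyed here, which invokes ResourceDeleter::operator()
    return g_release_count;
}
```

`ReleaseCountAfterScopeExit()` returns 1: the deleter is invoked exactly once, from inside the destructor, which matches frames #8 to #10 in the trace.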

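A few days ago I did not look closely at this user's stack. If it cores inside SocketDeleter's operator(), then Socket::Dereference should be the function that threw the exception.
I suspect it is here: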

        LOG(FATAL) << "Invalid SocketId=" << id;
        return -1;
    }
    LOG(FATAL) << "Over dereferenced SocketId=" << id;

This is an old problem. When glog is linked, a FATAL-level log triggers an abort, throwing an exception and causing the core. @wwbmmm

Are you linking against glog? @vinllen

vinllen (Author) commented Nov 21, 2022

> Are you linking against glog? @vinllen

We modified LOG and link uniformly against the external spdlog.
