Core dump inside a brpc service method, but gdb analysis runs into problems #165

Closed
adanteng opened this issue Dec 21, 2017 · 27 comments
Labels
wontfix Not belonging to other labels

Comments

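@adanteng

#include <string>
std::stof("a");

Calling this inside a brpc service method makes the server core dump. Is there a way to locate the exact offending line by analyzing the core file?

Opening the core file with gdb and running bt gives the following: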

(gdb) bt
#0  0x00007f98eb2b11f7 in raise () from /lib64/libc.so.6
#1  0x00007f98eb2b28e8 in abort () from /lib64/libc.so.6
#2  0x0000000000a7a6b5 in __gnu_cxx::__verbose_terminate_handler() ()
#3  0x0000000000a23406 in __cxxabiv1::__terminate(void (*)()) ()
#4  0x0000000000a7a159 in __cxa_call_terminate ()
#5  0x0000000000a232f4 in __gxx_personality_v0 ()
#6  0x0000000000a7eeb3 in _Unwind_RaiseException_Phase2 ()
#7  0x0000000000a7f6a7 in _Unwind_Resume ()
#8  0x00000000007bdd2a in operator() (this=<optimized out>, obj=<optimized out>) at ./src/brpc/destroyable.h:33
#9  ~unique_ptr (this=<synthetic pointer>, __in_chrg=<optimized out>) at /usr/include/c++/4.8.2/bits/unique_ptr.h:184
#10 ~DestroyingPtr (this=<synthetic pointer>, __in_chrg=<optimized out>) at ./src/brpc/destroyable.h:39
#11 brpc::policy::ProcessRpcRequest (msg_base=<optimized out>) at src/brpc/policy/baidu_rpc_protocol.cpp:508
#12 0x00000000008df7fa in brpc::ProcessInputMessage (void_arg=void_arg@entry=0x7f98a8021930) at src/brpc/input_messenger.cpp:132
#13 0x00000000008e07e4 in operator() (this=<optimized out>, last_msg=0x7f98a8021930) at src/brpc/input_messenger.cpp:138
#14 brpc::InputMessenger::OnNewMessages (m=0x7f987401ac80) at /usr/include/c++/4.8.2/bits/unique_ptr.h:184
#15 0x00000000008d04ed in brpc::Socket::ProcessEvent (arg=0x7f987401ac80) at src/brpc/socket.cpp:1049
#16 0x000000000071c1f4 in bthread::TaskGroup::task_runner (skip_remained=<optimized out>) at src/bthread/task_group.cpp:291
#17 0x0000000000834791 in bthread_make_fcontext ()
#18 0x0000000000000000 in ?? ()


@adanteng
Author

I experimented with the following code, which also causes a core:

+char *str;
+str = "GfG";
+*(str+1) = 'n';

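In this case the backtrace does give the location in the business code. I also tried a case like stof, which throws std::invalid_argument, outside of brpc. There the core file reports the exact location correctly.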
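For input that may not be numeric, one way to avoid the exception altogether is to parse with std::strtof, which reports failure through its end pointer and errno instead of throwing. The sketch below is only an illustration: TryParseFloat is a hypothetical helper, not something from brpc or from this thread, and it is deliberately stricter than std::stof (it rejects trailing characters).

```cpp
#include <cerrno>
#include <cstdlib>
#include <string>

// Hypothetical helper (not part of brpc): parses `text` as a float without
// throwing. Returns true and stores the value in *out on success.
bool TryParseFloat(const std::string& text, float* out) {
    errno = 0;
    char* end = nullptr;
    const float value = std::strtof(text.c_str(), &end);
    // No characters consumed, or trailing characters left: not a clean number.
    if (end == text.c_str() || *end != '\0') {
        return false;
    }
    // ERANGE signals overflow or underflow.
    if (errno == ERANGE) {
        return false;
    }
    *out = value;
    return true;
}
```

With this helper, the input "a" yields false rather than an exception, so nothing propagates out of the service method.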

@jamesge
Contributor

jamesge commented Dec 22, 2017

You could try running ASan.

@adanteng
Author

adanteng commented Dec 22, 2017

ASan is a memory-error detector. Do you mean it can detect a core like the stof one, which is caused by the string passed in?

I'll give it a try first.

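@jamesge
Contributor

jamesge commented Dec 22, 2017

If you don't know why it cores, ASan helps find places that may have memory problems. If you know that a particular statement is certain to crash but the coredump shows an inaccurate location, that is usually because optimization is enabled.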

@adanteng
Author

adanteng commented Dec 22, 2017

In the function below, I called std::stof("a") directly before CallMethod. The location reported in the core file is the same as in the first backtrace above.

#11 brpc::policy::ProcessRpcRequest (msg_base=<optimized out>) at src/brpc/policy/baidu_rpc_protocol.cpp:508

So perhaps optimization flags were added when brpc was compiled?

CXXFLAGS=$(CPPFLAGS) -O2 -g -rdynamic -pipe -Wall -W -fPIC -fstrict-aliasing -Wno-invalid-offsetof -Wno-unused-parameter -fno-omit-frame-pointer -std=c++0x

I'll try removing -O2, rebuilding, and testing again.

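@adanteng
Author

It is not an -O2 problem. After adding -g, the stack frames of brpc's internal function calls are printed and can be debugged.

However, when business code inside a brpc service method throws an exception and the server cores, bt on that core file has lost the business code's stack frames. What is the cause of this?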

@jamesge
Contributor

jamesge commented Dec 22, 2017

-O2 affects the accuracy of bt.

@adanteng
Author

I removed -O2, but the call stack inside the RPC method really does not show up in the core file. Why is that?

@jamesge
Contributor

jamesge commented Dec 22, 2017

That is impossible. It means -O2 was not removed everywhere.

@adanteng
Author

This is the only stack information I get:

(gdb) bt
#0  0x00007fc23f2261f7 in raise () from /lib64/libc.so.6
#1  0x00007fc23f2278e8 in abort () from /lib64/libc.so.6
#2  0x00000000010e0235 in __gnu_cxx::__verbose_terminate_handler() ()
#3  0x0000000001089e56 in __cxxabiv1::__terminate(void (*)()) ()
#4  0x00000000010dfcd9 in __cxa_call_terminate ()
#5  0x0000000001089d44 in __gxx_personality_v0 ()
#6  0x00000000010e58e3 in _Unwind_RaiseException_Phase2 ()
#7  0x00000000010e60d7 in _Unwind_Resume ()
#8  0x0000000000da5d75 in brpc::policy::ProcessRpcRequest (msg_base=0x7fc208022700) at src/brpc/policy/baidu_rpc_protocol.cpp:508

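If the core is caused by a thrown exception in my own non-RPC service, the core file clearly points to where the problem is.

@adanteng
Author

How should the brpc Makefile be modified so that the stack inside the business code is shown?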

@adanteng
Author

adanteng commented Dec 22, 2017

I also constructed a segmentation-fault error inside the RPC method, similar to what I described earlier:

+char *str;
+str = "GfG";
+*(str+1) = 'n';

This kind of error shows up clearly in the core file, and the location of the problem can be pinpointed directly.

@stale

stale bot commented Mar 20, 2018

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix Not belonging to other labels label Mar 20, 2018
@stale stale bot closed this as completed Mar 27, 2018
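@kenshinxf

I have a similar problem, except that it happens in a thrift protocol I implemented myself. If the upstream sends malformed protocol data, the server throws an exception when parsing fails, and the resulting core looks basically the same as this one.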

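@adanteng
Author

In an application built on brpc, each request runs in a bthread. When an application method called from the bthread throws an exception that nothing catches, the exception propagates outward, and the stack unwinding then includes the destructors of various brpc objects. My guess is that brpc does not handle this part well.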
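The general C++ behavior behind this guess can be shown without brpc. Once unwinding starts, destructors of objects in the frames between the throw and the handler run first, so by the time anything further up the stack reacts, the throwing frame is already gone. The sketch below is a standalone illustration; FrameworkGuard, FrameworkFrame, and UserCallback are hypothetical stand-ins and do not reproduce brpc's actual code.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Records the order in which things happen.
std::vector<std::string> g_events;

// Hypothetical stand-in for a framework-owned RAII object.
struct FrameworkGuard {
    ~FrameworkGuard() { g_events.push_back("guard destroyed"); }
};

// Hypothetical stand-in for user code that throws.
void UserCallback() {
    g_events.push_back("user code throws");
    throw std::invalid_argument("bad input");
}

// Hypothetical stand-in for a framework function that holds a guard while
// calling user code.
void FrameworkFrame() {
    FrameworkGuard guard;
    UserCallback();
}

// Runs the scenario once and returns the recorded event order.
std::vector<std::string> RunOnce() {
    g_events.clear();
    try {
        FrameworkFrame();
    } catch (const std::exception&) {
        g_events.push_back("handler reached");
    }
    return g_events;
}
```

RunOnce records the guard's destructor between the throw and the handler. If a destructor or cleanup step in such a frame ends in std::terminate, a core taken at that moment can only show the frames that still exist, which is consistent with the backtraces in this thread.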

@d0ngjun

d0ngjun commented Aug 14, 2018

@adanteng I ran into the same problem with the same call stack. In reality the crash is always caused in the business layer.

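@jamesge
Contributor

jamesge commented Aug 15, 2018

Neither Google's nor Baidu's coding standards allow exceptions, so throwing an exception from a user callback is not supported by default. Later, because throwing exceptions is the norm in thrift, special support was added for it.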

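@d0ngjun

d0ngjun commented Aug 15, 2018

@jamesge I agree with avoiding exceptions as much as possible, but uncaught exceptions are hard to avoid when using third-party libraries. In that case the coredump should show the exact location that caused the crash. Does brpc do some kind of handling here?

Also, what does "throwing an exception from a user callback is not supported by default" mean?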

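@scottzzq

I ran into a similar problem and traced through the gcc 5.2 source.
[image]
At this point the fs.personality function pointer is NULL, so the process does not die when the business code throws. But once unwinding reaches the frame inside brpc, fs.personality points to the terminate function, which ends up calling abort and kills the process.

The code that sets the personality pointer is as follows:
[image]

@jamesge could you take a look? Thanks!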

@scottzzq

The process dies once execution reaches brpc internals:
#0 0x00007f0621a605f7 in raise () from /usr/lib64/libc.so.6
#1 0x00007f0621a61ce8 in abort () from /usr/lib64/libc.so.6
#2 0x0000000000489f11 in myterminate () at brpc/example/echo_c++/server.cpp:322
#3 0x0000000000b651e6 in __cxxabiv1::__terminate (handler=) at ../../../../libstdc++-v3/libsupc++/eh_terminate.cc:47
#4 0x0000000000bf0c09 in __cxa_call_terminate (ue_header=ue_header@entry=0x7f060c047570) at ../../../../libstdc++-v3/libsupc++/eh_call.cc:54
#5 0x0000000000b64a05 in __cxxabiv1::__gxx_personality_v0 (version=, actions=, exception_class=5138137972254386944,
ue_header=, context=0x7f05fdfea8e0) at ../../../../libstdc++-v3/libsupc++/eh_personality.cc:676
#6 0x0000000000bf9023 in _Unwind_RaiseException_Phase2 (exc=exc@entry=0x7f060c047570, context=context@entry=0x7f05fdfea8e0) at ../../../libgcc/unwind.inc:62
#7 0x0000000000bf9877 in _Unwind_Resume (exc=exc@entry=0x7f060c047570) at ../../../libgcc/unwind.inc:230
#8 0x0000000000508036 in operator() (this=, obj=) at brpc/src/brpc/destroyable.h:33
#9 ~unique_ptr (this=, __in_chrg=) at /usr/include/c++/5.2.0/bits/unique_ptr.h:236
#10 ~DestroyingPtr (this=, __in_chrg=) at brpc/src/brpc/destroyable.h:39
#11 brpc::policy::ProcessRpcRequest (msg_base=) at brpc/src/brpc/policy/baidu_rpc_protocol.cpp:333
#12 0x000000000055b887 in brpc::ProcessInputMessage (void_arg=void_arg@entry=0x7f060c036b20) at brpc/src/brpc/input_messenger.cpp:133
#13 0x000000000055c7c8 in operator() (this=, last_msg=0x7f060c036b20) at brpc/src/brpc/input_messenger.cpp:139
#14 brpc::InputMessenger::OnNewMessages (m=0x7f05ec01ac80) at /usr/include/c++/5.2.0/bits/unique_ptr.h:236
#15 0x00000000004a6fed in brpc::Socket::ProcessEvent (arg=0x7f05ec01ac80) at brpc/src/brpc/socket.cpp:1079
#16 0x0000000000609694 in bthread::TaskGroup::task_runner (skip_remained=) at brpc/src/bthread/task_group.cpp:293
#17 0x00000000005f10e1 in bthread_make_fcontext ()
#18 0x00010102464c457f in ?? ()
#19 0x0000000000000000 in ?? ()

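The business code calls map.at, which throws, and unwinding through this frame proceeds completely normally. Executing fs.personality in _Unwind_RaiseException_Phase2 should have killed the process right here, but the pointer is NULL: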

(gdb) bt
#0 _Unwind_RaiseException_Phase2 (exc=exc@entry=0x7fffd003d5b0, context=context@entry=0x7fffcb3ec780) at ../../../libgcc/unwind.inc:40
#1 0x0000000000bf9877 in _Unwind_Resume (exc=exc@entry=0x7fffd003d5b0) at ../../../libgcc/unwind.inc:230
#2 0x000000000048c0d4 in ~_Rb_tree (this=0x7fffcb3ec940, __in_chrg=) at /usr/include/c++/5.2.0/bits/stl_tree.h:858
#3 ~map (this=0x7fffcb3ec940, __in_chrg=) at /usr/include/c++/5.2.0/bits/stl_map.h:96
#4 example::EchoServiceImpl::Echo (this=, cntl_base=, request=0x7fffd0039ae0, response=0x7fffd0039c38, done=0x7fffd003d4d0)
at brpc/example/echo_c++/server.cpp:254
#5 0x000000000043a075 in example::EchoService::CallMethod (this=, method=, controller=, request=,
response=, done=) at build64_release/brpc/example/echo_c++/echo.pb.cc:675
#6 0x0000000000507d39 in brpc::policy::ProcessRpcRequest (msg_base=0x7fffd002cb20) at brpc/src/brpc/policy/baidu_rpc_protocol.cpp:553
#7 0x000000000055b887 in brpc::ProcessInputMessage (void_arg=void_arg@entry=0x7fffd002cb20) at brpc/src/brpc/input_messenger.cpp:133
#8 0x000000000055c7c8 in operator() (this=, last_msg=0x7fffd002cb20) at brpc/src/brpc/input_messenger.cpp:139
#9 brpc::InputMessenger::OnNewMessages (m=0x7fffcc01ac80) at /usr/include/c++/5.2.0/bits/unique_ptr.h:236
#10 0x00000000004a6fed in brpc::Socket::ProcessEvent (arg=0x7fffcc01ac80) at brpc/src/brpc/socket.cpp:1079
#11 0x0000000000609694 in bthread::TaskGroup::task_runner (skip_remained=) at brpc/src/bthread/task_group.cpp:293
#12 0x00000000005f10e1 in bthread_make_fcontext ()
#13 0x00010102464c457f in ?? ()
#14 0x0000000000000000 in ?? ()
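Frame #2 of the first backtrace in this comment is a custom terminate handler (myterminate in server.cpp). Its body is not shown in the thread, so the following is only a guess at what such a handler could look like: it logs the in-flight exception's message and then aborts so that a core file is still written. DescribeException, LoggingTerminate, and InstallLoggingTerminate are hypothetical names.

```cpp
#include <cstdio>
#include <cstdlib>
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical helper: turns an exception_ptr into a printable message.
std::string DescribeException(std::exception_ptr eptr) {
    if (!eptr) {
        return "no active exception";
    }
    try {
        std::rethrow_exception(eptr);
    } catch (const std::exception& e) {
        return e.what();
    } catch (...) {
        return "unknown exception type";
    }
}

// Terminate handler: log what is known, then abort so a core is produced.
void LoggingTerminate() {
    const std::string msg = DescribeException(std::current_exception());
    std::fprintf(stderr, "terminate called: %s\n", msg.c_str());
    std::abort();
}

// Call once at startup, for example at the top of main().
void InstallLoggingTerminate() {
    std::set_terminate(LoggingTerminate);
}
```

A handler like this does not restore frames that have already been unwound, but the logged message (for example the what() text of a std::out_of_range from map.at) can narrow down which call threw.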

@Xavier1994

I ran into this problem too, and bt likewise shows the core inside brpc. So how can I get a backtrace that shows the core in my business code?

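@d0ngjun

d0ngjun commented May 5, 2019

My approach is to wrap the callback body in one big try/catch, so the exception is caught there and brpc does not "swallow" it.
Alternatively, add logging around the key code blocks to help narrow things down.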
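A minimal sketch of the try/catch approach, written without brpc so that it stands alone. ParseScore and HandleRequest are hypothetical stand-ins for the business logic and the body of a service callback; in a real brpc service the failure would presumably be reported through the controller rather than an output string.

```cpp
#include <exception>
#include <string>

// Hypothetical stand-in for the business logic of one RPC method.
float ParseScore(const std::string& text) {
    return std::stof(text);  // throws std::invalid_argument for "a"
}

// Hypothetical stand-in for the body of a service callback. Returns true on
// success. On failure it stores a message in *error instead of letting the
// exception escape into the framework.
bool HandleRequest(const std::string& text, float* score, std::string* error) {
    try {
        *score = ParseScore(text);
        return true;
    } catch (const std::exception& e) {
        *error = e.what();
        return false;
    } catch (...) {
        *error = "unknown exception";
        return false;
    }
}
```

The catch (...) branch is there because third-party code is not guaranteed to throw types derived from std::exception.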

@Xavier1994

Xavier1994 commented May 5, 2019

@scottzzq could you explain how you produced the second backtrace, the one that involves the business logic?

@Xavier1994

@d0ngjun You can add noexcept to every function that brpc calls back. Then the backtrace shows up when it cores.
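A small sketch of the noexcept idea, again independent of brpc. When an exception tries to leave a noexcept function, std::terminate is called at that boundary. The C++ standard leaves it implementation-defined whether the stack is unwound before that happens, so the claim that the business frames remain visible rests on the reports in this thread rather than on a guarantee. ParseScoreOrDie is a hypothetical name.

```cpp
#include <string>
#include <utility>

// Hypothetical business function. Marking it noexcept means an exception that
// tries to leave it ends in std::terminate at this boundary instead of
// propagating into the caller's frames.
float ParseScoreOrDie(const std::string& text) noexcept {
    return std::stof(text);  // std::stof("a") throws std::invalid_argument
}

// Compile-time check that the declaration really is noexcept.
static_assert(
    noexcept(ParseScoreOrDie(std::declval<const std::string&>())),
    "ParseScoreOrDie must be declared noexcept");
```

Calling ParseScoreOrDie("a") would terminate the process, so this is a debugging aid for locating the throw rather than a way to keep the server running.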

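@guodongxiaren
Member

guodongxiaren commented Apr 22, 2021

#include <string>
std::stof("a");

Calling this inside a brpc service method makes the server core dump. Is there a way to locate the exact offending line by analyzing the core file?

Opening the core file with gdb and running bt gives the following: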

(gdb) bt
#0  0x00007f98eb2b11f7 in raise () from /lib64/libc.so.6
#1  0x00007f98eb2b28e8 in abort () from /lib64/libc.so.6
#2  0x0000000000a7a6b5 in __gnu_cxx::__verbose_terminate_handler() ()
#3  0x0000000000a23406 in __cxxabiv1::__terminate(void (*)()) ()
#4  0x0000000000a7a159 in __cxa_call_terminate ()
#5  0x0000000000a232f4 in __gxx_personality_v0 ()
#6  0x0000000000a7eeb3 in _Unwind_RaiseException_Phase2 ()
#7  0x0000000000a7f6a7 in _Unwind_Resume ()
#8  0x00000000007bdd2a in operator() (this=<optimized out>, obj=<optimized out>) at ./src/brpc/destroyable.h:33
#9  ~unique_ptr (this=<synthetic pointer>, __in_chrg=<optimized out>) at /usr/include/c++/4.8.2/bits/unique_ptr.h:184
#10 ~DestroyingPtr (this=<synthetic pointer>, __in_chrg=<optimized out>) at ./src/brpc/destroyable.h:39
#11 brpc::policy::ProcessRpcRequest (msg_base=<optimized out>) at src/brpc/policy/baidu_rpc_protocol.cpp:508
#12 0x00000000008df7fa in brpc::ProcessInputMessage (void_arg=void_arg@entry=0x7f98a8021930) at src/brpc/input_messenger.cpp:132
#13 0x00000000008e07e4 in operator() (this=<optimized out>, last_msg=0x7f98a8021930) at src/brpc/input_messenger.cpp:138
#14 brpc::InputMessenger::OnNewMessages (m=0x7f987401ac80) at /usr/include/c++/4.8.2/bits/unique_ptr.h:184
#15 0x00000000008d04ed in brpc::Socket::ProcessEvent (arg=0x7f987401ac80) at src/brpc/socket.cpp:1049
#16 0x000000000071c1f4 in bthread::TaskGroup::task_runner (skip_remained=<optimized out>) at src/bthread/task_group.cpp:291
#17 0x0000000000834791 in bthread_make_fcontext ()
#18 0x0000000000000000 in ?? ()

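This is clearly an uncaught exception. The next time you see a core stack like this, add noexcept to more of your own business code to narrow down the search.

@muziandmuzi

muziandmuzi commented Jun 28, 2023

brpc has too many problems. It does not live up to its name.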

@chenBright
Contributor

chenBright commented Jun 28, 2023

#2256 should have already fixed this problem.
