Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

发送http响应时间耗时过长 #1064

Closed
steel2013 opened this issue Mar 16, 2020 · 9 comments
Closed

发送http响应时间耗时过长 #1064

steel2013 opened this issue Mar 16, 2020 · 9 comments

Comments

@steel2013
Copy link

statusvarsconnectionsflagsrpczcpuheapgrowthcontentionmore?
2020/01/13-17:21:50.756843 Received request(368) from 127.0.0.1:33223 http log_id=0 trace=dd6968a2cb63197a span=3fee23e0a136197a
17:21:50.756940 . 97 Processing the request in a new bthread
17:21:50.756950 . 10 Enter HttpService.EntryDoc2_Get
17:21:50.756969 . 19 [ncEACHttpServerUtil.cpp:54] ip: 172.16.251.4
17:21:50.757498 . 6 [ncEACEntryDoc2Handler.cpp:102] this: 0x10df630, cntl: 0x7fea8c216db0, userId: b3c51f50-394f-11e7-b7c9-1866daf2fda0 begin
17:21:50.757513 . 15 [ncEACHttpServerUtil.cpp:54] ip: 172.16.251.4
17:21:50.757517 . 4 [ncACSDocManager.cpp:1549] [BEGIN]this: 0x10dfa10
17:21:50.757522 . 5 [ncACSPermManager.cpp:673] this: 0x10dfaf0, userId: b3c51f50-394f-11e7-b7c9-1866daf2fda0, gnsPath: gns:// begin
17:21:50.757529 . 7 [ncACSShareMgnt.cpp:159] [BEGIN]this: 0x10d5450
17:21:50.757535 . 6 [ncACSShareMgnt.cpp:110] [BEGIN]this: 0x10d5450
17:21:50.758062 . 527 [ncACSShareMgnt.cpp:129] [END]this: 0x10d5450
17:21:50.759630 . 1568 [ncACSShareMgnt.cpp:191] [END]this: 0x10d5450
17:21:50.759647 . 17 [ncACSShareMgnt.cpp:429] [BEGIN]this: 0x10d5450
17:21:50.759983 . 336 [ncACSShareMgnt.cpp:446] [END]this: 0x10d5450
17:21:50.759991 . 8 [ncDBPermManager.cpp:1041] docId: -1945481168, accessToken size: 5 begin
17:21:50.773208 . 13217 [ncDBPermManager.cpp:1094] docId: gns://, accessToken size: 5 end, ret perms size: 396
17:21:50.773468 . 260 [ncDBOwnerManager.cpp:340] [BEGIN]this: 0x10b9540, userId: b3c51f50-394f-11e7-b7c9-1866daf2fda0
17:21:50.774407 . 939 [ncDBOwnerManager.cpp:362] [END]this: 0x10b9540, userId: b3c51f50-394f-11e7-b7c9-1866daf2fda0, ret subObjs size: 1
17:21:50.774452 . 45 [ncACSConfManager.cpp:380] this: 0x10e10e0 begin
17:21:50.774456 . 4 [ncACSConfManager.cpp:385] this: 0x10e10e0 end
17:21:50.774460 . 4 [ncACSShareMgnt.cpp:3354] this: 0x10d5450
17:21:50.775530 . 1070 [ncDBDocManager.cpp:1123] this: 0x10bac10, types size: 0 begin
17:21:50.777138 . 1608 [ncDBDocManager.cpp:1212] this: 0x10bac10, types size: 0 end, ret infos size: 26
17:21:50.777169 . 31 [ncACSPermManager.cpp:884] this: 0x10dfaf0, userId: b3c51f50-394f-11e7-b7c9-1866daf2fda0, size: 26 begin
17:21:50.777174 . 5 [ncACSShareMgnt.cpp:2733] [BEGIN]this: 0x10d5450
17:21:50.777570 . 396 [ncACSShareMgnt.cpp:2749] [END]this: 0x10d5450
17:21:50.777633 . 63 [ncACSProcessorUtil.cpp:345] [BEGIN]this: 0x10de420
17:21:50.780448 . 1271 [ncACSShareMgnt.cpp:886] [BEGIN]this: 0x10d5450
17:21:50.781456 . 1008 [ncACSShareMgnt.cpp:916] [END]this: 0x10d5450
17:21:50.782712 . 22 [ncACSProcessorUtil.cpp:475] [END]this: 0x10de420
17:21:50.782884 . 2 [ncACSDocManager.cpp:1592] [END]this: 0x10dfa10
17:21:50.783507 . 623 [ncEACEntryDoc2Handler.cpp:158] this: 0x10df630, cntl: 0x7fea8c216db0, userId: b3c51f50-394f-11e7-b7c9-1866daf2fda0 end
17:21:50.783593 . 86 Leave HttpService.EntryDoc2_Get
17:21:55.225824 4.442231 Responded(10961)

服务偶现响应慢,打开trace,最后两行显示耗时4s多,倒数第三行已是请求handler函数的最后一行,那么剩下的耗时是否消耗在brpc内部了,很疑惑,望解答!

@jamesge
Copy link
Contributor

jamesge commented Mar 16, 2020

可能服务瞬间并发度比较高,看看此时的监控是否有一些突增

@steel2013
Copy link
Author

可能服务瞬间并发度比较高,看看此时的监控是否有一些突增

具体如何排查和避免这种情况?比如查看具体时间消耗在框架的哪里?

@steel2013
Copy link
Author

image
当出现响应卡顿时,会发现断开连接时,brpc端没有发送FIN包

@steel2013
Copy link
Author

steel2013 commented Apr 23, 2020

@jamesge 偶现bthread_worker_usage是bthread_worker_count的2至3倍,一旦出现,就会一直持续,此时服务出现卡顿,api响应很慢,将此服务节点lvs负载关闭,bthread_worker_usage即恢复正常,维持在1左右,再次打开负载,又出现bthread_worker_usage是bthread_worker_count的2至3倍,此时抓取堆栈,如下:
Thread 16 (Thread 0x7fe05b9f9700 (LWP 27903)):
#0 0x00007fe09056dba9 in syscall () from /lib64/libc.so.6
#1 0x00007fe091ebaa3e in bthread::TaskGroup::wait_task(unsigned long*) () from ./libbrpc.so
#2 0x00007fe091ebcb3b in bthread::TaskGroup::run_main_task() () from ./libbrpc.so
#3 0x00007fe091eb87ee in bthread::TaskControl::worker_thread(void*) () from ./libbrpc.so
#4 0x00007fe090260e65 in start_thread () from /lib64/libpthread.so.0
#5 0x00007fe09057388d in clone () from /lib64/libc.so.6
出现大量上述堆栈信息
另一次堆栈如下
Thread 20 (Thread 0x7f1c0cdfa700 (LWP 22801)):
#0 0x00007f1c3702b57c in __lll_lock_wait_private () from /lib64/libc.so.6
#1 0x00007f1c36f8e7c4 in _L_lock_62 () from /lib64/libc.so.6
#2 0x00007f1c36f8e64e in fwrite () from /lib64/libc.so.6
#3 0x00007f1c3789b642 in std::ostream::write(char const*, long) () from /lib64/libstdc++.so.6
#4 0x00007f1c389192ca in butil::operator<<(std::ostream&, butil::IOBuf const&) () from ./libbrpc.so
#5 0x00007f1c38a22323 in brpc::policy::PrintMessage(butil::IOBuf const&, bool, bool) () from ./libbrpc.so
#6 0x00007f1c38a25ffa in brpc::policy::SendHttpResponse(brpc::Controller*, google::protobuf::Message const*, google::protobuf::Message const*, brpc::Server const*, brpc::MethodStatus*, long) () from ./libbrpc.so
#7 0x00007f1c38a2822a in brpc::internal::FunctionClosure6<brpc::Controller*, google::protobuf::Message const*, google::protobuf::Message const*, brpc::Server const*, brpc::MethodStatus*, long>::Run() () from ./libbrpc.so
#8 0x00007f1c389d1299 in brpc::VarsService::default_method(google::protobuf::RpcController*, brpc::VarsRequest const*, brpc::VarsResponse*, google::protobuf::Closure*) () from ./libbrpc.so
#9 0x00007f1c38abd4f5 in brpc::vars::CallMethod(google::protobuf::MethodDescriptor const*, google::protobuf::RpcController*, google::protobuf::Message const*, google::protobuf::Message*, google::protobuf::Closure*) () from ./libbrpc.so
#10 0x00007f1c38a274e6 in brpc::policy::ProcessHttpRequest(brpc::InputMessageBase*) () from ./libbrpc.so
#11 0x00007f1c38a01d1a in brpc::ProcessInputMessage(void*) () from ./libbrpc.so
#12 0x00007f1c38a02d04 in brpc::InputMessenger::OnNewMessages(brpc::Socket*) () from ./libbrpc.so
#13 0x00007f1c38a9a8bd in brpc::Socket::ProcessEvent(void*) () from ./libbrpc.so
#14 0x00007f1c389668ea in bthread::TaskGroup::task_runner(long) () from ./libbrpc.so
#15 0x00007f1c3894f741 in bthread_make_fcontext () from ./libbrpc.so
#16 0x0000000000000000 in ?? ()

重启服务后,打开负载,即可恢复正常。
image

@jamesge
Copy link
Contributor

jamesge commented Apr 24, 2020

是不是打开了-http_verbose? 否则thread 20中不会出现PrintMessage。这个开关是用于debug的,不应该在线上系统中打开

@steel2013
Copy link
Author

@jamesge 是打开了http_verbose,发现服务卡顿,临时打开看trace的

@troycheng
Copy link

请问这个问题最终有解决的办法了吗

@wwbmmm
Copy link
Contributor

wwbmmm commented May 4, 2023

We close this issue because it is irreproducible and inactive for a long time.
If you can reproduce this issue with the latest version of bRPC, please reopen this issue and tell us how to reproduce.

@wwbmmm wwbmmm closed this as completed May 4, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants