Custom process hangs waiting in futex queue #5659

Closed
crystal9002 opened this issue Jan 10, 2025 · 7 comments

@crystal9002

Please answer these questions before submitting your issue.

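  1. Problem:
    We use the Redis async queue in Hyperf 3.1 to consume data from Redis, and the consumer process hangs. It appears to be waiting for a futex wake-up. Could you help us work out how to resolve this?

The kernel stacks of the hung process's threads are as follows: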

/proc/35 # cat /proc/35/task/*/stack
[<0>] futex_wait_queue+0x60/0x90
[<0>] futex_wait+0x163/0x260
[<0>] do_futex+0x12d/0x1d0
[<0>] __x64_sys_futex+0x73/0x1d0
[<0>] do_syscall_64+0x35/0x80
[<0>] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[<0>] futex_wait_queue+0x60/0x90
[<0>] futex_wait+0x163/0x260
[<0>] do_futex+0x12d/0x1d0
[<0>] __x64_sys_futex+0x73/0x1d0
[<0>] do_syscall_64+0x35/0x80
[<0>] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[<0>] futex_wait_queue+0x60/0x90
[<0>] futex_wait+0x163/0x260
[<0>] do_futex+0x12d/0x1d0
[<0>] __x64_sys_futex+0x73/0x1d0
[<0>] do_syscall_64+0x35/0x80
[<0>] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[<0>] futex_wait_queue+0x60/0x90
[<0>] futex_wait+0x163/0x260
[<0>] do_futex+0x12d/0x1d0
[<0>] __x64_sys_futex+0x73/0x1d0
[<0>] do_syscall_64+0x35/0x80
[<0>] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[<0>] __skb_wait_for_more_packets+0x13c/0x180
[<0>] __skb_recv_udp+0x202/0x330
[<0>] udpv6_recvmsg+0x181/0x790
[<0>] inet6_recvmsg+0x116/0x130
[<0>] ____sys_recvmsg+0x87/0x180
[<0>] ___sys_recvmsg+0x7c/0xd0
[<0>] __sys_recvmsg+0x56/0xa0
[<0>] do_syscall_64+0x35/0x80
[<0>] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[<0>] futex_wait_queue+0x60/0x90
[<0>] futex_wait+0x163/0x260
[<0>] do_futex+0x12d/0x1d0
[<0>] __x64_sys_futex+0x73/0x1d0
[<0>] do_syscall_64+0x35/0x80
[<0>] entry_SYSCALL_64_after_hwframe+0x6e/0xd8

/proc/35 # cat wchan
futex_wait_queue

  2. What version of Swoole are you using (show your php --ri swoole)?
    swoole

Swoole => enabled
Author => Swoole Team team@swoole.com
Version => 5.1.3
Built => Jul 25 2024 02:11:01
coroutine => enabled with boost asm context
epoll => enabled
eventfd => enabled
signalfd => enabled
spinlock => enabled
rwlock => enabled
openssl => OpenSSL 3.1.6 4 Jun 2024
dtls => enabled
http2 => enabled
json => enabled
curl-native => enabled
pcre => enabled
c-ares => 1.19.1
zlib => 1.2.13
brotli => E16777225/D16777225
mutex_timedlock => enabled
pthread_barrier => enabled
async_redis => enabled
coroutine_pgsql => enabled
coroutine_odbc => enabled
coroutine_sqlite => enabled

Directive => Local Value => Master Value
swoole.enable_coroutine => On => On
swoole.enable_library => On => On
swoole.enable_fiber_mock => Off => Off
swoole.enable_preemptive_scheduler => Off => Off
swoole.display_errors => On => On
swoole.use_shortname => Off => Off
swoole.unixsock_buffer_size => 8388608 => 8388608

  3. What is your machine environment used (show your uname -a & php -v & gcc -v) ?

/opt/www # uname -a
Linux playlet_api01 6.1.92-99.174.amzn2023.x86_64 #1 SMP PREEMPT_DYNAMIC Tue Jun 4 15:43:46 UTC 2024 x86_64 Linux

/opt/www # php -v
PHP 8.1.27 (cli) (built: Feb 21 2024 14:48:59) (NTS)
Copyright (c) The PHP Group
Zend Engine v4.1.27, Copyright (c) Zend Technologies
with Zend OPcache v8.1.27, Copyright (c), by Zend Technologies

@crystal9002
Author

Adding the strace output:
/proc/35 # strace -p 35
strace: Process 35 attached
futex(0x7efc1a697b68, FUTEX_WAIT_PRIVATE, 2, NULL

@matyhtf
Member

matyhtf commented Jan 10, 2025

gdb -p bt

Check the call stack.

@kurumii

kurumii commented Jan 10, 2025

gdb -p bt

Check the call stack.

/opt/www # gdb -p 35
GNU gdb (GDB) 13.1
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-alpine-linux-musl".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
https://www.gnu.org/software/gdb/bugs/.
Find the GDB manual and other documentation resources online at:
http://www.gnu.org/software/gdb/documentation/.

For help, type "help".
Type "apropos word" to search for commands related to "word".
Attaching to process 35
[New LWP 103]
[New LWP 104]
[New LWP 105]
[New LWP 106]
[New LWP 3565]
0x00007efc22a8cf63 in ?? () from /lib/ld-musl-x86_64.so.1
(gdb) bt
#0 0x00007efc22a8cf63 in ?? () from /lib/ld-musl-x86_64.so.1
#1 0x00007efc22a8a0ee in ?? () from /lib/ld-musl-x86_64.so.1
#2 0x00007efc22aceb84 in ?? () from /lib/ld-musl-x86_64.so.1
#3 0x0000000000000000 in ?? ()
(gdb)

@matyhtf
Member

matyhtf commented Jan 10, 2025

There are no debug symbols. You need to reproduce this in an environment that has debug symbols and check again.

@kurumii

kurumii commented Jan 10, 2025

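There are no debug symbols. You need to reproduce this in an environment that has debug symbols and check again.

This only happens occasionally in production. The async queue job calls a third-party API to fetch data, with the GuzzleHttp timeout set to 5s. Since January 6, the hang has appeared suddenly after roughly one day of running.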

@crystal9002
Author

crystal9002 commented Jan 10, 2025

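We cannot reproduce this in the test environment for now, and there are no relevant debug logs.
Judging from the runtime logs, requests to the remote API returned a large number of abnormal HTTP responses when the problem occurred. My guess is that this triggered a fault in the underlying signal handling: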

  1. cURL error 6: Could not resolve host: xxx.xxx.xxx (see https://curl.haxx.se/libcurl/c/libcurl-errors.html)
  2. cURL error 28: Operation timed out after 10003 milliseconds with 0 bytes received (see https://curl.haxx.se/libcurl/c/libcurl-errors.html)
  3. cURL error 35: Recv failure: Bad file descriptor (see https://curl.haxx.se/libcurl/c/libcurl-errors.html)
  4. cURL error 35: TLS connect error: error:00000000:lib(0)::reason(0) (see https://curl.haxx.se/libcurl/c/libcurl-errors.html)

@matyhtf
Member

matyhtf commented Jan 11, 2025

@crystal9002 You need to look at the call stack while the process is blocked to find out where the problem is.
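
For reference, one possible way to capture the state of every thread the next time the process blocks is sketched below. It is only an illustration: the function names `dump_thread_wchan` and `dump_user_stacks` and the optional `PROC_ROOT` argument are hypothetical helpers, not part of Swoole or Hyperf. The gdb output is only useful once debug symbols are available; on Alpine the musl symbols may come from a package such as `musl-dbg`, which is an assumption to verify for the image in use.

```shell
#!/bin/sh
# Hypothetical helper: print "<tid> <wchan>" for every thread of a process.
# The second argument overrides the /proc root, so the function can be
# tried against a fake directory tree.
dump_thread_wchan() {
    pid="$1"
    proc_root="${2:-/proc}"
    for task in "$proc_root/$pid"/task/*; do
        [ -d "$task" ] || continue
        tid="${task##*/}"
        # wchan names the kernel function the thread is sleeping in
        wchan="$(cat "$task/wchan" 2>/dev/null || echo '?')"
        printf '%s %s\n' "$tid" "$wchan"
    done
}

# Hypothetical helper: userspace backtraces of all threads, not only the
# main one. -batch makes gdb exit after running the -ex commands.
dump_user_stacks() {
    gdb -p "$1" -batch -ex 'thread apply all bt'
}
```

With the PID from this report, `dump_thread_wchan 35` would list one line per thread, and `dump_user_stacks 35` would print the backtrace of each thread rather than only the one gdb happens to select. Similarly, `strace -f -p 35` attaches to all threads of the process instead of just the main one.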
