runtime: execution halts with goroutines stuck in runtime.gopark and runtime.futex #58798
If there is any other info you need, please let us know. We can reproduce this with our application reliably and have a |
Upon further examination, this appears to occur where at least one goroutine gets stuck (waiting for a new object allocation that triggers
* Goroutine 4921 - User: /home/runner/work/avalanchego/avalanchego/message/inbound_msg_builder.go:313 github.com/ava-labs/avalanchego/message.(*outMsgBuilder).Chits (0x92477b) (thread 3356575)
0 0x00000000004797e3 in runtime.futex
at /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/sys_linux_amd64.s:555
1 0x00000000004756c0 in runtime.systemstack_switch
at /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/asm_amd64.s:463
2 0x0000000000422c7c in runtime.gcStart
at /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/mgc.go:665
3 0x0000000000415297 in runtime.mallocgc
at /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/malloc.go:1172
4 0x0000000000415507 in runtime.newobject
at /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/malloc.go:1254
5 0x000000000092477b in github.com/ava-labs/avalanchego/message.encodeIDs
at /home/runner/work/avalanchego/avalanchego/message/inbound_msg_builder.go:313
6 0x000000000092477b in github.com/ava-labs/avalanchego/message.(*outMsgBuilder).Chits
at /home/runner/work/avalanchego/avalanchego/message/outbound_msg_builder.go:618
7 0x000000000092773c in github.com/ava-labs/avalanchego/message.(*creator).Chits
at <autogenerated>:1
8 0x0000000000c1d5cb in github.com/ava-labs/avalanchego/snow/networking/sender.(*sender).SendChits
at /home/runner/work/avalanchego/avalanchego/snow/networking/sender/sender.go:1160
9 0x0000000000cd3027 in github.com/ava-labs/avalanchego/snow/engine/snowman.(*Transitive).sendChits
at /home/runner/work/avalanchego/avalanchego/snow/engine/snowman/transitive.go:461
10 0x0000000000cd0445 in github.com/ava-labs/avalanchego/snow/engine/snowman.(*Transitive).PullQuery
at /home/runner/work/avalanchego/avalanchego/snow/engine/snowman/transitive.go:198
(truncated) |
one theory from a cursory look (I'm not a contributor so pardon my ignorance) is that Line 665 in 202a1a5
And the total "halt" we see is because most other locks have already returned but gc can't acquire some non-zero number of them. Alternatively, there could be some off-by-one issue within the world lock sema function that causes it to iterate forever in a for loop instead of returning to continue GC (which would explain the CPU spike). |
If you have a core dump, could you open it with gdb rather than dlv, and run “thread apply all bt”? This should show what the system threads are doing rather than the goroutines. |
I'm spinning up a Linux box to view the dump in gdb right now ( In the meantime, I used
Thread 3354731 at 0x4797e3 /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/sys_linux_amd64.s:555 runtime.futex
Thread 3354732 at 0x4791dd /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/sys_linux_amd64.s:135 runtime.usleep
Thread 3354733 at 0x409f8e /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/internal/syscall/asm_linux_amd64.s:36 runtime/internal/syscall.Syscall6
Thread 3354734 at 0x4797e3 /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/sys_linux_amd64.s:555 runtime.futex
Thread 3354735 at 0x4797e3 /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/sys_linux_amd64.s:555 runtime.futex
Thread 3354736 at 0x4797e3 /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/sys_linux_amd64.s:555 runtime.futex
Thread 3354738 at 0x4797e3 /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/sys_linux_amd64.s:555 runtime.futex
Thread 3354743 at 0x4797e3 /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/sys_linux_amd64.s:555 runtime.futex
Thread 3354812 at 0x4797e3 /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/sys_linux_amd64.s:555 runtime.futex
Thread 3354813 at 0x4797e3 /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/sys_linux_amd64.s:555 runtime.futex
Thread 3354861 at 0x45ffdb /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/stubs.go:18 runtime.pcdatavalue
Thread 3354862 at 0x4797e3 /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/sys_linux_amd64.s:555 runtime.futex
* Thread 3356575 at 0x4797e3 /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/sys_linux_amd64.s:555 runtime.futex |
Here you go:
|
Found a detailed thread view in
Thread 3356575 at 0x4797e3 /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/sys_linux_amd64.s:555 runtime.futex
0 0x00000000004797e3 in runtime.futex
at /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/sys_linux_amd64.s:555
1 0x00000000004756c0 in runtime.systemstack_switch
at /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/asm_amd64.s:463
2 0x0000000000422c7c in runtime.gcStart
at /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/mgc.go:665
3 0x0000000000415297 in runtime.mallocgc
at /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/malloc.go:1172
4 0x0000000000415507 in runtime.newobject
at /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/malloc.go:1254
5 0x000000000092477b in github.com/ava-labs/avalanchego/message.encodeIDs
at /home/runner/work/avalanchego-internal/avalanchego-internal/message/inbound_msg_builder.go:313
6 0x000000000092477b in github.com/ava-labs/avalanchego/message.(*outMsgBuilder).Chits
at /home/runner/work/avalanchego-internal/avalanchego-internal/message/outbound_msg_builder.go:618
7 0x000000000092773c in github.com/ava-labs/avalanchego/message.(*creator).Chits
at <autogenerated>:1
8 0x0000000000c1d5cb in github.com/ava-labs/avalanchego/snow/networking/sender.(*sender).SendChits
at /home/runner/work/avalanchego-internal/avalanchego-internal/snow/networking/sender/sender.go:1160
9 0x0000000000cd3027 in github.com/ava-labs/avalanchego/snow/engine/snowman.(*Transitive).sendChits
at /home/runner/work/avalanchego-internal/avalanchego-internal/snow/engine/snowman/transitive.go:461
10 0x0000000000cd0445 in github.com/ava-labs/avalanchego/snow/engine/snowman.(*Transitive).PullQuery
at /home/runner/work/avalanchego-internal/avalanchego-internal/snow/engine/snowman/transitive.go:198
11 0x0000000000bbdbad in github.com/ava-labs/avalanchego/snow/networking/handler.(*handler).handleSyncMsg
at /home/runner/work/avalanchego-internal/avalanchego-internal/snow/networking/handler/handler.go:637
12 0x0000000000bb8c28 in github.com/ava-labs/avalanchego/snow/networking/handler.(*handler).dispatchSync
at /home/runner/work/avalanchego-internal/avalanchego-internal/snow/networking/handler/handler.go:355
13 0x0000000000bb80e5 in github.com/ava-labs/avalanchego/snow/networking/handler.(*handler).Start.func1
at /home/runner/work/avalanchego-internal/avalanchego-internal/snow/networking/handler/handler.go:238
14 0x0000000000809ab3 in github.com/ava-labs/avalanchego/utils/logging.(*log).RecoverAndPanic
at /home/runner/work/avalanchego-internal/avalanchego-internal/utils/logging/log.go:125
15 0x0000000000bb7dcf in github.com/ava-labs/avalanchego/snow/networking/handler.(*handler).Start.func11
at /home/runner/work/avalanchego-internal/avalanchego-internal/snow/networking/handler/handler.go:257
16 0x00000000004778e1 in runtime.goexit
at /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/asm_amd64.s:1598
Thread 3354862 at 0x4797e3 /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/sys_linux_amd64.s:555 runtime.futex
0 0x00000000004797e3 in runtime.futex
at /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/sys_linux_amd64.s:555
1 0x000000000043c6b6 in runtime.futexsleep
at /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/os_linux.go:69
2 0x0000000000413907 in runtime.notesleep
at /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/lock_futex.go:160
3 0x00000000004471cc in runtime.mPark
at /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/proc.go:1530
4 0x00000000004471cc in runtime.stopm
at /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/proc.go:2334
5 0x0000000000448a1c in runtime.findRunnable
at /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/proc.go:3007
6 0x0000000000449851 in runtime.schedule
at /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/proc.go:3360
7 0x0000000000449d6d in runtime.park_m
at /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/proc.go:3511
8 0x00000000004756a3 in runtime.mcall
at /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/asm_amd64.s:452
Thread 3354861 at 0x45ffdb /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/stubs.go:18 runtime.pcdatavalue
0 0x000000000045ffdb in runtime.add
at /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/stubs.go:18
1 0x000000000045ffdb in runtime.pcdatastart
at /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/symtab.go:1102
2 0x000000000045ffdb in runtime.pcdatavalue
at /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/symtab.go:1109
3 0x00000000004d1107 in syscall.Syscall
at /opt/hostedtoolcache/go/1.20.1/x64/src/syscall/syscall_linux.go:69
4 0x000000c015f38ae9 in ???
at ?:-1
5 0x0000000000415128 in runtime.mallocgc
at /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/malloc.go:1094
Thread 3354813 at 0x4797e3 /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/sys_linux_amd64.s:555 runtime.futex
0 0x00000000004797e3 in runtime.futex
at /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/sys_linux_amd64.s:555
1 0x000000000043c6b6 in runtime.futexsleep
at /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/os_linux.go:69
2 0x0000000000413907 in runtime.notesleep
at /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/lock_futex.go:160
3 0x00000000004471cc in runtime.mPark
at /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/proc.go:1530
4 0x00000000004471cc in runtime.stopm
at /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/proc.go:2334
5 0x0000000000448a1c in runtime.findRunnable
at /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/proc.go:3007
6 0x0000000000449851 in runtime.schedule
at /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/proc.go:3360
7 0x0000000000449d6d in runtime.park_m
at /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/proc.go:3511
8 0x00000000004756a3 in runtime.mcall
at /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/asm_amd64.s:452
Thread 3354812 at 0x4797e3 /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/sys_linux_amd64.s:555 runtime.futex
0 0x00000000004797e3 in runtime.futex
at /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/sys_linux_amd64.s:555
1 0x000000000043c6b6 in runtime.futexsleep
at /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/os_linux.go:69
2 0x0000000000413907 in runtime.notesleep
at /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/lock_futex.go:160
3 0x00000000004471cc in runtime.mPark
at /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/proc.go:1530
4 0x00000000004471cc in runtime.stopm
at /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/proc.go:2334
5 0x0000000000448a1c in runtime.findRunnable
at /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/proc.go:3007
6 0x0000000000449851 in runtime.schedule
at /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/proc.go:3360
7 0x0000000000449d6d in runtime.park_m
at /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/proc.go:3511
8 0x00000000004756a3 in runtime.mcall
at /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/asm_amd64.s:452
Thread 3354743 at 0x4797e3 /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/sys_linux_amd64.s:555 runtime.futex
0 0x00000000004797e3 in runtime.futex
at /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/sys_linux_amd64.s:555
1 0x000000000043c6b6 in runtime.futexsleep
at /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/os_linux.go:69
2 0x0000000000413907 in runtime.notesleep
at /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/lock_futex.go:160
3 0x00000000004471cc in runtime.mPark
at /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/proc.go:1530
4 0x00000000004471cc in runtime.stopm
at /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/proc.go:2334
5 0x0000000000448a1c in runtime.findRunnable
at /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/proc.go:3007
6 0x0000000000449851 in runtime.schedule
at /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/proc.go:3360
7 0x0000000000449d6d in runtime.park_m
at /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/proc.go:3511
8 0x00000000004756a3 in runtime.mcall
at /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/asm_amd64.s:452
Thread 3354738 at 0x4797e3 /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/sys_linux_amd64.s:555 runtime.futex
0 0x00000000004797e3 in runtime.futex
at /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/sys_linux_amd64.s:555
1 0x000000000043c6b6 in runtime.futexsleep
at /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/os_linux.go:69
2 0x0000000000413b05 in runtime.notetsleep_internal
at /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/lock_futex.go:183
3 0x0000000000413c25 in runtime.notetsleepg
at /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/lock_futex.go:237
4 0x0000000000473acf in os/signal.signal_recv
at /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/sigqueue.go:152
5 0x00000000005a7a79 in os/signal.loop
at /opt/hostedtoolcache/go/1.20.1/x64/src/os/signal/signal_unix.go:23
6 0x00000000004778e1 in runtime.goexit
at /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/asm_amd64.s:1598
Thread 3354736 at 0x4797e3 /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/sys_linux_amd64.s:555 runtime.futex
0 0x00000000004797e3 in runtime.futex
at /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/sys_linux_amd64.s:555
1 0x000000000043c6b6 in runtime.futexsleep
at /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/os_linux.go:69
2 0x0000000000413907 in runtime.notesleep
at /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/lock_futex.go:160
3 0x00000000004470b1 in runtime.templateThread
at /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/proc.go:2312
4 0x0000000000445bf3 in runtime.mstart1
at /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/proc.go:1498
5 0x0000000000445b3a in runtime.mstart0
at /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/proc.go:1455
6 0x0000000000475625 in runtime.mstart
at /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/asm_amd64.s:395
7 0x00000000013b9a7c in ???
at ?:-1
error: error while reading spliced memory at 0x7fdb05af0288: EOF
Thread 3354735 at 0x4797e3 /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/sys_linux_amd64.s:555 runtime.futex
0 0x00000000004797e3 in runtime.futex
at /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/sys_linux_amd64.s:555
1 0x000000000043c6b6 in runtime.futexsleep
at /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/os_linux.go:69
2 0x0000000000413907 in runtime.notesleep
at /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/lock_futex.go:160
3 0x0000000000445ce5 in runtime.mPark
at /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/proc.go:1530
4 0x0000000000447a25 in runtime.stoplockedm
at /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/proc.go:2561
5 0x00000000004497dd in runtime.schedule
at /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/proc.go:3339
6 0x0000000000449d6d in runtime.park_m
at /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/proc.go:3511
7 0x00000000004756a3 in runtime.mcall
at /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/asm_amd64.s:452
Thread 3354734 at 0x4797e3 /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/sys_linux_amd64.s:555 runtime.futex
0 0x00000000004797e3 in runtime.futex
at /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/sys_linux_amd64.s:555
1 0x000000000043c6b6 in runtime.futexsleep
at /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/os_linux.go:69
2 0x0000000000413907 in runtime.notesleep
at /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/lock_futex.go:160
3 0x00000000004471cc in runtime.mPark
at /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/proc.go:1530
4 0x00000000004471cc in runtime.stopm
at /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/proc.go:2334
5 0x0000000000448a1c in runtime.findRunnable
at /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/proc.go:3007
6 0x0000000000449851 in runtime.schedule
at /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/proc.go:3360
7 0x0000000000449d6d in runtime.park_m
at /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/proc.go:3511
8 0x00000000004756a3 in runtime.mcall
at /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/asm_amd64.s:452
Thread 3354733 at 0x409f8e /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/internal/syscall/asm_linux_amd64.s:36 runtime/internal/syscall.Syscall6
0 0x0000000000409f8e in runtime/internal/syscall.Syscall6
at /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/internal/syscall/asm_linux_amd64.s:36
1 0x0000000000409f73 in syscall.RawSyscall6
at /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/internal/syscall/syscall_linux.go:38
2 0x00000000004d11be in syscall.Syscall6
at /opt/hostedtoolcache/go/1.20.1/x64/src/syscall/syscall_linux.go:92
3 0x00000000004d1be5 in syscall.Syscall6
at :0
4 0x00000000005a67d8 in golang.org/x/sys/unix.EpollWait
at /home/runner/go/pkg/mod/golang.org/x/sys@v0.5.0/unix/zsyscall_linux_amd64.go:56
5 0x0000000000e7059d in github.com/rjeczalik/notify.(*inotify).loop
at /home/runner/go/pkg/mod/github.com/rjeczalik/notify@v0.9.3/watcher_inotify.go:189
6 0x0000000000e701ca in github.com/rjeczalik/notify.(*inotify).lazyinit.func2
at /home/runner/go/pkg/mod/github.com/rjeczalik/notify@v0.9.3/watcher_inotify.go:129
7 0x00000000004778e1 in runtime.goexit
at /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/asm_amd64.s:1598
Thread 3354732 at 0x4791dd /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/sys_linux_amd64.s:135 runtime.usleep
0 0x00000000004791dd in runtime.usleep
at /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/sys_linux_amd64.s:135
1 0x000000000044e425 in runtime.sysmon
at /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/proc.go:5297
2 0x0000000000445bf3 in runtime.mstart1
at /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/proc.go:1498
3 0x0000000000445b3a in runtime.mstart0
at /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/proc.go:1455
4 0x0000000000475625 in runtime.mstart
at /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/asm_amd64.s:395
5 0x00000000013b9a7c in ???
at ?:-1
error: error while reading spliced memory at 0x7fdb07b34288: EOF
Thread 3354731 at 0x4797e3 /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/sys_linux_amd64.s:555 runtime.futex
0 0x00000000004797e3 in runtime.futex
at /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/sys_linux_amd64.s:555
1 0x000000000043c6b6 in runtime.futexsleep
at /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/os_linux.go:69
2 0x0000000000413907 in runtime.notesleep
at /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/lock_futex.go:160
3 0x00000000004471cc in runtime.mPark
at /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/proc.go:1530
4 0x00000000004471cc in runtime.stopm
at /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/proc.go:2334
5 0x0000000000448a1c in runtime.findRunnable
at /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/proc.go:3007
6 0x0000000000449851 in runtime.schedule
at /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/proc.go:3360
7 0x0000000000449d6d in runtime.park_m
at /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/proc.go:3511
8 0x00000000004756a3 in runtime.mcall
at /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/asm_amd64.s:452 |
Are there any 1.20.2 RCs that have this fix in them yet we could use? If not, it'll just take a bit more leg work to update. Figured I'd ask before embarking on that. |
@patrick-ogrady go1.20.2 is not yet released. |
Ended up just implementing a pipeline to build any branch from this repo from source and use that for compilation. We'll run a test tomorrow and report back. Thanks for the help! |
So far so good with We'll continue running tests for the next day and will report back tomorrow if it continues to stay stable. Notably, the issue you linked above seems to apply specifically to a panic case; however, I'm assuming the fix you implemented handled a more general issue (our code does not hit a panic, and with the fix it simply doesn't get stuck). |
That's great! The issue is with traceback generally. Traceback is used in panic processing, but also in other places. In your case, it is stuck in traceback during CPU profiling. I'm going to close this as a duplicate of #58513. Please reopen if the issue does reproduce after all. |
Thanks for the investigation and explanation. Will reply if I see anything out of the ordinary on that branch. |
I ran this change for about 20 hours without any issue. Based on how frequently it occurred before, I believe this is resolved. |
Hi @patrick-ogrady, regarding this comment, do you mind sharing any code snippets of the pipeline or ways to build and use an unreleased go version for compilation? Any help would be appreciated. Thanks in advance! |
@manav2401 There are a few options:
Build directly:
Use gotip:
|
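Whichever option is used, it helps to confirm that the resulting binary really was compiled with the intended toolchain before starting a long test run. The sketch below is one way to do that from inside the program; the helper name `toolchainVersion` is hypothetical and not part of this thread. It relies only on `runtime.Version` and `runtime/debug.ReadBuildInfo` (the `GoVersion` field assumes Go 1.18 or newer).

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

// toolchainVersion reports the Go version that compiled the running binary.
// It prefers the version recorded in the embedded build info and falls back
// to runtime.Version when build info is unavailable.
func toolchainVersion() string {
	if info, ok := debug.ReadBuildInfo(); ok && info.GoVersion != "" {
		return info.GoVersion
	}
	return runtime.Version()
}

func main() {
	// A release toolchain reports a string such as "go1.20.1"; a toolchain
	// built from a development branch typically reports a "devel" string.
	fmt.Println("built with:", toolchainVersion())
}
```

Logging this value at startup makes it easy to tell from the logs alone which toolchain a given test run used.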
What version of Go are you using (go version)?
Does this issue reproduce with the latest release?
Yes, with 1.20.1
What operating system and processor architecture are you using (go env)?
What did you do?
After upgrading avalanchego to use go1.20.1 instead of 1.19.6, the binary now randomly halts (all goroutines stop and the program becomes unresponsive) after about 20-30 minutes of runtime on go1.20.1 linux/amd64.
Notably, the binary does not panic or exit at this point (as you would expect if it were deadlocked). It just stays pegged to ~30% CPU and spins indefinitely on something (but all async processes halt):
The output of grs -r on dlv is attached from the point of stall: goroutines.txt
What did you expect to see?
The program to work like it did on v1.19.6 (not halt without warning or panic). To run this test (and hopefully avoid wasting your time), we compiled the exact same code in both 1.19.6 and 1.20.1 (this only occurred in 1.20.1).
What did you see instead?
The program halted without emitting a panic or any other info that could be used to diagnose the issue. All metrics-related goroutines that would normally report on system-level processes also stopped running (i.e., a global program halt).