debug sporadic vimto hang #1466

Merged
merged 1 commit into from
May 17, 2024

@lmb lmb commented May 15, 2024

ci: debug sporadic vimto hang

There is a case where one of the CI tests hangs for 10 minutes and is then
killed by the go test harness' timeout. From the dumped goroutines it seems
like the runner inside the VM is stuck writing to stdout:

goroutine 82 [syscall, 9 minutes]:
syscall.Syscall(0x4a4b1c?, 0x708c80?, 0x80000000000?, 0x7ffff80000000000?)
	/opt/hostedtoolcache/go/1.21.9/x64/src/syscall/syscall_linux.go:69 +0x25
syscall.write(0xc000014120?, {0xc0002340c0?, 0x4f0996?, 0xc002048680?})
	/opt/hostedtoolcache/go/1.21.9/x64/src/syscall/zsyscall_linux_amd64.go:949 +0x3b
syscall.Write(...)
	/opt/hostedtoolcache/go/1.21.9/x64/src/syscall/syscall_unix.go:209
internal/poll.ignoringEINTRIO(...)
	/opt/hostedtoolcache/go/1.21.9/x64/src/internal/poll/fd_unix.go:736
internal/poll.(*FD).Write(0xc000014120, {0xc0002340c0, 0x1c, 0xc0})
	/opt/hostedtoolcache/go/1.21.9/x64/src/internal/poll/fd_unix.go:380 +0x35f
os.(*File).write(...)
	/opt/hostedtoolcache/go/1.21.9/x64/src/os/file_posix.go:46
os.(*File).Write(0xc000040028, {0xc0002340c0?, 0x1c, 0xc0018e4e28:225 +0x97
testing.(*matcher).fullName(0xc0023b6150, 0xc0023b6150, {0xc00228ec0b?, 0x776c00?})
	/opt/hostedtoolcache/go/1.21.9/x64/src/testing/match.go:90 +0x106
testing.(*T).Run(0xc00217eb60, {0xc00228ec0b?, 0x7fe098330108?}, 0xc0001d2a80)
	/opt/hostedtoolcache/go/1.21.9/x64/src/testing/testing.go:1639 +0x2db
github.com/cilium/ebpf/internal/testutils.Files(0xc00217eb60, {0xc0022ae000, 0x51c, 0x2e?}, 0xc00220d320)
	/home/runner/work/ebpf/ebpf/internal/testutils/glob.go:25 +0x91
github.com/cilium/ebpf.TestLibBPFCompat(0xc00217eb60)
	/home/runner/work/ebpf/ebpf/elf_reader_test.go:954 +0x14e
testing.tRunner(0xc00217eb60, 0x792768)
	/opt/hostedtoolcache/go/1.21.9/x64/src/testing/testing.go:1595 +0xff
created by testing.(*T).Run in goroutine 1
	/opt/hostedtoolcache/go/1.21.9/x64/src/testing/testing.go:1648 +0x3ad

Note that writing the goroutine dump to stderr does work, so not all I/O is
broken. Generate and collect a core dump of the Go test the next time this 
happens.

Signed-off-by: Lorenz Bauer <lmb@isovalent.com>

@lmb lmb force-pushed the debug-vimto-hang branch from 1426402 to 4b3d3ce Compare May 15, 2024 13:45
@lmb lmb force-pushed the debug-vimto-hang branch from 4b3d3ce to 5c4e4c7 Compare May 17, 2024 07:57
@lmb lmb changed the title debug sporadic vimto hang WIP debug sporadic vimto hang May 17, 2024
@lmb lmb marked this pull request as ready for review May 17, 2024 10:05
@lmb lmb requested a review from a team as a code owner May 17, 2024 10:05
@lmb lmb merged commit 8079b37 into cilium:main May 17, 2024
17 checks passed
@lmb lmb deleted the debug-vimto-hang branch May 17, 2024 10:05