Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error unlock of unlocked mutex #2630

Open
lewandowskim1988 opened this issue Oct 25, 2024 · 4 comments
Open

Error unlock of unlocked mutex #2630

lewandowskim1988 opened this issue Oct 25, 2024 · 4 comments
Labels
bug Error or flaw in the code with unintended result p2 P2 pending customer action question Customer Issue: question about how to use tool

Comments

@lewandowskim1988
Copy link

I have an issue with one of my CI tests.
All tests are mounting FUSE volume in same directory and do some operations but only one is failing with error message that I don't know how to manage.

Error message from Kubernetes events of failing pod:

Normal   Killing      12m                kubelet            Stopping container gke-gcsfuse-sidecar
Warning  FailedMount  12m (x3 over 12m)  kubelet            MountVolume.SetUp failed for volume "gcs-fuse-csi-pv" : rpc error: code = Internal desc = gcsfuse failed with error: fatal error: sync: unlock of unlocked mutex

goroutine 16040 [running]:
sync.fatal({0x172a50c?, 0xc0008158d8?})
  /usr/local/go/src/runtime/panic.go:1007 +0x18
sync.(*Mutex).unlockSlow(0xc000aa3a88, 0xffffffff)
  /usr/local/go/src/sync/mutex.go:229 +0x35
sync.(*Mutex).Unlock(0xc000a00a80?)
  /usr/local/go/src/sync/mutex.go:223 +0x25
github.com/googlecloudplatform/gcsfuse/v2/internal/fs.(*fileSystem).lookUpLocalFileInode.func1()
  /go/gcsfuse/internal/fs/fs.go:984 +0x46
panic({0x13989c0?, 0xc000c8cdc0?})
  /usr/local/go/src/runtime/panic.go:770 +0x132
github.com/googlecloudplatform/gcsfuse/v2/internal/fs/inode.NewFileName({{0x0, 0x0}, {0xc000120bd0, 0x22}}, {0x2897a50, 0x0})
  /go/gcsfuse/internal/fs/inode/name.go:58 +0x108
github.com/googlecloudplatform/gcsfuse/v2/internal/fs.(*fileSystem).lookUpLocalFileInode(0xc000469dc0, {0x7c8aadc39e40?, 0xc00088c390?}, {0x2897a50, 0x1?})
  /go/gcsfuse/internal/fs/fs.go:989 +0x150
github.com/googlecloudplatform/gcsfuse/v2/internal/fs.(*fileSystem).lookUpOrCreateChildInode(0xc000469dc0, {0x1c5e100, 0xc000c8b310}, {0x7c8aadc39e40, 0xc00088c390}, {0x2897a50, 0x1})
  /go/gcsfuse/internal/fs/fs.go:923 +0x66
github.com/googlecloudplatform/gcsfuse/v2/internal/fs.(*fileSystem).LookUpInode(0xc000469dc0, {0x1c5e0c8?, 0xc000ba59b0?}, 0xc0004e0e10)
  /go/gcsfuse/internal/fs/fs.go:1344 +0x146
github.com/googlecloudplatform/gcsfuse/v2/internal/fs/wrappers.(*errorMapping).LookUpInode(0xc000ab3f30, {0x1c5e0c8?, 0xc000ba59b0?}, 0x137b620?)
  /go/gcsfuse/internal/fs/wrappers/error_mapping.go:126 +0x69
github.com/googlecloudplatform/gcsfuse/v2/internal/fs/wrappers.(*monitoring).LookUpInode(0xc000ab3f40, {0x1c5e0c8, 0xc000ba59b0}, 0xc0004e0e10)
  /go/gcsfuse/internal/fs/wrappers/monitoring.go:156 +0x76
github.com/jacobsa/fuse/fuseutil.(*fileSystemServer).handleOp(0xc000ac8720, 0xc000d8a000, {0x1c5e0c8, 0xc000ba59b0}, {0x1361900?, 0xc0004e0e10})
  /tmp/build_gcsfuse_gopath3695249373/pkg/mod/github.com/jacobsa/fuse@v0.0.0-20240607092844-7285af0d05b0/fuseutil/file_system.go:144 +0x203
created by github.com/jacobsa/fuse/fuseutil.(*fileSystemServer).ServeOps in goroutine 56
  /tmp/build_gcsfuse_gopath3695249373/pkg/mod/github.com/jacobsa/fuse@v0.0.0-20240607092844-7285af0d05b0/fuseutil/file_system.go:123 +0x1be

goroutine 1 [select, 3 minutes]:
github.com/jacobsa/fuse.(*MountedFileSystem).Join(0xc000adecc0, {0x1c5de28, 0x29cdfe0})
  /tmp/build_gcsfuse_gopath3695249373/pkg/mod/github.com/jacobsa/fuse@v0.0.0-20240607092844-7285af0d05b0/mounted_file_system.go:46 +0x6f
github.com/googlecloudplatform/gcsfuse/v2/cmd.runCLIApp(0xc0006bc160)
  /go/gcsfuse/cmd/legacy_main.go:468 +0x1825
github.com/googlecloudplatform/gcsfuse/v2/cmd.run.func1(0x1?)
  /go/gcsfuse/cmd/legacy_main.go:488 +0x1c
github.com/urfave/cli.HandleAction({0x139c4c0?, 0xc0004cc070?}, 0xc000880fc0?)
  /tmp/build_gcsfuse_gopath3695249373/pkg/mod/github.com/urfave/cli@v1.22.15/app.go:526 +0x75
github.com/urfave/cli.(*App).Run(0xc000880fc0, {0xc000140000, 0x11, 0x12})
  /tmp/build_gcsfuse_gopath3695249373/pkg/mod/github.com/urfave/cli@v1.22.15/app.go:286 +0x79b
github.com/googlecloudplatform/gcsfuse/v2/cmd.run()
  /go/gcsfuse/cmd/legacy_main.go:492 +0xbb
github.com/googlecloudplatform/gcsfuse/v2/cmd.ExecuteLegacyMain()
  /go/gcsfuse/cmd/legacy_main.go:507 +0x2b
main.main()
  /go/gcsfuse/main.go:83 +0x13e

goroutine 13 [select]:
go.opencensus.io/stats/view.(*worker).start(0xc0004b8880)
  /tmp/build_gcsfuse_gopath3695249373/pkg/mod/go.opencensus.io@v0.24.0/stats/view/worker.go:292 +0x9f
created by go.opencensus.io/stats/view.init.0 in goroutine 1
  /tmp/build_gcsfuse_gopath3695249373/pkg/mod/go.opencensus.io@v0.24.0/stats/view/worker.go:34 +0x8d

goroutine 37 [chan receive, 3 minutes]:
github.com/googlecloudplatform/gcsfuse/v2/internal/perf.HandleCPUProfileSignals()
  /go/gcsfuse/internal/perf/cpu.go:58 +0x85
created by github.com/googlecloudplatform/gcsfuse/v2/cmd.ExecuteLegacyMain in goroutine 1
  /go/gcsfuse/cmd/legacy_main.go:503 +0x1a

goroutine 38 [chan receive, 3 minutes]:
github.com/googlecloudplatform/gcsfuse/v2/internal/perf.HandleMemoryProfileSignals()
  /go/gcsfuse/internal/perf/memory.go:67 +0x98
created by github.com/googlecloudplatform/gcsfuse/v2/cmd.ExecuteLegacyMain in goroutine 1
  /go/gcsfuse/cmd/legacy_main.go:504 +0x26

goroutine 49 [syscall, 3 minutes]:
os/signal.signal_recv()
  /usr/local/go/src/runtime/sigqueue.go:152 +0x29
os/signal.loop()
  /usr/local/go/src/os/signal/signal_unix.go:23 +0x13
created by os/signal.Notify.func1.1 in goroutine 37
  /usr/local/go/src/os/signal/signal.go:151 +0x1f

goroutine 88 [select, 3 minutes]:
github.com/googlecloudplatform/gcsfuse/v2/internal/gcsx.garbageCollect({0x1c5e100, 0xc0006de9b0}, {0x170da5d, 0xd}, {0x1c676f8, 0xc000ac86e0})
  /go/gcsfuse/internal/gcsx/garbage_collect.go:108 +0x106
created by github.com/googlecloudplatform/gcsfuse/v2/internal/gcsx.(*bucketManager).SetUpBucket in goroutine 1
  /go/gcsfuse/internal/gcsx/bucket_manager.go:238 +0x745

goroutine 56 [syscall]:
syscall.Syscall(0x0, 0x3, 0xc001884000, 0x101000)
  /usr/local/go/src/syscall/syscall_linux.go:69 +0x25
syscall.read(0xc000d82060?, {0xc001884000?, 0x10?, 0x10?})
  /usr/local/go/src/syscall/zsyscall_linux_amd64.go:736 +0x38
syscall.Read(...)
  /usr/local/go/src/syscall/syscall_unix.go:181
internal/poll.ignoringEINTRIO(...)
  /usr/local/go/src/internal/poll/fd_unix.go:736
internal/poll.(*FD).Read(0xc000d82060, {0xc001884000, 0x101000, 0x101000})
  /usr/local/go/src/internal/poll/fd_unix.go:160 +0x2ae
os.(*File).read(...)
  /usr/local/go/src/os/file_posix.go:29
os.(*File).Read(0xc000d86000, {0xc001884000?, 0x7c7e10?, 0x7c8aadaa9ea8?})
  /usr/local/go/src/os/file.go:118 +0x52
github.com/jacobsa/fuse/internal/buffer.(*InMessage).Init(0xc0011d9840, {0x1c3d9c0?, 0xc000d86000?})
  /tmp/build_gcsfuse_gopath3695249373/pkg/mod/github.com/jacobsa/fuse@v0.0.0-20240607092844-7285af0d05b0/internal/buffer/in_message.go:85 +0x4c
github.com/jacobsa/fuse.(*Connection).readMessage(0xc000d8a000)
  /tmp/build_gcsfuse_gopath3695249373/pkg/mod/github.com/jacobsa/fuse@v0.0.0-20240607092844-7285af0d05b0/connection.go:346 +0x45
github.com/jacobsa/fuse.(*Connection).ReadOp(0xc000d8a000)
  /tmp/build_gcsfuse_gopath3695249373/pkg/mod/github.com/jacobsa/fuse@v0.0.0-20240607092844-7285af0d05b0/connection.go:405 +0x3c
github.com/jacobsa/fuse/fuseutil.(*fileSystemServer).ServeOps(0xc000ac8720, 0xc000d8a000)
  /tmp/build_gcsfuse_gopath3695249373/pkg/mod/github.com/jacobsa/fuse@v0.0.0-20240607092844-7285af0d05b0/fuseutil/file_system.go:106 +0x6b
github.com/jacobsa/fuse.Mount.func1()
  /tmp/build_gcsfuse_gopath3695249373/pkg/mod/github.com/jacobsa/fuse@v0.0.0-20240607092844-7285af0d05b0/mount.go:93 +0x32
created by github.com/jacobsa/fuse.Mount in goroutine 1
  /tmp/build_gcsfuse_gopath3695249373/pkg/mod/github.com/jacobsa/fuse@v0.0.0-20240607092844-7285af0d05b0/mount.go:92 +0x5b9

goroutine 57 [chan receive, 3 minutes]:
github.com/googlecloudplatform/gcsfuse/v2/cmd.registerSIGINTHandler.func1()
  /go/gcsfuse/cmd/legacy_main.go:66 +0x65
created by github.com/googlecloudplatform/gcsfuse/v2/cmd.registerSIGINTHandler in goroutine 1
  /go/gcsfuse/cmd/legacy_main.go:64 +0xbc

goroutine 1347 [IO wait]:
internal/poll.runtime_pollWait(0x7c8aaddc0c80, 0x72)
  /usr/local/go/src/runtime/netpoll.go:345 +0x85
internal/poll.(*pollDesc).wait(0xc000067080?, 0xc000552800?, 0x0)
  /usr/local/go/src/internal/poll/fd_poll_runtime.go:84 +0x27
internal/poll.(*pollDesc).waitRead(...)
  /usr/local/go/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0xc000067080, {0xc000552800, 0x4800, 0x4800})
  /usr/local/go/src/internal/poll/fd_unix.go:164 +0x27a
net.(*netFD).Read(0xc000067080, {0xc000552800?, 0x7c8aaddf5c98?, 0xc0000dbe48?})
  /usr/local/go/src/net/fd_posix.go:55 +0x25
net.(*conn).Read(0xc000d86360, {0xc000552800?, 0xc000472930?, 0x411d5b?})
  /usr/local/go/src/net/net.go:179 +0x45
crypto/tls.(*atLeastReader).Read(0xc0000dbe48, {0xc000552800?, 0x0?, 0xc0000dbe48?})
  /usr/local/go/src/crypto/tls/conn.go:806 +0x3b
bytes.(*Buffer).ReadFrom(0xc0001f1b30, {0x1c403a0, 0xc0000dbe48})
  /usr/local/go/src/bytes/buffer.go:211 +0x98
crypto/tls.(*Conn).readFromUntil(0xc0001f1888, {0x1c3e720, 0xc000d86360}, 0xc000472978?)
  /usr/local/go/src/crypto/tls/conn.go:828 +0xde
crypto/tls.(*Conn).readRecordOrCCS(0xc0001f1888, 0x0)
  /usr/local/go/src/crypto/tls/conn.go:626 +0x3cf
crypto/tls.(*Conn).readRecord(...)
  /usr/local/go/src/crypto/tls/conn.go:588
crypto/tls.(*Conn).Read(0xc0001f1888, {0xc000c72000, 0x1000, 0x8?})
  /usr/local/go/src/crypto/tls/conn.go:1370 +0x156
net/http.(*persistConn).Read(0xc000141440, {0xc000c72000?, 0xc000c461e0?, 0xc000472d38?})
  /usr/local/go/src/net/http/transport.go:1977 +0x4a
bufio.(*Reader).fill(0xc0006eb9e0)
  /usr/local/go/src/bufio/bufio.go:110 +0x103
bufio.(*Reader).Peek(0xc0006eb9e0, 0x1)
  /usr/local/go/src/bufio/bufio.go:148 +0x53
net/http.(*persistConn).readLoop(0xc000141440)
  /usr/local/go/src/net/http/transport.go:2141 +0x1b9
created by net/http.(*Transport).dialConn in goroutine 1271
  /usr/local/go/src/net/http/transport.go:1799 +0x152f

goroutine 1436 [IO wait]:
internal/poll.runtime_pollWait(0x7c8aaddc0a90, 0x72)
  /usr/local/go/src/runtime/netpoll.go:345 +0x85
internal/poll.(*pollDesc).wait(0xc000504f00?, 0xc000c4f300?, 0x0)
  /usr/local/go/src/internal/poll/fd_poll_runtime.go:84 +0x27
internal/poll.(*pollDesc).waitRead(...)
  /usr/local/go/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0xc000504f00, {0xc000c4f300, 0x1300, 0x1300})
  /usr/local/go/src/internal/poll/fd_unix.go:164 +0x27a
net.(*netFD).Read(0xc000504f00, {0xc000c4f300?, 0x7c8aadbf3c58?, 0xc001e864c8?})
  /usr/local/go/src/net/fd_posix.go:55 +0x25
net.(*conn).Read(0xc000d87110, {0xc000c4f300?, 0xc000027930?, 0x411d5b?})
  /usr/local/go/src/net/net.go:179 +0x45
crypto/tls.(*atLeastReader).Read(0xc001e864c8, {0xc000c4f300?, 0x0?, 0xc001e864c8?})
  /usr/local/go/src/crypto/tls/conn.go:806 +0x3b
bytes.(*Buffer).ReadFrom(0xc000926630, {0x1c403a0, 0xc001e864c8})
  /usr/local/go/src/bytes/buffer.go:211 +0x98
crypto/tls.(*Conn).readFromUntil(0xc000926388, {0x1c3e720, 0xc000d87110}, 0xc000027978?)
  /usr/local/go/src/crypto/tls/conn.go:828 +0xde
crypto/tls.(*Conn).readRecordOrCCS(0xc000926388, 0x0)
  /usr/local/go/src/crypto/tls/conn.go:626 +0x3cf
crypto/tls.(*Conn).readRecord(...)
  /usr/local/go/src/crypto/tls/conn.go:588
crypto/tls.(*Conn).Read(0xc000926388, {0xc000e76000, 0x1000, 0x8?})
  /usr/local/go/src/crypto/tls/conn.go:1370 +0x156
net/http.(*persistConn).Read(0xc0009a5e60, {0xc000e76000?, 0xc000c467e0?, 0xc000027d38?})
  /usr/local/go/src/net/http/transport.go:1977 +0x4a
bufio.(*Reader).fill(0xc000297c20)
  /usr/local/go/src/bufio/bufio.go:110 +0x103
bufio.(*Reader).Peek(0xc000297c20, 0x1)
  /usr/local/go/src/bufio/bufio.go:148 +0x53
net/http.(*persistConn).readLoop(0xc0009a5e60)
  /usr/local/go/src/net/http/transport.go:2141 +0x1b9
created by net/http.(*Transport).dialConn in goroutine 1307
  /usr/local/go/src/net/http/transport.go:1799 +0x152f

goroutine 1348 [select]:
net/http.(*persistConn).writeLoop(0xc000141440)
  /usr/local/go/src/net/http/transport.go:2444 +0xf0
created by net/http.(*Transport).dialConn in goroutine 1271
  /usr/local/go/src/net/http/transport.go:1800 +0x1585

goroutine 1437 [select]:
net/http.(*persistConn).writeLoop(0xc0009a5e60)
  /usr/local/go/src/net/http/transport.go:2444 +0xf0
created by net/http.(*Transport).dialConn in goroutine 1307
  /usr/local/go/src/net/http/transport.go:1800 +0x1585
gcsfuse exited with error: exit status 2

My setup is as follows

  • Main container image: nvcr.io/nvidia/cuda:11.8.0-devel-ubuntu22.04
  • Platform: GKE v1.31.1-gke.1146000
  • Gcsfuse v1.5.0

Dose anybody have some idea how I can debug it further?

@lewandowskim1988 lewandowskim1988 added p2 P2 question Customer Issue: question about how to use tool labels Oct 25, 2024
@kislaykishore
Copy link
Collaborator

kislaykishore commented Oct 25, 2024

Could you double check the GCSFuse and GKE versions? The line numbers in your error message align more with v2.4.0 than with 2.5.0.

@kislaykishore kislaykishore added the bug Error or flaw in the code with unintended result label Oct 28, 2024
@Tulsishah
Copy link
Collaborator

Hey @lewandowskim1988 ,

Can you please provide the steps to reproduce the issue?

Thanks,
Tulsi Shah

@gargnitingoogle
Copy link
Collaborator

Gcsfuse v1.5.0

On GCSFuse version, there was no GCSFuse version 1.5.0. After 1.4.2, there was directly 2.0.0.
Also given the error stack, 2.4.0 makes more sense than 2.5.0 which of the following line in stack:

/go/gcsfuse/internal/fs/fs.go:984 +0x46 , which is

    fs.mu.Unlock()

in v2.4.0 and

return parent.LookUpChild(ctx, childName)

in v2.5.0 . So it should be v2.4.0 . @lewandowskim1988 please confirm. Also, we need a way to better understand the issue, so any help in that would help. If you can help us recreate the issue, that'll be the best.

@raj-prince
Copy link
Collaborator

raj-prince commented Nov 12, 2024

Hey @lewandowskim1988,

Thanks for reporting this issue!

Could you please help us in reproducing the issue by answering the below questions?
(a) Is this reproducible 100% or intermittent?
(b) Is your CI test bucket empty or contains some object in the start of the CI test?
(c) What all the operations do you perform as part of CI tests? Also, is any object shared across the test?
(d) Do you perform any operations externally to the bucket during the CI test execution?

Again, a small reproducer would be the best for us, if you could provide. Otherwise, we will try to reproduce based on the response of the above questions.

Regards,
Prince Kumar.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Error or flaw in the code with unintended result p2 P2 pending customer action question Customer Issue: question about how to use tool
Projects
None yet
Development

No branches or pull requests

5 participants