[BUG] csi pod socket disconnects, smooth upgrade fails #1139

Open
YunhuiChen opened this issue Oct 10, 2024 · 2 comments
Labels
kind/bug Something isn't working

Comments

@YunhuiChen

What happened:
Running a smooth upgrade fails with:

E1010 08:47:07.515335    2560 grace.go:412] "grace: error connecting to socket" err="dial unix /tmp/juicefs-csi-shutdown.sock: connect: connection refused"
E1010 08:47:07.515385    2560 upgrade.go:42] "main: failed to upgrade mount pod" err="dial unix /tmp/juicefs-csi-shutdown.sock: connect: connection refused"

The mount pod log shows:

2024/10/10 08:29:21.344859 juicefs[1] <WARNING>: send fd to /tmp/fuse_fd_csi_comm.sock: dial unix /tmp/fuse_fd_csi_comm.sock: connect: no such file or directory [passfd.go:123]

What you expected to happen:

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?

Environment:

  • JuiceFS CSI Driver version (which image tag did your CSI Driver use):
  • Kubernetes version (e.g. kubectl version):
  • Object storage (cloud provider and region):
  • Metadata engine info (version, cloud provider managed or self maintained):
  • Network connectivity (JuiceFS to metadata engine, JuiceFS to object storage):
  • Others:
@YunhuiChen YunhuiChen added the kind/bug Something isn't working label Oct 10, 2024
@zwwhdls
Member

zwwhdls commented Oct 10, 2024

in csi:

root@juicefs-csi-node-9897d:/app# lsof -p 8
COMMAND   PID USER   FD      TYPE             DEVICE SIZE/OFF      NODE NAME
juicefs-c   8 root  cwd       DIR              0,252     4096   1473032 /app
juicefs-c   8 root  rtd       DIR              0,252     4096   5505897 /
juicefs-c   8 root  txt       REG              0,252 40513688   5505805 /usr/local/bin/juicefs-csi-driver
juicefs-c   8 root    0r      CHR                1,3      0t0         6 /dev/null
juicefs-c   8 root    1w     FIFO               0,13      0t0 196809539 pipe
juicefs-c   8 root    2w     FIFO               0,13      0t0 196809540 pipe
juicefs-c   8 root    3u  a_inode               0,14        0     11440 [eventpoll]
juicefs-c   8 root    4r  a_inode               0,14        0     11440 inotify
juicefs-c   8 root    5r     FIFO               0,13      0t0 196808381 pipe
juicefs-c   8 root    6w     FIFO               0,13      0t0 196808381 pipe
juicefs-c   8 root    7u  a_inode               0,14        0     11440 [eventpoll]
juicefs-c   8 root    8r     FIFO               0,13      0t0 196810129 pipe
juicefs-c   8 root    9w     FIFO               0,13      0t0 196810129 pipe
juicefs-c   8 root   12u     unix 0xffff9c5e3b9d8c00      0t0 196812115 /tmp/juicefs-csi-shutdown.sock type=STREAM
juicefs-c   8 root   13u     IPv4          196812116      0t0       TCP juicefs-csi-node-9897d:48228->172-28-39-187.kubernetes.default.svc.cluster.local:10250 (ESTABLISHED)
juicefs-c   8 root   14u     IPv6          196808396      0t0       TCP *:http-alt (LISTEN)
juicefs-c   8 root   15u     IPv4          196809565      0t0       TCP localhost:6060 (LISTEN)
juicefs-c   8 root   16u     unix 0xffff9c5e3c367400      0t0 196811093 /csi/csi.sock type=STREAM
juicefs-c   8 root   17u     unix 0xffff9c60ea935800      0t0 196810221 /csi/csi.sock type=STREAM
juicefs-c   8 root   19u     unix 0xffff9c5e3c361000      0t0 197101508 /tmp/00ce3e423bb79f4822ead350ac9be2a58ff9f2000af905298dab7be62b49500/fuse_fd_csi_comm.sock type=STREAM
juicefs-c   8 root   21u      CHR             10,229      0t0        22 /fuse
root@juicefs-csi-node-9897d:/app#
root@juicefs-csi-node-9897d:/app# netstat -nlp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 127.0.0.1:6060          0.0.0.0:*               LISTEN      8/juicefs-csi-drive
tcp6       0      0 :::8080                 :::*                    LISTEN      8/juicefs-csi-drive
tcp6       0      0 :::9809                 :::*                    LISTEN      -
tcp6       0      0 :::9909                 :::*                    LISTEN      -
Active UNIX domain sockets (only servers)
Proto RefCnt Flags       Type       State         I-Node   PID/Program name     Path
unix  2      [ ACC ]     STREAM     LISTENING     197101508 8/juicefs-csi-drive  /tmp/00ce3e423bb79f4822ead350ac9be2a58ff9f2000af905298dab7be62b49500/fuse_fd_csi_comm.sock
unix  2      [ ACC ]     STREAM     LISTENING     196812115 8/juicefs-csi-drive  /tmp/juicefs-csi-shutdown.sock
unix  2      [ ACC ]     STREAM     LISTENING     196813895 -                    /registration/csi.juicefs.com-reg.sock
unix  2      [ ACC ]     STREAM     LISTENING     196811093 8/juicefs-csi-drive  /csi/csi.sock
root@juicefs-csi-node-9897d:/app#

The socket connection appears to be broken, yet the CSI process is still listening on all of its socket files.
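
To narrow this down, it may help to probe each socket path from the side that is dialing it and see which error comes back. The sketch below is not part of the CSI driver; `probe_unix_socket` and `demo` are hypothetical helper names, and it assumes Linux Unix-domain socket semantics: a path that does not exist yields "no such file or directory" (as in the mount pod log), while a socket file that exists but has no listener behind it in the caller's view yields "connection refused" (as in the upgrade log).

```python
import errno
import os
import socket
import tempfile


def probe_unix_socket(path, timeout=1.0):
    """Try to connect to a Unix stream socket and classify the outcome.

    Hypothetical diagnostic helper, not taken from juicefs-csi-driver.
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect(path)
        return "ok"
    except FileNotFoundError:
        # ENOENT: the path is not visible from this process
        return "missing"
    except ConnectionRefusedError:
        # ECONNREFUSED: the socket file exists but nothing accepts on it
        return "refused"
    except OSError as exc:
        return "error: " + errno.errorcode.get(exc.errno, str(exc.errno))
    finally:
        sock.close()


def demo():
    """Reproduce the three states against a throwaway socket in a temp dir."""
    results = []
    tmpdir = tempfile.mkdtemp()
    path = os.path.join(tmpdir, "demo.sock")

    listener = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    listener.bind(path)
    listener.listen(1)
    results.append(probe_unix_socket(path))  # listener is alive

    listener.close()  # the socket file stays on disk
    results.append(probe_unix_socket(path))  # stale socket file

    os.unlink(path)
    results.append(probe_unix_socket(path))  # path is gone
    os.rmdir(tmpdir)
    return results


if __name__ == "__main__":
    print(demo())
```

Running `probe_unix_socket("/tmp/juicefs-csi-shutdown.sock")` inside the CSI node container and comparing the result with the `netstat` output above could show whether the file being dialed is still the one the process is listening on.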

@zwwhdls
Member

zwwhdls commented Oct 10, 2024

Restarting the CSI node pod recovers from this issue.
