Describe the bug
Pods get stuck in the ContainerCreating state when iSCSI attaching fails. Once this happens, the pod never recovers automatically.
If I understand correctly, iSCSI attaching is done by the NodeStageVolume call, and the kubelet retries the call if it fails. However, the Trident log shows that the NodeStageVolume call returned success and was never retried, even though the iSCSI attach failed. The subsequent NodePublishVolume call then failed to mount the volume and returned an internal error.
I found that Trident ignores errors other than auth errors from utils.AttachISCSIVolume(). This change seems to have been introduced in 22.04.0, and I think it may be the cause of this issue.
@gnarl Thank you for fixing this issue. I confirmed the issue is no longer reproduced on Trident v22.07.0. 👍
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SuccessfulAttachVolume 72s attachdetach-controller AttachVolume.Attach succeeded for volume "pvc-5cb4eb99-6d25-43ed-a495-91e3d51128c4"
Warning FailedMount 15s (x3 over 47s) kubelet MountVolume.MountDevice failed for volume "pvc-5cb4eb99-6d25-43ed-a495-91e3d51128c4" : rpc error: code = Internal desc = failed to stage volume: iSCSI login failed
# I confirmed that the pod mounts the volume on retry once the node is able to connect to the LIFs.
Normal Pulling 11s kubelet Pulling image "nginx"
Normal Pulled 10s kubelet Successfully pulled image "nginx" in 1.517655829s
Normal Created 10s kubelet Created container nginx
Normal Started 10s kubelet Started container nginx
trident/frontend/csi/node_server.go
Lines 1046 to 1060 in e721974
Environment
Provide accurate information about the environment to help us reproduce the issue.
silenceAutosupport: true (Trident Operator)
To Reproduce
Steps to reproduce the behavior:
iptables -A OUTPUT -p tcp -d <LIF_ADDRESS> -j REJECT
iptables -D OUTPUT -p tcp -d <LIF_ADDRESS> -j REJECT
Expected behavior
Trident should retry the iSCSI attach when it fails.
Additional context