-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BUG]: The csm-isilon-controller keeps getting panic and is restarting #1104
Comments
@cbartoszDell : Please have a look. |
It turns out that the this issue happens when the controller tries to delete a snapshot as well. And we have verified that this snaphots directory was available. 2024-01-23T10:31:15.311451576Z time="2024-01-23T10:31:15Z" level=debug clusterName=sust070a-vl1047 runid=13 msg="Cluster Name: sust070a-vl1047" file="/go/src/service/controller.go:1390" |
It seems that the volumesnapshotcontent for the snapshots that the controller has problems to delete all reports a snapshotHandle in the status filed that looks like "5599939===sust070a-vl1047" (that is two fields separated by ===). While the status for a snapshotHandle for any new volumesnapshotcontent has has three fields (5605183===sust070a-vl1047===az-sust070a-tst). So could that be the issue? |
Bug Description
The driver container of the csm-isilon-controller gets panic: runtime error: index out of range [2] with length 2.
After some investigation it turned out that controller had some issues with a handfull PVCs for som snapshots that it wasn't able to mount and that the directories for thoses snapshots had been deleted on the isilon server, although the snapshot objects was still available mount.
We still don't know how that was possible however.
Logs
In the driver container the last log entry was:
" file="/go/src/service/controllerNodeToArrayConnectivity.go:74"
time="2024-01-19T08:40:00Z" level=debug msg="use pollingFrequency as 60 seconds"
time="2024-01-19T08:40:00Z" level=debug runid=360 msg="last connectivity was 0 sec back, tolerance is 60 sec" file="/go/src/service/controllerNodeToArrayConnectivity.go:86"
time="2024-01-19T08:40:00Z" level=info runid=360 msg="array sust070a-vl1047 is connected to node =#=#=10-32-20-36.node-exporter.openshift-monitoring.svc.cluster.local=#=#=10.32.20.36" file="/go/src/service/csi_extension_server.go:145"
time="2024-01-19T08:40:00Z" level=info runid=360 msg="ValidateVolumeHostConnectivity reply connected:true messages:"array sust070a-vl1047 is connected to node compute-16.dev01.sebshift.sebank.se=#=#=10-32-20-36.node-exporter.openshift-monitoring.svc.cluster.local=#=#=10.32.20.36"" file="/go/src/service/csi_extension_server.go:88"
time="2024-01-19T08:40:00Z" level=info msg="/podmon.v1.Podmon/ValidateVolumeHostConnectivity: REP 0360: Connected=true, IosInProgress=false, Messages=[array sust070a-vl1047 is connected to node compute-16.dev01.sebshift.sebank.se=#=#=10-32-20-36.node-exporter.openshift-monitoring.svc.cluster.local=#=#=10.32.20.36]"
time="2024-01-19T08:40:00Z" level=info msg="Get PVC labels for data-delta-zeebe-1 in namespace zeebe-delta"
time="2024-01-19T08:40:00Z" level=error msg="Error retrieving PVC info: client rate limiter Wait returned an error: context deadline exceeded"
time="2024-01-19T08:40:00Z" level=error msg="Cannot retrieve labels for PVC csi.storage.k8s.io/pvc/name in namespace: csi.storage.k8s.io/pvc/namespace, error: client rate limiter Wait returned an error: context deadline exceeded"
time="2024-01-19T08:40:00Z" level=warning msg="No Values Under Metadata"
time="2024-01-19T08:40:00Z" level=info msg="/csi.v1.Controller/CreateVolume: REQ 0361: Name=dev01-1ffeb494be, CapacityRange=required_bytes:5368709120 , VolumeCapabilities=[mount:<> access_mode:<mode:SINGLE_NODE_MULTI_WRITER > ], Parameters=map[AccessZone:az-sust070a-tst AzServiceIP:sust070a-vl1047.powerscale14.seb.net IsiPath:/ifs/az-sust070a-tst/tst-dr-off-csi-sebshift RootClientEnabled:false csi.storage.k8s.io/pv/name:dev01-1ffeb494be csi.storage.k8s.io/pvc/name:data-delta-zeebe-1 csi.storage.k8s.io/pvc/namespace:zeebe-delta], VolumeContentSource=snapshot:<snapshot_id:"5599941===sust070a-vl1047" > , XXX_NoUnkeyedLiteral={}, XXX_sizecache=0"
time="2024-01-19T08:40:00Z" level=info runid=361 msg="Request doesn't include cluster name. Use default cluster 'sust070a-vl1047'" file="/go/src/service/service.go:998"
time="2024-01-19T08:40:00Z" level=debug clusterName=sust070a-vl1047 runid=361 msg="Cluster Name: sust070a-vl1047" file="/go/src/service/controller.go:265"
time="2024-01-19T08:40:00Z" level=debug clusterName=sust070a-vl1047 runid=361 msg="isiSvc already initialized, skip probing" file="/go/src/service/service.go:376"
time="2024-01-19T08:40:00Z" level=debug clusterName=sust070a-vl1047 runid=361 msg="Replication flag unset" file="/go/src/service/controller.go:317"
time="2024-01-19T08:40:00Z" level=info clusterName=sust070a-vl1047 runid=361 msg="Limit parameters considered for quota creation SoftLimit: '' , AdvisoryLimit: '',SoftGracePrd: ''" file="/go/src/service/controller.go:352"
file="/go/src/service/controller.go:352"
panic: runtime error: index out of range [2] with length 2
/go/pkg/mod/github.com/dell/gocsi@v1.9.0/utils/utils_middleware.go:99 +0x37
github.com/dell/gocsi.(*StoragePlugin).injectContext(0xc000392e00, {0x1f3c050, 0xc00078bb00}, {0x1b941c0, 0xc0001982a0}, 0x1b11640?, 0xc000303680)
/go/pkg/mod/github.com/dell/gocsi@v1.9.0/middleware.go:231 +0xa3
github.com/dell/gocsi.(*StoragePlugin).Serve.func1.ChainUnaryServer.func2.1.1({0x1f3c050?, 0xc00078bb00?}, {0x1b941c0?, 0xc0001982a0?})
/go/pkg/mod/github.com/dell/gocsi@v1.9.0/utils/utils_middleware.go:99 +0x37
github.com/dell/csi-isilon/v2/service/interceptor.(*rewriteRequestIDInterceptor).handleServer(0xc00076d3e0?, {0x1f3c050, 0xc00078bb00}, {0x1b941c0, 0xc0001982a0}, 0x19b22c0?, 0xc0003036a0)
/go/src/service/interceptor/interceptor.go:46 +0x133
github.com/dell/gocsi.(*StoragePlugin).Serve.func1.ChainUnaryServer.func2.1.1({0x1f3c050?, 0xc00078bb00?}, {0x1b941c0?, 0xc0001982a0?})
/go/pkg/mod/github.com/dell/gocsi@v1.9.0/utils/utils_middleware.go:99 +0x37
github.com/dell/csi-isilon/v2/service/interceptor.(*interceptor).createVolume(0xc0003234a0, {0x1f3c050, 0xc00078bb00}, 0xc0001982a0, 0xc00062d968?, 0xc0003036c0)
/go/src/service/interceptor/interceptor.go:228 +0x50b
github.com/dell/csi-isilon/v2/service/interceptor.NewCustomSerialLock.func1({0x1f3c050?, 0xc00078bb00?}, {0x1b941c0?, 0xc0001982a0?}, 0xc000762900?, 0xc000303600?)
/go/src/service/interceptor/interceptor.go:114 +0x93
github.com/dell/gocsi.(*StoragePlugin).Serve.func1.ChainUnaryServer.func2.1.1({0x1f3c050?, 0xc00078bb00?}, {0x1b941c0?, 0xc0001982a0?})
/go/pkg/mod/github.com/dell/gocsi@v1.9.0/utils/utils_middleware.go:99 +0x37
github.com/dell/gocsi.(*StoragePlugin).Serve.func1.ChainUnaryServer.func2({0x1f3c050, 0xc00078bb00}, {0x1b941c0, 0xc0001982a0}, 0xc000950a38?, 0x19d3060?)
/go/pkg/mod/github.com/dell/gocsi@v1.9.0/utils/utils_middleware.go:106 +0x99
github.com/container-storage-interface/spec/lib/go/csi._Controller_CreateVolume_Handler({0x1c742e0?, 0xc000312000}, {0x1f3c050, 0xc00078bb00}, 0xc0001981c0, 0xc0004ee280)
/go/pkg/mod/github.com/container-storage-interface/spec@v1.6.0/lib/go/csi/csi.pb.go:5680 +0x135
google.golang.org/grpc.(*Server).processUnaryRPC(0xc0005741e0, {0x1f44240, 0xc000583860}, 0xc000762900, 0xc000201dd0, 0x2d7b7c0, 0x0)
/go/pkg/mod/google.golang.org/grpc@v1.57.1/server.go:1358 +0xe15
google.golang.org/grpc.(*Server).handleStream(0xc0005741e0, {0x1f44240, 0xc000583860}, 0xc000762900, 0x0)
/go/pkg/mod/google.golang.org/grpc@v1.57.1/server.go:1735 +0x9e7
google.golang.org/grpc.(*Server).serveStreams.func1.1()
/go/pkg/mod/google.golang.org/grpc@v1.57.1/server.go:970 +0xbb
created by google.golang.org/grpc.(*Server).serveStreams.func1 in goroutine 74
/go/pkg/mod/google.golang.org/grpc@v1.57.1/server.go:981 +0x145
Screenshots
No response
Additional Environment Information
The Isilon Nodes is running Code: 9.4.0.14.
VolumeSnapshot:
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
annotations:
kubectl.kubernetes.io/last-applied-configuration: |
{"apiVersion":"snapshot.storage.k8s.io/v1","kind":"VolumeSnapshot","metadata":{"annotations":{},"name":"elasticsearch-master-elasticsearch-master-0-2024-01-16-20230116-before-scaling","namespace":"zeebe-delta"},"spec":{"source":{"persistentVolumeClaimName":"elasticsearch-master-elasticsearch-master-0"},"volumeSnapshotClassName":"file-share-volumesnapshotclass-dr-off"}}
creationTimestamp: "2024-01-16T13:05:40Z"
finalizers:
generation: 1
name: elasticsearch-master-elasticsearch-master-0-2024-01-16-20230116-before-scaling
namespace: zeebe-delta
resourceVersion: "6052036685"
uid: c64afb97-7142-48ae-94f8-2c3a0e05e76b
spec:
source:
persistentVolumeClaimName: elasticsearch-master-elasticsearch-master-0
volumeSnapshotClassName: file-share-volumesnapshotclass-dr-off
status:
boundVolumeSnapshotContentName: snapcontent-c64afb97-7142-48ae-94f8-2c3a0e05e76b
creationTime: "2024-01-16T14:04:10Z"
readyToUse: true
restoreSize: "20694458"
--
isi snapshot snapshots view snapshot-c64afb97-7142-48ae-94f8-2c3a0e05e76b
Alias Target ID: -
Alias Target Name: -
Created: 2024-01-16T15:04:10
Expires: -
Size: 175.45M
Shadow Bytes: 0.00
% Reserve: 0.00%
% Filesystem: 0.00%
State: active
ls -l /ifs/az-sust070a-tst/tst-dr-off-csi-sebshift/dev01-1bf75cb38d
ls: /ifs/az-sust070a-tst/tst-dr-off-csi-sebshift/dev01-1bf75cb38d: No such file or directory
Steps to Reproduce
Create a VolumeSnapshot and mount it as a PVC.
Remove the directory for the snapshot on the isilon server.
Expected Behavior
I had hoped that the controller times out after a while and keeps running.
CSM Driver(s)
isilon v2.9.0
Installation Type
The Operator was Manually using the without OLM using: "bash scripts/install.sh"
Container Storage Modules Enabled
resiliency v1.8.0
observability v1.7.0
Container Orchestrator
OpenShift 4.13.27
Operating System
RHEL 9.2
The text was updated successfully, but these errors were encountered: