[BUG]: Pod filesystem not resized while volume gets successfully expanded #1626

Open
petarvranesevic opened this issue Dec 3, 2024 · 16 comments
Assignees
Labels
area/csi-powermax Issue pertains to the CSI Driver for Dell EMC PowerMax type/bug Something isn't working. This is the default label associated with a bug issue.

Comments

@petarvranesevic

Bug Description

When expanding a PVC, the PVC is expanded successfully and the volume shows the new size in Unisphere for PowerMax. However, the filesystem mounted from the PVC inside the pod still shows the size from before the resize, while the block device inside the pod shows the new size.

Logs

Output of volumeexpansiontest.sh

user@host:~/csi-powermax/test/helm$ ./volumeexpansiontest.sh -n delltest -s powermax
installing a 1 volume container
NAME: 1vol
LAST DEPLOYED: Tue Dec  3 13:49:31 2024
NAMESPACE: delltest
STATUS: deployed
REVISION: 1
TEST SUITE: None
waiting 60 seconds on pod to initialize
Name:             powermaxtest-0
Namespace:        delltest
Priority:         0
Service Account:  powermaxtest
Node:             ux-test-v-compute0.ocp.example.com/10.201.91.50
Start Time:       Tue, 03 Dec 2024 13:49:34 +0100
Labels:           app=powermaxtest
                  apps.kubernetes.io/pod-index=0
                  controller-revision-hash=powermaxtest-7f579f6546
                  statefulset.kubernetes.io/pod-name=powermaxtest-0
Annotations:      k8s.ovn.org/pod-networks:
                    {"default":{"ip_addresses":["10.130.0.55/23"],"mac_address":"0a:58:0a:82:00:37","gateway_ips":["10.130.0.1"],"routes":[{"dest":"10.128.0.0...
                  k8s.v1.cni.cncf.io/network-status:
                    [{
                        "name": "ovn-kubernetes",
                        "interface": "eth0",
                        "ips": [
                            "10.130.0.55"
                        ],
                        "mac": "0a:58:0a:82:00:37",
                        "default": true,
                        "dns": {}
                    }]
                  openshift.io/scc: restricted-v2
                  seccomp.security.alpha.kubernetes.io/pod: runtime/default
Status:           Running
SeccompProfile:   RuntimeDefault
IP:               10.130.0.55
IPs:
  IP:           10.130.0.55
Controlled By:  StatefulSet/powermaxtest
Containers:
  test:
    Container ID:  cri-o://ad86a16347c562d75ef78e40fcb8e9dd4b4f6f4019d1f9381c10271e63d7dc69
    Image:         docker.io/centos:latest
    Image ID:      docker.io/library/centos@sha256:a1801b843b1bfaf77c501e7a6d3f709401a1e0c83863037fa3aab063a7fdb9dc
    Port:          <none>
    Host Port:     <none>
    Command:
      /bin/sleep
      3600
    State:          Running
      Started:      Tue, 03 Dec 2024 13:49:42 +0100
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /data0 from pvol0 (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-fcv8h (ro)
Conditions:
  Type                        Status
  PodReadyToStartContainers   True
  Initialized                 True
  Ready                       True
  ContainersReady             True
  PodScheduled                True
Volumes:
  pvol0:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  pvol0
    ReadOnly:   false
  kube-api-access-fcv8h:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
    ConfigMapName:           openshift-service-ca.crt
    ConfigMapOptional:       <nil>
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason                  Age   From                     Message
  ----     ------                  ----  ----                     -------
  Normal   Scheduled               58s   default-scheduler        Successfully assigned delltest/powermaxtest-0 to ux-test-v-compute0.ocp.example.com
  Normal   SuccessfulAttachVolume  55s   attachdetach-controller  AttachVolume.Attach succeeded for volume "pmax-125dadebed"
  Warning  FailedMount             54s   kubelet                  MountVolume.MountDevice failed for volume "pmax-125dadebed" : rpc error: code = Internal desc = failure checking volume (Array: 000297600499, Volume: 00161)status csm-authorization: token has expired
  Normal   AddedInterface          52s   multus                   Add eth0 [10.130.0.55/23] from ovn-kubernetes
  Normal   Pulling                 52s   kubelet                  Pulling image "docker.io/centos:latest"
  Normal   Pulled                  51s   kubelet                  Successfully pulled image "docker.io/centos:latest" in 1.336s (1.336s including waiting)
  Normal   Created                 50s   kubelet                  Created container test
  Normal   Started                 50s   kubelet                  Started container test
/dev/mapper/mpathb   8154332       24   7718496   1% /data0
/dev/mapper/mpathb on /data0 type ext4 (rw,relatime,seclabel)
done installing a 1 volume container
marking volume
creating a file on the volume
total 16
drwxrws---. 2 root       1001040000 16384 Dec  3 12:49 lost+found
-rw-r--r--. 1 1001040000 1001040000     0 Dec  3 12:50 orig

calculating the initial size of the volume
INITIAL SIZE:  7.8G


calculating checksum of /data0/orig
d41d8cd98f00b204e9800998ecf8427e /data0/orig


expanding the volume
Warning: resource persistentvolumeclaims/pvol0 is missing the kubectl.kubernetes.io/last-applied-configuration annotation which is required by kubectl apply. kubectl apply should only be used on resources created declaratively by either kubectl create --save-config or kubectl apply. The missing annotation will be patched automatically.
persistentvolumeclaim/pvol0 configured
Processing: ########################################################################################

Output of oc get events

14m         Normal    Pulling                      pod/powermaxtest-0            Pulling image "docker.io/centos:latest"
14m         Normal    AddedInterface               pod/powermaxtest-0            Add eth0 [10.130.0.55/23] from ovn-kubernetes
14m         Normal    Pulled                       pod/powermaxtest-0            Successfully pulled image "docker.io/centos:latest" in 1.336s (1.336s including waiting)
14m         Normal    Created                      pod/powermaxtest-0            Created container test
14m         Normal    Started                      pod/powermaxtest-0            Started container test
13m         Warning   ExternalExpanding            persistentvolumeclaim/pvol0   waiting for an external controller to expand this PVC
13m         Normal    Resizing                     persistentvolumeclaim/pvol0   External resizer is resizing volume pmax-125dadebed
13m         Normal    FileSystemResizeRequired     persistentvolumeclaim/pvol0   Require file system resize of volume on node
12m         Normal    FileSystemResizeSuccessful   persistentvolumeclaim/pvol0   MountVolume.NodeExpandVolume succeeded for volume "pmax-125dadebed" ux-test-v-compute0.ocp.example.com
12m         Normal    FileSystemResizeSuccessful   pod/powermaxtest-0            MountVolume.NodeExpandVolume succeeded for volume "pmax-125dadebed" ux-test-v-compute0.ocp.example.com

Output of df -h inside pod

user@host:~$ oc rsh pods/powermaxtest-0 df -h
Filesystem          Size  Used Avail Use% Mounted on
overlay             120G   18G  102G  15% /
tmpfs                64M     0   64M   0% /dev
shm                  64M     0   64M   0% /dev/shm
tmpfs                41G   72M   41G   1% /etc/passwd
/dev/mapper/mpathb  7.8G   24K  7.4G   1% /data0
/dev/vda4           120G   18G  102G  15% /etc/hosts
tmpfs               124G   24K  124G   1% /run/secrets/kubernetes.io/serviceaccount
tmpfs                63G     0   63G   0% /proc/acpi
tmpfs                63G     0   63G   0% /proc/scsi
tmpfs                63G     0   63G   0% /sys/firmware

Output of lsblk inside pod

sh-4.4$ lsblk
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda      8:0    0     4G  0 disk
sdb      8:16   0     4G  0 disk
sdc      8:32   0    15G  0 disk
sdd      8:48   0    15G  0 disk
sr0     11:0    1   1.1G  0 rom
vda    252:0    0   120G  0 disk
|-vda1 252:1    0     1M  0 part
|-vda2 252:2    0   127M  0 part
|-vda3 252:3    0   384M  0 part
`-vda4 252:4    0 119.5G  0 part /dev/termination-log

Screenshots

No response

Additional Environment Information

Openshift 4.16.9
Powermax 8000
Unisphere for Powermax 10.1.0.0

Steps to Reproduce

Run volumeexpansiontest.sh from https://github.com/dell/csi-powermax/tree/main/test/helm

Expected Behavior

Filesystem inside pod gets successfully expanded to the same size as pvc.

CSM Driver(s)

CSM Operator 1.7.0
CSI v2.12.0 for Powermax

Installation Type

Operator

Container Storage Modules Enabled

CSM for Authorization v2.0.0

Container Orchestrator

Openshift 4.16.9

Operating System

RHEL CoreOS

@petarvranesevic petarvranesevic added needs-triage Issue requires triage. type/bug Something isn't working. This is the default label associated with a bug issue. labels Dec 3, 2024
@csmbot
Collaborator

csmbot commented Dec 3, 2024

@petarvranesevic: Thank you for submitting this issue!

The issue is currently awaiting triage. Please make sure you have given us as much context as possible.

If the maintainers determine this is a relevant issue, they will remove the needs-triage label and respond appropriately.


We want your feedback! If you have any questions or suggestions regarding our contributing process/workflow, please reach out to us at container.storage.modules@dell.com.

@atye
Contributor

atye commented Dec 3, 2024

Hi @petarvranesevic. Could you paste the YAML of the powermax storage class and the description of the PVC? My initial thought is that the filesystem may not support online resize, in which case the pod would have to be restarted for the resize to occur.

@atye atye self-assigned this Dec 3, 2024
@petarvranesevic
Author

petarvranesevic commented Dec 4, 2024

powermax storageclass yaml

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: powermax
  uid: d6415fb4-17f4-415b-ba6d-03d78e09f9a5
  resourceVersion: '137461910'
  creationTimestamp: '2024-12-03T08:52:08Z'
  annotations:
    description: powermax
provisioner: csi-powermax.dellemc.com
parameters:
  SLA: Gold
  SRP: SRP_1
  SYMID: 000297600499
reclaimPolicy: Delete
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer

PVC description

Name:          pvol0
Namespace:     delltest
StorageClass:  powermax
Status:        Bound
Volume:        pmax-125dadebed
Labels:        app.kubernetes.io/managed-by=Helm
Annotations:   meta.helm.sh/release-name: 1vol
               meta.helm.sh/release-namespace: delltest
               pv.kubernetes.io/bind-completed: yes
               pv.kubernetes.io/bound-by-controller: yes
               volume.beta.kubernetes.io/storage-provisioner: csi-powermax.dellemc.com
               volume.kubernetes.io/selected-node: ux-test-v-compute0.ocp.example.com
               volume.kubernetes.io/storage-provisioner: csi-powermax.dellemc.com
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      15Gi
Access Modes:  RWO
VolumeMode:    Filesystem
Used By:       powermaxtest-0
Events:        <none>

We also tested it with a storage class named powermax-xfs that sets the parameter csi.storage.k8s.io/fstype: xfs, which also didn't work.

@thikade

thikade commented Dec 4, 2024

We also tried restarting the pod, but the FS was still not resized after restarting (tested both with ext4 and xfs FS).

@atye
Contributor

atye commented Dec 4, 2024

/sync

@atye atye added area/csi-powermax Issue pertains to the CSI Driver for Dell EMC PowerMax and removed needs-triage Issue requires triage. labels Dec 4, 2024
@csmbot
Collaborator

csmbot commented Dec 4, 2024

link: 30973

@atye atye assigned AkshaySainiDell and unassigned atye Dec 5, 2024
@AkshaySainiDell
Contributor

Hi @petarvranesevic,

I wanted to provide you with an update on the issue you reported. I tried to reproduce the issue on both a Kubernetes cluster and an OCP cluster with PowerMax, but was unsuccessful. The volume expansion test passed in both environments, and the volume size in the pod expanded to 15G without any problems. I ran the test multiple times, and all attempts were successful.
image

One difference I noticed is that you have the Authorization module enabled. Could you please confirm if the issue is observed only with the Authorization module enabled, or if it also occurs with just the PowerMax driver?

Please let me know if there are any additional steps or specific conditions under which you observed the issue, so I can try to replicate it more accurately.

Thank you for your patience and cooperation.

@petarvranesevic
Author

Hi @AkshaySainiDell.

Yes, we are testing with the Authorization module enabled. We can't test without it.

The issue appears to be a missing filesystem resize inside the pod, as the pvc as well as the volume in PowerMax are showing 15G.
When inspecting with df -h inside of the pod, the filesystem is showing 8G.

Could you try it with the Authorization module enabled?

@petarvranesevic
Author

When starting the test:

user@host:~$ oc rsh powermaxtest-0 lsblk
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda      8:0    0     8G  0 disk
sdb      8:16   0     8G  0 disk
sr0     11:0    1   1.1G  0 rom
vda    252:0    0   120G  0 disk
|-vda1 252:1    0     1M  0 part
|-vda2 252:2    0   127M  0 part
|-vda3 252:3    0   384M  0 part
`-vda4 252:4    0 119.5G  0 part /dev/termination-log
user@host:~$ oc rsh powermaxtest-0 df -h
Filesystem          Size  Used Avail Use% Mounted on
overlay             120G   17G  103G  15% /
tmpfs                64M     0   64M   0% /dev
shm                  64M     0   64M   0% /dev/shm
tmpfs                31G   69M   31G   1% /etc/passwd
/dev/mapper/mpathd  7.8G   24K  7.4G   1% /data0
/dev/vda4           120G   17G  103G  15% /etc/hosts
tmpfs               125G   24K  125G   1% /run/secrets/kubernetes.io/serviceaccount
tmpfs                63G     0   63G   0% /proc/acpi
tmpfs                63G     0   63G   0% /proc/scsi
tmpfs                63G     0   63G   0% /sys/firmware

On step "expanding the volume"

user@host:~$ oc rsh powermaxtest-0 lsblk
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda      8:0    0    15G  0 disk
sdb      8:16   0    15G  0 disk
sr0     11:0    1   1.1G  0 rom
vda    252:0    0   120G  0 disk
|-vda1 252:1    0     1M  0 part
|-vda2 252:2    0   127M  0 part
|-vda3 252:3    0   384M  0 part
`-vda4 252:4    0 119.5G  0 part /dev/termination-log
user@host:~$ oc rsh powermaxtest-0 df -h
Filesystem          Size  Used Avail Use% Mounted on
overlay             120G   17G  103G  15% /
tmpfs                64M     0   64M   0% /dev
shm                  64M     0   64M   0% /dev/shm
tmpfs                31G   69M   31G   1% /etc/passwd
/dev/mapper/mpathd  7.8G   24K  7.4G   1% /data0
/dev/vda4           120G   17G  103G  15% /etc/hosts
tmpfs               125G   24K  125G   1% /run/secrets/kubernetes.io/serviceaccount
tmpfs                63G     0   63G   0% /proc/acpi
tmpfs                63G     0   63G   0% /proc/scsi
tmpfs                63G     0   63G   0% /sys/firmware

Powermax volume info

image

@AkshaySainiDell
Contributor

Thanks @petarvranesevic for sharing the info. I'll try with Authorization enabled.
Additionally, could you please confirm the results of the volume expansion test—pass or fail?

@petarvranesevic
Author

Thank you, @AkshaySainiDell. The test stayed in the processing state, and we had to stop it manually after letting it run for ~30 min.

@thikade

thikade commented Dec 10, 2024

Hey @AkshaySainiDell, why do you think this is related to the Authorization component? The volume is resized successfully on the storage array and is also visible as a resized disk inside the container (via lsblk), so the only thing missing is the online filesystem resize, which IIRC should be done by the CSI driver on the node, no?
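
To narrow down where the resize stops, it may help to compare sizes layer by layer on the node: the SCSI paths (sda/sdb in the lsblk output), the multipath map (mpathd), and the mounted filesystem. A minimal sketch, assuming the sizes are collected in bytes with `blockdev --getsize64` and `df -B1 --output=size`; the function name `stale_layer` and the 90% filesystem threshold are illustrative assumptions, not anything the driver does:

```shell
#!/bin/sh
# Hypothetical helper: report the first layer that has not picked up the new size.
# Arguments (all in bytes):
#   $1 size of one SCSI path,   e.g. from: blockdev --getsize64 /dev/sda
#   $2 size of the multipath map, e.g. from: blockdev --getsize64 /dev/mapper/mpathd
#   $3 size of the filesystem,  e.g. from: df -B1 --output=size /data0 | tail -n 1
stale_layer() {
    path_bytes=$1
    map_bytes=$2
    fs_bytes=$3
    if [ "$map_bytes" -lt "$path_bytes" ]; then
        # The paths grew, but the device-mapper map still has the old size.
        echo "multipath"
    elif [ "$fs_bytes" -lt $((map_bytes * 9 / 10)) ]; then
        # Assumption: a fully grown filesystem occupies at least 90% of its device.
        echo "filesystem"
    else
        echo "none"
    fi
}
```

With the values in this thread (15G paths, 7.8G filesystem), the answer depends on the size of the mpathd map, which the lsblk output taken inside the pod does not show.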

@karthikk92

karthikk92 commented Dec 17, 2024

I installed the PowerMax driver with Authorization enabled, and the installation completed as expected. I then ran the volume expansion test case both with the script and manually. Both ways worked as expected, and the size increased to 15GB, as shown below:

image-2024-12-16-16-17-54-718

Attached the driver logs and the script logs for the same.

@karthikk92

We could not reproduce the issue with Authorization either enabled or disabled (and the Authorization module shouldn't affect volume expansion anyway), so this could be an issue related to the customer's environment.

Could you please capture the driver logs from the customer environment so that we can check whether there are any issues on the driver side?

Another approach would be trying volume expansion manually.
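
For reference, a manual expansion amounts to raising `spec.resources.requests.storage` on the PVC and then checking the capacity and the filesystem. A minimal sketch, assuming the PVC name `pvol0`, the namespace `delltest`, and the pod `powermaxtest-0` from this thread; `pvc_resize_patch` is a hypothetical helper that only builds the patch body:

```shell
#!/bin/sh
# Hypothetical helper: build the merge patch that requests a new PVC size.
pvc_resize_patch() {
    printf '{"spec":{"resources":{"requests":{"storage":"%s"}}}}\n' "$1"
}

# Usage (not run here, requires cluster access):
#   oc patch pvc pvol0 -n delltest --type merge -p "$(pvc_resize_patch 15Gi)"
#   oc get pvc pvol0 -n delltest -o jsonpath='{.status.capacity.storage}'
#   oc rsh powermaxtest-0 df -h /data0
```

If the reported capacity changes but `df` does not, the result matches what the test script shows.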

@petarvranesevic
Author

petarvranesevic commented Dec 18, 2024

We added the driver logs to the Dell SR: SR#202405925

@AkshaySainiDell
Contributor

@petarvranesevic Thanks for sharing the logs.

I reviewed the multipath and system logs but didn't find any errors related to node expansion. However, the driver logs show an inconsistency in the volume identifier.

The expected format is csi-<cluster-prefix>-<vol-name>-<vol-namespace>-<systemid>-<vol-id>, but your logs include an extra ux- prefix (ux-csi-001-pmax-2b10283ccb-delltest-<systemid>-<vol-id>).

As per my analysis, this prefix is not added by the driver. Could you let me know the source of this prefix? Is there any configuration that adds this prefix to the volume identifier/name?
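
To spot such identifiers in a larger log, a simple prefix check could be used. This is only a sketch: `has_expected_prefix` is a hypothetical helper, and it tests nothing beyond the leading `csi-` described above.

```shell
#!/bin/sh
# Hypothetical check: does a volume identifier start with the expected "csi-" prefix?
has_expected_prefix() {
    case "$1" in
        csi-*) return 0 ;;
        *)     return 1 ;;
    esac
}

# Placeholders such as <systemid> are kept exactly as quoted in the comment above.
for id in 'csi-<cluster-prefix>-<vol-name>-<vol-namespace>-<systemid>-<vol-id>' \
          'ux-csi-001-pmax-2b10283ccb-delltest-<systemid>-<vol-id>'; do
    if has_expected_prefix "$id"; then
        echo "ok: $id"
    else
        echo "unexpected prefix: $id"
    fi
done
```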
