
Secondary storage class reports 'did not have enough free storage' #489

Closed
DeprecatedLuke opened this issue Nov 16, 2023 · 5 comments


@DeprecatedLuke

What steps did you take and what happened:
Creating a secondary storage class 'large' seems to be very broken. The node clearly shows that the capacity is available, but the scheduler reports 'did not have enough free storage'. This works just fine when using the 'standard' storage class.

Failing line:

  Warning  FailedScheduling  27m                 default-scheduler  0/5 nodes are available: 1 node(s) did not have enough free storage, 1 node(s) had untolerated taint {node.kubernetes.io/unreachable: }, 3 node(s) had untolerated taint {node-role.kubernetes.io/control-plane: }. preemption: 0/5 nodes are available: 5 Preemption is not helpful for scheduling..

What did you expect to happen:
Pod to schedule normally.

The output of the following commands will help us better understand what's going on:

I1116 06:17:48.581747       1 grpc.go:72] GRPC call: /csi.v1.Controller/GetCapacity requests {"accessible_topology":{"segments":{"beta.kubernetes.io/arch":"amd64","beta.kubernetes.io/os":"linux","kubernetes.io/arch":"amd64","kubernetes.io/hostname":"s8","kubernetes.io/os":"linux","openebs.io/nodeid":"s8","openebs.io/nodename":"s8"}},"parameters":{"compression":"lz4","dedup":"off","fstype":"zfs","poolname":"k8s-pvs-sd1","recordsize":"128k","shared":"yes","thinprovision":"yes"},"volume_capabilities":[{"AccessType":{"Mount":{}},"access_mode":{}}]}
I1116 06:17:48.581794       1 grpc.go:81] GRPC response: {"available_capacity":7541961883648}
scale:0} d:{Dec:<nil>} s:1334457180Ki Format:BinarySI}} {Name:k8s-pvs-sd1 UUID:1404655791832034090 Free:{i:{value:7541961883648 scale:0} d:{Dec:<nil>} s: Format:BinarySI}}], required=[{Name:k8s-pvs UUID:3563441934539523479 Free:{i:{value:1366482853888 scale:0} d:{Dec:<nil>} s: Format:BinarySI}} {Name:k8s-pvs-sd1 UUID:1404655791832034090 Free:{i:{value:7541961883648 scale:0} d:{Dec:<nil>} s: Format:BinarySI}}]
I1116 06:21:44.988151       1 zfsnode.go:110] zfs node controller: updating node object with &{TypeMeta:{Kind: APIVersion:} ObjectMeta:{Name:s8 GenerateName: Namespace:openebs SelfLink: UID:bbbd5b66-052f-4eb0-9a1f-596a21ad3a06 ResourceVersion:74568961 Generation:18253 CreationTimestamp:2023-11-03 14:17:30 +0000 UTC DeletionTimestamp:<nil> DeletionGracePeriodSeconds:<nil> Labels:map[] Annotations:map[] OwnerReferences:[{APIVersion:v1 Kind:Node Name:s8 UID:68e775a0-ba1a-4766-954b-1c215656b119 Controller:0xc000220198 BlockOwnerDeletion:<nil>}] Finalizers:[] ManagedFields:[{Manager:zfs-driver Operation:Update APIVersion:zfs.openebs.io/v1 Time:2023-11-16 06:20:44 +0000 UTC FieldsType:FieldsV1 FieldsV1:{"f:metadata":{"f:ownerReferences":{".":{},"k:{\"uid\":\"68e775a0-ba1a-4766-954b-1c215656b119\"}":{}}},"f:pools":{}} Subresource:}]} Pools:[{Name:k8s-pvs UUID:3563441934539523479 Free:{i:{value:1366482853888 scale:0} d:{Dec:<nil>} s: Format:BinarySI}} {Name:k8s-pvs-sd1 UUID:1404655791832034090 Free:{i:{value:7541961883648 scale:0} d:{Dec:<nil>} s: Format:BinarySI}}]}
I1116 06:21:45.012038       1 zfsnode.go:114] zfs node controller: updated node object openebs/s8
I1116 06:21:45.013236       1 zfsnode.go:139] Got update event for zfs node openebs/s8
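
For what it's worth, the available_capacity of 7541961883648 bytes reported by GetCapacity for k8s-pvs-sd1 works out to roughly 6.86 TiB (7541961883648 / 1024^4), which is in the same ballpark as the 6.98T FREE shown by zpool list below, so the driver is clearly seeing the pool's free space.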

Anything else you would like to add:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard
  annotations:
    "storageclass.kubernetes.io/is-default-class": "true"
parameters:
  recordsize: "4k"
  compression: "lz4"
  dedup: "off"
  fstype: "zfs"
  thinprovision: "yes"
  poolname: "k8s-pvs"
  shared: "yes"
provisioner: zfs.csi.openebs.io
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Retain
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: large
parameters:
  recordsize: "128k"
  compression: "lz4"
  dedup: "off"
  fstype: "zfs"
  thinprovision: "yes"
  poolname: "k8s-pvs-sd1"
  shared: "yes"
provisioner: zfs.csi.openebs.io
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Retain
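
For reference, a minimal PVC and consuming Pod of the kind that triggers the event above; the names, image, and 10Gi size are illustrative assumptions, not taken from the original report:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: large-test-pvc          # illustrative name
spec:
  storageClassName: large       # the failing secondary StorageClass
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi             # illustrative size, far below the pool's free space
---
apiVersion: v1
kind: Pod
metadata:
  name: large-test-pod          # illustrative name
spec:
  containers:
    - name: app
      image: busybox            # illustrative image
      command: ["sleep", "3600"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: large-test-pvc

With volumeBindingMode: WaitForFirstConsumer, the PVC stays Pending until the Pod is scheduled, which is when the FailedScheduling event above appears.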

Environment:

  • zfs-localpv helm chart 2.3.1
  • Kubernetes v1.28.2
  • kubeadm
  • Debian 12

zpool list:
NAME          SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
k8s-pvs      1.49T   208G  1.29T        -         -    11%    13%  1.00x    ONLINE  -
k8s-pvs-sd1  6.98T   672K  6.98T        -         -     0%     0%  1.00x    ONLINE  -
@DeprecatedLuke (Author)

Confirmed that this issue only occurs when using a custom nodeid value.

@hrudaya21 (Contributor)

> Confirmed that this issue only occurs when using a custom nodeid value.

@DeprecatedLuke Can you please share the YAML file where you are using the custom nodeid?

@DeprecatedLuke (Author)

apiVersion: v1
kind: Node
metadata:
  annotations:
    csi.volume.kubernetes.io/nodeid: '{"csi.tigera.io":"s11","zfs.csi.openebs.io":"s11"}'
    kubeadm.alpha.kubernetes.io/cri-socket: unix:///var/run/cri-dockerd.sock
    node.alpha.kubernetes.io/ttl: "0"
    projectcalico.org/IPv4Address: 10.1.1.3/24
    projectcalico.org/IPv4VXLANTunnelAddr: 192.168.219.192
    volumes.kubernetes.io/controller-managed-attach-detach: "true"
  creationTimestamp: "2023-11-02T09:34:24Z"
  labels:
    beta.kubernetes.io/arch: amd64
    beta.kubernetes.io/os: linux
    kubernetes.io/arch: amd64
    kubernetes.io/hostname: s11
    kubernetes.io/os: linux
    ngxdev.com/virtualization-supported: "true"
    #
    # used to be s8, migrated to s11 via manifest magic since nodeid is broken.
    #
    openebs.io/nodeid: s11
    openebs.io/nodename: s11
    #
    #
  name: s11
  resourceVersion: "76744448"
  uid: cfb127c5-863b-4d98-bfa0-3124e4f87f0b
spec:
  podCIDR: 192.168.3.0/24
  podCIDRs:
  - 192.168.3.0/24
status:
  addresses:
  - address: 10.1.1.3
    type: InternalIP
  - address: s11
    type: Hostname
  allocatable:
    cpu: "32"
    ephemeral-storage: "242364885410"
    hugepages-1Gi: "0"
    hugepages-2Mi: "0"
    memory: 131475896Ki
    pods: "110"
  capacity:
    cpu: "32"
    ephemeral-storage: 262982732Ki
    hugepages-1Gi: "0"
    hugepages-2Mi: "0"
    memory: 131578296Ki
    pods: "110"
  conditions:
  - lastHeartbeatTime: "2023-11-16T01:21:43Z"
    lastTransitionTime: "2023-11-16T01:21:43Z"
    message: Calico is running on this node
    reason: CalicoIsUp
    status: "False"
    type: NetworkUnavailable
  - lastHeartbeatTime: "2023-11-23T01:43:04Z"
    lastTransitionTime: "2023-11-16T01:21:36Z"
    message: kubelet has sufficient memory available
    reason: KubeletHasSufficientMemory
    status: "False"
    type: MemoryPressure
  - lastHeartbeatTime: "2023-11-23T01:43:04Z"
    lastTransitionTime: "2023-11-16T01:21:36Z"
    message: kubelet has no disk pressure
    reason: KubeletHasNoDiskPressure
    status: "False"
    type: DiskPressure
  - lastHeartbeatTime: "2023-11-23T01:43:04Z"
    lastTransitionTime: "2023-11-16T01:21:36Z"
    message: kubelet has sufficient PID available
    reason: KubeletHasSufficientPID
    status: "False"
    type: PIDPressure
  - lastHeartbeatTime: "2023-11-23T01:43:04Z"
    lastTransitionTime: "2023-11-16T01:21:36Z"
    message: kubelet is posting ready status. AppArmor enabled
    reason: KubeletReady
    status: "True"
    type: Ready
  daemonEndpoints:
    kubeletEndpoint:
      Port: 10250
  nodeInfo:
    architecture: amd64
    bootID: cce44334-2a88-456a-839c-7e8aec3bcd41
    containerRuntimeVersion: docker://24.0.7
    kernelVersion: 6.1.0-13-amd64
    kubeProxyVersion: v1.28.2
    kubeletVersion: v1.28.2
    machineID: ebb07cd77321441c8ac57ec146518fd0
    operatingSystem: linux
    osImage: Debian GNU/Linux 12 (bookworm)
    systemUUID: d7f6388e-5ec1-11ee-8742-aec0b28c3900
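
The openebs.io/nodeid: s11 label above is the custom node id in question; it is the same key that shows up in the GetCapacity topology segments earlier. Assuming it was applied as an ordinary node label, setting it would look something like:

kubectl label node s11 openebs.io/nodeid=s11 --overwrite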

@sinhaashish (Member)

Can you try with the latest master now that PR #451 is merged?

@DeprecatedLuke (Author)

As I mentioned before, I don't have this configuration set up anymore, as I migrated the PV node ids by modifying the ZFSVolume resources and recreating the PVCs. But I can assume this is fixed, since it was most likely mapping to a node which was no longer online.
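
For anyone following along, that migration is roughly the following kind of edit; a sketch, assuming the ZFSVolume's spec.ownerNodeID field is what pins a volume to a node (the volume name is an illustrative placeholder):

# list the ZFSVolume resources backing the PVs
kubectl -n openebs get zfsvolumes
# repoint a volume from the old node (s8) to the new one (s11)
kubectl -n openebs patch zfsvolume pvc-example --type merge -p '{"spec":{"ownerNodeID":"s11"}}'

The PVCs were then recreated, as mentioned above.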
