
Support to use "local" volumes? #609

Open
remoe opened this issue Dec 30, 2019 · 3 comments

Comments

@remoe

remoe commented Dec 30, 2019

Since Kubernetes v1.14 it's possible to use "local" volumes:

https://kubernetes.io/docs/concepts/storage/volumes/#local

This is currently not possible in typhoon, because of this:

https://github.com/kubernetes-sigs/sig-storage-local-static-provisioner/blob/master/docs/faqs.md#volume-does-not-exist-with-containerized-kubelet

It would work if one added the following lines (as an example) here:

https://github.com/poseidon/typhoon/blob/master/bare-metal/container-linux/kubernetes/cl/worker.yaml#L78

          --volume mntdisks,kind=host,source=/mnt \
          --mount volume=mntdisks,target=/mnt \

It would then be possible to mount disks under "/mnt/*".

Thoughts?

@dghubble
Member

dghubble commented Jan 7, 2020

Perhaps, I'm not opposed to local volumes. But I find they offer limited value beyond hostPath.

I'd be remiss to not begin by saying that we all (know we) ought to avoid storing data on specific Kubernetes nodes (regardless of mechanism). Nevertheless, there are plenty of cases where node storage (by which I'll refer to both) is an unfortunate necessity / temporary tradeoff while we aspire for better / etc.[1]

In an example situation, you might:

  • declare a hostPath in your Deployment and use a nodeSelector to pin the resulting Pod to the node possessing the hostPath
  • declare a persistent local volume with a node selector, allowing the Deployment to be written as though pod scheduling does not rely on a specific node

In both cases, the result is the same - a Pod requiring a specific mount on a specific host. Some would say the first Deployment looks uglier. I'd say it more clearly exposes the real situation, while the local volume tends to mask it (e.g. to debug why the Pod isn't scheduling, you must infer which of its volumes expresses its own selector logic). Some of the local volume motives were around making hostPath "feel" like any other volume.

Another factor is that local volumes require more moving parts (e.g. they must pass through the scheduler, whereas hostPath is decoupled and works with static pods too). I think local volumes might show their merit if you could entirely eliminate and forbid hostPath, as that might limit host access, but that's rather unlikely. I prefer hostPath, but I'm not opposed to local volumes as more-or-less the same.
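To make the contrast concrete, here is a minimal sketch of the first option - a hostPath Deployment pinned with a nodeSelector. The names, image, and paths are hypothetical, not from Typhoon:

```yaml
# Hypothetical Deployment pinning a Pod to the node that holds the data.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      # The node dependency is stated directly on the Pod spec...
      nodeSelector:
        kubernetes.io/hostname: your.host.name
      containers:
        - name: app
          image: example.com/app:latest
          volumeMounts:
            - name: data
              mountPath: /data
      volumes:
        # ...rather than hidden inside a PersistentVolume's nodeAffinity.
        - name: data
          hostPath:
            path: /mnt/disk0
            type: Directory
```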

There are a few local volume matters that require consideration.

Mounts

For the mounts, the Kubelet should not mount /mnt. /mnt is quite a common location for data volumes, and it's unexpected that mounting there would expose your data to the Kubelet (especially when the Kubelet can modify it, below). A better approach would be to carve out mount subdirectories where an admin might mount disks or other storage components that should be exposed as local volumes.

A possible option might be:

/mnt
└── kubernetes-local

Prefixed with "kubernetes" for clear opt-in, and to leave room for future node volume types.

SELinux

With Fedora CoreOS in the mix, SELinux is increasingly a first-class citizen and concern. I don't think it's appropriate for the Kubelet to relabel mounts of an end user's data, so podman-run Kubelets should not use relabel options. I suspect this will require some guidance, as users will need to prepare local volumes that align with the SELinux labels of the existing node-local data. I've not tested the various pitfalls around this area.
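As a rough illustration of that guidance (untested against Typhoon; the label type and path are assumptions), an admin preparing a local volume could label the data up front so containers can access it without any relabeling by the Kubelet mount:

```shell
# Label existing node-local data so container processes can access it,
# instead of relying on podman's :z/:Z relabel options on the Kubelet mount.
# container_file_t is the conventional SELinux type for container-accessible
# content; whether it fits a given dataset is for the admin to decide.
sudo chcon -R -t container_file_t /var/mnt/kubernetes-local/drive0

# Optionally persist the label so a future restorecon does not revert it.
sudo semanage fcontext -a -t container_file_t '/var/mnt/kubernetes-local/drive0(/.*)?'
sudo restorecon -Rv /var/mnt/kubernetes-local/drive0
```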

Bare-Metal Only

Finally, I'd scope this to bare-metal only. Workers on cloud platforms are homogeneous and really ought to be treated as entirely fungible. I'm not keen to provide additional features for workloads to rely on node storage (by necessity you do have hostPath). I think bare-metal has a more legitimate claim. There, nodes may be very unique (fancy storage arrays of various kinds on particular nodes) and it can be reasonable to think an admin would invest effort into repairing a faulty storage component (i.e. use of hostPath and local volume is more justified when machines are pets that get groomed and loved).

[1]: Some node storage cases are entirely appropriate and justified (e.g. control plane DaemonSets). For these, hostPath is used.

@remoe
Author

remoe commented May 1, 2020

Sample update for FCOS (tested with the latest Typhoon for Fedora CoreOS, Kubernetes v1.18.2):

    # ...
    - name: kubelet.service
    # ...
        ExecStartPre=/bin/mkdir -p /var/mnt/kubernetes-local/drive0
        ExecStart=/usr/bin/podman run --name kubelet \
        # ...
          --volume /var/mnt/kubernetes-local/drive0:/var/mnt/kubernetes-local/drive0:z \

FCOS config (Fedora CoreOS Config) sample:

variant: fcos
version: 1.0.0
storage:
  filesystems:
    - path: /var/mnt/kubernetes-local/drive0
      device: /dev/vdb
      format: ext4
systemd:
  units:
    - name: var-mnt-kubernetes\x2dlocal-drive0.mount
      enabled: true
      contents: |
        [Unit]
        Description=Mount persistent storage to /var/mnt/kubernetes-local/drive0
        Before=local-fs.target
        [Mount]
        Where=/var/mnt/kubernetes-local/drive0
        What=/dev/vdb
        Type=ext4
        [Install]
        WantedBy=local-fs.target 

and the corresponding PV:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: drive0
spec:
  capacity:
    storage: 100Gi  # illustrative; set to the actual disk size
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  local:
    path: /var/mnt/kubernetes-local/drive0
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - your.host.name
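For completeness, local PVs are typically paired with a StorageClass that uses WaitForFirstConsumer binding, so the scheduler picks a node before the claim binds (the name below is assumed to match the storageClassName above):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
```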

@jharmison-redhat

I'd like to +1 the bare-metal local volume case, and expand the target use case beyond regular "bare metal" considerations to include software-defined storage composed of bare-metal block devices (e.g. Rook). I am building a NUC cluster running Typhoon right now and will be extending the Typhoon modules to support my specific use case - that is, NVMe boot drives with SATA block devices, so a slight modification of the above example.

I think some more use cases and input would be healthy to build a better solution, though. :)
