Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

hostpath: Add block volume support #6

Merged

Conversation

darkowlzz
Copy link
Contributor

@darkowlzz darkowlzz commented Jan 25, 2019

This change adds block volume support to hostpath driver.

When a block volume request is received, a block file is created at
provisionRoot with the requested capacity as size and a loop device is
created associated with the block file.

At node publish, a bind mount of the loop device is created at the
publish target path.

At node unpublish, the target path is unmounted and deleted.

At volume delete, loop device is disassociated and the block file is
deleted.

Add plugins-dir to hostpath plugin daemonset
The volume publish target path for block devices are usually under
/var/lib/kubelet/plugins directory. Hence, adding plugins directory to
the pod volumes with bidirectional mount propagation.

Run the plugin as privileged to use loop devices
In order to share loop devices with the host, the plugin container must
be run as a privileged container.

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jan 25, 2019
@darkowlzz
Copy link
Contributor Author

I tried this with canary image of csi-provisioner and the ValidateVolumeCapabilities changes from #5, and it worked as expected. Created a block device and was able to access the device from a pod under /dev/.

root@my-csi-app:/# ls -l /dev/xfoo 
brw-rw---- 1 root disk 7, 0 Jan 30 17:55 /dev/xfoo

The same was also listed in the pod description:

...
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-f7df9 (ro)
    Devices:
      /dev/xfoo from my-csi-volume
...

The publish target path for block devices are at /var/lib/kubelet/plugins, hence we have to add that path to the pod with bidirectional mount propagation.

of := fmt.Sprintf("%s=%s", "of", path)
count := fmt.Sprintf("%s=%d", "count", capacity/mib)
// Create a block file.
out, err := executor.Command("dd", "if=/dev/zero", of, "bs=1M", count).CombinedOutput()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use fallocate [1] instead of dd? It's way faster.

[1] http://man7.org/linux/man-pages/man1/fallocate.1.html


executor := utilexec.New()
// Get a free loop device.
out, err := executor.Command("losetup", "-f").CombinedOutput()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two node publish calls can race here and choose the same loop device. It's possible to remove those invocation of losetup and just pass the -f flag to the invocation below, then also pass --show to find out which device it chose.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In k/k, makeLoopDevice is doing the way that @bswartz mentioned. So, it would be worth checking this code.

losetupArgs = append(losetupArgs, loopDevice, vol.VolPath)

// Associate block file with the loop device.
out, err = executor.Command("losetup", losetupArgs...).CombinedOutput()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not idempotent. If you get killed between the call to "losetup" and "ln", when you restart and come back into this method, you'll create another loop device, leaking the old one.

}

// Create symlink to the target path.
out, err = executor.Command("ln", "-s", loopDevice, targetPath).CombinedOutput()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need a check higher up in this function to see if the symlink already exists and points to the right place, for idempotency. Otherwise you'll end up in a failure loop when the link already exists because "ln" without "-f" won't overwrite existing symlinks.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, for idempotency, CSI community recommends to use bind mount of device, instead of symlink. If you use bind mount, your can check if it is already done, by checking if file already exists in targetPath and if the loopDevice is already mounted to targetPath.

switch vol.VolAccessType {
case blockAccess:
// Remove the symlink.
os.RemoveAll(targetPath)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error check?

Copy link

@mkimuram mkimuram left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@darkowlzz

Thank you for working on adding block volume support codes to csi hostpath driver.
It's a good idea to make loopback device visible to pod as a block volume for drivers that originally provide storage as filesystem, not block volume.

I'm adding some comments.


executor := utilexec.New()
// Get a free loop device.
out, err := executor.Command("losetup", "-f").CombinedOutput()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In k/k, makeLoopDevice is doing the way that @bswartz mentioned. So, it would be worth checking this code.

}

// Create symlink to the target path.
out, err = executor.Command("ln", "-s", loopDevice, targetPath).CombinedOutput()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, for idempotency, CSI community recommends to use bind mount of device, instead of symlink. If you use bind mount, your can check if it is already done, by checking if file already exists in targetPath and if the loopDevice is already mounted to targetPath.

@bswartz
Copy link

bswartz commented Feb 1, 2019

BTW I understand that the host path driver isn't meant for production and is just for testing, so my comments about idempotency are just suggestions. Given the low likelihood that non-idempotency will cause any actual issues during testing, I wouldn't spend a ton of time on that. Still there are a few sections of code that could be improved easily. See mkimuram's comments.

@darkowlzz darkowlzz force-pushed the hostpath-block-support branch from 1a9ef86 to 083b8fc Compare February 17, 2019 14:58
@darkowlzz
Copy link
Contributor Author

Hi, I've added some more commits addressing the previous reviews. Keeping the old commits for now, in case I need to refer back. Will squash all the commits once the changes are final.

I moved the loop device creation from NodePublish to CreateVolume and also moved the corresponding loop device deletion from NodeUnpublish to DeleteVolume. Now NodePublish just checks the symlink targets of a given target path and creates symlink if needed. And NodeUnpublish deletes the symlinks. The volume ID is set to the loop device base name, avoiding the need to maintain any volume info in memory.

Regarding the usage of makeLoopDevice (#6 (comment)) like in k/k, I was unable to do that with the current version of losetup in alpine base container. Looks like the losetup packed with busybox is old and doesn't have --show option. Hence, doing losetup -f /file/path would associate the file to a loop device, but we can't get the name of the loop device used. Should we use some other base container image or maybe replace the old losetup with a new losetup binary?

Regarding using bind mount (#6 (comment)), I'm a little confused about that. Trying to bind mount a loop device associated with a block device fails unless the block device has a filesystem. AFAIK, we are trying to provide raw block here. Should we format it and then bind mount?
Maybe I'm missing something here. I would need some help in understanding that.

Also, I've added a security context to run the plugin as privileged in the plugin deployment manifest. Without this, loop device creation was failing with file not found error, because it wasn't shared with the host.

@pohly
Copy link
Contributor

pohly commented Feb 18, 2019

The version of losetup in the base alpine image is probably from busybox. You should be able to install the util-linux package as part of the Dockerfile to get the full-featured version.

@darkowlzz
Copy link
Contributor Author

The version of losetup in the base alpine image is probably from busybox. You should be able to install the util-linux package as part of the Dockerfile to get the full-featured version.

Yeah, I just tried doing apk add util-linux in an alpine container and that updated losetup to a newer version with support for --show. Thanks, I'll add this in the Dockerfile.

@bswartz
Copy link

bswartz commented Feb 22, 2019

Regarding using bind mount (#6 (comment)), I'm a little confused about that. Trying to bind mount a loop device associated with a block device fails unless the block device has a filesystem. AFAIK, we are trying to provide raw block here. Should we format it and then bind mount?
Maybe I'm missing something here. I would need some help in understanding that.

When you bind mount a file, the destination must be an existing file. There's no need for a filesystem.

@darkowlzz darkowlzz force-pushed the hostpath-block-support branch from 083b8fc to 267cbdc Compare February 25, 2019 14:27
This change adds block volume support to hostpath driver.

When a block volume request is received, a block file is created at
provisionRoot with the requested capacity as size and a loop device is
created associated with the block file.

At node publish, a bind mount of the loop device is created at the
publish target path.

At node unpublish, the target path is unmounted and deleted.

At volume delete, loop device is disassociated and the block file is
deleted.

Add plugins-dir to hostpath plugin daemonset
The volume publish target path for block devices are usually under
/var/lib/kubelet/plugins directory. Hence, adding plugins directory to
the pod volumes with bidirectional mount propagation.

Run the plugin as privileged to use loop devices
In order to share loop devices with the host, the plugin container must
be run as a privileged container.
@darkowlzz darkowlzz force-pushed the hostpath-block-support branch from 267cbdc to ea76c5b Compare February 25, 2019 14:31
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Feb 25, 2019
@darkowlzz
Copy link
Contributor Author

darkowlzz commented Feb 25, 2019

Block support works with bind mount now :)

I made some changes to how the statelessness was implemented. Instead of keeping the loop device name in volume ID, it is now queried through the updated version of losetup (-j option). Once a file is associated with a loop device, whenever there's a need to know the loop device name, it is queried through losetup with the block file as an argument.

Added new files in vendor because I've used the functions available in k8s.io/kubernetes/pkg/volume/util/volumepathhandler to attach device and get loop device.

@bswartz
Copy link

bswartz commented Mar 4, 2019

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 4, 2019
@msau42
Copy link
Collaborator

msau42 commented Mar 4, 2019

/approve
Please open up followup issues for the remaining work

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: darkowlzz, msau42

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 4, 2019
@k8s-ci-robot k8s-ci-robot merged commit b862d4c into kubernetes-csi:master Mar 4, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants