-
Notifications
You must be signed in to change notification settings - Fork 212
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
NodeVolumeUnpublish: tolerate repeated requests #139
NodeVolumeUnpublish: tolerate repeated requests #139
Conversation
Hi @okartau. Thanks for your PR. I'm waiting for a kubernetes-csi member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/ok-to-test |
/ok-to-test |
Better add a default.
Please try to avoid code duplication if possible. |
4866976
to
246ccd1
Compare
I added Although we discussed off-line with @pohly that more idempotency-safety via mutex locking could be added in same pass with this PR, I think it's simpler to take smaller steps, as this PR solely improves the ability to withstand repeated single-thread operations, as can be demonstrated by kubernetes-csi/csi-test#229, once merged. |
pkg/hostpath/nodeserver.go
Outdated
@@ -204,24 +204,27 @@ func (ns *nodeServer) NodeUnpublishVolume(ctx context.Context, req *csi.NodeUnpu | |||
return nil, status.Error(codes.NotFound, err.Error()) | |||
} | |||
|
|||
// Check if the target path is really a mount point. If its not a mount point do nothing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: "If it's not a mount point do nothing."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in a closer look, we tell "if (not) mount point" twice which is one too many. Same message can be given without repeating same words, and in one statement. And combining that with another improvement idea from below, I change that comment to "Unmount only if the target path is really a mount point."
pkg/hostpath/nodeserver.go
Outdated
// Check if the target path is really a mount point. If its not a mount point do nothing | ||
if notMnt, err := mount.IsNotMountPoint(mount.New(""), targetPath); notMnt || err != nil && !os.IsNotExist(err) { | ||
glog.V(4).Infof("hostpath: %s is not mount point, skip", targetPath) | ||
return &csi.NodeUnpublishVolumeResponse{}, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it is not mounted, we can only skip unmounting. We should not skip the rest of the function because a previous NodeUnpublishVolume
might have been interrupted after unmounting and before removing the target path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we only skip unmounting and proceed , we have to make sure repeated calling of os.RemoveAll is safe. It seems os.RemoveAll returns error if called with non-existent directory (is it so?), means we have to check for existence before calling?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better check for the "not-exist" error after calling it and ignore that one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, I also got idea to re-arrange those statements not to have everything packed on single line. (This line I copied from pmem-csi). But it improves readability to have it on different lines and check for err first, IMHO.
But also, I discovered that removal of directory in this functopn does not follow CSI spec which tells to remove the directory. Current code does it for VolAccessTyoe=Block only.
So what about:
- we remove the switch stmt so that there will be just os.RemoveAll.
- we add check for VolAccessType allowed values near start of function,
to avoid currently possible code path where some changes are made, then there is parameters check in the middle of function which can return without further changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, I also got idea to re-arrange those statements not to have everything packed on single line. (This line I copied from pmem-csi). But it improves readability to have it on different lines and check for err first, IMHO.
I'm not sure I follow here.
if err := <do something>; <check err here>
is the canonical way to handle errors. If the line gets too long, then simply spread it over multiple lines.
Current code does it for VolAccessTyoe=Block only.
Yes, it should create (or try to create) the target directory in NodePublishVolume
and remove it in NodeUnpublishVolume
. It is a bug in Kubernetes that it creates the target path for filesystem volumes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I follow here.
What I meant is that instead of multiple ANDed and ORed login on same line:
if notMnt, err := mount.IsNotMountPoint(mount.New(""), targetPath); notMnt || err != nil && !os.IsNotExist(err) {
I prefer more straightforward structure:
notMnt, err := stmt
if err
.... and so on, having statement and checks separated for easier reading.
i.e. the code that is in PR now (I pushed new state close to my previous comment)
But what about my idea of removing switch stmt in middle. If you say we can trust volAccessType to be valid, does it mean then we can remove the switch altogether, without adding separate check closer to start of function ? And remove directory in all cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer more straightforward structure:
notMnt, err := stmt
if err
That's non-idiomatic Go. You can get the same readability with:
if notMnt, err := stmt;
err ...
If you say we can trust volAccessType to be valid, does it mean then we can remove the switch altogether, without adding separate check closer to start of function ?
Yes.
And remove directory in all cases?
Yes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I pushed new state addressing raised concerns. gofmt has own opinion how to format that "if" part so I used that. The switch stmt is gone, we always remove the directory. And we do not check for VolAccessType any more, we trust the type is valid.
pkg/hostpath/nodeserver.go
Outdated
glog.V(4).Infof("hostpath: volume %s/%s has been unmounted.", targetPath, volumeID) | ||
default: | ||
return nil, status.Error(codes.Internal, fmt.Sprintf("unsupported access type %v", vol.VolAccessType)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the unlikely case that we get here, we potentially did already some work (unmounting), but I think that's okay.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For proper handling, we should check earlier that VolAccessType is one of supported types, and refuse to make any changes if not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that's necessary. vol.VolAccessType
really should be valid, as it was set earlier by the driver itself.
246ccd1
to
0b60b66
Compare
/retest |
0b60b66
to
961ec99
Compare
// Unmounting the image | ||
err = mount.New("").Unmount(req.GetTargetPath()) | ||
err = mount.New("").Unmount(targetPath) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment is no longer accurate. It should say "Unmounting the image or filesystem" now.
pkg/hostpath/nodeserver.go
Outdated
} | ||
// Delete the block file or mount point. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With "block file", do you mean the image file that the loop device is bound to? We don't delete that here.
I would just say "Delete the mount point." here, which is a correct statement, regardless whether that mount point is a directory (for filesystem mode) or file (for raw block).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just re-used previous "block file" and added another part in that comment. Changed now as proposed, pushed.
One thing I notice when looking at latest code now, the recent change which took away "return OK" in "not mount point" case, likely shifted the potential idempotency failure point to next operation, os.RemoveAll which will fail if there is no mount point.
And I have not tested newest code against repeated test yet.
I need to do that before accept point of that code, I will do that locally soon, my guess is it now will fail at RemoveAll().
Partial reason why this is left unnoticed is that similar code on pmem-csi side does not check for return code of os.RemoveAll, i.e. tolerates failure silently, with end result being still OK.
The failure to remove non-existing directory is not really bad thing in that case.
But more clear is to check for return code and have code to handle this, commented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or, what about another "code saving" approach: we do not check for return code and add comment why we do not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, I think correct way is to check. One of
- check first does path exist, if yes then RemoveAll, and also check for return code
- try RemoveAll and check for return code, if "no such path" then OK, otherwise error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The latter. Testing before some operation and then doing the operation is always a bit fishy. There have been security exploits because of this (https://hackernoon.com/time-of-check-to-time-of-use-toctou-a-race-condition-99c2311bd9fc).
This shouldn't be a problem here, but it's still better to follow best practices. It's also cheaper (= less syscalls), although again that doesn't matter here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried current code and it works without error, i.e. os.RemoveAll does not return error on non-existing path
This is expected and documented on https://golang.org/pkg/os/ :
If the path does not exist, RemoveAll returns nil (no error)
Means, current code is good as is, needs comment stating above to make it even more clear (I will add a comment and push)
961ec99
to
7bd758c
Compare
NodeVolumeUnpublish handling has to be idempotent, i.e. has to tolerate repeated requests. That means, we can not attempt umount and return failure without checking is targetPath really a mount point. Re-arranged "not mount point" checking to be simpler. Removed switch about VolAccessTypes as targetPath has to be removed in both cases by CSI spec.
7bd758c
to
2d18f8a
Compare
/retest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/assign @msau42 For approval. |
/approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: msau42, okartau The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
a1e1127 Merge pull request kubernetes-csi#139 from pohly/kind-for-kubernetes-latest 1c0fb09 prow.sh: use KinD main for latest Kubernetes 1d77cfc Merge pull request kubernetes-csi#138 from pohly/kind-update-0.10 bff2fb7 prow.sh: KinD 0.10.0 git-subtree-dir: release-tools git-subtree-split: a1e1127
…latest prow.sh: use KinD main for latest Kubernetes
What type of PR is this?
/kind failing-test
What this PR does / why we need it:
NodeVolumeUnpublish handling has to be idempotent,
tolerating repeated requests.
That means, we can not attempt umount and return failure
without checking is that really mount point.
Which issue(s) this PR fixes:
Fixes #105
Special notes for your reviewer:
Consider this as WIP: I intentionally replicated mount check code initially
in two case: blocks, and that's not meant to be final.
I made it like that to avoid semantic change to current switch block, which
already replicates code for two cases.
This switch does not have default (should it have?), i.e. we assume that
VolAccessType
has to be one of two handled by case: blocks?Does that mean we can bring mount check code to be before switch block?
But for that to be correct, we should probably consider adding default stmt in switch.
Note that two case: parts already have most of code identical
(we can use
targetPath
instead ofreq.GetTargetPath()
in 2nd case)Does this PR introduce a user-facing change?: