CSI volume reconstruction does not work for ephemeral volumes #79980
/sig storage
Related to #79896 |
Maybe a stupid idea... Reconciler sync has found a directory in:
kubernetes/pkg/kubelet/volumemanager/reconciler/reconciler.go, lines 398 to 406 at a292437
kubernetes/pkg/kubelet/volumemanager/reconciler/reconciler.go, lines 412 to 415 at a292437
In both cases, we can expect that the volume plugin / CSI driver is idempotent: SetUp() won't do anything if the volume is already set up, and TearDown() won't do anything if the volume has already been torn down. Are our volume plugins really idempotent? IMO they are. We perhaps don't need to check for the presence of a mount point anywhere, just the presence of the directory. Adding @gnufied to the loop. |
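To make the idempotency expectation concrete, here is a minimal Go sketch of a TearDown-style guard. The helper name is hypothetical, not the actual kubelet or CSI plugin code:

```go
package main

import (
	"fmt"
	"os"
)

// tearDownAt removes the volume directory. It is idempotent: if the
// directory is already gone, the volume has been torn down and the
// call succeeds without doing anything.
func tearDownAt(path string) error {
	if _, err := os.Stat(path); os.IsNotExist(err) {
		// Already torn down; nothing to do.
		return nil
	} else if err != nil {
		return fmt.Errorf("checking %s: %v", path, err)
	}
	// ... a real plugin would unmount here if needed ...
	return os.RemoveAll(path)
}

func main() {
	// Calling twice must be safe; the second call is a no-op.
	fmt.Println(tearDownAt("/tmp/example-volume"))
	fmt.Println(tearDownAt("/tmp/example-volume"))
}
```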
Filed #79980, PTAL |
I'm curious how the reconstruction e2e tests passed |
They start kubelet with the pod already deleted. If the pod still exists, volume reconstruction can see it in DSW and does nothing; see kubernetes/pkg/kubelet/volumemanager/reconciler/reconciler.go, lines 385 to 391 at 978c38d
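For illustration, a rough sketch of the skip logic those lines implement, using stand-in types rather than the real reconciler API:

```go
package main

import "fmt"

type volumeName string

// desiredStateOfWorld is a stand-in for kubelet's DSW cache.
type desiredStateOfWorld map[volumeName]bool

// syncReconstructedVolumes leaves reconstructed volumes alone if they
// are still in the desired state of world: the normal mount/unmount
// path will handle them.
func syncReconstructedVolumes(reconstructed []volumeName, dsw desiredStateOfWorld) {
	for _, v := range reconstructed {
		if dsw[v] {
			// Pod still exists; the reconciler mounts/unmounts as usual.
			fmt.Printf("skipping %s: still in DSW\n", v)
			continue
		}
		fmt.Printf("reconstructing %s: not in DSW, will be cleaned up\n", v)
	}
}

func main() {
	dsw := desiredStateOfWorld{"pvc-123": true}
	syncReconstructedVolumes([]volumeName{"pvc-123", "pvc-456"}, dsw)
}
```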
[Reconstruction fails because it checks for a mount in the wrong directory.] In this case, kubelet waits for the volume to appear in VolumesInUse before calling |
They pass because their container does not handle SIGTERM and it takes 30 seconds to kill them. Kubelet has enough time to |
It turns out that I tested with a broken version of #80743, and reconstruction is broken only for ephemeral volumes that don't mount. |
So the conclusion is that we only need a fix for ephemeral volumes? Thanks!
- Jing |
Yes, sorry again about the noise. |
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta. |
/remove-lifecycle stale |
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta. |
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta. |
/remove-lifecycle rotten |
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta. |
/remove-lifecycle rotten |
/lifecycle frozen |
From #103651: this issue surfaced through e2e tests using the hostPath CSI driver. The cause is |
To work around this bug, volume lifecycle checking was disabled for all tests, not just the subpath test. This check is useful and should be enabled again. |
Hey everyone! I arrived at this conversation while investigating issue #105242 (it seems to me like a duplicate of #103651). I'm wondering if there's something I can help with? I'm still a bit new to the codebase, though. I was able to reproduce the problem by using the following ginkgo focus command in the e2e framework:
I can also see that if we manually add --check-volume-lifecycle=false to the statefulset/csi-hostpathplugin, the tests pass correctly.
I also see that during the process, something in the cleanup function doesn't correctly delete the PVs after the tests are done, so they remain alive if we run kubectl get pv. |
I think I found a bit more info on the issue @jsafrane, which could probably be of some help. The volume is not being correctly marked as unmounted inside operation_generator.go in GenerateUnmountVolumeFunc. I'm not sure why, but I'll continue to investigate.
I think this error is coming from DeletePodFromVolume in actual_state_of_the_world.go |
I added a couple of extra debug lines to those lines in the actual_state_of_the_world.go file, and I can see that when DeletePodFromVolume tries to check whether the volume exists, it doesn't find it in the asw.attachedVolumes map, like so:
Perhaps we should look up the name of the attached volume differently for ephemeral volumes? |
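To illustrate the suspected failure mode, here is a hypothetical Go sketch (not the real actual_state_of_the_world.go types): if the volume was registered under one unique name at mount time but looked up under another during reconstruction, the lookup fails:

```go
package main

import (
	"errors"
	"fmt"
)

type actualStateOfWorld struct {
	attachedVolumes map[string]struct{}
}

// DeletePodFromVolume fails if the volume was recorded under a
// different unique name than the one used for the lookup.
func (asw *actualStateOfWorld) DeletePodFromVolume(volumeName string) error {
	if _, ok := asw.attachedVolumes[volumeName]; !ok {
		return errors.New("volume does not exist in attachedVolumes: " + volumeName)
	}
	delete(asw.attachedVolumes, volumeName)
	return nil
}

func main() {
	asw := &actualStateOfWorld{attachedVolumes: map[string]struct{}{
		// Suppose the ephemeral volume was registered under a
		// pod-scoped name at mount time (names here are made up)...
		"csi-hostpath^my-pod-uid-inline-vol": {},
	}}
	// ...but reconstruction derives a PV-style name for the lookup.
	fmt.Println(asw.DeletePodFromVolume("csi-hostpath^pvc-45640a32"))
}
```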
/assign |
These tests were previously disabled to work around kubernetes#61446 and kubernetes#79980. (kubernetes@f1e1f3a)
There are two problems:
The call path looks like this (starting in reconcile.go):
Inline volumes do not have a PV spec, so it fails here. We should be calling |
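For context, a simplified sketch of why inline volumes need different naming: the same inline volume name can appear in many pods, so the pod identity has to be part of the unique name (similar in spirit to kubelet's GetUniqueVolumeNameFromSpecWithPod). Types and helpers below are illustrative, not the real volume util API:

```go
package main

import "fmt"

type spec struct {
	persistentVolumeName string // empty for inline volumes
	inlineVolumeName     string
}

// uniqueVolumeName derives a cluster-unique name. PV-backed volumes
// can use the PV name alone; inline volumes must mix in the pod UID.
func uniqueVolumeName(podUID string, s spec) string {
	if s.persistentVolumeName != "" {
		return "kubernetes.io/csi/" + s.persistentVolumeName
	}
	return "kubernetes.io/csi/" + podUID + "-" + s.inlineVolumeName
}

func main() {
	fmt.Println(uniqueVolumeName("uid-1", spec{persistentVolumeName: "pvc-45640a32"}))
	fmt.Println(uniqueVolumeName("uid-1", spec{inlineVolumeName: "my-inline-vol"}))
}
```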
There is ongoing work that changes |
These tests were previously disabled to work around kubernetes#79980 kubernetes@f1e1f3a
When a pod is marked as deleted while kubelet is down / being restarted, the newly started kubelet does not clean up the pod's CSI filesystem volumes.

The newly started kubelet tries to reconstruct the volume using CSI's ConstructVolumeSpec function. This part seems to work: the CSI volume plugin loads its JSON file. But then VolumeManager checks whether the volume is still mounted in the /var/lib/kubelet/pods/9440e7e5-d454-4555-84b7-d72e43ec4b3a/volumes/kubernetes.io~csi/pvc-45640a32-4ba3-4a7d-ad4b-087281f1460d/mount directory.

There are two issues:
1. CSI does not require volumes to be presented as mounts. They can be plain directories with files in them. This will be the case for most in-line volumes.
2. Even if the CSI driver did use a mount, kubelet mounts it into /var/lib/kubelet/pods/9440e7e5-d454-4555-84b7-d72e43ec4b3a/volumes/kubernetes.io~csi/pvc-45640a32-4ba3-4a7d-ad4b-087281f1460d/mount. Checking /var/lib/kubelet/pods/9440e7e5-d454-4555-84b7-d72e43ec4b3a/volumes/kubernetes.io~csi/pvc-45640a32-4ba3-4a7d-ad4b-087281f1460d does not make sense.

Kubelet should check the right directory, as given by GetPath().
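A minimal sketch of that proposed check, assuming a plain directory-existence test on the path returned by the mounter's GetPath(); helper names are illustrative, not kubelet's real API:

```go
package main

import (
	"fmt"
	"os"
)

// volumeDirExists checks the directory returned by the mounter's
// GetPath() instead of testing for a mount point: a CSI volume may be
// a plain directory with files in it and never show up in /proc/mounts.
func volumeDirExists(getPath func() string) (bool, error) {
	path := getPath() // e.g. .../volumes/kubernetes.io~csi/<vol>/mount
	_, err := os.Stat(path)
	if os.IsNotExist(err) {
		return false, nil // volume is gone; safe to clean up state
	}
	if err != nil {
		return false, err
	}
	return true, nil
}

func main() {
	exists, err := volumeDirExists(func() string {
		return "/var/lib/kubelet/pods/example/volumes/kubernetes.io~csi/example/mount"
	})
	fmt.Println(exists, err)
}
```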