Pod creation fails after restarting nydus snapshotter daemon pod #631
I'm using the AL2023 AMI, which uses a local pause image as the sandbox image (localhost/kubernetes/pause).
Is this reliably reproducible? Did it happen when the snapshotter was restarted before a containerd snapshot request had completed?
@imeoer Yes. This is my setup: EKS with worker nodes running the Amazon Linux 2023 AMI, where the pause image is local.
I installed nydus-snapshotter using this yaml. The nydus-snapshotter pod runs, and my custom nginx image (a nydus image) also runs once deployed.
I then force-deleted the nydus-snapshotter pod, and now the pod is stuck in ContainerCreating status with this event:
After some debugging, I found that this function removes the sandbox/pause image when the nydus-snapshotter pod is deleted.
Hi @imeoer, I may be running into the same issue as @gane5hvarma, and I may be able to add a bit more detail. My set-up is a bit different: I am running the nydus-snapshotter as a system daemon in proxy mode. After I have successfully run a given pod with an image once, if I then restart the nydus-snapshotter service, I can see the following logs:
I have added a WARN log where I think the error is happening. In summary, upon restart the snapshotter identifies that the two RAFS instances from the previous execution exist, but it does not add them to any daemon (I guess because no daemon is running in proxy mode?): https://github.com/containerd/nydus-snapshotter/blob/main/pkg/manager/manager.go#L161-L173 However, when I try to create the pod again, nydus tries to put an instance with ID 1 in the database: This triggers an error because instance 1 already exists. In the containerd logs, the only thing we can see is:
For completeness, the exact same set-up works well when using the snapshotter in How can we address this issue? Many thanks!
The pod is stuck in the ContainerCreating state with the error below
This error appears after I deliberately restart the nydus-snapshotter pod. These are the logs from nydus-snapshotter
I'm using the nydus v0.15.0 release.