Pod creation fails after restarting nydus snapshotter daemon pod #631

gane5hvarma · 2025-01-24T09:50:58Z

The pod is stuck in the ContainerCreating state with the following error:

Failed to create pod sandbox: rpc error: code = NotFound desc = failed to create containerd container: create snapshot: missing parent "k8s.io/2/sha256:59b1469b8fbd05fd256959ad9d7d776b9937b848d75113a0d7c1af442528b6d0" bucket: not found

This error appears after I deliberately restart the nydus snapshotter pod. These are the logs from the nydus snapshotter.
I'm using the nydus v0.15.0 release.

time="2025-01-24T09:46:19.814709359Z" level=info msg="[Prepare] snapshot with key k8s.io/367/extract-812295082-7xm9 sha256:59b1469b8fbd05fd256959ad9d7d776b9937b848d75113a0d7c1af442528b6d0 parent "
time="2025-01-24T09:46:19.833155487Z" level=info msg="[Prepare] snapshot with key k8s.io/368/a03cf4b0c0bb92eb32e7afe6fdda203b68b3677ae31bcc953aace2bb4c7b7bab parent k8s.io/2/sha256:59b1469b8fbd05fd256959ad9d7d776b9937b848d75113a0d7c1af442528b6d0"
time="2025-01-24T09:46:19.913519730Z" level=info msg="[Remove] snapshot with key k8s.io/367/extract-812295082-7xm9 sha256:59b1469b8fbd05fd256959ad9d7d776b9937b848d75113a0d7c1af442528b6d0 snapshot id 116"
time="2025-01-24T09:46:19.915767307Z" level=info msg="[Cleanup] orphan directories [/var/lib/containerd/io.containerd.snapshotter.v1.nydus/snapshots/116]"
time="2025-01-24T09:46:31.811537164Z" level=info msg="[Prepare] snapshot with key k8s.io/369/80115bbf0c0d8571eb093c20ac331d573ce1b03f9b106b368a39db8b6f8b2da5 parent k8s.io/2/sha256:59b1469b8fbd05fd256959ad9d7d776b9937b848d75113a0d7c1af442528b6d0"
time="2025-01-24T09:46:31.814576340Z" level=info msg="[Prepare] snapshot with key k8s.io/370/extract-812400584-OrEf sha256:59b1469b8fbd05fd256959ad9d7d776b9937b848d75113a0d7c1af442528b6d0 parent "
time="2025-01-24T09:46:31.834165380Z" level=info msg="[Prepare] snapshot with key k8s.io/371/80115bbf0c0d8571eb093c20ac331d573ce1b03f9b106b368a39db8b6f8b2da5 parent k8s.io/2/sha256:59b1469b8fbd05fd256959ad9d7d776b9937b848d75113a0d7c1af442528b6d0"
time="2025-01-24T09:46:31.947057054Z" level=info msg="[Remove] snapshot with key k8s.io/370/extract-812400584-OrEf sha256:59b1469b8fbd05fd256959ad9d7d776b9937b848d75113a0d7c1af442528b6d0 snapshot id 117"
time="2025-01-24T09:46:31.949322111Z" level=info msg="[Cleanup] orphan directories [/var/lib/containerd/io.containerd.snapshotter.v1.nydus/snapshots/117]"
time="2025-01-24T09:46:42.812048502Z" level=info msg="[Prepare] snapshot with key k8s.io/372/fc475334f78c031ea6c3337982cef79a0df22d5bd59f93ba38269d291c705fc4 parent k8s.io/2/sha256:59b1469b8fbd05fd256959ad9d7d776b9937b848d75113a0d7c1af442528b6d0"
time="2025-01-24T09:46:42.815114178Z" level=info msg="[Prepare] snapshot with key k8s.io/373/extract-812948782-FoY_ sha256:59b1469b8fbd05fd256959ad9d7d776b9937b848d75113a0d7c1af442528b6d0 parent "
time="2025-01-24T09:46:42.836281065Z" level=info msg="[Prepare] snapshot with key k8s.io/374/fc475334f78c031ea6c3337982cef79a0df22d5bd59f93ba38269d291c705fc4 parent k8s.io/2/sha256:59b1469b8fbd05fd256959ad9d7d776b9937b848d75113a0d7c1af442528b6d0"
time="2025-01-24T09:46:42.996104550Z" level=info msg="[Remove] snapshot with key k8s.io/373/extract-812948782-FoY_ sha256:59b1469b8fbd05fd256959ad9d7d776b9937b848d75113a0d7c1af442528b6d0 snapshot id 118"
time="2025-01-24T09:46:42.998418157Z" level=info msg="[Cleanup] orphan directories [/var/lib/containerd/io.containerd.snapshotter.v1.nydus/snapshots/118]"
time="2025-01-24T09:46:56.812157877Z" level=info msg="[Prepare] snapshot with key k8s.io/375/42315085e56ed5e4a4ed777945632f2ef47b970342d20f5243e70667505156af parent k8s.io/2/sha256:59b1469b8fbd05fd256959ad9d7d776b9937b848d75113a0d7c1af442528b6d0"
time="2025-01-24T09:46:56.815147374Z" level=info msg="[Prepare] snapshot with key k8s.io/376/extract-813035448-oyBS sha256:59b1469b8fbd05fd256959ad9d7d776b9937b848d75113a0d7c1af442528b6d0 parent "
time="2025-01-24T09:46:56.834437188Z" level=info msg="[Prepare] snapshot with key k8s.io/377/42315085e56ed5e4a4ed777945632f2ef47b970342d20f5243e70667505156af parent k8s.io/2/sha256:59b1469b8fbd05fd256959ad9d7d776b9937b848d75113a0d7c1af442528b6d0"
time="2025-01-24T09:46:56.993281261Z" level=info msg="[Remove] snapshot with key k8s.io/376/extract-813035448-oyBS sha256:59b1469b8fbd05fd256959ad9d7d776b9937b848d75113a0d7c1af442528b6d0 snapshot id 119"
time="2025-01-24T09:46:56.995527648Z" level=info msg="[Cleanup] orphan directories [/var/lib/containerd/io.containerd.snapshotter.v1.nydus/snapshots/119]"
time="2025-01-24T09:47:08.811163529Z" level=info msg="[Prepare] snapshot with key k8s.io/378/1cbdb22e5abf0fa0ca171f6e0850b625539f2e918066e26c64ad87e630f21de3 parent k8s.io/2/sha256:59b1469b8fbd05fd256959ad9d7d776b9937b848d75113a0d7c1af442528b6d0"
time="2025-01-24T09:47:08.814113414Z" level=info msg="[Prepare] snapshot with key k8s.io/379/extract-812123610-tvWP sha256:59b1469b8fbd05fd256959ad9d7d776b9937b848d75113a0d7c1af442528b6d0 parent "
time="2025-01-24T09:47:08.834408312Z" level=info msg="[Prepare] snapshot with key k8s.io/380/1cbdb22e5abf0fa0ca171f6e0850b625539f2e918066e26c64ad87e630f21de3 parent k8s.io/2/sha256:59b1469b8fbd05fd256959ad9d7d776b9937b848d75113a0d7c1af442528b6d0"
time="2025-01-24T09:47:09.028167475Z" level=info msg="[Remove] snapshot with key k8s.io/379/extract-812123610-tvWP sha256:59b1469b8fbd05fd256959ad9d7d776b9937b848d75113a0d7c1af442528b6d0 snapshot id 120"
time="2025-01-24T09:47:09.030426691Z" level=info msg="[Cleanup] orphan directories [/var/lib/containerd/io.containerd.snapshotter.v1.nydus/snapshots/120]"
time="2025-01-24T09:47:23.811284515Z" level=info msg="[Prepare] snapshot with key k8s.io/381/d466bf9dde970e69fd7b1640b09b7d15b06ee9b05a4f73ea46342c1d52f331c5 parent k8s.io/2/sha256:59b1469b8fbd05fd256959ad9d7d776b9937b848d75113a0d7c1af442528b6d0"
time="2025-01-24T09:47:23.814345420Z" level=info msg="[Prepare] snapshot with key k8s.io/382/extract-812193386-LCfj sha256:59b1469b8fbd05fd256959ad9d7d776b9937b848d75113a0d7c1af442528b6d0 parent "
time="2025-01-24T09:47:23.834230295Z" level=info msg="[Prepare] snapshot with key k8s.io/383/d466bf9dde970e69fd7b1640b09b7d15b06ee9b05a4f73ea46342c1d52f331c5 parent k8s.io/2/sha256:59b1469b8fbd05fd256959ad9d7d776b9937b848d75113a0d7c1af442528b6d0"
time="2025-01-24T09:47:24.008713931Z" level=info msg="[Remove] snapshot with key k8s.io/382/extract-812193386-LCfj sha256:59b1469b8fbd05fd256959ad9d7d776b9937b848d75113a0d7c1af442528b6d0 snapshot id 121"
time="2025-01-24T09:47:24.010941627Z" level=info msg="[Cleanup] orphan directories [/var/lib/containerd/io.containerd.snapshotter.v1.nydus/snapshots/121]"
time="2025-01-24T09:47:34.811666352Z" level=info msg="[Prepare] snapshot with key k8s.io/384/0a50efece4645cc43aa7ae83c8ba09563d8f5356144036b8a74ed9076a489986 parent k8s.io/2/sha256:59b1469b8fbd05fd256959ad9d7d776b9937b848d75113a0d7c1af442528b6d0"
time="2025-01-24T09:47:34.814905920Z" level=info msg="[Prepare] snapshot with key k8s.io/385/extract-812592774-vQCa sha256:59b1469b8fbd05fd256959ad9d7d776b9937b848d75113a0d7c1af442528b6d0 parent "
time="2025-01-24T09:47:34.833589920Z" level=info msg="[Prepare] snapshot with key k8s.io/386/0a50efece4645cc43aa7ae83c8ba09563d8f5356144036b8a74ed9076a489986 parent k8s.io/2/sha256:59b1469b8fbd05fd256959ad9d7d776b9937b848d75113a0d7c1af442528b6d0"
time="2025-01-24T09:47:35.058158214Z" level=info msg="[Remove] snapshot with key k8s.io/385/extract-812592774-vQCa sha256:59b1469b8fbd05fd256959ad9d7d776b9937b848d75113a0d7c1af442528b6d0 snapshot id 122"
time="2025-01-24T09:47:35.060440611Z" level=info msg="[Cleanup] orphan directories [/var/lib/containerd/io.containerd.snapshotter.v1.nydus/snapshots/122]"
time="2025-01-24T09:47:46.812017732Z" level=info msg="[Prepare] snapshot with key k8s.io/387/cadc3756bfe39a4fd35830673330ee096ebfe17d6a5b356438d31f8341d70e3c parent k8s.io/2/sha256:59b1469b8fbd05fd256959ad9d7d776b9937b848d75113a0d7c1af442528b6d0"
time="2025-01-24T09:47:46.815092248Z" level=info msg="[Prepare] snapshot with key k8s.io/388/extract-812957363-Fg2- sha256:59b1469b8fbd05fd256959ad9d7d776b9937b848d75113a0d7c1af442528b6d0 parent "
time="2025-01-24T09:47:46.834042520Z" level=info msg="[Prepare] snapshot with key k8s.io/389/cadc3756bfe39a4fd35830673330ee096ebfe17d6a5b356438d31f8341d70e3c parent k8s.io/2/sha256:59b1469b8fbd05fd256959ad9d7d776b9937b848d75113a0d7c1af442528b6d0"
time="2025-01-24T09:47:46.846268262Z" level=info msg="[Remove] snapshot with key k8s.io/388/extract-812957363-Fg2- sha256:59b1469b8fbd05fd256959ad9d7d776b9937b848d75113a0d7c1af442528b6d0 snapshot id 123"
time="2025-01-24T09:47:46.848442498Z" level=info msg="[Cleanup] orphan directories [/var/lib/containerd/io.containerd.snapshotter.v1.nydus/snapshots/123]"
time="2025-01-24T09:47:58.812678298Z" level=info msg="[Prepare] snapshot with key k8s.io/390/aae7b5ecd511421ea403db487d82b7c21dda5c6c783ac33f23d409c8066f331e parent k8s.io/2/sha256:59b1469b8fbd05fd256959ad9d7d776b9937b848d75113a0d7c1af442528b6d0"
time="2025-01-24T09:47:58.816168598Z" level=info msg="[Prepare] snapshot with key k8s.io/391/extract-813632928-fkW6 sha256:59b1469b8fbd05fd256959ad9d7d776b9937b848d75113a0d7c1af442528b6d0 parent "
time="2025-01-24T09:47:58.836446605Z" level=info msg="[Prepare] snapshot with key k8s.io/392/aae7b5ecd511421ea403db487d82b7c21dda5c6c783ac33f23d409c8066f331e parent k8s.io/2/sha256:59b1469b8fbd05fd256959ad9d7d776b9937b848d75113a0d7c1af442528b6d0"
time="2025-01-24T09:47:58.879471498Z" level=info msg="[Remove] snapshot with key k8s.io/391/extract-813632928-fkW6 sha256:59b1469b8fbd05fd256959ad9d7d776b9937b848d75113a0d7c1af442528b6d0 snapshot id 124"
time="2025-01-24T09:47:58.881755005Z" level=info msg="[Cleanup] orphan directories [/var/lib/containerd/io.containerd.snapshotter.v1.nydus/snapshots/124]"
time="2025-01-24T09:48:13.811502209Z" level=info msg="[Prepare] snapshot with key k8s.io/393/67cc11979dc9d3dbb68bc940aa4420d7ef9b6c50a28012121fcfe1c2a4586682 parent k8s.io/2/sha256:59b1469b8fbd05fd256959ad9d7d776b9937b848d75113a0d7c1af442528b6d0"
time="2025-01-24T09:48:13.814955669Z" level=info msg="[Prepare] snapshot with key k8s.io/394/extract-812388899-G2zt sha256:59b1469b8fbd05fd256959ad9d7d776b9937b848d75113a0d7c1af442528b6d0 parent "
time="2025-01-24T09:48:13.833826090Z" level=info msg="[Prepare] snapshot with key k8s.io/395/67cc11979dc9d3dbb68bc940aa4420d7ef9b6c50a28012121fcfe1c2a4586682 parent k8s.io/2/sha256:59b1469b8fbd05fd256959ad9d7d776b9937b848d75113a0d7c1af442528b6d0"
time="2025-01-24T09:48:13.856025089Z" level=info msg="[Remove] snapshot with key k8s.io/394/extract-812388899-G2zt sha256:59b1469b8fbd05fd256959ad9d7d776b9937b848d75113a0d7c1af442528b6d0 snapshot id 125"
time="2025-01-24T09:48:13.858698391Z" level=info msg="[Cleanup] orphan directories [/var/lib/containerd/io.containerd.snapshotter.v1.nydus/snapshots/125]"
time="2025-01-24T09:48:27.811594830Z" level=info msg="[Prepare] snapshot with key k8s.io/396/c76a9b1b9c86fe10695e2c43e3de8de91c6851a7b37006ac2d9fb4a95830090b parent k8s.io/2/sha256:59b1469b8fbd05fd256959ad9d7d776b9937b848d75113a0d7c1af442528b6d0"
time="2025-01-24T09:48:27.814771437Z" level=info msg="[Prepare] snapshot with key k8s.io/397/extract-812469430-7t9g sha256:59b1469b8fbd05fd256959ad9d7d776b9937b848d75113a0d7c1af442528b6d0 parent "
time="2025-01-24T09:48:27.833746538Z" level=info msg="[Prepare] snapshot with key k8s.io/398/c76a9b1b9c86fe10695e2c43e3de8de91c6851a7b37006ac2d9fb4a95830090b parent k8s.io/2/sha256:59b1469b8fbd05fd256959ad9d7d776b9937b848d75113a0d7c1af442528b6d0"
time="2025-01-24T09:48:27.854718143Z" level=info msg="[Remove] snapshot with key k8s.io/397/extract-812469430-7t9g sha256:59b1469b8fbd05fd256959ad9d7d776b9937b848d75113a0d7c1af442528b6d0 snapshot id 126"
time="2025-01-24T09:48:27.856937839Z" level=info msg="[Cleanup] orphan directories [/var/lib/containerd/io.containerd.snapshotter.v1.nydus/snapshots/126]"
time="2025-01-24T09:48:39.812052012Z" level=info msg="[Prepare] snapshot with key k8s.io/399/4e9dbf5602006d8e03540ea99ca7b56bec7370dd2d4dd00640810aec6d756c04 parent k8s.io/2/sha256:59b1469b8fbd05fd256959ad9d7d776b9937b848d75113a0d7c1af442528b6d0"
time="2025-01-24T09:48:39.815444291Z" level=info msg="[Prepare] snapshot with key k8s.io/400/extract-812986692-Wvwt sha256:59b1469b8fbd05fd256959ad9d7d776b9937b848d75113a0d7c1af442528b6d0 parent "
time="2025-01-24T09:48:39.834483134Z" level=info msg="[Prepare] snapshot with key k8s.io/401/4e9dbf5602006d8e03540ea99ca7b56bec7370dd2d4dd00640810aec6d756c04 parent k8s.io/2/sha256:59b1469b8fbd05fd256959ad9d7d776b9937b848d75113a0d7c1af442528b6d0"
time="2025-01-24T09:48:39.885569314Z" level=info msg="[Remove] snapshot with key k8s.io/400/extract-812986692-Wvwt sha256:59b1469b8fbd05fd256959ad9d7d776b9937b848d75113a0d7c1af442528b6d0 snapshot id 127"
time="2025-01-24T09:48:39.887816390Z" level=info msg="[Cleanup] orphan directories [/var/lib/containerd/io.containerd.snapshotter.v1.nydus/snapshots/127]"
time="2025-01-24T09:48:53.811849260Z" level=info msg="[Prepare] snapshot with key k8s.io/402/a5475c3aede72cd5e79a0cf1bcdce2a62fa4ceb5858b353ecde55f47ec2a03c9 parent k8s.io/2/sha256:59b1469b8fbd05fd256959ad9d7d776b9937b848d75113a0d7c1af442528b6d0"
time="2025-01-24T09:48:53.814902266Z" level=info msg="[Prepare] snapshot with key k8s.io/403/extract-812773851-3Esx sha256:59b1469b8fbd05fd256959ad9d7d776b9937b848d75113a0d7c1af442528b6d0 parent "
time="2025-01-24T09:48:53.835231474Z" level=info msg="[Prepare] snapshot with key k8s.io/404/a5475c3aede72cd5e79a0cf1bcdce2a62fa4ceb5858b353ecde55f47ec2a03c9 parent k8s.io/2/sha256:59b1469b8fbd05fd256959ad9d7d776b9937b848d75113a0d7c1af442528b6d0"
time="2025-01-24T09:48:53.884363741Z" level=info msg="[Remove] snapshot with key k8s.io/403/extract-812773851-3Esx sha256:59b1469b8fbd05fd256959ad9d7d776b9937b848d75113a0d7c1af442528b6d0 snapshot id 128"
time="2025-01-24T09:48:53.886705889Z" level=info msg="[Cleanup] orphan directories [/var/lib/containerd/io.containerd.snapshotter.v1.nydus/snapshots/128]"
time="2025-01-24T09:49:06.812344184Z" level=info msg="[Prepare] snapshot with key k8s.io/405/57215e92e3d91d4b7e29f60cc35e6b4cb12d65f6b88ccdde121ec7f2b431c6ec parent k8s.io/2/sha256:59b1469b8fbd05fd256959ad9d7d776b9937b848d75113a0d7c1af442528b6d0"
time="2025-01-24T09:49:06.815462821Z" level=info msg="[Prepare] snapshot with key k8s.io/406/extract-813228994-EM6J sha256:59b1469b8fbd05fd256959ad9d7d776b9937b848d75113a0d7c1af442528b6d0 parent "
time="2025-01-24T09:49:06.834732417Z" level=info msg="[Prepare] snapshot with key k8s.io/407/57215e92e3d91d4b7e29f60cc35e6b4cb12d65f6b88ccdde121ec7f2b431c6ec parent k8s.io/2/sha256:59b1469b8fbd05fd256959ad9d7d776b9937b848d75113a0d7c1af442528b6d0"
time="2025-01-24T09:49:06.904757299Z" level=info msg="[Remove] snapshot with key k8s.io/406/extract-813228994-EM6J sha256:59b1469b8fbd05fd256959ad9d7d776b9937b848d75113a0d7c1af442528b6d0 snapshot id 129"
time="2025-01-24T09:49:06.906941455Z" level=info msg="[Cleanup] orphan directories [/var/lib/containerd/io.containerd.snapshotter.v1.nydus/snapshots/129]"
time="2025-01-24T09:49:21.811951119Z" level=info msg="[Prepare] snapshot with key k8s.io/408/8bd996230601778d539b3d749a12a68c0771eed625b96c19226e7aadb7e4d626 parent k8s.io/2/sha256:59b1469b8fbd05fd256959ad9d7d776b9937b848d75113a0d7c1af442528b6d0"
time="2025-01-24T09:49:21.815206627Z" level=info msg="[Prepare] snapshot with key k8s.io/409/extract-812893860-jBYv sha256:59b1469b8fbd05fd256959ad9d7d776b9937b848d75113a0d7c1af442528b6d0 parent "
time="2025-01-24T09:49:21.834531674Z" level=info msg="[Prepare] snapshot with key k8s.io/410/8bd996230601778d539b3d749a12a68c0771eed625b96c19226e7aadb7e4d626 parent k8s.io/2/sha256:59b1469b8fbd05fd256959ad9d7d776b9937b848d75113a0d7c1af442528b6d0"
time="2025-01-24T09:49:21.886791937Z" level=info msg="[Remove] snapshot with key k8s.io/409/extract-812893860-jBYv sha256:59b1469b8fbd05fd256959ad9d7d776b9937b848d75113a0d7c1af442528b6d0 snapshot id 130"
time="2025-01-24T09:49:21.889009743Z" level=info msg="[Cleanup] orphan directories [/var/lib/containerd/io.containerd.snapshotter.v1.nydus/snapshots/130]"
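
To make the repeating pattern in this log easier to see, here is a minimal Python sketch that counts `[Prepare]` calls per parent snapshot key. The regular expression is inferred from the log lines above and is an assumption about the message format, not something taken from the nydus-snapshotter source. `SAMPLE` holds three lines copied from the log.

```python
import re
from collections import Counter

# Pattern inferred from the "[Prepare] snapshot with key <key> parent <parent>"
# messages in the log above (an assumption, not the snapshotter's own format spec).
PREPARE_RE = re.compile(
    r'\[Prepare\] snapshot with key (?P<key>.+?) parent (?P<parent>[^"]*)"'
)


def count_prepares_by_parent(lines):
    """Return a Counter mapping parent snapshot key -> number of Prepare calls."""
    counts = Counter()
    for line in lines:
        match = PREPARE_RE.search(line)
        # The "extract-..." snapshots are logged with an empty parent; skip them.
        if match and match.group("parent"):
            counts[match.group("parent")] += 1
    return counts


# Three lines copied verbatim from the log excerpt above.
SAMPLE = [
    'time="2025-01-24T09:46:19.814709359Z" level=info msg="[Prepare] snapshot with key k8s.io/367/extract-812295082-7xm9 sha256:59b1469b8fbd05fd256959ad9d7d776b9937b848d75113a0d7c1af442528b6d0 parent "',
    'time="2025-01-24T09:46:19.833155487Z" level=info msg="[Prepare] snapshot with key k8s.io/368/a03cf4b0c0bb92eb32e7afe6fdda203b68b3677ae31bcc953aace2bb4c7b7bab parent k8s.io/2/sha256:59b1469b8fbd05fd256959ad9d7d776b9937b848d75113a0d7c1af442528b6d0"',
    'time="2025-01-24T09:46:31.811537164Z" level=info msg="[Prepare] snapshot with key k8s.io/369/80115bbf0c0d8571eb093c20ac331d573ce1b03f9b106b368a39db8b6f8b2da5 parent k8s.io/2/sha256:59b1469b8fbd05fd256959ad9d7d776b9937b848d75113a0d7c1af442528b6d0"',
]

for parent, count in count_prepares_by_parent(SAMPLE).items():
    print(f"{count} Prepare call(s) with parent {parent}")
```

Run over the full log, every non-empty parent is the same `k8s.io/2/sha256:59b1...` key, which is the key named as the missing parent in the pod error.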


gane5hvarma · 2025-01-24T12:57:13Z

I'm using the AL2023 AMI, which uses a local pause image as the sandbox image (localhost/kubernetes/pause).

imeoer · 2025-01-26T02:07:56Z

Is this reliably reproducible? Did it happen when the snapshotter was restarted before a containerd snapshot request had completed?

gane5hvarma · 2025-01-30T12:18:43Z

@imeoer Yes. This is my setup: EKS with worker nodes running the Amazon Linux 2023 AMI, where the pause image is local:

[plugins."io.containerd.grpc.v1.cri"]
sandbox_image = "localhost/kubernetes/pause"

I installed nydus-snapshotter using this YAML. The nydus snapshotter pod runs, and the custom nginx image (a nydus image) that I deployed also runs.
Now I deliberately kill the nydus snapshotter pod to simulate restarts, which can happen on a version update or a change in pod labels.
The nydus snapshotter pod gets stuck in the Terminating state. Describing the pod shows these events:

 error killing pod: [failed to "KillContainer" for "nydus-snapshotter" with KillContainerError: "rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial unix /run/containerd/containerd.sock: connect: no such file or directory\"", failed to "KillPodSandbox" for "e9f13d4e-07d8-45d3-9e31-5068d37c9fcc" with KillPodSandboxError: "rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial unix /run/containerd/containerd.sock: connect: no such file or directory\""]
  Normal   Killing            18m (x8 over 19m)  kubelet            Stopping container nydus-snapshotter
  Warning  FailedKillPod      13m                kubelet            error killing pod: [failed to "KillContainer" for "nydus-snapshotter" with KillContainerError: "rpc error: code = DeadlineExceeded desc = an error occurs during waiting for container \"9bf467f00e98faf0b0d3ddadccffe9a3055d9e67dbe552d2a003c688d73b956f\" to be killed: wait container \"9bf467f00e98faf0b0d3ddadccffe9a3055d9e67dbe552d2a003c688d73b956f\": context deadline exceeded", failed to "KillPodSandbox" for "e9f13d4e-07d8-45d3-9e31-5068d37c9fcc" with KillPodSandboxError: "rpc error: code = DeadlineExceeded desc = failed to stop container \"9bf467f00e98faf0b0d3ddadccffe9a3055d9e67dbe552d2a003c688d73b956f\": an error occurs during waiting for container \"9bf467f00e98faf0b0d3ddadccffe9a3055d9e67dbe552d2a003c688d73b956f\" to be killed: wait container \"9bf467f00e98faf0b0d3ddadccffe9a3055d9e67dbe552d2a003c688d73b956f\": context deadline exceeded"]
  Warning  FailedKillPod      9m3s               kubelet            error killing pod: [failed to "KillContainer" for "nydus-snapshotter" with KillContainerError: "rpc error: code = DeadlineExceeded desc = context deadline exceeded", failed to "KillPodSandbox" for "e9f13d4e-07d8-45d3-9e31-5068d37c9fcc" with KillPodSandboxError: "rpc error: code = DeadlineExceeded desc = context deadline exceeded"]
  Warning  FailedKillPod      4m18s              kubelet            error killing pod: [failed to "KillContainer" for "nydus-snapshotter" with KillContainerError: "rpc error: code = DeadlineExceeded desc = an error occurs during waiting for container \"9bf467f00e98faf0b0d3ddadccffe9a3055d9e67dbe552d2a003c688d73b956f\" to be killed: wait container \"9bf467f00e98faf0b0d3ddadccffe9a3055d9e67dbe552d2a003c688d73b956f\": context deadline exceeded", failed to "KillPodSandbox" for "e9f13d4e-07d8-45d3-9e31-5068d37c9fcc" with KillPodSandboxError: "rpc error: code = DeadlineExceeded desc = context deadline exceeded"]

I force-deleted the nydus snapshotter pod, and now the new pod is stuck in the ContainerCreating state with this event:

Normal   Scheduled               12s   default-scheduler  Successfully assigned nydus-snapshotter/nydus-snapshotter-rpdtg to ip-172-31-93-33.ec2.internal
  Warning  FailedCreatePodSandBox  11s   kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to get sandbox image "localhost/kubernetes/pause": failed to pull image "localhost/kubernetes/pause": failed to pull and unpack image "localhost/kubernetes/pause:latest": failed to resolve reference "localhost/kubernetes/pause:latest": failed to do request: Head "https://localhost/v2/kubernetes/pause/manifests/latest": dial tcp 127.0.0.1:443: connect: connection refused
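
The `Head` request in this event follows from how the reference `localhost/kubernetes/pause` is read once the image is no longer present locally: the first path component is treated as a registry host and the tag defaults to `latest`. Here is a minimal Python sketch of that interpretation. It is a simplification of the usual Docker-style reference rules, not containerd's actual code, and `parse_reference` and `manifest_url` are hypothetical helper names.

```python
def parse_reference(ref):
    """Split an image reference into (domain, repository, tag).

    Simplified sketch: digests (@sha256:...) and Docker Hub specifics
    (the "library/" prefix, the real registry host) are not handled.
    """
    first, sep, rest = ref.partition("/")
    # The first component counts as a registry host only if it looks like one.
    if sep and ("." in first or ":" in first or first == "localhost"):
        domain, remainder = first, rest
    else:
        domain, remainder = "docker.io", ref
    # A tag is whatever follows a ':' that comes after the last '/'.
    name, colon, tag = remainder.rpartition(":")
    if not colon or "/" in tag:
        name, tag = remainder, "latest"
    return domain, name, tag


def manifest_url(ref):
    """Build the registry v2 manifest URL a resolver would query for `ref`."""
    domain, name, tag = parse_reference(ref)
    return f"https://{domain}/v2/{name}/manifests/{tag}"


print(manifest_url("localhost/kubernetes/pause"))
```

Nothing is listening on port 443 of the node, so once the locally loaded pause image is gone, the resolve step can only fail with "connection refused".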

After some debugging, I found that this function removes the sandbox/pause image when the nydus snapshotter pod is deleted.
May I know why images associated with nydus are removed in the cleanup function?

csegarragonz · 2025-02-03T11:51:50Z

Hi @imeoer, I may be running into the same issue as @gane5hvarma, and I may be able to add a bit more detail.

My set-up is a bit different: I am running the nydus-snapshotter as a system daemon in proxy mode. After I have successfully run a given pod with an image once, if I then restart the nydus snapshotter service, I see the following logs:

level=info msg="Start nydus-snapshotter. Version: v0.15.0-2-gfbf6bb5.m, PID: 3957824, FsDriver: proxy, DaemonMode: multiple"
level=info msg="Run daemons monitor..."
level=debug msg="found RAFS instance &rafs.Rafs{Seq:0x1, ImageID:\"registry.k8s.io/pause:3.8\", DaemonID:\"\", FsDriver:\"proxy\", SnapshotID:\"1\", SnapshotDir:\"/var/lib/containerd-nydus/snapshots/1\", Mountpoint:\"/var/lib/containerd-nydus/snapshots/1/fs\", Annotations:map[string]string{\"containerd.io/snapshot/cri.layer-digest\":\"sha256:9457426d68990df190301d2e20b8450c4f67d7559bdb7ded6c40d41ced6731f7\", \"containerd.io/snapshot/nydus-proxy-mode\":\"true\"}}"
level=debug msg="found RAFS instance &rafs.Rafs{Seq:0x2, ImageID:\"sc2cr.io/applications/helloworld-py:unencrypted\", DaemonID:\"\", FsDriver:\"proxy\", SnapshotID:\"9\", SnapshotDir:\"/var/lib/containerd-nydus/snapshots/9\", Mountpoint:\"/var/lib/containerd-nydus/snapshots/9/fs\", Annotations:map[string]string{\"containerd.io/snapshot/cri.layer-digest\":\"sha256:0cdd502a029d8cec8a14360704355f186ed184edfd030fa6aa92c35b0acb1973\", \"containerd.io/snapshot/nydus-proxy-mode\":\"true\"}}"
level=info msg="[Prepare] snapshot with key k8s.io/69/35b8a538aa3dfd7efb5007cccc07783d8c9b681d678b3d43460ce8ee9e36fbc4 parent sha256:961e93cda9dd918dbe26aca24cccd6c5db05176850d2c53476d881df5d0d4816"
level=debug msg="[Prepare] snapshot with labels map[]" key=k8s.io/69/35b8a538aa3dfd7efb5007cccc07783d8c9b681d678b3d43460ce8ee9e36fbc4 parent="sha256:961e93cda9dd918dbe26aca24cccd6c5db05176850d2c53476d881df5d0d4816"
level=debug msg="isProxyDriver = true, isProxyLabel = true, isProxyImage = true"
level=info msg="Prepare active snapshot k8s.io/69/35b8a538aa3dfd7efb5007cccc07783d8c9b681d678b3d43460ce8ee9e36fbc4 in proxy mode" key=k8s.io/69/35b8a538aa3dfd7efb5007cccc07783d8c9b681d678b3d43460ce8ee9e36fbc4 parent="sha256:961e93cda9dd918dbe26aca24cccd6c5db05176850d2c53476d881df5d0d4816"
level=debug msg="Prepare remote snapshot 1" key=k8s.io/69/35b8a538aa3dfd7efb5007cccc07783d8c9b681d678b3d43460ce8ee9e36fbc4 parent="sha256:961e93cda9dd918dbe26aca24cccd6c5db05176850d2c53476d881df5d0d4816"
level=info msg="WARN: creating rafs instance: snapid 1 - imageid registry.k8s.io/pause:3.8 - fsdriver proxy"
level=info msg="WARN: adding rafs instance with id: 1 (instance: &{4 registry.k8s.io/pause:3.8  proxy 1 /var/lib/containerd-nydus/snapshots/1 /var/lib/containerd-nydus/snapshots/1/fs map[containerd.io/snapshot/cri.layer-digest:sha256:9457426d68990df190301d2e20b8450c4f67d7559bdb7ded6c40d41ced6731f7 containerd.io/snapshot/nydus-proxy-mode:true]})"

I have added a WARN log where I think the error is happening. In summary, upon restart the snapshotter identifies that the two RAFS instances from the previous execution exist, but it does not add them to any daemon (I guess because no daemon runs in proxy mode?): https://github.com/containerd/nydus-snapshotter/blob/main/pkg/manager/manager.go#L161-L173

However, when I try to create the pod again, nydus will try to put an instance with id 1 in the database:
https://github.com/containerd/nydus-snapshotter/blob/main/pkg/store/database.go#L282-L288

This will trigger an error because instance 1 already exists. In containerd logs the only thing we can see is:

level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:coco-helloworld-py-8b86b5c58-db9pq,Uid:5c0ca2d9-99f7-4eed-815a-48bfd77e17a3,Namespace:default,Attempt:0,} failed, error" error="failed to create containerd container: create instance 1: object with key \"1\" already exists: unknown"
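
To make the suspected sequence concrete, here is a small Python model of the hypothesis described above. It is a sketch under stated assumptions, not the nydus-snapshotter implementation: `InstanceStore`, `Snapshotter`, `recover`, and `prepare` are hypothetical names, and the rule that only non-proxy drivers re-attach recovered instances is my reading of the behaviour, not something confirmed from the code.

```python
class AlreadyExistsError(Exception):
    pass


class InstanceStore:
    """Hypothetical stand-in for the persistent instance database."""

    def __init__(self):
        self._records = {}

    def add(self, snapshot_id, record):
        # Insert-only semantics: an existing key is an error, as in the log above.
        if snapshot_id in self._records:
            raise AlreadyExistsError(f'object with key "{snapshot_id}" already exists')
        self._records[snapshot_id] = record

    def items(self):
        return list(self._records.items())


class Snapshotter:
    """Hypothetical model: `store` survives restarts, `active` is in-memory only."""

    def __init__(self, store, fs_driver):
        self.store = store
        self.fs_driver = fs_driver
        self.active = {}

    def recover(self):
        for snapshot_id, record in self.store.items():
            # Assumption: recovered instances are only re-attached through a
            # daemon, and proxy mode has no daemon to attach them to.
            if self.fs_driver != "proxy":
                self.active[snapshot_id] = record

    def prepare(self, snapshot_id, image_id):
        if snapshot_id in self.active:
            return self.active[snapshot_id]
        record = {"image": image_id}
        self.store.add(snapshot_id, record)  # fails if the key survived a restart
        self.active[snapshot_id] = record
        return record


store = InstanceStore()
Snapshotter(store, "proxy").prepare("1", "registry.k8s.io/pause:3.8")  # first run

restarted = Snapshotter(store, "proxy")  # simulated service restart
restarted.recover()
try:
    restarted.prepare("1", "registry.k8s.io/pause:3.8")
    outcome = "ok"
except AlreadyExistsError as err:
    outcome = str(err)
print(outcome)
```

In this model the failure disappears if `recover` also repopulates `active` for proxy-mode instances, or if `prepare` tolerates an existing record. Whether either matches the right fix in the real code is an open question.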

For completeness, the exact same set-up works well when using the snapshotter in blockdev mode. There, the recovery mechanism in the daemon manager linked above does recover the RAFS instances, so there is no issue.

How can we address this issue? Many thanks!

csegarragonz mentioned this issue Feb 3, 2025

nydus: support host-sharing sc2-sys/deploy#131

Open
