Can't restart stargz daemon #703

Closed
gabrieldemarmiesse opened this issue Mar 24, 2022 · 3 comments

gabrieldemarmiesse commented Mar 24, 2022

I'm using ECR with nerdctl. ECR credentials expire after 12 hours, so that might affect stargz after a reboot.

containerd-stargz-grpc
{"level":"info","msg":"preparing filesystem mount at mountpoint=/var/lib/containerd-stargz-grpc/snapshotter/snapshots/100/fs","time":"2022-03-24T15:13:18.203314400+01:00"}
{"level":"info","mountpoint":"/var/lib/containerd-stargz-grpc/snapshotter/snapshots/100/fs","msg":"Received status code: 401 Unauthorized. Refreshing creds...","src":"766281746212.dkr.ecr.eu-west-1.amazonaws.com/skynet-common:py3.7_cuda10.0-runtime_slim_data.infra-5.1.1-stargz/sha256:02fd2ee80f3b56e7abcb1182d5f974c2b172dc50922afb562d1d02623b977dbc","time":"2022-03-24T15:13:18.339802600+01:00"}
{"level":"info","mountpoint":"/var/lib/containerd-stargz-grpc/snapshotter/snapshots/100/fs","msg":"Received status code: 401 Unauthorized. Refreshing creds...","src":"766281746212.dkr.ecr.eu-west-1.amazonaws.com/skynet-common:py3.7_cuda10.0-runtime_slim_data.infra-5.1.1-stargz/sha256:9d70dc121cb417e8290fb15560f121dc4516d6d6e9d0fb39c79ec2b07b9fa75e","time":"2022-03-24T15:13:18.341141700+01:00"}
{"level":"info","mountpoint":"/var/lib/containerd-stargz-grpc/snapshotter/snapshots/100/fs","msg":"Received status code: 401 Unauthorized. Refreshing creds...","src":"766281746212.dkr.ecr.eu-west-1.amazonaws.com/skynet-common:py3.7_cuda10.0-runtime_slim_data.infra-5.1.1-stargz/sha256:cae7954bf529dec178c799017eb9c99dcb567edfa50f61258dd5dd3eb659c517","time":"2022-03-24T15:13:18.342419900+01:00"}
{"level":"info","mountpoint":"/var/lib/containerd-stargz-grpc/snapshotter/snapshots/100/fs","msg":"Received status code: 401 Unauthorized. Refreshing creds...","src":"766281746212.dkr.ecr.eu-west-1.amazonaws.com/skynet-common:py3.7_cuda10.0-runtime_slim_data.infra-5.1.1-stargz/sha256:3e2959d4f10c568bcc42c73829bc9a258dc83ce44f05afc6a206e622464c8e28","time":"2022-03-24T15:13:18.343577600+01:00"}
{"level":"info","mountpoint":"/var/lib/containerd-stargz-grpc/snapshotter/snapshots/100/fs","msg":"Received status code: 401 Unauthorized. Refreshing creds...","src":"766281746212.dkr.ecr.eu-west-1.amazonaws.com/skynet-common:py3.7_cuda10.0-runtime_slim_data.infra-5.1.1-stargz/sha256:f227544ade16162b560147b4edfc040c09e6a94a71127ed13feb9f8f11b10e5b","time":"2022-03-24T15:13:18.343604700+01:00"}
{"level":"info","mountpoint":"/var/lib/containerd-stargz-grpc/snapshotter/snapshots/100/fs","msg":"Received status code: 401 Unauthorized. Refreshing creds...","src":"766281746212.dkr.ecr.eu-west-1.amazonaws.com/skynet-common:py3.7_cuda10.0-runtime_slim_data.infra-5.1.1-stargz/sha256:a9549a4bc22afd8e24a0ab1297b88401cd4986ff9e550274e806fcf19d0684b1","time":"2022-03-24T15:13:18.343610800+01:00"}
{"level":"info","mountpoint":"/var/lib/containerd-stargz-grpc/snapshotter/snapshots/100/fs","msg":"Received status code: 401 Unauthorized. Refreshing creds...","src":"766281746212.dkr.ecr.eu-west-1.amazonaws.com/skynet-common:py3.7_cuda10.0-runtime_slim_data.infra-5.1.1-stargz/sha256:257bb287239281363e2163f5ac689aa3cec210718d9fd75aa2a23a0b5031196a","time":"2022-03-24T15:13:18.344929100+01:00"}
{"level":"info","mountpoint":"/var/lib/containerd-stargz-grpc/snapshotter/snapshots/100/fs","msg":"Received status code: 401 Unauthorized. Refreshing creds...","src":"766281746212.dkr.ecr.eu-west-1.amazonaws.com/skynet-common:py3.7_cuda10.0-runtime_slim_data.infra-5.1.1-stargz/sha256:1e0ef5dd7e9a089d361e79b648ccac5f71961f45f67712b6d2c95e01997597c4","time":"2022-03-24T15:13:18.344961400+01:00"}
{"level":"info","mountpoint":"/var/lib/containerd-stargz-grpc/snapshotter/snapshots/100/fs","msg":"Received status code: 401 Unauthorized. Refreshing creds...","src":"766281746212.dkr.ecr.eu-west-1.amazonaws.com/skynet-common:py3.7_cuda10.0-runtime_slim_data.infra-5.1.1-stargz/sha256:f11dcff0b022bc7ee1ae9f4d3c936671cbd27b3c698641bc8bf596c8ad1b3d54","time":"2022-03-24T15:13:18.345012600+01:00"}
{"level":"info","mountpoint":"/var/lib/containerd-stargz-grpc/snapshotter/snapshots/100/fs","msg":"Received status code: 401 Unauthorized. Refreshing creds...","src":"766281746212.dkr.ecr.eu-west-1.amazonaws.com/skynet-common:py3.7_cuda10.0-runtime_slim_data.infra-5.1.1-stargz/sha256:d3c6665a16bd2e3cbc917bb6e08639518f7c445e7d067003c3e6d6b8ea42b989","time":"2022-03-24T15:13:18.344938500+01:00"}
{"level":"info","mountpoint":"/var/lib/containerd-stargz-grpc/snapshotter/snapshots/100/fs","msg":"Received status code: 401 Unauthorized. Refreshing creds...","src":"766281746212.dkr.ecr.eu-west-1.amazonaws.com/skynet-common:py3.7_cuda10.0-runtime_slim_data.infra-5.1.1-stargz/sha256:fbaf50b7d1e6265d4c2e4245ae8d7602aad4d8b5e98563465bc811e646dab314","time":"2022-03-24T15:13:18.344963100+01:00"}
{"level":"info","mountpoint":"/var/lib/containerd-stargz-grpc/snapshotter/snapshots/100/fs","msg":"Received status code: 401 Unauthorized. Refreshing creds...","src":"766281746212.dkr.ecr.eu-west-1.amazonaws.com/skynet-common:py3.7_cuda10.0-runtime_slim_data.infra-5.1.1-stargz/sha256:528161e3ae0ec4ea9bfff1296dedc3bfd248661bb51e5fb9d738813a4b2743fd","time":"2022-03-24T15:13:18.344951100+01:00"}
{"layer_sha":"sha256:f11dcff0b022bc7ee1ae9f4d3c936671cbd27b3c698641bc8bf596c8ad1b3d54","level":"info","metrics":"latency","msg":"value=0.0075 milliseconds; prefetch_size=0 bytes","operation":"prefetch_total","time":"2022-03-24T15:13:18.755690000+01:00"}
{"layer_sha":"sha256:02fd2ee80f3b56e7abcb1182d5f974c2b172dc50922afb562d1d02623b977dbc","level":"info","metrics":"latency","msg":"value=0.0058 milliseconds; prefetch_size=0 bytes","operation":"prefetch_total","time":"2022-03-24T15:13:18.756755700+01:00"}
{"layer_sha":"sha256:9d70dc121cb417e8290fb15560f121dc4516d6d6e9d0fb39c79ec2b07b9fa75e","level":"info","metrics":"latency","msg":"value=0.0032 milliseconds; prefetch_size=0 bytes","operation":"prefetch_total","time":"2022-03-24T15:13:18.758321200+01:00"}
{"layer_sha":"sha256:a9549a4bc22afd8e24a0ab1297b88401cd4986ff9e550274e806fcf19d0684b1","level":"info","metrics":"latency","msg":"value=0.0045 milliseconds; prefetch_size=0 bytes","operation":"prefetch_total","time":"2022-03-24T15:13:18.769646700+01:00"}
{"layer_sha":"sha256:cae7954bf529dec178c799017eb9c99dcb567edfa50f61258dd5dd3eb659c517","level":"info","metrics":"latency","msg":"value=0.0058 milliseconds; prefetch_size=0 bytes","operation":"prefetch_total","time":"2022-03-24T15:13:18.779322900+01:00"}
{"layer_sha":"sha256:f227544ade16162b560147b4edfc040c09e6a94a71127ed13feb9f8f11b10e5b","level":"info","metrics":"latency","msg":"value=0.0035 milliseconds; prefetch_size=0 bytes","operation":"prefetch_total","time":"2022-03-24T15:13:18.779387100+01:00"}
{"layer_sha":"sha256:fbaf50b7d1e6265d4c2e4245ae8d7602aad4d8b5e98563465bc811e646dab314","level":"info","metrics":"latency","msg":"value=0.036 milliseconds; prefetch_size=0 bytes","operation":"prefetch_total","time":"2022-03-24T15:13:18.792979800+01:00"}
{"layer_sha":"sha256:528161e3ae0ec4ea9bfff1296dedc3bfd248661bb51e5fb9d738813a4b2743fd","level":"info","metrics":"latency","msg":"value=0.0471 milliseconds; prefetch_size=0 bytes","operation":"prefetch_total","time":"2022-03-24T15:13:18.952146200+01:00"}
{"layer_sha":"sha256:3e2959d4f10c568bcc42c73829bc9a258dc83ce44f05afc6a206e622464c8e28","level":"info","metrics":"latency","msg":"value=0.0211 milliseconds; prefetch_size=0 bytes","operation":"prefetch_total","time":"2022-03-24T15:13:19.159834500+01:00"}
{"layer_sha":"sha256:d3c6665a16bd2e3cbc917bb6e08639518f7c445e7d067003c3e6d6b8ea42b989","level":"info","metrics":"latency","msg":"value=0.011 milliseconds; prefetch_size=0 bytes","operation":"prefetch_total","time":"2022-03-24T15:13:19.175182400+01:00"}
/usr/bin/fusermount: bad mount point /var/lib/containerd-stargz-grpc/snapshotter/snapshots/100/fs: No such file or directory
{"error":"failed to restore remote snapshot: failed to prepare remote snapshot: sha256:0277586186ee6a928839ff6de1ffb16e55dda71788be12a799d9c9ff118544a8: fusermount exited with code 256\n","level":"fatal","msg":"failed to create new snapshotter","time":"2022-03-24T15:13:19.189670300+01:00"}
# containerd-stargz-grpc --version
containerd-stargz-grpc v0.11.1 2f3aa34ecb5555db37074ad22c47ef2d456de210
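
To pick the fatal entry out of a log like the one above, a small filter can help. This is only a minimal sketch: it assumes the output was saved with one JSON object per line and that non-JSON lines (such as the fusermount message) can simply be skipped. `fatal_entries` is a hypothetical helper name, not part of any stargz tooling.

```python
import json
from typing import Iterable, Iterator


def fatal_entries(lines: Iterable[str]) -> Iterator[dict]:
    """Yield parsed JSON log entries whose "level" field is "fatal"."""
    for line in lines:
        line = line.strip()
        # Plain-text lines (e.g. fusermount output) are not JSON; skip them.
        if not line.startswith("{"):
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue
        if entry.get("level") == "fatal":
            yield entry
```

For example, `[e["error"] for e in fatal_entries(open("stargz.log"))]` would list the error strings, where `stargz.log` is an assumed file name.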

EDIT: It might have something to do with #314 but I'm not sure.

ktock (Member) commented Mar 25, 2022

@gabrieldemarmiesse Do you use SIGINT to stop the snapshotter? If so, please try SIGTERM instead.
Also, as of now, you need to re-run the containers too.

If you use SIGINT, the snapshotter performs a graceful cleanup that unmounts and deletes all snapshots under /var/lib/containerd-stargz-grpc.
This is useful for making sure the node is fully cleaned up but, as of now, restoring snapshots isn't supported after SIGINT.
If you use another signal (e.g. SIGTERM) to stop the snapshotter, it doesn't perform this cleanup, and the restore will hopefully work.
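
As an illustration only (this is not the snapshotter's actual code), the difference between the two signals can be modeled roughly as follows. `ToySnapshotter`, its directory layout, and `demo` are hypothetical names made up for this sketch; a real daemon would also unmount the FUSE mounts before deleting anything.

```python
import shutil
import signal
import tempfile
from pathlib import Path


class ToySnapshotter:
    """Toy stand-in for a snapshotter that keeps state under a root directory."""

    def __init__(self, root: Path) -> None:
        self.root = root
        # Pretend one snapshot lives here.
        (self.root / "snapshots" / "100" / "fs").mkdir(parents=True, exist_ok=True)

    def stop(self, signum: int) -> None:
        """Stop the toy daemon; only SIGINT triggers the full cleanup."""
        if signum == signal.SIGINT:
            # Graceful cleanup: remove all snapshot state.
            shutil.rmtree(self.root / "snapshots", ignore_errors=True)
        # Any other signal (e.g. SIGTERM): state stays on disk for a later restore.

    def has_state(self) -> bool:
        return (self.root / "snapshots").exists()


def demo() -> tuple[bool, bool]:
    """Return (state kept after SIGTERM, state kept after SIGINT)."""
    with tempfile.TemporaryDirectory() as tmp:
        kept = ToySnapshotter(Path(tmp) / "a")
        kept.stop(signal.SIGTERM)
        cleaned = ToySnapshotter(Path(tmp) / "b")
        cleaned.stop(signal.SIGINT)
        return kept.has_state(), cleaned.has_state()
```

In practice, Ctrl-C in a terminal sends SIGINT, while something like `kill -TERM <pid>` sends SIGTERM.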

ktock (Member) commented Apr 4, 2022

@gabrieldemarmiesse Is this still an issue? Could you try #703 (comment)?

gabrieldemarmiesse (Author) commented

Oops, sorry, I totally forgot to answer. The bug was triggered because I rebooted my PC. I can confirm that with Ctrl-C (SIGINT), the cleanup is done and there is no issue. We can close this :)
