fs: Check connection only when image isn't fully cached #1584
When the layer is fully cached on the node, no registry connection happens, so we can skip the connection check.
Signed-off-by: Kohei Tokunaga <ktokunaga.mail@gmail.com>
Force-pushed from 9067701 to 86b107c.
Is any release planned that would include the changes in this PR?
Hello @ktock,
Will you please confirm that this fix also applies for these 2 scenarios when
Once this fix is released, will the new pods with stargz images already on the nodes (pulled before
Scenario 1 > EKS > Stargz image is pulled from private registry with creds that do not expire
Note that
systemctl restart stargz-snapshotter
Job for stargz-snapshotter.service failed because the control process exited with error code. See "systemctl status stargz-snapshotter.service" and "journalctl -xe" for details.
journalctl -u stargz-snapshotter -l
Apr 23 23:56:15 ip-10-0-2-130.ec2.internal containerd-stargz-grpc[14958]: {"level":"debug","msg":"Waiting for CRI service is started...","time":"2024-04-23T23:56:15.194866142Z"}
Apr 23 23:56:15 ip-10-0-2-130.ec2.internal containerd-stargz-grpc[14958]: {"level":"info","msg":"connected to backend CRI service","time":"2024-04-23T23:56:15.195422179Z"}
Apr 23 23:56:15 ip-10-0-2-130.ec2.internal containerd-stargz-grpc[14958]: {"level":"info","msg":"preparing filesystem mount at mountpoint=/var/lib/containerd-stargz-grpc/snapshotter/snapshots/119/fs","time":"2024-04-23T23:56:15.196665212Z"}
Apr 23 23:56:15 ip-10-0-2-130.ec2.internal containerd-stargz-grpc[14958]: {"level":"debug","mountpoint":"/var/lib/containerd-stargz-grpc/snapshotter/snapshots/119/fs","msg":"resolving","src":"private-regsitry/estargz:1/sha256:1b82fbeab8a04e8548e0708cdbf2ddc35edd3a5aff4ab77a161765d8935bca5a","time":"2024-04-23T23:56:15.196812462Z"}
Apr 23 23:56:15 ip-10-0-2-130.ec2.internal containerd-stargz-grpc[14958]: {"digest":"sha256:1b82fbeab8a04e8548e0708cdbf2ddc35edd3a5aff4ab77a161765d8935bca5a","error":null,"level":"debug","mountpoint":"/var/lib/containerd-stargz-grpc/snapshotter/snapshots/119/fs","msg":"using default handler","ref":"private-regsitry/estargz:1","src":"private-regsitry/estargz:1/sha256:1b82fbeab8a04e8548e0708cdbf2ddc35edd3a5aff4ab77a161765d8935bca5a","time":"2024-04-23T23:56:15.196899102Z"}
Apr 23 23:56:15 ip-10-0-2-130.ec2.internal containerd-stargz-grpc[14958]: {"level":"info","mountpoint":"/var/lib/containerd-stargz-grpc/snapshotter/snapshots/119/fs","msg":"Received status code: 401 Unauthorized. Refreshing creds...","src":"private-regsitry/estargz:1/sha256:1b82fbeab8a04e8548e0708cdbf2ddc35edd3a5aff4ab77a161765d8935bca5a","time":"2024-04-23T23:56:15.349907820Z"}
Apr 23 23:56:15 ip-10-0-2-130.ec2.internal containerd-stargz-grpc[14958]: {"error":"failed to resolve layer \"sha256:1b82fbeab8a04e8548e0708cdbf2ddc35edd3a5aff4ab77a161765d8935bca5a\" from \"private-regsitry/estargz:1\": failed to resolve the blob: failed to resolve the source: cannot resolve layer: failed to redirect (host \"gcr.io\", ref:\"private-regsitry/estargz:1\", digest:\"sha256:1b82fbeab8a04e8548e0708cdbf2ddc35edd3a5aff4ab77a161765d8935bca5a\"): failed to request: failed to fetch anonymous token: unexpected status from GET request to https://gcr.io/v2/token?scope=repository%3Aprivate-regsitry%2Festargz%3Apull\u0026scope=repository%3Aprivate-regsitry%2Fgcr.io%2Festargz%3Apull\u0026service=gcr.io: 403 Forbidden: failed to resolve: failed to resolve target","level":"debug","mountpoint":"/var/lib/containerd-stargz-grpc/snapshotter/snapshots/119/fs","msg":"failed to resolve layer","time":"2024-04-23T23:56:15.650207312Z"}
Apr 23 23:56:15 ip-10-0-2-130.ec2.internal containerd-stargz-grpc[14958]: {"error":"failed to restore remote snapshot: failed to prepare remote snapshot: sha256:19ac957b239fbbf329b0c303a6d3ab6425d96d1556475eb8c12093670f81366a: failed to resolve layer: failed to resolve layer \"sha256:1b82fbeab8a04e8548e0708cdbf2ddc35edd3a5aff4ab77a161765d8935bca5a\" from \"private-regsitry/estargz:1\": failed to resolve the blob: failed to resolve the source: cannot resolve layer: failed to redirect (host \"gcr.io\", ref:\"private-regsitry/estargz:1\", digest:\"sha256:1b82fbeab8a04e8548e0708cdbf2ddc35edd3a5aff4ab77a161765d8935bca5a\"): failed to request: failed to fetch anonymous token: unexpected status from GET request to https://gcr.io/v2/token?scope=repository%3Aprivate-regsitry%2Festargz%3Apull\u0026scope=repository%3Aprivate-regsitry%2Fgcr.io%2Festargz%3Apull\u0026service=gcr.io: 403 Forbidden: failed to resolve: failed to resolve target","level":"fatal","msg":"failed to create new snapshotter","time":"2024-04-23T23:56:15.650305520Z"}
Apr 23 23:56:15 ip-10-0-2-130.ec2.internal systemd[1]: stargz-snapshotter.service: main process exited, code=exited, status=1/FAILURE
Apr 23 23:56:15 ip-10-0-2-130.ec2.internal systemd[1]: Failed to start stargz snapshotter.
Apr 23 23:56:15 ip-10-0-2-130.ec2.internal systemd[1]: Unit stargz-snapshotter.service entered failed state.
Apr 23 23:56:15 ip-10-0-2-130.ec2.internal systemd[1]: stargz-snapshotter.service failed.
Apr 23 23:56:16 ip-10-0-2-130.ec2.internal systemd[1]: stargz-snapshotter.service holdoff time over, scheduling restart.
Apr 23 23:56:16 ip-10-0-2-130.ec2.internal systemd[1]: Stopped stargz snapshotter.
Apr 23 23:56:16 ip-10-0-2-130.ec2.internal systemd[1]: start request repeated too quickly for stargz-snapshotter.service
Apr 23 23:56:16 ip-10-0-2-130.ec2.internal systemd[1]: Failed to start stargz snapshotter.
Apr 23 23:56:16 ip-10-0-2-130.ec2.internal systemd[1]: Unit stargz-snapshotter.service entered failed state.
Apr 23 23:56:16 ip-10-0-2-130.ec2.internal systemd[1]: stargz-snapshotter.service failed.
Note that
Scenario 2 > k3d > Stargz image is pulled from private registry with creds that do not expire
Note that
Note that the k3d cluster would not start.
@ElenaHenderson Thanks for sharing the logs.
This patch won't solve the restart failure because stargz-snapshotter doesn't persist the filesystem data cache across restarts, and the CRI-based authentication mode doesn't persist registry creds across restarts either. Alternatively, there are 3 possible ways to solve the issue.
Or, if we want ways to persist filesystem/creds data, we'll need additional patches.
@ktock Thank you for your prompt response and solutions. Solution A (kubeconfig-based-authentication) is working for us. I did run into an issue with Solution B, and Solution C would not work for us because we need to have valid images on reboots and can't really remove images from the nodes.
Fixes: #1583