Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] Layer not found error message #931

Closed
ollypom opened this issue Nov 10, 2023 · 2 comments · Fixed by #1025
Closed

[Bug] Layer not found error message #931

ollypom opened this issue Nov 10, 2023 · 2 comments · Fixed by #1025
Labels
bug Something isn't working kubernetes
Milestone

Comments

@ollypom
Copy link

ollypom commented Nov 10, 2023

Description

When attempting to run a container with a layer that does not contain a ztoc through the CRI API I receive an usual "not found" error message.

I will caveat this by saying the container still runs, but I’m unsure the repercussions of this message as it does not appear when running the same image through the containerd grpc api with nerdctl.

Error Message:

{
    "error": "cannot unpack the layer: cannot fetch layer: size of descriptor is 0; unable to resolve: unable to resolve ref (111222333444.dkr.ecr.eu-west-1.amazonaws.com/nginx-demo@sha256:4dbc43c9a102fbf8cfc991788f21b809b9aa10d365bf4aaf9d1ef22ffb0e8db2): 111222333444.dkr.ecr.eu-west-1.amazonaws.com/nginx-demo@sha256:4dbc43c9a102fbf8cfc991788f21b809b9aa10d365bf4aaf9d1ef22ffb0e8db2: not found",
    "key": "k8s.io/6/extract-389109763-ZI5T sha256:5fcd89177512660e58f6bd7596bd41b0a0ac94912b14bc13b11ebee594ce79ef",
    "level": "warning",
    "msg": "failed to prepare snapshot; deferring to container runtime",
    "parent": "sha256:7743d89969b2f11448ff9c3c7d0b67768bf706d37581e804f3cc4b44be22687e",
    "time": "2023-11-09T21:07:03.452728891Z"
}

Steps to reproduce the bug

The container image I have been using is a self built nginx container image, but the error message is not image dependent (as it appears on any non indexed image too, like the pause container image).

Self Built Container Image:

Dockerfile

FROM public.ecr.aws/amazonlinux/amazonlinux:2

RUN amazon-linux-extras enable epel && \
    yum install -y epel-release && \
    yum install -y nginx

COPY index.html /usr/share/nginx/html/index.html

CMD ["nginx", "-g", "daemon off;"]

Index.html

<!DOCTYPE html>
<html>

<head>
    <title>Welcome to the SOCI Demo!</title>
    <style>
        html {
            color-scheme: light dark;
        }

        body {
            width: 35em;
            margin: 0 auto;
            font-family: Tahoma, Verdana, Arial, sans-serif;
        }
    </style>
</head>

<body>
    <h1>Welcome to SOCI Demo!</h1>
    <p>If you see this page, the nginx web server is successfully installed and
        working. Further configuration is required.</p>

    <p>For online documentation and support please refer to
        <a href="http://nginx.org/">nginx.org</a>.<br />
        Commercial support is available at
        <a href="http://nginx.com/">nginx.com</a>.
    </p>

    <p><em>Thank you for using nginx.</em></p>
</body>

</html>

Container Image Manifest

111222333444.dkr.ecr.eu-west-1.amazonaws.com/nginx-demo@sha256:1f6b58ad873037ea19d0b2e717cc1f60023963a7ea730a0686ec922545b47581

{
  "schemaVersion": 2,
  "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
  "config": {
    "mediaType": "application/vnd.docker.container.image.v1+json",
    "digest": "sha256:9b3faf804fbe89052a4e489791e86d0e785827fd2a59e3690caaac488da848e3",
    "size": 1330
  },
  "layers": [
    {
      "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
      "digest": "sha256:e6a2a106da1df9b5d4bfa74686415ac760f9d9999604e3784820d596e0983e27",
      "size": 62492148
    },
    {
      "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
      "digest": "sha256:0a0992534263fcf67d4b5ead3e49995e47e241995fec1051148292fe0ebd4c12",
      "size": 188770887
    },
    {
      "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
      "digest": "sha256:4dbc43c9a102fbf8cfc991788f21b809b9aa10d365bf4aaf9d1ef22ffb0e8db2",
      "size": 629
    }
  ]
}

SOCI Index

111222333444.dkr.ecr.eu-west-1.amazonaws.com/nginx-demo@sha256:90ae56adc78f9363b3b2397f87b9d5e971e004694120ccbc0b58cc8d4b5654df

{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.manifest.v1+json",
  "config": {
    "mediaType": "application/vnd.amazon.soci.index.v1+json",
    "digest": "sha256:44136fa355b3678a1146ad16f7e8649e94fb4fc21fe77e8310c060f61caaff8a",
    "size": 2
  },
  "layers": [
    {
      "mediaType": "application/octet-stream",
      "digest": "sha256:04881d8ecc7c5d8cb0c01d391898caec9afe0d20da91a61e5c7ea8204455bf5b",
      "size": 3106904,
      "annotations": {
        "com.amazon.soci.image-layer-digest": "sha256:e6a2a106da1df9b5d4bfa74686415ac760f9d9999604e3784820d596e0983e27",
        "com.amazon.soci.image-layer-mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip"
      }
    },
    {
      "mediaType": "application/octet-stream",
      "digest": "sha256:e6f6ee873dfbe8a747c18ae134ffe61f40968ff22d74fa5959b2651da5e967e0",
      "size": 4365080,
      "annotations": {
        "com.amazon.soci.image-layer-digest": "sha256:0a0992534263fcf67d4b5ead3e49995e47e241995fec1051148292fe0ebd4c12",
        "com.amazon.soci.image-layer-mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip"
      }
    }
  ],
  "subject": {
    "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
    "digest": "sha256:1f6b58ad873037ea19d0b2e717cc1f60023963a7ea730a0686ec922545b47581",
    "size": 898
  },
  "annotations": {
    "com.amazon.soci.build-tool-identifier": "AWS SOCI CLI v0.1"
  }
}

My containerd config

version = 2

root = "/var/lib/containerd"
state = "/run/containerd"

[plugins]
  [proxy_plugins.soci]
    type = "snapshot"
    address = "/run/soci-snapshotter-grpc/soci-snapshotter-grpc.sock"

  [plugins."io.containerd.grpc.v1.cri"]
    # Just pinning to a specific digest of registry.k8s.io/pause:3.9
    sandbox_image = "registry.k8s.io/pause@sha256:8d4106c88ec0bd28001e34c975d65175d994072d65341f62a8ab0754b0fafe10"

    [plugins."io.containerd.grpc.v1.cri".containerd]
      snapshotter = "soci"
      disable_snapshot_annotations = false

Running with nerdctl

$ nerdctl run --snapshotter soci --detach 111222333444.dkr.ecr.eu-west-1.amazonaws.com/nginx-demo:latest
111222333444.dkr.ecr.eu-west-1.amazonaws.com/nginx-demo:latest:                   resolved       |++++++++++++++++++++++++++++++++++++++| 
manifest-sha256:1f6b58ad873037ea19d0b2e717cc1f60023963a7ea730a0686ec922545b47581: done           |++++++++++++++++++++++++++++++++++++++| 
config-sha256:9b3faf804fbe89052a4e489791e86d0e785827fd2a59e3690caaac488da848e3:   done           |++++++++++++++++++++++++++++++++++++++| 
elapsed: 1.5 s                                                                    total:  2.2 Ki (1.4 KiB/s)                                       
27da62825f947a77eab611eab809fde130994896147c0cb57be7e8ce00261862

$ nerdctl ps
CONTAINER ID    IMAGE                                                             COMMAND                   CREATED          STATUS    PORTS    NAMES
27da62825f94    111222333444.dkr.ecr.eu-west-1.amazonaws.com/nginx-demo:latest    "nginx -g daemon off;"    4 minutes ago    Up                 nginx-demo-27da6
Snapshotter Logs showing 2 remote snapshots and 1 local snapshot
{
    "key": "default/1/extract-541431167-69QF sha256:5456d740f71ce2e23882740d2d87a61ad270a5ba2679dede61fd8541e4277d29",
    "level": "info",
    "msg": "preparing filesystem mount at mountpoint=/var/lib/soci-snapshotter-grpc/snapshotter/snapshots/1/fs",
    "parent": "",
    "time": "2023-11-10T10:31:05.545927494Z"
}
{
    "key": "default/1/extract-541431167-69QF sha256:5456d740f71ce2e23882740d2d87a61ad270a5ba2679dede61fd8541e4277d29",
    "level": "info",
    "mountpoint": "/var/lib/soci-snapshotter-grpc/snapshotter/snapshots/1/fs",
    "msg": "index digest not provided, making a Referrers API call to fetch list of indices",
    "parent": "",
    "time": "2023-11-10T10:31:05.546017450Z"
}
{
    "digest": "sha256:90ae56adc78f9363b3b2397f87b9d5e971e004694120ccbc0b58cc8d4b5654df",
    "key": "default/1/extract-541431167-69QF sha256:5456d740f71ce2e23882740d2d87a61ad270a5ba2679dede61fd8541e4277d29",
    "level": "info",
    "mountpoint": "/var/lib/soci-snapshotter-grpc/snapshotter/snapshots/1/fs",
    "msg": "fetching SOCI artifacts using index descriptor",
    "parent": "",
    "time": "2023-11-10T10:31:05.669938145Z"
}
{
    "digest": "sha256:90ae56adc78f9363b3b2397f87b9d5e971e004694120ccbc0b58cc8d4b5654df",
    "level": "info",
    "msg": "fetching SOCI index from remote registry",
    "time": "2023-11-10T10:31:05.669982229Z"
}
{
    "digest": "sha256:90ae56adc78f9363b3b2397f87b9d5e971e004694120ccbc0b58cc8d4b5654df",
    "level": "info",
    "msg": "fetching artifact from remote",
    "time": "2023-11-10T10:31:05.670029665Z"
}
{
    "digest": "sha256:e6f6ee873dfbe8a747c18ae134ffe61f40968ff22d74fa5959b2651da5e967e0",
    "level": "info",
    "msg": "fetching artifact from remote",
    "time": "2023-11-10T10:31:05.765965393Z"
}
{
    "digest": "sha256:04881d8ecc7c5d8cb0c01d391898caec9afe0d20da91a61e5c7ea8204455bf5b",
    "level": "info",
    "msg": "fetching artifact from remote",
    "time": "2023-11-10T10:31:05.765970817Z"
}
{
    "key": "default/1/extract-541431167-69QF sha256:5456d740f71ce2e23882740d2d87a61ad270a5ba2679dede61fd8541e4277d29",
    "level": "info",
    "mountpoint": "/var/lib/soci-snapshotter-grpc/snapshotter/snapshots/1/fs",
    "msg": "Received status code: 401 Unauthorized. Refreshing creds...",
    "parent": "",
    "src": "111222333444.dkr.ecr.eu-west-1.amazonaws.com/nginx-demo:latest/sha256:0a0992534263fcf67d4b5ead3e49995e47e241995fec1051148292fe0ebd4c12",
    "time": "2023-11-10T10:31:06.125028428Z"
}
{
    "key": "default/1/extract-541431167-69QF sha256:5456d740f71ce2e23882740d2d87a61ad270a5ba2679dede61fd8541e4277d29",
    "level": "info",
    "mountpoint": "/var/lib/soci-snapshotter-grpc/snapshotter/snapshots/1/fs",
    "msg": "Received status code: 401 Unauthorized. Refreshing creds...",
    "parent": "",
    "src": "111222333444.dkr.ecr.eu-west-1.amazonaws.com/nginx-demo:latest/sha256:e6a2a106da1df9b5d4bfa74686415ac760f9d9999604e3784820d596e0983e27",
    "time": "2023-11-10T10:31:06.125594058Z"
}
{
    "key": "default/1/extract-541431167-69QF sha256:5456d740f71ce2e23882740d2d87a61ad270a5ba2679dede61fd8541e4277d29",
    "level": "info",
    "mountpoint": "/var/lib/soci-snapshotter-grpc/snapshotter/snapshots/1/fs",
    "msg": "Verification forcefully skipped",
    "parent": "",
    "time": "2023-11-10T10:31:06.573774164Z"
}
{
    "key": "default/1/extract-541431167-69QF sha256:5456d740f71ce2e23882740d2d87a61ad270a5ba2679dede61fd8541e4277d29",
    "level": "info",
    "msg": "remote snapshot successfully prepared.",
    "parent": "",
    "remote-snapshot-prepared": "true",
    "time": "2023-11-10T10:31:06.580144478Z"
}
{
    "key": "default/2/extract-584756795-z2cR sha256:7743d89969b2f11448ff9c3c7d0b67768bf706d37581e804f3cc4b44be22687e",
    "level": "info",
    "msg": "preparing filesystem mount at mountpoint=/var/lib/soci-snapshotter-grpc/snapshotter/snapshots/2/fs",
    "parent": "sha256:5456d740f71ce2e23882740d2d87a61ad270a5ba2679dede61fd8541e4277d29",
    "time": "2023-11-10T10:31:06.588358222Z"
}
{
    "key": "default/2/extract-584756795-z2cR sha256:7743d89969b2f11448ff9c3c7d0b67768bf706d37581e804f3cc4b44be22687e",
    "level": "info",
    "mountpoint": "/var/lib/soci-snapshotter-grpc/snapshotter/snapshots/2/fs",
    "msg": "Verification forcefully skipped",
    "parent": "sha256:5456d740f71ce2e23882740d2d87a61ad270a5ba2679dede61fd8541e4277d29",
    "time": "2023-11-10T10:31:06.588504085Z"
}
{
    "key": "default/2/extract-584756795-z2cR sha256:7743d89969b2f11448ff9c3c7d0b67768bf706d37581e804f3cc4b44be22687e",
    "level": "info",
    "msg": "remote snapshot successfully prepared.",
    "parent": "sha256:5456d740f71ce2e23882740d2d87a61ad270a5ba2679dede61fd8541e4277d29",
    "remote-snapshot-prepared": "true",
    "time": "2023-11-10T10:31:06.593693335Z"
}
{
    "key": "default/3/extract-597895522-ZoZd sha256:5fcd89177512660e58f6bd7596bd41b0a0ac94912b14bc13b11ebee594ce79ef",
    "level": "info",
    "msg": "preparing filesystem mount at mountpoint=/var/lib/soci-snapshotter-grpc/snapshotter/snapshots/3/fs",
    "parent": "sha256:7743d89969b2f11448ff9c3c7d0b67768bf706d37581e804f3cc4b44be22687e",
    "time": "2023-11-10T10:31:06.601117922Z"
}
{
    "image": "111222333444.dkr.ecr.eu-west-1.amazonaws.com/nginx-demo:latest",
    "key": "default/3/extract-597895522-ZoZd sha256:5fcd89177512660e58f6bd7596bd41b0a0ac94912b14bc13b11ebee594ce79ef",
    "layerDigest": "sha256:4dbc43c9a102fbf8cfc991788f21b809b9aa10d365bf4aaf9d1ef22ffb0e8db2",
    "level": "info",
    "mountpoint": "/var/lib/soci-snapshotter-grpc/snapshotter/snapshots/3/fs",
    "msg": "skipping mounting layer as FUSE mount: no ztoc for layer",
    "parent": "sha256:7743d89969b2f11448ff9c3c7d0b67768bf706d37581e804f3cc4b44be22687e",
    "time": "2023-11-10T10:31:06.601209362Z"
}
{
    "error": "skipping mounting layer sha256:4dbc43c9a102fbf8cfc991788f21b809b9aa10d365bf4aaf9d1ef22ffb0e8db2 as FUSE mount: no ztoc for layer",
    "key": "default/3/extract-597895522-ZoZd sha256:5fcd89177512660e58f6bd7596bd41b0a0ac94912b14bc13b11ebee594ce79ef",
    "level": "warning",
    "msg": "failed to prepare remote snapshot",
    "parent": "sha256:7743d89969b2f11448ff9c3c7d0b67768bf706d37581e804f3cc4b44be22687e",
    "remote-snapshot-prepared": "false",
    "time": "2023-11-10T10:31:06.601264202Z"
}
{
    "layerDigest": "sha256:4dbc43c9a102fbf8cfc991788f21b809b9aa10d365bf4aaf9d1ef22ffb0e8db2",
    "level": "info",
    "msg": "preparing snapshot as local snapshot",
    "time": "2023-11-10T10:31:06.601427586Z"
}
{
    "key": "default/3/extract-597895522-ZoZd sha256:5fcd89177512660e58f6bd7596bd41b0a0ac94912b14bc13b11ebee594ce79ef",
    "level": "info",
    "msg": "preparing local filesystem at mountpoint=/var/lib/soci-snapshotter-grpc/snapshotter/snapshots/3/fs",
    "parent": "sha256:7743d89969b2f11448ff9c3c7d0b67768bf706d37581e804f3cc4b44be22687e",
    "time": "2023-11-10T10:31:06.601458850Z"
}
{
    "digest": "sha256:4dbc43c9a102fbf8cfc991788f21b809b9aa10d365bf4aaf9d1ef22ffb0e8db2",
    "key": "default/3/extract-597895522-ZoZd sha256:5fcd89177512660e58f6bd7596bd41b0a0ac94912b14bc13b11ebee594ce79ef",
    "level": "info",
    "msg": "fetching artifact from remote",
    "parent": "sha256:7743d89969b2f11448ff9c3c7d0b67768bf706d37581e804f3cc4b44be22687e",
    "time": "2023-11-10T10:31:06.601522128Z"
}
{
    "key": "default/3/extract-597895522-ZoZd sha256:5fcd89177512660e58f6bd7596bd41b0a0ac94912b14bc13b11ebee594ce79ef",
    "level": "info",
    "msg": "local snapshot successfully prepared",
    "parent": "sha256:7743d89969b2f11448ff9c3c7d0b67768bf706d37581e804f3cc4b44be22687e",
    "time": "2023-11-10T10:31:06.694161690Z"
}

Running with crictl

$ cat pod-config.json
{
    "metadata": {
        "name": "nginx-sandbox",
        "namespace": "default",
        "attempt": 1,
        "uid": "hdishd83djaidwnduwk28bcsb"
    },
    "log_directory": "/tmp",
    "linux": {
    }
}

$ cat container-config.json
{
  "metadata": {
      "name": "nginxdemo"
  },
  "image":{
      "image": "111222333444.dkr.ecr.eu-west-1.amazonaws.com/nginx-demo:latest"
  },
  "log_path":"nginx.0.log",
  "linux": {
  }
}

$ TOKEN=$(aws ecr get-authorization-token --region eu-west-1 | jq -r '.authorizationData[0].authorizationToken' | base64 -d)

$ sudo crictl \
  --runtime-endpoint "unix:///var/run/containerd/containerd.sock" \
  --image-endpoint "unix:///var/run/containerd/containerd.sock" \
  run \
  --creds $TOKEN \
  container-config.json \
  pod-config.json

$ sudo crictl \
  --runtime-endpoint "unix:///var/run/containerd/containerd.sock" \
  pods
POD ID              CREATED             STATE               NAME                NAMESPACE           ATTEMPT             RUNTIME
c674eb5332d7e       3 minutes ago       Ready               nginx-sandbox       default             1                   (default)

$ sudo crictl \
  --runtime-endpoint "unix:///var/run/containerd/containerd.sock" \
  ps
CONTAINER           IMAGE                                                            CREATED             STATE               NAME                ATTEMPT             POD ID              POD
a22995067ca0e       111222333444.dkr.ecr.eu-west-1.amazonaws.com/nginx-demo:latest   3 minutes ago       Running             nginxdemo           0                   c674eb5332d7e       unknown
Snapshotter Logs for failed local snapshot for pause container
{
    "key": "k8s.io/1/extract-911068980-HwpE sha256:e3e5579ddd43c08e4b5c74dc12941a4ef656fab070b1087a1fd5a8a836b71e7d",
    "level": "info",
    "msg": "preparing filesystem mount at mountpoint=/var/lib/soci-snapshotter-grpc/snapshotter/snapshots/1/fs",
    "parent": "",
    "time": "2023-11-10T10:58:22.915836434Z"
}
{
    "key": "k8s.io/1/extract-911068980-HwpE sha256:e3e5579ddd43c08e4b5c74dc12941a4ef656fab070b1087a1fd5a8a836b71e7d",
    "level": "info",
    "mountpoint": "/var/lib/soci-snapshotter-grpc/snapshotter/snapshots/1/fs",
    "msg": "index digest not provided, making a Referrers API call to fetch list of indices",
    "parent": "",
    "time": "2023-11-10T10:58:22.915944996Z"
}
{
    "error": "unable to fetch SOCI artifacts: cannot fetch list of referrers: unable to fetch referrers: GET \"https://accounts.google.com/v3/signin/identifier?continue=https%3A%2F%2Fconsole.cloud.google.com%2Fartifacts%2Ftags%2Fv2%2Feurope-west2%2Fk8s-artifacts-prod%2Fimages%252Fpause%252Freferrers%252Fsha256%2F8d4106c88ec0bd28001e34c975d65175d994072d65341f62a8ab0754b0fafe10\u0026followup=https%3A%2F%2Fconsole.cloud.google.com%2Fartifacts%2Ftags%2Fv2%2Feurope-west2%2Fk8s-artifacts-prod%2Fimages%252Fpause%252Freferrers%252Fsha256%2F8d4106c88ec0bd28001e34c975d65175d994072d65341f62a8ab0754b0fafe10\u0026ifkv=AVQVeyxNoNrCAv-K2lHSvMWhNk9GhWERQaP9ms_NJXF0_WgSmx93TAVDFvLrOZ3PCYh8O24wiu_K\u0026osid=1\u0026passive=1209600\u0026service=cloudconsole\u0026flowName=WebLiteSignIn\u0026flowEntry=ServiceLogin\u0026dsh=S755472317%3A1699613903093988\": failed to decode response: invalid character '\u003c' looking for beginning of value",
    "key": "k8s.io/1/extract-911068980-HwpE sha256:e3e5579ddd43c08e4b5c74dc12941a4ef656fab070b1087a1fd5a8a836b71e7d",
    "level": "warning",
    "msg": "failed to prepare remote snapshot",
    "parent": "",
    "remote-snapshot-prepared": "false",
    "time": "2023-11-10T10:58:23.225310304Z"
}
{
    "layerDigest": "sha256:61fec91190a0bab34406027bbec43d562218df6e80d22d4735029756f23c7007",
    "level": "info",
    "msg": "preparing snapshot as local snapshot",
    "time": "2023-11-10T10:58:23.225370917Z"
}
{
    "key": "k8s.io/1/extract-911068980-HwpE sha256:e3e5579ddd43c08e4b5c74dc12941a4ef656fab070b1087a1fd5a8a836b71e7d",
    "level": "info",
    "msg": "preparing local filesystem at mountpoint=/var/lib/soci-snapshotter-grpc/snapshotter/snapshots/1/fs",
    "parent": "",
    "time": "2023-11-10T10:58:23.225405422Z"
}
{
    "digest": "sha256:61fec91190a0bab34406027bbec43d562218df6e80d22d4735029756f23c7007",
    "key": "k8s.io/1/extract-911068980-HwpE sha256:e3e5579ddd43c08e4b5c74dc12941a4ef656fab070b1087a1fd5a8a836b71e7d",
    "level": "info",
    "msg": "fetching artifact from remote",
    "parent": "",
    "time": "2023-11-10T10:58:23.225498491Z"
}
{
    "digest": "sha256:61fec91190a0bab34406027bbec43d562218df6e80d22d4735029756f23c7007",
    "key": "k8s.io/1/extract-911068980-HwpE sha256:e3e5579ddd43c08e4b5c74dc12941a4ef656fab070b1087a1fd5a8a836b71e7d",
    "level": "warning",
    "msg": "size of descriptor is 0, trying to resolve it...",
    "parent": "",
    "time": "2023-11-10T10:58:23.225515000Z"
}
{
    "error": "cannot unpack the layer: cannot fetch layer: size of descriptor is 0; unable to resolve: unable to resolve ref (registry.k8s.io/pause@sha256:61fec91190a0bab34406027bbec43d562218df6e80d22d4735029756f23c7007): registry.k8s.io/pause@sha256:61fec91190a0bab34406027bbec43d562218df6e80d22d4735029756f23c7007: not found",
    "key": "k8s.io/1/extract-911068980-HwpE sha256:e3e5579ddd43c08e4b5c74dc12941a4ef656fab070b1087a1fd5a8a836b71e7d",
    "level": "warning",
    "msg": "failed to prepare snapshot; deferring to container runtime",
    "parent": "",
    "time": "2023-11-10T10:58:23.347868140Z"
}
Snapshotter Logs for Nginx Container. 2 Successful Remote Snapshots, 1 Failed local snapshot
{
    "key": "k8s.io/4/extract-889519745-F-vT sha256:5456d740f71ce2e23882740d2d87a61ad270a5ba2679dede61fd8541e4277d29",
    "level": "info",
    "msg": "preparing filesystem mount at mountpoint=/var/lib/soci-snapshotter-grpc/snapshotter/snapshots/3/fs",
    "parent": "",
    "time": "2023-11-10T10:58:23.893369096Z"
}
{
    "key": "k8s.io/4/extract-889519745-F-vT sha256:5456d740f71ce2e23882740d2d87a61ad270a5ba2679dede61fd8541e4277d29",
    "level": "info",
    "mountpoint": "/var/lib/soci-snapshotter-grpc/snapshotter/snapshots/3/fs",
    "msg": "index digest not provided, making a Referrers API call to fetch list of indices",
    "parent": "",
    "time": "2023-11-10T10:58:23.893446004Z"
}
{
    "digest": "sha256:90ae56adc78f9363b3b2397f87b9d5e971e004694120ccbc0b58cc8d4b5654df",
    "key": "k8s.io/4/extract-889519745-F-vT sha256:5456d740f71ce2e23882740d2d87a61ad270a5ba2679dede61fd8541e4277d29",
    "level": "info",
    "mountpoint": "/var/lib/soci-snapshotter-grpc/snapshotter/snapshots/3/fs",
    "msg": "fetching SOCI artifacts using index descriptor",
    "parent": "",
    "time": "2023-11-10T10:58:24.098372503Z"
}
{
    "digest": "sha256:90ae56adc78f9363b3b2397f87b9d5e971e004694120ccbc0b58cc8d4b5654df",
    "level": "info",
    "msg": "fetching SOCI index from remote registry",
    "time": "2023-11-10T10:58:24.098413992Z"
}
{
    "digest": "sha256:90ae56adc78f9363b3b2397f87b9d5e971e004694120ccbc0b58cc8d4b5654df",
    "level": "info",
    "msg": "fetching artifact from remote",
    "time": "2023-11-10T10:58:24.098456319Z"
}
{
    "digest": "sha256:04881d8ecc7c5d8cb0c01d391898caec9afe0d20da91a61e5c7ea8204455bf5b",
    "level": "info",
    "msg": "fetching artifact from remote",
    "time": "2023-11-10T10:58:24.184183399Z"
}
{
    "digest": "sha256:e6f6ee873dfbe8a747c18ae134ffe61f40968ff22d74fa5959b2651da5e967e0",
    "level": "info",
    "msg": "fetching artifact from remote",
    "time": "2023-11-10T10:58:24.184210447Z"
}
{
    "key": "k8s.io/4/extract-889519745-F-vT sha256:5456d740f71ce2e23882740d2d87a61ad270a5ba2679dede61fd8541e4277d29",
    "level": "info",
    "mountpoint": "/var/lib/soci-snapshotter-grpc/snapshotter/snapshots/3/fs",
    "msg": "Received status code: 401 Unauthorized. Refreshing creds...",
    "parent": "",
    "src": "111222333444.dkr.ecr.eu-west-1.amazonaws.com/nginx-demo:latest/sha256:e6a2a106da1df9b5d4bfa74686415ac760f9d9999604e3784820d596e0983e27",
    "time": "2023-11-10T10:58:24.493977616Z"
}
{
    "key": "k8s.io/4/extract-889519745-F-vT sha256:5456d740f71ce2e23882740d2d87a61ad270a5ba2679dede61fd8541e4277d29",
    "level": "info",
    "mountpoint": "/var/lib/soci-snapshotter-grpc/snapshotter/snapshots/3/fs",
    "msg": "Verification forcefully skipped",
    "parent": "",
    "time": "2023-11-10T10:58:24.894619999Z"
}
{
    "key": "k8s.io/4/extract-889519745-F-vT sha256:5456d740f71ce2e23882740d2d87a61ad270a5ba2679dede61fd8541e4277d29",
    "level": "info",
    "msg": "remote snapshot successfully prepared.",
    "parent": "",
    "remote-snapshot-prepared": "true",
    "time": "2023-11-10T10:58:24.900188272Z"
}
{
    "key": "k8s.io/5/extract-921454245-nQMe sha256:7743d89969b2f11448ff9c3c7d0b67768bf706d37581e804f3cc4b44be22687e",
    "level": "info",
    "msg": "preparing filesystem mount at mountpoint=/var/lib/soci-snapshotter-grpc/snapshotter/snapshots/4/fs",
    "parent": "sha256:5456d740f71ce2e23882740d2d87a61ad270a5ba2679dede61fd8541e4277d29",
    "time": "2023-11-10T10:58:24.924948947Z"
}
{
    "key": "k8s.io/5/extract-921454245-nQMe sha256:7743d89969b2f11448ff9c3c7d0b67768bf706d37581e804f3cc4b44be22687e",
    "level": "info",
    "mountpoint": "/var/lib/soci-snapshotter-grpc/snapshotter/snapshots/4/fs",
    "msg": "Received status code: 401 Unauthorized. Refreshing creds...",
    "parent": "sha256:5456d740f71ce2e23882740d2d87a61ad270a5ba2679dede61fd8541e4277d29",
    "src": "111222333444.dkr.ecr.eu-west-1.amazonaws.com/nginx-demo:latest/sha256:0a0992534263fcf67d4b5ead3e49995e47e241995fec1051148292fe0ebd4c12",
    "time": "2023-11-10T10:58:24.932276438Z"
}
{
    "key": "k8s.io/5/extract-921454245-nQMe sha256:7743d89969b2f11448ff9c3c7d0b67768bf706d37581e804f3cc4b44be22687e",
    "level": "info",
    "mountpoint": "/var/lib/soci-snapshotter-grpc/snapshotter/snapshots/4/fs",
    "msg": "Verification forcefully skipped",
    "parent": "sha256:5456d740f71ce2e23882740d2d87a61ad270a5ba2679dede61fd8541e4277d29",
    "time": "2023-11-10T10:58:25.315583990Z"
}
{
    "key": "k8s.io/5/extract-921454245-nQMe sha256:7743d89969b2f11448ff9c3c7d0b67768bf706d37581e804f3cc4b44be22687e",
    "level": "info",
    "msg": "remote snapshot successfully prepared.",
    "parent": "sha256:5456d740f71ce2e23882740d2d87a61ad270a5ba2679dede61fd8541e4277d29",
    "remote-snapshot-prepared": "true",
    "time": "2023-11-10T10:58:25.322447547Z"
}
{
    "key": "k8s.io/6/extract-327009088-xpeP sha256:5fcd89177512660e58f6bd7596bd41b0a0ac94912b14bc13b11ebee594ce79ef",
    "level": "info",
    "msg": "preparing filesystem mount at mountpoint=/var/lib/soci-snapshotter-grpc/snapshotter/snapshots/5/fs",
    "parent": "sha256:7743d89969b2f11448ff9c3c7d0b67768bf706d37581e804f3cc4b44be22687e",
    "time": "2023-11-10T10:58:25.330450563Z"
}
{
    "image": "111222333444.dkr.ecr.eu-west-1.amazonaws.com/nginx-demo:latest",
    "key": "k8s.io/6/extract-327009088-xpeP sha256:5fcd89177512660e58f6bd7596bd41b0a0ac94912b14bc13b11ebee594ce79ef",
    "layerDigest": "sha256:4dbc43c9a102fbf8cfc991788f21b809b9aa10d365bf4aaf9d1ef22ffb0e8db2",
    "level": "info",
    "mountpoint": "/var/lib/soci-snapshotter-grpc/snapshotter/snapshots/5/fs",
    "msg": "skipping mounting layer as FUSE mount: no ztoc for layer",
    "parent": "sha256:7743d89969b2f11448ff9c3c7d0b67768bf706d37581e804f3cc4b44be22687e",
    "time": "2023-11-10T10:58:25.330536859Z"
}
{
    "error": "skipping mounting layer sha256:4dbc43c9a102fbf8cfc991788f21b809b9aa10d365bf4aaf9d1ef22ffb0e8db2 as FUSE mount: no ztoc for layer",
    "key": "k8s.io/6/extract-327009088-xpeP sha256:5fcd89177512660e58f6bd7596bd41b0a0ac94912b14bc13b11ebee594ce79ef",
    "level": "warning",
    "msg": "failed to prepare remote snapshot",
    "parent": "sha256:7743d89969b2f11448ff9c3c7d0b67768bf706d37581e804f3cc4b44be22687e",
    "remote-snapshot-prepared": "false",
    "time": "2023-11-10T10:58:25.330584665Z"
}
{
    "layerDigest": "sha256:4dbc43c9a102fbf8cfc991788f21b809b9aa10d365bf4aaf9d1ef22ffb0e8db2",
    "level": "info",
    "msg": "preparing snapshot as local snapshot",
    "time": "2023-11-10T10:58:25.330688150Z"
}
{
    "key": "k8s.io/6/extract-327009088-xpeP sha256:5fcd89177512660e58f6bd7596bd41b0a0ac94912b14bc13b11ebee594ce79ef",
    "level": "info",
    "msg": "preparing local filesystem at mountpoint=/var/lib/soci-snapshotter-grpc/snapshotter/snapshots/5/fs",
    "parent": "sha256:7743d89969b2f11448ff9c3c7d0b67768bf706d37581e804f3cc4b44be22687e",
    "time": "2023-11-10T10:58:25.330714499Z"
}
{
    "digest": "sha256:4dbc43c9a102fbf8cfc991788f21b809b9aa10d365bf4aaf9d1ef22ffb0e8db2",
    "key": "k8s.io/6/extract-327009088-xpeP sha256:5fcd89177512660e58f6bd7596bd41b0a0ac94912b14bc13b11ebee594ce79ef",
    "level": "info",
    "msg": "fetching artifact from remote",
    "parent": "sha256:7743d89969b2f11448ff9c3c7d0b67768bf706d37581e804f3cc4b44be22687e",
    "time": "2023-11-10T10:58:25.330773723Z"
}
{
    "digest": "sha256:4dbc43c9a102fbf8cfc991788f21b809b9aa10d365bf4aaf9d1ef22ffb0e8db2",
    "key": "k8s.io/6/extract-327009088-xpeP sha256:5fcd89177512660e58f6bd7596bd41b0a0ac94912b14bc13b11ebee594ce79ef",
    "level": "warning",
    "msg": "size of descriptor is 0, trying to resolve it...",
    "parent": "sha256:7743d89969b2f11448ff9c3c7d0b67768bf706d37581e804f3cc4b44be22687e",
    "time": "2023-11-10T10:58:25.330787676Z"
}
{
    "error": "cannot unpack the layer: cannot fetch layer: size of descriptor is 0; unable to resolve: unable to resolve ref (111222333444.dkr.ecr.eu-west-1.amazonaws.com/nginx-demo@sha256:4dbc43c9a102fbf8cfc991788f21b809b9aa10d365bf4aaf9d1ef22ffb0e8db2): 111222333444.dkr.ecr.eu-west-1.amazonaws.com/nginx-demo@sha256:4dbc43c9a102fbf8cfc991788f21b809b9aa10d365bf4aaf9d1ef22ffb0e8db2: not found",
    "key": "k8s.io/6/extract-327009088-xpeP sha256:5fcd89177512660e58f6bd7596bd41b0a0ac94912b14bc13b11ebee594ce79ef",
    "level": "warning",
    "msg": "failed to prepare snapshot; deferring to container runtime",
    "parent": "sha256:7743d89969b2f11448ff9c3c7d0b67768bf706d37581e804f3cc4b44be22687e",
    "time": "2023-11-10T10:58:25.387041197Z"
}

Describe the results you expected

I was expecting the SOCI snapshotter to create a local snapshot.

Host information

$ uname -a
Linux ip-172-31-19-9.eu-west-1.compute.internal 5.10.198-187.748.amzn2.x86_64 #1 SMP Tue Oct 24 19:49:54 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

$ cat /etc/os-release 
NAME="Amazon Linux"
VERSION="2"
ID="amzn"
ID_LIKE="centos rhel fedora"
VERSION_ID="2"
PRETTY_NAME="Amazon Linux 2"
ANSI_COLOR="0;33"
CPE_NAME="cpe:2.3:o:amazon:amazon_linux:2"
HOME_URL="https://amazonlinux.com/"
SUPPORT_END="2025-06-30"

$ sudo ctr --version
ctr github.com/containerd/containerd 1.6.19

$ sudo nerdctl --version
nerdctl version 1.7.0

$ sudo soci --version
soci version v0.4.0 894c8ece8a2177928343899b1f897d2164c98254

$ sudo crictl --version
crictl version v1.28.0

Any additional context or information about the bug

No response

@ollypom ollypom added the bug Something isn't working label Nov 10, 2023
@github-project-automation github-project-automation bot moved this to ❓ Ungroomed in soci-snapshotter Nov 21, 2023
@Kern-- Kern-- moved this from ❓ Ungroomed to 📋 Backlog in soci-snapshotter Nov 21, 2023
@Kern--
Copy link
Contributor

Kern-- commented Nov 23, 2023

tl;dr there's a bug which causes us to resolve layers as if they were manifests which then causes containerd to pull all subsequent layers ahead of time. This happens when using the CRI on a sparse index (i.e. one where not all layers have zTOCS).

Impact

As you saw from your testing, this bug doesn't immediately prevent SOCI from working. What it does do is it causes containerd to pull all subsequent layers ahead of time once (similar behavior to #693).

Tracing the bug

Tracking this down, when we do a non-lazy layer pull we call:

  1. fs.MountLocal calls unpacker.Unpack

  2. unpacker.Unpack calls artifactFetcher.Fetch

  3. if the descriptor size is 0, artifactFetcher.Fetch calls artifactFetcher.resolve

  4. artifactFetcher.resolve calls remote.Repository.Resolve from oras-go

  5. remote.Repository.Resolve assumes it's resolving a manifest.

Since this is a layer, not a manifest, the resolution fails.

Why does this even happen?

  • We need the size layer because when we store the object from the registry, we verify that the length of the descriptor matches the length of data we actually get back.
  • This affects non-lazy loaded layers with CRI because the containerd CRI label wrapper doesn't pass the layer size, so we have to resolve it ourselves.
  • This doesn't affect lazy loaded layers because, while we do call artifactFetcher.Fetch, the size is not 0 because we're fetching ztocs and we get the size from the SOCI index.
  • This doesn't affect nerdctl because nerdctl uses our label wrapper which adds the size of the layer to the labels.

Potential fixes

Fixing this is a little weird because there's at least 1 other bug. Even if we managed to resolve this correctly, unpacker.Unpack would still have the descriptor size set to 0 so we would just fail when we try to store the layer locally.

We should fix artifactFetcher.resolve to try to resolve both manifests and blobs to fix the immediate issue in this report. Then we should move the size resolution as early in the process as possible. We may want to take a look at if there's any refactoring we can do here to make this less awkward.

We should probably also look at upstreaming our layer size annotation. This would definitely be helpful for stargz and it might be helpful for other remote snapshotters as well.

@Kern-- Kern-- added this to the v0.5.0 milestone Nov 27, 2023
@debajyoti-truefoundry
Copy link

@ollypom @Kern--

I am on v0.4.0.

{
  "error": "cannot unpack the layer: cannot fetch layer: size of descriptor is 0; unable to resolve: unable to resolve ref (nvcr.io/nvidia/k8s/container-toolkit@sha256:745cad9a8a1e0a0d92738687a85b5a314d324dfca7c2dc6f2b2111508f6fbec9): Head \"https://nvcr.io/v2/nvidia/k8s/container-toolkit/manifests/sha256:745cad9a8a1e0a0d92738687a85b5a314d324dfca7c2dc6f2b2111508f6fbec9\": HEAD https://nvcr.io/v2/nvidia/k8s/container-toolkit/manifests/sha256:745cad9a8a1e0a0d92738687a85b5a314d324dfca7c2dc6f2b2111508f6fbec9 giving up after 3 attempt(s)",
  "key": "k8s.io/260/extract-45204369-6hIj sha256:a177c22b4e0d76a18351a1a31c666de1643a68f2a3b4c6408762ffef8e5318cc",
  "level": "warning",
  "msg": "failed to prepare snapshot; deferring to container runtime",
  "parent": "k8s.io/259/sha256:63c5d3862f93c51ccb88bbc83cdc6a515e90e7d375631f0bee85b2f01b5cf715",
  "time": "2023-12-08T14:43:25.786365548Z"
}

In this case, Head https://nvcr.io/v2/nvidia/k8s/container-toolkit/manifests/sha256:745cad9a8a1e0a0d92738687a85b5a314d324dfca7c2dc6f2b2111508f6fbec9, returns 401 and it goes to a retry loop.

When a GPU node comes up on EKS, we run 4-5 containers with images from nvcr.io. All of them were getting stuck in the "Pulling image" state.

I updated the retry configuration to,

[http]
MinWaitMsec=15
MaxRetries=2

I do not have concrete numbers on my hand right now, but I could see a noticeable improvement.

Do you guys have any suggestions? Can I express something like, "If there is no soci index present, immediately defer all the layers to container runtime." via the configuration?

Thanks.

@github-project-automation github-project-automation bot moved this from 📋 Backlog to ✅ Done in soci-snapshotter Jan 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working kubernetes
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

3 participants