
Can't pull modestly sized image from Quay.io #14789

Closed
JossWhittle opened this issue Aug 14, 2022 · 35 comments
Labels
co/runtime/docker (Issues specific to a docker runtime), kind/bug (Categorizes issue or PR as related to a bug.)

@JossWhittle

What Happened?

Pulling a 600MB image and larger from Quay.io fails with context deadline exceeded.

The image is there; I can pull it with docker / podman / singularity.

I tagged the ubuntu:20.04 base image and pushed it to our Quay.io repo. This image is 27MB and minikube is able to successfully pull and run the image. But it fails to pull the 600MB image from the same repo.
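For reference, that step was just the usual tag-and-push (names are placeholders):

docker pull ubuntu:20.04
docker tag ubuntu:20.04 quay.io/USER/ubuntu:20.04
docker push quay.io/USER/ubuntu:20.04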

This is not an authentication issue with the private repo because the 27MB image works.

In my network monitoring I can see the traffic from the 600MB image being pulled, and the full image comes in. On Quay.io I can see the pull logs for the image and it is being pulled, but it always fails due to context deadline exceeded.

This is not a network issue. I have a stable and extremely fast connection.

I am at my wits' end here. This is only a 600MB image. Larger images also fail. What is happening?

Attach the log file

$ minikube start
  minikube v1.26.1 on Ubuntu 20.04 (kvm/amd64)
✨  Automatically selected the docker driver. Other choices: ssh, none
  Using Docker driver with root privileges
  Starting control plane node minikube in cluster minikube
  Pulling base image ...
  Downloading Kubernetes v1.24.3 preload ...
    > preloaded-images-k8s-v18-v1...:  405.75 MiB / 405.75 MiB  100.00% 317.54 
  Creating docker container (CPUs=2, Memory=25100MB) ...
  Preparing Kubernetes v1.24.3 on Docker 20.10.17 ...
    ▪ Generating certificates and keys ...
    ▪ Booting up control plane ...
    ▪ Configuring RBAC rules ...
  Verifying Kubernetes components...
    ▪ Using image gcr.io/k8s-minikube/storage-provisioner:v5
  Enabled addons: storage-provisioner, default-storageclass
  kubectl not found. If you need it, try: 'minikube kubectl -- get pods -A'
  Done! kubectl is now configured to use "minikube" cluster and "default" namespace by default

I cannot give you anything more than these relevant lines from minikube logs, redacted for hopefully obvious reasons.

Aug 14 14:07:55 minikube kubelet[2069]: E0814 14:07:55.120188    2069 remote_image.go:218] "PullImage from image service failed" err="rpc error: code = Unknown desc = context deadline exceeded" image="quay.io/USER/IMAGE:TAG"
...
Aug 14 14:07:55 minikube kubelet[2069]: E0814 14:07:55.120724    2069 pod_workers.go:951] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"REDACTED\" with ErrImagePull: \"rpc error: code = Unknown desc = context deadline exceeded\"" pod="REDACTED/REDACTED" podUID=28b393c0-3573-41de-9747-386d911ab8fd
Aug 14 14:07:55 minikube kubelet[2069]: E0814 14:07:55.889526    2069 pod_workers.go:951] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"REDACTED\" with ImagePullBackOff: \"Back-off pulling image \\\"quay.io/USER/IMAGE:TAG\\\"\"" pod="REDACTED/REDACTED" podUID=28b393c0-3573-41de-9747-386d911ab8fd

Operating System

Ubuntu

Driver

Docker

@afbjorklund
Collaborator

afbjorklund commented Aug 14, 2022

As far as I know, this timeout (60 seconds) is from the container runtime. It is not in minikube, and not even in Kubernetes anymore...

Like --image-pull-progress-deadline, which you set in cri-dockerd (currently all options are updated in the systemd unit file).

https://github.com/Mirantis/cri-dockerd/blob/v0.2.5/config/options.go#L48

In the future, there might be a config file where these can be set as well.
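For example, something like this inside the minikube node should raise the deadline (a sketch; check the unit's actual path and existing ExecStart flags before editing):

minikube ssh
sudo sed -i 's|^ExecStart=.*cri-dockerd.*|& --image-pull-progress-deadline=30m|' /etc/systemd/system/cri-docker.service
sudo systemctl daemon-reload && sudo systemctl restart cri-docker.service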


Loading it from a tarball in the cache (or a local registry), would be the workaround...

See minikube image load for options. It doesn't have any such timeouts, by default.
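Concretely (with the image name from this issue redacted as before):

docker pull quay.io/USER/IMAGE:TAG
minikube image load quay.io/USER/IMAGE:TAG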

@afbjorklund added the co/runtime/docker and kind/support labels and removed the co/runtime/docker label Aug 14, 2022
@afbjorklund
Collaborator

Hmm, wonder if this is some kind of regression. It is supposed to trigger on missing progress updates.
So as long as there is a trickle left (512 KiB in 10 minutes), it should not cause the pull to be aborted.

	// If no pulling progress is made before imagePullProgressDeadline, the image pulling will be cancelled.
	// Docker reports image progress for every 512kB block, so normally there shouldn't be too long interval
	// between progress updates.
	imagePullProgressDeadline time.Duration
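As a sketch of that mechanism (my own illustration in Go, not the dockershim source): a timer is reset on every progress report, and the pull is only aborted when the timer fires with no progress.

package main

import (
	"context"
	"fmt"
	"time"
)

// pullWithProgressDeadline is a hypothetical illustration: the pull is
// only aborted when no progress report arrives within deadline, so
// steady progress keeps resetting the timer.
func pullWithProgressDeadline(ctx context.Context, progress <-chan struct{}, deadline time.Duration) error {
	timer := time.NewTimer(deadline)
	defer timer.Stop()
	for {
		select {
		case _, ok := <-progress:
			if !ok {
				return nil // channel closed: pull finished
			}
			if !timer.Stop() { // progress made: push the deadline forward
				<-timer.C
			}
			timer.Reset(deadline)
		case <-timer.C:
			return context.DeadlineExceeded // no progress within deadline
		case <-ctx.Done():
			return ctx.Err()
		}
	}
}

func main() {
	progress := make(chan struct{})
	go func() {
		for i := 0; i < 5; i++ { // five slow progress ticks, then done
			progress <- struct{}{}
			time.Sleep(50 * time.Millisecond)
		}
		close(progress)
	}()
	// Deadline far longer than the tick interval: the pull completes.
	fmt.Println(pullWithProgressDeadline(context.Background(), progress, 200*time.Millisecond))
}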

@JossWhittle
Author

JossWhittle commented Aug 14, 2022

Thank you @afbjorklund.

If it is from the container runtime, why does docker pull on its own work? I don't understand why this would be desirable behaviour in any situation. Images are rarely small, and the error response is almost meaningless for debugging the issue.

I've installed minikube in a very standard way (sudo install ...) on a machine that had docker installed. I've never looked at cri-dockerd specifically, so it must just be using all defaults.

@JossWhittle
Author

Also, I can see the whole 600MB image being pulled in my network monitoring. It's more than a trickle, it's a firehose :)

@afbjorklund
Collaborator

afbjorklund commented Aug 14, 2022

If it is from the container runtime, why does docker pull on its own work?

I mean it is in the CRI, which is the murky gray zone (DMZ) between them,
i.e. it is not in Kubernetes (anymore, since 1.24) and it is not in Docker et al.

Also, the code comments say "10 seconds" timeout rather than "1 minute" ?

Wonder if there is another layer involved, like a timeout from the gRPC in CRI etc.
Like you say, the error message doesn't immediately suggest where to look either.


EDIT: The 10 seconds was how often it updates the log, then a timeout after 60

@JossWhittle
Author

JossWhittle commented Aug 14, 2022

Thanks for your help on this, especially on a Sunday. I guess I will look at methods of getting images into minikube from a local registry, and/or look at swapping to the podman runtime.

I've got a colleague at another organization who has confirmed the behaviour independently.

It feels almost absurd that the docker runtime would have this as a default behaviour, since it just makes it silently unusable.

Also, the code comments say "10 seconds" timeout rather than "1 minute" ?

It swaps to the context deadline exceeded response after one minute. By this point the network traffic from the image pull has ended.

@afbjorklund
Collaborator

It feels almost absurd that the docker runtime would have this as a default behaviour, since it just makes it silently unusable.

It would be interesting to know if 1.23 and earlier also have this problem, or if it is a regression due to the dockershim removal in 1.24?

@JossWhittle
Author

Ahh I didn't realize something major like that had happened recently.

@afbjorklund
Collaborator

afbjorklund commented Aug 14, 2022

We do see this behaviour in other places, when loading a large image such as KIC base into a new cluster.

Then it gives progress for the download, but then there is a big old freeze at 99% when it is loading...

To put it in old world terms, you get progress output from the "wget" but nothing from the "tar"

It is especially bad if you already have it loaded, since then the download progress is empty as well

@afbjorklund
Collaborator

afbjorklund commented Aug 14, 2022

Ahh I didn't realize something major like that had happened recently.

I'm not sure transferring a component to another power is a major event ?

https://www.mirantis.com/blog/the-future-of-dockershim-is-cri-dockerd/

But sure, both CRI and CNI have been snoozed upon long enough (in Docker)

@JossWhittle
Author

Is there a way to see pull progress during the pull for a config apply, or only when doing minikube image pull?

Currently I'm just sitting in the dark until it swaps to the error after a minute.

@afbjorklund
Collaborator

afbjorklund commented Aug 14, 2022

Hmm, as far as I know it uses the docker client which calls the REST API - so you are at the mercy of those.

But it is supposed to output everything in the log, at the "info" level. So it's hidden somewhere in journald

minikube ssh -- sudo journalctl -u cri-docker.service

Nice: Using CNI configuration file /etc/cni/net.d/100-crio-bridge.conf - that reminds me of a backlog item...
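To watch it live while a pull is hanging, one can follow the unit's journal from outside (assuming the same unit name):

minikube ssh -- sudo journalctl -u cri-docker.service -f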


EDIT: I take my earlier comments back, it does seem to be outputting progress for both download and extract:

Aug 14 15:58:42 minikube cri-dockerd[874]: time="2022-08-14T15:58:42Z" level=info msg="Pulling image python: 8a335986117b: Downloading [=======>                                           ]  31.11MB/197.5MB"
Aug 14 15:58:52 minikube cri-dockerd[874]: time="2022-08-14T15:58:52Z" level=info msg="Pulling image python: 8a335986117b: Downloading [============================>                      ]  113.5MB/197.5MB"
Aug 14 15:59:02 minikube cri-dockerd[874]: time="2022-08-14T15:59:02Z" level=info msg="Pulling image python: 8a335986117b: Extracting [==>                                                ]  8.913MB/197.5MB"
Aug 14 15:59:08 minikube cri-dockerd[874]: time="2022-08-14T15:59:08Z" level=info msg="Stop pulling image python: Status: Downloaded newer image for python:latest"

minikube ssh -- sudo crictl pull python

@afbjorklund added the co/runtime/docker label Aug 14, 2022
@JossWhittle
Author

From earlier pulling my images as part of the deployment...

Aug 14 15:48:09 minikube cri-dockerd[885]: time="2022-08-14T15:48:09Z" level=error msg="ContainerStats resp: {0xc000a1de40 linux}"
Aug 14 15:48:09 minikube cri-dockerd[885]: time="2022-08-14T15:48:09Z" level=error msg="ContainerStats resp: {0xc00039f8c0 linux}"
Aug 14 15:48:09 minikube cri-dockerd[885]: time="2022-08-14T15:48:09Z" level=error msg="ContainerStats resp: {0xc000aba040 linux}"
Aug 14 15:48:09 minikube cri-dockerd[885]: time="2022-08-14T15:48:09Z" level=error msg="ContainerStats resp: {0xc00039f180 linux}"
Aug 14 15:48:10 minikube cri-dockerd[885]: time="2022-08-14T15:48:10Z" level=info msg="Pulling image quay.io/USER/IMAGE:TAG: e6094af204bb: Extracting [==================>                                ]  4.456MB/12.33MB"
Aug 14 15:48:19 minikube cri-dockerd[885]: time="2022-08-14T15:48:19Z" level=error msg="ContainerStats resp: {0xc000ab9740 linux}"
Aug 14 15:48:20 minikube cri-dockerd[885]: time="2022-08-14T15:48:20Z" level=info msg="Pulling image quay.io/USER/IMAGE:TAG: 3249d9fd0e04: Extracting [==============>                                    ]   79.1MB/265.3MB"
Aug 14 15:48:20 minikube cri-dockerd[885]: time="2022-08-14T15:48:20Z" level=error msg="ContainerStats resp: {0xc000abbfc0 linux}"
Aug 14 15:48:21 minikube cri-dockerd[885]: time="2022-08-14T15:48:21Z" level=error msg="ContainerStats resp: {0xc000144cc0 linux}"
Aug 14 15:48:21 minikube cri-dockerd[885]: time="2022-08-14T15:48:21Z" level=error msg="ContainerStats resp: {0xc000b63900 linux}"
Aug 14 15:48:21 minikube cri-dockerd[885]: time="2022-08-14T15:48:21Z" level=error msg="ContainerStats resp: {0xc000a30280 linux}"
Aug 14 15:48:21 minikube cri-dockerd[885]: time="2022-08-14T15:48:21Z" level=error msg="ContainerStats resp: {0xc000145680 linux}"
Aug 14 15:48:21 minikube cri-dockerd[885]: time="2022-08-14T15:48:21Z" level=error msg="ContainerStats resp: {0xc000a30d40 linux}"
Aug 14 15:48:21 minikube cri-dockerd[885]: time="2022-08-14T15:48:21Z" level=error msg="ContainerStats resp: {0xc000a31000 linux}"
Aug 14 15:48:28 minikube cri-dockerd[885]: time="2022-08-14T15:48:28Z" level=info msg="Stop pulling image quay.io/USER/IMAGE:TAG: 3249d9fd0e04: Extracting [=============================>                     ]  154.3MB/265.3MB"
Aug 14 15:48:32 minikube cri-dockerd[885]: time="2022-08-14T15:48:32Z" level=error msg="ContainerStats resp: {0xc000a31900 linux}"
Aug 14 15:48:33 minikube cri-dockerd[885]: time="2022-08-14T15:48:33Z" level=error msg="ContainerStats resp: {0xc000a30200 linux}"
Aug 14 15:48:34 minikube cri-dockerd[885]: time="2022-08-14T15:48:34Z" level=error msg="ContainerStats resp: {0xc000a31740 linux}"
Aug 14 15:48:34 minikube cri-dockerd[885]: time="2022-08-14T15:48:34Z" level=error msg="ContainerStats resp: {0xc000b63580 linux}"

After running minikube ssh -- sudo crictl pull python ...

Aug 14 16:04:16 minikube cri-dockerd[885]: time="2022-08-14T16:04:16Z" level=error msg="ContainerStats resp: {0xc000a2f100 linux}"
Aug 14 16:04:16 minikube cri-dockerd[885]: time="2022-08-14T16:04:16Z" level=error msg="ContainerStats resp: {0xc000a1cd40 linux}"
Aug 14 16:04:16 minikube cri-dockerd[885]: time="2022-08-14T16:04:16Z" level=error msg="ContainerStats resp: {0xc000a2f340 linux}"
Aug 14 16:04:26 minikube cri-dockerd[885]: time="2022-08-14T16:04:26Z" level=error msg="ContainerStats resp: {0xc000ab8540 linux}"
Aug 14 16:04:27 minikube cri-dockerd[885]: time="2022-08-14T16:04:27Z" level=error msg="ContainerStats resp: {0xc000a2fcc0 linux}"
Aug 14 16:04:28 minikube cri-dockerd[885]: time="2022-08-14T16:04:28Z" level=info msg="Pulling image python: 8a335986117b: Downloading [==>                                                ]  10.73MB/197.5MB"
Aug 14 16:04:28 minikube cri-dockerd[885]: time="2022-08-14T16:04:28Z" level=error msg="ContainerStats resp: {0xc000a1c840 linux}"
Aug 14 16:04:28 minikube cri-dockerd[885]: time="2022-08-14T16:04:28Z" level=error msg="ContainerStats resp: {0xc000a1cc80 linux}"
Aug 14 16:04:28 minikube cri-dockerd[885]: time="2022-08-14T16:04:28Z" level=error msg="ContainerStats resp: {0xc000d30d80 linux}"
Aug 14 16:04:28 minikube cri-dockerd[885]: time="2022-08-14T16:04:28Z" level=error msg="ContainerStats resp: {0xc000ab8f00 linux}"
Aug 14 16:04:28 minikube cri-dockerd[885]: time="2022-08-14T16:04:28Z" level=error msg="ContainerStats resp: {0xc000ab9c00 linux}"
Aug 14 16:04:28 minikube cri-dockerd[885]: time="2022-08-14T16:04:28Z" level=error msg="ContainerStats resp: {0xc000d30ec0 linux}"
Aug 14 16:04:38 minikube cri-dockerd[885]: time="2022-08-14T16:04:38Z" level=info msg="Pulling image python: 001c52e26ad5: Extracting [==================================================>]     55MB/55MB"
Aug 14 16:04:39 minikube cri-dockerd[885]: time="2022-08-14T16:04:39Z" level=error msg="ContainerStats resp: {0xc000a2f8c0 linux}"
Aug 14 16:04:40 minikube cri-dockerd[885]: time="2022-08-14T16:04:40Z" level=error msg="ContainerStats resp: {0xc000b63d00 linux}"
Aug 14 16:04:41 minikube cri-dockerd[885]: time="2022-08-14T16:04:41Z" level=error msg="ContainerStats resp: {0xc000ab8780 linux}"
Aug 14 16:04:41 minikube cri-dockerd[885]: time="2022-08-14T16:04:41Z" level=error msg="ContainerStats resp: {0xc00063d940 linux}"
Aug 14 16:04:41 minikube cri-dockerd[885]: time="2022-08-14T16:04:41Z" level=error msg="ContainerStats resp: {0xc000ab9200 linux}"
Aug 14 16:04:41 minikube cri-dockerd[885]: time="2022-08-14T16:04:41Z" level=error msg="ContainerStats resp: {0xc000a1c3c0 linux}"
Aug 14 16:04:41 minikube cri-dockerd[885]: time="2022-08-14T16:04:41Z" level=error msg="ContainerStats resp: {0xc00003be80 linux}"
Aug 14 16:04:41 minikube cri-dockerd[885]: time="2022-08-14T16:04:41Z" level=error msg="ContainerStats resp: {0xc000ab8200 linux}"
Aug 14 16:04:48 minikube cri-dockerd[885]: time="2022-08-14T16:04:48Z" level=info msg="Pulling image python: 9daef329d350: Extracting [============================================>      ]  49.02MB/54.58MB"
Aug 14 16:04:52 minikube cri-dockerd[885]: time="2022-08-14T16:04:52Z" level=error msg="ContainerStats resp: {0xc000144e40 linux}"
Aug 14 16:04:53 minikube cri-dockerd[885]: time="2022-08-14T16:04:53Z" level=error msg="ContainerStats resp: {0xc000a1d480 linux}"
Aug 14 16:04:54 minikube cri-dockerd[885]: time="2022-08-14T16:04:54Z" level=error msg="ContainerStats resp: {0xc000a1df80 linux}"
Aug 14 16:04:54 minikube cri-dockerd[885]: time="2022-08-14T16:04:54Z" level=error msg="ContainerStats resp: {0xc000e02ec0 linux}"
Aug 14 16:04:54 minikube cri-dockerd[885]: time="2022-08-14T16:04:54Z" level=error msg="ContainerStats resp: {0xc000b62040 linux}"
Aug 14 16:04:54 minikube cri-dockerd[885]: time="2022-08-14T16:04:54Z" level=error msg="ContainerStats resp: {0xc000a32540 linux}"
Aug 14 16:04:54 minikube cri-dockerd[885]: time="2022-08-14T16:04:54Z" level=error msg="ContainerStats resp: {0xc000163380 linux}"
Aug 14 16:04:54 minikube cri-dockerd[885]: time="2022-08-14T16:04:54Z" level=error msg="ContainerStats resp: {0xc000e02e80 linux}"
Aug 14 16:04:58 minikube cri-dockerd[885]: time="2022-08-14T16:04:58Z" level=info msg="Pulling image python: 8a335986117b: Extracting [===========>                                       ]  44.01MB/197.5MB"
Aug 14 16:05:04 minikube cri-dockerd[885]: time="2022-08-14T16:05:04Z" level=error msg="ContainerStats resp: {0xc000163080 linux}"
Aug 14 16:05:06 minikube cri-dockerd[885]: time="2022-08-14T16:05:06Z" level=error msg="ContainerStats resp: {0xc000b62600 linux}"
Aug 14 16:05:07 minikube cri-dockerd[885]: time="2022-08-14T16:05:07Z" level=error msg="ContainerStats resp: {0xc000ab9b80 linux}"
Aug 14 16:05:07 minikube cri-dockerd[885]: time="2022-08-14T16:05:07Z" level=error msg="ContainerStats resp: {0xc000a2f7c0 linux}"
Aug 14 16:05:07 minikube cri-dockerd[885]: time="2022-08-14T16:05:07Z" level=error msg="ContainerStats resp: {0xc0001629c0 linux}"
Aug 14 16:05:07 minikube cri-dockerd[885]: time="2022-08-14T16:05:07Z" level=error msg="ContainerStats resp: {0xc00003be40 linux}"
Aug 14 16:05:07 minikube cri-dockerd[885]: time="2022-08-14T16:05:07Z" level=error msg="ContainerStats resp: {0xc0001631c0 linux}"
Aug 14 16:05:07 minikube cri-dockerd[885]: time="2022-08-14T16:05:07Z" level=error msg="ContainerStats resp: {0xc000b63a00 linux}"
Aug 14 16:05:08 minikube cri-dockerd[885]: time="2022-08-14T16:05:08Z" level=info msg="Pulling image python: 8a335986117b: Extracting [====================================>              ]  144.8MB/197.5MB"
Aug 14 16:05:17 minikube cri-dockerd[885]: time="2022-08-14T16:05:17Z" level=error msg="ContainerStats resp: {0xc00063c900 linux}"
Aug 14 16:05:18 minikube cri-dockerd[885]: time="2022-08-14T16:05:18Z" level=info msg="Pulling image python: f40cdf3ebc10: Extracting [============>                                      ]  5.046MB/20.04MB"
Aug 14 16:05:18 minikube cri-dockerd[885]: time="2022-08-14T16:05:18Z" level=error msg="ContainerStats resp: {0xc000a31240 linux}"
Aug 14 16:05:19 minikube cri-dockerd[885]: time="2022-08-14T16:05:19Z" level=error msg="ContainerStats resp: {0xc000a2ff40 linux}"
Aug 14 16:05:19 minikube cri-dockerd[885]: time="2022-08-14T16:05:19Z" level=error msg="ContainerStats resp: {0xc00039f9c0 linux}"
Aug 14 16:05:19 minikube cri-dockerd[885]: time="2022-08-14T16:05:19Z" level=error msg="ContainerStats resp: {0xc00063dd80 linux}"
Aug 14 16:05:19 minikube cri-dockerd[885]: time="2022-08-14T16:05:19Z" level=error msg="ContainerStats resp: {0xc000e02380 linux}"
Aug 14 16:05:19 minikube cri-dockerd[885]: time="2022-08-14T16:05:19Z" level=error msg="ContainerStats resp: {0xc00077c280 linux}"
Aug 14 16:05:20 minikube cri-dockerd[885]: time="2022-08-14T16:05:20Z" level=error msg="ContainerStats resp: {0xc000b63600 linux}"
Aug 14 16:05:22 minikube cri-dockerd[885]: time="2022-08-14T16:05:22Z" level=info msg="Stop pulling image python: Status: Downloaded newer image for python:latest"
Aug 14 16:05:30 minikube cri-dockerd[885]: time="2022-08-14T16:05:30Z" level=error msg="ContainerStats resp: {0xc000a615c0 linux}"
Aug 14 16:05:31 minikube cri-dockerd[885]: time="2022-08-14T16:05:31Z" level=error msg="ContainerStats resp: {0xc000a1da80 linux}"
Aug 14 16:05:32 minikube cri-dockerd[885]: time="2022-08-14T16:05:32Z" level=error msg="ContainerStats resp: {0xc000e027c0 linux}"
Aug 14 16:05:32 minikube cri-dockerd[885]: time="2022-08-14T16:05:32Z" level=error msg="ContainerStats resp: {0xc000162ec0 linux}"

@JossWhittle
Author

My image has a "Stop pulling image" line partway through grabbing some of the layers.

@afbjorklund
Collaborator

afbjorklund commented Aug 14, 2022

It might be a regression. (from dockershim to cri-dockerd)

}

func (d *kubeDockerClient) Info() (*dockertypes.Info, error) {
-	ctx, cancel := d.getTimeoutContext()
+	ctx, cancel := context.WithTimeout(context.Background(), d.timeout)
	defer cancel()
	resp, err := d.client.Info(ctx)
	if ctxErr := contextError(ctx); ctxErr != nil {
 	}
 	opts.RegistryAuth = base64Auth
-	ctx, cancel := d.getCancelableContext()
+	ctx, cancel := context.WithTimeout(context.Background(), d.timeout)
 	defer cancel()
 	resp, err := d.client.ImagePull(ctx, image, opts)
 	if err != nil {

So it looks like a copy/paste error. (needs to be reported to Mirantis)

// getCancelableContext returns a new cancelable context. For long running requests without timeout, we use cancelable
// context to avoid potential resource leak, although the current implementation shouldn't leak resource.
func (d *kubeDockerClient) getCancelableContext() (context.Context, context.CancelFunc) {
        return context.WithCancel(context.Background())
}

// getTimeoutContext returns a new context with default request timeout
func (d *kubeDockerClient) getTimeoutContext() (context.Context, context.CancelFunc) {
        return context.WithTimeout(context.Background(), d.timeout)
}
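To see why the swap matters, here is a minimal standalone demo (my own sketch, not cri-dockerd code): a timeout context aborts a still-running operation at the wall-clock deadline, while a cancelable context lets it run to completion.

package main

import (
	"context"
	"fmt"
	"time"
)

// simulatePull stands in for a long-running ImagePull call that
// finishes after d unless its context ends first.
func simulatePull(ctx context.Context, d time.Duration) error {
	select {
	case <-time.After(d):
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

func main() {
	// The cri-dockerd behaviour: a fixed timeout, like getTimeoutContext().
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	fmt.Println(simulatePull(ctx, 2*time.Second)) // context deadline exceeded

	// The dockershim behaviour: cancelable only, like getCancelableContext().
	ctx2, cancel2 := context.WithCancel(context.Background())
	defer cancel2()
	fmt.Println(simulatePull(ctx2, 2*time.Second)) // <nil>
}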

@afbjorklund
Collaborator

afbjorklund commented Aug 14, 2022

Note that it also looks totally unrelated to the ImagePullProgressDeadline setting, oops.

Then again, the code says that short operations are two minutes (not the one minute observed)

@JossWhittle
Author

Note that it also looks totally unrelated to the ImagePullProgressDeadline setting, oops.

No worries. I wouldn't have had a clue where to look without your help.

@JossWhittle
Author

Short term, moving to a different runtime seems to be the solution.

@JossWhittle
Author

Then again, the code says that short operations are two minutes (not the one minute observed)

It's definitely closer to 1 minute than 2. I saw the 2 minute default the other day but we ruled it out at the time because we were definitely seeing it fail sooner.

@afbjorklund
Collaborator

afbjorklund commented Aug 14, 2022

Short term, moving to a different runtime seems to be the solution.

That was supposed to be the long term solution... :-) But, as the saying goes, now you have two problems ?

Once we can sort out the legacy docker socket issues, we too might move over to containerd as the default runtime.

@JossWhittle
Author

Thanks again for all your help. Enjoy your Sunday :)

@afbjorklund added the kind/bug label and removed the kind/support label Aug 14, 2022
@afbjorklund
Collaborator

It would be nice to confirm two things:

  1. that this did not happen with --kubernetes-version=v1.23.9

  2. that the timeout is 1m59s, and not 1m00s (which needs finding)

If so, then we know it is the linked issue.

@afbjorklund
Collaborator

afbjorklund commented Aug 14, 2022

I turned my network off/on halfway through, and it seemed to be ~2 minutes (with a granularity of 10 seconds).

After that you get the update in the log ("Stop pulling image", but with progress still ongoing) and:

FATA[0119] pulling image: rpc error: code = Unknown desc = context deadline exceeded

You still get the abort (after 1 minute) with 1.23.9 too, but only when the network is completely off.

@afbjorklund
Collaborator

afbjorklund commented Aug 21, 2022

Fixed: Mirantis/cri-dockerd@460da8e

Should be in cri-dockerd 0.2.6

@afbjorklund
Collaborator

Workaround, meanwhile:

minikube ssh docker pull
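That is, pull through the node's Docker daemon directly, where no progress deadline applies; the kubelet then finds the image in the local cache (assuming imagePullPolicy: IfNotPresent). For example, with the redacted image from this issue:

minikube ssh -- docker pull quay.io/USER/IMAGE:TAG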

@henryrior

Hey all,

First off, this thread is a lifesaver; my team has been trying to figure out this same issue for quite a while now.

We are pulling a very large image as well (~15 GB) and are reliant, for better or worse, on minikube for local development right now. I saw that your pull request to bump cri-dockerd to 0.2.5 was still open/pending, so I tried installing cri-dockerd v0.2.5 from source inside the minikube VM according to the README instructions. Unfortunately, while the installation ostensibly went fine, the timeout is still happening when I try to pull the image.

I have two questions:

  1. Have you confirmed on your local environment that bumping up the cri-dockerd version fixes the timeout?
    if yes, then
  2. Am I missing some step in configuring the updated cri-dockerd service inside minikube?

Thanks for the help!

@afbjorklund
Collaborator

afbjorklund commented Aug 30, 2022

@henryrior
You need to patch cri-dockerd, there is no release containing the patch yet (but it should be in 0.2.6)
https://github.com/Mirantis/cri-dockerd/commit/460da8ef84e7d2781ee275907543764b6a77c5ff.patch


For minikube, updating the cri-dockerd commit and rebuilding should do it.
Probably to master: 22b25c4458c16ec3e1d707c5ee88e45059b83f6b

It will still be broken unless a CNI plugin is explicitly enabled.
minikube start --enable-default-cni (or somesuch)
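For reference, building a patched cri-dockerd from source looks roughly like this (a sketch; the commit hashes are the ones mentioned above, and the build step follows the project README):

git clone https://github.com/Mirantis/cri-dockerd.git
cd cri-dockerd
git checkout 22b25c4458c16ec3e1d707c5ee88e45059b83f6b    # master commit containing the fix
# or apply the fix on top of a release tag:
# curl -sL https://github.com/Mirantis/cri-dockerd/commit/460da8ef84e7d2781ee275907543764b6a77c5ff.patch | git am
mkdir -p bin && go build -o bin/cri-dockerd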

@henryrior

I was actually building from latest master, so the changes should in theory be in. I will try out that flag after rebuilding again and see how that goes. Thanks again.

@henryrior

Turns out using a different container runtime, as Joss talked about earlier, fixed our problem, e.g. minikube start --container-runtime=containerd. The images are pulling and building just fine now. Wouldn't have been able to figure that out without the debugging help here, so thanks to both of you!

@afbjorklund
Collaborator

afbjorklund commented Aug 31, 2022

It was working with docker in Kubernetes 1.23 and earlier (dockershim), and it will be working again in 1.24+ with cri-dockerd 0.2.6

@Phelan164

@afbjorklund I used this minikube version on a Mac M1:

minikube version: v1.27.1
commit: fe869b5d4da11ba318eb84a3ac00f336411de7ba

I started with minikube start --cni=bridge, but I still got a timeout error with a ~12GB docker image (screenshot omitted).
Do you have any suggestions for that?

@afbjorklund
Copy link
Collaborator

It is waiting on this PR:

@SaudxInu

Any updates on this?

@afbjorklund
Collaborator

afbjorklund commented Dec 12, 2022

Not really; it is fixed in cri-dockerd 0.2.6.

A workaround would be to upgrade it

@afbjorklund
Collaborator

Fixed:
