Docker cannot pull images #137

Closed
johannmayer opened this issue Jan 20, 2022 · 38 comments

@johannmayer

johannmayer commented Jan 20, 2022

Hi,

I just installed Colima on a MacBook Pro with Big Sur 11.6.2.

colima version 0.3.2
git commit: 272db4732b90390232ed9bdba955877f46a50552

runtime: docker
arch: x86_64
client: v20.10.12
server: v20.10.11

When I try to pull an image with Docker, I get an i/o timeout error. It seems that the Colima VM doesn't have an internet connection.

docker pull maven
Using default tag: latest
Error response from daemon: Get "https://registry-1.docker.io/v2/": dial tcp: lookup registry-1.docker.io on 192.168.5.3:53: read udp 192.168.5.15:56157->192.168.5.3:53: i/o timeout

Are there any post-install steps to get a connection?

@abiosoft
Owner

Are you behind a VPN connection?

@johannmayer
Author

Yes, I am behind a corporate VPN connection.

@spkane

spkane commented Jan 21, 2022

I am not on a VPN or using docker with colima, but I see a similar issue:

I get a DNS-related error on my first build with nerdctl via containerd after I have started the Alpine VM.
Simply re-running the command fixes things until I restart the VM.

First Try:
$ nerdctl build --namespace k8s.io --platform linux/amd64 -t test/test:local -f ./Dockerfile .
[+] Building 0.2s (4/4) FINISHED
...
error: failed to solve: alpine:latest: failed to do request: Head "https://registry-1.docker.io/v2/library/alpine/manifests/latest": dial tcp: lookup registry-1.docker.io on [::1]:53: read udp [::1]:45220->[::1]:53: read: connection refused
FATA[0000] unrecognized image format
FATA[0000] exit status 1

Second Try:

$ nerdctl build --namespace k8s.io --platform linux/amd64 -t test/test:local -f ./Dockerfile .
[+] Building 0.2s (4/4) FINISHED
...
[+] Building 9.7s (7/17)
 => [internal] load build definition from Dockerfile                                  0.1s
 => => transferring dockerfile: 580B                                                  0.1s
 => [internal] load .dockerignore                                                     0.1s
 => => transferring context: 306B                                                     0.1s
 => [internal] load metadata for docker.io/library/alpine:latest                      0.4s
 => [internal] load metadata for docker.io/library/golang:1.17
...

@cschmatzler

I am running into the same error, without any VPN connection.

❯ colima version
colima version 0.3.2
git commit: 272db4732b90390232ed9bdba955877f46a50552

runtime: docker
arch: aarch64
client: v20.10.10
server: v20.10.11

@starvsion

starvsion commented Jan 24, 2022

I resolved it by doing colima start --port-interface 127.0.0.1

Correction: colima start --port-interface 127.0.0.1 -s

But it still fails after pulling more data.

@niroowns

For those of us behind a VPN, how do I configure docker to use a proxy?

@spkane

spkane commented Jan 26, 2022

This is a good overview of DNS issues in Alpine and might be at the core of some of these DNS issues:

https://support.cloudbees.com/hc/en-us/articles/360040999471-UnknownHostException-caused-by-DNS-Resolution-issue-with-Alpine-Images

Their main fix was to migrate to RedHat's Universal Base Images (UBI) - https://developers.redhat.com/products/rhel/ubi

There is also a workaround, which I will try when I have a bit of time to test it.

@pensatocriminale

I am seeing this issue now too, after it had been working for me initially, e.g. -

% docker pull lscr.io/linuxserver-labs/daedalos
Using default tag: latest
Error response from daemon: Get "https://ghcr.io/v2/": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)

I see this on every network I have tested.

@yoedusvany

yoedusvany commented Feb 1, 2022

Same here
docker pull hello-world
Using default tag: latest
error during connect: Post "http://%2FUsers%2Fxxxxxx%2F.colima%2Fdocker.sock/v1.41/images/create?fromImage=hello-word&tag=latest": EOF

@AlexLombry

AlexLombry commented Feb 2, 2022

Hello, I have this error too: Error response from daemon: Get "https://registry-1.docker.io/v2/": dial tcp: lookup registry-1.docker.io on 192.168.5.3:53: read udp 192.168.5.15:33676->192.168.5.3:53: i/o timeout
Sometimes it's a timeout, sometimes another error.

I installed it on macOS without any VPN whatsoever, so I don't understand the issue. I've also tested multiple configurations such as Rancher Desktop, minikube + hyperkit, and podman, and I only have this issue with Colima.

Has anyone found a solution?

For instance, if I run docker run hello-world, it works for almost 30 seconds after Colima starts.
Then it crashes and I get this error.
After that, the error happens every time.

@wolf31o2

It's Alpine. The musl DNS resolver is pretty terrible. It behaves differently from glibc in many ways.

@abiosoft
Owner

> It's Alpine. The musl DNS resolver is pretty terrible. It behaves differently from glibc in many ways.

I am just realising this

@spkane

spkane commented Feb 17, 2022

@pedantic79

I've been experiencing random DNS failures too, especially when making many queries in quick succession. Would having a caching DNS server sit between the QEMU DNS and the containers help? I may try to set one up manually to see if it helps the situation.

@jandubois

I'm not convinced the differences between glibc and musl are the root cause here; unless colima does something different, there should be only a single nameserver in /etc/resolv.conf, and it should point to the lima internal host resolver.

I found one bug with this very recently: we disable IPv6 lookups in Lima by default because they often end up not working. The issue, though, was that instead of responding with an empty response, we handed the request to the resolver on the host, which might then add some random error for the IPv6 query to our response.

In my specific test case, I got the right DNS information when I looked with nslookup or dig, but curl could not connect. So I guess the musl resolver could share some blame, but the main blame belongs on our own DNS implementation (at least for this particular case).

This should be fixed in the forthcoming lima 0.8.3 release. So I would appreciate it if you could all re-test with that version (once released), and report back if this improved/fixed the situation!

@abiosoft
Owner

> I'm not convinced the differences between glibc and musl are the root cause here; unless colima does something different, there should be only a single nameserver in /etc/resolv.conf, and it should point to the lima internal host resolver.

This is the case in Colima as well, and the single nameserver is 192.168.5.3.

> This should be fixed in the forthcoming lima 0.8.3 release. So I would appreciate it if you could all re-test with that version (once released), and report back if this improved/fixed the situation!

Looking forward to it. Thanks.
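To check the single-nameserver setup described above on your own VM, a minimal sketch can read a resolv.conf-style file and list or count its `nameserver` entries. This is a hypothetical POSIX shell helper, not part of Colima or Lima; `list_nameservers` and `count_nameservers` are illustrative names, and it assumes the standard `nameserver <address>` line format.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Colima or Lima).
# Print each nameserver address listed in a resolv.conf-style file.
# Usage: list_nameservers [FILE]   (defaults to /etc/resolv.conf)
list_nameservers() {
  # Only lines whose first field is "nameserver" are printed; comments and
  # other keys such as "options" or "search" are ignored.
  awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# Print how many nameserver lines the file contains (0 if none).
count_nameservers() {
  awk '$1 == "nameserver" { n++ } END { print n + 0 }' "${1:-/etc/resolv.conf}"
}
```

For example, copy the VM's file to the host with colima ssh -- cat /etc/resolv.conf > vm-resolv.conf and run list_nameservers vm-resolv.conf; on the setup described above it would be expected to print only 192.168.5.3.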

@navels

navels commented Feb 19, 2022

New colima user here, running into this right off the bat. lima version is 0.8.3, colima 0.3.3. This workaround fixed it for me: #140 (comment)

@pedantic79

> > I'm not convinced the differences between glibc and musl are the root cause here; unless colima does something different, there should be only a single nameserver in /etc/resolv.conf, and it should point to the lima internal host resolver.
>
> This is the case in Colima as well, and the single nameserver is 192.168.5.3.
>
> > This should be fixed in the forthcoming lima 0.8.3 release. So I would appreciate it if you could all re-test with that version (once released), and report back if this improved/fixed the situation!
>
> Looking forward to it. Thanks.

@abiosoft Do we need to wait for a colima release for this? Running colima 0.3.3, and lima 0.8.3.

I experience this error:

Unable to connect to the server: dial tcp: lookup private.hostname.from.internal.company.com on 192.168.5.3:53: read udp 172.17.0.2:34738->192.168.5.3:53: i/o timeout

When I go into the VM:

dnn@overwatch ~ » colima ssh
colima:/Users/dnn$ nslookup private.hostname.from.internal.company.com
;; connection timed out; no servers could be reached

This happens because I'm running a script that is doing the same lookup over and over again very quickly. If I stop for a few minutes and try again, the DNS lookup is okay.
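A minimal way to reproduce this rapid-lookup failure is to resolve the same name many times in a row and count how many attempts fail. This sketch assumes the lookup command exits non-zero when resolution fails; `count_lookup_failures` and the `LOOKUP_CMD` override are hypothetical names for illustration.

```shell
#!/bin/sh
# Hypothetical stress test: resolve HOST N times in quick succession and
# print how many lookups failed.
# Usage: count_lookup_failures HOST [N]
# LOOKUP_CMD (hypothetical override) defaults to nslookup.
count_lookup_failures() {
  host=$1
  n=${2:-50}
  cmd=${LOOKUP_CMD:-nslookup}
  failures=0
  i=0
  while [ "$i" -lt "$n" ]; do
    # $cmd is left unquoted so a command with arguments can be supplied.
    # Only the exit status matters; the resolver's output is discarded.
    if ! $cmd "$host" >/dev/null 2>&1; then
      failures=$((failures + 1))
    fi
    i=$((i + 1))
  done
  echo "$failures"
}
```

Inside the VM (colima ssh), after pasting the function, count_lookup_failures registry-1.docker.io 100 prints the number of failed lookups; a non-zero count on an otherwise working network would match the behaviour described above.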

@abiosoft
Owner

@pedantic79 a lima upgrade should be all that is required.

For troubleshooting purposes, can you kindly try this #140 (comment) and see if the behaviour is different? Note that it requires recreating the VM to see the effect, i.e. running colima delete (if the VM exists) before starting.

@rahul286

rahul286 commented Feb 23, 2022

I also faced the same issue, but it was resolved by specifying a DNS resolver:

colima start --dns 1.1.1.1

@pedantic79

@abiosoft Yes that seems to fix things. I ended up using 192.168.5.2, the host, since work runs a dns proxy on my laptop. This way I can resolve private addresses not on the public DNS.

@abiosoft
Owner

Can anyone try the latest development version and see if anything changes?

brew install --HEAD colima

@navels

navels commented Mar 21, 2022

Nope. A reasonable test for me is to download a large-ish (~1.5 GB) image:

docker image rm localstack/localstack
docker pull localstack/localstack:latest

which will get part of the way through and then stall:

Using default tag: latest
latest: Pulling from localstack/localstack
69bf0018a85c: Pull complete
d99d2ad45cad: Pull complete
2f5e7e852b75: Pull complete
9bdba4da0515: Pull complete
6d148a48367a: Pull complete
4f136f6bab8f: Pull complete
abd3b9714a4d: Pull complete
50eebec84093: Pull complete
a7f30185d16d: Pull complete
a0e7ef63792a: Pull complete
6e070eb76685: Pull complete
6fb969c1cc11: Pull complete
6b72ad47a399: Pull complete
5a968b0e80e9: Pull complete
4f4fb700ef54: Pull complete
f7deb66a5a33: Pull complete
318d55565698: Pull complete
565ac449cbaa: Pull complete
973b9108c62f: Pull complete
abe7f386e549: Pull complete
6af74865c5fb: Pull complete
b4ff06af1df8: Pull complete
b93bdfca7413: Pull complete
6e0f2f6fe87b: Pull complete
348542de0a59: Pull complete
338328b1acd7: Pull complete
343ae7575c43: Retrying in 1 second
ecaf8f60df9e: Retrying in 1 second
c01474015845: Retrying in 1 second
31c659c48f0f: Waiting
b146a65269aa: Waiting
b19b566fb94a: Waiting

and subsequent attempts:

Error response from daemon: Get "https://registry-1.docker.io/v2/": dial tcp: lookup registry-1.docker.io on 192.168.5.3:53: read udp 192.168.5.15:56456->192.168.5.3:53: i/o timeout

making me wonder if I am getting throttled or running out of sockets or something.

Using docker desktop this pull is a breeze.

@abiosoft
Owner

@navels I'd be interested in knowing if there is anything specific about your network connection, as I am struggling to reproduce this.
I do get Retrying in x secs once in a while but the retries are successful and it never gets bad enough for the image pulling to terminate.

Can you kindly share the output of colima version?

Thanks.

@DannyAtDejero

@abiosoft I'm seeing the same timeout and lookup failure as @navels, only in my case it was triggered by pushing a number of images in quick succession instead of pulling a single large one. I've confirmed that docker pull localstack/localstack:latest often fails with endless retry messages for me as well.

% colima version
colima version HEAD-5e2e413
git commit: 5e2e41310e595553dcdc29ba45827d4030af37bb

Other details that might be helpful:

  • Restarting the colima VM with colima stop; colima start resolves the issue temporarily, allowing name lookups to complete again until another large push/pull
  • I noticed this issue against an Azure container registry initially and assumed they were rate limiting. Now that I see it happens with docker.io too, that seems less likely.

Ping output from within the VM used to be very strange with a constantly increasing round trip and DUP packets, but that appears to be fixed in this latest version. 👍
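The temporary workaround above (restart the VM, then retry) can be wrapped in a small retry loop. This is a minimal sketch that assumes the plain docker pull and colima stop / colima start commands used in this thread; `pull_with_restart`, `pull_image` and `restart_vm` are hypothetical names, and the two hooks can be redefined to adapt it.

```shell
#!/bin/sh
# Hypothetical hooks wrapping the commands used in this thread.
pull_image() { docker pull "$1"; }
restart_vm() { colima stop && colima start; }

# Try to pull IMAGE up to ATTEMPTS times (default 2), restarting the VM
# between failed attempts. Returns 0 on the first successful pull.
# Usage: pull_with_restart IMAGE [ATTEMPTS]
pull_with_restart() {
  image=$1
  attempts=${2:-2}
  try=1
  while [ "$try" -le "$attempts" ]; do
    if pull_image "$image"; then
      return 0
    fi
    # No restart after the final failed attempt.
    if [ "$try" -lt "$attempts" ]; then
      echo "pull failed (attempt $try/$attempts); restarting VM" >&2
      restart_vm || return 1
    fi
    try=$((try + 1))
  done
  return 1
}
```

For example, pull_with_restart localstack/localstack:latest 3 tries the pull up to three times with a VM restart between failures. It only automates the workaround and does not address the underlying DNS timeouts.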

@navels

navels commented Mar 21, 2022

> colima version
colima version HEAD-5e2e413
git commit: 5e2e41310e595553dcdc29ba45827d4030af37bb

runtime: docker
arch: aarch64
client: v20.10.13
server: v20.10.11

I have this problem at home and at work, on and off VPN. This is on an M1 Mac Pro. Network speeds are about the same at both locations: ~300 Mbps.

Aha . . . I just tried a few different configurations and it seems to happen with more CPUs. With 1-2 CPUs I didn't have any issues. With 3 I do. My normal configuration is 8 CPUs.

Double-checked my docker desktop config: 8 CPUs.
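To narrow down the CPU-count dependence, the observation above can be turned into a repeatable test: restart the VM with each CPU count and retry the large localstack pull. This is a sketch under assumptions: it relies on colima start accepting --cpu and applying the new count after colima stop, and the hook and function names are hypothetical. A stalled pull can keep retrying indefinitely, so a run may need to be interrupted by hand.

```shell
#!/bin/sh
# Hypothetical hooks with assumed default commands; redefine them to adapt.
stop_vm() { colima stop; }
start_vm() { colima start --cpu "$1"; }
pull_test() {
  # Remove the cached image first so every run downloads it again.
  docker image rm localstack/localstack >/dev/null 2>&1
  docker pull localstack/localstack:latest
}

# For each CPU count given as an argument, restart the VM with that many
# CPUs and print one line: "<cpus>: ok", "pull-failed" or "start-failed".
pull_by_cpu_count() {
  for cpus in "$@"; do
    stop_vm >/dev/null 2>&1
    if ! start_vm "$cpus" >/dev/null 2>&1; then
      echo "$cpus: start-failed"
      continue
    fi
    if pull_test >/dev/null 2>&1; then
      echo "$cpus: ok"
    else
      echo "$cpus: pull-failed"
    fi
  done
}
```

For example, pull_by_cpu_count 1 2 3 8 prints one result line per CPU count, which makes the threshold reported above easy to compare across machines.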

@jasoncodes

I’ve run into these DNS issues too, and I’ve found that changing my DNS to use the gateway of the VDE network works well for me. If you want to see whether this workaround works for you too, try running the following before your test:

colima ssh -- sudo sh -c 'echo nameserver 192.168.106.1 > /etc/resolv.conf'

This temporary patch can be reverted by restarting colima or running the above again with 192.168.5.3. I have the following in ~/.lima/_config/override.yaml to make this change persistent:

useHostResolver: false
dns:
  - 192.168.106.1
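The persistent variant can be scripted as well. This hypothetical helper writes the same override.yaml shown above, taking the directory and nameserver as arguments (defaults taken from this comment), and refuses to overwrite an existing override file because it may hold other settings. It assumes the path and keys exactly as given in this comment; whether Colima's Lima instance reads this file may depend on the Colima and Lima versions involved.

```shell
#!/bin/sh
# Hypothetical helper: persist the nameserver override described above.
# Usage: write_dns_override [DIR] [NAMESERVER]
write_dns_override() {
  dir=${1:-$HOME/.lima/_config}
  ns=${2:-192.168.106.1}
  file="$dir/override.yaml"
  # Do not clobber an existing override; it may contain other settings.
  if [ -e "$file" ]; then
    echo "refusing to overwrite $file" >&2
    return 1
  fi
  mkdir -p "$dir" || return 1
  # Same keys as the override.yaml quoted in this comment.
  cat > "$file" <<EOF
useHostResolver: false
dns:
  - $ns
EOF
}
```

After writing the file, the VM needs a restart for the setting to take effect; the temporary colima ssh patch above remains the quicker way to test first.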

@navels

navels commented Mar 21, 2022

Yep, yep, there are workarounds, just trying to help @abiosoft troubleshoot.

@spkane

spkane commented Mar 21, 2022

I am also still seeing issues with the use case that I reported in #137 (comment)

The first time I run something like:

nerdctl build --namespace k8s.io --platform linux/amd64 -t test/test:local -f ./Dockerfile . it fails with:

After another one or two tries (so likely after some short amount of time from the first attempt) it works and then continues to work.

@abiosoft
Owner

@spkane can you try the latest development version (brew install --HEAD colima) and see if that improves anything?

@abiosoft
Owner

@navels you likely weren't running Colima with VDE networking enabled, as the fix for M1 devices was only just pushed.
Can you try reinstalling with brew install --HEAD colima and removing /opt/colima with sudo rm -rf /opt/colima?

Does that change anything?

@navels

navels commented Mar 24, 2022

Unfortunately no change, fails with 3 CPUs.

colima version HEAD-3fc20b2

@abiosoft
Owner

@navels are you able to see the IP address in the output of colima ls?

@navels

navels commented Mar 24, 2022

Yep: 192.168.106.2

@ramunasd

@abiosoft The latest HEAD has a much more stable network on an Apple M1 CPU with 4 cores enabled, although the DNS issue is still present.

colima version HEAD-37a6de0
git commit: 37a6de0ef4fe631c7b34e69697c5234a9cdd5541

runtime: docker
arch: aarch64
client: v20.10.14
server: v20.10.11

@cognifloyd

cognifloyd commented Apr 20, 2022

Does anyone have Cisco AnyConnect installed?

I have an Intel Mac that I just upgraded from Catalina to Monterey.
Since the upgrade, I've been experiencing various network timeouts, but the DNS issues in Colima were the most pronounced as they blocked my use of docker pull. Outside of Colima, git was often hanging as well, so I didn't think it was a Colima-specific issue and kept looking after I found this one.

I have Cisco AnyConnect installed which I occasionally use to connect to a VPN. After the Monterey update, "Cisco AnyConnect Socket Filter" showed up and asked for permission to run a new SystemExtension. I allowed it at that point, but I think that was the culprit behind all my network issues.
Here are some other issues people experienced with it: https://apple.stackexchange.com/questions/420773/the-process-com-cisco-anyconnect-macos-acsockext-hogs-mac-cpu-but-cannot-be-kill

This service is suspicious (to me) because its "features" are (based on the docs):

  • DNS proxy (aka: screw up the DNS by doing MITM crap)
  • App/Transparent proxy
  • Content filter

So, I just deleted Cisco AnyConnect Socket Filter (deleted it from the Applications) which removed the SystemExtension.
And, I stopped its annoying "notification" service from pestering me about it on reboot.

$ launchctl blame cisco
# this prints a list of the services. You want the gui/...cisco.anyconnect.notification... one.
$ launchctl disable gui/<number>/application.com.cisco.anyconnect.notification.<number>.<number>
$ launchctl stop gui/<number>/application.com.cisco.anyconnect.notification.<number>.<number>
$ launchctl kill 9 gui/<number>/application.com.cisco.anyconnect.notification.<number>.<number>

After doing all of that (and another reboot), dns works in colima again!

@navels

navels commented Sep 28, 2023

I stopped using colima a while ago but just tried this again and am not getting the errors, so either fixed in colima or the Mac networking stack (Sonoma on an M1 Pro).

@abiosoft abiosoft added this to the v0.6.0 milestone Nov 12, 2023
jesse-c pushed a commit to SeldonIO/MLServer that referenced this issue May 30, 2024
* build: Lock GitHub runners' OS

This was motivated by our macOS jobs failing [2] because
colima is missing. It looks like this is because the
latest versions of the macOS runner no longer have
colima installed by default [1].

colima is now explicitly installed.

[1] actions/runner-images#6216
[2] `/Users/runner/work/_temp/f19ffbff-27a9-4fc7-80b6-97791d2de141.sh: line 9: colima: command not found`

* build: Lock Colima

* build: Move macOS Docker installation to script

* build: Move macOS libomp activation to script

* build: Use latest Colima

The > 0.6.0 releases actually fix the issue we have linked [1][2][3].

[1] abiosoft/colima#577
[2] https://github.com/jesse-c/MLServer/blob/c3acd60995a72141027eff506e4fd330fe824179/hack/install-docker-macos.sh#L18-L20
[3] > Switch to new user-v2 network. Fixes abiosoft/colima#648, abiosoft/colima#603, abiosoft/colima#577, abiosoft/colima#779, abiosoft/colima#137, abiosoft/colima#740.

@huybuidev

> @abiosoft I'm seeing the same timeout and lookup failure as @navels, only in my case it was triggered by pushing a number of images in quick succession instead of pulling a single large one. I've confirmed that docker pull localstack/localstack:latest often fails with endless retry messages for me as well.
>
> % colima version
> colima version HEAD-5e2e413
> git commit: 5e2e41310e595553dcdc29ba45827d4030af37bb
>
> Other details that might be helpful:
>
>   • Restarting the colima VM with colima stop; colima start resolves the issue temporarily, allowing name lookups to complete again until another large push/pull
>   • I noticed this issue against an Azure container registry initially and assumed they were rate limiting. Now that I see it happens with docker.io too, that seems less likely.
>
> Ping output from within the VM used to be very strange with a constantly increasing round trip and DUP packets, but that appears to be fixed in this latest version. 👍

colima stop; colima start works for me after searching a while for the solution. Thank you very much!!!
