
Requests hang when pulling from github.com #471

Closed
Hamxter opened this issue Jan 1, 2024 · 5 comments

Comments

Hamxter commented Jan 1, 2024

I've encountered an unusual problem using DinD for which I haven't been able to find a solution. I'm running DinD inside a microk8s cluster to execute DevOps pipelines. The problem is that containers running inside DinD cannot pull ANY content from github.com (and only github.com, as far as I can tell); requests just hang after resolving DNS and connecting.

Here is a sample of a failing request from a container inside DinD. Note that this is not isolated to the tooling or the repository: I've tried cURL and Node to make the request, and I can't even get a response from wget github.com. I've also tried multiple different containers.

/ # docker run -it node:20 sh
# wget https://github.com/helmfile/helmfile/releases/download/v0.158.1/helmfile_0.158.1_linux_amd64.tar.gz
--2023-12-31 23:46:29--  https://github.com/helmfile/helmfile/releases/download/v0.158.1/helmfile_0.158.1_linux_amd64.tar.gz
Resolving github.com (github.com)... 20.248.137.48
Connecting to github.com (github.com)|20.248.137.48|:443... connected.

It just hangs after this point.

However, if I run the same request after exec'ing into DinD itself (not inside a container running in it), it works fine.

/ # wget https://github.com/helmfile/helmfile/releases/download/v0.158.1/helmfile_0.158.1_linux_amd64.tar.gz
Connecting to github.com (20.248.137.48:443)
Connecting to objects.githubusercontent.com (185.199.109.133:443)
saving to 'helmfile_0.158.1_linux_amd64.tar.gz'
helmfile_0.158.1_lin 100% |********************************| 20.3M  0:00:00 ETA
'helmfile_0.158.1_linux_amd64.tar.gz' saved

My DinD deployment is simple:

image:
  repository: docker
  tag: 24-dind
  pullPolicy: IfNotPresent
env:
  DOCKER_TLS_CERTDIR: /certs
securityContext:
  privileged: true

Here are my nodes:

NAME      STATUS                     ROLES    AGE    VERSION   INTERNAL-IP    EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
rachel    Ready                      <none>   415d   v1.28.3   192.168.1.9    <none>        Ubuntu 22.04.3 LTS   5.15.0-91-generic   containerd://1.6.15
roy       Ready                      <none>   415d   v1.28.3   192.168.1.10   <none>        Ubuntu 22.04.3 LTS   5.15.0-91-generic   containerd://1.6.15
deckard   Ready,SchedulingDisabled   <none>   415d   v1.28.3   192.168.1.8    <none>        Ubuntu 22.04.3 LTS   5.15.0-91-generic   containerd://1.6.15

I have tried multiple different versions of DinD and couldn't get any of them to work. I tried replicating this in Docker on my desktop (docker -> dind -> node:20) and it worked fine. Not sure what else to do here, so any help would be greatly appreciated. Thanks!

tianon (Member) commented Jan 4, 2024

Hmm, is this a problem you only started seeing recently? It's really unlikely, but could possibly be related to #466 / #467 / #468 (unlikely because you're on Ubuntu 22.04, which shouldn't have issues with either iptables or nftables 🙈)

Hamxter (Author) commented Jan 8, 2024

This is the first time I have deployed DinD, so I don't have any previous data. I believe I have found the cause of the issue but am unsure how to fix it. I used ksniff to record the packet data of the DinD container during a request.

[screenshot: Wireshark capture of the DinD container during the request]

I think the main thing to look at is the duplicate ClientHello requests: one coming from the node:20 container within DinD (which is expected), but also one coming from the DinD container itself (identified by the Kubernetes pod IP). I made the same request against other sites and can confirm that ALL activity is duplicated (this ranges from DNS lookups to application-data packets; there are always two of everything).

There is another problem: the protocol used in this ClientHello is TLSv1. All requests I made to other sites used TLSv1.3, coming from the same container.

The only conclusion I can come to is that GitHub is ignoring the ClientHello, either because the duplicate requests arrive within such a short period of time, or because of the TLSv1 handshake.

I believe the issue I'm primarily trying to tackle here is the duplicated requests. I'm unsure where to go from here to debug this problem, so any help would be greatly appreciated.

Finally, here is a request similar to the GitHub request, but against a different site, where it completes successfully. Note all of the TCP duplicates.

[screenshot: Wireshark capture of a successful request to a different site]

tianon (Member) commented Jan 24, 2024

Now that #468 is merged and deployed, can you try again? (if it still doesn't work, try with DOCKER_IPTABLES_LEGACY=1 set 👀)

Hamxter (Author) commented Jan 26, 2024

This did not fix the issue. It would be interesting if someone else could run ksniff against their DinD container to see whether network calls are duplicated like on my cluster. It might be a lower-level issue (I'm running microk8s).
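One common lower-level culprit in a setup like this is an MTU mismatch between the pod's CNI interface and the Docker bridge inside DinD. A minimal sketch of that check, using canned `ip link` output below as sample data (on a real cluster the lines would come from `kubectl exec <dind-pod> -- ip link show eth0` and `ip link show docker0` inside DinD; the pod name is a placeholder):

```shell
#!/bin/sh
# Sketch: flag an MTU mismatch between the CNI (outer) interface and the
# Docker bridge inside DinD. The two "ip link" lines below are canned
# samples, standing in for real output from the cluster.

get_mtu() {
  # Extract the number following "mtu" from an "ip link" output line.
  printf '%s\n' "$1" | sed -n 's/.*mtu \([0-9]*\).*/\1/p'
}

CNI_LINK='4: eth0@if12: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1440 qdisc noqueue state UP'
DIND_LINK='3: docker0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state UP'

cni_mtu=$(get_mtu "$CNI_LINK")
dind_mtu=$(get_mtu "$DIND_LINK")

echo "cni mtu=$cni_mtu dind mtu=$dind_mtu"
if [ "$dind_mtu" -gt "$cni_mtu" ]; then
  # Packets larger than the outer MTU cannot leave the node, so bulk
  # transfers from nested containers stall while small ones succeed.
  echo "MISMATCH: inner MTU ($dind_mtu) exceeds outer MTU ($cni_mtu)"
fi
```

An inner MTU larger than the outer one would explain exactly this symptom: handshakes and DNS (small packets) work, while full-size data packets are silently dropped.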

Hamxter (Author) commented Jan 27, 2024

I figured out the problem. I use Project Calico as my CNI, and it uses an MTU of 1440, while DinD defaults to an MTU of 1500. This was discussed in projectcalico/calico#2334. If anyone comes across this in the future: to fix it, I just added args: ["--mtu=1440"] to the DinD deployment.
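For anyone applying the same fix, it slots into the deployment values quoted earlier roughly like this (a sketch; the exact field placement depends on the chart being used, and the `args` entries end up as extra arguments to dockerd):

```yaml
image:
  repository: docker
  tag: 24-dind
  pullPolicy: IfNotPresent
# Extra dockerd arguments; 1440 matches Calico's interface MTU
args: ["--mtu=1440"]
env:
  DOCKER_TLS_CERTDIR: /certs
securityContext:
  privileged: true
```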

Hamxter closed this as completed Jan 27, 2024