calico 3.3.2 with k8s 1.12.3 docker dind network issues #2334
Comments
Same problem on Kubernetes 1.13.0 with Calico 3.3.2 and the datastore type set to kubernetes.
Same problem :( Kubernetes - 1.13.0
This is weird, I'm not aware of anything that would have had such an impact. Did anything else change in the cluster at the same time, e.g. the version of Docker in use? It might be useful to monitor the output of
I upgraded our k8s cluster from 1.11.2 -> 1.12.3 and Docker from 17.03.1 -> 18.06.1. Then I upgraded Calico from 3.1.4 -> 3.3.2, which broke the networking. Downgrading to 3.1.4 fixed it again, so for now I am running Calico 3.1.4. This is a production cluster, so I had to get the networking working again and do not have a cluster in the broken state at the moment. I will try to find a time slot where I can redo the test and have a look at the
TL;DR
I spent a little time putting together a setup that is easier to work with than a GitLab CI/CD pipeline. Deploy this deployment.
Then exec into the bash container with
Try to build the container.
Fetching the file from the bash container works fine.
From the docker:dind container it does not.
If I run tcpdump from the bash container, this is the output. It seems that at some point the return packets are dropped in the stack.
The same pattern shows up on the host.
The iptables-save -c output does not indicate that any drop rule is being hit.
I ran tcpdump on the host towards the endpoint, which showed some ICMP unreachable - need to frag (mtu 1440) errors.
That led me to look for MTU changes in my deployment. It seems that the default MTU changed from 1500 -> 1440, which is a problem with docker:dind because it creates interfaces with an MTU of 1500. That is why our docker builds started to break.
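To make the failure mode concrete, here is a minimal Python sketch of the arithmetic, not a diagnostic tool. It assumes plain IPv4 and TCP headers without options (20 bytes each) and the default 56-byte ping payload; the function names are hypothetical. The inner Docker interface advertises a TCP MSS derived from its own MTU of 1500, so the remote server sends full-size packets that no longer fit through the 1440-byte pod interface, while small ping packets still pass.

```python
# Sketch only: header sizes assume IPv4 and TCP without options.
IP_HEADER = 20
TCP_HEADER = 20
ICMP_HEADER = 8

DIND_MTU = 1500  # MTU of the interfaces docker:dind creates (from the thread)
POD_MTU = 1440   # MTU of the Calico pod network (from the thread)


def advertised_mss(interface_mtu):
    """MSS a TCP endpoint announces, derived from its own interface MTU."""
    return interface_mtu - IP_HEADER - TCP_HEADER


def full_packet_size(mss):
    """Size of the largest IP packet the peer will send for that MSS."""
    return mss + IP_HEADER + TCP_HEADER


def fits(packet_bytes, path_mtu):
    """True if the packet can cross a link with this MTU unfragmented."""
    return packet_bytes <= path_mtu


ping_packet = 56 + ICMP_HEADER + IP_HEADER             # default ping payload
broken = full_packet_size(advertised_mss(DIND_MTU))     # inner MTU left at 1500
aligned = full_packet_size(advertised_mss(POD_MTU))     # inner MTU set to 1440

print("ping fits:", fits(ping_packet, POD_MTU))
print("full packet with inner MTU 1500 fits:", fits(broken, POD_MTU))
print("full packet with inner MTU 1440 fits:", fits(aligned, POD_MTU))
```

This matches the symptoms reported in the issue: ping works, but a larger HTTPS response hangs once the return packets exceed 1440 bytes.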
Awesome sleuthing, and thanks for reporting back! It's odd though, since I'm not aware of the MTU changing in Calico v3.3. I wonder if that got adjusted somehow unintentionally, or if some other bit of configuration has changed? Here's a good place to start for how to adjust your settings: https://docs.projectcalico.org/v3.3/usage/configuration/mtu
It does appear that the CNI MTU configuration went from 1500 -> 1440 in the Calico manifests between v3.1 and v3.2. IIUC, the 1500 value in v3.1 was actually incorrect, since the IPIP tunnel MTU was only configured to be 1440 in the older manifests, which is why it was changed. I'm sorry this wasn't spotted earlier, but I think we can close this now, since all manifests use the 1440 value consistently and it can be easily modified in
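A quick way to check whether every interface inside a pod or container agrees with the 1440 value is to read the MTUs the Linux kernel exposes under /sys/class/net. The following Python sketch assumes a Linux environment; the helper names `read_mtus` and `oversized` are hypothetical and not part of Calico or Docker.

```python
from pathlib import Path


def read_mtus(sysfs_root="/sys/class/net"):
    """Return {interface name: MTU} for every interface found under sysfs_root."""
    mtus = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        return mtus  # not Linux, or sysfs is not mounted
    for iface in sorted(root.iterdir()):
        try:
            mtus[iface.name] = int((iface / "mtu").read_text().strip())
        except (OSError, ValueError):
            continue  # entry is not an interface or has no readable MTU
    return mtus


def oversized(mtus, limit):
    """Return the interfaces whose MTU is larger than the given limit."""
    return {name: mtu for name, mtu in mtus.items() if mtu > limit}


if __name__ == "__main__":
    # 1440 is the pod network MTU discussed in this thread.
    for name, mtu in oversized(read_mtus(), 1440).items():
        print(f"{name}: MTU {mtu} exceeds 1440")
```

Run inside the docker:dind container, any interface reported here would be a candidate for the mismatch described above. Run on the host, physical interfaces with an MTU of 1500 are expected and are not a problem by themselves.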
@knfoo You saved my day. I had exactly the same issue with GitLab and its runner deployed on k8s with GitLab Charts. I'm running a docker-in-docker (dind) runner, and I was experiencing just the same networking issue. Indeed, after running an Passing a I got the same issue on Weave and Calico. Thanks a lot!
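One way to keep the inner Docker daemon's MTU in line with the pod network is to set the `mtu` option in the daemon configuration file (`dockerd` also accepts an equivalent `--mtu` flag). The sketch below only renders such a file; the function name is hypothetical, the value 1440 comes from this thread, and the right value for other CNI plugins such as Weave may differ. This is not necessarily what the commenter above did.

```python
import json


def render_daemon_json(mtu):
    """Render a minimal Docker daemon.json that sets the bridge network MTU."""
    if not isinstance(mtu, int) or mtu <= 0:
        raise ValueError("mtu must be a positive integer")
    return json.dumps({"mtu": mtu}, indent=2)


if __name__ == "__main__":
    # 1440 matches the Calico pod MTU reported in this issue.
    print(render_daemon_json(1440))
```

The rendered text would typically be placed at /etc/docker/daemon.json inside the dind container before the daemon starts.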
Expected Behavior
I have been using Calico in our k8s build cluster, where we run GitLab runners and Jenkins nodes.
We are using GitLab CI/CD with docker:dind to build docker images securely in our cluster.
We have been using Calico 3.1.4 in the cluster and that works as expected: our images get built and pushed to our registry.
Current Behavior
After upgrading to Calico 3.3.2 in our k8s cluster, our builds started to fail. We were unable to build our images with docker:dind.
The behavior is strange.
We are able to ping a website from within the docker:dind container:
However, we are not able to curl an https site:
It just hangs.
Steps to Reproduce (for bugs)
Install a GitLab runner in a k8s cluster with Calico 3.3.2
Register the runner with GitLab
Create a simple project and a pipeline:
Dockerfile:
Create a pipeline file .gitlab-ci.yml:
This will fail to build.
Context
I wanted to upgrade in order to get better network performance: #2073
Your Environment