calico 3.3.2 with k8s 1.12.3 docker dind network issues #2334

Closed
knfoo opened this issue Dec 6, 2018 · 8 comments

@knfoo

knfoo commented Dec 6, 2018

Expected Behavior

I have been using calico in our k8s build cluster where we run gitlab runners and jenkins nodes.
We are using gitlab ci/cd with docker:dind to build docker images securely in our cluster.
We have been using calico 3.1.4 in the cluster and that works as expected: our images get built and pushed to our registry.

Current Behavior

After upgrading to calico 3.3.2 in our k8s cluster, our builds started to fail: we were unable to build our images with docker:dind.
The behavior is strange.
We are able to ping a website from within the docker:dind container:

sh-4.2$ ping rubygems.org
PING rubygems.org (151.101.192.70) 56(84) bytes of data.
64 bytes from 151.101.192.70 (151.101.192.70): icmp_seq=1 ttl=58 time=13.8 ms
64 bytes from 151.101.192.70 (151.101.192.70): icmp_seq=2 ttl=58 time=13.7 ms
^C
--- rubygems.org ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 13.732/13.782/13.833/0.127 ms

However, we are not able to curl an HTTPS site:

sh-4.2$ curl https://rubygems.org/
^C

It just hangs.

Steps to Reproduce (for bugs)

Install a gitlab runner in a k8s cluster with calico 3.3.2
Register the runner with gitlab
Create a simple project and a pipeline:
Dockerfile:

FROM docker.elastic.co/logstash/logstash-oss:6.4.1
RUN logstash-plugin install logstash-output-datadog_logs

Create a pipeline file .gitlab-ci.yml:

image: docker:latest
services:
  - docker:dind

variables:
  DOCKER_HOST: tcp://localhost:2375
  DOCKER_DRIVER: overlay2

stages:
  - build

build-master:
  stage: build
  script:
    - docker build --pull -t ${CONTAINER_TEST_IMAGE} .
    - docker push ${CONTAINER_TEST_IMAGE}

This will fail to build.

Context

I wanted to upgrade in order to get better network performance: #2073

Your Environment

  • Calico version 3.3.2
  • Orchestrator kubernetes 1.12.3
  • Operating System and version: Debian 9 kernel 4.18.0-0.bpo.1-amd64
@mpyatishev

mpyatishev commented Dec 7, 2018

Same problem on kubernetes 1.13.0 with Calico 3.3.2, with the datastore type set to kubernetes.

@hellfosa

hellfosa commented Dec 7, 2018

Same problem :(

Kubernetes - 1.13.0
Calico 3.3.2
Debian 9 Stretch with 4.9.0-8-amd64 #1 SMP Debian 4.9.130-2 (2018-10-27) x86_64
Iptables 1.6.0

@caseydavenport
Member

This is weird, I'm not aware of anything that would have had such an impact.

Did anything else change in the cluster at the same time? e.g. versions of docker in use?

It might be useful to monitor the output of iptables-save -c on the node with the pod which has this issue, after generating some traffic, to see if iptables is dropping the traffic for some reason.

@knfoo
Author

knfoo commented Dec 8, 2018

I upgraded our k8s cluster from 1.11.2 -> 1.12.3 and docker from 17.03.1 -> 18.06.1

Then upgraded calico from 3.1.4 -> 3.3.2 which broke the networking. Downgrading to 3.1.4 fixed it for me again. So for now I am running on calico 3.1.4 as that works.

It is a production cluster, so I had to get networking working again, and I do not have a cluster in the broken state at the moment.

I will try to get a time slot where I can redo the test and look at the iptables-save -c output you suggested.

@knfoo
Author

knfoo commented Dec 11, 2018

TL;DR
The default MTU in calico changed from 1500 -> 1440, which breaks docker:dind because dind sets up its interfaces with an MTU of 1500. On the host system you then see errors like this:

12:50:42.301954 IP 151.101.128.70.443 > 10.61.1.35.52108: Flags [.], seq 1:1449, ack 187, win 59, options [nop,nop,TS val 8265216 ecr 454081341], length 1448 
12:50:42.301978 IP 10.61.1.35 > 151.101.128.70: ICMP 10.61.1.35 unreachable - need to frag (mtu 1440), length 556 
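
The numbers in that capture can be checked with a little arithmetic. This is only a minimal sketch: it assumes IPv4 without IP options and a TCP header carrying just the 12-byte timestamp option (which matches the `nop,nop,TS val ... ecr ...` options shown), and the function names are illustrative, not part of any tool.

```python
# Sketch: does a TCP segment with a given payload fit within a link MTU?
# Assumptions: IPv4 without options, TCP with only the timestamp option.
IPV4_HEADER = 20           # bytes
TCP_HEADER = 20            # bytes, base header without options
TCP_TIMESTAMP_OPTION = 12  # bytes: nop, nop, timestamp (10 bytes)


def ip_packet_size(tcp_payload: int, tcp_options: int = TCP_TIMESTAMP_OPTION) -> int:
    """Total IP packet size for a TCP segment with the given payload length."""
    return IPV4_HEADER + TCP_HEADER + tcp_options + tcp_payload


def advertised_mss(mtu: int) -> int:
    """MSS a host would typically advertise for an interface with this MTU."""
    return mtu - IPV4_HEADER - TCP_HEADER


def fits(tcp_payload: int, mtu: int) -> bool:
    """True if the segment can cross a link with this MTU without fragmentation."""
    return ip_packet_size(tcp_payload) <= mtu


if __name__ == "__main__":
    print("MSS for MTU 1500:", advertised_mss(1500))
    print("MSS for MTU 1440:", advertised_mss(1440))
    print("packet size for payload 1448:", ip_packet_size(1448))
    print("fits MTU 1500:", fits(1448, 1500))
    print("fits MTU 1440:", fits(1448, 1440))
```

Under these assumptions the 1448-byte segment is a 1500-byte packet: it matches the `mss 1460` advertised by an interface with MTU 1500, but it is too large for a 1440-byte link, which is consistent with the `need to frag (mtu 1440)` reply.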

I spent a little time putting together a setup that is easier to work with than a gitlab ci/cd pipeline.

Apply this deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    run: bash
  name: docker-build
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 2
  selector:
    matchLabels:
      run: bash
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        run: bash
    spec:
      containers:
      - image: alpine:latest
        imagePullPolicy: IfNotPresent
        name: bash
        command:
        - /bin/sh
        env:
        - name: DOCKER_HOST
          value: tcp://localhost:2375
        - name: DOCKER_DRIVER
          value: overlay2
        stdin: true
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        tty: true
      - image: docker:dind
        name: docker
        securityContext:
          privileged: true

Then exec into the bash container with sh.
Go into /tmp.
Create a Dockerfile:

FROM docker.elastic.co/logstash/logstash-oss:6.4.1
RUN logstash-plugin install logstash-output-datadog_logs

Try to build the container.

/tmp # docker build .
Sending build context to Docker daemon  2.048kB
Step 1/2 : FROM docker.elastic.co/logstash/logstash-oss:6.4.1
 ---> 05bd87d1e727
Step 2/2 : RUN logstash-plugin install logstash-output-datadog_logs
 ---> Running in cd6bbca6a000
Validating logstash-output-datadog_logs
Unable to download data from https://rubygems.org - timed out (https://api.rubygems.org/latest_specs.4.8.gz)                                                  
ERROR: Installation aborted, verification failed for logstash-output-datadog_logs                                                                             
The command '/bin/sh -c logstash-plugin install logstash-output-datadog_logs' returned a non-zero code: 1

If you fetch the file from the bash container, it works fine:

wget https://api.rubygems.org/latest_specs.4.8.gz
Connecting to api.rubygems.org (151.101.128.70:443)
latest_specs.4.8.gz  100% |**************************************************************************************************************|  1217k  0:00:00 ETA

From the docker:dind container it does not:

docker exec -it angry_mirzakhani sh
sh-4.2$ curl -v https://api.rubygems.org/latest_specs.4.8.gz -o /dev/null
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0* About to connect() to api.rubygems.org port 443 (#0)                          
*   Trying 151.101.192.70...
* Connected to api.rubygems.org (151.101.192.70) port 443 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
*   CAfile: /etc/pki/tls/certs/ca-bundle.crt
  CApath: none

If I run tcpdump from the bash container, this is the output. It seems that at some point return packets are dropped in the stack:

11:36:23.263596 IP 172.17.0.2.33386 > 100.64.64.10.53: 42265+ A? artifacts.elastic.co.default.svc.k8s-c0-eu-central-1-dev.local. (80)                         
11:36:23.264028 IP 100.64.64.10.53 > 172.17.0.2.33386: 42265 NXDomain* 0/1/0 (221)                                                                            
11:36:23.264099 IP 172.17.0.2.59720 > 100.64.64.10.53: 45541+ A? artifacts.elastic.co.svc.k8s-c0-eu-central-1-dev.local. (72)                                 
11:36:23.264457 IP 100.64.64.10.53 > 172.17.0.2.59720: 45541 NXDomain* 0/1/0 (213)                                                                            
11:36:23.264527 IP 172.17.0.2.60200 > 100.64.64.10.53: 28827+ A? artifacts.elastic.co.k8s-c0-eu-central-1-dev.local. (68)                                     
11:36:23.264942 IP 100.64.64.10.53 > 172.17.0.2.60200: 28827 NXDomain* 0/1/0 (209)                                                                            
11:36:23.264998 IP 172.17.0.2.35977 > 100.64.64.10.53: 18371+ A? artifacts.elastic.co.eu-central-1.compute.internal. (68)                                     
11:36:23.266609 IP 100.64.64.10.53 > 172.17.0.2.35977: 18371 NXDomain 0/0/0 (68)                                                                              
11:36:23.266665 IP 172.17.0.2.42938 > 100.64.64.10.53: 36491+ A? artifacts.elastic.co. (38)                                                                   
11:36:23.269681 IP 100.64.64.10.53 > 172.17.0.2.42938: 36491 9/0/0 CNAME dualstack.download-colb-770446651.us-east-1.elb.amazonaws.com., A 184.72.242.47, A 184.73.245.233, A 23.21.67.46, A 107.21.127.184, A 107.21.202.15, A 107.21.237.95, A 107.21.237.188, A 107.21.239.197 (241)                                      
11:36:23.270703 IP 172.17.0.2.42842 > 184.72.242.47.443: Flags [S], seq 3393458478, win 29200, options [mss 1460,sackOK,TS val 3294615075 ecr 0,nop,wscale 7], length 0
11:36:23.358255 IP 184.72.242.47.443 > 172.17.0.2.42842: Flags [S.], seq 1987946351, ack 3393458479, win 26847, options [mss 1460,sackOK,TS val 2050428363 ecr 3294615075,nop,wscale 8], length 0
11:36:23.358288 IP 172.17.0.2.42842 > 184.72.242.47.443: Flags [.], ack 1, win 229, options [nop,nop,TS val 3294615162 ecr 2050428363], length 0              
11:36:23.456308 IP 172.17.0.2.42842 > 184.72.242.47.443: Flags [P.], seq 1:191, ack 1, win 229, options [nop,nop,TS val 3294615260 ecr 2050428363], length 190
11:36:23.543540 IP 184.72.242.47.443 > 172.17.0.2.42842: Flags [.], ack 191, win 110, options [nop,nop,TS val 2050428410 ecr 3294615260], length 0            
11:36:23.545080 IP 184.72.242.47.443 > 172.17.0.2.42842: Flags [P.], seq 1:2959, ack 191, win 110, options [nop,nop,TS val 2050428410 ecr 3294615260], length 2958
11:36:23.545111 IP 172.17.0.2.42842 > 184.72.242.47.443: Flags [.], ack 2959, win 275, options [nop,nop,TS val 3294615349 ecr 2050428410], length 0           
11:36:23.601947 IP 172.17.0.2.42842 > 184.72.242.47.443: Flags [P.], seq 191:266, ack 2959, win 275, options [nop,nop,TS val 3294615406 ecr 2050428410], length 75
11:36:23.602092 IP 172.17.0.2.42842 > 184.72.242.47.443: Flags [P.], seq 266:272, ack 2959, win 275, options [nop,nop,TS val 3294615406 ecr 2050428410], length 6
11:36:23.602224 IP 172.17.0.2.42842 > 184.72.242.47.443: Flags [P.], seq 272:357, ack 2959, win 275, options [nop,nop,TS val 3294615406 ecr 2050428410], length 85
11:36:23.689344 IP 184.72.242.47.443 > 172.17.0.2.42842: Flags [.], ack 272, win 110, options [nop,nop,TS val 2050428446 ecr 3294615406], length 0            
11:36:23.689421 IP 184.72.242.47.443 > 172.17.0.2.42842: Flags [P.], seq 2959:3050, ack 357, win 110, options [nop,nop,TS val 2050428446 ecr 3294615406], length 91
11:36:23.689443 IP 172.17.0.2.42842 > 184.72.242.47.443: Flags [.], ack 3050, win 275, options [nop,nop,TS val 3294615493 ecr 2050428446], length 0           
11:36:23.915018 IP 172.17.0.2.42842 > 184.72.242.47.443: Flags [P.], seq 357:586, ack 3050, win 275, options [nop,nop,TS val 3294615719 ecr 2050428446], length 229
11:36:24.038948 IP 184.72.242.47.443 > 172.17.0.2.42842: Flags [.], ack 586, win 114, options [nop,nop,TS val 2050428534 ecr 3294615719], length 0            
11:36:24.063171 IP 184.72.242.47.443 > 172.17.0.2.42842: Flags [P.], seq 3050:3279, ack 586, win 114, options [nop,nop,TS val 2050428540 ecr 3294615719], length 229
11:36:24.063195 IP 172.17.0.2.42842 > 184.72.242.47.443: Flags [.], ack 3279, win 297, options [nop,nop,TS val 3294615867 ecr 2050428540], length 0           
11:36:24.068926 IP 172.17.0.2.42842 > 184.72.242.47.443: Flags [P.], seq 586:655, ack 3279, win 297, options [nop,nop,TS val 3294615873 ecr 2050428540], length 69
11:36:24.069041 IP 172.17.0.2.42842 > 184.72.242.47.443: Flags [F.], seq 655, ack 3279, win 297, options [nop,nop,TS val 3294615873 ecr 2050428540], length 0 
11:36:24.135335 IP 172.17.0.2.43009 > 100.64.64.10.53: 41449+ SRV? _rubygems._tcp.rubygems.org.default.svc.k8s-c0-eu-central-1-dev.local. (87)                
11:36:24.135811 IP 100.64.64.10.53 > 172.17.0.2.43009: 41449 NXDomain* 0/1/0 (228)                                                                            
11:36:24.141784 IP 172.17.0.2.43009 > 100.64.64.10.53: 28051+ SRV? _rubygems._tcp.rubygems.org.svc.k8s-c0-eu-central-1-dev.local. (79)                        
11:36:24.142097 IP 100.64.64.10.53 > 172.17.0.2.43009: 28051 NXDomain* 0/1/0 (220)                                                                            
11:36:24.144866 IP 172.17.0.2.43009 > 100.64.64.10.53: 487+ SRV? _rubygems._tcp.rubygems.org.k8s-c0-eu-central-1-dev.local. (75)                              
11:36:24.145188 IP 100.64.64.10.53 > 172.17.0.2.43009: 487 NXDomain* 0/1/0 (216)                                                                              
11:36:24.148217 IP 172.17.0.2.43009 > 100.64.64.10.53: 32569+ SRV? _rubygems._tcp.rubygems.org.eu-central-1.compute.internal. (75)                            
11:36:24.149599 IP 100.64.64.10.53 > 172.17.0.2.43009: 32569 NXDomain 0/0/0 (75)                                                                              
11:36:24.151238 IP 172.17.0.2.43009 > 100.64.64.10.53: 3640+ SRV? _rubygems._tcp.rubygems.org. (45)                                                           
11:36:24.152876 IP 100.64.64.10.53 > 172.17.0.2.43009: 3640 1/0/0 SRV api.rubygems.org.:80 0 1 (108)                                                          
11:36:24.157417 IP 184.72.242.47.443 > 172.17.0.2.42842: Flags [.], ack 655, win 114, options [nop,nop,TS val 2050428563 ecr 3294615873], length 0            
11:36:24.157534 IP 184.72.242.47.443 > 172.17.0.2.42842: Flags [P.], seq 3279:3348, ack 655, win 114, options [nop,nop,TS val 2050428563 ecr 3294615873], length 69
11:36:24.157550 IP 184.72.242.47.443 > 172.17.0.2.42842: Flags [F.], seq 3348, ack 655, win 114, options [nop,nop,TS val 2050428563 ecr 3294615873], length 0 
11:36:24.157577 IP 172.17.0.2.42842 > 184.72.242.47.443: Flags [R], seq 3393459133, win 0, length 0                                                           
11:36:24.157583 IP 172.17.0.2.42842 > 184.72.242.47.443: Flags [R], seq 3393459133, win 0, length 0                                                           
11:36:24.157658 IP 184.72.242.47.443 > 172.17.0.2.42842: Flags [.], ack 656, win 114, options [nop,nop,TS val 2050428563 ecr 3294615873], length 0            
11:36:24.157675 IP 172.17.0.2.42842 > 184.72.242.47.443: Flags [R], seq 3393459134, win 0, length 0                                                           
11:36:24.615347 IP 172.17.0.2.34465 > 100.64.64.10.53: 33784+ A? api.rubygems.org.default.svc.k8s-c0-eu-central-1-dev.local. (76)                             
11:36:24.615817 IP 100.64.64.10.53 > 172.17.0.2.34465: 33784 NXDomain* 0/1/0 (217)                                                                            
11:36:24.615894 IP 172.17.0.2.43323 > 100.64.64.10.53: 8983+ A? api.rubygems.org.svc.k8s-c0-eu-central-1-dev.local. (68)                                      
11:36:24.616310 IP 100.64.64.10.53 > 172.17.0.2.43323: 8983 NXDomain* 0/1/0 (209)                                                                             
11:36:24.616362 IP 172.17.0.2.47376 > 100.64.64.10.53: 40811+ A? api.rubygems.org.k8s-c0-eu-central-1-dev.local. (64)                                         
11:36:24.616721 IP 100.64.64.10.53 > 172.17.0.2.47376: 40811 NXDomain* 0/1/0 (205)                                                                            
11:36:24.616771 IP 172.17.0.2.45391 > 100.64.64.10.53: 14229+ A? api.rubygems.org.eu-central-1.compute.internal. (64)                                         
11:36:24.618427 IP 100.64.64.10.53 > 172.17.0.2.45391: 14229 NXDomain 0/0/0 (64)                                                                              
11:36:24.618479 IP 172.17.0.2.51017 > 100.64.64.10.53: 54035+ A? api.rubygems.org. (34)                                                                       
11:36:24.620589 IP 100.64.64.10.53 > 172.17.0.2.51017: 54035 5/0/0 CNAME rubygems.org., A 151.101.192.70, A 151.101.0.70, A 151.101.64.70, A 151.101.128.70 (188)
11:36:24.620757 IP 172.17.0.2.40672 > 151.101.192.70.443: Flags [S], seq 1101682900, win 29200, options [mss 1460,sackOK,TS val 2863438672 ecr 0,nop,wscale 7], length 0
11:36:24.622286 IP 151.101.192.70.443 > 172.17.0.2.40672: Flags [S.], seq 160483392, ack 1101682901, win 28960, options [mss 1460,sackOK,TS val 1217704801 ecr 2863438672,nop,wscale 9], length 0
11:36:24.622309 IP 172.17.0.2.40672 > 151.101.192.70.443: Flags [.], ack 1, win 229, options [nop,nop,TS val 2863438673 ecr 1217704801], length 0             
11:36:24.624749 IP 172.17.0.2.40672 > 151.101.192.70.443: Flags [P.], seq 1:187, ack 1, win 229, options [nop,nop,TS val 2863438676 ecr 1217704801], length 186
11:36:24.626068 IP 151.101.192.70.443 > 172.17.0.2.40672: Flags [.], ack 187, win 59, options [nop,nop,TS val 1217704802 ecr 2863438676], length 0            
11:36:24.639651 IP 151.101.192.70.443 > 172.17.0.2.40672: Flags [P.], seq 4345:5338, ack 187, win 59, options [nop,nop,TS val 1217704806 ecr 2863438676], length 993
11:36:24.639664 IP 172.17.0.2.40672 > 151.101.192.70.443: Flags [.], ack 1, win 244, options [nop,nop,TS val 2863438691 ecr 1217704802,nop,nop,sack 1 {4345:5338}], length 0
11:36:28.474507 ARP, Request who-has 172.17.0.2 tell 172.17.0.1, length 28
11:36:28.474538 ARP, Reply 172.17.0.2 is-at 02:42:ac:11:00:02, length 28
11:37:24.685994 IP 172.17.0.2.40672 > 151.101.192.70.443: Flags [P.], seq 187:194, ack 1, win 244, options [nop,nop,TS val 2863498737 ecr 1217704802,nop,nop,sack 1 {4345:5338}], length 7
11:37:24.686058 IP 172.17.0.2.40672 > 151.101.192.70.443: Flags [F.], seq 194, ack 1, win 244, options [nop,nop,TS val 2863498737 ecr 1217704802,nop,nop,sack 1 {4345:5338}], length 0
11:37:24.688723 IP 151.101.192.70.443 > 172.17.0.2.40672: Flags [F.], seq 5338, ack 195, win 59, options [nop,nop,TS val 1217719818 ecr 2863498737], length 0 
11:37:24.688749 IP 172.17.0.2.40672 > 151.101.192.70.443: Flags [R], seq 1101683095, win 0, length 0 

The same pattern shows up on the host:

tcpdump -nli cali3d6d83f88c6
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on cali3d6d83f88c6, link-type EN10MB (Ethernet), capture size 262144 bytes                                                                          
12:39:11.936160 IP 192.168.123.38.52283 > 100.64.64.10.53: 26143+ AAAA? localhost.default.svc.k8s-c0-eu-central-1-dev.local. (69)                             
12:39:11.936333 IP 192.168.123.38.54554 > 100.64.64.10.53: 34931+ A? localhost.default.svc.k8s-c0-eu-central-1-dev.local. (69)                                
12:39:11.936692 IP 100.64.64.10.53 > 192.168.123.38.52283: 26143 NXDomain* 0/1/0 (210)                                                                        
12:39:11.936711 IP 100.64.64.10.53 > 192.168.123.38.54554: 34931 NXDomain* 0/1/0 (210)                                                                        
12:39:11.936929 IP 192.168.123.38.38242 > 100.64.64.10.53: 42145+ AAAA? localhost.svc.k8s-c0-eu-central-1-dev.local. (61)                                     
12:39:11.936959 IP 192.168.123.38.47108 > 100.64.64.10.53: 11823+ A? localhost.svc.k8s-c0-eu-central-1-dev.local. (61)                                        
12:39:11.937290 IP 100.64.64.10.53 > 192.168.123.38.47108: 11823 NXDomain* 0/1/0 (202)                                                                        
12:39:11.937980 IP 192.168.123.38.41811 > 100.64.64.10.53: 41479+ AAAA? localhost.eu-central-1.compute.internal. (57)                                         
12:39:11.938021 IP 192.168.123.38.37908 > 100.64.64.10.53: 35010+ A? localhost.eu-central-1.compute.internal. (57)                                            
12:39:11.941540 IP 100.64.64.10.53 > 192.168.123.38.37908: 35010 NXDomain 0/1/0 (176)                                                                         
12:39:11.952425 IP 100.64.64.10.53 > 192.168.123.38.41811: 41479 NXDomain 0/1/0 (176)                                                                         
12:39:11.952548 IP 192.168.123.38.33265 > 100.64.64.10.53: 1418+ AAAA? localhost. (27)                                                                        
12:39:11.952581 IP 192.168.123.38.37637 > 100.64.64.10.53: 63190+ A? localhost. (27)                                                                          
12:39:11.953115 IP 100.64.64.10.53 > 192.168.123.38.33265: 1418* 1/0/0 AAAA ::1 (64)                                                                          
12:39:11.953194 IP 100.64.64.10.53 > 192.168.123.38.37637: 63190* 1/0/0 A 127.0.0.1 (52)                                                                      
12:39:16.234808 IP 192.168.123.38.53239 > 100.64.64.10.53: 10496+ A? artifacts.elastic.co.default.svc.k8s-c0-eu-central-1-dev.local. (80)                     
12:39:16.235261 IP 100.64.64.10.53 > 192.168.123.38.53239: 10496 NXDomain* 0/1/0 (221)                                                                        
12:39:16.235386 IP 192.168.123.38.56839 > 100.64.64.10.53: 32049+ A? artifacts.elastic.co.svc.k8s-c0-eu-central-1-dev.local. (72)                             
12:39:16.235777 IP 100.64.64.10.53 > 192.168.123.38.56839: 32049 NXDomain* 0/1/0 (213)                                                                        
12:39:16.235859 IP 192.168.123.38.33147 > 100.64.64.10.53: 31294+ A? artifacts.elastic.co.k8s-c0-eu-central-1-dev.local. (68)                                 
12:39:16.236185 IP 100.64.64.10.53 > 192.168.123.38.33147: 31294 NXDomain* 0/1/0 (209)                                                                        
12:39:16.236277 IP 192.168.123.38.37543 > 100.64.64.10.53: 31304+ A? artifacts.elastic.co.eu-central-1.compute.internal. (68)                                 
12:39:16.237801 IP 100.64.64.10.53 > 192.168.123.38.37543: 31304 NXDomain 0/0/0 (68)                                                                          
12:39:16.237881 IP 192.168.123.38.52167 > 100.64.64.10.53: 24997+ A? artifacts.elastic.co. (38)                                                               
12:39:16.240905 IP 100.64.64.10.53 > 192.168.123.38.52167: 24997 9/0/0 CNAME dualstack.download-colb-770446651.us-east-1.elb.amazonaws.com., A 107.21.237.188, A 107.21.239.197, A 184.72.242.47, A 184.73.245.233, A 23.21.67.46, A 107.21.127.184, A 107.21.202.15, A 107.21.237.95 (241)
12:39:16.241930 IP 192.168.123.38.45240 > 107.21.237.188.443: Flags [S], seq 1474474511, win 29200, options [mss 1460,sackOK,TS val 2598465253 ecr 0,nop,wscale 7], length 0
12:39:16.330299 IP 107.21.237.188.443 > 192.168.123.38.45240: Flags [S.], seq 4228456722, ack 1474474512, win 26847, options [mss 1460,sackOK,TS val 618401623 ecr 2598465253,nop,wscale 8], length 0
12:39:16.330346 IP 192.168.123.38.45240 > 107.21.237.188.443: Flags [.], ack 1, win 229, options [nop,nop,TS val 2598465341 ecr 618401623], length 0          
12:39:16.422107 IP 192.168.123.38.45240 > 107.21.237.188.443: Flags [P.], seq 1:191, ack 1, win 229, options [nop,nop,TS val 2598465433 ecr 618401623], length 190
12:39:16.509927 IP 107.21.237.188.443 > 192.168.123.38.45240: Flags [.], ack 191, win 110, options [nop,nop,TS val 618401668 ecr 2598465433], length 0        
12:39:16.599240 IP 107.21.237.188.443 > 192.168.123.38.45240: Flags [.], seq 1:1449, ack 191, win 110, options [nop,nop,TS val 618401690 ecr 2598465433], length 1448
12:39:16.599252 IP 107.21.237.188.443 > 192.168.123.38.45240: Flags [.], seq 1449:2897, ack 191, win 110, options [nop,nop,TS val 618401690 ecr 2598465433], length 1448
12:39:16.599322 IP 192.168.123.38.45240 > 107.21.237.188.443: Flags [.], ack 1449, win 251, options [nop,nop,TS val 2598465610 ecr 618401690], length 0       
12:39:16.599339 IP 192.168.123.38.45240 > 107.21.237.188.443: Flags [.], ack 2897, win 274, options [nop,nop,TS val 2598465610 ecr 618401690], length 0       
12:39:16.687451 IP 107.21.237.188.443 > 192.168.123.38.45240: Flags [P.], seq 2897:2959, ack 191, win 110, options [nop,nop,TS val 618401712 ecr 2598465610], length 62
12:39:16.687501 IP 192.168.123.38.45240 > 107.21.237.188.443: Flags [.], ack 2959, win 274, options [nop,nop,TS val 2598465699 ecr 618401712], length 0       
12:39:16.710687 IP 192.168.123.38.45240 > 107.21.237.188.443: Flags [P.], seq 191:266, ack 2959, win 274, options [nop,nop,TS val 2598465722 ecr 618401712], length 75
12:39:16.710852 IP 192.168.123.38.45240 > 107.21.237.188.443: Flags [P.], seq 266:272, ack 2959, win 274, options [nop,nop,TS val 2598465722 ecr 618401712], length 6
12:39:16.711021 IP 192.168.123.38.45240 > 107.21.237.188.443: Flags [P.], seq 272:357, ack 2959, win 274, options [nop,nop,TS val 2598465722 ecr 618401712], length 85
12:39:16.798637 IP 107.21.237.188.443 > 192.168.123.38.45240: Flags [.], ack 272, win 110, options [nop,nop,TS val 618401740 ecr 2598465722], length 0        
12:39:16.798818 IP 107.21.237.188.443 > 192.168.123.38.45240: Flags [P.], seq 2959:3050, ack 357, win 110, options [nop,nop,TS val 618401740 ecr 2598465722], length 91
12:39:16.842524 IP 192.168.123.38.45240 > 107.21.237.188.443: Flags [.], ack 3050, win 274, options [nop,nop,TS val 2598465854 ecr 618401740], length 0       
12:39:17.024291 IP 192.168.123.38.45240 > 107.21.237.188.443: Flags [P.], seq 357:586, ack 3050, win 274, options [nop,nop,TS val 2598466035 ecr 618401740], length 229
12:39:17.121643 IP 107.21.237.188.443 > 192.168.123.38.45240: Flags [P.], seq 3050:3279, ack 586, win 114, options [nop,nop,TS val 618401821 ecr 2598466035], length 229
12:39:17.121700 IP 192.168.123.38.45240 > 107.21.237.188.443: Flags [.], ack 3279, win 296, options [nop,nop,TS val 2598466133 ecr 618401821], length 0       
12:39:17.127005 IP 192.168.123.38.45240 > 107.21.237.188.443: Flags [P.], seq 586:655, ack 3279, win 296, options [nop,nop,TS val 2598466138 ecr 618401821], length 69
12:39:17.127139 IP 192.168.123.38.45240 > 107.21.237.188.443: Flags [F.], seq 655, ack 3279, win 296, options [nop,nop,TS val 2598466138 ecr 618401821], length 0
12:39:17.181568 IP 192.168.123.38.59714 > 100.64.64.10.53: 38079+ SRV? _rubygems._tcp.rubygems.org.default.svc.k8s-c0-eu-central-1-dev.local. (87)            
12:39:17.181994 IP 100.64.64.10.53 > 192.168.123.38.59714: 38079 NXDomain* 0/1/0 (228)                                                                        
12:39:17.188047 IP 192.168.123.38.59714 > 100.64.64.10.53: 33703+ SRV? _rubygems._tcp.rubygems.org.svc.k8s-c0-eu-central-1-dev.local. (79)                    
12:39:17.188370 IP 100.64.64.10.53 > 192.168.123.38.59714: 33703 NXDomain* 0/1/0 (220)                                                                        
12:39:17.191122 IP 192.168.123.38.59714 > 100.64.64.10.53: 61446+ SRV? _rubygems._tcp.rubygems.org.k8s-c0-eu-central-1-dev.local. (75)                        
12:39:17.191401 IP 100.64.64.10.53 > 192.168.123.38.59714: 61446 NXDomain* 0/1/0 (216)                                                                        
12:39:17.194423 IP 192.168.123.38.59714 > 100.64.64.10.53: 25088+ SRV? _rubygems._tcp.rubygems.org.eu-central-1.compute.internal. (75)                        
12:39:17.196292 IP 100.64.64.10.53 > 192.168.123.38.59714: 25088 NXDomain 0/0/0 (75)                                                                          
12:39:17.197919 IP 192.168.123.38.59714 > 100.64.64.10.53: 39279+ SRV? _rubygems._tcp.rubygems.org. (45)                                                      
12:39:17.199567 IP 100.64.64.10.53 > 192.168.123.38.59714: 39279 1/0/0 SRV api.rubygems.org.:80 0 1 (108)                                                     
12:39:17.214829 IP 107.21.237.188.443 > 192.168.123.38.45240: Flags [P.], seq 3279:3348, ack 655, win 114, options [nop,nop,TS val 618401844 ecr 2598466138], length 69
12:39:17.214838 IP 107.21.237.188.443 > 192.168.123.38.45240: Flags [F.], seq 3348, ack 655, win 114, options [nop,nop,TS val 618401844 ecr 2598466138], length 0
12:39:17.214894 IP 192.168.123.38.45240 > 107.21.237.188.443: Flags [R], seq 1474475166, win 0, length 0                                                      
12:39:17.214911 IP 192.168.123.38.45240 > 107.21.237.188.443: Flags [R], seq 1474475166, win 0, length 0                                                      
12:39:17.214931 IP 107.21.237.188.443 > 192.168.123.38.45240: Flags [.], ack 656, win 114, options [nop,nop,TS val 618401844 ecr 2598466138], length 0        
12:39:17.214962 IP 192.168.123.38.45240 > 107.21.237.188.443: Flags [R], seq 1474475167, win 0, length 0                                                      
12:39:17.545355 IP 192.168.123.38.55617 > 100.64.64.10.53: 27370+ A? api.rubygems.org.default.svc.k8s-c0-eu-central-1-dev.local. (76)                         
12:39:17.545751 IP 100.64.64.10.53 > 192.168.123.38.55617: 27370 NXDomain* 0/1/0 (217)                                                                        
12:39:17.545876 IP 192.168.123.38.50018 > 100.64.64.10.53: 56861+ A? api.rubygems.org.svc.k8s-c0-eu-central-1-dev.local. (68)                                 
12:39:17.546313 IP 100.64.64.10.53 > 192.168.123.38.50018: 56861 NXDomain* 0/1/0 (209)                                                                        
12:39:17.546386 IP 192.168.123.38.35051 > 100.64.64.10.53: 13029+ A? api.rubygems.org.k8s-c0-eu-central-1-dev.local. (64)                                     
12:39:17.546808 IP 100.64.64.10.53 > 192.168.123.38.35051: 13029 NXDomain* 0/1/0 (205)                                                                        
12:39:17.546883 IP 192.168.123.38.58071 > 100.64.64.10.53: 11196+ A? api.rubygems.org.eu-central-1.compute.internal. (64)                                     
12:39:17.548442 IP 100.64.64.10.53 > 192.168.123.38.58071: 11196 NXDomain 0/0/0 (64)                                                                          
12:39:17.548520 IP 192.168.123.38.57011 > 100.64.64.10.53: 38420+ A? api.rubygems.org. (34)                                                                   
12:39:17.551042 IP 100.64.64.10.53 > 192.168.123.38.57011: 38420 5/0/0 CNAME rubygems.org., A 151.101.192.70, A 151.101.0.70, A 151.101.64.70, A 151.101.128.70 (188)
12:39:17.551235 IP 192.168.123.38.42142 > 151.101.192.70.443: Flags [S], seq 4290272964, win 29200, options [mss 1460,sackOK,TS val 2863611602 ecr 0,nop,wscale 7], length 0
12:39:17.552609 IP 151.101.192.70.443 > 192.168.123.38.42142: Flags [S.], seq 2904291499, ack 4290272965, win 28960, options [mss 1460,sackOK,TS val 8933572 ecr 2863611602,nop,wscale 9], length 0
12:39:17.552639 IP 192.168.123.38.42142 > 151.101.192.70.443: Flags [.], ack 1, win 229, options [nop,nop,TS val 2863611604 ecr 8933572], length 0            
12:39:17.555263 IP 192.168.123.38.42142 > 151.101.192.70.443: Flags [P.], seq 1:187, ack 1, win 229, options [nop,nop,TS val 2863611606 ecr 8933572], length 186
12:39:17.556260 IP 151.101.192.70.443 > 192.168.123.38.42142: Flags [.], ack 187, win 59, options [nop,nop,TS val 8933573 ecr 2863611606], length 0           
12:39:17.577786 IP 151.101.192.70.443 > 192.168.123.38.42142: Flags [P.], seq 4345:5338, ack 187, win 59, options [nop,nop,TS val 8933579 ecr 2863611606], length 993
12:39:17.577814 IP 192.168.123.38.42142 > 151.101.192.70.443: Flags [.], ack 1, win 244, options [nop,nop,TS val 2863611629 ecr 8933573,nop,nop,sack 1 {4345:5338}], length 0
12:40:17.596025 IP 192.168.123.38.42142 > 151.101.192.70.443: Flags [P.], seq 187:194, ack 1, win 244, options [nop,nop,TS val 2863671647 ecr 8933573,nop,nop,sack 1 {4345:5338}], length 7
12:40:17.596083 IP 192.168.123.38.42142 > 151.101.192.70.443: Flags [F.], seq 194, ack 1, win 244, options [nop,nop,TS val 2863671647 ecr 8933573,nop,nop,sack 1 {4345:5338}], length 0
12:40:17.597274 IP 151.101.192.70.443 > 192.168.123.38.42142: Flags [F.], seq 5338, ack 195, win 59, options [nop,nop,TS val 8948583 ecr 2863671647], length 0
12:40:17.597316 IP 192.168.123.38.42142 > 151.101.192.70.443: Flags [R], seq 4290273159, win 0, length 0                                                      
12:40:22.714510 ARP, Request who-has 169.254.1.1 tell 192.168.123.38, length 28
12:40:22.714520 ARP, Reply 169.254.1.1 is-at ee:ee:ee:ee:ee:ee, length 28

The iptables-save -c counters do not show any DROP rule being hit for this workload's interface:

 iptables-save -c | grep cali3d6d83f88c6
:cali-fw-cali3d6d83f88c6 - [0:0]
:cali-tw-cali3d6d83f88c6 - [0:0]
[80053:4600613] -A cali-from-wl-dispatch-3 -i cali3d6d83f88c6 -m comment --comment "cali:geylDCrIvXLfnhgh" -g cali-fw-cali3d6d83f88c6
[100818:5711862] -A cali-fw-cali3d6d83f88c6 -m comment --comment "cali:X5Q_Dx33YJuC1Q9d" -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
[0:0] -A cali-fw-cali3d6d83f88c6 -m comment --comment "cali:8mJROVg6x77VovXF" -m conntrack --ctstate INVALID -j DROP
[2781:243323] -A cali-fw-cali3d6d83f88c6 -m comment --comment "cali:8mOQvzDNUZqhCcYS" -j MARK --set-xmark 0x0/0x10000
[2781:243323] -A cali-fw-cali3d6d83f88c6 -m comment --comment "cali:Zho-neSPitfJfhuw" -j cali-pro-kns.default
[2781:243323] -A cali-fw-cali3d6d83f88c6 -m comment --comment "cali:0WqFDQueFWgxphl7" -m comment --comment "Return if profile accepted" -m mark --mark 0x10000/0x10000 -j RETURN
[0:0] -A cali-fw-cali3d6d83f88c6 -m comment --comment "cali:meDx_SsjAEyUiMFy" -j cali-pro-ksa.default.default
[0:0] -A cali-fw-cali3d6d83f88c6 -m comment --comment "cali:v3TFVcZVw9qBe-pA" -m comment --comment "Return if profile accepted" -m mark --mark 0x10000/0x10000 -j RETURN
[0:0] -A cali-fw-cali3d6d83f88c6 -m comment --comment "cali:WSNBqSQpCWp90umn" -m comment --comment "Drop if no profiles matched" -j DROP
[101721:1294323183] -A cali-to-wl-dispatch-3 -o cali3d6d83f88c6 -m comment --comment "cali:VR-aySC5vnqQIbmR" -g cali-tw-cali3d6d83f88c6
[137489:1695656118] -A cali-tw-cali3d6d83f88c6 -m comment --comment "cali:I0zm3IXsai3x5npD" -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
[0:0] -A cali-tw-cali3d6d83f88c6 -m comment --comment "cali:u__JvXqFROYHIsNZ" -m conntrack --ctstate INVALID -j DROP
[0:0] -A cali-tw-cali3d6d83f88c6 -m comment --comment "cali:0Ym__jjODCi1IKX5" -j MARK --set-xmark 0x0/0x10000
[0:0] -A cali-tw-cali3d6d83f88c6 -m comment --comment "cali:cNnPvuxOBo0hyCEI" -j cali-pri-kns.default
[0:0] -A cali-tw-cali3d6d83f88c6 -m comment --comment "cali:pSFslBlcx4HDPMst" -m comment --comment "Return if profile accepted" -m mark --mark 0x10000/0x10000 -j RETURN
[0:0] -A cali-tw-cali3d6d83f88c6 -m comment --comment "cali:N0Z-Dm-oFe3RTGN7" -j cali-pri-ksa.default.default
[0:0] -A cali-tw-cali3d6d83f88c6 -m comment --comment "cali:W6dcc5-QT3Ud9Tx7" -m comment --comment "Return if profile accepted" -m mark --mark 0x10000/0x10000 -j RETURN
[0:0] -A cali-tw-cali3d6d83f88c6 -m comment --comment "cali:JA5YkyRQJa1WgeyY" -m comment --comment "Drop if no profiles matched" -j DROP
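
Reading the counters by eye gets tedious on a busy node. Here is a minimal sketch of a filter that prints only the DROP rules whose packet counter is non-zero. The function name `hit_drop_rules` is made up for illustration, and it assumes the `[packets:bytes]` prefix that `iptables-save -c` emits.

```shell
# Print only rules that jump to DROP and have a non-zero packet counter.
# Reads `iptables-save -c` output on stdin.
hit_drop_rules() {
  awk '$1 ~ /^\[[0-9]+:[0-9]+\]$/ && $1 !~ /^\[0:/ && /-j DROP/'
}

# Example (needs root on the node):
#   iptables-save -c | grep cali3d6d83f88c6 | hit_drop_rules
```

For the output above this prints nothing, since every DROP rule shows `[0:0]`.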

I ran tcpdump on the host against the endpoint, which shows repeated ICMP "unreachable - need to frag (mtu 1440)" errors:

tcpdump -nli eth0 host 151.101.128.70
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
12:49:48.585996 IP 10.61.1.35.52108 > 151.101.128.70.443: Flags [S], seq 3528876158, win 29200, options [mss 1460,sackOK,TS val 454081318 ecr 0,nop,wscale 7], length 0
12:49:48.587283 IP 151.101.128.70.443 > 10.61.1.35.52108: Flags [S.], seq 1059600824, ack 3528876159, win 28960, options [mss 1460,sackOK,TS val 8251787 ecr 454081318,nop,wscale 9], length 0
12:49:48.587325 IP 10.61.1.35.52108 > 151.101.128.70.443: Flags [.], ack 1, win 229, options [nop,nop,TS val 454081319 ecr 8251787], length 0                 
12:49:48.589839 IP 10.61.1.35.52108 > 151.101.128.70.443: Flags [P.], seq 1:187, ack 1, win 229, options [nop,nop,TS val 454081322 ecr 8251787], length 186   
12:49:48.590816 IP 151.101.128.70.443 > 10.61.1.35.52108: Flags [.], ack 187, win 59, options [nop,nop,TS val 8251788 ecr 454081322], length 0                
12:49:48.592036 IP 151.101.128.70.443 > 10.61.1.35.52108: Flags [P.], seq 1:5338, ack 187, win 59, options [nop,nop,TS val 8251788 ecr 454081322], length 5337
12:49:48.592058 IP 10.61.1.35 > 151.101.128.70: ICMP 10.61.1.35 unreachable - need to frag (mtu 1440), length 556                                             
12:49:48.609737 IP 151.101.128.70.443 > 10.61.1.35.52108: Flags [P.], seq 4345:5338, ack 187, win 59, options [nop,nop,TS val 8251793 ecr 454081322], length 993
12:49:48.609778 IP 10.61.1.35.52108 > 151.101.128.70.443: Flags [.], ack 1, win 244, options [nop,nop,TS val 454081341 ecr 8251788,nop,nop,sack 1 {4345:5338}], length 0
12:49:48.610787 IP 151.101.128.70.443 > 10.61.1.35.52108: Flags [.], seq 1:1449, ack 187, win 59, options [nop,nop,TS val 8251793 ecr 454081341], length 1448 
12:49:48.610808 IP 10.61.1.35 > 151.101.128.70: ICMP 10.61.1.35 unreachable - need to frag (mtu 1440), length 556                                             
12:49:48.817854 IP 151.101.128.70.443 > 10.61.1.35.52108: Flags [.], seq 1:1449, ack 187, win 59, options [nop,nop,TS val 8251845 ecr 454081341], length 1448 
12:49:48.817883 IP 10.61.1.35 > 151.101.128.70: ICMP 10.61.1.35 unreachable - need to frag (mtu 1440), length 556                                             
12:49:49.246242 IP 151.101.128.70.443 > 10.61.1.35.52108: Flags [.], seq 1:1449, ack 187, win 59, options [nop,nop,TS val 8251952 ecr 454081341], length 1448 
12:49:49.246267 IP 10.61.1.35 > 151.101.128.70: ICMP 10.61.1.35 unreachable - need to frag (mtu 1440), length 556                                             
12:49:50.082044 IP 151.101.128.70.443 > 10.61.1.35.52108: Flags [.], seq 1:1449, ack 187, win 59, options [nop,nop,TS val 8252161 ecr 454081341], length 1448 
12:49:50.082095 IP 10.61.1.35 > 151.101.128.70: ICMP 10.61.1.35 unreachable - need to frag (mtu 1440), length 556                                             
12:49:51.742006 IP 151.101.128.70.443 > 10.61.1.35.52108: Flags [.], seq 1:1449, ack 187, win 59, options [nop,nop,TS val 8252576 ecr 454081341], length 1448 
12:49:51.742040 IP 10.61.1.35 > 151.101.128.70: ICMP 10.61.1.35 unreachable - need to frag (mtu 1440), length 556                                             
12:49:55.202214 IP 151.101.128.70.443 > 10.61.1.35.52108: Flags [.], seq 1:1449, ack 187, win 59, options [nop,nop,TS val 8253441 ecr 454081341], length 1448 
12:49:55.202250 IP 10.61.1.35 > 151.101.128.70: ICMP 10.61.1.35 unreachable - need to frag (mtu 1440), length 556                                             
12:50:01.853950 IP 151.101.128.70.443 > 10.61.1.35.52108: Flags [.], seq 1:1449, ack 187, win 59, options [nop,nop,TS val 8255104 ecr 454081341], length 1448 
12:50:01.853986 IP 10.61.1.35 > 151.101.128.70: ICMP 10.61.1.35 unreachable - need to frag (mtu 1440), length 556                                             
12:50:15.165813 IP 151.101.128.70.443 > 10.61.1.35.52108: Flags [.], seq 1:1449, ack 187, win 59, options [nop,nop,TS val 8258432 ecr 454081341], length 1448 
12:50:15.165840 IP 10.61.1.35 > 151.101.128.70: ICMP 10.61.1.35 unreachable - need to frag (mtu 1440), length 556                                             
12:50:42.301954 IP 151.101.128.70.443 > 10.61.1.35.52108: Flags [.], seq 1:1449, ack 187, win 59, options [nop,nop,TS val 8265216 ecr 454081341], length 1448 
12:50:42.301978 IP 10.61.1.35 > 151.101.128.70: ICMP 10.61.1.35 unreachable - need to frag (mtu 1440), length 556                                             
12:50:48.635530 IP 10.61.1.35.52108 > 151.101.128.70.443: Flags [P.], seq 187:194, ack 1, win 244, options [nop,nop,TS val 454141367 ecr 8251788,nop,nop,sack 1 {4345:5338}], length 7
12:50:48.635598 IP 10.61.1.35.52108 > 151.101.128.70.443: Flags [F.], seq 194, ack 1, win 244, options [nop,nop,TS val 454141367 ecr 8251788,nop,nop,sack 1 {4345:5338}], length 0
12:50:48.637850 IP 151.101.128.70.443 > 10.61.1.35.52108: Flags [F.], seq 5338, ack 195, win 59, options [nop,nop,TS val 8266799 ecr 454141367], length 0     
12:50:48.637915 IP 10.61.1.35.52108 > 151.101.128.70.443: Flags [R], seq 3528876353, win 0, length 0

This led me to look for MTU changes in my deployment. It seems the default MTU changed from 1500 to 1440, which is a problem with docker:dind because it creates interfaces with an MTU of 1500. That is why our Docker builds started to break.
Now I just need to figure out how to change the MTU in docker:dind.
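
To make the numbers in the capture concrete, here is a rough sketch of the arithmetic. It assumes a 20-byte IPv4 header with no IP options, a 20-byte TCP header, and the 12 bytes of TCP options (nop,nop,timestamp) visible on the data segments above. The function names are only illustrative.

```shell
# Size of an IPv4 TCP packet for a given payload length and TCP option bytes.
packet_size() {
  payload=$1
  tcp_opts=$2
  echo $(( 20 + 20 + tcp_opts + payload ))  # IPv4 header + TCP header + options + payload
}

# Largest TCP payload that still fits within a given MTU.
max_payload() {
  mtu=$1
  tcp_opts=$2
  echo $(( mtu - 20 - 20 - tcp_opts ))
}

packet_size 1448 12   # prints 1500
max_payload 1440 12   # prints 1388
```

So the retransmitted 1448-byte segments are 1500-byte packets, which do not fit through a 1440-byte MTU, while the small packets of the handshake (and ping) pass without trouble.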

@caseydavenport
Member

Awesome sleuthing, and thanks for reporting back!

It's odd though, since I'm not aware of the MTU changing in Calico v3.3. I wonder if that got adjusted somehow unintentionally, or if some other bit of configuration has changed?

Here's a good place to start for how to adjust your settings: https://docs.projectcalico.org/v3.3/usage/configuration/mtu

@caseydavenport
Member

It does appear that the CNI MTU configuration went from 1500 -> 1440 in the Calico manifests between v3.1 and v3.2.

IIUC, the 1500 value in v3.1 was actually incorrect since the IPIP tunnel MTU was only configured to be 1440 in the older manifests, which is why it was changed.

I'm sorry this wasn't spotted earlier, but I think we can close this now since all manifests are using the 1440 value consistently now, and it can be easily modified in calico.yaml via the veth_mtu setting.
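
For anyone checking which value their copy of the manifest carries, here is a small sketch that pulls the veth_mtu value out of a calico.yaml file. The helper name `get_veth_mtu` is hypothetical, and the pattern assumes the setting appears on one line as `veth_mtu: "1440"` (quotes optional), so adjust it if your manifest is laid out differently.

```shell
# Print the veth_mtu value(s) found in a Calico manifest.
# $1: path to the manifest, e.g. calico.yaml
get_veth_mtu() {
  sed -n -E 's/^[[:space:]]*veth_mtu:[[:space:]]*"?([0-9]+)"?.*$/\1/p' "$1"
}

# Example:
#   get_veth_mtu calico.yaml
```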

@rcomblen

rcomblen commented Nov 27, 2020

@knfoo You saved my day. I had exactly the same issue with Gitlab and its runner deployed on k8s with Gitlab Charts.

I'm running docker-in-docker (dind) runner, and I was experiencing just the same networking issue.

Indeed, after running an ifconfig on the docker-dind container, I see that its main network interface has an MTU of 1376.

Passing a --mtu 1300 argument to the daemon solves the problem.
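
As a rough illustration of that workaround, the sketch below derives the flag from the pod interface MTU minus a safety margin. The helper name `dind_mtu_flag`, the `eth0` interface name and the margin of 76 are assumptions chosen to reproduce the 1376 and 1300 values mentioned here; `--mtu` itself is the dockerd option for the container network MTU.

```shell
# Print a dockerd --mtu flag that stays at or below the pod interface MTU.
# $1: MTU of the pod's interface, $2: optional safety margin (default 0)
dind_mtu_flag() {
  pod_mtu=$1
  margin=${2:-0}
  echo "--mtu=$(( pod_mtu - margin ))"
}

dind_mtu_flag 1376 76   # prints --mtu=1300

# Inside the dind container one might then start the daemon like this
# (eth0 is an assumed interface name):
#   dockerd $(dind_mtu_flag "$(cat /sys/class/net/eth0/mtu)" 76)
```

With the docker:dind image, arguments passed to the container are normally forwarded to dockerd, so the same flag can usually be supplied through the runner's service command instead of starting the daemon by hand.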

I got the same issue on Weave and Calico.

Thanks a lot!
