
Cannot access service from within a pod #73481

Closed
jiashenC opened this issue Jan 29, 2019 · 7 comments
Labels
kind/bug: Categorizes issue or PR as related to a bug.
sig/cli: Categorizes an issue or PR as relevant to SIG CLI.
sig/network: Categorizes an issue or PR as relevant to SIG Network.
triage/unresolved: Indicates an issue that can not or will not be resolved.

Comments

@jiashenC

What happened:
I have a simple Kubernetes Service, backed by a single Pod. I go inside the Docker container for that Pod with docker exec -it k8s_test_test-deployment-58b575b599-zlpb5_default_4cc8f312-23e1-11e9-acf6-0800273e509f_0 bash. From inside the container, curl test (the Service name) fails with a connection timeout, while curl <Pod IP>:8080 works without any issue.

What you expected to happen:
curl test (or curl <service name>) should work from inside the container just as curl <Pod IP>:8080 does.

How to reproduce it (as minimally and precisely as possible):
I use minikube on macOS to test the Kubernetes setup. Below are my k8s config file and Dockerfile. You can build a very simple test image with docker build --tag=test . and test it in your environment.

config.yml

kind: Service
apiVersion: v1
metadata:
  name: test
spec:
  selector:
    app: test
  ports:
  - name: http
    protocol: TCP
    port: 80
    targetPort: 8080
  type: NodePort
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-deployment
  labels:
    app: test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: test
  template:
    metadata:
      labels:
        app: test
    spec:
      containers:
      - name: test
        image: test
        imagePullPolicy: Never
        ports:
        - containerPort: 8080
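
With this Service, curl test inside the cluster hits port 80 and gets forwarded to targetPort 8080 on the Pod. A quick sanity check that the selector actually picked up the Pod (a sketch; names match the config above):

# Confirm the Service exists and exposes port 80
kubectl get svc test
# Confirm the Service resolved its selector to <Pod IP>:8080;
# an empty ENDPOINTS column would mean a label mismatch
kubectl get endpoints test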

Dockerfile

FROM ubuntu

WORKDIR /server

COPY . /server

RUN apt-get update && apt-get install -y build-essential \
    python \
    python-dev \
    python-pip \
    curl \
    telnet \
    traceroute


EXPOSE 8080

CMD python -m SimpleHTTPServer 8080
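
For completeness, a minimal reproduce sequence on minikube (a sketch; imagePullPolicy: Never means the image must be built against minikube's Docker daemon, not the host's):

# Build the image inside minikube's Docker daemon so the node can find it
eval $(minikube docker-env)
docker build --tag=test .
# Deploy the Service and Deployment from config.yml above
kubectl apply -f config.yml
# Exec into the pod and try both the Service name and the Pod IP
kubectl exec -it <pod name> -- bash
curl -v test          # times out
curl <Pod IP>:8080    # works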

Anything else we need to know?:
I use nslookup test, and it returns the correct Service IP attached to test. I don't think this issue has anything to do with the DNS service.
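
Since the name resolves, the failure is in connectivity rather than resolution; curling the resolved ClusterIP directly fails the same way (IP taken from the curl output below):

# Inside the container: bypass DNS entirely and hit the ClusterIP
curl -v 10.105.63.204:80
# Resolution itself is fine
nslookup test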

curl -v test prints out the following error.

Rebuilt URL to: test/
Trying 10.105.63.204...
TCP_NODELAY set
connect to 10.105.63.204 port 80 failed: Connection timed out
Failed to connect to test port 80: Connection timed out
Closing connection 0
curl: (7) Failed to connect to test port 80: Connection timed out

Here is my original post on StackOverflow.

Environment:

  • Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.2", GitCommit:"cff46ab41ff0bb44d8584413b598ad8360ec1def", GitTreeState:"clean", BuildDate:"2019-01-13T23:15:13Z", GoVersion:"go1.11.4", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.2", GitCommit:"cff46ab41ff0bb44d8584413b598ad8360ec1def", GitTreeState:"clean", BuildDate:"2019-01-10T23:28:14Z", GoVersion:"go1.11.4", Compiler:"gc", Platform:"linux/amd64"}
  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release):
macOS Mojave 10.14.2
  • Kernel (e.g. uname -a):
Darwin Jiashens-MacBook-Pro.local 18.2.0 Darwin Kernel Version 18.2.0: Mon Nov 12 20:24:46 PST 2018; root:xnu-4903.231.4~2/RELEASE_X86_64 x86_64
  • Install tools:
brew install kubernetes-cli
  • Others:
@jiashenC jiashenC added the kind/bug Categorizes issue or PR as related to a bug. label Jan 29, 2019
@k8s-ci-robot k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Jan 29, 2019
@jiashenC (Author)

/sig cli

@k8s-ci-robot k8s-ci-robot added sig/cli Categorizes an issue or PR as relevant to SIG CLI. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jan 29, 2019
@baltendo (Contributor)

I was able to reproduce the error with the mentioned setup (macOS, minikube, k8s v1.13.2), but I didn't have the problem on my Docker Desktop setup (macOS, Docker Desktop, k8s v1.13.0).

Minikube uses CoreDNS instead of kube-dns. I listed the pods in the kube-system namespace and had a look at the logs of a coredns pod:

.:53
2019-01-29T20:39:30.345Z [INFO] CoreDNS-1.2.6
2019-01-29T20:39:30.345Z [INFO] linux/amd64, go1.11.2, 756749c
CoreDNS-1.2.6
linux/amd64, go1.11.2, 756749c
 [INFO] plugin/reload: Running configuration MD5 = f65c4821c8a9b7b5eb30fa4fbc167769
E0129 20:39:55.345725       1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:318: Failed to list *v1.Namespace: Get https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E0129 20:39:55.346195       1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:311: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E0129 20:39:55.347321       1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:313: Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout

I then followed the instructions here to debug the problem -> no success:
https://kubernetes.io/docs/tasks/administer-cluster/dns-debugging-resolution/
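
Roughly, the checks from that page look like this (a sketch; the manifest URL is the one given in the linked guide):

# Deploy the dnsutils debug pod from the DNS debugging guide
kubectl apply -f https://k8s.io/examples/admin/dns/dnsutils.yaml
# Resolve a cluster-internal name from inside it
kubectl exec -i -t dnsutils -- nslookup kubernetes.default
# Inspect the pod's resolver configuration
kubectl exec -i -t dnsutils -- cat /etc/resolv.conf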

nslookup in the cluster seems to be OK

# nslookup kubernetes.default
Server:		10.96.0.10
Address:	10.96.0.10#53

Name:	kubernetes.default.svc.cluster.local
Address: 10.96.0.1

/etc/resolv.conf seems to be OK

# cat /etc/resolv.conf
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
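
For context, with ndots:5 a bare name like test has fewer than five dots, so the resolver walks the search domains first; the query that actually succeeds is the fully qualified one (visible in the coredns logs further down):

# What the resolver effectively asks for when you run "curl test"
nslookup test.default.svc.cluster.local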

coredns pods seem to be ok

# kubectl -n kube-system get pods
NAME                               READY   STATUS    RESTARTS   AGE
coredns-7f4d9cf5-dvddc             1/1     Running   0          8m14s
coredns-7f4d9cf5-q6v2r             1/1     Running   0          8m13s
.:53
2019-01-29T21:32:55.527Z [INFO] CoreDNS-1.3.1
2019-01-29T21:32:55.527Z [INFO] linux/amd64, go1.11.4, 6b56a9c
CoreDNS-1.3.1
linux/amd64, go1.11.4, 6b56a9c
2019-01-29T21:32:55.527Z [INFO] plugin/reload: Running configuration MD5 = 55ddf8276b3f109d978d1398efcd4609

.:53
2019-01-29T21:32:55.753Z [INFO] CoreDNS-1.3.1
2019-01-29T21:32:55.753Z [INFO] linux/amd64, go1.11.4, 6b56a9c
CoreDNS-1.3.1
linux/amd64, go1.11.4, 6b56a9c
2019-01-29T21:32:55.753Z [INFO] plugin/reload: Running configuration MD5 = 55ddf8276b3f109d978d1398efcd4609

DNS service seems to be ok

# kubectl -n kube-system get svc
NAME       TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)         AGE
kube-dns   ClusterIP   10.96.0.10   <none>        53/UDP,53/TCP   63m

Endpoints are exposed, although on the documentation page the IPs start with 10.* while here they start with 172.*

# kubectl -n kube-system get endpoints
NAME                      ENDPOINTS                                               AGE
kube-controller-manager   <none>                                                  65m
kube-dns                  172.17.0.5:53,172.17.0.6:53,172.17.0.5:53 + 1 more...   65m
kube-scheduler            <none>

With logging enabled, I see that the coredns pod gets the request:

.:53
2019-01-29T21:47:16.507Z [INFO] CoreDNS-1.3.1
2019-01-29T21:47:16.508Z [INFO] linux/amd64, go1.11.4, 6b56a9c
CoreDNS-1.3.1
linux/amd64, go1.11.4, 6b56a9c
2019-01-29T21:47:16.508Z [INFO] plugin/reload: Running configuration MD5 = 5285a634cfe17920555e070c8a14a052
2019-01-29T21:47:16.509Z [INFO] 127.0.0.1:47476 - 58571 "HINFO IN 228427538998621228.5618191678049545926. udp 56 false 512" NXDOMAIN qr,rd,ra 56 0.000541398s
2019-01-29T21:47:54.212Z [INFO] 172.17.0.2:47666 - 37443 "A IN test.default.svc.cluster.local. udp 48 false 512" NOERROR qr,aa,rd 94 0.000260062s
2019-01-29T21:47:54.212Z [INFO] 172.17.0.2:47666 - 24399 "AAAA IN test.default.svc.cluster.local. udp 48 false 512" NOERROR qr,aa,rd 141 0.000184877s
2019-01-29T21:48:04.561Z [INFO] 172.17.0.2:47467 - 29624 "A IN test.default.svc.cluster.local. udp 48 false 512" NOERROR qr,aa,rd 94 0.000115134s
2019-01-29T21:48:04.563Z [INFO] 172.17.0.2:47467 - 32963 "AAAA IN test.default.svc.cluster.local. udp 48 false 512" NOERROR qr,aa,rd 141 0.000549559s
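
For reference, query logging like the above comes from the CoreDNS log plugin; a sketch of enabling it (assumes the stock minikube ConfigMap name and Corefile layout):

# Open the Corefile for editing
kubectl -n kube-system edit configmap coredns
# ...then add the single word "log" inside the ".:53 { ... }" server block;
# the reload plugin picks up the change (note the new configuration MD5 above)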

I tried to patch the deployment as described in #63900 -> no success.

I found this issue in the minikube project, which sounds similar:
kubernetes/minikube#2302

Although I couldn't solve the problem, I hope this analysis helps somehow.

@thockin thockin added sig/network Categorizes an issue or PR as relevant to SIG Network. triage/unresolved Indicates an issue that can not or will not be resolved. labels Mar 7, 2019
@thockin (Member) commented Mar 21, 2019

@fturib

@danwinship (Contributor)

Drive-by insufficiently-detailed comment: the bug is probably that you need to enable "hairpin mode" in the network plugin (to allow a packet to leave the pod and then be routed back into it [after iptables rewrites] rather than being routed to a different destination). I don't know how you would do this in your environment.
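
A sketch of what checking/enabling that might look like on the minikube node (the veth name below is hypothetical; kubelet's --hairpin-mode flag and the bridge-port sysfs knob are the usual levers):

# On the node (minikube ssh): check whether the pod's bridge port
# allows hairpin traffic (0 = off, 1 = on)
cat /sys/class/net/<veth of the pod>/brport/hairpin_mode
# Kubelet can also manage this itself via one of:
#   --hairpin-mode=hairpin-veth        (set hairpin on each veth)
#   --hairpin-mode=promiscuous-bridge  (put the bridge in promiscuous mode)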

@fturib commented Apr 3, 2019

@jiashenC: is this issue still alive? If yes, did you try to follow the advice from @danwinship above?

@freehan (Contributor) commented Apr 18, 2019

Reopen if it is still an issue.

@freehan freehan closed this as completed Apr 18, 2019
@koushikmgithub

@jiashenC - Did you find any workaround? I am also facing the same issue with minikube. I have opened an issue: #106784.

Please let me know if you find any resolution or workaround.
