
Pod unable to reach itself through a service using driver none #13370

Open

belfo opened this issue Jan 18, 2022 · 19 comments
Labels
co/none-driver
help wanted: Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines.
kind/bug: Categorizes issue or PR as related to a bug.
lifecycle/frozen: Indicates that an issue or PR should not be auto-closed due to staleness.
priority/backlog: Higher priority than priority/awaiting-more-evidence.

Comments

@belfo

belfo commented Jan 18, 2022

What Happened?

If a pod has a service which points to the pod, the pod cannot reach itself through the service IP. Other pods can reach the service and the pod itself can reach other services.
Similar to #1568

But I'm using driver none, and the different proposals in that issue haven't helped.

Just a simple pod and service:

apiVersion: v1
kind: Pod
metadata:
  name: replyer
  namespace: sample-domain1-ns
spec:
  containers:
  - name: replyer
    image: nginx-echo-headers:v1
    imagePullPolicy: IfNotPresent
  restartPolicy: Always

apiVersion: v1
kind: Service
metadata:
  namespace: sample-domain1-ns
  name: replyer-ext
spec:
  selector:
    k8s-app: replyer
  ports:
  - name: app
    port: 8080
    protocol: TCP
    targetPort: 8080

nginx-echo-headers is a sample nginx image that prints back the request (https://github.com/brndnmtthws/nginx-echo-headers), and I added curl to the image.

From the running pod, if I run curl:

/ # curl -v replyer-ext:8080
* Rebuilt URL to: replyer-ext:8080/
*   Trying 10.98.117.249...
* TCP_NODELAY set
* connect to 10.98.117.249 port 8080 failed: Operation timed out
* Failed to connect to replyer-ext port 8080: Operation timed out
* Closing connection 0
curl: (7) Failed to connect to replyer-ext port 8080: Operation timed out

Attach the log file

I added the logs; I have other pods running, but hopefully the logs are clear.
log2.txt

Operating System

Ubuntu

Driver

None

@afbjorklund
Copy link
Collaborator

afbjorklund commented Jan 18, 2022

I don't think it is possible to run it without a CNI (?), so something like --cni=bridge should work around it...

kubenet is getting deprecated, so we probably shouldn't recommend that. Not sure if kindnet works on the host?

@afbjorklund afbjorklund added co/none-driver kind/bug Categorizes issue or PR as related to a bug. labels Jan 18, 2022
@belfo
Author

belfo commented Jan 18, 2022

My start command is:
sudo minikube start --extra-config=apiserver.service-node-port-range=80-40000 --cni=bridge --driver=none

@afbjorklund afbjorklund added the help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. label Jan 18, 2022
@afbjorklund
Collaborator

afbjorklund commented Jan 18, 2022

The default setup (auto) is supposed to work out-of-the-box, also when running directly on localhost.

minikube start --driver=none --cni=''

Currently it just disables CNI, which is probably not the best choice. So it should pick something else.

Driver none used, CNI unnecessary in this configuration, recommending no CNI

It is slightly better than kubeadm, which goes into CrashLoopBackOff waiting on CoreDNS.

But only ever-so-slightly. For newer Kubernetes (1.24+), it should use both CRI and CNI...

@belfo
Author

belfo commented Jan 18, 2022

> The default setup (auto) is supposed to work out-of-the-box, also when running directly on localhost.
>
> minikube start --driver=none --cni=''
>
> Currently it just disables CNI, which is probably not the best choice. So it should pick something else.
>
> Driver none used, CNI unnecessary in this configuration, recommending no CNI
>
> It is slightly better than kubeadm, which goes into CrashLoopBackOff waiting on CoreDNS.
>
> But only ever-so-slightly. For newer Kubernetes (1.24+), it should use both CRI and CNI...

Not sure I understand.
Does auto currently disable CNI?
So starting with --cni=bridge should be OK? I also tested --cni=calico, but with the same effect.

@belfo
Author

belfo commented Jan 18, 2022

To start from a clean cluster, I deleted the minikube cluster and restarted it with the above command:
sudo minikube start --extra-config=apiserver.service-node-port-range=80-40000 --cni=bridge --driver=none

Some pods don't start:
kube-system pod/coredns-78fcd69978-25rc4 0/1 ContainerCreating 0 13m

It works with the default CNI (auto) or calico.
In the logs I see:
Jan 18 15:55:30 ubuntu-18-lts kubelet[2568]: E0118 15:55:30.399890 2568 pod_workers.go:836] "Error syncing pod, skipping" err="failed to "KillPodSandbox" for "02e3fae4-3c8c-4b4e-a039-3ef15a740c3a" with KillPodSandboxError: "rpc error: code = Unknown desc = networkPlugin cni failed to teardown pod \"coredns-78fcd69978-25rc4_kube-system\" network: error getting ClusterInformation: connection is unauthorized: Unauthorized"" pod="kube-system/coredns-78fcd69978-25rc4" podUID=02e3fae4-3c8c-4b4e-a039-3ef15a740c3a
Jan 18 15:55:32 ubuntu-18-lts kubelet[2568]: I0118 15:55:32.577939 2568 cni.go:204] "Error validating CNI config list" configList="\n{\n "cniVersion": "0.3.1",\n "name": "bridge",\n "plugins": [\n {\n "type": "bridge",\n "bridge": "bridge",\n "addIf": "true",\n "isDefaultGateway": true,\n "forceAddress": false,\n "ipMasq": true,\n "hairpinMode": true,\n "ipam": {\n "type": "host-local",\n "subnet": "10.244.0.0/16"\n }\n },\n {\n "type": "portmap",\n "capabilities": {\n "portMappings": true\n }\n }\n ]\n}\n" err="[failed to find plugin "bridge" in path [/opt/cni/bin]]"

@afbjorklund
Collaborator

afbjorklund commented Jan 18, 2022

failed to find plugin "bridge" in path [/opt/cni/bin]

That looks like the CNI plugins are missing; you are supposed to provide both CRI and CNI when running "none".
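
For reference, a minimal sketch of installing the standard CNI plugins into /opt/cni/bin (assuming the containernetworking/plugins release tarball; the version below is an example, pick one matching your cluster and architecture):

# assumption: containernetworking/plugins release; adjust version/arch as needed
CNI_VERSION=v1.1.1
curl -L "https://github.com/containernetworking/plugins/releases/download/${CNI_VERSION}/cni-plugins-linux-amd64-${CNI_VERSION}.tgz" -o /tmp/cni-plugins.tgz
sudo mkdir -p /opt/cni/bin
# unpack bridge, host-local, portmap, etc. where kubelet looks for them
sudo tar -C /opt/cni/bin -xzf /tmp/cni-plugins.tgz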

@belfo
Author

belfo commented Jan 19, 2022

Thanks @afbjorklund.
So I installed the plugins and recreated the minikube cluster with bridge.
t.txt
Still, I can't reach the pod through its service.
curl -v replyer-ext:8080
* Rebuilt URL to: replyer-ext:8080/
*   Trying 10.109.12.235...
* TCP_NODELAY set
* connect to 10.109.12.235 port 8080 failed: Connection refused
* Failed to connect to replyer-ext port 8080: Connection refused
* Closing connection 0
curl: (7) Failed to connect to replyer-ext port 8080: Connection refused

This time the command: for intf in /sys/devices/virtual/net/bridge/brif/*; do cat $intf/hairpin_mode; done
returns only 1 (whereas before it was 0).

@belfo
Author

belfo commented Jan 19, 2022

Also, I have these logs in the journal:
Jan 18 17:32:09 ubuntu-18-lts kubelet[29135]: I0118 17:32:09.312567 29135 docker_service.go:566] "Hairpin mode is set but kubenet is not enabled, falling back to HairpinVeth" hairpinMode=promiscuous-bridge
Jan 18 17:32:09 ubuntu-18-lts kubelet[29135]: I0118 17:32:09.312636 29135 docker_service.go:242] "Hairpin mode is set" hairpinMode=hairpin-veth

@afbjorklund
Collaborator

There are more resources under https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/

@belfo
Author

belfo commented Jan 19, 2022

I checked https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/troubleshooting-kubeadm/#pods-are-not-accessible-via-their-service-ip
Hairpin is enabled (https://kubernetes.io/docs/tasks/debug-application-cluster/debug-service/#a-pod-fails-to-reach-itself-via-the-service-ip):

root     13400  2.3  0.3 2164100 113824 ?      Ssl  09:25   0:17 /var/lib/minikube/binaries/v1.22.3/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --config=/var/lib/kubelet/config.yaml --container-runtime=docker --hostname-override=ubuntu-18-lts --kubeconfig=/etc/kubernetes/kubelet.conf --network-plugin=cni --node-ip=10.0.2.15 --resolv-conf=/run/systemd/resolve/resolv.conf
Jan 19 09:25:26 ubuntu-18-lts kubelet[13400]: I0119 09:25:26.433117   13400 docker_service.go:566] "Hairpin mode is set but kubenet is not enabled, falling back to HairpinVeth" hairpinMode=promiscuous-bridge
Jan 19 09:25:26 ubuntu-18-lts kubelet[13400]: I0119 09:25:26.433140   13400 docker_service.go:242] "Hairpin mode is set" hairpinMode=hairpin-veth

and

for intf in /sys/devices/virtual/net/*/brif/*; do cat $intf/hairpin_mode; done
1
1
1
1
1
1

and hostname -i returns an IP:

hostname -i
10.0.2.15

(I'm using vagrant)

Still, the call doesn't work:

 # curl -v replyer-ext:8080
* Rebuilt URL to: replyer-ext:8080/
*   Trying 10.103.198.103...
* TCP_NODELAY set
* connect to 10.103.198.103 port 8080 failed: Connection refused
* Failed to connect to replyer-ext port 8080: Connection refused
* Closing connection 0
curl: (7) Failed to connect to replyer-ext port 8080: Connection refused

I'll continue to search.

@lmeyerov

Good spot on --cni=bridge; that improved things for us with driver=none (GPUs).

@belfo
Author

belfo commented Jan 25, 2022

After some trial and error, it worked after adding the following rule once minikube had started:
sudo iptables -P FORWARD ACCEPT

@spowelljr spowelljr added the priority/backlog Higher priority than priority/awaiting-more-evidence. label Jan 25, 2022
@belfo
Author

belfo commented Jan 26, 2022

So after searching, I found the reason:
https://docs.docker.com/network/iptables/#docker-on-a-router

To make it work I added:
sudo iptables -I DOCKER-USER -i docker0 -o docker0 -j ACCEPT
sudo iptables -I DOCKER-USER -i bridge -o bridge -j ACCEPT

By adding the rules to the DOCKER-USER chain, they are not cleared when the Docker daemon restarts (which minikube start triggers).

Maybe minikube should add that rule after start?
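
Putting the thread's findings together, a minimal sketch of the full workaround (the interface names docker0 and bridge match this setup; adjust them to your CNI config):

# start the cluster first: minikube start restarts dockerd,
# which would otherwise wipe rules in Docker-managed chains
sudo minikube start --driver=none --cni=bridge
# allow hairpin traffic on the CNI and Docker bridges;
# rules in DOCKER-USER survive Docker daemon restarts
sudo iptables -I DOCKER-USER -i docker0 -o docker0 -j ACCEPT
sudo iptables -I DOCKER-USER -i bridge -o bridge -j ACCEPT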

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 26, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels May 26, 2022
@sharifelgamal sharifelgamal added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. labels Jun 1, 2022
@jimlindeman

FYI, with minikube against the VirtualBox backend, we've also seen that we had to run 2 replicas of a pod in order for code in the pod to access its own service URL. This seemed to be a limitation of the iptables-based implementation of services, where a given pod couldn't redirect to itself. A sketch of that workaround follows.
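
A sketch of that workaround, assuming the pod is managed by a Deployment (the name replyer here is hypothetical):

# run a second replica so service traffic can be answered by the peer pod
kubectl scale deployment replyer --replicas=2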

@ramayer

ramayer commented Jan 12, 2023

I believe this issue makes the Apache Solr Operator (https://solr.apache.org/operator/) difficult to use with minikube.

Solr clusters seem to fail under minikube when minikube is created with --cni=auto, but work with --cni=bridge in my dev environment.

More details in apache/solr-operator#498

@lopiola

lopiola commented Sep 6, 2023

I have the same problem on Ubuntu 20.04 with minikube start --driver=none --cni=bridge: there is no routing between the pods, and also no routing from inside the cluster to the Internet.

Thank you @belfo for digging into this; your findings have helped a lot. Running this after minikube start is enough for me to make the network work:

iptables -I DOCKER-USER -i bridge -o bridge -j ACCEPT
iptables -I DOCKER-USER -i bridge ! -o bridge -j ACCEPT

The traffic returning from the outside to the cluster is already taken care of by the entry in the KUBE-FORWARD chain.

Maybe minikube should add that rule after start?

I second that suggestion, plus the rule for the external routing.
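
For anyone applying these rules, a quick way to verify they landed and are matching traffic (a sketch; the output layout varies by iptables version):

# list the DOCKER-USER chain with packet/byte counters and rule numbers
sudo iptables -L DOCKER-USER -v -n --line-numbers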

@linux0x5c

@lopiola I have the same problem on Debian 11.7 with minikube start --driver=none --cni=bridge.

iptables -I DOCKER-USER -i bridge -o bridge -j ACCEPT
iptables -I DOCKER-USER -i bridge ! -o bridge -j ACCEPT

or
iptables -P FORWARD ACCEPT

Either worked for me; I can't understand why minikube doesn't add these rules.
