
Pod unable to reach itself through a service using driver none #13370

Open

belfo opened this issue Jan 18, 2022 · 19 comments
Labels
co/none-driver
help wanted: Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines.
kind/bug: Categorizes issue or PR as related to a bug.
lifecycle/frozen: Indicates that an issue or PR should not be auto-closed due to staleness.
priority/backlog: Higher priority than priority/awaiting-more-evidence.

Comments

@belfo

belfo commented Jan 18, 2022

What Happened?

If a pod has a service which points to the pod, the pod cannot reach itself through the service IP. Other pods can reach the service and the pod itself can reach other services.
Similar to #1568

But I'm using driver none, and the different proposals in that issue haven't helped.

Just a simple pod and service:

apiVersion: v1
kind: Pod
metadata:
  name: replyer
  namespace: sample-domain1-ns
spec:
  containers:
  - name: replyer
    image: nginx-echo-headers:v1
    imagePullPolicy: IfNotPresent
  restartPolicy: Always

apiVersion: v1
kind: Service
metadata:
  namespace: sample-domain1-ns
  name: replyer-ext
spec:
  selector:
    k8s-app: replyer
  ports:
  - name: app
    port: 8080
    protocol: TCP
    targetPort: 8080

nginx-echo-headers is a sample nginx image that prints back the request (https://github.com/brndnmtthws/nginx-echo-headers), and I added curl to the image.

From the running pod, if I run curl:

/ # curl -v replyer-ext:8080
* Rebuilt URL to: replyer-ext:8080/
*   Trying 10.98.117.249...
* TCP_NODELAY set
* connect to 10.98.117.249 port 8080 failed: Operation timed out
* Failed to connect to replyer-ext port 8080: Operation timed out
* Closing connection 0
curl: (7) Failed to connect to replyer-ext port 8080: Operation timed out

Attach the log file

I added the logs; I have other pods running, but hopefully the logs are clear.
log2.txt

Operating System

Ubuntu

Driver

None

@afbjorklund
Copy link
Collaborator

afbjorklund commented Jan 18, 2022

I don't think it is possible to run it without a CNI (?), so something like --cni=bridge should work around it...

kubenet is getting deprecated, so we probably shouldn't recommend that. Not sure if kindnet works on the host?

@afbjorklund afbjorklund added co/none-driver kind/bug Categorizes issue or PR as related to a bug. labels Jan 18, 2022
@belfo
Author

belfo commented Jan 18, 2022

My start command is:
sudo minikube start --extra-config=apiserver.service-node-port-range=80-40000 --cni=bridge --driver=none

@afbjorklund afbjorklund added the help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. label Jan 18, 2022
@afbjorklund
Collaborator

afbjorklund commented Jan 18, 2022

The default setup (auto) is supposed to work out-of-the-box, also when running directly on localhost.

minikube start --driver=none --cni=''

Currently it just disables CNI, which is probably not the best choice. So it should pick something else.

Driver none used, CNI unnecessary in this configuration, recommending no CNI

It is slightly better than kubeadm, which goes into CrashLoopBackOff waiting on CoreDNS.

But only ever-so-slightly. For newer Kubernetes (1.24+), it should use both CRI and CNI...

@belfo
Author

belfo commented Jan 18, 2022

> The default setup (auto) is supposed to work out-of-the-box, also when running directly on localhost.
>
> minikube start --driver=none --cni=''
>
> Currently it just disables CNI, which is probably not the best choice. So it should pick something else.
>
> Driver none used, CNI unnecessary in this configuration, recommending no CNI
>
> It is slightly better than kubeadm, which goes into CrashLoopBackOff waiting on CoreDNS.
>
> But only ever-so-slightly. For newer Kubernetes (1.24+), it should use both CRI and CNI...

Not sure I understand.
Does auto currently disable CNI?
So starting with --cni=bridge should be OK? I also tested --cni=calico, but with the same effect.

@belfo
Author

belfo commented Jan 18, 2022

To start from a clean cluster, I deleted the minikube cluster and restarted it with the above command:
sudo minikube start --extra-config=apiserver.service-node-port-range=80-40000 --cni=bridge --driver=none

Some pods don't start:
kube-system pod/coredns-78fcd69978-25rc4 0/1 ContainerCreating 0 13m

It works with the default CNI (auto) or calico.
In the logs I see:
Jan 18 15:55:30 ubuntu-18-lts kubelet[2568]: E0118 15:55:30.399890 2568 pod_workers.go:836] "Error syncing pod, skipping" err="failed to "KillPodSandbox" for "02e3fae4-3c8c-4b4e-a039-3ef15a740c3a" with KillPodSandboxError: "rpc error: code = Unknown desc = networkPlugin cni failed to teardown pod \"coredns-78fcd69978-25rc4_kube-system\" network: error getting ClusterInformation: connection is unauthorized: Unauthorized"" pod="kube-system/coredns-78fcd69978-25rc4" podUID=02e3fae4-3c8c-4b4e-a039-3ef15a740c3a
Jan 18 15:55:32 ubuntu-18-lts kubelet[2568]: I0118 15:55:32.577939 2568 cni.go:204] "Error validating CNI config list" configList="\n{\n "cniVersion": "0.3.1",\n "name": "bridge",\n "plugins": [\n {\n "type": "bridge",\n "bridge": "bridge",\n "addIf": "true",\n "isDefaultGateway": true,\n "forceAddress": false,\n "ipMasq": true,\n "hairpinMode": true,\n "ipam": {\n "type": "host-local",\n "subnet": "10.244.0.0/16"\n }\n },\n {\n "type": "portmap",\n "capabilities": {\n "portMappings": true\n }\n }\n ]\n}\n" err="[failed to find plugin "bridge" in path [/opt/cni/bin]]"

@afbjorklund
Collaborator

afbjorklund commented Jan 18, 2022

failed to find plugin "bridge" in path [/opt/cni/bin]

That looks like the CNI plugins are missing; you are supposed to provide both CRI and CNI when running "none".
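
For reference, a minimal sketch of installing the standard CNI plugins into /opt/cni/bin (assuming the containernetworking/plugins release tarball; the version below is an example, pick one matching your cluster and architecture):

# assumption: containernetworking/plugins release; adjust version/arch as needed
CNI_VERSION=v1.1.1
curl -L "https://github.com/containernetworking/plugins/releases/download/${CNI_VERSION}/cni-plugins-linux-amd64-${CNI_VERSION}.tgz" -o /tmp/cni-plugins.tgz
sudo mkdir -p /opt/cni/bin
# unpack bridge, host-local, portmap, etc. where kubelet looks for them
sudo tar -C /opt/cni/bin -xzf /tmp/cni-plugins.tgz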

@belfo
Author

belfo commented Jan 19, 2022

Thanks @afbjorklund.
So I installed the plugins and recreated the minikube cluster with bridge.
t.txt
Still, I can't reach the pod through its service.
curl -v replyer-ext:8080
* Rebuilt URL to: replyer-ext:8080/
*   Trying 10.109.12.235...
* TCP_NODELAY set
* connect to 10.109.12.235 port 8080 failed: Connection refused
* Failed to connect to replyer-ext port 8080: Connection refused
* Closing connection 0
curl: (7) Failed to connect to replyer-ext port 8080: Connection refused

This time the command: for intf in /sys/devices/virtual/net/bridge/brif/*; do cat $intf/hairpin_mode; done
returns only 1 (whereas before it was 0).

@belfo
Author

belfo commented Jan 19, 2022

Also, I have these logs in the journal:
Jan 18 17:32:09 ubuntu-18-lts kubelet[29135]: I0118 17:32:09.312567 29135 docker_service.go:566] "Hairpin mode is set but kubenet is not enabled, falling back to HairpinVeth" hairpinMode=promiscuous-bridge
Jan 18 17:32:09 ubuntu-18-lts kubelet[29135]: I0118 17:32:09.312636 29135 docker_service.go:242] "Hairpin mode is set" hairpinMode=hairpin-veth

@afbjorklund
Collaborator

There are more resources under https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/

@belfo
Author

belfo commented Jan 19, 2022

I checked https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/troubleshooting-kubeadm/#pods-are-not-accessible-via-their-service-ip
Hairpin is enabled (https://kubernetes.io/docs/tasks/debug-application-cluster/debug-service/#a-pod-fails-to-reach-itself-via-the-service-ip):

root     13400  2.3  0.3 2164100 113824 ?      Ssl  09:25   0:17 /var/lib/minikube/binaries/v1.22.3/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --config=/var/lib/kubelet/config.yaml --container-runtime=docker --hostname-override=ubuntu-18-lts --kubeconfig=/etc/kubernetes/kubelet.conf --network-plugin=cni --node-ip=10.0.2.15 --resolv-conf=/run/systemd/resolve/resolv.conf
Jan 19 09:25:26 ubuntu-18-lts kubelet[13400]: I0119 09:25:26.433117   13400 docker_service.go:566] "Hairpin mode is set but kubenet is not enabled, falling back to HairpinVeth" hairpinMode=promiscuous-bridge
Jan 19 09:25:26 ubuntu-18-lts kubelet[13400]: I0119 09:25:26.433140   13400 docker_service.go:242] "Hairpin mode is set" hairpinMode=hairpin-veth

and

for intf in /sys/devices/virtual/net/*/brif/*; do cat $intf/hairpin_mode; done
1
1
1
1
1
1

and hostname -i returns an IP:

hostname -i
10.0.2.15

(I'm using vagrant)

Still, the call doesn't work:

 # curl -v replyer-ext:8080
* Rebuilt URL to: replyer-ext:8080/
*   Trying 10.103.198.103...
* TCP_NODELAY set
* connect to 10.103.198.103 port 8080 failed: Connection refused
* Failed to connect to replyer-ext port 8080: Connection refused
* Closing connection 0
curl: (7) Failed to connect to replyer-ext port 8080: Connection refused

I'll continue to search.

@lmeyerov

Good spot on --cni=bridge; that improved things for us with driver=none (GPUs).

@belfo
Author

belfo commented Jan 25, 2022

After some trial and error, it worked after adding the following rule once minikube had started:
sudo iptables -P FORWARD ACCEPT

@spowelljr spowelljr added the priority/backlog Higher priority than priority/awaiting-more-evidence. label Jan 25, 2022
@belfo
Author

belfo commented Jan 26, 2022

So after searching, I found the reason:
https://docs.docker.com/network/iptables/#docker-on-a-router

To make it work I added:
sudo iptables -I DOCKER-USER -i docker0 -o docker0 -j ACCEPT
sudo iptables -I DOCKER-USER -i bridge -o bridge -j ACCEPT

By adding the rules to the DOCKER-USER chain, they are not cleared when the Docker daemon restarts (which minikube start triggers).

Maybe minikube should add that rule after start?
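
Putting the thread's findings together, a minimal sketch of the full workaround (the interface names docker0 and bridge match this setup; adjust them to your CNI config):

# start the cluster first: minikube start restarts dockerd,
# which would otherwise wipe rules in Docker-managed chains
sudo minikube start --driver=none --cni=bridge
# allow hairpin traffic on the CNI and Docker bridges;
# rules in DOCKER-USER survive Docker daemon restarts
sudo iptables -I DOCKER-USER -i docker0 -o docker0 -j ACCEPT
sudo iptables -I DOCKER-USER -i bridge -o bridge -j ACCEPT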

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 26, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels May 26, 2022
@sharifelgamal sharifelgamal added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. labels Jun 1, 2022
@jimlindeman

FYI, with minikube against the VirtualBox backend, we've also seen that we had to run 2 replicas of a pod in order for code in the pod to access its own service URL. This seemed to be a limitation of the iptables-based implementation of services, where a given pod couldn't redirect to itself. A sketch of that workaround follows.
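
A sketch of that workaround, assuming the pod is managed by a Deployment (the name replyer here is hypothetical):

# run a second replica so service traffic can be answered by the peer pod
kubectl scale deployment replyer --replicas=2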

@ramayer

ramayer commented Jan 12, 2023

I believe this issue makes the Apache Solr Operator (https://solr.apache.org/operator/) difficult to use with minikube.

Solr clusters seem to fail under minikube when minikube is created with --cni=auto, but work with --cni=bridge in my dev environment.

More details in apache/solr-operator#498

@lopiola

lopiola commented Sep 6, 2023

I have the same problem on Ubuntu 20.04 with minikube start --driver=none --cni=bridge: there is no routing between the pods, and also no routing from inside the cluster to the Internet.

Thank you @belfo for digging into this; your findings have helped a lot. Running this after minikube start is enough for me to make the network work:

iptables -I DOCKER-USER -i bridge -o bridge -j ACCEPT
iptables -I DOCKER-USER -i bridge ! -o bridge -j ACCEPT

The traffic returning from the outside to the cluster is already taken care of by the entry in the KUBE-FORWARD chain.

Maybe minikube should add that rule after start?

I second that suggestion, plus the rule for the external routing.
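
For anyone applying these rules, a quick way to verify they landed and are matching traffic (a sketch; the output layout varies by iptables version):

# list the DOCKER-USER chain with packet/byte counters and rule numbers
sudo iptables -L DOCKER-USER -v -n --line-numbers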

@linux0x5c

@lopiola I have the same problem on Debian 11.7 with minikube start --driver=none --cni=bridge.

iptables -I DOCKER-USER -i bridge -o bridge -j ACCEPT
iptables -I DOCKER-USER -i bridge ! -o bridge -j ACCEPT

or
iptables -P FORWARD ACCEPT

Either worked for me; I can't understand why minikube doesn't add these rules.
