Pod requests to an in-cluster Service fail with kube-proxy when the Pod's ARP entry is GC'ed on the node #3370
Comments
@tnqn @antoninbas @jianjuns Can I take it?
@Jexf Thanks for the report, and feel free to take the issue. I assume you are using kube-proxy ipvs mode? Otherwise the kernel would have used antrea-gw0's IP as the source IP of the ARP request; it keeps the original source IP that triggered the ARP request only when that IP is owned by the host and arp_announce is 0.
@tnqn Thanks for the reply. Yes, we are using kube-proxy ipvs mode. Setting arp_announce to 1 for the antrea-gw0 device sounds good to me, but I'm not sure whether the same problem exists on Windows.
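For illustration, a minimal sketch of the change discussed above (not Antrea's actual implementation): set arp_announce=1 on antrea-gw0 by writing the standard per-interface sysctl path. With arp_announce=1 the kernel avoids using a source IP that is not local to the outgoing interface's subnet, so a ClusterIP retained by IPVS would not leak into the ARP request's sender address.

```go
package main

import (
	"fmt"
	"os"
)

// setARPAnnounce writes the per-interface arp_announce sysctl for dev.
func setARPAnnounce(dev string, value int) error {
	path := fmt.Sprintf("/proc/sys/net/ipv4/conf/%s/arp_announce", dev)
	return os.WriteFile(path, []byte(fmt.Sprintf("%d\n", value)), 0644)
}

func main() {
	// "antrea-gw0" is the gateway device named in this thread.
	if err := setARPAnnounce("antrea-gw0", 1); err != nil {
		fmt.Fprintln(os.Stderr, "set arp_announce:", err)
		os.Exit(1)
	}
}
```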
Would running kube-proxy with strictARP enabled help here?
@antoninbas That sounds better. It seems designed for this scenario. Then the only thing we need to do is document this requirement when working with kube-proxy ipvs. Does it work for you @Jexf? (See antrea/pkg/agent/ipassigner/ip_assigner_linux.go, lines 213 to 219 at commit a87ef55.)
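For context: kube-proxy's IPVS strictARP option is documented to set arp_ignore=1 and arp_announce=2 under net.ipv4.conf.all. A small stdlib sketch that checks whether a node's sysctls already match that strictARP-equivalent state:

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// readSysctl returns the trimmed contents of a /proc/sys entry.
func readSysctl(path string) string {
	b, err := os.ReadFile(path)
	if err != nil {
		return "unreadable"
	}
	return strings.TrimSpace(string(b))
}

func main() {
	ignore := readSysctl("/proc/sys/net/ipv4/conf/all/arp_ignore")
	announce := readSysctl("/proc/sys/net/ipv4/conf/all/arp_announce")
	fmt.Printf("arp_ignore=%s arp_announce=%s strictARP-equivalent=%v\n",
		ignore, announce, ignore == "1" && announce == "2")
}
```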
@tnqn @antoninbas Thanks for the reply.
Describe the bug
Requests from a Pod to an in-cluster Service fail with kube-proxy after the Pod's ARP entry is garbage-collected on the node.
When Pod A accesses Service B for the first time (assuming the Pod has just started and its gateway ARP entry is empty), it first requests the gateway's MAC address. When the request reaches antrea-gw0 on the node, the node caches Pod A's ARP entry and sends an ARP reply. After Pod A receives the gateway's ARP reply, it caches the gateway ARP entry.
Pod A then generates no cross-node or Service traffic for a certain period (for example, 5 minutes).
The ARP aging and garbage-collection parameters on the node are the untuned defaults: gc_stale_time is 60s, base_reachable_time_ms is 30000ms, and gc_thresh1 is 128, so normal ARP aging and collection take place. After about 2 minutes, the ARP entry for Pod A on the node has entered the stale state, and because the number of ARP entries on the node exceeds gc_thresh1 (128), ARP GC is triggered and the entry for Pod A is reclaimed.
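For reference, a small stdlib sketch that prints the three defaults cited above from their standard sysctl locations under /proc/sys/net/ipv4/neigh/default/:

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

func main() {
	// The kernel's default neighbor-table tunables discussed above.
	for _, name := range []string{"gc_stale_time", "base_reachable_time_ms", "gc_thresh1"} {
		b, err := os.ReadFile("/proc/sys/net/ipv4/neigh/default/" + name)
		if err != nil {
			fmt.Fprintf(os.Stderr, "read %s: %v\n", name, err)
			continue
		}
		fmt.Printf("%s = %s\n", name, strings.TrimSpace(string(b)))
	}
}
```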
The gateway ARP entry inside the Pod also enters the stale state, but because the Pod holds only a few ARP entries, the count never reaches gc_thresh1 and ARP GC is never triggered, so the gateway ARP entry in Pod A always survives.
After 5 minutes, when Pod A accesses Service B again, the reply packets from the backend endpoints arrive at Pod A's node. The node's neighbor table (ip neigh) has no entry for Pod A, so the kernel sends an ARP request to Pod A. Because the source IP of that ARP request is the Service's ClusterIP and the source MAC is the gateway MAC, the request is filtered out by the SpoofGuard table, and the connection fails.
A detailed packet capture was taken (Pod A IP: 10.224.1.170, Service B ClusterIP: 10.10.0.10).
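To make the offending packet concrete, a hedged sketch using gopacket that constructs the ARP request described above: the sender protocol address is the ClusterIP 10.10.0.10 rather than a node-owned IP, which is what SpoofGuard rejects. The gateway MAC below is a hypothetical placeholder; the IPs are the ones from this report.

```go
package main

import (
	"fmt"
	"net"

	"github.com/google/gopacket"
	"github.com/google/gopacket/layers"
)

func main() {
	gwMAC, _ := net.ParseMAC("aa:bb:cc:dd:ee:ff") // hypothetical antrea-gw0 MAC
	clusterIP := net.ParseIP("10.10.0.10").To4()  // Service B ClusterIP
	podIP := net.ParseIP("10.224.1.170").To4()    // Pod A IP

	eth := layers.Ethernet{
		SrcMAC:       gwMAC,
		DstMAC:       net.HardwareAddr{0xff, 0xff, 0xff, 0xff, 0xff, 0xff},
		EthernetType: layers.EthernetTypeARP,
	}
	arp := layers.ARP{
		AddrType:          layers.LinkTypeEthernet,
		Protocol:          layers.EthernetTypeIPv4,
		HwAddressSize:     6,
		ProtAddressSize:   4,
		Operation:         layers.ARPRequest,
		SourceHwAddress:   gwMAC,
		SourceProtAddress: clusterIP, // the ClusterIP leaks in as the ARP sender IP
		DstHwAddress:      make([]byte, 6),
		DstProtAddress:    podIP,
	}
	buf := gopacket.NewSerializeBuffer()
	opts := gopacket.SerializeOptions{FixLengths: true}
	if err := gopacket.SerializeLayers(buf, opts, &eth, &arp); err != nil {
		panic(err)
	}
	fmt.Printf("% x\n", buf.Bytes())
}
```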
To Reproduce
1. Use Antrea with kube-proxy.
2. Create a Pod A and a Service B.
3. Pod A requests Service B.
4. The Pod's ARP entry is GC'ed on the node (the sketch after this list shows one way to verify this).
5. The gateway ARP entry in Pod A still exists.
6. Pod A requests Service B again; the request fails.
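A minimal stdlib sketch of one way to check step 4, i.e. whether the node still holds an ARP entry for the Pod, by scanning /proc/net/arp (the Pod IP is the one from this report):

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// hasARPEntry reports whether /proc/net/arp contains an entry for ip.
func hasARPEntry(ip string) (bool, error) {
	f, err := os.Open("/proc/net/arp")
	if err != nil {
		return false, err
	}
	defer f.Close()

	s := bufio.NewScanner(f)
	s.Scan() // skip the header line
	for s.Scan() {
		fields := strings.Fields(s.Text())
		if len(fields) > 0 && fields[0] == ip {
			return true, nil
		}
	}
	return false, s.Err()
}

func main() {
	found, err := hasARPEntry("10.224.1.170")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("ARP entry present:", found)
}
```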
Versions:
Antrea 1.5.0