
Pod requests to in-cluster Service fail with kube-proxy when the Pod's ARP entry is garbage-collected on the node #3370

Closed
Jexf opened this issue Feb 28, 2022 · 6 comments
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments


Jexf commented Feb 28, 2022

Describe the bug

Pod requests to an in-cluster Service fail with kube-proxy when the Pod's ARP entries are garbage-collected on the node.

  1. When Pod A accesses Service B for the first time (assuming the Pod has just started and its gateway ARP entry is empty), it first requests the gateway's MAC address. When the request reaches antrea-gw0 on the node, the node caches Pod A's ARP entry and sends an ARP reply; after Pod A receives the reply, it caches the gateway's ARP entry.

  2. Pod A then has no cross-node or Service traffic for a certain period of time (for example, 5 minutes).

  3. ARP aging and reclamation on the node use the default, untuned parameters: gc_stale_time is 60s, base_reachable_time_ms is 30000ms, and gc_thresh1 is 128, so normal ARP aging takes place. After about 2 minutes, the ARP entry for Pod A on the node has entered the stale state, and because the total number of ARP entries on the node exceeds 128 (gc_thresh1), ARP GC is triggered and Pod A's entry is reclaimed.

  4. Although the gateway ARP entry inside the Pod has also entered the stale state, the Pod holds very few ARP entries, so the count never reaches gc_thresh1 and ARP GC is not triggered; the gateway ARP entry in Pod A therefore persists.

  5. After 5 minutes, when Pod A accesses Service B again, the reply packets from the backend endpoint arrive at Pod A's node. The node's neighbor table (ip neigh) has no entry for Pod A, so the node sends an ARP request to Pod A. Because the request's source IP is the Service's ClusterIP while its source MAC is the gateway MAC, the request is dropped by the SpoofGuard table, and the connection fails.

The detailed packet capture log (Pod A IP: 10.224.1.170, Service B ClusterIP: 10.10.0.10):

16:36:26.193507 62:82:15:bd:0d:84 > fa:5c:47:de:06:02, ethertype IPv4 (0x0800), length 66: 10.224.1.170.47898 > 10.10.0.10.domain: Flags [S], seq 2589761518, win 28200, options [mss 1410,nop,nop,sackOK,nop,wscale 7], length 0
16:36:26.193549 fa:5c:47:de:06:02 > f2:39:bf:f0:38:c8, ethertype IPv4 (0x0800), length 66: 10.224.1.170.47898 > 10.224.1.90.domain: Flags [S], seq 2589761518, win 28200, options [mss 1410,nop,nop,sackOK,nop,wscale 7], length 0
16:36:26.193838 f2:39:bf:f0:38:c8 > Broadcast, ethertype ARP (0x0806), length 42: Request who-has 10.224.1.170 tell 10.224.1.90, length 28
16:36:26.194216 f2:39:bf:f0:38:c8 > fa:5c:47:de:06:02, ethertype IPv4 (0x0800), length 66: 10.224.1.90.domain > 10.224.1.170.47898: Flags [S.], seq 3058005556, ack 2589761519, win 28200, options [mss 1410,nop,nop,sackOK,nop,wscale 7], length 0
16:36:26.194238 fa:5c:47:de:06:02 > Broadcast, ethertype ARP (0x0806), length 42: Request who-has 10.224.1.170 tell 10.10.0.10, length 28
16:36:27.196254 fa:5c:47:de:06:02 > Broadcast, ethertype ARP (0x0806), length 42: Request who-has 10.224.1.170 tell 10.10.0.10, length 28

To Reproduce
1. Use Antrea with kube-proxy
2. Create Pod A and Service B
3. Pod A requests Service B
4. The Pod's ARP entry is garbage-collected on the node
5. The gateway ARP entry in Pod A still exists
6. Pod A requests Service B again

Versions:
Antrea 1.5.0

@Jexf Jexf added the kind/bug Categorizes issue or PR as related to a bug. label Feb 28, 2022

Jexf commented Feb 28, 2022

@tnqn @antoninbas @jianjuns Can I take it?

@Jexf Jexf changed the title Pod requests inner-cluster svc failed with kube-proxy when the pod arp entry will be gc on the node Pod requests inner-cluster svc failed with kube-proxy when the pod arp entry GC on the node Feb 28, 2022

tnqn commented Feb 28, 2022

@Jexf Thanks for the report, and feel free to take the issue. I assume you are using kube-proxy IPVS mode? Otherwise the kernel would have used antrea-gw0's IP as the source IP of the ARP request; it keeps the original source IP that triggered the ARP request only when that IP is owned by the host and arp_announce is 0.
What's your plan for fixing it? A potential solution is to raise the ARP announce restriction level (arp_announce) of antrea-gw0 to 1, but I am not sure whether this requires extra privileges for the agent Pod.
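The effect of arp_announce on source-IP selection can be sketched as follows. This is a simplified illustrative model of the Linux ip-sysctl semantics (it ignores the subnet check that level 1 also performs), not kernel code; the IPs are the ones from this issue's capture:

```go
package main

import "fmt"

// arpRequestSource models which source IP the kernel announces in an ARP
// request, depending on the arp_announce restriction level.
// pktSrcIP: source IP of the packet that triggered the ARP request.
// ifaceIP: an IP configured on the outgoing interface (e.g. antrea-gw0).
// onOutgoingIface: whether pktSrcIP is configured on that interface.
func arpRequestSource(arpAnnounce int, pktSrcIP, ifaceIP string, onOutgoingIface bool) string {
	switch arpAnnounce {
	case 0:
		// Level 0: any local address may be used, including the
		// triggering packet's source IP; here that is the Service
		// ClusterIP, which SpoofGuard later drops.
		return pktSrcIP
	default:
		// Level 1 prefers the triggering source IP only when it is
		// configured on the outgoing interface; otherwise it falls
		// back to level 2, which always picks the interface's own IP.
		if arpAnnounce == 1 && onOutgoingIface {
			return pktSrcIP
		}
		return ifaceIP
	}
}

func main() {
	// arp_announce=0: the ClusterIP 10.10.0.10 leaks into the ARP request.
	fmt.Println(arpRequestSource(0, "10.10.0.10", "10.224.1.1", false))
	// arp_announce=1: antrea-gw0's own IP is announced instead.
	fmt.Println(arpRequestSource(1, "10.10.0.10", "10.224.1.1", false))
}
```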


Jexf commented Feb 28, 2022

@tnqn Thanks for the reply. Yes, we are using kube-proxy IPVS mode. Setting arp_announce to 1 for the antrea-gw0 device sounds good to me, but I'm not sure whether the same problem exists on Windows.

@antoninbas
Contributor

Would running kube-proxy with --ipvs-strict-arp solve the issue, since kube-proxy would take care of setting arp_announce to 2?
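For reference, in IPVS mode with --ipvs-strict-arp set, kube-proxy applies arp_ignore=1 and arp_announce=2 via sysctl. A small sketch making the effect explicit (the map keys are the standard /proc/sys paths, shown here purely for illustration):

```go
package main

import "fmt"

// strictARPSysctls returns the sysctl values kube-proxy sets when
// --ipvs-strict-arp is enabled in IPVS mode.
func strictARPSysctls() map[string]int {
	return map[string]int{
		// Reply only to ARP requests for IPs configured on the
		// receiving interface.
		"net/ipv4/conf/all/arp_ignore": 1,
		// Always announce the best local source IP for ARP requests
		// (avoids announcing, e.g., a Service ClusterIP).
		"net/ipv4/conf/all/arp_announce": 2,
	}
}

func main() {
	for k, v := range strictARPSysctls() {
		fmt.Printf("%s = %d\n", k, v)
	}
}
```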


tnqn commented Mar 1, 2022

@antoninbas That sounds better; it seems designed for exactly this scenario. Then the only thing we need to do is document this requirement for working with kube-proxy IPVS mode. Does that work for you @Jexf?
Another thing to note is that Egress needs arp_ignore to be 0 in order to respond to ARP requests received on external interfaces. Since --ipvs-strict-arp changes it to 1, we need to ensure the userspace ARP responder is activated:

// Start the ARP responder only when the dummy device is not created. The kernel will handle ARP requests
// for IPs assigned to the dummy devices by default.
// TODO: Check the arp_ignore sysctl parameter of the transport interface to determine whether to start
// the ARP responder or not.
if a.dummyDevice == nil && a.arpResponder != nil {
	go a.arpResponder.Run(ch)
}
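The decision the TODO hints at could look like the sketch below. This is a hypothetical helper, not Antrea code: with --ipvs-strict-arp, arp_ignore is 1, so the kernel no longer answers ARP requests for IPs on the dummy device when the request arrives on another (external) interface, and the userspace responder must take over.

```go
package main

import "fmt"

// needARPResponder reports whether the userspace ARP responder must run,
// given whether the kernel dummy device exists and the arp_ignore value
// of the transport interface.
func needARPResponder(haveDummyDevice bool, arpIgnore int) bool {
	if !haveDummyDevice {
		// No dummy device: nothing in the kernel answers for the
		// Egress IPs, so the userspace responder is always required.
		return true
	}
	// Dummy device exists: the kernel answers ARP requests arriving on
	// other interfaces only when arp_ignore is 0.
	return arpIgnore != 0
}

func main() {
	fmt.Println(needARPResponder(true, 0))  // kernel answers: responder not needed
	fmt.Println(needARPResponder(true, 1))  // strict ARP: responder needed
	fmt.Println(needARPResponder(false, 0)) // no dummy device: responder needed
}
```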


Jexf commented Mar 2, 2022

@tnqn @antoninbas Thanks for the reply, --ipvs-strict-arp works well for me.
