Enable AF_XDP for cmd-forwarder-vpp management interface #283
Comments
For some reason |
Found a problem on clusters - forwarder just hangs during the start without any logs. Created a JIRA issue - https://jira.fd.io/browse/VPP-1994 |
It seems it is now clear why the forwarder (and the node) appear to hang. When VPP takes the uplink interface, it grabs the primary node interface, so traffic goes directly to VPP, bypassing Linux. As a result, we lose the connection to the node, and it looks to us as if it hangs. @edwarnicke |
Could you please say more? Also, as far as I know, AF_XDP does not work with calico. Am I wrong? |
@glazychev-art Look into AF_XDP and eBPF. You should be able to craft an eBPF program that is passed in for AF_XDP and only passes on VXLAN/Wireguard/IPSEC packets (sort of like a pinhole). That traffic will go to VPP, and all other traffic will go to the kernel interface. |
Most likely the action plan will be:
|
Current state:
There was an idea to update VPP to the latest version. Perhaps the patch https://gerrit.fd.io/r/c/vpp/+/37274 is not entirely correct if we run the cluster locally (kind). I continue to work in this direction. |
@glazychev-art Is calico-vpp being on an older vpp version still blocking us updating to a more recent vpp version? |
@edwarnicke
Do we need to update? |
@glazychev-art It's probably a good idea to update, yes |
@glazychev-art It might also be a good idea to add tests to VPP so that the kind of breakage we are seeing is prevented in the future. |
@edwarnicke But what do we do with ARP packets? Perhaps we also need to filter frames by destination MAC, if the MACs are different for the VPP and kernel interfaces. Do you have any thoughts? |
@glazychev-art Could you point me to your existing eBPF program? |
@glazychev-art Have you looked at bpf_clone_redirect() ? |
@edwarnicke
In short, we pass all ARP packets to VPP and filter IP packets: if the UDP port belongs to VxLAN, Wireguard, and so on, we pass the packet to VPP; otherwise, to the kernel. |
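The filtering rule described above can be sketched in plain Go to make the decision logic concrete. This is not the eBPF program from this thread, only a user-space illustration of the same rule. The names `classify`, `udpFrame`, and `tunnelPorts` are hypothetical, and the port list (4789 for VXLAN, 51820 as the usual WireGuard default, 500 and 4500 for IPsec IKE and NAT traversal) is an assumption about which ports are in use. VLAN tags, IPv6, and IP fragments are not handled.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Verdict says where a received frame should be delivered.
type Verdict int

const (
	ToKernel Verdict = iota // regular Linux network stack
	ToVPP                   // AF_XDP socket consumed by VPP
)

const (
	etherTypeIPv4 = 0x0800
	etherTypeARP  = 0x0806
	protoUDP      = 17
)

// tunnelPorts is an ASSUMED set of UDP destination ports for tunnel traffic.
var tunnelPorts = map[uint16]bool{
	4789:  true, // VXLAN
	51820: true, // WireGuard (common default, assumed)
	500:   true, // IKE (IPsec)
	4500:  true, // IPsec NAT traversal
}

// classify applies the rule from the comment above: all ARP goes to VPP,
// IPv4/UDP goes to VPP only for tunnel ports, everything else to the kernel.
func classify(frame []byte) Verdict {
	if len(frame) < 14 { // shorter than an Ethernet header
		return ToKernel
	}
	switch binary.BigEndian.Uint16(frame[12:14]) {
	case etherTypeARP:
		return ToVPP
	case etherTypeIPv4:
		ip := frame[14:]
		if len(ip) < 20 || ip[9] != protoUDP {
			return ToKernel
		}
		ihl := int(ip[0]&0x0f) * 4 // IPv4 header length in bytes
		if ihl < 20 || len(ip) < ihl+8 {
			return ToKernel
		}
		dport := binary.BigEndian.Uint16(ip[ihl+2 : ihl+4])
		if tunnelPorts[dport] {
			return ToVPP
		}
	}
	return ToKernel
}

// udpFrame builds a minimal Ethernet/IPv4/UDP frame for experiments.
func udpFrame(dport uint16) []byte {
	f := make([]byte, 14+20+8)
	binary.BigEndian.PutUint16(f[12:14], etherTypeIPv4)
	f[14] = 0x45 // IPv4, 20-byte header
	f[14+9] = protoUDP
	binary.BigEndian.PutUint16(f[14+20+2:], dport)
	return f
}

func main() {
	fmt.Println("VXLAN to VPP:", classify(udpFrame(4789)) == ToVPP)
	fmt.Println("DNS to kernel:", classify(udpFrame(53)) == ToKernel)
}
```

Note that sending every ARP frame to VPP means the kernel never sees ARP replies, which is the concern raised in the following comments.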
@edwarnicke |
@glazychev-art Trying to create an sk_buff sounds like it might be prone to error. We may also want to think through what the problem really is. Is the problem that we are not receiving arp packets, or is the problem how we construct our neighbor table in VPP? |
I think the problem is that we are not receiving arp packets. |
Have we checked this? It might be true, but I wouldn't simply presume it. |
I think I tested something similar. Without But definitely, we need to double-check that. |
@edwarnicke But this is not the case for IPv6. Since IPv6 uses the neighbor discovery mechanism, the Linux side doesn't save the NA (Neighbor Advertisement) if we send the NS (Neighbor Solicitation) from the VPP side. I tried changing the Solicited and Override flags in the response, but it didn't help. Should we continue to work in this direction, or does it make sense to implement only IPv4? |
Current state:
|
Current state:
|
I've tried to resolve IPv6 neighbors in the kernel space manually. |
Are we typically looking for anything other than the MAC address of the gateway IP in the IPv6 case? If not, could we simply scrape the Linux neighbor table for v6? |
This may also help: |
Current state:
Instead, we can resolve gateways for a given interface in a slightly different way. Before creating AF_XDP, we can use netlink.RouteList and then ping every gateway found. This will add neighbor entries to the Linux neighbor table, and they will later be read and added to VPP. @edwarnicke |
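As a rough illustration of the "find the gateways first" step, here is a stdlib-only sketch. It does not use netlink.RouteList; instead it parses the text format of `/proc/net/route`, which is a swapped-in technique chosen so that the example has no external dependencies. It covers IPv4 only, assumes a little-endian host, and the function name `ipv4Gateways` and the interface name `eth0` are hypothetical. Pinging the gateways is left out.

```go
package main

import (
	"bufio"
	"encoding/hex"
	"fmt"
	"net"
	"os"
	"strconv"
	"strings"
)

const rtfGateway = 0x2 // RTF_GATEWAY flag from <linux/route.h>

// ipv4Gateways returns the distinct IPv4 gateways used by routes on iface.
// routeTable is the content of /proc/net/route.
func ipv4Gateways(routeTable, iface string) []net.IP {
	seen := map[string]bool{}
	var gws []net.IP
	sc := bufio.NewScanner(strings.NewReader(routeTable))
	for sc.Scan() {
		f := strings.Fields(sc.Text())
		// Columns: Iface Destination Gateway Flags ...
		if len(f) < 4 || f[0] != iface {
			continue
		}
		flags, err := strconv.ParseUint(f[3], 16, 32)
		if err != nil || flags&rtfGateway == 0 {
			continue // not a gateway route
		}
		raw, err := hex.DecodeString(f[2])
		if err != nil || len(raw) != 4 {
			continue
		}
		// The address is printed as a host-order integer; on a
		// little-endian machine (ASSUMED) the bytes appear reversed.
		gw := net.IPv4(raw[3], raw[2], raw[1], raw[0])
		if !seen[gw.String()] {
			seen[gw.String()] = true
			gws = append(gws, gw)
		}
	}
	return gws
}

func main() {
	data, err := os.ReadFile("/proc/net/route")
	if err != nil {
		fmt.Println("cannot read route table:", err)
		return
	}
	for _, gw := range ipv4Gateways(string(data), "eth0") {
		fmt.Println("gateway to ping before creating AF_XDP:", gw)
	}
}
```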
@edwarnicke
|
Current state:
AWS - doesn't start. Logs from forwarder:
Packet - started, but ping doesn't work. This is most likely because the af_packet VPP plugin doesn't process bonded interfaces (which Packet uses).
Measurements on Kind
|
Estimation
To run CI on a kind cluster with XDP we need:
|
@edwarnicke |
@glazychev-art It's strange that AF_PACKET is faster for TCP but slower for UDP. Do we have any notion of why? |
@edwarnicke
(we don't see them with AF_PACKET) So, I think the problem may be in the VPP plugin. |
As part of this task, we have done the integration of On public clusters, we ran into problems. Separate issues were created. I think this issue can be closed. |
Currently cmd-forwarder-vpp uses AF_PACKET to bind to an existing Node interface using LinkToAfPacket
AF_XDP is faster than AF_PACKET, but AF_XDP is only usable for our purposes from kernel version 5.4 onward. The good news is that lots of places have kernel versions that new (including the more recent versions of Docker Desktop).
AF_XDP is supported in govpp
Because AF_XDP is only supported on newer kernels, a check will need to be made before choosing the method to use: AF_XDP if available, otherwise AF_PACKET.
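A minimal sketch of such a check, assuming the kernel release string is read from `/proc/sys/kernel/osrelease` and that a version of 5.4 or later is the only criterion. The function names `afXDPAvailable` and `chooseMethod` are hypothetical and are not part of cmd-forwarder-vpp or govpp. A real check might also need to probe driver or container support.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseKernelRelease extracts major and minor from strings such as
// "5.4.0-42-generic" or "5.10-rc1".
func parseKernelRelease(release string) (int, int, error) {
	parts := strings.SplitN(release, ".", 3)
	if len(parts) < 2 {
		return 0, 0, fmt.Errorf("unexpected kernel release %q", release)
	}
	major, err := strconv.Atoi(parts[0])
	if err != nil {
		return 0, 0, err
	}
	// Keep only the leading digits of the minor component.
	digits := parts[1]
	for i, r := range digits {
		if r < '0' || r > '9' {
			digits = digits[:i]
			break
		}
	}
	minor, err := strconv.Atoi(digits)
	if err != nil {
		return 0, 0, err
	}
	return major, minor, nil
}

// afXDPAvailable reports whether the kernel is 5.4 or newer.
// An unparseable release is treated as "not available".
func afXDPAvailable(release string) bool {
	major, minor, err := parseKernelRelease(release)
	if err != nil {
		return false
	}
	return major > 5 || (major == 5 && minor >= 4)
}

// chooseMethod picks AF_XDP when available and falls back to AF_PACKET.
func chooseMethod(release string) string {
	if afXDPAvailable(release) {
		return "AF_XDP"
	}
	return "AF_PACKET"
}

func main() {
	data, err := os.ReadFile("/proc/sys/kernel/osrelease")
	if err != nil {
		fmt.Println("cannot read kernel release, using AF_PACKET:", err)
		return
	}
	fmt.Println("method:", chooseMethod(strings.TrimSpace(string(data))))
}
```

Falling back to AF_PACKET on any parse error keeps the forwarder on the path it already uses today.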