Enable AF_XDP for cmd-forwarder-vpp management interface #283

Closed
edwarnicke opened this issue Jul 15, 2021 · 38 comments

@edwarnicke (Member)

Currently, cmd-forwarder-vpp uses AF_PACKET to bind to an existing node interface using LinkToAfPacket.

AF_XDP is faster than AF_PACKET, but it is only usable for our purposes from kernel version 5.4 onward. The good news is that many environments have kernels at least that new (including the more recent versions of Docker Desktop).

AF_XDP is supported in govpp

Because AF_XDP is only supported on newer kernels, a runtime check will be needed so that the correct method can be chosen (AF_XDP if available, otherwise AF_PACKET).
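
For illustration, a minimal sketch of such a check in Go, reading the kernel release via golang.org/x/sys/unix. The helper name and the way the result is consumed are assumptions, not the actual cmd-forwarder-vpp code:

package main

import (
	"fmt"
	"strconv"
	"strings"

	"golang.org/x/sys/unix"
)

// kernelAtLeast reports whether the running kernel release is >= major.minor.
// Illustrative only; cmd-forwarder-vpp may decide this differently.
func kernelAtLeast(major, minor int) (bool, error) {
	var uts unix.Utsname
	if err := unix.Uname(&uts); err != nil {
		return false, err
	}
	release := unix.ByteSliceToString(uts.Release[:]) // e.g. "5.15.0-1021-gke"
	parts := strings.SplitN(release, ".", 3)
	if len(parts) < 2 {
		return false, fmt.Errorf("unexpected kernel release %q", release)
	}
	maj, err := strconv.Atoi(parts[0])
	if err != nil {
		return false, err
	}
	// Keep only the leading digits of the minor component (e.g. "4" from "4-rc1").
	minorStr := parts[1]
	if i := strings.IndexFunc(minorStr, func(r rune) bool { return r < '0' || r > '9' }); i >= 0 {
		minorStr = minorStr[:i]
	}
	min, err := strconv.Atoi(minorStr)
	if err != nil {
		return false, err
	}
	return maj > major || (maj == major && min >= minor), nil
}

func main() {
	ok, err := kernelAtLeast(5, 4)
	if err != nil {
		panic(err)
	}
	if ok {
		fmt.Println("use AF_XDP for the management interface")
	} else {
		fmt.Println("fall back to AF_PACKET")
	}
}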

@glazychev-art (Contributor) commented Aug 9, 2021

blocked by #284

For some reason, AF_XDP doesn't work correctly with VPP v20.09.

@glazychev-art (Contributor)

Found a problem on clusters: the forwarder just hangs during startup without any logs.
I tested it on kind and on a Packet cluster; the situation is the same.

Created a JIRA issue - https://jira.fd.io/browse/VPP-1994

@denis-tingaikin denis-tingaikin moved this from Todo to In Progress in Release v1.8.0 Dec 27, 2022
@glazychev-art (Contributor)

It seems it has become clear why we see the forwarder (and node) hanging.
If I understand correctly, AF_XDP moves frames directly to VPP, bypassing the Linux network stack. But we know that the forwarder uses hostNetwork: true - https://github.com/networkservicemesh/deployments-k8s/blob/main/apps/forwarder-vpp/forwarder.yaml#L19. This is required for interdomain.

So, when VPP takes the uplink interface, it grabs the primary node interface, and traffic goes directly to VPP, bypassing Linux. As a result, we lose the connection to the node, and it appears to us to hang.

@edwarnicke
As I see it, Calico-vpp has a similar scenario - https://projectcalico.docs.tigera.io/reference/vpp/host-network
Should we take a similar approach?

@denis-tingaikin (Member) commented Jan 2, 2023

@glazychev-art

As I see it, Calico-vpp has a similar scenario - https://projectcalico.docs.tigera.io/reference/vpp/host-network
Should we take a similar approach?

Could you please say more?

Also, as far as I know, AF_XDP is not working with Calico. Am I wrong?

@edwarnicke (Member, Author)

@glazychev-art Look into AF_XDP and eBPF. You should be able to craft an eBPF program, passed in for AF_XDP, that only passes VXLAN/Wireguard/IPsec packets on to VPP (sort of like a pinhole), while all other traffic goes to the kernel interface.

@glazychev-art (Contributor) commented Jan 10, 2023

Most likely the action plan will be:

  • run vpp af_xdp with a custom eBPF program (fix possible bugs)
  • figure out how to communicate with eBPF from golang
  • check kernel/libbpf/vpp-af-xdp compatibility
  • update vpp version: ~22h
    a. fix AF_PACKET ~1h
    b. fix wireguard [vpp] ~ 6h
    c. fix ACLs ~ 3h
    d. check calico-vpp ~ 4h
    e. check and fix other possible problems ~ 8h
  • implement an eBPF program ~8h
  • create a chain element to update the eBPF map (it will contain UDP ports) - see the sketch after this list
  • update cmd-forwarder-vpp: ~4h
    a. apply the new chain element ~ 1h
    b. update VppInit function (add AF_XDP) ~2h
    c. update forwarder Dockerfile to build it correctly ~ 1h
  • Risks ~12h
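
A minimal sketch of how such a chain element could update the pinned ports map from Go, assuming github.com/cilium/ebpf and a map pinned at /sys/fs/bpf/ports_map (both the pin path and the "value is unused, just present" convention are assumptions):

package main

import (
	"log"

	"github.com/cilium/ebpf"
)

// addForwardingPort inserts a UDP destination port (e.g. the negotiated VXLAN or
// Wireguard port) into the pinned ports_map so the XDP program redirects it to VPP.
func addForwardingPort(pinPath string, port uint16) error {
	m, err := ebpf.LoadPinnedMap(pinPath, nil)
	if err != nil {
		return err
	}
	defer m.Close()
	// Key/value sizes must match the BPF side: int key (4 bytes), unsigned short value (2 bytes).
	return m.Put(uint32(port), uint16(1))
}

func main() {
	// Example: allow VXLAN (4789) and Wireguard (51820) traffic into VPP.
	for _, p := range []uint16{4789, 51820} {
		if err := addForwardingPort("/sys/fs/bpf/ports_map", p); err != nil {
			log.Fatalf("failed to add port %d: %v", p, err)
		}
	}
}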

@glazychev-art (Contributor)

Current state:

  1. Prepared an eBPF program
  2. Built govpp with this patch - https://gerrit.fd.io/r/c/vpp/+/37274
  3. Ran cmd-forwarder-vpp docker tests - they work very well. They don't work without the patch from step 2
  4. Still have a problem with Kubernetes - forwarders are not responding after creation

There was an idea to update VPP to the latest version.
With the updated VPP, docker tests still don't work without https://gerrit.fd.io/r/c/vpp/+/37274, and the problem with Kubernetes was not resolved.

Perhaps the patch https://gerrit.fd.io/r/c/vpp/+/37274 is not entirely correct if we run the cluster locally (kind). I continue to work in this direction.

@edwarnicke (Member, Author)

@glazychev-art Is calico-vpp being on an older VPP version still blocking us from updating to a more recent VPP version?

@glazychev-art (Contributor) commented Jan 17, 2023

@edwarnicke
Not really - it was updated recently (on the main branch) - projectcalico/vpp-dataplane@d8288e1.
I've tested this vpp revision and seen a few problems:

  1. Minor - we need to use a newer API version for AF_PACKET. For unknown reasons, our current one no longer works.
  2. More serious - with the many improvements to Wireguard in vpp, the event mechanism (by which we learn that a wireguard interface is ready) was broken. Will need to figure it out.
  3. We need to deal with ACLs, because our current usage returns an error.

Do we need to update?

@edwarnicke (Member, Author)

@glazychev-art It's probably a good idea to update, yes.

@edwarnicke (Member, Author)

@glazychev-art It might also be a good idea to put tests into VPP to prevent some of the breakage we are seeing from happening in the future.

@glazychev-art (Contributor) commented Jan 18, 2023

@edwarnicke
I have a question related to the eBPF program. Currently I've implemented it so that it only filters IP UDP packets based on a port.

But what do we do with ARP packets?
We definitely need ARP packets to be handled by the kernel for the pod to function properly.
On the other hand, we also need ARP in VPP so that we can find out the MAC addresses of other forwarders.

Perhaps we also need to filter frames by destination MAC, if the VPP and kernel interfaces use different ones.

Do you have any thoughts?

@edwarnicke (Member, Author)

@glazychev-art Could you point me to your existing eBPF program?

@edwarnicke (Member, Author)

@glazychev-art Have you looked at bpf_clone_redirect() ?

@glazychev-art (Contributor) commented Jan 19, 2023

@edwarnicke
Currently the eBPF program looks like this:

/*
 * SPDX-License-Identifier: GPL-2.0 OR Apache-2.0
 * Dual-licensed under GPL version 2.0 or Apache License version 2.0
 * Copyright (c) 2020 Cisco and/or its affiliates.
 */
#include <linux/bpf.h>
#include <linux/in.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <linux/udp.h>
#include <bpf/bpf_helpers.h>


/*
 * when compiled, debug print can be viewed with eg.
 * sudo cat /sys/kernel/debug/tracing/trace_pipe
 */
#ifdef DEBUG
#define s__(n)   # n
#define s_(n)    s__(n)
#define x_(fmt)  __FILE__ ":" s_(__LINE__) ": " fmt "\n"
#define DEBUG_PRINT_(fmt, ...) do { \
    const char fmt__[] = fmt; \
    bpf_trace_printk(fmt__, sizeof(fmt), ## __VA_ARGS__); } while(0)
#define DEBUG_PRINT(fmt, ...)   DEBUG_PRINT_ (x_(fmt), ## __VA_ARGS__)
#else   /* DEBUG */
#define DEBUG_PRINT(fmt, ...)
#endif  /* DEBUG */

#define ntohs(x)        __constant_ntohs(x)
#define MAX_NR_PORTS 65536
			    
struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, MAX_NR_PORTS);
    __type(key, int);
    __type(value, unsigned short int);
    __uint(pinning, LIBBPF_PIN_BY_NAME);
} ports_map SEC(".maps");

struct {
    __uint(type, BPF_MAP_TYPE_XSKMAP);
    __uint(max_entries, 64);
    __type(key, int);
    __type(value, int);
} xsks_map SEC(".maps");


SEC("xdp_sock")
int xdp_sock_prog(struct xdp_md *ctx) {
    const void *data = (void *)(long)ctx->data;
    const void *data_end = (void *)(long)ctx->data_end;
    int qid = ctx->rx_queue_index;
    
    DEBUG_PRINT("rx %ld bytes packet", (long)data_end - (long)data);
    
    if (data + sizeof(struct ethhdr) > data_end) {
        DEBUG_PRINT("packet too small");
        return XDP_PASS;
    }
   
    const struct ethhdr *eth = data;
    if (eth->h_proto != ntohs(ETH_P_IP) && eth->h_proto != ntohs(ETH_P_ARP)) {
          return XDP_PASS;
    }
    
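    /* ARP: redirect to the AF_XDP socket so VPP can learn the MAC addresses of peer forwarders */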
    if (eth->h_proto == ntohs(ETH_P_ARP)) {
      if (!bpf_map_lookup_elem(&xsks_map, &qid))
      {
        DEBUG_PRINT("no socket found");
        return XDP_PASS;
      }

      DEBUG_PRINT("going to socket %d", qid);
      return bpf_redirect_map(&xsks_map, qid, 0);
    }

    if (data + sizeof(struct ethhdr) + sizeof(struct iphdr) + sizeof(struct udphdr) > data_end) {
        DEBUG_PRINT("packet too small");
        return XDP_PASS;
    }

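    /* IPv4: redirect only UDP packets whose destination port is present in ports_map
     * (VXLAN, Wireguard, ...); everything else stays with the kernel */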
    const struct iphdr *ip = (void *)(eth + 1);
    switch (ip->protocol) {
    case IPPROTO_UDP: {
        const struct udphdr *udp = (void *)(ip + 1);
        const int port = ntohs(udp->dest);
        if (!bpf_map_lookup_elem(&ports_map, &port)) {
            DEBUG_PRINT("unsupported udp dst port %x", (int)udp->dest);
            return XDP_PASS;
        }
        break;
    }
    default:
        DEBUG_PRINT("unsupported ip proto %x", (int)ip->protocol);
        return XDP_PASS;
    }

    if (!bpf_map_lookup_elem(&xsks_map, &qid)) {
        DEBUG_PRINT("no socket found");
        return XDP_PASS;
    }

    DEBUG_PRINT("going to socket %d", qid);
    return bpf_redirect_map(&xsks_map, qid, 0);
}

/* actually Dual GPLv2/Apache2, but GPLv2 as far as kernel is concerned */
SEC("license")
char _license[] = "GPL";

In short, we pass all ARP packets to VPP and filter IP packets - if the UDP destination port belongs to VXLAN, Wireguard, and so on, we pass the packet to VPP; otherwise, it goes to the kernel.

@glazychev-art (Contributor)

@edwarnicke
Yes, I looked at long bpf_clone_redirect(struct sk_buff *skb, u32 ifindex, u64 flags).
But as you can see, it receives an sk_buff. So it seems we can only call this function after the XDP layer, once we have already chosen the kernel path (in the TC ingress layer, for example).
Probably we would need to create an sk_buff manually in the XDP function and then call bpf_clone_redirect.

@edwarnicke (Member, Author)

@glazychev-art Trying to create an sk_buff sounds like it might be prone to error.

We may also want to think through what the problem really is. Is the problem that we are not receiving arp packets, or is the problem how we construct our neighbor table in VPP?

@glazychev-art (Contributor)

I think the problem is that we are not receiving ARP packets.
We construct the VPP neighbor table correctly - we take all ARP entries known to the kernel at start time.
Next, VPP needs to learn about other pods - for example, about another forwarder in order to set up a tunnel.
On the other hand, we also need ARP to be processed in the kernel - for example, when passing the request forwarder --> manager.

@edwarnicke (Member, Author)

So, the kernel will only accept and remember the response if it sent the request itself.

Have we checked this? It might be true, but I wouldn't simply presume it.

@glazychev-art (Contributor)

I think I tested something similar. Without NeighSubscribeAt, but I looked at ip neigh.

But definitely, we need to double-check that.

@glazychev-art glazychev-art moved this from Todo to In Progress in Release v1.9.0 Mar 10, 2023
@glazychev-art (Contributor)

@edwarnicke
It looks like NeighSubscribeAt and IPNeighborAddDel are working fine for IPv4 interfaces.

But this is not the case for IPv6. Since IPv6 has its own neighbor discovery mechanism, the Linux side doesn't save the NA (Neighbor Advertisement) if we send the NS (Neighbor Solicitation) from the VPP side. I tried changing the Solicited and Override flags in the response, but it didn't help.

Should we continue to work in this direction or does it make sense to implement only IPv4?
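
For reference, a minimal sketch of the IPv4 flow described above, using github.com/vishvananda/netlink to watch kernel neighbor updates; mirrorToVpp is a hypothetical stand-in for the forwarder's IPNeighborAddDel call:

package main

import (
	"log"
	"net"

	"github.com/vishvananda/netlink"
	"golang.org/x/sys/unix"
)

// watchNeighbors subscribes to kernel neighbor table updates and hands each new
// entry with a resolved hardware address to mirrorToVpp.
func watchNeighbors(mirrorToVpp func(ip net.IP, mac net.HardwareAddr)) error {
	updates := make(chan netlink.NeighUpdate)
	done := make(chan struct{})
	if err := netlink.NeighSubscribe(updates, done); err != nil {
		return err
	}
	go func() {
		for u := range updates {
			if u.Type != unix.RTM_NEWNEIGH || u.HardwareAddr == nil {
				continue
			}
			mirrorToVpp(u.IP, u.HardwareAddr)
		}
	}()
	return nil
}

func main() {
	err := watchNeighbors(func(ip net.IP, mac net.HardwareAddr) {
		log.Printf("would add neighbor to VPP: %s -> %s", ip, mac)
	})
	if err != nil {
		log.Fatal(err)
	}
	select {} // keep watching
}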

@denis-tingaikin denis-tingaikin moved this from In Progress to Under review in Release v1.9.0 Mar 20, 2023
@glazychev-art glazychev-art moved this from Under review to In Progress in Release v1.9.0 Mar 27, 2023
@glazychev-art (Contributor)

Current state:

  • Rechecked work with flags (including Router flag)
  • Used wireshark to check if the packet is valid (e.g. checksum)
  • It looks like Linux really rejects unexpected NA:
   When a valid Neighbor Advertisement is received (either solicited or
   unsolicited), the Neighbor Cache is searched for the target's entry.
   If no entry exists, the advertisement SHOULD be silently discarded.
   There is no need to create an entry if none exists, since the
   recipient has apparently not initiated any communication with the
   target.

https://www.rfc-editor.org/rfc/rfc4861.html#section-7.2.5

@glazychev-art (Contributor)

Current state:

  • double-checked the possibility of copying the incoming packet both to userspace and to kernel space - did not find a way.
  • considered using the TC egress level. We could record there whether an NS (Neighbor Solicitation) was issued from kernel space, and use that information in the XDP ingress layer. But it looks like this would bring more problems, because we don't know whether we will receive an answer at all.
  • perhaps it makes sense to send an NS during the NSM Request from a chain element. For example, we can try https://pkg.go.dev/github.com/mdlayher/ndp - see the sketch below.
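
A minimal sketch of sending such an NS with mdlayher/ndp; exact signatures vary between versions of the package (older releases use net.IP instead of netip.Addr), and the interface name and target address are illustrative:

package main

import (
	"log"
	"net"
	"net/netip"

	"github.com/mdlayher/ndp"
)

// solicitNeighbor sends an ICMPv6 Neighbor Solicitation for target from the given
// interface, so the kernel learns the answer and the forwarder can pick it up via netlink.
func solicitNeighbor(ifName string, target netip.Addr) error {
	ifi, err := net.InterfaceByName(ifName)
	if err != nil {
		return err
	}
	conn, _, err := ndp.Listen(ifi, ndp.LinkLocal)
	if err != nil {
		return err
	}
	defer conn.Close()

	// NS messages go to the solicited-node multicast address of the target.
	dst, err := ndp.SolicitedNodeMulticast(target)
	if err != nil {
		return err
	}
	msg := &ndp.NeighborSolicitation{
		TargetAddress: target,
		Options: []ndp.Option{
			&ndp.LinkLayerAddress{Direction: ndp.Source, Addr: ifi.HardwareAddr},
		},
	}
	return conn.WriteTo(msg, nil, dst)
}

func main() {
	if err := solicitNeighbor("eth0", netip.MustParseAddr("fd00:10:244:1::1")); err != nil {
		log.Fatal(err)
	}
}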

@glazychev-art (Contributor)

I've tried resolving IPv6 neighbors in kernel space manually.
And it works: the forwarder receives the event from netlink and adds the neighbor to VPP via IPNeighborAddDel. Ping works after that.

@edwarnicke (Member, Author)

Are we typically looking for anything other than the mac address of the gateway IP for the IPv6 case?

If so, could we simply scrape the linux Neighbor table for v6?

@glazychev-art (Contributor)

Current state:

  • checked this - https://insights.sei.cmu.edu/blog/ping-sweeping-in-ipv6/. Looks interesting, but pinging ff02::1 doesn't always invoke Neighbor Discovery. The Wireshark captures (ICMPv6 request and response screenshots) show that Linux doesn't send a Neighbor Solicitation at all (fd00:10:244:1::1 is a gateway).

Instead, we can resolve the gateways for a given interface in a slightly different way. Before creating the AF_XDP interface, we can use netlink.RouteList and then ping every gateway found. This will add neighbor entries to Linux, and they will later be read and added to VPP.

@edwarnicke
What do you think?
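
A minimal sketch of that idea, assuming github.com/vishvananda/netlink; the use of the ping binary and the interface name are purely illustrative:

package main

import (
	"log"
	"os/exec"

	"github.com/vishvananda/netlink"
)

// pingGateways lists the routes attached to the uplink and pings every gateway once,
// so the kernel populates its neighbor table before VPP takes over the interface via AF_XDP.
func pingGateways(ifName string) error {
	link, err := netlink.LinkByName(ifName)
	if err != nil {
		return err
	}
	routes, err := netlink.RouteList(link, netlink.FAMILY_ALL)
	if err != nil {
		return err
	}
	for _, r := range routes {
		if r.Gw == nil {
			continue
		}
		// One echo request is enough to trigger ARP/ND resolution in the kernel.
		if out, err := exec.Command("ping", "-c", "1", r.Gw.String()).CombinedOutput(); err != nil {
			log.Printf("ping %s failed: %v (%s)", r.Gw, err, out)
		}
	}
	return nil
}

func main() {
	if err := pingGateways("eth0"); err != nil {
		log.Fatal(err)
	}
}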

@glazychev-art (Contributor)

@edwarnicke
It seems that it is not possible to run more than one AF_XDP forwarder on the same node, unlike with AF_PACKET (forwarders use hostNetwork). Logs from the second forwarder:

af_xdp               [error ]: af_xdp_create_queue: xsk_socket__create() failed (is linux netdev vpp1host up?): Device or resource busy
create interface af_xdp: xsk_socket__create() failed (is linux netdev vpp1host up?): Device or resource busy

@glazychev-art (Contributor) commented Apr 6, 2023

Current state:
Tested a new forwarder on public clusters:
GKE - doesn't start. Logs from forwarder:

Apr  3 05:38:16.954 [INFO] [cmd:vpp] libbpf: Kernel error message: virtio_net: XDP expects header/data in single page, any_header_sg required
Apr  3 05:38:16.954 [INFO] [cmd:vpp] vpp[10244]: af_xdp: af_xdp_load_program: bpf_set_link_xdp_fd(eth0) failed: Invalid argument
Apr  3 05:38:18.228 [ERRO] [cmd:/bin/forwarder] [duration:12.809608ms] [hostIfName:eth0] [vppapi:AfXdpCreate] VPPApiError: System call error #6 (-16)
panic: error: VPPApiError: System call error #6 (-16)

AWS - doesn't start. Logs from forwarder:

Apr  3 13:24:25.406 [INFO] [cmd:vpp] libbpf: Kernel error message: veth: Peer MTU is too large to set XDP
Apr  3 13:24:25.406 [INFO] [cmd:vpp] vpp[10508]: af_xdp: af_xdp_load_program: bpf_set_link_xdp_fd(eth0) failed: Numerical result out of range
Apr  3 13:24:26.563 [ERRO] [cmd:/bin/forwarder] [duration:18.015838ms] [hostIfName:eth0] [vppapi:AfXdpCreate] VPPApiError: System call error #6 (-16)
panic: error: VPPApiError: System call error #6 (-16)

Packet - started, but ping doesn't work. This is most likely because the af_packet vpp plugin doesn't handle bonded interfaces (which are used by Packet).
AKS - ping works only without the hostNetwork: true flag, but performance is poor (about 2 times slower than AF_PACKET).
Kind - works, but performance has not increased (it has even decreased slightly).

Measurements on Kind

iperf3 TCP

Ethernet remote mechanism (VxLAN)

AF_PACKET:

Connecting to host 172.16.1.100, port 5201
[  5] local 172.16.1.101 port 43488 connected to 172.16.1.100 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  46.6 MBytes   391 Mbits/sec  174    969 KBytes       
[  5]   1.00-2.00   sec  48.8 MBytes   409 Mbits/sec    0   1.02 MBytes       
[  5]   2.00-3.00   sec  58.8 MBytes   493 Mbits/sec    0   1.07 MBytes       
[  5]   3.00-4.00   sec  53.8 MBytes   451 Mbits/sec    0   1.10 MBytes       
[  5]   4.00-5.00   sec  46.2 MBytes   388 Mbits/sec    0   1.12 MBytes       
[  5]   5.00-6.00   sec  62.5 MBytes   524 Mbits/sec    0   1.13 MBytes       
[  5]   6.00-7.00   sec  45.0 MBytes   377 Mbits/sec    0   1.14 MBytes       
[  5]   7.00-8.00   sec  65.0 MBytes   545 Mbits/sec    0   1.18 MBytes       
[  5]   8.00-9.00   sec  56.2 MBytes   472 Mbits/sec    0   1.22 MBytes       
[  5]   9.00-10.00  sec  45.0 MBytes   377 Mbits/sec    0   1.24 MBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   528 MBytes   443 Mbits/sec  174             sender
[  5]   0.00-10.08  sec   526 MBytes   438 Mbits/sec                  receiver

AF_XDP:

Connecting to host 172.16.1.100, port 5201
[  5] local 172.16.1.101 port 36586 connected to 172.16.1.100 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  46.9 MBytes   393 Mbits/sec  1326    113 KBytes       
[  5]   1.00-2.00   sec  41.3 MBytes   346 Mbits/sec  1114   42.2 KBytes       
[  5]   2.00-3.00   sec  36.2 MBytes   304 Mbits/sec  1058   34.0 KBytes       
[  5]   3.00-4.00   sec  54.2 MBytes   455 Mbits/sec  1560   20.4 KBytes       
[  5]   4.00-5.00   sec  36.3 MBytes   304 Mbits/sec  1149   44.9 KBytes       
[  5]   5.00-6.00   sec  27.9 MBytes   234 Mbits/sec  953   20.4 KBytes       
[  5]   6.00-7.00   sec  37.9 MBytes   318 Mbits/sec  1106   25.9 KBytes       
[  5]   7.00-8.00   sec  33.1 MBytes   278 Mbits/sec  964   25.9 KBytes       
[  5]   8.00-9.00   sec  39.2 MBytes   329 Mbits/sec  1448   32.7 KBytes       
[  5]   9.00-10.00  sec  51.1 MBytes   429 Mbits/sec  1445   23.1 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   404 MBytes   339 Mbits/sec  12123             sender
[  5]   0.00-10.00  sec   403 MBytes   338 Mbits/sec                  receiver

Note the large number of retransmissions (Retr).

IP remote mechanism (Wireguard)

AF_PACKET:

Connecting to host 172.16.1.100, port 5201
[  5] local 172.16.1.101 port 49978 connected to 172.16.1.100 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  88.3 MBytes   740 Mbits/sec    2    487 KBytes       
[  5]   1.00-2.00   sec  87.4 MBytes   733 Mbits/sec    0    606 KBytes       
[  5]   2.00-3.00   sec  76.5 MBytes   642 Mbits/sec    6    495 KBytes       
[  5]   3.00-4.00   sec  74.6 MBytes   626 Mbits/sec    0    596 KBytes       
[  5]   4.00-5.00   sec  42.3 MBytes   355 Mbits/sec    0    649 KBytes       
[  5]   5.00-6.00   sec  21.7 MBytes   182 Mbits/sec    8    473 KBytes       
[  5]   6.00-7.00   sec  36.9 MBytes   310 Mbits/sec    0    545 KBytes       
[  5]   7.00-8.00   sec  88.9 MBytes   746 Mbits/sec    0    636 KBytes       
[  5]   8.00-9.00   sec  82.4 MBytes   691 Mbits/sec    8    539 KBytes       
[  5]   9.00-10.00  sec  92.0 MBytes   772 Mbits/sec    0    664 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   691 MBytes   580 Mbits/sec   24             sender
[  5]   0.00-10.03  sec   690 MBytes   577 Mbits/sec                  receiver

AF_XDP:

Connecting to host 172.16.1.100, port 5201
[  5] local 172.16.1.101 port 46608 connected to 172.16.1.100 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   104 MBytes   873 Mbits/sec   47    645 KBytes       
[  5]   1.00-2.00   sec  98.7 MBytes   828 Mbits/sec   39    538 KBytes       
[  5]   2.00-3.00   sec  90.9 MBytes   763 Mbits/sec    0    655 KBytes       
[  5]   3.00-4.00   sec  65.2 MBytes   547 Mbits/sec   14    533 KBytes       
[  5]   4.00-5.00   sec  53.3 MBytes   447 Mbits/sec    7    603 KBytes       
[  5]   5.00-6.00   sec  52.4 MBytes   440 Mbits/sec    0    660 KBytes       
[  5]   6.00-7.00   sec  39.1 MBytes   328 Mbits/sec    8    526 KBytes       
[  5]   7.00-8.00   sec  38.7 MBytes   325 Mbits/sec    0    587 KBytes       
[  5]   8.00-9.00   sec  94.8 MBytes   796 Mbits/sec    0    675 KBytes       
[  5]   9.00-10.00  sec  96.0 MBytes   805 Mbits/sec    7    618 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   733 MBytes   615 Mbits/sec  122             sender
[  5]   0.00-10.05  sec   732 MBytes   611 Mbits/sec                  receiver

iperf3 UDP

AF_PACKET

Accepted connection from 172.16.1.101, port 39452
[  5] local 172.16.1.100 port 5201 connected to 172.16.1.101 port 40692
[ ID] Interval           Transfer     Bitrate         Jitter    Lost/Total Datagrams
[  5]   0.00-1.00   sec   118 MBytes   986 Mbits/sec  0.077 ms  525084/613923 (86%)  
[  5]   1.00-2.00   sec   117 MBytes   980 Mbits/sec  0.002 ms  576553/664766 (87%)  
[  5]   2.00-3.00   sec   120 MBytes  1.01 Gbits/sec  0.050 ms  576732/667716 (86%)  
[  5]   3.00-4.00   sec   120 MBytes  1.00 Gbits/sec  0.002 ms  581367/671794 (87%)  
[  5]   4.00-5.00   sec   120 MBytes  1.00 Gbits/sec  0.002 ms  612951/703307 (87%)  
[  5]   5.00-6.00   sec   122 MBytes  1.03 Gbits/sec  0.001 ms  535717/628083 (85%)  
[  5]   6.00-7.00   sec   117 MBytes   980 Mbits/sec  0.041 ms  578869/667122 (87%)  
[  5]   7.00-8.00   sec   119 MBytes  1.00 Gbits/sec  0.002 ms  577990/668247 (86%)  
[  5]   8.00-9.00   sec   116 MBytes   974 Mbits/sec  0.002 ms  582754/670426 (87%)  
[  5]   9.00-10.00  sec   120 MBytes  1.01 Gbits/sec  0.024 ms  579465/670305 (86%)  
[  5]  10.00-10.21  sec  2.50 MBytes   100 Mbits/sec  0.002 ms  38604/40489 (95%)  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Jitter    Lost/Total Datagrams
[  5]   0.00-10.21  sec  1.16 GBytes   979 Mbits/sec  0.002 ms  5766086/6666178 (86%)  receiver

AF_XDP

[  5] local 172.16.1.100 port 5201 connected to 172.16.1.101 port 41437
[ ID] Interval           Transfer     Bitrate         Jitter    Lost/Total Datagrams
[  5]   0.00-1.00   sec   156 MBytes  1.31 Gbits/sec  0.001 ms  491872/609832 (81%)  
[  5]   1.00-2.00   sec   168 MBytes  1.41 Gbits/sec  0.001 ms  557337/684419 (81%)  
[  5]   2.00-3.00   sec   166 MBytes  1.39 Gbits/sec  0.001 ms  551925/677423 (81%)  
[  5]   3.00-4.00   sec   163 MBytes  1.36 Gbits/sec  0.001 ms  557553/680349 (82%)  
[  5]   4.00-5.00   sec   165 MBytes  1.38 Gbits/sec  0.001 ms  553140/677503 (82%)  
[  5]   5.00-6.00   sec   170 MBytes  1.43 Gbits/sec  0.002 ms  558848/687616 (81%)  
[  5]   6.00-7.00   sec   161 MBytes  1.35 Gbits/sec  0.001 ms  558833/680687 (82%)  
[  5]   7.00-8.00   sec   162 MBytes  1.36 Gbits/sec  0.001 ms  575608/698261 (82%)  
[  5]   8.00-9.00   sec   163 MBytes  1.36 Gbits/sec  0.001 ms  550618/673519 (82%)  
[  5]   9.00-10.00  sec   169 MBytes  1.42 Gbits/sec  0.001 ms  555133/683148 (81%)  
[  5]  10.00-11.00  sec   434 KBytes  3.55 Mbits/sec  3.840 ms  0/320 (0%)  
[  5]  11.00-12.00  sec  43.4 KBytes   355 Kbits/sec  7.520 ms  0/32 (0%)

Conclusions

Client sends UDP:
AF_XDP is faster than AF_PACKET by ~40% (1.37 Gbits/sec vs 0.98 Gbits/sec)

Client sends TCP:
Average of 10 runs
Ethernet:
AF_PACKET is faster than AF_XDP by ~13% (460.3 Mbits/sec vs 407.2 Mbits/sec)
IP:
AF_XDP is equal to AF_PACKET (372.1 Mbits/sec vs 370.2 Mbits/sec)

@glazychev-art (Contributor)

Estimation

To run CI on a kind cluster with AF_XDP we need:

  1. Prepare a PR for sdk-vpp ~ 1h
  2. Prepare a PR for cmd-forwarder-vpp ~ 1h
  3. Add a new afxdp suite to deployments-k8s ~ 2h
  4. Add and test the suite on kind ~ 2h
  5. Risks ~ 2h

@glazychev-art (Contributor)

@edwarnicke
Due to the problems with public clusters (see the comments above), there is an option to support af_xdp only on kind in this release.
What do you think of it?

@edwarnicke (Member, Author)

@glazychev-art It's strange that AF_PACKET is faster for TCP but slower for UDP. Do we have any notion of why?

@glazychev-art (Contributor) commented Apr 11, 2023

@edwarnicke
Yes, there are a couple of guesses:

  1. If we look at the iperf3 logs from TCP mode, we will see a huge number of retransmissions:
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  46.9 MBytes   393 Mbits/sec  1326    113 KBytes       
[  5]   1.00-2.00   sec  41.3 MBytes   346 Mbits/sec  1114   42.2 KBytes
...

(we don't see them with AF_PACKET)
2. I was able to reproduce something similar on bare vpp instances:
https://lists.fd.io/g/vpp-dev/topic/af_xdp_performance/98105671
3. If we look at the vpp gerrit, we can see several open af_xdp patches that their owners claim will greatly increase performance (I tried them; it didn't help for TCP).
https://gerrit.fd.io/r/c/vpp/+/37653
https://gerrit.fd.io/r/c/vpp/+/38135

So, I think the problem may be in the VPP plugin.

@glazychev-art (Contributor)

As part of this task, we have integrated the AF_XDP interface on the kind cluster. This is working successfully.
https://github.com/networkservicemesh/integration-k8s-kind/actions/runs/4798046461/jobs/8535800517

On public clusters we ran into problems; separate issues were created:
#859
Performance:
#860

I think this issue can be closed

@github-project-automation github-project-automation bot moved this from Under review to Done in Release v1.9.0 Apr 28, 2023