-
Notifications
You must be signed in to change notification settings - Fork 144
/
libxdp.3
521 lines (463 loc) · 23 KB
/
libxdp.3
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
.TH "libxdp" "3" "June 29, 2023" "v1.4.3" "libxdp - library for loading XDP programs"
.SH "NAME"
libxdp \- library for attaching XDP programs and using AF_XDP sockets
.SH "SYNOPSIS"
.PP
This directory contains the files for the \fIlibxdp\fP library for
attaching XDP programs to network interfaces and using AF_XDP
sockets. The library is fairly lightweight and relies on \fIlibbpf\fP to
do the heavy lifting for processing eBPF object files etc.
.PP
\fILibxdp\fP provides two primary features on top of \fIlibbpf\fP. The first is
the ability to load multiple XDP programs in sequence on a single
network device (which is not natively supported by the kernel). This
support relies on the \fIfreplace\fP functionality in the kernel, which
makes it possible to attach an eBPF program as a replacement for a
global function in another (already loaded) eBPF program. The second
main feature is helper functions for configuring AF_XDP sockets as
well as reading and writing packets from these sockets.
.PP
Some of the functionality provided by libxdp depends on particular kernel
features; see the "Kernel feature compatibility" section below for details.
.SS "Using libxdp from an application"
.PP
Basic usage of libxdp from an application is quite straight forward. The
following example loads, then unloads, an XDP program from the 'lo' interface:
.RS
.nf
\fC#define IFINDEX 1
struct xdp_program *prog;
int err;
prog = xdp_program__open_file("my-program.o", "section_name", NULL);
err = xdp_program__attach(prog, IFINDEX, XDP_MODE_NATIVE, 0);
if (!err)
xdp_program__detach(prog, IFINDEX, XDP_MODE_NATIVE, 0);
xdp_program__close(prog);
\fP
.fi
.RE
.PP
The \fIxdp_program\fP structure is an opaque structure that represents a single XDP
program. \fIlibxdp\fP contains functions to create such a struct either from a BPF
object file on disk, from a \fIlibbpf\fP BPF object, or from an identifier of a
program that is already loaded into the kernel:
.RS
.nf
\fCstruct xdp_program *xdp_program__from_bpf_obj(struct bpf_object *obj,
const char *section_name);
struct xdp_program *xdp_program__find_file(const char *filename,
const char *section_name,
struct bpf_object_open_opts *opts);
struct xdp_program *xdp_program__open_file(const char *filename,
const char *section_name,
struct bpf_object_open_opts *opts);
struct xdp_program *xdp_program__from_fd(int fd);
struct xdp_program *xdp_program__from_id(__u32 prog_id);
struct xdp_program *xdp_program__from_pin(const char *pin_path);
\fP
.fi
.RE
.PP
The functions that open a BPF object or file need the function name of the XDP
program as well as the file name or object, since an ELF file can contain
multiple XDP programs. The \fIxdp_program__find_file()\fP function takes a filename
without a path, and will look for the object in \fILIBXDP_OBJECT_PATH\fP which
defaults to \fI/usr/lib/bpf\fP (or \fI/usr/lib64/bpf\fP on systems using a split library
path). This is convenient for applications shipping pre-compiled eBPF object
files.
.PP
The \fIxdp_program__attach()\fP function will attach the program to an interface,
building a dispatcher program to execute it. Multiple programs can be attached
at once with \fIxdp_program__attach_multi()\fP; they will be sorted in order of
their run priority, and execution from one program to the next will proceed
based on the chain call actions defined for each program (see the \fBProgram
metadata\fP section below). Because the loading process involves modifying the
attach type of the program, the attach functions only work with \fIstruct
xdp_program\fP objects that have not yet been loaded into the kernel.
.PP
When using the attach functions to attach to an interface that already has an
XDP program loaded, libxdp will attempt to add the program to the list of loaded
programs. However, this may fail, either due to missing kernel support, or
because the already-attached program was not loaded using a dispatcher
compatible with libxdp. If the kernel support for incremental attach (merged in
kernel 5.10) is missing, the only way to actually run multiple programs on a
single interface is to attach them all at the same time with
\fIxdp_program__attach_multi()\fP. If the existing program is not an XDP dispatcher,
that program will have to be detached from the interface before libxdp can
attach a new one. This can be done by calling \fIxdp_program__detach()\fP with a
reference to the loaded program; but note that this will of course break any
application relying on that other XDP program to be present.
.SH "Program metadata"
.PP
To support multiple XDP programs on the same interface, libxdp uses two pieces
of metadata for each XDP program: Run priority and chain call actions.
.SS "Run priority"
.PP
This is the priority of the program and is a simple integer used
to sort programs when loading multiple programs onto the same interface.
Programs that wish to run early (such as a packet filter) should set low values
for this, while programs that want to run later (such as a packet forwarder or
counter) should set higher values. Note that later programs are only run if the
previous programs end with a return code that is part of its chain call actions
(see below). If not specified, the default priority value is 50.
.SS "Chain call actions"
.PP
These are the program return codes that the program indicate for packets that
should continue processing. If the program returns one of these actions, later
programs in the call chain will be run, whereas if it returns any other action,
processing will be interrupted, and the XDP dispatcher will return the verdict
immediately. If not set, this defaults to just XDP_PASS, which is likely the
value most programs should use.
.SS "Specifying metadata"
.PP
The metadata outlined above is specified as BTF information embedded in the ELF
file containing the XDP program. The \fIxdp_helpers.h\fP file shipped with libxdp
contains helper macros to include this information, which can be used as
follows:
.RS
.nf
\fC#include <bpf/bpf_helpers.h>
#include <xdp/xdp_helpers.h>
struct {
__uint(priority, 10);
__uint(XDP_PASS, 1);
__uint(XDP_DROP, 1);
} XDP_RUN_CONFIG(my_xdp_func);
\fP
.fi
.RE
.PP
This example specifies that the XDP program in \fImy_xdp_func\fP should have
priority 10 and that its chain call actions are \fIXDP_PASS\fP and \fIXDP_DROP\fP.
In a source file with multiple XDP programs in the same file, a definition like
the above can be included for each program (main XDP function). Any program that
does not specify any config information will use the default values outlined
above.
.SS "Inspecting and modifying metadata"
.PP
\fIlibxdp\fP exposes the following functions that an application can use to inspect
and modify the metadata on an XDP program. Modification is only possible before
a program is attached on an interface. These functions won't modify the BTF
information itself, but the new values will be stored as part of the program
attachment.
.RS
.nf
\fCunsigned int xdp_program__run_prio(const struct xdp_program *xdp_prog);
int xdp_program__set_run_prio(struct xdp_program *xdp_prog,
unsigned int run_prio);
bool xdp_program__chain_call_enabled(const struct xdp_program *xdp_prog,
enum xdp_action action);
int xdp_program__set_chain_call_enabled(struct xdp_program *prog,
unsigned int action,
bool enabled);
int xdp_program__print_chain_call_actions(const struct xdp_program *prog,
char *buf,
size_t buf_len);
\fP
.fi
.RE
.SH "The dispatcher program"
.PP
To support multiple non-offloaded programs on the same network interface,
\fIlibxdp\fP uses a \fBdispatcher program\fP which is a small wrapper program that will
call each component program in turn, expect the return code, and then chain call
to the next program based on the chain call actions of the previous program (see
the \fBProgram metadata\fP section above).
.PP
While applications using \fIlibxdp\fP do not need to know the details of the
dispatcher program to just load an XDP program unto an interface, \fIlibxdp\fP does
expose the dispatcher and its attached component programs, which can be used to
list the programs currently attached to an interface.
.PP
The structure used for this is \fIstruct xdp_multiprog\fP, which can only be
constructed from the programs loaded on an interface based on ifindex. The API
for getting a multiprog reference and iterating through the attached programs
looks like this:
.RS
.nf
\fCstruct xdp_multiprog *xdp_multiprog__get_from_ifindex(int ifindex);
struct xdp_program *xdp_multiprog__next_prog(const struct xdp_program *prog,
const struct xdp_multiprog *mp);
void xdp_multiprog__close(struct xdp_multiprog *mp);
int xdp_multiprog__detach(struct xdp_multiprog *mp, int ifindex);
enum xdp_attach_mode xdp_multiprog__attach_mode(const struct xdp_multiprog *mp);
struct xdp_program *xdp_multiprog__main_prog(const struct xdp_multiprog *mp);
struct xdp_program *xdp_multiprog__hw_prog(const struct xdp_multiprog *mp);
bool xdp_multiprog__is_legacy(const struct xdp_multiprog *mp);
\fP
.fi
.RE
.PP
If a non-offloaded program is attached to the interface which \fIlibxdp\fP doesn't
recognise as a dispatcher program, an \fIxdp_multiprog\fP structure will still be
returned, and \fIxdp_multiprog__is_legacy()\fP will return true for that program
(note that this also holds true if only an offloaded program is loaded). A
reference to that (regular) XDP program can be obtained by
\fIxdp_multiprog__main_prog()\fP. If the program attached to the interface \fBis\fP a
dispatcher program, \fIxdp_multiprog__main_prog()\fP will return a reference to the
dispatcher program itself, which is mainly useful for obtaining other data about
that program (such as the program ID). A reference to an offloaded program can
be acquired using \fIxdp_multiprog_hw_prog()\fP. Function
\fIxdp_multiprog__attach_mode()\fP returns the attach mode of the non-offloaded
program, whether an offloaded program is attached should be checked through
\fIxdp_multiprog_hw_prog()\fP.
.SS "Pinning in bpffs"
.PP
The kernel will automatically detach component programs from the dispatcher once
the last reference to them disappears. To prevent this from happening, \fIlibxdp\fP
will pin the component program references in \fIbpffs\fP before attaching the
dispatcher to the network interface. The pathnames generated for pinning is as
follows:
.IP \(em 4
/sys/fs/bpf/xdp/dispatch-IFINDEX-DID - dispatcher program for IFINDEX with BPF program ID DID
.IP \(em 4
/sys/fs/bpf/xdp/dispatch-IFINDEX-DID/prog0-prog - component program 0, program reference
.IP \(em 4
/sys/fs/bpf/xdp/dispatch-IFINDEX-DID/prog0-link - component program 0, bpf_link reference
.IP \(em 4
/sys/fs/bpf/xdp/dispatch-IFINDEX-DID/prog1-prog - component program 1, program reference
.IP \(em 4
/sys/fs/bpf/xdp/dispatch-IFINDEX-DID/prog1-link - component program 1, bpf_link reference
.IP \(em 4
etc, up to ten component programs
.PP
If set, the \fILIBXDP_BPFFS\fP environment variable will override the location of
\fIbpffs\fP, but the \fIxdp\fP subdirectory is always used. If no \fIbpffs\fP is mounted,
libxdp will consult the environment variable \fILIBXDP_BPFFS_AUTOMOUNT\fP. If this
is set to \fI1\fP, libxdp will attempt to automount a bpffs. If not, libxdp will
fall back to loading a single program without a dispatcher, as if the kernel did
not support the features needed for multiprog attachment.
.SH "Using AF_XDP sockets"
.PP
Libxdp implements helper functions for configuring AF_XDP sockets as
well as reading and writing packets from these sockets. AF_XDP sockets
can be used to redirect packets to user-space at high rates from an
XDP program. Note that this functionality used to reside in libbpf,
but has now been moved over to libxdp as it is a better fit for this
library. As of the 1.0 release of libbpf, the AF_XDP socket support
will be removed and all future development will be performed
in libxdp instead.
.PP
For an overview of AF_XDP sockets, please refer to this Linux Plumbers
paper
(\fIhttp://vger.kernel.org/lpc_net2018_talks/lpc18_pres_af_xdp_perf-v3.pdf\fP)
and the documentation in the Linux kernel
(Documentation/networking/af_xdp.rst or
\fIhttps://www.kernel.org/doc/html/latest/networking/af_xdp.html\fP).
.PP
For an example on how to use the interface, take a look at the AF_XDP-example
and AF_XDP-forwarding programs in the bpf-examples repository:
\fIhttps://github.com/xdp-project/bpf-examples\fP.
.SS "Control path"
.PP
Libxdp provides helper functions for creating and destroying umems and
sockets as shown below. The first thing that a user generally wants to
do is to create a umem area. This is the area that will contain all
packets received and the ones that are going to be sent. After that,
AF_XDP sockets can be created tied to this umem. These can either be
sockets that have exclusive ownership of that umem through
xsk_socket__create() or shared with other sockets using
xsk_socket__create_shared. There is one option called
XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD that can be set in the
libxdp_flags field (also called libbpf_flags for compatibility
reasons). This will make libxdp not load any XDP program or set and
BPF maps which is a must if users want to add their own XDP program.
.PP
If there is already a socket created with socket(AF_XDP, SOCK_RAW, 0)
not bound and not tied to any umem, file descriptor of this socket can
be used in an xsk_umem__create_with_fd() variant of the umem creation
function.
.RS
.nf
\fCint xsk_umem__create(struct xsk_umem **umem,
void *umem_area, __u64 size,
struct xsk_ring_prod *fill,
struct xsk_ring_cons *comp,
const struct xsk_umem_config *config);
int xsk_umem__create_with_fd(struct xsk_umem **umem,
int fd, void *umem_area, __u64 size,
struct xsk_ring_prod *fill,
struct xsk_ring_cons *comp,
const struct xsk_umem_config *config);
int xsk_socket__create(struct xsk_socket **xsk,
const char *ifname, __u32 queue_id,
struct xsk_umem *umem,
struct xsk_ring_cons *rx,
struct xsk_ring_prod *tx,
const struct xsk_socket_config *config);
int xsk_socket__create_shared(struct xsk_socket **xsk_ptr,
const char *ifname,
__u32 queue_id, struct xsk_umem *umem,
struct xsk_ring_cons *rx,
struct xsk_ring_prod *tx,
struct xsk_ring_prod *fill,
struct xsk_ring_cons *comp,
const struct xsk_socket_config *config);
int xsk_umem__delete(struct xsk_umem *umem);
void xsk_socket__delete(struct xsk_socket *xsk);
\fP
.fi
.RE
.PP
There are also two helper function to get the file descriptor of a
umem or a socket. These are needed when using standard Linux syscalls
such as poll(), recvmsg(), sendto(), etc.
.RS
.nf
\fCint xsk_umem__fd(const struct xsk_umem *umem);
int xsk_socket__fd(const struct xsk_socket *xsk);
\fP
.fi
.RE
.PP
The control path also provides two APIs for setting up AF_XDP sockets when the
process that is going to use the AF_XDP socket is non-privileged. These two
functions perform the operations that require privileges and can be executed
from some form of control process that has the necessary privileges. The
xsk_socket__create executed on the non-privileged process will then skip these
two steps. For an example on how to use these, please take a look at the
AF_XDP-example program in the bpf-examples repository:
\fIhttps://github.com/xdp-project/bpf-examples/tree/master/AF_XDP-example\fP.
.RS
.nf
\fCint xsk_setup_xdp_prog(int ifindex, int *xsks_map_fd);
int xsk_socket__update_xskmap(struct xsk_socket *xsk, int xsks_map_fd);
\fP
.fi
.RE
.PP
To further reduce required level of privileges, an AF_XDP socket can be created
beforehand with socket(AF_XDP, SOCK_RAW, 0) and passed to a non-privileged
process. This socket can be used in xsk_umem__create_with_fd() and later
in xsk_socket__create() with created umem. xsk_socket__create_shared() would
still require privileges for AF_XDP socket creation.
.SS "Data path"
.PP
For performance reasons, all the data path functions are static inline
functions found in the xsk.h header file so they can be optimized into
the target application binary for best possible performance. There are
four FIFO rings of two main types: producer rings (fill and Tx) and
consumer rings (Rx and completion). The producer rings use
xsk_ring_prod functions and consumer rings use xsk_ring_cons
functions. For producer rings, you start with \fIreserving\fP one or more
slots in a producer ring and then when they have been filled out, you
\fIsubmit\fP them so that the kernel will act on them. For a consumer
ring, you \fIpeek\fP if there are any new packets in the ring and if so
you can read them from the ring. Once you are done reading them, you
\fIrelease\fP them back to the kernel so it can use them for new
packets. There is also a \fIcancel\fP operation for consumer rings if the
application does not want to consume all packets received with the
peek operation.
.RS
.nf
\fC__u32 xsk_ring_prod__reserve(struct xsk_ring_prod *prod, __u32 nb, __u32 *idx);
void xsk_ring_prod__submit(struct xsk_ring_prod *prod, __u32 nb);
__u32 xsk_ring_cons__peek(struct xsk_ring_cons *cons, __u32 nb, __u32 *idx);
void xsk_ring_cons__cancel(struct xsk_ring_cons *cons, __u32 nb);
void xsk_ring_cons__release(struct xsk_ring_cons *cons, __u32 nb);
\fP
.fi
.RE
.PP
The functions below are used for reading and writing the descriptors
of the rings. xsk_ring_prod__fill_addr() and xsk_ring_prod__tx_desc()
\fBwrites\fP entries in the fill and Tx rings respectively, while
xsk_ring_cons__comp_addr and xsk_ring_cons__rx_desc \fBreads\fP entries from
the completion and Rx rings respectively. The \fIidx\fP is the parameter
returned in the xsk_ring_prod__reserve or xsk_ring_cons__peek
calls. To advance to the next entry, simply do \fIidx++\fP.
.RS
.nf
\fC__u64 *xsk_ring_prod__fill_addr(struct xsk_ring_prod *fill, __u32 idx);
struct xdp_desc *xsk_ring_prod__tx_desc(struct xsk_ring_prod *tx, __u32 idx);
const __u64 *xsk_ring_cons__comp_addr(const struct xsk_ring_cons *comp, __u32 idx);
const struct xdp_desc *xsk_ring_cons__rx_desc(const struct xsk_ring_cons *rx, __u32 idx);
\fP
.fi
.RE
.PP
The xsk_umem functions are used to get a pointer to the packet data
itself, always located inside the umem. In the default aligned mode,
you can get the addr variable straight from the Rx descriptor. But in
unaligned mode, you need to use the three last function below as the
offset used is carried in the upper 16 bits of the addr. Therefore,
you cannot use the addr straight from the descriptor in the unaligned
case.
.RS
.nf
\fCvoid *xsk_umem__get_data(void *umem_area, __u64 addr);
__u64 xsk_umem__extract_addr(__u64 addr);
__u64 xsk_umem__extract_offset(__u64 addr);
__u64 xsk_umem__add_offset_to_addr(__u64 addr);
\fP
.fi
.RE
.PP
There is one more function in the data path and that checks if the
need_wakeup flag is set. Use of this flag is highly encouraged and
should be enabled by setting \fIXDP_USE_NEED_WAKEUP\fP bit in the
\fIxdp_bind_flags\fP field that is provided to the
xsk_socket_create_[shared]() calls. If this function returns true,
then you need to call \fIrecvmsg()\fP, \fIsendto()\fP, or \fIpoll()\fP depending on the
situation. \fIrecvmsg()\fP if you are \fBreceiving\fP, or \fIsendto()\fP if you are
\fBsending\fP. \fIpoll()\fP can be used for both cases and provide the ability to
sleep too, as with any other socket. But note that poll is a slower
operation than the other two.
.RS
.nf
\fCint xsk_ring_prod__needs_wakeup(const struct xsk_ring_prod *r);
\fP
.fi
.RE
.PP
For an example on how to use all these APIs, take a look at the AF_XDP-example
and AF_XDP-forwarding programs in the bpf-examples repository:
\fIhttps://github.com/xdp-project/bpf-examples\fP.
.SH "Kernel and BPF program feature compatibility"
.PP
The features exposed by libxdp relies on certain kernel versions and BPF
features to work. To get the full benefit of all features, libxdp needs to be
used with kernel 5.10 or newer, unless the commits mentioned below have been
backported. However, libxdp will probe the kernel and transparently fall back to
legacy loading procedures, so it is possible to use the library with older
versions, although some features will be unavailable, as detailed below.
.PP
The ability to attach multiple BPF programs to a single interface relies on the
kernel "BPF program extension" feature which was introduced by commit
be8704ff07d2 ("bpf: Introduce dynamic program extensions") in the upstream
kernel and first appeared in kernel release 5.6. To \fBincrementally\fP attach
multiple programs, a further refinement added by commit 4a1e7c0c63e0 ("bpf:
Support attaching freplace programs to multiple attach points") is needed; this
first appeared in the upstream kernel version 5.10. The functionality relies on
the "BPF trampolines" feature which is unfortunately only available on the
x86_64 architecture. In other words, kernels before 5.6 can only attach a single
XDP program to each interface, kernels 5.6+ can attach multiple programs if they
are all attached at the same time, and kernels 5.10 have full support for XDP
multiprog on x86_64. On other architectures, only a single program can be
attached to each interface.
.PP
To load AF_XDP programs, kernel support for AF_XDP sockets needs to be included
and enabled in the kernel build. In addition, when using AF_XDP sockets, an XDP
program is also loaded on the interface. The XDP program used for this by libxdp
requires the ability to do map lookups into XSK maps, which was introduced with
commit fada7fdc83c0 ("bpf: Allow bpf_map_lookup_elem() on an xskmap") in kernel
5.3. This means that the minimum required kernel version for using AF_XDP is
kernel 5.3; however, for the AF_XDP XDP program to co-exist with other programs,
the same constraints for multiprog applies as outlined above.
.PP
Note that some Linux distributions backport features to earlier kernel versions,
especially in enterprise kernels; for instance, Red Hat Enterprise Linux kernels
include everything needed for libxdp to function since RHEL 8.5.
.PP
Finally, XDP programs loaded using the multiprog facility must include type
information (using the BPF Type Format, BTF). To get this, compile the programs
with a recent version of Clang/LLVM (version 10+), and enable debug information
when compiling (using the \fI\-g\fP option).
.SH "BUGS"
.PP
Please report any bugs on Github: \fIhttps://github.com/xdp-project/xdp-tools/issues\fP
.SH "AUTHORS"
.PP
libxdp and this man page were written by Toke
Høiland-Jørgensen. AF_XDP support and documentation was contributed by
Magnus Karlsson.