
Commit 1a7551f

tohojo authored and Alexei Starovoitov committed
Documentation/bpf: Add documentation for BPF_PROG_RUN
This adds documentation for the BPF_PROG_RUN command; a short overview of the
command itself, and a more verbose description of the "live packet" mode for
XDP introduced in the previous commit.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20220309105346.100053-3-toke@redhat.com
1 parent b530e9e commit 1a7551f

File tree

2 files changed: +118 -0 lines

Documentation/bpf/bpf_prog_run.rst

Lines changed: 117 additions & 0 deletions
@@ -0,0 +1,117 @@
.. SPDX-License-Identifier: GPL-2.0

===================================
Running BPF programs from userspace
===================================

This document describes the ``BPF_PROG_RUN`` facility for running BPF programs
from userspace.

.. contents::
   :local:
   :depth: 2
Overview
--------

The ``BPF_PROG_RUN`` command can be used through the ``bpf()`` syscall to
execute a BPF program in the kernel and return the results to userspace. This
can be used to unit test BPF programs against user-supplied context objects,
and as a way to explicitly execute programs in the kernel for their side
effects. The command was previously named ``BPF_PROG_TEST_RUN``, and both
constants continue to be defined in the UAPI header, aliased to the same value.
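For example, a single test-mode invocation through libbpf looks roughly like
the sketch below. This is a minimal illustration, assuming a loaded XDP program
whose fd is ``prog_fd`` and a libbpf version providing
``bpf_prog_test_run_opts()``; the function and buffer names are this example's
assumptions, and error handling is trimmed:

.. code-block:: c

   /* Minimal sketch: run a loaded XDP program once over a caller-supplied
    * packet buffer. prog_fd, pkt and pkt_len are assumptions of this
    * example, not part of the documented API.
    */
   #include <bpf/bpf.h>

   int run_once(int prog_fd, void *pkt, unsigned int pkt_len)
   {
           LIBBPF_OPTS(bpf_test_run_opts, opts,
                       .data_in = pkt,
                       .data_size_in = pkt_len,
                       .repeat = 1);
           int err = bpf_prog_test_run_opts(prog_fd, &opts);

           /* On success, opts.retval holds the program's return code
            * (e.g. XDP_PASS) and opts.duration the mean runtime in ns. */
           return err ? err : (int)opts.retval;
   }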
The ``BPF_PROG_RUN`` command can be used to execute BPF programs of the
following types:

- ``BPF_PROG_TYPE_SOCKET_FILTER``
- ``BPF_PROG_TYPE_SCHED_CLS``
- ``BPF_PROG_TYPE_SCHED_ACT``
- ``BPF_PROG_TYPE_XDP``
- ``BPF_PROG_TYPE_SK_LOOKUP``
- ``BPF_PROG_TYPE_CGROUP_SKB``
- ``BPF_PROG_TYPE_LWT_IN``
- ``BPF_PROG_TYPE_LWT_OUT``
- ``BPF_PROG_TYPE_LWT_XMIT``
- ``BPF_PROG_TYPE_LWT_SEG6LOCAL``
- ``BPF_PROG_TYPE_FLOW_DISSECTOR``
- ``BPF_PROG_TYPE_STRUCT_OPS``
- ``BPF_PROG_TYPE_RAW_TRACEPOINT``
- ``BPF_PROG_TYPE_SYSCALL``
When using the ``BPF_PROG_RUN`` command, userspace supplies an input context
object and (for program types operating on network packets) a buffer containing
the packet data that the BPF program will operate on. The kernel will then
execute the program and return the results to userspace. Note that programs
will not have any side effects while being run in this mode; in particular,
packets will not actually be redirected or dropped; the program return code
will simply be returned to userspace. A separate mode for live execution of
XDP programs is provided, documented separately below.
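Continuing the sketch above, a user-supplied context object for an XDP program
is passed through the ``ctx_in``/``ctx_out`` attributes. The ifindex and the
packet buffer below are assumptions of the example, and ``data_end`` in the
input context is assumed here to mirror the length of the data buffer,
following the pattern used in the kernel selftests:

.. code-block:: c

   /* Sketch: unit test an XDP program with an input context object and
    * capture the (possibly modified) packet and context on the way out.
    * prog_fd and ifindex are assumptions of this example.
    */
   #include <stdio.h>
   #include <bpf/bpf.h>
   #include <linux/bpf.h>

   int test_with_ctx(int prog_fd, int ifindex)
   {
           char pkt_in[64] = {};  /* caller would build a real frame here */
           char pkt_out[64];
           struct xdp_md ctx_in = {
                   .data_end = sizeof(pkt_in),    /* mirrors data_size_in */
                   .ingress_ifindex = ifindex,    /* simulated arrival iface */
           };
           struct xdp_md ctx_out = {};
           LIBBPF_OPTS(bpf_test_run_opts, opts,
                       .data_in = pkt_in,
                       .data_size_in = sizeof(pkt_in),
                       .data_out = pkt_out,
                       .data_size_out = sizeof(pkt_out),
                       .ctx_in = &ctx_in,
                       .ctx_size_in = sizeof(ctx_in),
                       .ctx_out = &ctx_out,
                       .ctx_size_out = sizeof(ctx_out));
           int err = bpf_prog_test_run_opts(prog_fd, &opts);

           if (!err)
                   printf("verdict %u, %u bytes out\n",
                          opts.retval, opts.data_size_out);
           return err;
   }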
Running XDP programs in "live frame mode"
-----------------------------------------

The ``BPF_PROG_RUN`` command has a separate mode for running live XDP programs,
which can be used to execute XDP programs in a way where packets will actually
be processed by the kernel after the execution of the XDP program, as if they
arrived on a physical interface. This mode is activated by setting the
``BPF_F_TEST_XDP_LIVE_FRAMES`` flag when supplying an XDP program to
``BPF_PROG_RUN``.
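Invoking this mode from userspace is a small change over the test-mode example
above. The sketch below is illustrative only; it assumes a libbpf version whose
``struct bpf_test_run_opts`` exposes the ``batch_size`` field (discussed
further below), and the repeat count and packet are arbitrary:

.. code-block:: c

   /* Sketch: use the supplied XDP program as a traffic generator by
    * executing it over ~1M copies of pkt. All names are assumptions of
    * this example.
    */
   #include <bpf/bpf.h>
   #include <linux/bpf.h>

   int run_live(int prog_fd, int ifindex, void *pkt, unsigned int pkt_len)
   {
           struct xdp_md ctx_in = {
                   .data_end = pkt_len,
                   .ingress_ifindex = ifindex, /* XDP_TX goes out this iface */
           };
           LIBBPF_OPTS(bpf_test_run_opts, opts,
                       .data_in = pkt,
                       .data_size_in = pkt_len,
                       .ctx_in = &ctx_in,
                       .ctx_size_in = sizeof(ctx_in),
                       .flags = BPF_F_TEST_XDP_LIVE_FRAMES,
                       .repeat = 1 << 20,  /* number of executions */
                       .batch_size = 64);  /* the default; see below */

           /* No data_out/ctx_out here: they are rejected in live mode. */
           return bpf_prog_test_run_opts(prog_fd, &opts);
   }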
The live packet mode is optimised for high-performance execution of the
supplied XDP program many times (suitable for, e.g., running as a traffic
generator), which means the semantics are not quite as straightforward as the
regular test run mode. Specifically:
- When executing an XDP program in live frame mode, the result of the execution
  will not be returned to userspace; instead, the kernel will perform the
  operation indicated by the program's return code (drop the packet, redirect
  it, etc.). For this reason, setting the ``data_out`` or ``ctx_out`` attributes
  in the syscall parameters when running in this mode will be rejected. In
  addition, not all failures will be reported back to userspace directly;
  specifically, only fatal errors in setup or during execution (like memory
  allocation errors) will halt execution and return an error. If an error occurs
  in packet processing, like a failure to redirect to a given interface,
  execution will continue with the next repetition; these errors can be detected
  via the same trace points as for regular XDP programs.

- Userspace can supply an ifindex as part of the context object, just like in
  the regular (non-live) mode. The XDP program will be executed as though the
  packet arrived on this interface; i.e., the ``ingress_ifindex`` of the context
  object will point to that interface. Furthermore, if the XDP program returns
  ``XDP_PASS``, the packet will be injected into the kernel networking stack as
  though it arrived on that ifindex, and if it returns ``XDP_TX``, the packet
  will be transmitted *out* of that same interface. Do note, though, that
  because the program execution is not happening in driver context, an
  ``XDP_TX`` is actually turned into the same action as an ``XDP_REDIRECT`` to
  that same interface (i.e., it will only work if the driver has support for the
  ``ndo_xdp_xmit`` driver op).

- When running the program with multiple repetitions, the execution will happen
  in batches. The batch size defaults to 64 packets (which is the same as the
  maximum NAPI receive batch size), but can be specified by userspace through
  the ``batch_size`` parameter, up to a maximum of 256 packets. For each batch,
  the kernel executes the XDP program repeatedly, each invocation getting a
  separate copy of the packet data. For each repetition, if the program drops
  the packet, the data page is immediately recycled (see below). Otherwise, the
  packet is buffered until the end of the batch, at which point all packets
  buffered this way during the batch are transmitted at once.

- When setting up the test run, the kernel will initialise a pool of memory
  pages of the same size as the batch size. Each memory page will be initialised
  with the initial packet data supplied by userspace at ``BPF_PROG_RUN``
  invocation. When possible, the pages will be recycled on future program
  invocations, to improve performance. Pages will generally be recycled a full
  batch at a time, except when a packet is dropped (by return code or because
  of, say, a redirection error), in which case that page will be recycled
  immediately. If a packet ends up being passed to the regular networking stack
  (because the XDP program returns ``XDP_PASS``, or because it ends up being
  redirected to an interface that injects it into the stack), the page will be
  released and a new one will be allocated when the pool is empty.

  When recycling, the page content is not rewritten; only the packet boundary
  pointers (``data``, ``data_end`` and ``data_meta``) in the context object will
  be reset to the original values. This means that if a program rewrites the
  packet contents, it has to be prepared to see either the original content or
  the modified version on subsequent invocations, as the sketch below
  illustrates.
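The following BPF-side sketch (a hypothetical program, not part of this patch)
shows one way to stay safe under recycling: it overwrites the first bytes of
the frame with a fresh value rather than incrementing in place, so it does not
matter whether a given page still carries a previous run's modifications:

.. code-block:: c

   /* Hypothetical XDP traffic-generator program for live frame mode.
    * It never assumes a field still holds its original value, so it
    * works whether it sees a fresh page or a recycled, modified one.
    */
   #include <linux/bpf.h>
   #include <bpf/bpf_helpers.h>

   static __u32 seq;  /* global counter; lives in the .bss map */

   SEC("xdp")
   int xdp_txgen(struct xdp_md *ctx)
   {
           void *data = (void *)(long)ctx->data;
           void *data_end = (void *)(long)ctx->data_end;

           if (data + sizeof(__u32) > data_end)
                   return XDP_DROP;

           /* Overwrite (not increment) the first four bytes. */
           *(__u32 *)data = __sync_fetch_and_add(&seq, 1);

           /* XDP_TX sends the packet out the ifindex supplied in ctx_in;
            * in live mode this becomes an XDP_REDIRECT to that iface. */
           return XDP_TX;
   }

   char _license[] SEC("license") = "GPL";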

Documentation/bpf/index.rst

Lines changed: 1 addition & 0 deletions
@@ -21,6 +21,7 @@ that goes into great technical depth about the BPF Architecture.
    helpers
    programs
    maps
+   bpf_prog_run
    classic_vs_extended.rst
    bpf_licensing
    test_debug
