Skip to content

Commit

Permalink
Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Browse files Browse the repository at this point in the history
Daniel Borkmann says:

====================
pull-request: bpf-next 2018-06-05

The following pull-request contains BPF updates for your *net-next* tree.

The main changes are:

1) Add a new BPF hook for sendmsg similar to existing hooks for bind and
   connect: "This allows to override source IP (including the case when it's
   set via cmsg(3)) and destination IP:port for unconnected UDP (slow path).
   TCP and connected UDP (fast path) are not affected. This makes UDP support
   complete, that is, connected UDP is handled by connect hooks, unconnected
   by sendmsg ones.", from Andrey.

2) Rework of the AF_XDP API to allow extending it in future for type writer
   model if necessary. In this mode a memory window is passed to hardware
   and multiple frames might be filled into that window instead of just one
   that is the case in the current fixed frame-size model. With the new
   changes made this can be supported without having to add a new descriptor
   format. Also, core bits for the zero-copy support for AF_XDP have been
   merged as agreed upon, where i40e bits will be routed via Jeff later on.
   Various improvements to documentation and sample programs included as
   well, all from Björn and Magnus.

3) Given BPF's flexibility, a new program type has been added to implement
   infrared decoders. Quote: "The kernel IR decoders support the most
   widely used IR protocols, but there are many protocols which are not
   supported. [...] There is a 'long tail' of unsupported IR protocols,
   for which lircd is need to decode the IR. IR encoding is done in such
   a way that some simple circuit can decode it; therefore, BPF is ideal.
   [...] user-space can define a decoder in BPF, attach it to the rc
   device through the lirc chardev.", from Sean.

4) Several improvements and fixes to BPF core, among others, dumping map
   and prog IDs into fdinfo which is a straight forward way to correlate
   BPF objects used by applications, removing an indirect call and therefore
   retpoline in all map lookup/update/delete calls by invoking the callback
   directly for 64 bit archs, adding a new bpf_skb_cgroup_id() BPF helper
   for tc BPF programs to have an efficient way of looking up cgroup v2 id
   for policy or other use cases. Fixes to make sure we zero tunnel/xfrm
   state that hasn't been filled, to allow context access wrt pt_regs in
   32 bit archs for tracing, and last but not least various test cases
   for fixes that landed in bpf earlier, from Daniel.

5) Get rid of the ndo_xdp_flush API and extend the ndo_xdp_xmit with
   a XDP_XMIT_FLUSH flag instead which allows to avoid one indirect
   call as flushing is now merged directly into ndo_xdp_xmit(), from Jesper.

6) Add a new bpf_get_current_cgroup_id() helper that can be used in
   tracing to retrieve the cgroup id from the current process in order
   to allow for e.g. aggregation of container-level events, from Yonghong.

7) Two follow-up fixes for BTF to reject invalid input values and
   related to that also two test cases for BPF kselftests, from Martin.

8) Various API improvements to the bpf_fib_lookup() helper, that is,
   dropping MPLS bits which are not fully hashed out yet, rejecting
   invalid helper flags, returning error for unsupported address
   families as well as renaming flowlabel to flowinfo, from David.

9) Various fixes and improvements to sockmap BPF kselftests in particular
   in proper error detection and data verification, from Prashant.

10) Two arm32 BPF JIT improvements. One is to fix imm range check with
    regards to whether immediate fits into 24 bits, and a naming cleanup
    to get functions related to rsh handling consistent to those handling
    lsh, from Wang.

11) Two compile warning fixes in BPF, one for BTF and a false positive
    to silent gcc in stack_map_get_build_id_offset(), from Arnd.

12) Add missing seg6.h header into tools include infrastructure in order
    to fix compilation of BPF kselftests, from Mathieu.

13) Several formatting cleanups in the BPF UAPI helper description that
    also fix an error during rst2man compilation, from Quentin.

14) Hide an unused variable in sk_msg_convert_ctx_access() when IPv6 is
    not built into the kernel, from Yue.

15) Remove a useless double assignment in dev_map_enqueue(), from Colin.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
  • Loading branch information
davem330 committed Jun 5, 2018
2 parents a6fa908 + 9fa0610 commit fd129f8
Show file tree
Hide file tree
Showing 76 changed files with 3,841 additions and 730 deletions.
101 changes: 58 additions & 43 deletions Documentation/networking/af_xdp.rst
Original file line number Diff line number Diff line change
Expand Up @@ -12,7 +12,7 @@ packet processing.

This document assumes that the reader is familiar with BPF and XDP. If
not, the Cilium project has an excellent reference guide at
http://cilium.readthedocs.io/en/doc-1.0/bpf/.
http://cilium.readthedocs.io/en/latest/bpf/.

Using the XDP_REDIRECT action from an XDP program, the program can
redirect ingress frames to other XDP enabled netdevs, using the
Expand All @@ -33,22 +33,22 @@ for a while due to a possible retransmit, the descriptor that points
to that packet can be changed to point to another and reused right
away. This again avoids copying data.

The UMEM consists of a number of equally size frames and each frame
has a unique frame id. A descriptor in one of the rings references a
frame by referencing its frame id. The user space allocates memory for
this UMEM using whatever means it feels is most appropriate (malloc,
mmap, huge pages, etc). This memory area is then registered with the
kernel using the new setsockopt XDP_UMEM_REG. The UMEM also has two
rings: the FILL ring and the COMPLETION ring. The fill ring is used by
the application to send down frame ids for the kernel to fill in with
RX packet data. References to these frames will then appear in the RX
ring once each packet has been received. The completion ring, on the
other hand, contains frame ids that the kernel has transmitted
completely and can now be used again by user space, for either TX or
RX. Thus, the frame ids appearing in the completion ring are ids that
were previously transmitted using the TX ring. In summary, the RX and
FILL rings are used for the RX path and the TX and COMPLETION rings
are used for the TX path.
The UMEM consists of a number of equally sized chunks. A descriptor in
one of the rings references a frame by referencing its addr. The addr
is simply an offset within the entire UMEM region. The user space
allocates memory for this UMEM using whatever means it feels is most
appropriate (malloc, mmap, huge pages, etc). This memory area is then
registered with the kernel using the new setsockopt XDP_UMEM_REG. The
UMEM also has two rings: the FILL ring and the COMPLETION ring. The
fill ring is used by the application to send down addr for the kernel
to fill in with RX packet data. References to these frames will then
appear in the RX ring once each packet has been received. The
completion ring, on the other hand, contains frame addr that the
kernel has transmitted completely and can now be used again by user
space, for either TX or RX. Thus, the frame addrs appearing in the
completion ring are addrs that were previously transmitted using the
TX ring. In summary, the RX and FILL rings are used for the RX path
and the TX and COMPLETION rings are used for the TX path.

The socket is then finally bound with a bind() call to a device and a
specific queue id on that device, and it is not until bind is
Expand All @@ -59,13 +59,13 @@ wants to do this, it simply skips the registration of the UMEM and its
corresponding two rings, sets the XDP_SHARED_UMEM flag in the bind
call and submits the XSK of the process it would like to share UMEM
with as well as its own newly created XSK socket. The new process will
then receive frame id references in its own RX ring that point to this
shared UMEM. Note that since the ring structures are single-consumer /
single-producer (for performance reasons), the new process has to
create its own socket with associated RX and TX rings, since it cannot
share this with the other process. This is also the reason that there
is only one set of FILL and COMPLETION rings per UMEM. It is the
responsibility of a single process to handle the UMEM.
then receive frame addr references in its own RX ring that point to
this shared UMEM. Note that since the ring structures are
single-consumer / single-producer (for performance reasons), the new
process has to create its own socket with associated RX and TX rings,
since it cannot share this with the other process. This is also the
reason that there is only one set of FILL and COMPLETION rings per
UMEM. It is the responsibility of a single process to handle the UMEM.

How is then packets distributed from an XDP program to the XSKs? There
is a BPF map called XSKMAP (or BPF_MAP_TYPE_XSKMAP in full). The
Expand Down Expand Up @@ -102,10 +102,10 @@ UMEM

UMEM is a region of virtual contiguous memory, divided into
equal-sized frames. An UMEM is associated to a netdev and a specific
queue id of that netdev. It is created and configured (frame size,
frame headroom, start address and size) by using the XDP_UMEM_REG
setsockopt system call. A UMEM is bound to a netdev and queue id, via
the bind() system call.
queue id of that netdev. It is created and configured (chunk size,
headroom, start address and size) by using the XDP_UMEM_REG setsockopt
system call. A UMEM is bound to a netdev and queue id, via the bind()
system call.

An AF_XDP is socket linked to a single UMEM, but one UMEM can have
multiple AF_XDP sockets. To share an UMEM created via one socket A,
Expand Down Expand Up @@ -147,13 +147,17 @@ UMEM Fill Ring
~~~~~~~~~~~~~~

The Fill ring is used to transfer ownership of UMEM frames from
user-space to kernel-space. The UMEM indicies are passed in the
ring. As an example, if the UMEM is 64k and each frame is 4k, then the
UMEM has 16 frames and can pass indicies between 0 and 15.
user-space to kernel-space. The UMEM addrs are passed in the ring. As
an example, if the UMEM is 64k and each chunk is 4k, then the UMEM has
16 chunks and can pass addrs between 0 and 64k.

Frames passed to the kernel are used for the ingress path (RX rings).

The user application produces UMEM indicies to this ring.
The user application produces UMEM addrs to this ring. Note that the
kernel will mask the incoming addr. E.g. for a chunk size of 2k, the
log2(2048) LSB of the addr will be masked off, meaning that 2048, 2050
and 3000 refers to the same chunk.


UMEM Completetion Ring
~~~~~~~~~~~~~~~~~~~~~~
Expand All @@ -165,16 +169,15 @@ used.
Frames passed from the kernel to user-space are frames that has been
sent (TX ring) and can be used by user-space again.

The user application consumes UMEM indicies from this ring.
The user application consumes UMEM addrs from this ring.


RX Ring
~~~~~~~

The RX ring is the receiving side of a socket. Each entry in the ring
is a struct xdp_desc descriptor. The descriptor contains UMEM index
(idx), the length of the data (len), the offset into the frame
(offset).
is a struct xdp_desc descriptor. The descriptor contains UMEM offset
(addr) and the length of the data (len).

If no frames have been passed to kernel via the Fill ring, no
descriptors will (or can) appear on the RX ring.
Expand Down Expand Up @@ -221,38 +224,50 @@ side is xdpsock_user.c and the XDP side xdpsock_kern.c.

Naive ring dequeue and enqueue could look like this::

// struct xdp_rxtx_ring {
// __u32 *producer;
// __u32 *consumer;
// struct xdp_desc *desc;
// };

// struct xdp_umem_ring {
// __u32 *producer;
// __u32 *consumer;
// __u64 *desc;
// };

// typedef struct xdp_rxtx_ring RING;
// typedef struct xdp_umem_ring RING;

// typedef struct xdp_desc RING_TYPE;
// typedef __u32 RING_TYPE;
// typedef __u64 RING_TYPE;

int dequeue_one(RING *ring, RING_TYPE *item)
{
__u32 entries = ring->ptrs.producer - ring->ptrs.consumer;
__u32 entries = *ring->producer - *ring->consumer;

if (entries == 0)
return -1;

// read-barrier!

*item = ring->desc[ring->ptrs.consumer & (RING_SIZE - 1)];
ring->ptrs.consumer++;
*item = ring->desc[*ring->consumer & (RING_SIZE - 1)];
(*ring->consumer)++;
return 0;
}

int enqueue_one(RING *ring, const RING_TYPE *item)
{
u32 free_entries = RING_SIZE - (ring->ptrs.producer - ring->ptrs.consumer);
u32 free_entries = RING_SIZE - (*ring->producer - *ring->consumer);

if (free_entries == 0)
return -1;

ring->desc[ring->ptrs.producer & (RING_SIZE - 1)] = *item;
ring->desc[*ring->producer & (RING_SIZE - 1)] = *item;

// write-barrier!

ring->ptrs.producer++;
(*ring->producer)++;
return 0;
}

Expand Down
2 changes: 2 additions & 0 deletions MAINTAINERS
Original file line number Diff line number Diff line change
Expand Up @@ -2722,6 +2722,7 @@ L: netdev@vger.kernel.org
L: linux-kernel@vger.kernel.org
T: git git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git
T: git git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git
Q: https://patchwork.ozlabs.org/project/netdev/list/?delegate=77147
S: Supported
F: arch/x86/net/bpf_jit*
F: Documentation/networking/filter.txt
Expand All @@ -2740,6 +2741,7 @@ F: net/sched/act_bpf.c
F: net/sched/cls_bpf.c
F: samples/bpf/
F: tools/bpf/
F: tools/lib/bpf/
F: tools/testing/selftests/bpf/

BROADCOM B44 10/100 ETHERNET DRIVER
Expand Down
16 changes: 8 additions & 8 deletions arch/arm/net/bpf_jit_32.c
Original file line number Diff line number Diff line change
Expand Up @@ -84,7 +84,7 @@
*
* 1. First argument is passed using the arm 32bit registers and rest of the
* arguments are passed on stack scratch space.
* 2. First callee-saved arugument is mapped to arm 32 bit registers and rest
* 2. First callee-saved argument is mapped to arm 32 bit registers and rest
* arguments are mapped to scratch space on stack.
* 3. We need two 64 bit temp registers to do complex operations on eBPF
* registers.
Expand Down Expand Up @@ -701,7 +701,7 @@ static inline void emit_a32_arsh_r64(const u8 dst[], const u8 src[], bool dstk,
}

/* dst = dst >> src */
static inline void emit_a32_lsr_r64(const u8 dst[], const u8 src[], bool dstk,
static inline void emit_a32_rsh_r64(const u8 dst[], const u8 src[], bool dstk,
bool sstk, struct jit_ctx *ctx) {
const u8 *tmp = bpf2a32[TMP_REG_1];
const u8 *tmp2 = bpf2a32[TMP_REG_2];
Expand All @@ -717,7 +717,7 @@ static inline void emit_a32_lsr_r64(const u8 dst[], const u8 src[], bool dstk,
emit(ARM_LDR_I(rm, ARM_SP, STACK_VAR(dst_hi)), ctx);
}

/* Do LSH operation */
/* Do RSH operation */
emit(ARM_RSB_I(ARM_IP, rt, 32), ctx);
emit(ARM_SUBS_I(tmp2[0], rt, 32), ctx);
emit(ARM_MOV_SR(ARM_LR, rd, SRTYPE_LSR, rt), ctx);
Expand Down Expand Up @@ -767,7 +767,7 @@ static inline void emit_a32_lsh_i64(const u8 dst[], bool dstk,
}

/* dst = dst >> val */
static inline void emit_a32_lsr_i64(const u8 dst[], bool dstk,
static inline void emit_a32_rsh_i64(const u8 dst[], bool dstk,
const u32 val, struct jit_ctx *ctx) {
const u8 *tmp = bpf2a32[TMP_REG_1];
const u8 *tmp2 = bpf2a32[TMP_REG_2];
Expand Down Expand Up @@ -1192,8 +1192,8 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx)
s32 jmp_offset;

#define check_imm(bits, imm) do { \
if ((((imm) > 0) && ((imm) >> (bits))) || \
(((imm) < 0) && (~(imm) >> (bits)))) { \
if ((imm) >= (1 << ((bits) - 1)) || \
(imm) < -(1 << ((bits) - 1))) { \
pr_info("[%2d] imm=%d(0x%x) out of range\n", \
i, imm, imm); \
return -EINVAL; \
Expand Down Expand Up @@ -1323,15 +1323,15 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx)
case BPF_ALU64 | BPF_RSH | BPF_K:
if (unlikely(imm > 63))
return -EINVAL;
emit_a32_lsr_i64(dst, dstk, imm, ctx);
emit_a32_rsh_i64(dst, dstk, imm, ctx);
break;
/* dst = dst << src */
case BPF_ALU64 | BPF_LSH | BPF_X:
emit_a32_lsh_r64(dst, src, dstk, sstk, ctx);
break;
/* dst = dst >> src */
case BPF_ALU64 | BPF_RSH | BPF_X:
emit_a32_lsr_r64(dst, src, dstk, sstk, ctx);
emit_a32_rsh_r64(dst, src, dstk, sstk, ctx);
break;
/* dst = dst >> src (signed) */
case BPF_ALU64 | BPF_ARSH | BPF_X:
Expand Down
13 changes: 13 additions & 0 deletions drivers/media/rc/Kconfig
Original file line number Diff line number Diff line change
Expand Up @@ -25,6 +25,19 @@ config LIRC
passes raw IR to and from userspace, which is needed for
IR transmitting (aka "blasting") and for the lirc daemon.

config BPF_LIRC_MODE2
bool "Support for eBPF programs attached to lirc devices"
depends on BPF_SYSCALL
depends on RC_CORE=y
depends on LIRC
help
Allow attaching eBPF programs to a lirc device using the bpf(2)
syscall command BPF_PROG_ATTACH. This is supported for raw IR
receivers.

These eBPF programs can be used to decode IR into scancodes, for
IR protocols not supported by the kernel decoders.

menuconfig RC_DECODERS
bool "Remote controller decoders"
depends on RC_CORE
Expand Down
1 change: 1 addition & 0 deletions drivers/media/rc/Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@ obj-y += keymaps/
obj-$(CONFIG_RC_CORE) += rc-core.o
rc-core-y := rc-main.o rc-ir-raw.o
rc-core-$(CONFIG_LIRC) += lirc_dev.o
rc-core-$(CONFIG_BPF_LIRC_MODE2) += bpf-lirc.o
obj-$(CONFIG_IR_NEC_DECODER) += ir-nec-decoder.o
obj-$(CONFIG_IR_RC5_DECODER) += ir-rc5-decoder.o
obj-$(CONFIG_IR_RC6_DECODER) += ir-rc6-decoder.o
Expand Down
Loading

0 comments on commit fd129f8

Please sign in to comment.