Skip to content

Commit

Permalink
Merge tag 'net-next-5.13' of git://git.kernel.org/pub/scm/linux/kerne…
Browse files Browse the repository at this point in the history
…l/git/netdev/net-next

Pull networking updates from Jakub Kicinski:
 "Core:

   - bpf:
        - allow bpf programs calling kernel functions (initially to
          reuse TCP congestion control implementations)
        - enable task local storage for tracing programs - remove the
          need to store per-task state in hash maps, and allow tracing
          programs access to task local storage previously added for
          BPF_LSM
        - add bpf_for_each_map_elem() helper, allowing programs to walk
          all map elements in a more robust and easier to verify fashion
        - sockmap: support UDP and cross-protocol BPF_SK_SKB_VERDICT
          redirection
        - lpm: add support for batched ops in LPM trie
        - add BTF_KIND_FLOAT support - mostly to allow use of BTF on
          s390 which has floats in its headers files
        - improve BPF syscall documentation and extend the use of kdoc
          parsing scripts we already employ for bpf-helpers
        - libbpf, bpftool: support static linking of BPF ELF files
        - improve support for encapsulation of L2 packets

   - xdp: restructure redirect actions to avoid a runtime lookup,
     improving performance by 4-8% in microbenchmarks

   - xsk: build skb by page (aka generic zerocopy xmit) - improve
     performance of software AF_XDP path by 33% for devices which don't
     need headers in the linear skb part (e.g. virtio)

   - nexthop: resilient next-hop groups - improve path stability on
     next-hops group changes (incl. offload for mlxsw)

   - ipv6: segment routing: add support for IPv4 decapsulation

   - icmp: add support for RFC 8335 extended PROBE messages

   - inet: use bigger hash table for IP ID generation

   - tcp: deal better with delayed TX completions - make sure we don't
     give up on fast TCP retransmissions only because driver is slow in
     reporting that it completed transmitting the original

   - tcp: reorder tcp_congestion_ops for better cache locality

   - mptcp:
        - add sockopt support for common TCP options
        - add support for common TCP msg flags
        - include multiple address ids in RM_ADDR
        - add reset option support for resetting one subflow

   - udp: GRO L4 improvements - improve 'forward' / 'frag_list'
     co-existence with UDP tunnel GRO, allowing the first to take place
     correctly even for encapsulated UDP traffic

   - micro-optimize dev_gro_receive() and flow dissection, avoid
     retpoline overhead on VLAN and TEB GRO

   - use less memory for sysctls, add a new sysctl type, to allow using
     u8 instead of "int" and "long" and shrink networking sysctls

   - veth: allow GRO without XDP - this allows aggregating UDP packets
     before handing them off to routing, bridge, OvS, etc.

   - allow specifing ifindex when device is moved to another namespace

   - netfilter:
        - nft_socket: add support for cgroupsv2
        - nftables: add catch-all set element - special element used to
          define a default action in case normal lookup missed
        - use net_generic infra in many modules to avoid allocating
          per-ns memory unnecessarily

   - xps: improve the xps handling to avoid potential out-of-bound
     accesses and use-after-free when XPS change race with other
     re-configuration under traffic

   - add a config knob to turn off per-cpu netdev refcnt to catch
     underflows in testing

  Device APIs:

   - add WWAN subsystem to organize the WWAN interfaces better and
     hopefully start driving towards more unified and vendor-
     independent APIs

   - ethtool:
        - add interface for reading IEEE MIB stats (incl. mlx5 and bnxt
          support)
        - allow network drivers to dump arbitrary SFP EEPROM data,
          current offset+length API was a poor fit for modern SFP which
          define EEPROM in terms of pages (incl. mlx5 support)

   - act_police, flow_offload: add support for packet-per-second
     policing (incl. offload for nfp)

   - psample: add additional metadata attributes like transit delay for
     packets sampled from switch HW (and corresponding egress and
     policy-based sampling in the mlxsw driver)

   - dsa: improve support for sandwiched LAGs with bridge and DSA

   - netfilter:
        - flowtable: use direct xmit in topologies with IP forwarding,
          bridging, vlans etc.
        - nftables: counter hardware offload support

   - Bluetooth:
        - improvements for firmware download w/ Intel devices
        - add support for reading AOSP vendor capabilities
        - add support for virtio transport driver

   - mac80211:
        - allow concurrent monitor iface and ethernet rx decap
        - set priority and queue mapping for injected frames

   - phy: add support for Clause-45 PHY Loopback

   - pci/iov: add sysfs MSI-X vector assignment interface to distribute
     MSI-X resources to VFs (incl. mlx5 support)

  New hardware/drivers:

   - dsa: mv88e6xxx: add support for Marvell mv88e6393x - 11-port
     Ethernet switch with 8x 1-Gigabit Ethernet and 3x 10-Gigabit
     interfaces.

   - dsa: support for legacy Broadcom tags used on BCM5325, BCM5365 and
     BCM63xx switches

   - Microchip KSZ8863 and KSZ8873; 3x 10/100Mbps Ethernet switches

   - ath11k: support for QCN9074 a 802.11ax device

   - Bluetooth: Broadcom BCM4330 and BMC4334

   - phy: Marvell 88X2222 transceiver support

   - mdio: add BCM6368 MDIO mux bus controller

   - r8152: support RTL8153 and RTL8156 (USB Ethernet) chips

   - mana: driver for Microsoft Azure Network Adapter (MANA)

   - Actions Semi Owl Ethernet MAC

   - can: driver for ETAS ES58X CAN/USB interfaces

  Pure driver changes:

   - add XDP support to: enetc, igc, stmmac

   - add AF_XDP support to: stmmac

   - virtio:
        - page_to_skb() use build_skb when there's sufficient tailroom
          (21% improvement for 1000B UDP frames)
        - support XDP even without dedicated Tx queues - share the Tx
          queues with the stack when necessary

   - mlx5:
        - flow rules: add support for mirroring with conntrack, matching
          on ICMP, GTP, flex filters and more
        - support packet sampling with flow offloads
        - persist uplink representor netdev across eswitch mode changes
        - allow coexistence of CQE compression and HW time-stamping
        - add ethtool extended link error state reporting

   - ice, iavf: support flow filters, UDP Segmentation Offload

   - dpaa2-switch:
        - move the driver out of staging
        - add spanning tree (STP) support
        - add rx copybreak support
        - add tc flower hardware offload on ingress traffic

   - ionic:
        - implement Rx page reuse
        - support HW PTP time-stamping

   - octeon: support TC hardware offloads - flower matching on ingress
     and egress ratelimitting.

   - stmmac:
        - add RX frame steering based on VLAN priority in tc flower
        - support frame preemption (FPE)
        - intel: add cross time-stamping freq difference adjustment

   - ocelot:
        - support forwarding of MRP frames in HW
        - support multiple bridges
        - support PTP Sync one-step timestamping

   - dsa: mv88e6xxx, dpaa2-switch: offload bridge port flags like
     learning, flooding etc.

   - ipa: add IPA v4.5, v4.9 and v4.11 support (Qualcomm SDX55, SM8350,
     SC7280 SoCs)

   - mt7601u: enable TDLS support

   - mt76:
        - add support for 802.3 rx frames (mt7915/mt7615)
        - mt7915 flash pre-calibration support
        - mt7921/mt7663 runtime power management fixes"

* tag 'net-next-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2451 commits)
  net: selftest: fix build issue if INET is disabled
  net: netrom: nr_in: Remove redundant assignment to ns
  net: tun: Remove redundant assignment to ret
  net: phy: marvell: add downshift support for M88E1240
  net: dsa: ksz: Make reg_mib_cnt a u8 as it never exceeds 255
  net/sched: act_ct: Remove redundant ct get and check
  icmp: standardize naming of RFC 8335 PROBE constants
  bpf, selftests: Update array map tests for per-cpu batched ops
  bpf: Add batched ops support for percpu array
  bpf: Implement formatted output helpers with bstr_printf
  seq_file: Add a seq_bprintf function
  sfc: adjust efx->xdp_tx_queue_count with the real number of initialized queues
  net:nfc:digital: Fix a double free in digital_tg_recv_dep_req
  net: fix a concurrency bug in l2tp_tunnel_register()
  net/smc: Remove redundant assignment to rc
  mpls: Remove redundant assignment to err
  llc2: Remove redundant assignment to rc
  net/tls: Remove redundant initialization of record
  rds: Remove redundant assignment to nr_sig
  dt-bindings: net: mdio-gpio: add compatible for microchip,mdio-smi0
  ...
  • Loading branch information
torvalds committed Apr 29, 2021
2 parents 635de95 + 4a52dd8 commit 9d31d23
Show file tree
Hide file tree
Showing 1,895 changed files with 121,451 additions and 35,419 deletions.
29 changes: 29 additions & 0 deletions Documentation/ABI/testing/sysfs-bus-pci
Original file line number Diff line number Diff line change
Expand Up @@ -378,3 +378,32 @@ Description:
The value comes from the PCI kernel device state and can be one
of: "unknown", "error", "D0", D1", "D2", "D3hot", "D3cold".
The file is read only.

What: /sys/bus/pci/devices/.../sriov_vf_total_msix
Date: January 2021
Contact: Leon Romanovsky <leonro@nvidia.com>
Description:
This file is associated with a SR-IOV physical function (PF).
It contains the total number of MSI-X vectors available for
assignment to all virtual functions (VFs) associated with PF.
The value will be zero if the device doesn't support this
functionality. For supported devices, the value will be
constant and won't be changed after MSI-X vectors assignment.

What: /sys/bus/pci/devices/.../sriov_vf_msix_count
Date: January 2021
Contact: Leon Romanovsky <leonro@nvidia.com>
Description:
This file is associated with a SR-IOV virtual function (VF).
It allows configuration of the number of MSI-X vectors for
the VF. This allows devices that have a global pool of MSI-X
vectors to optimally divide them between VFs based on VF usage.

The values accepted are:
* > 0 - this number will be reported as the Table Size in the
VF's MSI-X capability
* < 0 - not valid
* = 0 - will reset to the device default value

The file is writable if the PF is bound to a driver that
implements ->sriov_set_msix_vec_count().
12 changes: 12 additions & 0 deletions Documentation/ABI/testing/sysfs-class-net-phydev
Original file line number Diff line number Diff line change
Expand Up @@ -51,3 +51,15 @@ Description:
Boolean value indicating whether the PHY device is used in
standalone mode, without a net_device associated, by PHYLINK.
Attribute created only when this is the case.

What: /sys/class/mdio_bus/<bus>/<device>/phy_dev_flags
Date: March 2021
KernelVersion: 5.13
Contact: netdev@vger.kernel.org
Description:
32-bit hexadecimal number representing a bit mask of the
configuration bits passed from the consumer of the PHY
(Ethernet MAC, switch, etc.) to the PHY driver. The flags are
only used internally by the kernel and their placement are
not meant to be stable across kernel versions. This is intended
for facilitating the debugging of PHY drivers.
11 changes: 11 additions & 0 deletions Documentation/admin-guide/sysctl/net.rst
Original file line number Diff line number Diff line change
Expand Up @@ -311,6 +311,17 @@ permit to distribute the load on several cpus.
If set to 1 (default), timestamps are sampled as soon as possible, before
queueing.

netdev_unregister_timeout_secs
------------------------------

Unregister network device timeout in seconds.
This option controls the timeout (in seconds) used to issue a warning while
waiting for a network device refcount to drop to 0 during device
unregistration. A lower value may be useful during bisection to detect
a leaked reference faster. A larger value may be useful to prevent false
warnings on slow/loaded systems.
Default value is 10, minimum 1, maximum 3600.

optmem_max
----------

Expand Down
15 changes: 15 additions & 0 deletions Documentation/bpf/bpf_design_QA.rst
Original file line number Diff line number Diff line change
Expand Up @@ -258,3 +258,18 @@ Q: Can BPF functionality such as new program or map types, new
helpers, etc be added out of kernel module code?

A: NO.

Q: Directly calling kernel function is an ABI?
----------------------------------------------
Q: Some kernel functions (e.g. tcp_slow_start) can be called
by BPF programs. Do these kernel functions become an ABI?

A: NO.

The kernel function protos will change and the bpf programs will be
rejected by the verifier. Also, for example, some of the bpf-callable
kernel functions have already been used by other kernel tcp
cc (congestion-control) implementations. If any of these kernel
functions has changed, both the in-tree and out-of-tree kernel tcp cc
implementations have to be changed. The same goes for the bpf
programs and they have to be adjusted accordingly.
30 changes: 21 additions & 9 deletions Documentation/bpf/bpf_devel_QA.rst
Original file line number Diff line number Diff line change
Expand Up @@ -29,7 +29,7 @@ list:
This may also include issues related to XDP, BPF tracing, etc.

Given netdev has a high volume of traffic, please also add the BPF
maintainers to Cc (from kernel MAINTAINERS_ file):
maintainers to Cc (from kernel ``MAINTAINERS`` file):

* Alexei Starovoitov <ast@kernel.org>
* Daniel Borkmann <daniel@iogearbox.net>
Expand Down Expand Up @@ -234,21 +234,21 @@ be subject to change.

Q: samples/bpf preference vs selftests?
---------------------------------------
Q: When should I add code to `samples/bpf/`_ and when to BPF kernel
selftests_ ?
Q: When should I add code to ``samples/bpf/`` and when to BPF kernel
selftests_?

A: In general, we prefer additions to BPF kernel selftests_ rather than
`samples/bpf/`_. The rationale is very simple: kernel selftests are
``samples/bpf/``. The rationale is very simple: kernel selftests are
regularly run by various bots to test for kernel regressions.

The more test cases we add to BPF selftests, the better the coverage
and the less likely it is that those could accidentally break. It is
not that BPF kernel selftests cannot demo how a specific feature can
be used.

That said, `samples/bpf/`_ may be a good place for people to get started,
That said, ``samples/bpf/`` may be a good place for people to get started,
so it might be advisable that simple demos of features could go into
`samples/bpf/`_, but advanced functional and corner-case testing rather
``samples/bpf/``, but advanced functional and corner-case testing rather
into kernel selftests.

If your sample looks like a test case, then go for BPF kernel selftests
Expand Down Expand Up @@ -449,6 +449,19 @@ from source at

https://github.com/acmel/dwarves

pahole starts to use libbpf definitions and APIs since v1.13 after the
commit 21507cd3e97b ("pahole: add libbpf as submodule under lib/bpf").
It works well with the git repository because the libbpf submodule will
use "git submodule update --init --recursive" to update.

Unfortunately, the default github release source code does not contain
libbpf submodule source code and this will cause build issues, the tarball
from https://git.kernel.org/pub/scm/devel/pahole/pahole.git/ is same with
github, you can get the source tarball with corresponding libbpf submodule
codes from

https://fedorapeople.org/~acme/dwarves

Some distros have pahole version 1.16 packaged already, e.g.
Fedora, Gentoo.

Expand Down Expand Up @@ -645,10 +658,9 @@ when:

.. Links
.. _Documentation/process/: https://www.kernel.org/doc/html/latest/process/
.. _MAINTAINERS: ../../MAINTAINERS
.. _netdev-FAQ: ../networking/netdev-FAQ.rst
.. _samples/bpf/: ../../samples/bpf/
.. _selftests: ../../tools/testing/selftests/bpf/
.. _selftests:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/testing/selftests/bpf/
.. _Documentation/dev-tools/kselftest.rst:
https://www.kernel.org/doc/html/latest/dev-tools/kselftest.html
.. _Documentation/bpf/btf.rst: btf.rst
Expand Down
17 changes: 15 additions & 2 deletions Documentation/bpf/btf.rst
Original file line number Diff line number Diff line change
Expand Up @@ -84,6 +84,7 @@ sequentially and type id is assigned to each recognized type starting from id
#define BTF_KIND_FUNC_PROTO 13 /* Function Proto */
#define BTF_KIND_VAR 14 /* Variable */
#define BTF_KIND_DATASEC 15 /* Section */
#define BTF_KIND_FLOAT 16 /* Floating point */

Note that the type section encodes debug info, not just pure types.
``BTF_KIND_FUNC`` is not a type, and it represents a defined subprogram.
Expand All @@ -95,8 +96,8 @@ Each type contains the following common data::
/* "info" bits arrangement
* bits 0-15: vlen (e.g. # of struct's members)
* bits 16-23: unused
* bits 24-27: kind (e.g. int, ptr, array...etc)
* bits 28-30: unused
* bits 24-28: kind (e.g. int, ptr, array...etc)
* bits 29-30: unused
* bit 31: kind_flag, currently used by
* struct, union and fwd
*/
Expand Down Expand Up @@ -452,6 +453,18 @@ map definition.
* ``offset``: the in-section offset of the variable
* ``size``: the size of the variable in bytes

2.2.16 BTF_KIND_FLOAT
~~~~~~~~~~~~~~~~~~~~~

``struct btf_type`` encoding requirement:
* ``name_off``: any valid offset
* ``info.kind_flag``: 0
* ``info.kind``: BTF_KIND_FLOAT
* ``info.vlen``: 0
* ``size``: the size of the float type in bytes: 2, 4, 8, 12 or 16.

No additional type data follow ``btf_type``.

3. BTF Kernel API
*****************

Expand Down
9 changes: 6 additions & 3 deletions Documentation/bpf/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -12,9 +12,6 @@ BPF instruction-set.
The Cilium project also maintains a `BPF and XDP Reference Guide`_
that goes into great technical depth about the BPF Architecture.

The primary info for the bpf syscall is available in the `man-pages`_
for `bpf(2)`_.

BPF Type Format (BTF)
=====================

Expand All @@ -35,6 +32,12 @@ Two sets of Questions and Answers (Q&A) are maintained.
bpf_design_QA
bpf_devel_QA

Syscall API
===========

The primary info for the bpf syscall is available in the `man-pages`_
for `bpf(2)`_. For more information about the userspace API, see
Documentation/userspace-api/ebpf/index.rst.

Helper functions
================
Expand Down
92 changes: 92 additions & 0 deletions Documentation/devicetree/bindings/net/actions,owl-emac.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,92 @@
# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
%YAML 1.2
---
$id: http://devicetree.org/schemas/net/actions,owl-emac.yaml#
$schema: http://devicetree.org/meta-schemas/core.yaml#

title: Actions Semi Owl SoCs Ethernet MAC Controller

maintainers:
- Cristian Ciocaltea <cristian.ciocaltea@gmail.com>

description: |
This Ethernet MAC is used on the Owl family of SoCs from Actions Semi.
It provides the RMII and SMII interfaces and is compliant with the
IEEE 802.3 CSMA/CD standard, supporting both half-duplex and full-duplex
operation modes at 10/100 Mb/s data transfer rates.
allOf:
- $ref: "ethernet-controller.yaml#"

properties:
compatible:
oneOf:
- const: actions,owl-emac
- items:
- enum:
- actions,s500-emac
- const: actions,owl-emac

reg:
maxItems: 1

interrupts:
maxItems: 1

clocks:
minItems: 2
maxItems: 2

clock-names:
additionalItems: false
items:
- const: eth
- const: rmii

resets:
maxItems: 1

actions,ethcfg:
$ref: /schemas/types.yaml#/definitions/phandle
description:
Phandle to the device containing custom config.

required:
- compatible
- reg
- interrupts
- clocks
- clock-names
- resets
- phy-mode
- phy-handle

unevaluatedProperties: false

examples:
- |
#include <dt-bindings/clock/actions,s500-cmu.h>
#include <dt-bindings/interrupt-controller/arm-gic.h>
#include <dt-bindings/reset/actions,s500-reset.h>
ethernet@b0310000 {
compatible = "actions,s500-emac", "actions,owl-emac";
reg = <0xb0310000 0x10000>;
interrupts = <GIC_SPI 0 IRQ_TYPE_LEVEL_HIGH>;
clocks = <&cmu 59 /*CLK_ETHERNET*/>, <&cmu CLK_RMII_REF>;
clock-names = "eth", "rmii";
resets = <&cmu RESET_ETHERNET>;
phy-mode = "rmii";
phy-handle = <&eth_phy>;
mdio {
#address-cells = <1>;
#size-cells = <0>;
eth_phy: ethernet-phy@3 {
reg = <0x3>;
interrupt-parent = <&sirq>;
interrupts = <0 IRQ_TYPE_LEVEL_LOW>;
};
};
};
17 changes: 13 additions & 4 deletions Documentation/devicetree/bindings/net/brcm,bcm4908-enet.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -22,10 +22,18 @@ properties:
maxItems: 1

interrupts:
description: RX interrupt
minItems: 1
maxItems: 2
items:
- description: RX interrupt
- description: TX interrupt

interrupt-names:
const: rx
minItems: 1
maxItems: 2
items:
- const: rx
- const: tx

required:
- reg
Expand All @@ -43,6 +51,7 @@ examples:
compatible = "brcm,bcm4908-enet";
reg = <0x80002000 0x1000>;
interrupts = <GIC_SPI 86 IRQ_TYPE_LEVEL_HIGH>;
interrupt-names = "rx";
interrupts = <GIC_SPI 86 IRQ_TYPE_LEVEL_HIGH>,
<GIC_SPI 87 IRQ_TYPE_LEVEL_HIGH>;
interrupt-names = "rx", "tx";
};
Loading

0 comments on commit 9d31d23

Please sign in to comment.