Skip to content

Commit

Permalink
[sonic-linux-kernel] add udp_l3mdev_accept kernel upstream patch (son…
Browse files Browse the repository at this point in the history
…ic-net#70)

* [sonic-linux-kernel] add udp_l3mdev_accept kernel upstream patch

This commit adds the udp_l3mdev_accept kernel patch
to the linux kernel. This requires an abiname change
from 4.9.0-7-amd64 to 4.9.0-7-1-amd64

Signed-off-by: Harish Venkatraman <Harish_Venkatraman@dell.com>

* move abiname to 8-2

Signed-off-by: Guohan Lu <gulv@microsoft.com>
  • Loading branch information
vharish02 authored and lguohan committed Jan 20, 2019
1 parent 37734aa commit 3b2114d
Show file tree
Hide file tree
Showing 5 changed files with 314 additions and 7 deletions.
8 changes: 4 additions & 4 deletions Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@
SHELL = /bin/bash
.SHELLFLAGS += -e

KERNEL_ABI_MINOR_VERSION = 1
KERNEL_ABI_MINOR_VERSION = 2
KVERSION_SHORT ?= 4.9.0-8-$(KERNEL_ABI_MINOR_VERSION)
KVERSION ?= $(KVERSION_SHORT)-amd64
KERNEL_VERSION ?= 4.9.110
Expand All @@ -20,11 +20,11 @@ ifneq ($(kernel_procure_method), build)
# Downloading kernel

# TBD, need upload the new kernel packages
LINUX_HEADER_COMMON_URL = "https://sonicstorage.blob.core.windows.net/packages/kernel-public/linux-headers-4.9.0-8-common_4.9.110-3+deb9u2_all.deb?sv=2015-04-05&sr=b&sig=LlKqKecY6MDSwFMTxpKErh7FX1Lrvse2yVt3niYnhds%3D&se=2128-02-24T04%3A47%3A48Z&sp=r"
LINUX_HEADER_COMMON_URL = "https://sonicstorage.blob.core.windows.net/packages/kernel-public/linux-headers-$(KVERSION_SHORT)-common_$(KERNEL_VERSION)-$(KERNEL_SUBVERSION)_all.deb?sv=2015-04-05&sr=b&sig=LlKqKecY6MDSwFMTxpKErh7FX1Lrvse2yVt3niYnhds%3D&se=2128-02-24T04%3A47%3A48Z&sp=r"

LINUX_HEADER_AMD64_URL = "https://sonicstorage.blob.core.windows.net/packages/kernel-public/linux-headers-4.9.0-8-amd64_4.9.110-3+deb9u2_amd64.deb?sv=2015-04-05&sr=b&sig=SfQLjiBPPMcRUOmBPvXq2%2F6SsW3ul9%2FlaROplXdGij0%3D&se=2128-02-24T04%3A48%3A38Z&sp=r"
LINUX_HEADER_AMD64_URL = "https://sonicstorage.blob.core.windows.net/packages/kernel-public/linux-headers-$(KVERSION_SHORT)-amd64_$(KERNEL_VERSION)-$(KERNEL_SUBVERSION)_amd64.deb?sv=2015-04-05&sr=b&sig=SfQLjiBPPMcRUOmBPvXq2%2F6SsW3ul9%2FlaROplXdGij0%3D&se=2128-02-24T04%3A48%3A38Z&sp=r"

LINUX_IMAGE_URL = "https://sonicstorage.blob.core.windows.net/packages/kernel-public/linux-image-4.9.0-8-amd64_4.9.110-3+deb9u2_amd64.deb?sv=2015-04-05&sr=b&sig=PNKGpLjELZem0IacUMM1jU%2Ft5ujoVvUb9JzxrUhE1Wk%3D&se=2128-02-24T04%3A48%3A57Z&sp=r"
LINUX_IMAGE_URL = "https://sonicstorage.blob.core.windows.net/packages/kernel-public/linux-image-$(KVERSION_SHORT)-amd64_$(KERNEL_VERSION)-$(KERNEL_SUBVERSION)_amd64.deb?sv=2015-04-05&sr=b&sig=PNKGpLjELZem0IacUMM1jU%2Ft5ujoVvUb9JzxrUhE1Wk%3D&se=2128-02-24T04%3A48%3A57Z&sp=r"

$(addprefix $(DEST)/, $(MAIN_TARGET)): $(DEST)/% :
# Obtaining the Debian kernel packages
Expand Down
306 changes: 306 additions & 0 deletions patch/0025-net-udp_l3mdev_accept-support.patch
Original file line number Diff line number Diff line change
@@ -0,0 +1,306 @@
From 58be55eb5de3a319c0740ba8071440d972c821c3 Mon Sep 17 00:00:00 2001
From: Harish Venkatraman <Harish_Venkatraman@dell.com>
Date: Tue, 25 Sep 2018 09:56:25 -0700
Subject: [PATCH] net: udp_l3mdev_accept support

From 63a6fff353d01da5a22b72670c434bf12fa0e3b8 Mon Sep 17 00:00:00 2001
From: Robert Shearman <rshearma@brocade.com>
Date: Thu, 26 Jan 2017 18:02:24 +0000
Subject: [PATCH] net: Avoid receiving packets with an l3mdev on unbound UDP
sockets

Packets arriving in a VRF currently are delivered to UDP sockets that
aren't bound to any interface. TCP defaults to not delivering packets
arriving in a VRF to unbound sockets. IP route lookup and socket
transmit both assume that unbound means using the default table and
UDP applications that haven't been changed to be aware of VRFs may not
function correctly in this case since they may not be able to handle
overlapping IP address ranges, or be able to send packets back to the
original sender if required.

So add a sysctl, udp_l3mdev_accept, to control this behaviour with it
being analgous to the existing tcp_l3mdev_accept, namely to allow a
process to have a VRF-global listen socket. Have this default to off
as this is the behaviour that users will expect, given that there is
no explicit mechanism to set unmodified VRF-unaware application into a
default VRF.

Signed-off-by: Robert Shearman <rshearma@brocade.com>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
Tested-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

---
Documentation/networking/ip-sysctl.txt | 7 +++++++
Documentation/networking/vrf.txt | 7 ++++---
include/net/netns/ipv4.h | 4 ++++
net/ipv4/sysctl_net_ipv4.c | 11 +++++++++++
net/ipv4/udp.c | 27 ++++++++++++++++++++-------
net/ipv6/udp.c | 27 ++++++++++++++++++++-------
6 files changed, 66 insertions(+), 17 deletions(-)

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index 3db8c67..7ceeff3 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -737,6 +737,13 @@ tcp_challenge_ack_limit - INTEGER

UDP variables:

+udp_l3mdev_accept - BOOLEAN
+ Enabling this option allows a "global" bound socket to work
+ across L3 master domains (e.g., VRFs) with packets capable of
+ being received regardless of the L3 domain in which they
+ originated. Only valid when the kernel was compiled with
+ CONFIG_NET_L3_MASTER_DEV.
+
udp_mem - vector of 3 INTEGERs: min, pressure, max
Number of pages allowed for queueing by all UDP sockets.

diff --git a/Documentation/networking/vrf.txt b/Documentation/networking/vrf.txt
index 755dab8..3918dae 100644
--- a/Documentation/networking/vrf.txt
+++ b/Documentation/networking/vrf.txt
@@ -98,10 +98,11 @@ VRF device:

or to specify the output device using cmsg and IP_PKTINFO.

-TCP services running in the default VRF context (ie., not bound to any VRF
-device) can work across all VRF domains by enabling the tcp_l3mdev_accept
-sysctl option:
+TCP & UDP services running in the default VRF context (ie., not bound
+to any VRF device) can work across all VRF domains by enabling the
+tcp_l3mdev_accept and udp_l3mdev_accept sysctl options:
sysctl -w net.ipv4.tcp_l3mdev_accept=1
+ sysctl -w net.ipv4.udp_l3mdev_accept=1

netfilter rules on the VRF device can be used to limit access to services
running in the default VRF context as well.
diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
index 7adf438..3917764 100644
--- a/include/net/netns/ipv4.h
+++ b/include/net/netns/ipv4.h
@@ -111,6 +111,10 @@ struct netns_ipv4 {
int sysctl_tcp_fin_timeout;
unsigned int sysctl_tcp_notsent_lowat;

+#ifdef CONFIG_NET_L3_MASTER_DEV
+ int sysctl_udp_l3mdev_accept;
+#endif
+
int sysctl_igmp_max_memberships;
int sysctl_igmp_max_msf;
int sysctl_igmp_llm_reports;
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 566cfc5..2006032 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -971,6 +971,17 @@ static struct ctl_table ipv4_net_table[] = {
.extra2 = &one,
},
#endif
+#ifdef CONFIG_NET_L3_MASTER_DEV
+ {
+ .procname = "udp_l3mdev_accept",
+ .data = &init_net.ipv4.sysctl_udp_l3mdev_accept,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec_minmax,
+ .extra1 = &zero,
+ .extra2 = &one,
+ },
+#endif
{ }
};

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index aa2a20e..7229028 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -134,6 +134,17 @@ EXPORT_SYMBOL(udp_memory_allocated);
#define MAX_UDP_PORTS 65536
#define PORTS_PER_CHAIN (MAX_UDP_PORTS / UDP_HTABLE_SIZE_MIN)

+/* IPCB reference means this can not be used from early demux */
+static bool udp_lib_exact_dif_match(struct net *net, struct sk_buff *skb)
+{
+#if IS_ENABLED(CONFIG_NET_L3_MASTER_DEV)
+ if (!net->ipv4.sysctl_udp_l3mdev_accept &&
+ skb && ipv4_l3mdev_skb(IPCB(skb)->flags))
+ return true;
+#endif
+ return false;
+}
+
static int udp_lib_lport_inuse(struct net *net, __u16 num,
const struct udp_hslot *hslot,
unsigned long *bitmap,
@@ -391,7 +402,8 @@ int udp_v4_get_port(struct sock *sk, unsigned short snum)

static int compute_score(struct sock *sk, struct net *net,
__be32 saddr, __be16 sport,
- __be32 daddr, unsigned short hnum, int dif)
+ __be32 daddr, unsigned short hnum, int dif,
+ bool exact_dif)
{
int score;
struct inet_sock *inet;
@@ -422,7 +434,7 @@ static int compute_score(struct sock *sk, struct net *net,
score += 4;
}

- if (sk->sk_bound_dev_if) {
+ if (sk->sk_bound_dev_if || exact_dif) {
if (sk->sk_bound_dev_if != dif)
return -1;
score += 4;
@@ -447,7 +459,7 @@ static u32 udp_ehashfn(const struct net *net, const __be32 laddr,
/* called with rcu_read_lock() */
static struct sock *udp4_lib_lookup2(struct net *net,
__be32 saddr, __be16 sport,
- __be32 daddr, unsigned int hnum, int dif,
+ __be32 daddr, unsigned int hnum, int dif, bool exact_dif,
struct udp_hslot *hslot2,
struct sk_buff *skb)
{
@@ -459,7 +471,7 @@ static struct sock *udp4_lib_lookup2(struct net *net,
badness = 0;
udp_portaddr_for_each_entry_rcu(sk, &hslot2->head) {
score = compute_score(sk, net, saddr, sport,
- daddr, hnum, dif);
+ daddr, hnum, dif, exact_dif);
if (score > badness) {
reuseport = sk->sk_reuseport;
if (reuseport) {
@@ -494,6 +506,7 @@ struct sock *__udp4_lib_lookup(struct net *net, __be32 saddr,
unsigned short hnum = ntohs(dport);
unsigned int hash2, slot2, slot = udp_hashfn(net, hnum, udptable->mask);
struct udp_hslot *hslot2, *hslot = &udptable->hash[slot];
+ bool exact_dif = udp_lib_exact_dif_match(net, skb);
int score, badness, matches = 0, reuseport = 0;
u32 hash = 0;

@@ -506,7 +519,7 @@ struct sock *__udp4_lib_lookup(struct net *net, __be32 saddr,

result = udp4_lib_lookup2(net, saddr, sport,
daddr, hnum, dif,
- hslot2, skb);
+ exact_dif, hslot2, skb);
if (!result) {
unsigned int old_slot2 = slot2;
hash2 = udp4_portaddr_hash(net, htonl(INADDR_ANY), hnum);
@@ -521,7 +534,7 @@ struct sock *__udp4_lib_lookup(struct net *net, __be32 saddr,

result = udp4_lib_lookup2(net, saddr, sport,
daddr, hnum, dif,
- hslot2, skb);
+ exact_dif, hslot2, skb);
}
return result;
}
@@ -530,7 +543,7 @@ struct sock *__udp4_lib_lookup(struct net *net, __be32 saddr,
badness = 0;
sk_for_each_rcu(sk, &hslot->head) {
score = compute_score(sk, net, saddr, sport,
- daddr, hnum, dif);
+ daddr, hnum, dif, exact_dif);
if (score > badness) {
reuseport = sk->sk_reuseport;
if (reuseport) {
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 4db5f54..1317d2f 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -55,6 +55,16 @@
#include <trace/events/skb.h>
#include "udp_impl.h"

+static bool udp6_lib_exact_dif_match(struct net *net, struct sk_buff *skb)
+{
+#if defined(CONFIG_NET_L3_MASTER_DEV)
+ if (!net->ipv4.sysctl_udp_l3mdev_accept &&
+ skb && ipv6_l3mdev_skb(IP6CB(skb)->flags))
+ return true;
+#endif
+ return false;
+}
+
static u32 udp6_ehashfn(const struct net *net,
const struct in6_addr *laddr,
const u16 lport,
@@ -118,7 +128,7 @@ static void udp_v6_rehash(struct sock *sk)
static int compute_score(struct sock *sk, struct net *net,
const struct in6_addr *saddr, __be16 sport,
const struct in6_addr *daddr, unsigned short hnum,
- int dif)
+ int dif, bool exact_dif)
{
int score;
struct inet_sock *inet;
@@ -149,7 +159,7 @@ static int compute_score(struct sock *sk, struct net *net,
score++;
}

- if (sk->sk_bound_dev_if) {
+ if (sk->sk_bound_dev_if || exact_dif) {
if (sk->sk_bound_dev_if != dif)
return -1;
score++;
@@ -165,7 +175,7 @@ static int compute_score(struct sock *sk, struct net *net,
static struct sock *udp6_lib_lookup2(struct net *net,
const struct in6_addr *saddr, __be16 sport,
const struct in6_addr *daddr, unsigned int hnum, int dif,
- struct udp_hslot *hslot2,
+ bool exact_dif, struct udp_hslot *hslot2,
struct sk_buff *skb)
{
struct sock *sk, *result;
@@ -176,7 +186,7 @@ static struct sock *udp6_lib_lookup2(struct net *net,
badness = -1;
udp_portaddr_for_each_entry_rcu(sk, &hslot2->head) {
score = compute_score(sk, net, saddr, sport,
- daddr, hnum, dif);
+ daddr, hnum, dif, exact_dif);
if (score > badness) {
reuseport = sk->sk_reuseport;
if (reuseport) {
@@ -212,6 +222,7 @@ struct sock *__udp6_lib_lookup(struct net *net,
unsigned short hnum = ntohs(dport);
unsigned int hash2, slot2, slot = udp_hashfn(net, hnum, udptable->mask);
struct udp_hslot *hslot2, *hslot = &udptable->hash[slot];
+ bool exact_dif = udp6_lib_exact_dif_match(net, skb);
int score, badness, matches = 0, reuseport = 0;
u32 hash = 0;

@@ -223,7 +234,7 @@ struct sock *__udp6_lib_lookup(struct net *net,
goto begin;

result = udp6_lib_lookup2(net, saddr, sport,
- daddr, hnum, dif,
+ daddr, hnum, dif, exact_dif,
hslot2, skb);
if (!result) {
unsigned int old_slot2 = slot2;
@@ -239,7 +250,8 @@ struct sock *__udp6_lib_lookup(struct net *net,

result = udp6_lib_lookup2(net, saddr, sport,
daddr, hnum, dif,
- hslot2, skb);
+ exact_dif, hslot2,
+ skb);
}
return result;
}
@@ -247,7 +259,8 @@ struct sock *__udp6_lib_lookup(struct net *net,
result = NULL;
badness = -1;
sk_for_each_rcu(sk, &hslot->head) {
- score = compute_score(sk, net, saddr, sport, daddr, hnum, dif);
+ score = compute_score(sk, net, saddr, sport, daddr, hnum, dif,
+ exact_dif);
if (score > badness) {
reuseport = sk->sk_reuseport;
if (reuseport) {
--
2.7.4

Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
Update debian abiname to 8-1
Update debian abiname to 8-2

From: Guohan Lu <gulv@microsoft.com>

Expand All @@ -14,7 +14,7 @@ index cfc4409..fae58f7 100644
@@ -1,5 +1,5 @@
[abi]
-abiname: 8
+abiname: 8-1
+abiname: 8-2
ignore-changes:
__cpuhp_*
bpf_analyzer
2 changes: 1 addition & 1 deletion patch/preconfig/series
Original file line number Diff line number Diff line change
@@ -1,2 +1,2 @@
changelog.patch
packaging-update-abiname-to-8-1.patch
packaging-update-abiname-to-8-2.patch
1 change: 1 addition & 0 deletions patch/series
Original file line number Diff line number Diff line change
Expand Up @@ -59,6 +59,7 @@ linux-4.13-thermal-intel_pch_thermal-Fix-enable-check-on.patch
0022-platform-x86-mlx-platform-backport-from-4.19.patch
0023-platform-x86-mlx-platform-Add-support-for-register-a.patch
0024-config-mellanox-fan-configuration.patch
0025-net-udp_l3mdev_accept-support.patch
#
# This series applies on GIT commit 1451b36b2b0d62178e42f648d8a18131af18f7d8
# Tkernel-sched-core-fix-cgroup-fork-race.patch
Expand Down

0 comments on commit 3b2114d

Please sign in to comment.