
LS1021ATSN: Cannot add a VLAN tag when using Mode 3 (best_effort_vlan_filtering) #90

Open
Leo1726 opened this issue Apr 5, 2021 · 10 comments


@Leo1726

Leo1726 commented Apr 5, 2021

Environment

LS1021ATSN
- Ubuntu 18.04
- OpenIL v1.9

Steps to reproduce

Topology: Node 1 (192.168.200.222) --> {swp4} Switch (192.168.200.23) {swp2} --> Node 2 (192.168.200.220)

Expected behavior:

  1. Node 1 sends a normal (untagged) Ethernet frame; the switch tags it with VID=1 and outputs it to Node 2.
  2. The switch itself sends a frame tagged with VID=1 and outputs it to Node 2.

Test Method:

Ping by IP address, then observe the frames with Wireshark v3.3.4.

VLAN Config

port vlan ids
swp2	 1 PVID Egress Untagged
swp4	 1 PVID
br0	 1 PVID Egress Untagged

Log

Actual Behavior


Node 1 pings Node 2:
$ ping 192.168.200.220


  • This achieves expected behavior 1.
  • Then, I set best_effort_vlan_filtering to true (Mode 3).
Switch pings Node 2:
$ ping 192.168.200.220


  • This achieves expected behavior 2.
  • However, if Node 1 pings Node 2, the ping works, but the frames carry no VLAN tag.
Node 1 pings Node 2:
$ ping 192.168.200.220

- Does this mean that expected behaviors 1 and 2 cannot coexist, i.e. that Mode 3 is not an extension of Mode 2?

@vladimiroltean
Contributor

vladimiroltean commented Apr 7, 2021

Can you please let me know if these 3 kernel patches solve your issue?
1:

From 2349286d74897403311c6d2cfb8e2ead6d85b364 Mon Sep 17 00:00:00 2001
From: Vladimir Oltean <vladimir.oltean@nxp.com>
Date: Wed, 7 Apr 2021 19:01:55 +0300
Subject: [PATCH 1/3] net: dsa: sja1105: use the bridge pvid in
 best_effort_vlan_filtering mode

The best-effort VLAN filtering mode is the sja1105 driver's attempt to
allow frame tagging towards the CPU with a unique VLAN ID corresponding
to the source port at the same time as allowing the bridge to freely
alter the VLAN table. It works by making the switch classify all untagged
ingress traffic to a secret pvid managed by net/dsa/tag_8021q.c.
Also, VLAN-tagged frames are retagged to another secret VLAN managed by
tag_8021q. Both these VLANs managed by tag_8021q are called "rx_vid".
The retagged rx_vid has some bits which encode a "sub-VLAN", and the
pvid-based rx_vid has those sub-VLAN bits set to zero. Software looks at
the rx_vid and knows what port and original VLAN the packet came from.

There is a huge oversight in the setup created by the sja1105 driver for
the best-effort VLAN filtering mode. That is:

ip link add br0 type bridge vlan_filtering 1
ip link set swp2 master br0
ip link set swp4 master br0
bridge vlan del dev swp4 vid 1
bridge vlan add dev swp4 vid 1 pvid

Then we send an untagged packet from swp2 and expect the switch to
forward it to swp4 and deliver it with VLAN 1 added. It is forwarded,
except the packet is egress-untagged.

This happens because of the way in which tag_8021q works (there is a
detailed picture in net/dsa/tag_8021q.c, above dsa_8021q_setup_port).
In the example above, the tag_8021q pvid of swp2 is 1026. This VLAN is
added to all other switch ports, to allow untagged traffic forwarding.
On all ports, the tag_8021q pvid of 1026 is installed as egress-untagged,
in order to hide the existence of DSA tag_8021q from the user.

This is fine, except when the real (bridge) pvid is egress-tagged, it
isn't. The user _wants_ to see this VLAN in the outside world, and we
can't really do that, because the sja1105 driver doesn't use that VLAN
but another one which the user knows nothing about.

As a side note, this only happens for untagged traffic on the ingress
port. If the packet arrives as pvid-tagged (i.e. tagged with VID 1)
on a port with tag_8021q, then the packet is classified to VLAN ID 1
(the bridge pvid) as opposed to the tag_8021q pvid. So we don't have
the same problem.

Consider the following more generic example:

Port             | sw0p0 sw0p1 sw0p2 sw0p3  |   sw1p0 sw1p1 sw1p2 sw1p3
=================+==========================+=============================
tag_8021q rx_vid | 1024  1025  1026  1027   |   1088  1089  1090  1091
Bridge VLAN      |  1     1     2     1     |    3     2     2     1
Bridge flags     | pvid        pvid         |   pvid  pvid
                 | untag untag              |

VLAN 1024 is added to sw0p1, sw0p2, sw0p3, sw1p0, sw1p1, sw1p2, sw1p3
as untagged.

The following pattern emerges:
A VLAN which is pvid on any port in the bridging domain (therefore has a
tag_8021q rx_vid) and is egress-tagged on another (potentially the same)
port will leak the tag_8021q VLAN. Every egress-tagged bridge VLAN that
is a pvid on another port must have a retagging rule from the tag_8021q
rx_vid to the bridge VLAN.

So the data would indicate that at the very least, we should retag the
tag_8021q pvid back towards the original bridge pvid on the egress ports
where this bridge VLAN is installed as egress-tagged. We could do that,
except:
- We only have 32 VLAN retagging entries in the sja1105, and we do use
  them for other purposes too.
- VLAN retagging works in hardware by making use of a special "loopback
  port" which is limited to only 1Gbps of bandwidth. When using the
  loopback port for traffic retagged towards the CPU that's fine because
  the CPU port is gigabit anyway, but when we start involving it in the
  autonomous forwarding data path we have a problem, because we'd
  bottleneck it.

So we take a step back and think a bit more about the problem.

Due to the need to plug another hole - pvid-tagged traffic is not seen
with a tag_8021q rx_vid by software, but with the bridge pvid, say 1 -
sja1105_build_subvlans() already creates VLAN retagging entries towards
the CPU even for the bridge pvid, not just for tagged VLANs.
That is to say, even if we let the bridge pvid be the hardware's pvid in
best-effort VLAN filtering mode, untagged and pvid-tagged packets will
still arrive at the CPU as tagged with the tag_8021q rx_vid, because
they will both hit the same retagging rule.

But that actually means we don't _need_ the tag_8021q module to dictate
a pvid value for us. We can rely on retagging just fine, and let the
bridge dictate the pvid. This solves the problem in a much cleaner way:
because the packets in the autonomous data path are now classified to
the bridge pvid, the egress-tagged setting on the egress port works just
fine.

[ note that this means we can always rely on VLAN retagging towards the
  CPU, and never on changing the port's pvid. And because the pvid is no
  longer managed by tag_8021q, we can even go as far as enable
  Independent VLAN Learning again. But I digress, that is an
  optimization to make for net-next, this is just to fix a bug ]

The commit I'm blaming is the one which introduced the problem, but the
fix relies on a mechanism that was only added a few commits later:
3f01c91aab92 ("net: dsa: sja1105: implement VLAN retagging for dsa_8021q
sub-VLANs"). This is fine, since they all went into the same kernel
release (v5.8).

Fixes: 2cafa72e516f ("net: dsa: sja1105: add a new best_effort_vlan_filtering devlink parameter")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 drivers/net/dsa/sja1105/sja1105_main.c | 21 ++++++++++++++++++---
 1 file changed, 18 insertions(+), 3 deletions(-)

diff --git a/drivers/net/dsa/sja1105/sja1105_main.c b/drivers/net/dsa/sja1105/sja1105_main.c
index d9c198ca0197..8b380ccd95cf 100644
--- a/drivers/net/dsa/sja1105/sja1105_main.c
+++ b/drivers/net/dsa/sja1105/sja1105_main.c
@@ -2240,10 +2240,10 @@ static int sja1105_commit_pvid(struct sja1105_private *priv)
 	struct list_head *vlan_list;
 	int rc = 0;
 
-	if (priv->vlan_state == SJA1105_VLAN_FILTERING_FULL)
-		vlan_list = &priv->bridge_vlans;
-	else
+	if (priv->vlan_state == SJA1105_VLAN_UNAWARE)
 		vlan_list = &priv->dsa_8021q_vlans;
+	else
+		vlan_list = &priv->bridge_vlans;
 
 	list_for_each_entry(v, vlan_list, list) {
 		if (v->pvid) {
@@ -2290,6 +2290,21 @@ sja1105_build_dsa_8021q_vlans(struct sja1105_private *priv,
 	list_for_each_entry(v, &priv->dsa_8021q_vlans, list) {
 		int match = v->vid;
 
+		/* In best-effort VLAN filtering mode, the pvid of the port is
+		 * no longer the tag_8021q rx_vid, but the bridge pvid is.
+		 * The tag_8021q rx_vid is just used for retagging the bridge
+		 * pvid towards the CPU. So let's install only the rx_vid
+		 * values which are strictly required. This means that the
+		 * rxvlan is still installed on the port on which tag_8021q
+		 * thinks it must be pvid (the source port) - this is required
+		 * by the retagging table - but not on the ports where this
+		 * VLAN isn't a pvid (the destination ports).
+		 */
+		if (priv->vlan_state == SJA1105_VLAN_BEST_EFFORT &&
+		    vid_is_dsa_8021q_rxvlan(v->vid) &&
+		    dsa_8021q_rx_subvlan(v->vid) == 0 && !v->pvid)
+			continue;
+
 		new_vlan[match].vlanid = v->vid;
 		new_vlan[match].vmemb_port |= BIT(v->port);
 		new_vlan[match].vlan_bc |= BIT(v->port);
-- 
2.25.1
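The filtering rule that patch 1/3 adds to sja1105_build_dsa_8021q_vlans() can be modeled in userspace C to see which tag_8021q entries it keeps. This is a sketch, not driver code: the VID bit layout is assumed from the comments in net/dsa/tag_8021q.c of this kernel era, and the enum and function names are hypothetical stand-ins for the driver's SJA1105_VLAN_* states and tag_8021q helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the driver's VLAN awareness states */
enum vlan_state { VLAN_UNAWARE, VLAN_BEST_EFFORT, VLAN_FILTERING_FULL };

/* Assumed layout: DIR in VID[11:10], with 01 meaning an RX VLAN */
static bool is_rx_vid(uint16_t vid)
{
	return ((vid >> 10) & 0x3u) == 1;
}

/* Assumed layout: SUBVLAN is { VID[9], VID[5:4] } */
static unsigned int rx_subvlan(uint16_t vid)
{
	unsigned int hi = (vid >> 9) & 0x1u;
	unsigned int lo = (vid >> 4) & 0x3u;

	return (hi << 2) | lo;
}

/* Returns true if a tag_8021q VLAN entry should be left out of the
 * hardware VLAN table: in best-effort mode, the plain (sub-VLAN 0)
 * rx_vid is only kept on the port where tag_8021q considers it pvid. */
static bool skip_8021q_vlan(enum vlan_state state, uint16_t vid, bool pvid)
{
	return state == VLAN_BEST_EFFORT && is_rx_vid(vid) &&
	       rx_subvlan(vid) == 0 && !pvid;
}
```

With swp2's rx_vid 1026, the entry is kept on swp2, where tag_8021q marks it as pvid, and skipped on the other ports; sub-VLAN entries such as 1042 are never skipped, since the retagging table still needs them.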

2:

From ecd4451d0d3e8b46951224dec0e3521a4fbafbe4 Mon Sep 17 00:00:00 2001
From: Vladimir Oltean <vladimir.oltean@nxp.com>
Date: Wed, 7 Apr 2021 20:22:24 +0300
Subject: [PATCH 2/3] net: dsa: sja1105: use 4095 as the private VLAN for
 untagged traffic

One thing became visible when writing the blamed commit, and that was
that STP and PTP frames injected by net/dsa/tag_sja1105.c using the
deferred xmit mechanism are always classified to the pvid of the CPU
port, regardless of whatever VLAN there might be in these packets.

So a decision needed to be taken regarding the mechanism through which
we should ensure that delivery of STP and PTP traffic is possible when
we are in a VLAN awareness mode that involves tag_8021q. This is because
tag_8021q is not concerned with managing the pvid of the CPU port, since
as far as tag_8021q is concerned, no traffic should be sent as untagged
from the CPU port. So we end up not actually having a pvid on the CPU
port if we only listen to tag_8021q, and unless we do something about it.

The decision taken at the time was to keep VLAN 1 in the list of
priv->dsa_8021q_vlans, and make it a pvid of the CPU port. This ensures
that STP and PTP frames can always be sent to the outside world.

However there is a problem. If we do the following while we are in
the best_effort_vlan_filtering=true mode:

ip link add br0 type bridge vlan_filtering 1
ip link set swp2 master br0
bridge vlan del dev swp2 vid 1

Then untagged and pvid-tagged frames should be dropped. But we observe
that they aren't, and this is because of the precaution we took that VID
1 is always installed on all ports.

So clearly VLAN 1 is not good for this purpose. What about VLAN 0?
Well, VLAN 0 is managed by the 8021q module, and that module wants to
ensure that 802.1p tagged frames are always received by a port, and are
always transmitted as VLAN-tagged (with VLAN ID 0). Whereas we want our
STP and PTP frames to be untagged if the stack sent them as untagged -
we don't want the driver to just decide out of the blue that it adds
VID 0 to some packets.

So what to do?

Well, there is one other VLAN that is reserved, and that is 4095:
$ ip link add link swp2 name swp2.4095 type vlan id 4095
Error: 8021q: Invalid VLAN id.
$ bridge vlan add dev swp2 vid 4095
Error: bridge: Vlan id is invalid.

After we made this change, VLAN 1 is indeed forwarded and/or dropped
according to the bridge VLAN table, there are no further alterations
done by the sja1105 driver.

Fixes: ec5ae61076d0 ("net: dsa: sja1105: save/restore VLANs using a delta commit method")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 drivers/net/dsa/sja1105/sja1105.h      |  1 +
 drivers/net/dsa/sja1105/sja1105_main.c | 21 +++++++++------------
 2 files changed, 10 insertions(+), 12 deletions(-)

diff --git a/drivers/net/dsa/sja1105/sja1105.h b/drivers/net/dsa/sja1105/sja1105.h
index f9e87fb33da0..6957cb853a70 100644
--- a/drivers/net/dsa/sja1105/sja1105.h
+++ b/drivers/net/dsa/sja1105/sja1105.h
@@ -13,6 +13,7 @@
 #include <linux/mutex.h>
 #include "sja1105_static_config.h"
 
+#define SJA1105_DEFAULT_VLAN		(VLAN_N_VID - 1)
 #define SJA1105_NUM_PORTS		5
 #define SJA1105_NUM_TC			8
 #define SJA1105ET_FDB_BIN_SIZE		4
diff --git a/drivers/net/dsa/sja1105/sja1105_main.c b/drivers/net/dsa/sja1105/sja1105_main.c
index 8b380ccd95cf..61133098f588 100644
--- a/drivers/net/dsa/sja1105/sja1105_main.c
+++ b/drivers/net/dsa/sja1105/sja1105_main.c
@@ -321,6 +321,13 @@ static int sja1105_init_l2_lookup_params(struct sja1105_private *priv)
 	return 0;
 }
 
+/* Set up a default VLAN for untagged traffic injected from the CPU
+ * using management routes (e.g. STP, PTP) as opposed to tag_8021q.
+ * All DT-defined ports are members of this VLAN, and there are no
+ * restrictions on forwarding (since the CPU selects the destination).
+ * Frames from this VLAN will always be transmitted as untagged, and
+ * neither the bridge nor the 8021q module can create this VLAN ID.
+ */
 static int sja1105_init_static_vlan(struct sja1105_private *priv)
 {
 	struct sja1105_table *table;
@@ -330,17 +337,13 @@ static int sja1105_init_static_vlan(struct sja1105_private *priv)
 		.vmemb_port = 0,
 		.vlan_bc = 0,
 		.tag_port = 0,
-		.vlanid = 1,
+		.vlanid = SJA1105_DEFAULT_VLAN,
 	};
 	struct dsa_switch *ds = priv->ds;
 	int port;
 
 	table = &priv->static_config.tables[BLK_IDX_VLAN_LOOKUP];
 
-	/* The static VLAN table will only contain the initial pvid of 1.
-	 * All other VLANs are to be configured through dynamic entries,
-	 * and kept in the static configuration table as backing memory.
-	 */
 	if (table->entry_count) {
 		kfree(table->entries);
 		table->entry_count = 0;
@@ -353,9 +356,6 @@ static int sja1105_init_static_vlan(struct sja1105_private *priv)
 
 	table->entry_count = 1;
 
-	/* VLAN 1: all DT-defined ports are members; no restrictions on
-	 * forwarding; always transmit as untagged.
-	 */
 	for (port = 0; port < ds->num_ports; port++) {
 		struct sja1105_bridge_vlan *v;
 
@@ -366,15 +366,12 @@ static int sja1105_init_static_vlan(struct sja1105_private *priv)
 		pvid.vlan_bc |= BIT(port);
 		pvid.tag_port &= ~BIT(port);
 
-		/* Let traffic that don't need dsa_8021q (e.g. STP, PTP) be
-		 * transmitted as untagged.
-		 */
 		v = kzalloc(sizeof(*v), GFP_KERNEL);
 		if (!v)
 			return -ENOMEM;
 
 		v->port = port;
-		v->vid = 1;
+		v->vid = SJA1105_DEFAULT_VLAN;
 		v->untagged = true;
 		if (dsa_is_cpu_port(ds, port))
 			v->pvid = true;
-- 
2.25.1
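Patch 2/3 relies on VID 4095 being out of reach from userspace. The two range checks below are modeled on the bridge and 8021q checks behind the error messages quoted in the commit message; they are a sketch, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Values as defined in <linux/if_vlan.h> */
#define VLAN_VID_MASK		0x0fff
#define VLAN_N_VID		4096

/* From patch 2/3 */
#define SJA1105_DEFAULT_VLAN	(VLAN_N_VID - 1)

/* Modeled on the bridge check ("Vlan id is invalid"): 1..4094 */
static bool bridge_vid_valid(unsigned int vid)
{
	return vid > 0 && vid < VLAN_VID_MASK;
}

/* Modeled on the 8021q netlink check ("Invalid VLAN id"): 0..4094.
 * VID 0 is accepted (8021q uses it for 802.1p frames), so 4095 is
 * the only ID that neither subsystem can create. */
static bool vlan_8021q_vid_valid(unsigned int vid)
{
	return vid < VLAN_VID_MASK;
}
```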

3:

From 732fcff3ab3ecf9d473d0ade082fd9d373cf392a Mon Sep 17 00:00:00 2001
From: Vladimir Oltean <vladimir.oltean@nxp.com>
Date: Wed, 7 Apr 2021 20:50:56 +0300
Subject: [PATCH 3/3] net: dsa: sja1105: update existing VLANs from the bridge
 VLAN list

When running this sequence of operations:

ip link add br0 type bridge vlan_filtering 1
ip link set swp4 master br0
bridge vlan add dev swp4 vid 1

We observe the traffic sent on swp4 is still untagged, even though the
bridge has overwritten the existing VLAN entry:

port    vlan ids
swp4     1 PVID

br0      1 PVID Egress Untagged

This happens because we didn't consider that the 'bridge vlan add'
command just overwrites VLANs like it's nothing. We treat the 'vid 1
pvid untagged' and the 'vid 1' as two separate VLANs, and the first
still has precedence when calling sja1105_build_vlan_table. Obviously
there is a disagreement regarding semantics, and we end up doing
something unexpected from the PoV of the bridge.

Let's actually consider an "existing VLAN" to be one which is on the
same port, and has the same VLAN ID, as one we already have, and update
it if it has different flags than we do.

The first blamed commit is the one introducing the bug, the second one
is the latest on top of which the bugfix still applies.

Fixes: ec5ae61076d0 ("net: dsa: sja1105: save/restore VLANs using a delta commit method")
Fixes: 5899ee367ab3 ("net: dsa: tag_8021q: add a context structure")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 drivers/net/dsa/sja1105/sja1105_main.c | 19 +++++++++++++++----
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/drivers/net/dsa/sja1105/sja1105_main.c b/drivers/net/dsa/sja1105/sja1105_main.c
index 61133098f588..5e40ee14030a 100644
--- a/drivers/net/dsa/sja1105/sja1105_main.c
+++ b/drivers/net/dsa/sja1105/sja1105_main.c
@@ -2829,11 +2829,22 @@ static int sja1105_vlan_add_one(struct dsa_switch *ds, int port, u16 vid,
 	bool pvid = flags & BRIDGE_VLAN_INFO_PVID;
 	struct sja1105_bridge_vlan *v;
 
-	list_for_each_entry(v, vlan_list, list)
-		if (v->port == port && v->vid == vid &&
-		    v->untagged == untagged && v->pvid == pvid)
+	list_for_each_entry(v, vlan_list, list) {
+		if (v->port == port && v->vid == vid) {
 			/* Already added */
-			return 0;
+			if (v->untagged == untagged && v->pvid == pvid)
+				/* Nothing changed */
+				return 0;
+
+			/* It's the same VLAN, but some of the flags changed
+			 * and the user did not bother to delete it first.
+			 * Update it and trigger sja1105_build_vlan_table.
+			 */
+			v->untagged = untagged;
+			v->pvid = pvid;
+			return 1;
+		}
+	}
 
 	v = kzalloc(sizeof(*v), GFP_KERNEL);
 	if (!v) {
-- 
2.25.1
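Patch 3/3 changes what counts as an existing VLAN: the (port, VID) pair identifies the entry, and re-adding it with different flags updates it in place. The sketch below models that rule over a fixed-size array instead of the driver's linked list. The structure and function names are hypothetical, and the return convention (1 meaning the VLAN table must be rebuilt) is an assumption based on the comments in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical fixed-size stand-in for priv->bridge_vlans */
struct bridge_vlan {
	int port;
	unsigned int vid;
	bool untagged;
	bool pvid;
};

#define MAX_VLANS 32

struct vlan_list {
	struct bridge_vlan v[MAX_VLANS];
	size_t count;
};

/* Returns 0 if nothing changed, 1 if the table changed (entry added or
 * flags updated in place), -1 if the list is full. */
static int vlan_add_one(struct vlan_list *l, int port, unsigned int vid,
			bool untagged, bool pvid)
{
	assert(l != NULL);

	for (size_t i = 0; i < l->count; i++) {
		struct bridge_vlan *v = &l->v[i];

		if (v->port != port || v->vid != vid)
			continue;
		if (v->untagged == untagged && v->pvid == pvid)
			return 0;	/* already added, nothing changed */
		/* Same VLAN, new flags: update instead of adding a duplicate */
		v->untagged = untagged;
		v->pvid = pvid;
		return 1;
	}

	if (l->count == MAX_VLANS)
		return -1;
	l->v[l->count++] = (struct bridge_vlan){ port, vid, untagged, pvid };
	return 1;
}
```

Re-adding VLAN 1 on port 4 with the same flags returns 0; re-adding it with different flags returns 1 and leaves a single entry carrying the new flags, rather than a stale "pvid untagged" entry that takes precedence.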

The third patch may not apply cleanly to the OpenIL kernel - it isn't critical though.

Thanks.

@Leo1726
Author

Leo1726 commented Apr 8, 2021

I will try it, thanks so much. By the way, I ran into a very similar situation on the LS1028ATSN. How could I solve that, given that the LS1028ATSN combines modes 2 and 3?

@vladimiroltean
Contributor

What do you mean exactly by "combined mode 2 and 3 together"? You mean that one switch operates in mode 2 and another in mode 3?

@Leo1726
Author

Leo1726 commented Apr 8, 2021

Sorry for the misunderstanding. I meant that I wanted to use the LS1028ATSN to achieve my expected behavior:

  1. Node 1 sends a normal (untagged) Ethernet frame; the switch tags it with VID=1 and outputs it to Node 2.
  2. The switch itself sends a frame tagged with VID=1 and outputs it to Node 2.

I found a very similar problem to the one on the LS1021ATSN: I can only achieve expected behavior 1 or 2, not both.

Thanks!

@vladimiroltean
Contributor

Understood. There is nothing specific to the board with this issue. I can reproduce it on the LS1028A-TSN too.
Please note that while my patches fix the issue you have reported, they introduce another issue with PTP RX timestamping in best-effort VLAN filtering mode. I am currently investigating it. Sorry for the trouble.

@Leo1726
Author

Leo1726 commented Apr 8, 2021

So, how can I apply these three patches on the LS1028A-TSN, since there is no sja1105 driver on it? I couldn't find the files to change.

@vladimiroltean
Contributor

The kernel is located here: https://github.com/openil/linux
And the driver for the switch is at its usual location: drivers/net/dsa/sja1105/
To add the patches to the kernel you can either:
(a) add the patches to the OpenIL/Buildroot build system: https://buildroot.org/downloads/manual/manual.html
(b) compile the kernel using a standalone toolchain:

export ARCH=arm64
make defconfig lsdk.config
make -j 8 Image.gz dtbs 2>&1 | tee build.log && mkimage -A arm64 -O linux -T kernel -C gzip -a 0x80080000 -e 0x80080000 -d arch/arm64/boot/Image.gz uImage && cp uImage arch/arm64/boot/dts/freescale/*ls1028a*.dtb /srv/tftpboot/ls1028/

@vladimiroltean
Contributor

vladimiroltean commented Apr 19, 2021

Hi,

I don't think there is a way to solve the problem without limitations. Let me
try to explain what the options are.

DSA models switches as port multiplexers with hardware-offloaded forwarding, so
packets sent to/from the network stack towards the switch need to be presented
separately for each front-facing switch port. This is done because DSA switch
network interfaces are first and foremost regular network devices, and must be
able to support any higher-level protocol (for example, IP forwarding in
software).

The SJA1105 switch is somewhat special in that it does not offer any indication
about the switch port on which a packet came when it forwards that packet to
the host port. Link-local traffic (STP, PTP) is an exception because the
respective protocols explicitly require targeting a physical port as
opposed to a bridging domain, so the SJA1105 switch offers a native way to
send/receive these kinds of packets to/from a specific physical port.

To work around the hardware limitation and offer per-port general purpose
traffic termination, the DSA driver for SJA1105 makes use of
net/dsa/tag_8021q.c and reserves two VLAN ranges:

  • 1024-2047 for "RX VLANs". The packets received on a switch port will be
    tagged with a VLAN ID derived from the number of the switch port, and
    software will decode this VLAN ID and replace it with the real value.
    For autonomous forwarding, the RX VLAN of a port is added to the membership
    list of all the other ports too, and this VLAN is made egress-untagged so
    that it is popped and clients attached to the switch do not see it.
  • 2048-3071 for "TX VLANs". Every TX VLAN contains just the CPU port and one of
    the front-facing ports, and is used for precise steering of packets from the
    network stack towards the corresponding physical ports.
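The RX/TX VLAN numbering can be made concrete with a small userspace C sketch. The bit layout is an assumption taken from the comments in net/dsa/tag_8021q.c in kernels of this period; it reproduces the numbers used in this thread (1026 for port 2, 1088 for switch 1 port 0, 1042 for sub-VLAN 1 on port 2). The helper names are hypothetical, not the kernel's API.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed tag_8021q VID layout (verify against your kernel tree):
 *   VID[11:10] DIR        01 = RX VLAN, 10 = TX VLAN
 *   VID[9]     SUBVLAN bit 2
 *   VID[8:6]   SWITCH_ID
 *   VID[5:4]   SUBVLAN bits 1:0
 *   VID[3:0]   PORT
 */
#define DIR_RX 1u
#define DIR_TX 2u

static uint16_t make_vid(unsigned int dir, unsigned int switch_id,
			 unsigned int port, unsigned int subvlan)
{
	assert(switch_id < 8 && port < 16 && subvlan < 8);
	return (uint16_t)((dir << 10) | ((subvlan >> 2) << 9) |
			  (switch_id << 6) | ((subvlan & 0x3u) << 4) | port);
}

/* RX VLAN: what the CPU sees for traffic received on @port */
static uint16_t rx_vid(unsigned int switch_id, unsigned int port,
		       unsigned int subvlan)
{
	return make_vid(DIR_RX, switch_id, port, subvlan);
}

/* TX VLAN: used by the CPU to steer a packet to exactly one @port */
static uint16_t tx_vid(unsigned int switch_id, unsigned int port)
{
	return make_vid(DIR_TX, switch_id, port, 0);
}

/* Decoding, as software does on receive */
static unsigned int vid_port(uint16_t vid)
{
	return vid & 0xfu;
}

static unsigned int vid_switch_id(uint16_t vid)
{
	return (vid >> 6) & 0x7u;
}

static unsigned int vid_subvlan(uint16_t vid)
{
	return (((vid >> 9) & 0x1u) << 2) | ((vid >> 4) & 0x3u);
}
```

Every VID whose DIR bits are 01 falls in 1024-2047 and every VID whose DIR bits are 10 falls in 2048-3071, which is where the two reserved ranges come from.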

When the switch ports are VLAN-unaware (either bridged or standalone), all
packets received on a port are classified to that port's RX VLAN from
tag_8021q (for example 1024), regardless of the VLAN ID from the packet - that
is to say, 1024 is the pvid. So 2 stations having MAC addresses
00:01:02:03:04:05 (connected to swp0) and 00:01:02:03:04:06 (connected to swp1)
will be learnt by the switch in VLAN 1024 and 1025, respectively. To ensure that
packets are not forwarded between ports by flooding (i.e. that the destinations
are not treated as unknown by the FDB), we enable Shared VLAN Learning, which
makes the switch look only at the destination MAC address for the forwarding
decision lookup, and
ignore the VLAN ID. So with this setting, it works because VLAN 1024 and 1025
span the same set of ports.
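Why Shared VLAN Learning is needed can be shown with a toy FDB. The sketch contrasts a lookup keyed on MAC address only (SVL) with one keyed on MAC address and VID (Independent VLAN Learning). It is a simplified illustration with hypothetical names, not how the SJA1105 L2 lookup table is actually organized.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

struct fdb_entry {
	uint8_t mac[6];
	uint16_t vid;
	int port;
};

#define FDB_SIZE 16
static struct fdb_entry fdb[FDB_SIZE];
static int fdb_count;

static void fdb_learn(const uint8_t mac[6], uint16_t vid, int port)
{
	assert(fdb_count < FDB_SIZE);
	memcpy(fdb[fdb_count].mac, mac, 6);
	fdb[fdb_count].vid = vid;
	fdb[fdb_count].port = port;
	fdb_count++;
}

/* Returns the destination port, or -1 for "unknown, flood".
 * @shared selects SVL (ignore VID) instead of IVL (match VID). */
static int fdb_lookup(const uint8_t mac[6], uint16_t vid, bool shared)
{
	for (int i = 0; i < fdb_count; i++) {
		if (memcmp(fdb[i].mac, mac, 6) != 0)
			continue;
		if (shared || fdb[i].vid == vid)
			return fdb[i].port;
	}
	return -1;
}
```

With the station behind swp0 learnt in VLAN 1024, a frame from swp1 (classified to VLAN 1025) finds it only under SVL; under IVL the destination is unknown and the frame would be flooded.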
For ports under a VLAN-aware bridge, there is no simple way to deduce which
port a packet came from, because the bridge layer controls the VLANs and their
flags (pvid, untagged). And imposing restrictions such as "you cannot have the
same VLAN installed on two ports" would be unproductive, because it would not
allow forwarding of any packet except to the CPU and back. So by default
nothing is done: DSA does not attempt to deduce the source port on RX, and this
is why general purpose RX/TX through the swpN net devices is unavailable when
they operate in this mode. PTP and STP are still available because they rely on
packet traps and not on VLANs.

The third mode (best-effort VLAN filtering) is an attempt to combine modes one
and two. We allow the bridge to add VLANs to the SJA1105 switch as long as they
do not overlap with the 1024-3071 range used by tag_8021q. And for the most
part, they indeed do not overlap, with one exception: the pvid (more on this
later).

For every VLAN added by the user (either on the bridge or through 8021q
upper interfaces), the code for best-effort VLAN filtering creates a mapping in
the VLAN Retagging table, such that when a packet with VLAN 100 is flooded, the
copy of it which reaches the CPU port will be received with a VLAN ID managed
by tag_8021q. Software can again look up this mapping table based on the
tag_8021q VLAN and identify the source port, switch ID and original VLAN ID,
and then send the packet up the stack for further processing. This part works
as expected. What doesn't is the fact that the bridge is led into having an
incorrect idea of what is the pvid of each physical port. When you set the pvid
on the first switch port to 1, in hardware, the pvid is still that of the
tag_8021q module (1024) and not that of the bridge (1). Then, when untagged
packets are forwarded to a second bridge port where VLAN 1 is egress-tagged,
the expectation is that untagged packets will have the VLAN tag pushed on
egress. But the reality is that they don't, because, even though VLAN 1 is
committed to hardware with the proper port membership and flags, ingress
untagged packets are not classified to it, so the egress tagging settings have
no effect. If you were to configure tag_8021q to set VLAN ID 1024 as
egress-tagged on the second port, you would see VLAN 1024 pushed onto untagged
packets received on the first port. But this isn't the behavior you would like
to see.
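The pvid problem described above can be reproduced with a minimal model of ingress classification and egress tagging. This sketch assumes a single VLAN table with per-port egress-untagged flags (port membership is not modeled) and reuses the numbers from patch 1/3: swp2 with tag_8021q pvid 1026, and swp4 with VLAN 1 egress-tagged. The array and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_PORTS	5
#define NUM_VIDS	4096

/* egress_untagged[vid][port]: VLAN @vid is egress-untagged on @port */
static bool egress_untagged[NUM_VIDS][NUM_PORTS];
/* pvid[port]: VLAN that untagged frames are classified to on ingress */
static unsigned int pvid[NUM_PORTS];

/* Returns the VID that a frame carries when it leaves @out_port,
 * or 0 if it leaves untagged. @in_vid == 0 means the frame arrived
 * untagged (priority tags are ignored in this model). */
static unsigned int forward(int in_port, unsigned int in_vid, int out_port)
{
	unsigned int vid;

	assert(in_port >= 0 && in_port < NUM_PORTS);
	assert(out_port >= 0 && out_port < NUM_PORTS);
	assert(in_vid < NUM_VIDS);

	/* Ingress classification: untagged frames get the port's pvid */
	vid = in_vid ? in_vid : pvid[in_port];

	/* Egress: pop the tag if the VLAN is egress-untagged on this port */
	return egress_untagged[vid][out_port] ? 0 : vid;
}
```

With pvid[2] = 1026, an untagged frame leaves swp4 untagged, because it was classified to the egress-untagged tag_8021q VLAN; with pvid[2] = 1 it leaves tagged with VID 1. Frames that arrive already tagged with VID 1 are unaffected either way.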
The solution in the patches I sent earlier was to change which pvid is
committed to hardware in mode 3. Before, it was the pvid from tag_8021q; with
the changes, it is the pvid from the bridge layer. When the pvid committed
to hardware is 1, your use case works as expected.

However, what breaks is PTP, and this has to do with the way in which VLAN
retagging works on SJA1105.

Despite its name, the "VLAN retagging" table doesn't quite do VLAN retagging,
but more like 'cloning packets in a new VLAN'. Otherwise said, the packets
which hit VLAN retagging rules are sent through an internal loopback port
running at 1Gbps, and this port generates new packets, but also does not drop
the original ones. The best workaround I've found so far for obtaining the real
'VLAN retagging' effect was to force the dropping of the original,
pre-retagging packets by removing the egress port from the broadcast domain of
the original VLAN. For example, if there is a VLAN retagging rule for ingress
VLAN 1 on port 0 towards egress VLAN 1024 on port 4, you need to remove port 4
from the broadcast domain of VLAN 1 in order to suppress the original packet
from being sent as a duplicate on port 4.
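The "cloning" behavior of the retagging table, and the workaround of removing the egress port from the original VLAN's broadcast domain, can be modeled by counting the copies of one flooded frame per egress port. This is a toy model built only on what the text states (the rule itself never drops the original frame); the structure and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

struct retag_rule {
	int ingress_port;
	unsigned int ingress_vid;
	unsigned int egress_vid;
	unsigned int egress_ports;	/* bitmask of ports receiving the clone */
};

/* Counts how many copies of one frame received on @in_port in VLAN @vid
 * reach @port, given the broadcast domain @vlan_bc (bitmask) of @vid. */
static int copies_on_port(const struct retag_rule *r, int in_port,
			  unsigned int vid, unsigned int vlan_bc, int port)
{
	int copies = 0;

	assert(port >= 0 && port < 32);

	/* The original frame is still flooded within its own VLAN */
	if ((vlan_bc & (1u << port)) && port != in_port)
		copies++;

	/* The loopback port emits an extra, retagged clone */
	if (r && r->ingress_port == in_port && r->ingress_vid == vid &&
	    (r->egress_ports & (1u << port)))
		copies++;

	return copies;
}
```

With the rule from the example (ingress VLAN 1 on port 0, egress VLAN 1024 on port 4), port 4 receives two copies while it is in VLAN 1's broadcast domain, and only the retagged one after it is removed.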
PTP and STP packets are untagged, so they are classified to the port's pvid.
Before the changes I posted here, these packets were classified to the
tag_8021q pvid (1024 etc), and after my changes they get classified to the
bridge pvid (1). But for the bridge pvid there is a VLAN retagging rule towards
the CPU, and PTP/STP packets hit it too (even though there is a host trapping
entry, that does not prevent them from getting retagged).

When PTP packets are retagged and the original packets are suppressed, the RX
timestamp associated with that original packet is lost too. For the retagged
packet, the switch does not take another timestamp corresponding to the moment
when it went through the loopback port, and does not transfer the PTP timestamp
from the original packet either. It does nothing, so the PTP RX timestamp that
the CPU sees is zero. Needless to say, PTP cannot work if this happens.

I am still investigating this, but it doesn't look like it is possible to tell
the switch to bypass VLAN retagging for PTP, such as by classifying
host-trapped packets to a VLAN ID of their own. So there is no solution which
addresses this directly.

To keep PTP working on swp0 while at the same time we have DSA see untagged
traffic as having VLAN ID 1024 on the CPU port, I only see two possibilities:

  1. Keep swp0's pvid as 1 (the bridge pvid), and perform retagging of VLAN 1 on
    swp0 towards port 4 (the host port) and VLAN 1024, but do not drop the
    original packets, instead keep both, and drop what you don't need in
    software. This limits the maximum useful bandwidth of the CPU port to 500
    Mbps.
  2. Keep swp0's pvid as 1024 (the tag_8021q pvid) and perform retagging on the
    hardware offload data path, i.e. between VLAN 1024 on swp0, and VLAN 1 on
    swp1 if VLAN 1 is configured as an egress-tagged VLAN on that port. The
    disadvantage here is that we are creating a bottleneck for all forwarded
    traffic which needs to be retagged, since the sum of all retagged traffic
    cannot exceed 1Gbps due to the implementation using a loopback port. This is
    not a problem at the moment because the CPU port is the only destination for
    retagged VLANs, and it is 1Gbps anyway.
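The bandwidth limits of the two options reduce to simple arithmetic. This sketch assumes only what is stated above (a 1 Gbps CPU port and a 1 Gbps internal loopback port); the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CPU_PORT_MBPS		1000u
#define LOOPBACK_PORT_MBPS	1000u

/* Option 1: every frame for the CPU arrives twice (original plus the
 * retagged clone), so only half of the CPU port carries useful data. */
static unsigned int option1_useful_cpu_mbps(void)
{
	return CPU_PORT_MBPS / 2;
}

/* Option 2: all autonomously forwarded traffic that needs retagging
 * shares the loopback port. Returns true if the offered load fits. */
static bool option2_fits(const unsigned int *retagged_mbps, size_t n)
{
	unsigned long total = 0;

	assert(retagged_mbps != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		total += retagged_mbps[i];
	return total <= LOOPBACK_PORT_MBPS;
}
```

For example, two retagged flows of 600 Mbps and 300 Mbps fit through the loopback port, but two flows of 600 Mbps each do not.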

Additionally, there is one more possibility, which is to delete everything,
take a step back and start from scratch. We recognize that the limitation is
that when a port is under a VLAN-aware bridge, we cannot deduce the exact
source port that packets came from. But, overall, we still know it came from
the bridge. Standalone ports can still use tag_8021q, so when the DSA code
receives a packet tagged with a VLAN between 1024-2047, it can assign skb->dev
to the correct port. But when it receives a VLAN ID out of this range (between
1 and 1023, or between 3072 and 4094) we could modify DSA to support an
"imprecise RX" procedure: we just set skb->dev to br0 directly. Actually we
can't do quite that; we still need to set skb->dev to a switch port, but the
point is that we can statically predetermine, based on VLAN ID and STP state,
one switch port that could act as 'the' conduit port towards the bridge for
imprecise RX. As long as the ingress VLAN ID is present in the bridge's VLAN
table, and the ingress port is in the LEARNING or FORWARDING STP state, it
really doesn't matter if the source port is correct or not, as far as the
bridge is concerned. This is because we set skb->offload_fwd_mark = 1, which is
an indication to the bridge that the packet was already forwarded in hardware
towards all the ports in the same forwarding domain as the one through which
this packet was received. So if the designated port for imprecise RX is swp0,
the bridge does not need to, say, flood the packet itself towards swp1, swp2
etc. So the designated port could just as well have been swp1 or swp2. On the
other hand, if the sja1105 is bridged with a foreign interface such as the
eTSEC ports eth0 or eth1, this still works, because the sja1105 ports have a
different bridge port offload_fwd_mark than the gianfar ports.
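The designated-port selection for imprecise RX could look roughly like the following. This is a hypothetical sketch of the idea, not code from the sja1105 driver or DSA: it computes the choice per packet for clarity, whereas the text suggests predetermining it statically from the VLAN table and STP states, and it treats the whole 1024-3071 range as owned by tag_8021q.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum stp_state {
	STP_DISABLED, STP_LISTENING, STP_LEARNING, STP_FORWARDING, STP_BLOCKING
};

#define NUM_PORTS	4
#define NUM_VIDS	4096

struct port {
	bool in_vlan_aware_bridge;
	enum stp_state stp;
	bool vlan_member[NUM_VIDS];	/* bridge VLAN table of this port */
};

struct rx_decision {
	int dev_port;		/* port to use as skb->dev, -1 if none */
	bool offload_fwd_mark;	/* already forwarded in hardware */
};

/* VIDs reserved by tag_8021q (RX and TX ranges) */
static bool is_tag_8021q_vid(uint16_t vid)
{
	return vid >= 1024 && vid <= 3071;
}

static struct rx_decision imprecise_rx(const struct port ports[NUM_PORTS],
				       uint16_t vid)
{
	struct rx_decision d = { .dev_port = -1, .offload_fwd_mark = false };

	assert(vid < NUM_VIDS);
	if (is_tag_8021q_vid(vid))
		return d;	/* precise path: the port is encoded in the VID */

	for (int p = 0; p < NUM_PORTS; p++) {
		const struct port *dp = &ports[p];

		if (!dp->in_vlan_aware_bridge || !dp->vlan_member[vid])
			continue;
		if (dp->stp != STP_LEARNING && dp->stp != STP_FORWARDING)
			continue;
		/* Any eligible port will do: the bridge will not flood
		 * again a frame that carries offload_fwd_mark. */
		d.dev_port = p;
		d.offload_fwd_mark = true;
		break;
	}
	return d;
}
```

The first bridge port that is a member of the VLAN and in LEARNING or FORWARDING state becomes the conduit; a blocked port is skipped, and a VID with no eligible member yields no conduit at all.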
And TX is still precise, because we still rely on the tag_8021q TX VLANs for
transmission. If we need to transmit a VLAN-tagged packet, we simply
encapsulate it in a tag_8021q VLAN outer header for DSA, we tell the switch to
strip this outer header, and this works just fine.
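The precise TX path amounts to pushing an outer 802.1Q header carrying the port's tag_8021q TX VID in front of whatever the stack sent, including an existing VLAN tag. The sketch below does this on a plain byte buffer; the kernel tagger operates on sk_buffs, and the function name and PCP parameter are illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6

/* Inserts TPID 0x8100 plus a TCI right after the destination and
 * source MAC addresses. Returns the new frame length, or 0 if the
 * frame is too short or the buffer has no room for 4 more bytes. */
static size_t push_vlan_tag(uint8_t *frame, size_t len, size_t cap,
			    uint16_t vid, uint8_t pcp)
{
	const size_t off = 2 * ETH_ALEN;
	uint16_t tci = (uint16_t)(((pcp & 0x7u) << 13) | (vid & 0x0fffu));

	assert(frame != NULL);
	if (len < off || len + 4 > cap)
		return 0;

	/* Shift the existing EtherType (or inner tag) and payload right */
	memmove(frame + off + 4, frame + off, len - off);
	frame[off + 0] = 0x81;
	frame[off + 1] = 0x00;
	frame[off + 2] = (uint8_t)(tci >> 8);
	frame[off + 3] = (uint8_t)(tci & 0xff);
	return len + 4;
}
```

For example, a frame already tagged with VID 100 can be wrapped in an outer tag with VID 2049 (the TX VLAN of port 1 on switch 0, assuming the tag_8021q layout of this kernel era); the switch strips the outer header and the inner VID 100 tag goes out on the wire.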
So basically, with imprecise RX, the only thing that will not work is that raw
AF_PACKET sockets installed on swpN (where swpN is not the designated port)
will not be able to see RX traffic if swpN is under a VLAN-aware bridge. But
the bridge itself, and everything that comes afterwards, will be able to see RX
traffic. So IP termination through br0 is still functional.

You could in theory support multiple VLAN-aware bridges as long as there is no
VLAN in common between them (because the DSA tagger will have no way to deduce
if a VLAN-tagged packet is coming from one bridge or the other), but in
practice I fail to see any reason why this would be useful, given the fact that
you can achieve absolutely the same port isolation just by using a single
VLAN-aware bridge and just adding some ports in a VLAN and some ports in
another. For VLAN-unaware bridges it's a different story, you can have multiple
VLAN-unaware bridges spanning the switch tree. Also, you cannot have mixtures
of VLAN-aware and VLAN-unaware bridges, because VLAN filtering is a global
setting in the switch and not per port.

Additionally, there might be some limitations in the way 8021q uppers are
handled, if there is a VLAN-aware bridge in the system. For example, you cannot
have a VLAN interface with the same VLAN ID as a bridge VLAN, and this case is
already caught by DSA and denied. But I am talking about 8021q uppers on top of
standalone interfaces (for example you have swp0 and swp1 under a VLAN-aware
br0, and you want to install swp2.100 on top of swp2). The question is which
VLAN would be best to send to the CPU on behalf of swp2.100. If we use VLAN
100, we would have to keep some sort of lookup table (or maybe extend the one
we use for imprecise RX towards the bridge) such that received traffic with
VLAN 100 will set skb->dev = swp2, but this will mean that swp3.100 is not
possible. Also, having VLAN 100 as a bridge VLAN is not possible either.

To allow having both swp2.100 and swp3.100, the only option is to use VLAN
retagging again, such that VLAN 100 coming from swp2 is seen by DSA as having
VLAN ID 1042, and VLAN 100 coming from swp3 is seen as having 1043. But again
there are the pitfalls of VLAN retagging, which are that:

  • The CPU port needs to be removed from the broadcast domain of the original
    pre-retagging VLAN (100), which means that the bridge still cannot use VLAN
    100 either, since that will never reach the CPU anymore (either this, or
    accept duplicates eliminated in software).
  • PTP will not be functional through 8021q uppers, because RX timestamping
    doesn't work with retagged packets. It will only work for untagged traffic. If you
    need VLAN-tagged PTP to work, it will have to be on top of a bridge port, or
    for the VLAN awareness of the switch to be turned off.

Apart from the limitations, there would be some advantages to the imprecise RX
mode. It would basically be more similar to mode 2 (fully VLAN aware) than it
is to mode 3. So we could enable Independent VLAN Learning (so you could have
the same MAC address learnt in multiple VLANs, with potentially different
destination ports), and you would have access to way more bridge VLANs than in
mode 3 (2K vs 32).

So in order to know how to proceed further, I need to know your exact
requirements regarding what must work and what mustn't. Thanks for the
understanding, and sorry for the huge afterthought.

@Leo1726
Author

Leo1726 commented Apr 28, 2021

Hi, vladimiroltean, thanks a lot!
I think the first possible solution might be the quicker way to solve it.
By the way, if the third one could be implemented, that would be even better.

Keep swp0's pvid as 1 (the bridge pvid), and perform retagging of VLAN 1 on
swp0 towards port 4 (the host port) and VLAN 1024, but do not drop the
original packets, instead keep both, and drop what you don't need in
software. This limits the maximum useful bandwidth of the CPU port to 500
Mbps.

@vladimiroltean
Contributor

Hello,
Sorry for the long delay.
The git tree linked below is an implementation of the proposed method 3 (delete the VLAN retagging code and add support for imprecise RX and imprecise TX through a VLAN-aware bridge):
https://github.com/vladimiroltean/linux/tree/sja1105-bridge-fwd-offload-v1
The development was done on top of the mainline kernel, which is at v5.13.
Please let me know if it solves all issues you are seeing, then we can discuss backporting to v5.10 or other ways of integrating these changes into OpenIL.
