Mclag-HLD document #325

shine4chen · 2019-01-23T02:14:17Z

add mclag hld document

shine4chen · 2019-01-23T05:56:24Z

Sorry for the long delay in submitting this after the community review meeting. Please review it. Nephos plans to submit code soon to catch up with the 201903 branch. @lguohan

lguohan · 2019-02-28T06:48:32Z

please describe why ebfilter is used?

shine4chen · 2019-02-28T09:45:43Z

please describe why ebfilter is used?
We use ebfilter to isolate the MC-LAG peer link from the MC-LAG member ports in the Linux kernel. In the ASIC, we use the ACL mechanism to do the same.
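
The isolation rule described in this reply can be sketched as a simple predicate. This is only an illustrative model, assuming the rule is "drop frames that enter on the peer link and would leave on an MC-LAG enabled port"; it is not ebfilter or ACL syntax, and the function name and port names are hypothetical.

```python
def should_drop(in_port: str, out_port: str, peer_link: str, mclag_ports: set) -> bool:
    """Return True when a frame received on the peer link would egress an MC-LAG enabled port."""
    return in_port == peer_link and out_port in mclag_ports


# Hypothetical port names, used only for illustration.
PEER_LINK = "PortChannel0002"
MCLAG_PORTS = {"PortChannel0001"}

for in_port, out_port in [
    ("PortChannel0002", "PortChannel0001"),  # peer link -> MC-LAG port
    ("PortChannel0002", "Ethernet4"),        # peer link -> orphan port
    ("Ethernet4", "PortChannel0001"),        # orphan port -> MC-LAG port
]:
    action = "drop" if should_drop(in_port, out_port, PEER_LINK, MCLAG_PORTS) else "forward"
    print(f"{in_port} -> {out_port}: {action}")
```

Only the first combination is dropped in this model; traffic from the peer link to an orphan port, and traffic that did not arrive on the peer link, is left alone.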

…ffic from peer-link to MC-LAG enabled LAG. No SAI change is required for MCLAG support. 2. The mclag docker can start on demand. 3. Clarify that a redundant peer link can be used for the peer-link broken scenario. 4. Add diagram numbers to improve the reader experience. 5. Update the document version to 0.4.

shine4chen · 2019-03-07T10:37:03Z

@lguohan I have revised the MCLAG HLD document per the community review meeting feedback. Please help review it and approve it if appropriate.

Signed-off-by: shine <shine.chen@nephosinc.com>

Signed-off-by: shine.chen <shine.chen@nephosinc.com>

1. sync mac-address from orphan port 2. add some handle for failure scenarios 3. add ICCP GR mechanism for warm-reboot

Signed-off-by: shine <shine.chen@nephosinc.com>

boralt · 2019-10-08T17:44:53Z

Which community SONiC release is this feature targeting? I didn't see it in the October 2019 release planning schedule.

shine4chen · 2019-10-12T02:03:37Z

@boralt It will be listed in the 201910 release soon.

1. Add ND sync-up description 2. Add command 'mclagdctl config loglevel -l <level>'

stephenxs

Just posting some questions I'm curious about.

stephenxs · 2019-12-21T13:53:54Z

doc/Sonic-mclag-hld.md

+- In the above diagram, PortChannel0001 and PortChannel0002 are MC-LAG enabled interfaces, status is up.
+- The data flow path is presented by the red line.
+- The data flow path from PA to CE1: When the traffic reaches PEER1, it matches the direct route, such as 10.1.1.0/24, and is forwarded through PortChannel0001.
+- The data flow path from CE1 to PA: CE1 may send the traffic to PEER1 or PEER2. PEER2 must have a route entry that can reach PA. This route entry is installed by a routing protocol.


How can CE1 connect to the RG?
Using BGP isn't an option, because it would require both switches in the RG to act as one switch from the CE's BGP perspective, which is not the case.
Is it static routing, directly connected routing, or a protocol like VRRP?

In general, there is no need to establish a BGP neighbor relationship between CE and PE. If L3 forwarding is adopted, a direct connection or static routing is sufficient.

stephenxs · 2019-12-21T14:22:41Z

doc/Sonic-mclag-hld.md

+- An MC-LAG domain consists of only two systems.
+- Each system joins only one MC-LAG domain.
+- Supports known unicast and BUM traffic.
+- L3 interfaces on MC-LAG ports will have a vMAC generated from the VRRP algorithm using the same IP address assigned to the L3 LIF (logical interface). (Not supported currently)


If VRRP runs on one interface, can it sync information (like the vMAC here) for another interface?
For example, in diagram 6.2 VRRP is likely to run on the peer link, but it would negotiate the vMAC for other interfaces such as portchannel0001.

If VRRP is used, then in diagram 6.2 VRRP needs to run on portchannel0001.

According to diagram 6.2, portchannel0001 is a routed interface rather than a switching interface (one that belongs to a VLAN). In that case, which interface can VRRP on PEER1/PEER2 communicate over?

Sorry, the wording was not accurate enough. We may use the VRRP algorithm to generate the vMAC rather than running the VRRP protocol.

stephenxs · 2019-12-22T01:29:27Z

doc/Sonic-mclag-hld.md

+### 7.1.4. ARP and ND sync-up between MC-LAG peers
+
+- If one peer learns an ARP entry, it sends the ARP entry to the other peer via ICCP. For example, when PEER1 learns the ARP entry of CE1 from PortChannel0001, it sends this ARP entry to PEER2 via ICCP. PEER2 receives the entry and installs it into the Linux kernel with PortChannel0001 as the learned interface. This requires that the MC-LAG enabled PortChannel interface has the same name on both peer devices.
+- ICCP doesn't flood ARP entries to the peer periodically. To prevent an ARP entry from aging out, ICCP uses a Netlink socket to monitor ARP replies received by the Linux kernel. For example, when an ARP entry in PEER2 ages, the Linux kernel sends an ARP request via PortChannel0001. CE1 receives the ARP request and sends back an ARP reply. Because CE1 views PEER1 and PEER2 as the same device, the ARP reply may be sent to either PEER2 or PEER1. If PEER2 receives the ARP reply, the ARP entry is learned again and the information is updated in the kernel; at the same time, PEER2 notifies PEER1 via an ICCP sync message. If PEER1 receives the ARP reply, the ARP entry already exists in its kernel, so the kernel uses Netlink to pass the ARP packet to its applications; ICCP collects the ARP information from the ARP reply and sends it to PEER2, so that PEER2 can update the ARP entry in its Linux kernel.
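
The sync behaviour quoted above can be modelled in a few lines. This is a minimal sketch, not iccpd code: the `Peer` class, its method names, and the addresses are hypothetical, and a direct method call stands in for the ICCP message. It shows why the interface name has to match on both peers, and how a reply that lands on PEER1 refreshes an entry that had aged on PEER2.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple


@dataclass
class Peer:
    name: str
    mclag_ports: Set[str]
    arp: Dict[str, Tuple[str, str]] = field(default_factory=dict)  # ip -> (mac, ifname)
    remote: Optional["Peer"] = field(default=None, repr=False, compare=False)

    def on_arp_reply(self, ip: str, mac: str, ifname: str) -> None:
        """Local kernel saw an ARP reply: refresh the entry and sync it to the peer."""
        self.arp[ip] = (mac, ifname)
        if self.remote is not None:
            self.remote.on_iccp_arp(ip, mac, ifname)  # stands in for an ICCP sync message

    def on_iccp_arp(self, ip: str, mac: str, ifname: str) -> None:
        """Install an entry received from the peer under the same interface name."""
        if ifname not in self.mclag_ports:
            raise ValueError(f"{self.name}: no MC-LAG interface named {ifname}")
        self.arp[ip] = (mac, ifname)


peer1 = Peer("PEER1", {"PortChannel0001"})
peer2 = Peer("PEER2", {"PortChannel0001"})
peer1.remote, peer2.remote = peer2, peer1

# PEER1 learns CE1 and syncs the entry to PEER2.
peer1.on_arp_reply("10.1.1.2", "00:11:22:33:44:55", "PortChannel0001")
# The entry ages out on PEER2, but CE1's reply to the refresh request lands on PEER1.
del peer2.arp["10.1.1.2"]
peer1.on_arp_reply("10.1.1.2", "00:11:22:33:44:55", "PortChannel0001")
print(peer2.arp)
```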


Is it possible for the ARP entry to age out while downstream traffic keeps flowing, if a CE sends traffic to PEER1 but receives traffic from PEER2? In that case the downstream traffic would be lost once the ARP entry ages.

Before an ARP entry ages out, the Linux kernel sends an ARP request, and the CE responds with an ARP reply. Whichever PE receives the ARP reply, the entry is synchronized to the other peer, so ARP entries will not age out.

If the traffic is forwarded by the ASIC rather than the kernel protocol stack, then from the kernel's perspective no packets are received from the host, and it treats the ARP entry as stale. In that case, will the kernel still send an ARP request before aging the entry out?

On the switch, traffic is forwarded by the ASIC, and ARP entries in the Linux kernel may age periodically. In SONiC, the default ARP aging time is 1800s. Even if no packet is received from the host, the initial state of the ARP entry is reachable, and the kernel will send an ARP request before aging it out. If an ARP reply is received, the state changes from stale to reachable.

stephenxs · 2019-12-22T02:38:01Z

doc/Sonic-mclag-hld.md

+
+- If one peer learns a MAC entry from an MC-LAG enabled PortChannel, it sends this MAC to the other peer via ICCP. For example, when PEER1 learns the MAC entry of CE1 from PortChannel0001, it sends this MAC to PEER2 via ICCP. PEER2 receives this MAC and installs it into the Linux kernel, with PortChannel0001 as the learned interface. This means the MC-LAG enabled PortChannel interface must have the same name on both peer devices.
+- If one peer learns a MAC entry from an orphan port, it also sends this MAC to the other peer via ICCP. For example, when PEER1 learns the MAC entry of CE2 from Eth4, it sends this MAC to PEER2 via ICCP. PEER2 receives this MAC and installs it into the Linux kernel, with the peer link interface PortChannel0002 as the learned interface.
+- ICCP doesn't flood MAC entries to the peer periodically. To prevent a MAC entry from aging out, ICCP defines two flags for each MAC entry, MAC_AGE_LOCAL and MAC_AGE_PEER. MAC_AGE_LOCAL indicates that the MAC entry has aged on the local device, and MAC_AGE_PEER indicates that the same MAC entry has aged on the peer device. The MAC entry is deleted from the local FDB only when both flags are set for this MAC. For example, if the MAC of CE1 ages out on PEER2, MAC_AGE_LOCAL is set on the entry. If MAC_AGE_PEER is not set on the entry at that time (because the entry on PEER1 hasn't aged, so PEER1 has not told PEER2 to set the flag), the entry is installed back into the ASIC. PEER2 then notifies PEER1 of the MAC age event, and PEER1 sets MAC_AGE_PEER for the same MAC.
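
The two-flag rule quoted above can be sketched as follows. This is a simplified model under stated assumptions, not the iccpd implementation: the flag values, the `MacTable` class, and its method names are hypothetical, and a plain dict stands in for the ASIC FDB.

```python
MAC_AGE_LOCAL = 0x1  # flag values are assumptions for this sketch
MAC_AGE_PEER = 0x2


class MacTable:
    def __init__(self):
        self.entries = {}  # (mac, vlan) -> [port, flags]; software copy of synced MACs
        self.asic = {}     # (mac, vlan) -> port; stands in for the ASIC FDB
        self.to_peer = []  # events that would be sent to the peer over ICCP

    def learn(self, mac, vlan, port):
        self.entries[(mac, vlan)] = [port, 0]
        self.asic[(mac, vlan)] = port

    def local_aged(self, mac, vlan):
        """The local ASIC reported that the entry aged out."""
        key = (mac, vlan)
        if key not in self.entries:
            return
        self.asic.pop(key, None)
        self.entries[key][1] |= MAC_AGE_LOCAL
        self.to_peer.append(("aged", mac, vlan))
        self._settle(key)

    def peer_aged(self, mac, vlan):
        """The peer reported that the entry aged out on its side."""
        key = (mac, vlan)
        if key not in self.entries:
            return
        self.entries[key][1] |= MAC_AGE_PEER
        self._settle(key)

    def _settle(self, key):
        port, flags = self.entries[key]
        if flags == MAC_AGE_LOCAL | MAC_AGE_PEER:
            del self.entries[key]      # aged on both sides: delete
            self.asic.pop(key, None)
        else:
            self.asic[key] = port      # still alive on one side: keep or reinstall


CE1 = ("00:11:22:33:44:55", 10)
peer1, peer2 = MacTable(), MacTable()
peer1.learn(*CE1, "PortChannel0001")
peer2.learn(*CE1, "PortChannel0001")

peer2.local_aged(*CE1)  # ages on PEER2 only: reinstalled, age event queued for PEER1
peer1.peer_aged(*CE1)   # PEER1 records MAC_AGE_PEER, entry stays
peer1.local_aged(*CE1)  # both flags now set on PEER1: deleted there
peer2.peer_aged(*CE1)   # PEER2 receives PEER1's age event: deleted on PEER2
```

In this model a flag is never cleared once it is set, because the quoted text does not describe a clearing step.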


The flow for handling a locally aged MAC entry is to reinstall it into the ASIC if it has not aged on the peer. Will that cause traffic to be flooded during the interval between the MAC aging out and being reinstalled?
MAC_AGE_LOCAL and MAC_AGE_PEER alone don't seem to be enough. Consider the following flow:

1. The MAC ages on the local device, and MAC_AGE_LOCAL is set.

2. The MAC is hit locally. Is there a way to clear the MAC_AGE_LOCAL flag? Since the MAC is already in the ASIC FDB, the ASIC won't notify software when the MAC is hit.

3. The MAC ages on the remote device, and MAC_AGE_PEER is set. If the MAC_AGE_LOCAL flag was not cleared in step 2, the MAC will be aged out, which is not correct.

Yes, in this scenario the MAC will be deleted, but it will be relearned by the ASIC immediately (at the millisecond level).

stephenxs · 2019-12-22T02:56:44Z

doc/Sonic-mclag-hld.md

+### 7.2.5. Peer link MAC learning
+
+- When the MC-LAG enabled interface is up, the peer link is the backup link for data traffic. MAC learning must be disabled on the peer link to prevent data traffic from being forwarded over it. If learning is enabled, the same MAC (e.g. the MAC of CE1) may be learned via either the MC-LAG port or the peer link, and the output port of this MAC will keep toggling.
+- When all local member links of an MC-LAG interface on one peer are down, MAC learning remains disabled on the peer link. Dynamic MAC entries are installed in the FDB pointing to the peer link as the next hop, so traffic destined to those MAC entries takes the peer link path.


What if multiple MC-LAGs share one peer link? Consider the following sequence:

1. portchannel0001 and portchannel0002 share portchannel0003 as the peer link.

2. All members of portchannel0001 and portchannel0002 on PEER1 go down, so on PEER1 the MAC entries are reprogrammed with portchannel0003 as the next hop.

3. Then one or more members of portchannel0001 on PEER1 come back up. Should the MAC entries that originally belonged to portchannel0001 be reprogrammed with portchannel0001 as the next hop on PEER1, while the MAC entries that originally belonged to portchannel0002 stay untouched? How can these two kinds of MAC addresses be distinguished?
Or are the MACs simply refreshed by the regular MAC learning mechanism?

FDB table entries include the MAC address, VLAN, and port. Using the port, you can distinguish MACs learned on portchannel0001 from those learned on portchannel0002.
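
One way to picture this answer is a table that keeps the port each MAC was learned on next to the port currently used for forwarding. The sketch below is an assumption about how that bookkeeping could look, not a description of the actual code; the `Fdb` class and its field names are hypothetical.

```python
class Fdb:
    def __init__(self, peer_link):
        self.peer_link = peer_link
        self.entries = {}  # (mac, vlan) -> {"learned_on": port, "out_port": port}

    def learn(self, mac, vlan, port):
        self.entries[(mac, vlan)] = {"learned_on": port, "out_port": port}

    def port_down(self, port):
        """All members of an MC-LAG port are down: redirect its MACs to the peer link."""
        for entry in self.entries.values():
            if entry["learned_on"] == port:
                entry["out_port"] = self.peer_link

    def port_up(self, port):
        """The MC-LAG port is back: restore only the MACs that were learned on it."""
        for entry in self.entries.values():
            if entry["learned_on"] == port:
                entry["out_port"] = port


fdb = Fdb(peer_link="portchannel0003")
fdb.learn("00:00:00:00:00:01", 10, "portchannel0001")
fdb.learn("00:00:00:00:00:02", 10, "portchannel0002")

fdb.port_down("portchannel0001")
fdb.port_down("portchannel0002")
fdb.port_up("portchannel0001")

for key, entry in fdb.entries.items():
    print(key, entry["out_port"])
```

After this sequence, the MAC learned on portchannel0001 points back to portchannel0001, while the MAC learned on portchannel0002 still points to the peer link.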

stephenxs · 2019-12-22T03:27:14Z

doc/Sonic-mclag-hld.md

+- In this scenario, peers may be directly connected, or use other tools such as BFD to detect the status of the peer link (Not supported currently).
+- If the peer link and the peer keepalive link are the same link, a peer link failure may bring the peer connection down. For the case where the keepalive connection is down, please see the above section. Users should not design the network in this way.
+- When the peer link is down, as shown above, all MACs that point to the peer link are removed on both peers. Data forwarding for the CE continues as usual. If the ICCP connection uses this peer link interface, the action is the same as described in "peer connection down". If the ICCP connection doesn’t use this peer link interface, this is not a split-brain scenario, because state can still be synchronized over the keepalive link. If one MC-LAG enabled port is down, data traffic may be lost, since the peer link is unavailable as a backup path.
+


Will it be an issue that PEER1 and PEER2 share the same IP address?

If a peer receives a message with the same IP or MAC address as its own, the Linux kernel may print a warning message. Beyond that, no other problems were found.

merge master branch

Signed-off-by: shine.chen <shine.chen@mediatek.com>

shine added 5 commits January 22, 2019 18:07

add images of the mclag-hld document

a3ee33a

add the mclag-hld document

f39fd2c

update MCLAG_HLD_8.png

6577edb

update some image url on sonic-mclag-hld doc

13d0ffd

minor format update on sonic-mclag-hld doc

3ee460d

shine added 2 commits January 22, 2019 23:32

add the mclag-hld document

7e962a2

Merge branch 'mclag' of https://github.com/shine4chen/SONiC into mclag

c163dc4

shine4chen mentioned this pull request Mar 8, 2019

[mclag]:add mclagsyncd sonic-net/sonic-swss#811

Merged

shine added 3 commits May 6, 2019 01:40

add PR statement in chapter 9

04fd9d4

Signed-off-by: shine <shine.chen@nephosinc.com>

update mclag hld diagram

596d34e

Signed-off-by: shine.chen <shine.chen@nephosinc.com>

add L2 forwarding description and more use cases

2d6ed33

Signed-off-by: shine.chen <shine.chen@nephosinc.com>

shine4chen force-pushed the mclag branch from b30ef43 to 2d6ed33 Compare June 11, 2019 01:49

shine.chen and others added 13 commits June 11, 2019 20:40

refine warm-reboot description

19056d5

Signed-off-by: shine.chen <shine.chen@nephosinc.com>

update sonic-mclag-hld

fa5f05b

1. sync mac-address from orphan port 2. add some handle for failure scenarios 3. add ICCP GR mechanism for warm-reboot

update picuture file

6e99f57

Update and rename Sonic-mclag-hld-v0.6.md to Sonic-mclag-hld-v0.7.md

85cfffa

update diagram 3

5d7a955

Add files via upload

3f625f0

Update on community review

5e777a8

Signed-off-by: shine <shine.chen@nephosinc.com>

Update Sonic-mclag-hld.md

9294e50

Update Sonic-mclag-hld.md

11b49c5

Update Sonic-mclag-hld.md

7de799e

Update Sonic-mclag-hld.md

3e883a9

Update Sonic-mclag-hld.md

c2a5a92

Update Sonic-mclag-hld.md

9f6494b

jeffreyzfzeng added 3 commits July 23, 2019 13:43

Update Sonic-mclag-hld.md

b946c63

Update Sonic-mclag-hld.md

79b4b44

Update Sonic-mclag-hld.md

b83548e

This was referenced Jul 30, 2019

[teammgrd]during warm-reboot teamd need to recover system-id from saved lacp-pdu sonic-net/sonic-swss#1003

Merged

[aclorch]: add support for acl rule to match out port sonic-net/sonic-swss#810

Merged

minor description revision

e8db26d

shine4chen force-pushed the mclag branch from daa802f to e8db26d Compare August 2, 2019 03:46

add brief introduction of iccpd code

f478fe7

Signed-off-by: shine <shine.chen@nephosinc.com>

rlhui approved these changes Oct 31, 2019


adyeung approved these changes Oct 31, 2019


jianjundong added 2 commits November 26, 2019 21:51

Update Sonic-mclag-hld.md

21818ce

1. Add ND sync-up description 2. Add command 'mclagdctl config loglevel -l <level>'

Update Sonic-mclag-hld.md

5eea76d

stephenxs reviewed Dec 22, 2019


shine4chen and others added 2 commits February 3, 2020 10:04

Merge pull request #5 from Azure/master

a3d23d4

merge master branch

move mclag doc to mclag directory

12f82e2

Signed-off-by: shine.chen <shine.chen@mediatek.com>

rlhui merged commit 806c906 into sonic-net:master Feb 21, 2020

stephenxs left a comment

stephenxs Dec 21, 2019

jianjundong Dec 24, 2019

stephenxs Dec 21, 2019

jianjundong Dec 24, 2019

stephenxs Dec 25, 2019

jianjundong Dec 26, 2019

stephenxs Dec 22, 2019

jianjundong Dec 24, 2019

stephenxs Dec 25, 2019

jianjundong Dec 26, 2019

stephenxs Dec 22, 2019

jianjundong Dec 24, 2019

stephenxs Dec 22, 2019

jianjundong Dec 24, 2019

stephenxs Dec 22, 2019

jianjundong Dec 24, 2019
