Mclag-HLD document #325
Sorry for submitting this so long after the community review meeting. Please review it. Nephos plans to submit code soon to catch up with the 201903 branch. @lguohan
Please describe why ebfilter is used.
…ffic from peer-link to mclag enable LAG. No SAI change is required for MCLAG support.
2. mclag docker can start on demand.
3. Clarify that a redundant peer link can be used for the peer-link broken scenario.
4. Add diagram numbers to improve reader experience.
5. Update document version to 0.4.
@lguohan I have revised the MCLAG HLD document per the opinions from the community review meeting. Please help review and approve it if appropriate.
Signed-off-by: shine <shine.chen@nephosinc.com>
Signed-off-by: shine.chen <shine.chen@nephosinc.com>
Signed-off-by: shine.chen <shine.chen@nephosinc.com>
Signed-off-by: shine.chen <shine.chen@nephosinc.com>
1. sync mac-address from orphan port 2. add some handle for failure scenarios 3. add ICCP GR mechanism for warm-reboot
Signed-off-by: shine <shine.chen@nephosinc.com>
Signed-off-by: shine <shine.chen@nephosinc.com>
Which community SONiC release is this feature targeting? I didn't see it in the October 2019 release planning schedule.
@boralt It will be listed in the 201910 release soon.
1. Add ND sync-up description 2. Add command 'mclagdctl config loglevel -l <level>'
Just posting some questions I'm curious about.
doc/Sonic-mclag-hld.md
Outdated
- In the above diagram, PortChannel0001 and PortChannel0002 are MC-LAG enabled interfaces, and their status is up.
- The data flow path is represented by the red line.
- The data flow path from PA to CE1: When the traffic reaches PEER1, it matches the direct route, such as 10.1.1.0/24, and is forwarded through PortChannel0001.
- The data flow path from CE1 to PA: CE1 may send the traffic to PEER1 or PEER2. PEER2 must have a route entry that can reach PA. This route entry is installed by the routing protocol.
How can CE1 connect to the RG?
BGP isn't an option, because it would require both switches in the RG to act as one switch from the CE's point of view in terms of BGP, which is not the case.
Is it static routing, directly connected routing, or a protocol like VRRP?
In general, there is no need to establish a BGP neighbor relationship between CE and PE. If L3 forwarding is adopted, a direct connection or static routing is OK.
doc/Sonic-mclag-hld.md
Outdated
- MCLAG domain consists of only two systems.
- Each system joins only one MC-LAG domain.
- Supports known unicast and BUM traffic.
- L3 interface on MLAG ports will have a vMAC generated from the VRRP algorithm using the same IP address assigned to the L3 LIF (logical interface). (Not supported currently)
If VRRP runs on one interface, can it sync information (like the vMAC here) for another interface?
For example, in diagram 6.2 VRRP is likely to run on the peer link, but it would negotiate the vMAC for other interfaces like portchannel0001.
If VRRP is used, in diagram 6.2, VRRP needs to run on portchannel0001.
According to diagram 6.2, portchannel0001 is a routed interface rather than a switching interface (which belongs to a VLAN). In this case, which interface can VRRP on peer1/2 communicate on?
Sorry, the wording is not accurate enough. We may use the VRRP algorithm to generate the vMAC instead of running the VRRP protocol.
doc/Sonic-mclag-hld.md
Outdated
### 7.1.4. ARP and ND sync-up between MC-LAG peers

- If one peer learns an ARP entry, it sends the entry to the other peer via ICCP. For example, when PEER1 learns the ARP entry of CE1 from PortChannel0001, it sends this ARP entry to PEER2 via ICCP. PEER2 receives the entry and installs it into the Linux kernel with PortChannel0001 as the learned interface name. This requires that the MC-LAG enabled PortChannel interface has the same name on both peer devices.
- ICCP doesn't flood ARP entries to the peer periodically. To prevent ARP entries from aging out, ICCP uses a Netlink socket to monitor ARP replies received by the Linux kernel. For example, when an ARP entry on PEER2 ages, the Linux kernel sends an ARP request via PortChannel0001. CE1 receives the ARP request and sends back an ARP reply. Because CE1 views PEER1 and PEER2 as the same device, the ARP reply may be sent to either PEER2 or PEER1. If PEER2 receives the ARP reply, the ARP entry is learned again and the information is updated in the kernel. At the same time, PEER2 notifies PEER1 via an ICCP sync message. If PEER1 receives the ARP reply, the ARP entry already exists in its kernel, so the kernel uses Netlink to pass the ARP packet to its applications; ICCP collects the ARP information from the reply and sends it to PEER2, so that PEER2 can update the ARP entry in its Linux kernel.
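The sync-up described in the two bullets above can be pictured with a minimal Python sketch. It is only an illustration under stated assumptions: the `Peer` and `ArpEntry` classes, the method names, and the IP and MAC values are hypothetical, the dictionary stands in for the kernel neighbor table, and a direct method call stands in for the ICCP sync message. It is not taken from the iccpd code.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ArpEntry:
    ip: str
    mac: str
    ifname: str  # must be the same PortChannel name on both peers


@dataclass
class Peer:
    name: str
    neigh: dict = field(default_factory=dict)  # stands in for the kernel neighbor table
    remote: "Peer | None" = None

    def on_arp_reply(self, entry: ArpEntry) -> None:
        """Simulates the Netlink monitor reporting an ARP reply on this peer."""
        self.neigh[entry.ip] = entry
        if self.remote is not None:
            # Stands in for the ICCP sync message to the other peer.
            self.remote.on_iccp_arp_sync(entry)

    def on_iccp_arp_sync(self, entry: ArpEntry) -> None:
        """Installs an entry learned by the other peer; it is not echoed back."""
        self.neigh[entry.ip] = entry


peer1 = Peer("PEER1")
peer2 = Peer("PEER2")
peer1.remote, peer2.remote = peer2, peer1

# CE1's reply happens to arrive on PEER1 (illustrative addresses).
peer1.on_arp_reply(ArpEntry("10.1.1.2", "00:11:22:33:44:55", "PortChannel0001"))
```

Whichever peer sees the reply, both tables end up with the same entry on the same interface name, which is why the PortChannel names have to match.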
Is it possible for the ARP entry to age out while downstream traffic keeps flowing, if a CE sends traffic to PEER1 while receiving traffic from PEER2? In that case the downstream traffic would be lost once the ARP entry ages out.
Before an ARP entry ages out, the Linux kernel sends an ARP request, and the CE responds with an ARP reply after receiving it. No matter which PE receives the ARP reply, it is synchronized to the other peer, so ARP entries will not age out.
If the traffic is forwarded by the ASIC rather than the kernel protocol stack, then from the kernel's point of view it doesn't receive any packets from the host and treats the ARP entry as stale. In this case, will the kernel still send an ARP request before aging the entry out?
On switches, traffic is forwarded by the ASIC, and ARP entries in the Linux kernel may age periodically. In SONiC, the default ARP aging time is 1800s. Even if no packet is received from the host, the initial state of the ARP entry is reachable, and the kernel will send an ARP request before aging the entry out. If an ARP reply is received, the state changes from stale to reachable.
doc/Sonic-mclag-hld.md
Outdated
- If one peer learns a MAC entry from an MC-LAG enabled PortChannel, it sends this MAC to the other peer via ICCP. For example, when PEER1 learns the MAC entry of CE1 from PortChannel0001, it sends this MAC to PEER2 via ICCP. PEER2 receives this MAC and installs it into the Linux kernel, with PortChannel0001 as the learned interface. This means the MC-LAG enabled PortChannel interface must have the same name on both peer devices.
- If one peer learns a MAC entry from an orphan port, it also sends this MAC to the other peer via ICCP. For example, when PEER1 learns the MAC entry of CE2 from Eth4, it sends this MAC to PEER2 via ICCP. PEER2 receives this MAC and installs it into the Linux kernel, with the peer link interface PortChannel0002 as the learned interface.
- ICCP doesn't flood MAC entries to the peer periodically. To prevent MAC entries from aging out, ICCP defines two flags for each MAC entry, MAC_AGE_LOCAL and MAC_AGE_PEER. MAC_AGE_LOCAL indicates that the MAC entry has aged on the local device, and MAC_AGE_PEER indicates that the same MAC entry has aged on the peer device. The MAC entry is deleted from the local FDB only when both flags are set for this MAC. For example, if the MAC of CE1 ages out on PEER2, PEER2 sets MAC_AGE_LOCAL on the entry. If MAC_AGE_PEER is not set on the entry at the same time (because the MAC entry on PEER1 hasn't aged, so PEER1 hasn't told PEER2 to set the flag), the entry is installed back into the ASIC. PEER2 then notifies PEER1 of the MAC age event, and PEER1 sets MAC_AGE_PEER for the same MAC.
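The two-flag aging rule in the last bullet can be sketched as a small state machine. This is a sketch under assumptions, not the actual implementation: the flag values, the `MacTable` class, and its method names are hypothetical, a Python set stands in for the ASIC FDB, and a direct call stands in for the ICCP age notification. The text does not say when a flag is cleared again, so the sketch does not model that.

```python
MAC_AGE_LOCAL = 0x1  # entry has aged out on this device (value is illustrative)
MAC_AGE_PEER = 0x2   # entry has aged out on the peer device (value is illustrative)
BOTH_AGED = MAC_AGE_LOCAL | MAC_AGE_PEER


class MacTable:
    def __init__(self, name):
        self.name = name
        self.flags = {}    # mac -> age flags (software view)
        self.asic = set()  # MACs programmed in the simulated ASIC
        self.peer = None

    def learn(self, mac):
        self.flags[mac] = 0
        self.asic.add(mac)

    def _set_flag(self, mac, flag):
        """Set an age flag; delete the entry once both flags are set."""
        if mac not in self.flags:
            return
        self.flags[mac] |= flag
        if self.flags[mac] == BOTH_AGED:
            del self.flags[mac]
            self.asic.discard(mac)

    def on_local_age(self, mac):
        self.asic.discard(mac)          # the ASIC has aged the entry
        self._set_flag(mac, MAC_AGE_LOCAL)
        if mac in self.flags:           # peer has not aged it: reinstall
            self.asic.add(mac)
        if self.peer is not None:
            self.peer.on_peer_age(mac)  # stands in for the ICCP age notification

    def on_peer_age(self, mac):
        self._set_flag(mac, MAC_AGE_PEER)


peer1, peer2 = MacTable("PEER1"), MacTable("PEER2")
peer1.peer, peer2.peer = peer2, peer1
mac = "00:00:00:00:00:01"  # illustrative MAC of CE1
peer1.learn(mac)
peer2.learn(mac)

peer2.on_local_age(mac)  # ages on PEER2 only: reinstalled there
after_first = (peer1.flags[mac], peer2.flags[mac], mac in peer2.asic)

peer1.on_local_age(mac)  # now aged on both sides: removed everywhere
```

After the first event the entry survives on both peers with one flag each; only after the second event do both tables drop it.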
The flow for handling a locally aged MAC entry is to reinstall it into the ASIC if it has not aged on the peer. Will this cause traffic to be broadcast during the time between the MAC aging and being reinstalled?
It seems that MAC_AGE_LOCAL and MAC_AGE_PEER alone aren't enough. Consider the following flow:
1. The MAC ages on the local device, so MAC_AGE_LOCAL is set.
2. The MAC is hit locally. Is there a way to remove the MAC_AGE_LOCAL flag? Since the MAC is already in the ASIC FDB, the ASIC won't notify the software when the MAC is hit.
3. The MAC ages on the remote device, so MAC_AGE_PEER is set. If the MAC_AGE_LOCAL flag isn't removed in step 2, the MAC will be aged out, which is not correct.
Yes, in this scenario the MAC will be deleted, but it will be relearned by the ASIC immediately (at the millisecond level).
doc/Sonic-mclag-hld.md
Outdated
### 7.2.5. Peer link MAC learning

- When the MC-LAG enabled interface is up, the peer link is the backup link for data traffic. MAC learning must be disabled on the peer link to prevent data traffic from being forwarded over it. If learning is enabled, the same MAC (e.g. the MAC of CE1) may be learned via either the MC-LAG port or the peer link, and the output port of this MAC will keep toggling.
- When all local member links of an MC-LAG interface on one peer are down, MAC learning is also disabled on the peer link. Dynamic MAC entries are installed into the FDB pointing to the peer link as the next hop, so traffic destined to those dynamic MAC entries takes the peer link path.
What if multiple MC-LAGs share one peer link? Consider the following sequence:
- portchannel0001 and portchannel0002 share portchannel0003 as the peer link.
- The members of portchannel0001 and portchannel0002 on PEER1 are all down, so on PEER1 the MAC entries are reprogrammed with portchannel0003 as the next hop.
- Then one or more members of portchannel0001 on PEER1 come up. Should the MAC entries originally belonging to portchannel0001 be reprogrammed with portchannel0001 as the next hop on PEER1, while the MAC entries originally belonging to portchannel0002 remain untouched? How are these two kinds of MAC addresses distinguished?
Or is the MAC just refreshed by the regular MAC learning mechanism?
FDB table entries include the MAC address, VLAN, and port. The port lets you distinguish MACs learned on portchannel0001 from those learned on portchannel0002.
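One way to read this answer, sketched in Python: if the software table keeps the port each MAC was learned on, entries can be re-pointed per MC-LAG port when it goes down or comes back up. The `FdbEntry` fields, the `set_port_state` helper, and the MAC and VLAN values are hypothetical, and keeping a separate `out_port` next to the learned port is an assumption of this sketch rather than something the HLD states.

```python
from dataclasses import dataclass

PEER_LINK = "PortChannel0003"  # peer link from the reviewer's example


@dataclass
class FdbEntry:
    mac: str
    vlan: int
    port: str      # port the MAC was learned on (kept in software)
    out_port: str  # port currently programmed as the forwarding target


def set_port_state(fdb, port, up):
    """Re-point only the entries that were learned on the given port."""
    for entry in fdb:
        if entry.port == port:
            entry.out_port = port if up else PEER_LINK


fdb = [
    FdbEntry("00:00:00:00:00:01", 10, "PortChannel0001", "PortChannel0001"),
    FdbEntry("00:00:00:00:00:02", 10, "PortChannel0002", "PortChannel0002"),
]

# Both MC-LAG ports go down on PEER1, then PortChannel0001 recovers.
set_port_state(fdb, "PortChannel0001", up=False)
set_port_state(fdb, "PortChannel0002", up=False)
set_port_state(fdb, "PortChannel0001", up=True)
```

In this sequence only the first entry returns to its original port; the second keeps pointing at the peer link until PortChannel0002 comes back.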
doc/Sonic-mclag-hld.md
Outdated
- In this scenario, peers may be directly connected, or use other tools such as BFD to detect the status of the peer link (not supported currently).
- If the peer link and the peer keepalive link are the same link, peer link down may cause peer connection down. For the case when the keepalive connection is down, please see the above section. Users should not design the network in this way.
- When the peer link is down, as shown above, all the MACs that point to the peer link are removed on both peers. Data forwarding for the CE continues as usual. If the ICCP connection uses this peer link interface, the action is the same as described in "peer connection down". If the ICCP connection doesn't use this peer link interface, this is not a split-brain scenario because the state can still be synchronized over the keepalive link. If one MC-LAG enabled port is down, data traffic may be lost since the peer link, which is the backup path, is down.
Will it be an issue that PEER1 and PEER2 share the same IP address?
If a peer receives a message with the same IP or MAC address as its own, the Linux kernel may print a warning message. Apart from that, no other problems were found.
merge master branch
Signed-off-by: shine.chen <shine.chen@mediatek.com>
add mclag hld document