MCLAG Enhancements HLD #596

Merged
merged 9 commits into from
Feb 24, 2021

Conversation

Praveen-Brcm
Contributor

MCLAG Enhancements HLD captures the details of improvements made to existing SONiC MCLAG support.

This HLD captures details of MCLAG enhancements.
Removed unwanted text.
@msftclas

msftclas commented Apr 14, 2020

CLA assistant check
All CLA requirements met.

@lguohan
Contributor

lguohan commented Apr 29, 2020

Can you put this in the same folder? https://github.com/Azure/SONiC/tree/master/doc/mclag

@lguohan lguohan added the mclag label Apr 29, 2020
@rck-innovium
Collaborator

As discussed in the community meeting, please specify how a given platform can specify whether it uses isolation group or the existing ACL based separation.

## 3.7 SAI
### 3.7.1 Port Isolation
The following SAI definitions will be used and no enhancements necessary

Collaborator

Please add details as to:

  1. How many isolation groups are created?
  2. What are its members? MCLAG member ports?
  3. What is the type of the isolation group members: bridge-port or physical port?
  4. Where is the isolation group attached? Is it at the bridge-port or the physical port?

Contributor Author

Thanks for the comment. Updated the HLD with details.

Collaborator

Praveen,

I am not able to follow when members are added to the isolation group. The diagram in Sec 4.2 does not say what notification from ASIC_DB triggers "add members" in step-6.

Is the below understanding correct? Can you please update the HLD to explain this:
A single Isolation Group of type SAI_ISOLATION_GROUP_TYPE_BRIDGE_PORT is created internally by MclagSyncd.
This isolation group is bound to MC_LAG peer links.
All bridge-ports connecting to MHDs are added as members of this isolation group.

Contributor Author

@rck-innovium: Thanks for the comments. Section 4 captures only the event flows and includes no description, but your understanding is correct; the description is provided in sections 2.1.8, 3.2.2.1 and 3.4.1.1.
Steps 3-9 (small green blocks) are part of the processing of event 2 (big green block) itself. Event 2 carries the peer_link to which the isolation group is bound, along with the bridge-ports of the MCLAG PO members. Hope that clarifies.
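
The arrangement confirmed in this exchange (one bridge-port isolation group, bound to the peer link, with the MCLAG bridge-ports as members) can be modelled in a few lines. This is a minimal sketch and not MclagSyncd or orchagent code: the class `IsolationGroup`, the method `may_forward`, and the port names are hypothetical. It only illustrates the rule that traffic entering on a port the group is bound to is not forwarded to the group's members.

```python
from dataclasses import dataclass, field


@dataclass
class IsolationGroup:
    """Toy model of one bridge-port isolation group (all names are illustrative)."""
    bound_ports: set = field(default_factory=set)  # ingress side: the peer link
    members: set = field(default_factory=set)      # blocked egress: MCLAG bridge-ports

    def may_forward(self, ingress: str, egress: str) -> bool:
        # Traffic is dropped only when it enters on a bound port
        # and would leave through a member of the group.
        return not (ingress in self.bound_ports and egress in self.members)


# Hypothetical topology: PortChannel100 is the peer link, PortChannel1 and
# PortChannel2 connect to multi-homed devices, Ethernet8 is an orphan port.
group = IsolationGroup(bound_ports={"PortChannel100"},
                       members={"PortChannel1", "PortChannel2"})

blocked = group.may_forward("PortChannel100", "PortChannel1")  # peer link -> MCLAG PO
orphan = group.may_forward("PortChannel100", "Ethernet8")      # peer link -> orphan port
local = group.may_forward("Ethernet8", "PortChannel1")         # local port -> MCLAG PO
```

Only the first of the three cases is blocked, which is what prevents a frame that already crossed the peer link from being delivered to a multi-homed device a second time.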

### 1.1.4 Warm Boot Requirements
- MCLAG peer nodes should reconcile the local FDB table upon completion of warm boot, because MAC learn and age updates from the peer are lost while the ICCP control session is down during warm boot.

### 1.1.5 Unique IP for supporting L3 protocol over MCLAG VLAN interface Requirements
Contributor

What is this unique IP requirement? Is there an example?

Contributor

Let's assume the MCLAG interface is a member of Vlan 5. In this case, the IP and MAC on the VLAN interface are the same on both MCLAG nodes.

MCLAG Node 1 (Active) : Vlan5 : IP_1 -- MAC_A
MCLAG Node 2 (Standby) : Vlan5 : IP_1 -- MAC_A
MCLAG Client : Vlan5 : IP_3 -- MAC_C

With this, we cannot associate L3 protocols (BGP, BFD, etc.) with IP_1.

The unique IP requirement allows L3 protocols to run on Vlan5.

Contributor

3.6.1.1 Local MAC upstream using STATE_DB FDB_TABLE
Please refer to PR 1259, 'MCLAG sync FDB MAC from STATE_DB' (#1259): https://github.com/Azure/sonic-swss/pull/1259

Contributor Author

@jianjundong Thanks for pointing to PR 1259. I will delete the relevant sections from the Enhancements HLD.
Can you please share when we can expect the change to be merged?

Contributor

@jianjundong jianjundong Jun 1, 2020

@Praveen-Brcm
In PR 1259, I commented that it must work with PR 'Layer 2 Forwarding Enhancements #885'. In the current implementation, static MACs configured via the CLI and MAC moves are not supported; they are added in PR 885. For dynamic MACs learned by the ASIC, the changes in PR 1259 have been tested OK. In theory, mclagsyncd, as the consumer of STATE_DB FDB_TABLE, does not care about the MAC type.

Contributor Author

@jianjundong Thanks for sharing the information. We understand the dependency. We will also review PR #1259 and provide feedback, if any, in the next couple of days.


Upgrade/downgrade from/to the older version is not supported with new enhanced version.

# 9 Unit Test
Contributor

Are these swss VS tests? Can you clarify?

Contributor Author

Guohan: Yes, the cases related to configuration validation and Redis DB validation are based on VS tests. Traffic-related tests are based on spytest.

@lguohan
Contributor

lguohan commented May 5, 2020

I do not see the yang model in the design doc, why not mention it?


https://docs.google.com/document/d/1exNQ1po7TYmVtctq59aSUeBsZYKctbHMkwStNVNgSrU/edit

# 6 Warm Boot Support
Collaborator

When a peer is going through warm boot, how do we ensure that ICCP keepalives don't time out?

Contributor Author

ICCPd sends a notification to the peer MCLAG node that it is going through warm boot. The peer MCLAG node starts a 90-second timer and does not time out the session while the timer runs. If the session is not up when the 90 seconds expire, session down is triggered.
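
The grace-timer behaviour described in this reply can be sketched as a small state machine. This is an illustration under stated assumptions, not the ICCPd implementation: the class `PeerSession`, its method names, and the returned strings are hypothetical, and only the 90-second value comes from the reply above. Time is passed in explicitly so the logic can be exercised without waiting.

```python
from dataclasses import dataclass
from typing import Optional

WARM_BOOT_GRACE_SECONDS = 90  # value quoted in the reply above


@dataclass
class PeerSession:
    """Toy ICCP peer-session state; not the ICCPd data structure."""
    up: bool = True
    warm_boot_deadline: Optional[float] = None

    def on_peer_warm_boot_notice(self, now: float) -> None:
        # The peer announced a warm boot: arm the grace timer.
        self.warm_boot_deadline = now + WARM_BOOT_GRACE_SECONDS

    def on_keepalive_timeout(self, now: float) -> str:
        # Keepalive loss is tolerated while the grace timer is running.
        self.up = False
        if self.warm_boot_deadline is not None and now < self.warm_boot_deadline:
            return "hold"
        return "session_down"

    def on_session_up(self) -> None:
        # The peer came back: clear the grace timer.
        self.up = True
        self.warm_boot_deadline = None

    def on_tick(self, now: float) -> str:
        # Grace timer expired and the session never came back: declare it down.
        if (self.warm_boot_deadline is not None
                and now >= self.warm_boot_deadline and not self.up):
            self.warm_boot_deadline = None
            return "session_down"
        return "no_change"


session = PeerSession()
session.on_peer_warm_boot_notice(now=0.0)
during = session.on_keepalive_timeout(now=30.0)  # inside the 90 s window
expired = session.on_tick(now=90.0)              # window over, session still down
```

Here `during` is `"hold"` and `expired` is `"session_down"`; had `on_session_up()` been called before the deadline, the tick would have reported `"no_change"`.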

The following SAI definitions will be used and no enhancements necessary

- https://github.com/opencomputeproject/SAI/blob/master/inc/saiisolationgroup.h

Collaborator

Please also add details about the SAI_FDB_ENTRY_TYPE_STATIC_MACMOVE. As of today, this enum does not exist in SAI.

https://github.com/opencomputeproject/SAI/blob/master/inc/saifdb.h#L47

Contributor Author

Thanks for the comment. The SAI changes will be added as part of a common SAI PR. I will add the details here once they are available.

Contributor Author

Please find the associated PR for SAI change: opencomputeproject/SAI#1024

Updated HLD to address community HLD review comments.
@jianjundong
Contributor

In chapter '2.1.6 Aging disable for ICCP learned MAC addresses', the new implementation programs remote MAC addresses with aging disabled.

  1. Maybe not all vendors support the STATIC_MACMOVE MAC attribute.
  2. When are these remote MAC addresses deleted? One option is on receipt of a MAC age event from the peer. If MAC-A is learned by PE1, PE2 installs MAC-A with aging disabled. If MAC-A then ages out on PE1 (for example because the CE now sends traffic to PE2) and PE2 deletes this MAC address, the result is transient flooding for this MAC.
  3. Can a STATIC MAC replace a STATIC_MACMOVE MAC? In APP_DB, a STATIC MAC is in FDB_TABLE and a STATIC_MACMOVE MAC is in MCLAG_FDB_TABLE, which means the entry would have to switch DB tables.
  4. If MAC-A is learned by both PE1 and PE2 before the peer connection is established, which PE sets the STATIC_MACMOVE attribute once the peer connection is established?
  5. In APP_DB, STATIC MACs and STATIC_MACMOVE MACs are stored in different tables. How is this handled in STATE_DB?
    In chapter ‘3.6.1.2 Removal of FDB table in MclagSyncd’, MclagSyncd is the consumer of FDB_TABLE in STATE_DB. FdbOrch is the consumer of several tables, such as MCLAG_FDB_TABLE in APP_DB, FDB_TABLE in APP_DB, and FDB notifications from ASIC_DB. FdbOrch stores all these MACs in STATE_DB, so the MAC add event is notified to MclagSyncd even though the MAC was added by MclagSyncd. If MclagSyncd does not keep an FDB table, this unwanted event is sent back to ICCPd. The same applies to the delete event.

@jianjundong
Contributor

If the peer connection is broken, how is the STATIC_MACMOVE MAC handled? If it is deleted immediately, the result may be flooding for this MAC. If it is kept, when is it aged out and deleted? One option is to start a timer for this MAC and delete the STATIC_MACMOVE MAC when the timer expires, but that is no different from the aging method.

@jianjundong
Contributor

jianjundong commented May 26, 2020

PR 621 (#621), chapter '2.1.11 MCLAG system MAC support': By using the MCLAG system MAC in LACP, there are no changes to the MCLAG system MAC for MCLAG PortChannels during session down and MCLAG member delete on the Active node, which avoids flapping of PortChannels.
If the peer connection is broken and the nodes enter a split-brain scenario (both peers are healthy but there is no longer any real-time synchronization between them), the ARP entries learned from the peer may age out. For example, suppose PE1 learned ARP-A from PE2. PE1 sends an ARP request to the CE before ARP-A ages out on PE1, and the CE may send the ARP reply to PE2. Since the peer connection is broken, ARP-A ages out on PE1 and traffic from PE1 to the CE is blocked.

@jianjundong
Contributor

PR 621 (#621), chapter '2.1.11 MCLAG system MAC support': When the MCLAG ICCP session is detected as being down by the Standby node, it brings down all the links in its MCLAG port channels. This is to avoid loops and duplicates.
If the Active node is down and the Standby node brings down all the links in its MCLAG port channels, all paths to the CE are blocked.

@Praveen-Brcm
Contributor Author

As discussed in the community meeting, please specify how a given platform can specify whether it uses isolation group or the existing ACL based separation.
[Praveen] The HLD is updated with the suggested details.

@Praveen-Brcm
Contributor Author

I do not see the yang model in the design doc, why not mention it?

Thanks Guohan: The YANG file link will be updated in the HLD when the code PR is submitted.

@Praveen-Brcm
Contributor Author

In chapter '2.1.6 Aging disable for ICCP learned MAC addresses', the new implementation programs remote MAC addresses with aging disabled.
@jianjundong please find the responses inline.

  1. Maybe not all vendors support the STATIC_MACMOVE MAC attribute.
    [Praveen] FdbOrch checks the platform capability to decide whether to use STATIC_MACMOVE.
  2. When are these remote MAC addresses deleted? One option is on receipt of a MAC age event from the peer. If MAC-A is learned by PE1, PE2 installs MAC-A with aging disabled. If MAC-A then ages out on PE1 (for example because the CE now sends traffic to PE2) and PE2 deletes this MAC address, the result is transient flooding for this MAC.
    [Praveen] Hashing on the CE node does not change while traffic is being sent, so a particular stream is expected to reach the same PE node. If the hashing changes because PO member ports are added or deleted, the case described here can occur, but that is not a typical event compared with a MAC being aged out every MAC age interval.
  3. Can a STATIC MAC replace a STATIC_MACMOVE MAC? In APP_DB, a STATIC MAC is in FDB_TABLE and a STATIC_MACMOVE MAC is in MCLAG_FDB_TABLE, which means the entry would have to switch DB tables.
    [Praveen] Yes, local static MAC configuration overrides the remote MAC. In this case ICCPd, after processing the local static MAC, deletes the remote MAC from MCLAG_FDB_TABLE.
  4. If MAC-A is learned by both PE1 and PE2 before the peer connection is established, which PE sets the STATIC_MACMOVE attribute once the peer connection is established?
    [Praveen] The local MAC address always takes precedence. Since both MCLAG nodes learned the MAC locally, neither node uses STATIC_MACMOVE initially. Once the traffic hashing from the CE has stabilized, one of the MCLAG nodes ages out the local MAC from HW; on that node, ICCPd re-programs the MAC as remote with the STATIC_MACMOVE flag set.
  5. In APP_DB, STATIC MACs and STATIC_MACMOVE MACs are stored in different tables. How is this handled in STATE_DB?
    [Praveen] As part of the L2 enhancements, STATE_DB is updated with all local MACs, both configured and dynamically learned. A given MAC is present in only one of STATE_DB FDB_TABLE (local MAC) or MCLAG_FDB_TABLE (remotely learned MAC).
    In chapter ‘3.6.1.2 Removal of FDB table in MclagSyncd’, MclagSyncd is the consumer of FDB_TABLE in STATE_DB. FdbOrch is the consumer of several tables, such as MCLAG_FDB_TABLE in APP_DB, FDB_TABLE in APP_DB, and FDB notifications from ASIC_DB. FdbOrch stores all these MACs in STATE_DB, so the MAC add event is notified to MclagSyncd even though the MAC was added by MclagSyncd. If MclagSyncd does not keep an FDB table, this unwanted event is sent back to ICCPd. The same applies to the delete event.
    [Praveen] A MAC added by ICCPd via MclagSyncd to MCLAG_FDB_TABLE will not be present in STATE_DB FDB_TABLE, because STATE_DB FDB_TABLE can only contain locally learned MAC addresses (configured or dynamically learned). Hence, for MAC addresses added by MclagSyncd to MCLAG_FDB_TABLE, FdbOrch should not add the MAC to STATE_DB, and MclagSyncd will not get add/delete events for such MAC addresses.

@Praveen-Brcm
Contributor Author

@jianjundong please find the response inline.
If the peer connection is broken, how is the STATIC_MACMOVE MAC handled? If it is deleted immediately, the result may be flooding for this MAC. If it is kept, when is it aged out and deleted? One option is to start a timer for this MAC and delete the STATIC_MACMOVE MAC when the timer expires, but that is no different from the aging method.
[Praveen] When the peer connection goes down, MAC addresses pointing to the peer_link are deleted immediately. MAC addresses pointing to an MCLAG PO are converted to local so that aging is enabled. (This is similar to your suggestion: after conversion to local, with aging enabled, the MAC is deleted if no traffic is received within the MAC age interval.)
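
The session-down handling described in this reply can be illustrated with a short sketch. It assumes a simplified in-memory FDB; the `MacEntry` fields, the function name `on_peer_session_down`, and the port names are hypothetical and are not taken from ICCPd or FdbOrch.

```python
from dataclasses import dataclass


@dataclass
class MacEntry:
    mac: str
    port: str
    remote: bool  # True when learned from the peer over ICCP
    aging: bool   # False while installed with aging disabled


def on_peer_session_down(fdb, peer_link):
    """Return the FDB contents after the session-down handling described above."""
    kept = []
    for entry in fdb:
        if not entry.remote:
            kept.append(entry)  # local entries are left untouched
        elif entry.port == peer_link:
            continue            # remote MAC via the peer link: delete immediately
        else:
            # Remote MAC on an MCLAG PortChannel: convert to local so it can age out.
            kept.append(MacEntry(entry.mac, entry.port, remote=False, aging=True))
    return kept


# Hypothetical table: PortChannel100 is the peer link, PortChannel1 an MCLAG PO.
fdb = [
    MacEntry("00:00:00:00:00:01", "PortChannel100", remote=True, aging=False),
    MacEntry("00:00:00:00:00:02", "PortChannel1", remote=True, aging=False),
    MacEntry("00:00:00:00:00:03", "Ethernet8", remote=False, aging=True),
]
after = on_peer_session_down(fdb, peer_link="PortChannel100")
```

After the call, the first entry is gone, the second is local with aging enabled, and the third is unchanged, so no entry is left that can neither age nor be refreshed by the peer.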

@Praveen-Brcm
Contributor Author

@jianjundong: Please ignore this closed PR; it was updated by mistake. It is a work in progress and not in the scope of the current PR #596.

PR 621 (#621), chapter '2.1.11 MCLAG system MAC support': By using the MCLAG system MAC in LACP, there are no changes to the MCLAG system MAC for MCLAG PortChannels during session down and MCLAG member delete on the Active node, which avoids flapping of PortChannels.
If the peer connection is broken and the nodes enter a split-brain scenario (both peers are healthy but there is no longer any real-time synchronization between them), the ARP entries learned from the peer may age out. For example, suppose PE1 learned ARP-A from PE2. PE1 sends an ARP request to the CE before ARP-A ages out on PE1, and the CE may send the ARP reply to PE2. Since the peer connection is broken, ARP-A ages out on PE1 and traffic from PE1 to the CE is blocked.

@Praveen-Brcm
Contributor Author

@jianjundong: Please ignore this closed PR #621; it was updated by mistake. It is a work in progress and not in the scope of the current PR #596.

PR 621 (#621), chapter '2.1.11 MCLAG system MAC support': When the MCLAG ICCP session is detected as being down by the Standby node, it brings down all the links in its MCLAG port channels. This is to avoid loops and duplicates.
If the Active node is down and the Standby node brings down all the links in its MCLAG port channels, all paths to the CE are blocked.

Review comments incorporated
@jianjundong
Contributor

jianjundong commented Jun 1, 2020

@Praveen-Brcm
Can a STATIC MAC replace a STATIC_MACMOVE MAC? In APP_DB, a STATIC MAC is in FDB_TABLE and a STATIC_MACMOVE MAC is in MCLAG_FDB_TABLE, which means the entry would have to switch DB tables.
[Praveen] Yes, local static MAC configuration overrides the remote MAC. In this case ICCPd, after processing the local static MAC, deletes the remote MAC from MCLAG_FDB_TABLE.
[Jianjun] If a static MAC is configured, it overrides the STATIC_MACMOVE MAC in ASIC_DB. If ICCPd then deletes the remote MAC from MCLAG_FDB_TABLE, the static MAC in ASIC_DB may be deleted as well, because the MAC type is currently not part of the key of struct FdbEntry.

@Praveen-Brcm
Contributor Author

@jianjundong Please find the response inline at the bottom.

@Praveen-Brcm
Can a STATIC MAC replace a STATIC_MACMOVE MAC? In APP_DB, a STATIC MAC is in FDB_TABLE and a STATIC_MACMOVE MAC is in MCLAG_FDB_TABLE, which means the entry would have to switch DB tables.
[Praveen] Yes, local static MAC configuration overrides the remote MAC. In this case ICCPd, after processing the local static MAC, deletes the remote MAC from MCLAG_FDB_TABLE.
[Jianjun] If a static MAC is configured, it overrides the STATIC_MACMOVE MAC in ASIC_DB. If ICCPd then deletes the remote MAC from MCLAG_FDB_TABLE, the static MAC in ASIC_DB may be deleted as well, because the MAC type is currently not part of the key of struct FdbEntry.
[Praveen] As part of the enhancements, FdbOrch now tracks the origin (source) of each MAC learn. When ICCPd deletes the remote MAC address from MCLAG_FDB_TABLE, FdbOrch ignores the delete request because the MAC address is now locally provisioned, and no delete is issued to ASIC_DB FDB_TABLE.
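
The origin-tracking idea in this reply can be sketched as follows. This is an assumption-laden illustration, not FdbOrch code: the class `FdbStore`, the `"local"` and `"remote"` origin labels, and the VLAN, MAC, and port values are all hypothetical. It shows how remembering who installed an entry lets a stale remote delete be ignored once a local static MAC has taken over the same (VLAN, MAC) key.

```python
class FdbStore:
    """Toy FDB keyed by (vlan, mac) that remembers where each entry came from."""

    def __init__(self):
        self._entries = {}  # (vlan, mac) -> {"port": str, "origin": str}

    def add(self, vlan, mac, port, origin):
        key = (vlan, mac)
        current = self._entries.get(key)
        # A locally provisioned entry is not overwritten by a remote one.
        if current and current["origin"] == "local" and origin == "remote":
            return False
        self._entries[key] = {"port": port, "origin": origin}
        return True

    def delete(self, vlan, mac, origin):
        key = (vlan, mac)
        current = self._entries.get(key)
        # Ignore deletes from an origin that does not own the installed entry.
        if current is None or current["origin"] != origin:
            return False
        del self._entries[key]
        return True

    def get(self, vlan, mac):
        return self._entries.get((vlan, mac))


store = FdbStore()
mac = "00:11:22:aa:bb:cc"
store.add(5, mac, "PortChannel1", origin="remote")  # learned from the peer
store.add(5, mac, "Ethernet8", origin="local")      # static MAC configured locally
ignored = store.delete(5, mac, origin="remote")     # ICCPd withdraws the remote MAC
survivor = store.get(5, mac)
```

The remote delete returns `False` and the local entry on `Ethernet8` survives, which is the outcome the reply describes for the case Jianjun raised.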

In the current ICCPd implementation, MAC entries are stored in linked lists, which makes lookups costly in MAC scale scenarios. The existing linked list structures will be changed to binary trees for better add, delete and lookup performance. The MAC address is stored as a 32-byte string in the messaging and in the local cache; it will be converted to 6 bytes to reduce the message size when syncing MAC addresses between MCLAG peer nodes.
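
The size reduction mentioned in this passage comes from packing the textual MAC address into its 6 raw bytes. A minimal sketch of that conversion is shown here in Python for illustration; ICCPd itself is written in C, and the helper names `mac_to_bytes` and `bytes_to_mac` are hypothetical.

```python
def mac_to_bytes(mac: str) -> bytes:
    """Pack a textual MAC such as '00:11:22:aa:bb:cc' into 6 raw bytes."""
    raw = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(raw) != 6:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    return raw


def bytes_to_mac(raw: bytes) -> str:
    """Inverse of mac_to_bytes, using lower-case colon-separated hex."""
    if len(raw) != 6:
        raise ValueError("expected exactly 6 bytes")
    return ":".join(f"{b:02x}" for b in raw)


packed = mac_to_bytes("00:11:22:AA:BB:CC")
restored = bytes_to_mac(packed)
```

Here `packed` is 6 bytes long and `restored` is `"00:11:22:aa:bb:cc"`. A fixed-width byte key of this kind is also cheap to compare, which suits the tree-based lookup the passage proposes.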

#### 3.3.3.3 MAC sync optimizations
When an MCLAG node learns a new MAC address via ICCP from the peer MCLAG node (a MAC learned on a remote orphan port), the MAC address is added locally and an age notification is sent back to the peer MCLAG node. These age notifications do not need to be sent; additional checks are added to stop sending such unwanted MAC updates.


Why is the age notification sent back to the MCLAG node? The main MCLAG HLD doesn't mention this: https://github.com/Azure/SONiC/blob/master/doc/mclag/Sonic-mclag-hld.md#724-mac-sync-up-between-mc-lag-peers If something is not covered in the original HLD, could you please add a more detailed explanation here?

Contributor Author

@vsenchyshyn: Thanks for the comments. This is an implementation-specific detail that was not captured in the original HLD, and the original intention behind it was not clear. The enhancements HLD captures it because the behavior has been modified; we have verified that the extra notification is no longer seen. Hope that clarifies. Thanks

- In the current implementation, MAC addresses learned in the control plane from the MCLAG peer are installed as dynamic entries. Therefore, if no local traffic from these MAC addresses is seen for the ageing period, they age out, causing a notification back to ICCPd. However, if the MAC was learned from the peer, ICCPd immediately re-programs it. The result is transient flooding for the MAC and some unnecessary control plane overhead.
- When the local MCLAG node learns MAC addresses from the peer MCLAG node, the MAC type is set to dynamic. For remote MAC addresses learned via ICCP, if no traffic is received on the local MCLAG interface, the MAC addresses age out from HW. ICCPd processes the age notifications and, because the MAC address is peer learned, re-installs it in HW. Transient traffic flooding can occur during remote MAC re-installation.
- The process of remote MAC aging and re-installation is repetitive, causing unnecessary messaging between modules and unnecessary processing.
- To suppress the unwanted MAC age events, the new implementation programs remote MAC addresses with aging disabled. For MAC addresses learned from ICCPd, FdbOrch sets the new SAI attribute SAI_FDB_ENTRY_TYPE_STATIC_MACMOVE while programming, which causes SAI not to age out the MAC but allows it to move.

Is this SAI_FDB_ENTRY_ATTR_ALLOW_MAC_MOVE, or is a new entry expected to be added to sai_fdb_entry_type_t?


From the comments below, it looks like SAI_FDB_ENTRY_TYPE_STATIC_MACMOVE should be renamed to SAI_FDB_ENTRY_ATTR_ALLOW_MAC_MOVE here as well as in other places.

Contributor Author

Thanks @akokhan, that's correct. The right field is SAI_FDB_ENTRY_ATTR_ALLOW_MAC_MOVE; I have updated the HLD.

6. ICCPd sets the new static MAC flag in the FDB TLV when advertising local static MAC addresses.
7. MclagSyncd updates ICCP session and MCLAG remote interface state information to STATE_DB FDB_TABLE and MCLAG_REMOTE_INTF_TABLE.
8. MclagSyncd updates MAC addresses learned from the peer MCLAG node to the new MCLAG FDB table.
9. FdbOrch registers for new MCLAG FDB table updates to process MAC updates from the peer MCLAG node; ISOGRP Orch processes new updates from MclagSyncd.

As per the flow numbering in the diagram, it looks like flow 9 covers only the ISOGRP update through STATE DB. A separate flow should start from MCLAG FDB (APP DB) to FdbOrch.

Contributor Author

Thanks @akokhan: The diagram is the overall high-level design; the detailed flows are covered in section 4. ISOGRP Orch specific processing is covered further in flow diagram 4.2.


```
;New MCLAG UniqueIP Table
key = MCLAG_UNIQUEIP_TABLE|ifname ; Only VLAN interface supported currently
```

Contributor Author

Thanks @akokhan, corrected the typo.

@rck-innovium
Collaborator

+1

Addressed latest review comments
@rlhui rlhui merged commit 857c05b into sonic-net:master Feb 24, 2021
rlhui pushed a commit to sonic-net/sonic-mgmt-framework that referenced this pull request May 13, 2021
gechiang pushed a commit to sonic-net/sonic-swss that referenced this pull request Jun 1, 2021
* mclagsyncd enhancements as per HLD at sonic-net/SONiC#596

* updated mclag port isolate platform check function

* MCLAG Unique IP Changes.

* updated mclagsyncd

* resolved compilation issues with master branch

* updated mclagsyncd merge issue

* addressed review comments

* fixed build issues with armhf platform

* fixed build issue

* addressed review comments

* removed unused code

Co-authored-by: Tapash Das <tapash.das@broadcom.com>
gitsabari added a commit to gitsabari/sonic-utilities that referenced this pull request Jun 15, 2021
gechiang pushed a commit to sonic-net/sonic-utilities that referenced this pull request Jul 16, 2021
* mclagsyncd enhancements as per HLD at sonic-net/SONiC#596

* addressed LGTM alert

* UT Fix unique IP configuration

* modified ip address validate function for mclag config verication

* Add soft-reboot reboot type (#1453)

What I did
Add a new reboot named as soft-reboot which can be performed by "kexec -e"

How I did it
Replace the platform reboot with "kexec -e" for the cold reboot case.

How to verify it
Verified the reboot on DUT and check the reboot-cause

* [warm-reboot] Check if warm restart flag is set when issuing a warm-reboot (#1460)

Check if any warm restart flag is set when issuing a warm-reboot. This check avoids starting a warm reboot while another warm restart is in progress. In the scenario where a warm reboot is issued with another warm restart in progress, the warm restart flag may be reset and part of the components have a risk of doing cold reboot.

* Added mclag config commands

* removed unwanted imports

* added mclag tests

* fixed build issue

* corrected mclag test

* corrected mclag test case

* updated testcase for mclag

* updated mclag config

* updated mclag test cases

* updated mclag test case

* fixed alert

* modified mclag test cases

* updated mclag config to use swsscommon instead of swssdk

* updated mclag config to use swsscommon

* updated mclag config script file

* fixed mclag test cases to verify config db

* updated mclag test case with config db verify function

* fixed build issue

* updated test case

* updated mclag test case

* addressed review comments

Co-authored-by: Tapash Das <tapash.das@broadcom.com>
Co-authored-by: Tapash Das <48195098+tapashdas@users.noreply.github.com>
Co-authored-by: Sujin Kang <sujkang@microsoft.com>
Co-authored-by: Shi Su <67605788+shi-su@users.noreply.github.com>
qiluo-msft pushed a commit to sonic-net/sonic-buildimage that referenced this pull request Jul 17, 2021
List of commits (newest first):

sonic-net/sonic-utilities@0efd297 (origin/master, origin/HEAD) mclag enhancements as per HLD at sonic-net/SONiC#596 (#1138)
sonic-net/sonic-utilities@e98bbb6 Reworked IP validation in "config interface ip add/remove" command (#1709)
sonic-net/sonic-utilities@866d1d7 [minigraph][port_config] Consume port_config.json while reloading minigraph (#1705)
sonic-net/sonic-utilities@9ae6f6b [debug dump util] Match Infrastructure (#1666)
sonic-net/sonic-utilities@8fe7e26 Coverage uses top level directory as source (#1711)
sonic-net/sonic-utilities@3f0b690 [MPLS][CLI] added config/show CLI for MPLS interface, MPLS CRM threshold config, updated CLI reference manual
sonic-net/sonic-utilities@e8b6c5c [ci] Fix python coverage color bar (#1692)
sonic-net/sonic-utilities@888701b [Mellanox] Remove mstdump from Mellanoxs collect dump script (#1706)
sonic-net/sonic-utilities@4818360 [sonic-package-manager] support warm/fast reboot for extension packages (#1554)
sonic-net/sonic-utilities@793b847 [show priority-group drop counters] Remove backup with cached PG drop counters after 'config reload' (#1679)
sonic-net/sonic-utilities@24fe1ac [show][config] support for interface alias for muxcable commands (#1699)
sonic-net/sonic-utilities@186d851 Pcieutil to load the platform api first instead of using common api (#1672)
sonic-net/sonic-utilities@7a82c06 [Mellanox] Update mellanox dump generation to include SDK dumps (#1640)
sonic-net/sonic-utilities@38f8c06 [sfputil] Expose error status fetched from STATE_DB or platform API to CLI (#1658)
sonic-net/sonic-utilities@c5d00ae [pfcwd] Fix the return code in invalid case (#1691)
sonic-net/sonic-utilities@57dc403 [ci]: Fix config prompt question issue (#1693)
sonic-net/sonic-utilities@5708497 [show] fix show version (#1686)
sonic-net/sonic-utilities@9041ba0 [config] Adding sanity checks for config reload (#1664)
sonic-net/sonic-utilities@2cdadb5 [config]: Create portchannel with LACP key (#1473)
sonic-net/sonic-utilities@6f74ba5 [vnet_route_check] Fix logic for getting VNET routes from ASIC DB (#1653)
sonic-net/sonic-utilities@54fee0f Add range check on portchannel min-links (#1630)
carl-nokia pushed a commit to carl-nokia/sonic-buildimage that referenced this pull request Aug 7, 2021
raphaelt-nvidia pushed a commit to raphaelt-nvidia/sonic-utilities that referenced this pull request Aug 10, 2021
* mclagsyncd enhancements as per HLD at sonic-net/SONiC#596

* addressed LGTM alert

* UT Fix unique IP configuration

* modified ip address validate function for mclag config verification

* Add soft-reboot reboot type (sonic-net#1453)

What I did
Add a new reboot named as soft-reboot which can be performed by "kexec -e"

How I did it
Replace the platform reboot with "kexec -e" for the cold reboot case.

How to verify it
Verified the reboot on DUT and check the reboot-cause

* [warm-reboot] Check if warm restart flag is set when issuing a warm-reboot (sonic-net#1460)

Check if any warm restart flag is set when issuing a warm-reboot. This check avoids starting a warm reboot while another warm restart is in progress. In the scenario where a warm reboot is issued with another warm restart in progress, the warm restart flag may be reset and part of the components have a risk of doing cold reboot.

* Added mclag config commands

* removed unwanted imports

* added mclag tests

* fixed build issue

* corrected mclag test

* corrected mclag test

* corrected mclag test case

* updated testcase for mclag

* updated mclag config

* updated mclag test cases

* updated mclag test case

* updated mclag test cases

* fixed alert

* updated mclag test cases

* updated mclag test cases

* updated mclag config

* modified mclag test cases

* updated mclag test case

* updated mclag test case

* updated mclag test cases

* updated mclag test cases

* updated mclag test cases

* updated mclag test case

* updated mclag test case

* updated mclag test cases

* updated mclag test cases

* updated mclag test cases

* updated mclag test cases

* updated mclag test case

* updated mclag test cases

* updated mclag test cases

* updated mclag test cases

* updated mclag test cases

* updated mclag test cases

* updated mclag test cases

* updated mclag test cases

* updated mclag test cases

* updated mclag test case

* updated mclag test cases

* updated mclag test case

* updated mclag config to use swsscommon instead of swssdk

* updated mclag config to use swsscommon

* updated mclag config script file

* fixed mclag test cases to verify config db

* updated mclag test case with config db verify function

* fixed build issue

* updated test case

* updated mclag test case

* addressed review comments

Co-authored-by: Tapash Das <tapash.das@broadcom.com>
Co-authored-by: Tapash Das <48195098+tapashdas@users.noreply.github.com>
Co-authored-by: Sujin Kang <sujkang@microsoft.com>
Co-authored-by: Shi Su <67605788+shi-su@users.noreply.github.com>
judyjoseph pushed a commit to sonic-net/sonic-utilities that referenced this pull request Aug 20, 2021
judyjoseph added a commit to sonic-net/sonic-buildimage that referenced this pull request Aug 20, 2021
sonic-swss

e892dda Fix warmboot issue PR##8367 (#1866)
9c6023d Mclag enhacements support code changes. (#1331)

sonic-utilities

5465ea0 [MPLS][CLI] added config/show CLI for MPLS interface, MPLS CRM threshold config, updated CLI reference manual
3bac779  mclag enhancements as per HLD at sonic-net/SONiC#596 (#1138)
raphaelt-nvidia pushed a commit to raphaelt-nvidia/sonic-swss that referenced this pull request Oct 5, 2021
* mclagsyncd enhancements as per HLD at sonic-net/SONiC#596

* mclagsyncd enhancements as per HLD at sonic-net/SONiC#596

* mclagsyncd enhancements as per HLD at sonic-net/SONiC#596

* mclagsyncd enhancements as per HLD at sonic-net/SONiC#596

* updated mclag port isolate platform check function

* MCLAG Unique IP Changes.

* updated mclagsyncd

* resolved compilation issues with master branch

* updated mclagsyncd merge issue

* addressed review comments

* addressed review comments

* fixed build issues with armhf platform

* fixed build issues with armhf platform

* fixed build issue

* fixed build issue

* addressed review comments

* addressed review comments

* removed unused code

Co-authored-by: Tapash Das <tapash.das@broadcom.com>
qiluo-msft pushed a commit to sonic-net/sonic-buildimage that referenced this pull request Jan 27, 2022
#### How I did it
Added mclag sonic yang file for the MCLAG enhancements  as per HLD: sonic-net/SONiC#596

#### How to verify it
try rest APIs

#### Description for the changelog
Added mclag sonic yang
malletvapid23 added a commit to malletvapid23/Sonic-Utility that referenced this pull request Aug 3, 2023
Minkang-Tsai commented Dec 22, 2023

3.4.1.3 FdbOrch Changes

When a static MAC address is configured locally and the same MAC address is learned as advertised, FdbOrch discards the MAC move. When the static MAC address is deleted, FdbOrch re-programs the advertised MAC address to HW.

If the ICCP-learned MAC is static, then any dynamic MAC move is discarded by FdbOrch.
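
To make the quoted rules concrete, here is a minimal Python sketch of how I read them. It is only a toy model: `FdbModel`, `FdbEntry`, the `origin` values and the method names are hypothetical and are not taken from the real FdbOrch code in sonic-swss.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class FdbEntry:
    port: str
    origin: str       # "local" or "advertised" (learned from the peer via ICCP); hypothetical labels
    is_static: bool


@dataclass
class FdbModel:
    """Toy model of the rules in section 3.4.1.3, not the real FdbOrch."""
    hw: Dict[str, FdbEntry] = field(default_factory=dict)          # entry programmed per MAC
    advertised: Dict[str, FdbEntry] = field(default_factory=dict)  # last advertised entry per MAC

    def add(self, mac: str, entry: FdbEntry) -> bool:
        """Return True if the entry is programmed, False if the move is discarded."""
        if entry.origin == "advertised":
            # Remember the advertised entry so it can be restored later.
            self.advertised[mac] = entry
        current = self.hw.get(mac)
        if current is not None and current.is_static:
            # A local static MAC is not moved by an advertised entry.
            if current.origin == "local" and entry.origin == "advertised":
                return False
            # A static ICCP-learned MAC is not moved by a dynamic entry.
            if current.origin == "advertised" and not entry.is_static:
                return False
        self.hw[mac] = entry
        return True

    def delete_local_static(self, mac: str) -> None:
        """Delete a local static MAC and re-program the advertised one, if any."""
        current = self.hw.get(mac)
        if current is None or current.origin != "local" or not current.is_static:
            return
        del self.hw[mac]
        fallback = self.advertised.get(mac)
        if fallback is not None:
            self.hw[mac] = fallback


fdb = FdbModel()
fdb.add("MAC-A", FdbEntry("Ethernet0", "local", True))
moved = fdb.add("MAC-A", FdbEntry("PortChannel01", "advertised", False))
print(moved, fdb.hw["MAC-A"].port)
fdb.delete_local_static("MAC-A")
print(fdb.hw["MAC-A"].port)
```

In this sketch the advertised entry is kept in a side table while the local static MAC exists, which is one possible way to satisfy the "re-program the advertised MAC address" sentence; the HLD does not say how FdbOrch stores it.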

@Praveen-Brcm
Hi,
Regarding the text from chapter 3.4.1.3 quoted above: does it mean that FdbOrch will not process any FDB events if the MAC origin is MCLAG?

Assume that the MCLAG domain has two members, PortChannel01 and PortChannel02. MAC-A is learned on PortChannel01 of peer1 and on PortChannel02 of peer2.
When the MCLAG session is established, FdbOrch on peer2 gets two events: one from ICCPD (event1: add static MAC-A on PortChannel01) and one from the chip (event2: learn MAC-A on PortChannel02).

There is an event order that leaves MAC-A inconsistent between ICCPD and the chip.
Step 1. FdbOrch gets event1 and calls the SAI API to create an entry. At the same time, sairedis is processing event2.

           MAC              Event
ICCPD      PortChannel01
FdbOrch    PortChannel01    event1 <----- this fails, because the entry already exists in sairedis.
Sairedis   PortChannel02    event2
Chip       PortChannel02

Step 2. FdbOrch gets event2.

           MAC              Event
ICCPD      PortChannel01
FdbOrch    PortChannel01    event2 <----- this fails, because the entry already exists in FdbOrch and the port is different.
Sairedis   PortChannel02
Chip       PortChannel02
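
The two steps above can be replayed with a small Python sketch. It only models the ordering as I described it and is not derived from the actual FdbOrch or sairedis code. The four dictionaries and the function `replay_race` are hypothetical, and the sketch assumes (as in the tables) that FdbOrch keeps PortChannel01 in its own cache even though the SAI create fails.

```python
from typing import Dict, List, Tuple

MAC = "MAC-A"


def replay_race() -> Tuple[Dict[str, Dict[str, str]], List[str]]:
    """Replay event1 (ICCPD add) and event2 (chip learn) in the problematic order."""
    views: Dict[str, Dict[str, str]] = {"iccpd": {}, "fdborch": {}, "sairedis": {}, "chip": {}}
    log: List[str] = []

    # The chip learns MAC-A on PortChannel02. sairedis already holds the entry,
    # but the learn notification to FdbOrch (event2) is still in flight.
    views["chip"][MAC] = "PortChannel02"
    views["sairedis"][MAC] = "PortChannel02"

    # ICCPD advertises MAC-A as static on PortChannel01 (event1).
    views["iccpd"][MAC] = "PortChannel01"

    # Step 1: FdbOrch handles event1 and tries to create the entry through SAI.
    views["fdborch"][MAC] = "PortChannel01"  # assumption: cache is updated anyway
    if MAC in views["sairedis"]:
        log.append("event1: SAI create failed, entry already exists in sairedis")
    else:
        views["sairedis"][MAC] = "PortChannel01"
        views["chip"][MAC] = "PortChannel01"

    # Step 2: FdbOrch handles event2 (learn on PortChannel02).
    cached = views["fdborch"].get(MAC)
    if cached is not None and cached != "PortChannel02":
        log.append("event2: discarded, entry exists in FdbOrch on a different port")
    else:
        views["fdborch"][MAC] = "PortChannel02"

    return views, log


views, log = replay_race()
for layer in ("iccpd", "fdborch", "sairedis", "chip"):
    print(f"{layer:<9}{views[layer][MAC]}")
for line in log:
    print(line)
```

With this ordering, ICCPD and FdbOrch end up pointing at PortChannel01 while sairedis and the chip keep PortChannel02, which is the inconsistency I am asking about.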
