Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mclag enhacements support code changes. #1331

Merged
merged 69 commits into from
Aug 4, 2021

Conversation

Praveen-Brcm
Copy link
Contributor

@Praveen-Brcm Praveen-Brcm commented Jun 24, 2020

How I verified it
Mclag specific VS tests included part of check-in.

Details:
The PR includes the code changes for the MCLAG enhancements support.
This PR must work with submitted PR's in other sonic repositories which are listed below.
> Iccpd: sonic-net/sonic-buildimage#4819
> swss-common: sonic-net/sonic-swss-common#405
> Mclagsyncd: #1349
> Utilities: sonic-net/sonic-utilities#1138
> Mgmt-FW: sonic-net/sonic-mgmt-framework#59
> Mgmt-Common: sonic-net/sonic-mgmt-common#25

@gitsabari gitsabari mentioned this pull request Jul 9, 2020
@chenkelly
Copy link

Hi @Praveen-Brcm
The unique IP enhancements mentioned in this document don't include in the PR.
Do you have any plan for the feature? Is it under development? Thanks very much.

@Praveen-Brcm
Copy link
Contributor Author

@chenkelly

Hi @Praveen-Brcm
The unique IP enhancements mentioned in this document don't include in the PR.
Do you have any plan for the feature? Is it under development? Thanks very much.

Thanks kelly for checking. Yes, we have the plan and the changes will be added soon.

@chenkelly
Copy link

chenkelly commented Jul 20, 2020

Hi @Praveen-Brcm
Thanks. Please help to confirm the issue.
When we check ICCPd modification in sonic-net/sonic-buildimage#4819, can see unique IP message handle and call "iccp_mclagsyncd_mclag_unique_ip_cfg_handler" in https://github.com/Praveen-Brcm/sonic-buildimage/blob/mclag_enhacements/src/iccpd/src/mlacp_link_handler.c#L3467, but message type and function don't be found.
`
else if (msg_hdr->type == MCLAG_SYNCD_MSG_TYPE_CFG_MCLAG_UNIQUE_IP)

    {

        iccp_mclagsyncd_mclag_unique_ip_cfg_handler(sys, &msg_buf[pos]);

    }`

@rathnasabapathyv
Copy link

Can you please resolve conflicts

@gechiang
Copy link
Contributor

@Praveen-Brcm Wonder if we can arrange some time for you to walk through the design/changes together?
Thanks!

@Praveen-Brcm
Copy link
Contributor Author

@Praveen-Brcm Wonder if we can arrange some time for you to walk through the design/changes together?
Thanks!

@gechiang : Sure we can have a call to go over the design(captured in HLD)/code changes. please suggest your convenient time this week .? Thanks.

@gechiang
Copy link
Contributor

@Praveen-Brcm Thanks! my calendar is open from 2 to 5 PM on all workdays except Tuesday. Let me know!
Also, You may want to first resolve the merge conflict and kick off the required tests that failed...

@rlhui
Copy link
Contributor

rlhui commented Nov 16, 2020

retest this please

@Praveen-Brcm
Copy link
Contributor Author

@Praveen-Brcm Thanks! my calendar is open from 2 to 5 PM on all workdays except Tuesday. Let me know!
Also, You may want to first resolve the merge conflict and kick off the required tests that failed...
@gechiang Thank You. will schedule at 3 PM on Wednesday 11/18/2020 .
Sure, I will be working on the conflicts resolution.

@rlhui
Copy link
Contributor

rlhui commented Nov 16, 2020

@Praveen-Brcm - please add vs test cases for mlag. See https://github.com/Azure/sonic-swss/tree/master/tests

@Praveen-Brcm
Copy link
Contributor Author

@Praveen-Brcm - please add vs test cases for mlag. See https://github.com/Azure/sonic-swss/tree/master/tests

@rlhui Thank you for the input.
Test files test_mclag.py and test_mclag_fdb.py are already part of PR.

@gechiang
Copy link
Contributor

@Praveen-Brcm Thanks! my calendar is open from 2 to 5 PM on all workdays except Tuesday. Let me know!
Also, You may want to first resolve the merge conflict and kick off the required tests that failed...
@gechiang Thank You. will schedule at 3 PM on Wednesday 11/18/2020 .
Sure, I will be working on the conflicts resolution.

@Praveen-Brcm 3PM 11/18/2020 works for me. Thanks!

{
observer->update(SUBJECT_TYPE_FDB_FLUSH_CHANGE, &flushUpdate);
}
notify(SUBJECT_TYPE_FDB_CHANGE, &flushUpdate);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SUBJECT_TYPE_FDB_FLUSH_CHANGE?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pettershao-ragilenetworks: Thanks for the comment. Corrected the type.

{
//If the MAC is dynamic_local change the origin accordingly
//MAC is added/updated as dynamic to allow aging.
storeFdbData.origin = FDB_ORIGIN_LEARN;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when modify to FDB_ORIGIN_LEARN, removeFdbEntry() 's condition (fdbData.origin != origin) will always be true, remove remote entry will directly return, this is not expective, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pettershao-ragilenetworks: Thanks for the comment. Once the MAC is converted to Local, Remote FDB will not be expected from ICCPD and function remoteFdbEntry will not get called. MAC is supposed to be aged out via the event SAI_FDB_EVENT_AGED. Hope this clarifies.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but from my testing, removeFdbEntry will be called, and remote fdb entry will remove failed:

Jul 30 07:30:15.847477 sonic INFO swss#orchagent: :- removeFdbEntry: 1633: FdbOrch RemoveFDBEntry: mac=00:10:94:00:00:33 fdb origin is different; found_origin:1 delete_origin:8

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pettershao-ragilenetworks: Since the MAC is updated as local MAC, the origin is set to LOCAL. When the MAC moves from remove to local, ICCPD further updates the MCLAG_FDB_TABLE to delete the MAC entry added earlier. In the case of remove FDB called with different origin than existing origin is always ignored. This is TRUE for any MAC type L2, VxLAN, or MCLAG.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so fdb entry syncd from remote peer will not be remoted immediately when remote peer do fdb flush action, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so, when remote fdb added, and remote fdb deleted(remote peer flush fdb maybe), local peer will not remove the remote fdb entry by event but by age event, right?

remove

so fdb entry syncd from remote peer will not be remoted immediately when remote peer do fdb flush action, right?
@pettershao-ragilenetworks: Yes, If the ICCPD adds the MAC as local ( in certain scenarios like Iccpd session down..etc) the MAC entry is as good as Local HW learned. Remote peer flush will not cause delete , it can only be deleted by Local AGE or FLUSH events.

but the remote fdb is programed as STATIC, AGE or FLUSH will not remove them, right?


m_entries[entry] = storeFdbData;

string key = "Vlan" + to_string(vlan.m_vlan_info.vlan_id) + ":" + entry.mac.to_string();

if (fdbData.origin != FDB_ORIGIN_VXLAN_ADVERTIZED)
status = sai_fdb_api->create_fdb_entry(&fdb_entry, (uint32_t)attrs.size(), attrs.data());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why set again since line 1362 may do the same action.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pettershao-ragilenetworks: Thanks for the comments. this duplicate piece looks got introduced while rebase/merging. Corrected.

@@ -2,12 +2,14 @@
#define SWSS_OBSERVER_H

#include <list>
#include <algorithm>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove this include

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@prsunny :Thanks, Done.


using namespace std;
using namespace swss;

enum SubjectType
{
SUBJECT_TYPE_ALL_CHANGES,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is this used? If not, please remove

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@prsunny :Thanks, the code used to use was part of optimization for the subject calss, which is no more part of the PR. Removed the attribute.

@@ -16,8 +18,14 @@ enum SubjectType
SUBJECT_TYPE_MIRROR_SESSION_CHANGE,
SUBJECT_TYPE_INT_SESSION_CHANGE,
SUBJECT_TYPE_PORT_CHANGE,
SUBJECT_TYPE_BRIDGE_PORT_CHANGE,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Who is sending this update? Can you point me to code?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@prsunny : This is been processed in isolationgrouporch.cpp:233. The update is sent from portsOrch part of addBridgePort and removeBridgePort calls. thanks.


m_entries[entry] = storeFdbData;

string key = "Vlan" + to_string(vlan.m_vlan_info.vlan_id) + ":" + entry.mac.to_string();

if (fdbData.origin != FDB_ORIGIN_VXLAN_ADVERTIZED)
if ((fdbData.origin != FDB_ORIGIN_MCLAG_ADVERTIZED) ||
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think should be &&, right?
if ((fdbData.origin != FDB_ORIGIN_MCLAG_ADVERTIZED) &&
(fdbData.origin != FDB_ORIGIN_VXLAN_ADVERTIZED))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pettershao-ragilenetworks : Thanks for pointing to the error. You're right and is corrected now.

SWSS_LOG_INFO("FdbOrch RemoveFDBEntry: mac=%s fdb origin is different; found_origin:%d delete_origin:%d",
entry.mac.to_string().c_str(), fdbData.origin, origin);
if ((origin == FDB_ORIGIN_MCLAG_ADVERTIZED) && (fdbData.origin == FDB_ORIGIN_LEARN) &&
(port.m_oper_status == SAI_PORT_OPER_STATUS_DOWN) && (gMlagOrch->isMlagInterface(port.m_alias)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one question:
when remote fdb added, and remote fdb deleted(flush for example), local peer will not remove the remote fdb entry(sync early)(peerlink up), but by local age event, right?

show mac(sync, port already modify to peerlink)
  No.    Vlan  MacAddress         Port          Type
-----  ------  -----------------  ------------  ------
     1      10  00:10:94:00:00:33  PortChannel2  dynamic

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pettershao-ragilenetworks: If the MAC gets added by ICCPd as remote ( Sync MAC from peer MCLAG node) the entry will be programmed as STATIC ( while allowing MAC move) There is no AGE event expected, also as you rightly said the FLUSH will not let the HW remove the MAC entry. MAC gets removed only when ICCPd sends delete event for MAC entry.

{
//If the MAC is dynamic_local change the origin accordingly
//MAC is added/updated as dynamic to allow aging.
storeFdbData.origin = FDB_ORIGIN_LEARN;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so fdb entry syncd from remote peer will not be remoted immediately when remote peer do fdb flush action, right?

@@ -300,6 +389,47 @@ void FdbOrch::update(sai_fdb_event_t type,
}
}

// If MAC is MCLAG remote do not delete for age event, Add the MAC back..
if (existing_entry->second.origin == FDB_ORIGIN_MCLAG_ADVERTIZED)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think this can't be reach since addFdbEntry() will modify origin to FDB_ORIGIN_LEARN to store.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pettershao-ragilenetworks : This is to address timing scenarios and platforms which doesnt support SAI_FDB_ENTRY_ATTR_ALLOW_MAC_MOVE. If the MAC is added by ICCPD then the origin is set as FDB_ORIGIN_MCLAG (not to age) If the FdbOrch receives an AGE event ( in timing scenarios or on the platform which doesnt have support for SAI_FDB_ENTRY_ATTR_ALLOW_MAC_MOVE) and MAC entry origin is FDB_ORIGIN_MCLAG, since the MAC is remotely leared it should not get deleted so FdbOrch re-adds the MAC entry.

@Praveen-Brcm
Copy link
Contributor Author

so fdb entry syncd from remote peer will not be remoted immediately when remote peer do fdb flush action, right?
@pettershao-ragilenetworks: Yes, If the ICCPD adds the MAC as local ( in certain scenarios like Iccpd session down..etc) the MAC entry is as good as Local HW learned. Remote peer flush will not cause delete , it can only be deleted by Local AGE or FLUSH events.

@Praveen-Brcm
Copy link
Contributor Author

@gechiang , @prsunny , @akokhan ,@qiluo-msft ,@pettershao-ragilenetworks : Can you please help approve the PR and merge. Thanks .

@gechiang gechiang requested a review from prsunny August 2, 2021 23:49
@adyeung
Copy link

adyeung commented Aug 3, 2021

@prsunny @akokhan @qiluo-msft @pettershao-ragilenetworks, please be reminded to approve the code PR today, this is blocking the other code PRs from sonic-buildimage

@gechiang gechiang merged commit 7aca82d into sonic-net:master Aug 4, 2021
@gechiang
Copy link
Contributor

gechiang commented Aug 4, 2021

@Praveen-Brcm Please submit a submodule PR for SWSS so that this change is included in submodule.

attrs.push_back(attr);

if ((fdbData.origin == FDB_ORIGIN_VXLAN_ADVERTIZED) && (fdbData.type == "dynamic"))
if ((fdbData.origin == FDB_ORIGIN_VXLAN_ADVERTIZED) || (fdbData.origin == FDB_ORIGIN_MCLAG_ADVERTIZED)
|| (fdbData.type == "dynamic"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

THIS modification does not look RIGHT.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lguohan : Thanks for pointing out. Thats right as part of WB, when the MAC addresses are added via the bake function, the attribute SAI_FDB_ENTRY_ATTR_ALLOW_MAC_MOVE is set wrongly.
Will raise a PR and fix this today. Thanks.

judyjoseph pushed a commit that referenced this pull request Aug 20, 2021
* Mclag enhacements support code changes.
* Adding change to allow MCLAG remote MAC move.
* Added support for adding mclag remote mac to kernel, on top of PR-1276
* Updating the change from PR1276 and PR885.
* Adding new orchfiles to mock_tests
* MCLAG Unique IP support changes.
* Removed dependency with PR 885.
* Adding observer support for mlagorch.
* Fixed FDB notifiation issue
* Fixing the test_mclag_fdb type attributes.
* Remove as the change may not be supported on non-brcm for PortChannel settings.
* Removing the isolation group handling from Mlagorch, Isolation group now will be added/updated only via mclagsyncd updates.
* Added back the update function.

Co-authored-by: Tapash Das <tapash.das@broadcom.com>
raphaelt-nvidia pushed a commit to raphaelt-nvidia/sonic-swss that referenced this pull request Oct 5, 2021
* Mclag enhacements support code changes.
* Adding change to allow MCLAG remote MAC move.
* Added support for adding mclag remote mac to kernel, on top of PR-1276
* Updating the change from PR1276 and PR885.
* Adding new orchfiles to mock_tests
* MCLAG Unique IP support changes.
* Removed dependency with PR 885.
* Adding observer support for mlagorch.
* Fixed FDB notifiation issue
* Fixing the test_mclag_fdb type attributes.
* Remove as the change may not be supported on non-brcm for PortChannel settings.
* Removing the isolation group handling from Mlagorch, Isolation group now will be added/updated only via mclagsyncd updates.
* Added back the update function.

Co-authored-by: Tapash Das <tapash.das@broadcom.com>
@chenkelly
Copy link

chenkelly commented Mar 2, 2022

Hi @Praveen-Brcm
The mclagsyncd set "traffic_disable" in APP_LAG_TABLE_NAME
https://github.com/Azure/sonic-swss/blob/master/mclagsyncd/mclaglink.cpp#L1308

but portsorch don't handle this field in doLagTask
https://github.com/Azure/sonic-swss/blob/master/orchagent/portsorch.cpp#L3580
Should we support this traffic distribution mode for LAG?
Is it under development?
Thanks.

@gechiang
Copy link
Contributor

gechiang commented Mar 2, 2022

@chenkelly can you please open a new issue to track this? Please show a real example on how to configure it and produce the doLagTask issue as you described in that new issue creation and I will assign it to BRCM team to investigate/fix.
Thanks!

@chenkelly
Copy link

@gechiang
Please check #2167. Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.