Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ECMP acceleration for physical i/f down events #351

Merged
merged 6 commits into from
Mar 22, 2018

Conversation

nikos-github
Copy link
Contributor

These changes introduce support for ecmp acceleration for physical i/f down events. The design document was reviewed a couple of months back and checked into the repo.

The functionality has been verified through manual testing in a physical topology having 2 T0 switches (asw01, asw02) connected 4-way to 4 T1 switches (csw01, csw02, csw03, csw04). IXIA was connected in the topology to both T0 switches. Prefixes were advertised to asw02 and traffic was towards asw01. The ecmp acceleration was tested by bring down the links in different combinations on the T1 devices and checking the membership through logs on asw01 as well as having IXIA measure the traffic loss duration.

Nikos.-

@msftclas
Copy link

msftclas commented Oct 13, 2017

CLA assistant check
All CLA requirements met.

{
SWSS_LOG_ENTER();

for (auto nhop = m_syncdNextHops.begin(); nhop != m_syncdNextHops.end(); ++nhop) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just checking, for any interface change, this will scan the entire nexthop cache and check for interface match. Does this have any performance issues in case of large nexthop set?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How big do you expect the nexthop set to be? Typical data center won't have more than 100.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. Since it is handled in the same context as link down event, my concern was if it could result in some delays/side effects. The other option would have been acting on the link down notification from kernel. Closing this as the assumption is, size of nexthop set would be comparatively smaller.

FdbOrch *gFdbOrch;
NeighOrch *neigh_orch;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like global variables follow name prefixed with "g". Do you want to follow this since it is now moved to global space?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure I can change that.

}
SWSS_LOG_NOTICE("Set operation status %s to host interface %s",
up ? "UP" : "DOWN", it->second.m_alias.c_str());
if (neigh_orch->ifChangeInformNextHop(it->second.m_alias, up) == false) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just would like to mention, having the nexthop operation in the same context where we handle link up/down events, may create some delays in subsequent link up/down operations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is per design document. Please refer to it.

}

next_hop_id = next_hop_group_entry.nhopgroup_members[nhop];
status = sai_next_hop_group_api->remove_next_hop_group_member(next_hop_id);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, adding a nexthop_group_member to SAI above and then check for IFDOWN to remove the group_member from here seems inefficient. I think you could check for this in the first for-loop ("for (auto it : next_hop_set)" and create the next_hop_ids vector accordingly. What do you think?

Copy link
Contributor Author

@nikos-github nikos-github Oct 31, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The orch agent needs to always have a copy of the nexthop since the owner is a routing protocol. Please refer to the design doc.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The next_hop_id which was created then deleted, but saved here is only used again when the removeNextHopGroup(). I think what @prsunny suggested is valid. orch agent needs to have a copy of nexthop, but not the obsoleted next_hop_id.

@nikos-github did I miss anything?

for (auto nhop = next_hop_group_entry.nhopgroup_members.begin();
nhop != next_hop_group_entry.nhopgroup_members.end(); ++nhop) {

if (m_neighOrch->isNextHopFlagSet(nhop->first, NHFLAGS_IFDOWN)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you check flags here. I think the request is to delete the nexthop group and irrespective of whether member is "down" or "up", it should be deleted. Also, if not deleted here, when will this be deleted?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the flag is set, the member has already been removed. Calling the SAI API will cause a crash with the current sonic design.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

m_syncdNextHops[ipAddress] = next_hop_entry;

m_intfsOrch->increaseRouterIntfsRefCount(alias);

return true;
}

bool
NeighOrch::setNextHopFlag (const IpAddress &ipaddr, const uint32_t nh_flag)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code style doesn't meet SONiC.

Copy link
Contributor Author

@nikos-github nikos-github Oct 31, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SONiC has an established code style? Can you provide a pointer to it please where current sonic code style requirements are laid out/specified?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SONiC has established code style. However it is not described in a document as in many other opensource projects. If you take a look to .*h, *.cpp all they look similar. Code that provided here looks completely different.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

@nikos-github nikos-github Oct 31, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed. The code here is more readable and makes it easier to identify when any tool is used in the code base (even grep), where the function definition is. If you do insist against those improvements (which should be sonic-wide), I guess I will have to change it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no objections to change code style. But it should be done in all files at once and not in couple places of modified files. In other case we will have a mess. I also think that new code style should be discussed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed. You should apply the same diligence to code coming from your team.

@oleksandrivantsiv oleksandrivantsiv dismissed their stale review November 2, 2017 18:55

Next week I'm OOO and won't be able to review.

@lguohan
Copy link
Contributor

lguohan commented Nov 9, 2017

did we have test cases review this feature?

@nikos-github
Copy link
Contributor Author

nikos-github commented Nov 9, 2017

We had agreed to let the feature in and test cases plus review will follow. Why are we coming back a month later asking for something we had agreed on a course of action already and blocking the commit? Why don't you outline what your expectations are in order for the code to go in and whether with testing you are referring to automation testing of this feature to MSFT test framework or whether you are referring to feature testing and test cases that I performed?

{
SWSS_LOG_ENTER();

for (auto nhop = m_syncdNextHops.begin(); nhop != m_syncdNextHops.end(); ++nhop) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. Since it is handled in the same context as link down event, my concern was if it could result in some delays/side effects. The other option would have been acting on the link down notification from kernel. Closing this as the assumption is, size of nexthop set would be comparatively smaller.

for (auto nhop = next_hop_group_entry.nhopgroup_members.begin();
nhop != next_hop_group_entry.nhopgroup_members.end(); ++nhop) {

if (m_neighOrch->isNextHopFlagSet(nhop->first, NHFLAGS_IFDOWN)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

@lguohan
Copy link
Contributor

lguohan commented Nov 30, 2017

I do not see the test cases defined in the design document. https://github.com/Azure/SONiC/blob/gh-pages/doc/sonic-ecmp-acceleration.docx

This is mostly control plane changes, I think integrating into the swss integration test should be ok, but we need to have the test cases designed and reviewed before merge.

Copy link
Contributor

@lguohan lguohan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as comments

@nikos-github
Copy link
Contributor Author

@lguohan If you are referring to UT test cases, they have been defined in the document. Xin hasn't uploaded yet the updated version I have sent her. If you are referring test cases for the framework, those are being implemented and should be ready soon. Once they are, we will send for review.

@lguohan
Copy link
Contributor

lguohan commented Dec 7, 2017

@Nikos-Li, please add swss integration test for this PR. This is a fundamental feature. We need integration test for this one to make sure we won't have regression.

@lguohan
Copy link
Contributor

lguohan commented Mar 15, 2018

retest this please

@lguohan
Copy link
Contributor

lguohan commented Mar 15, 2018

@Nikos-Li , build fails for vs seems to be related to crm pr.

@nikos-github
Copy link
Contributor Author

@lguohan It's not a build issue.

@nikos-github
Copy link
Contributor Author

@lguohan All checks pass.

@lguohan
Copy link
Contributor

lguohan commented Mar 18, 2018

there is potential race condition since ifChangeInformNextHop is called in the context of port notification thread. ifChangeInformNexthop further called validnexthopinNextHopGroup/invalidnexthopinNextHopGroup, to iterate over m_syncdNextHopGroups and modify the next hop group member without lock.

orchagent is essential single thread application, we need to schedule the port notification event to the main loop.

@nikos-github
Copy link
Contributor Author

@lguohan validnexthopinNextHopGroup/invalidnexthopinNextHopGroup doesn't add/delete group members at the orchagent level. It does so only in syncd and there is a mutex on the db. I don't see the racing condition you are referring to.

@lguohan
Copy link
Contributor

lguohan commented Mar 19, 2018

there are now two threads accessing m_syncdNextHopGroups, one is from main thread, now you add another one from port notification. data needs to be protected.

@nikos-github
Copy link
Contributor Author

@lguohan membership of m_syncdNextHopGroups is not modified.

@lguohan
Copy link
Contributor

lguohan commented Mar 19, 2018

does not matter, the main thread could modify, you can still get into trouble.

@lguohan
Copy link
Contributor

lguohan commented Mar 19, 2018

did we ever tested the case when all nexthop are down and a route points to a nexthop group with zero nexthop group member?

Signed-off-by: Guohan Lu <lguohan@gmail.com>
@nikos-github
Copy link
Contributor Author

@lguohan Yes 8 months ago during UT. Shuotian also tested this case and verified.

@lguohan
Copy link
Contributor

lguohan commented Mar 19, 2018

hmm, sai changed from 1.0 to 1.2. Do we have any further test when sai 1.2 is released?

@nikos-github
Copy link
Contributor Author

nikos-github commented Mar 19, 2018

@lguohan Did Broadcom SAI implementation change the behavior of the remove_next_hop_group_member API and all of a sudden it deletes the nexthop groups that have no members? I don't see something like that in the code and if that was the case, then the subsequent call to remove the group on a delete after all the members are deleted, would have resulted in a crash and would be visible even without my code changes. A lot more things would be broken by now.

Signed-off-by: Guohan Lu <lguohan@gmail.com>
@lguohan
Copy link
Contributor

lguohan commented Mar 19, 2018

retest this please

@lguohan lguohan merged commit 8e635fd into sonic-net:master Mar 22, 2018
oleksandrivantsiv pushed a commit to oleksandrivantsiv/sonic-swss that referenced this pull request Mar 1, 2023
lukasstockner pushed a commit to genesiscloud/sonic-swss that referenced this pull request Apr 2, 2023
* Fix issue in cmis.get_transceiver_bulk_status

1. In case it fails to read EEPROM, either self.get_rx_power() or self.get_tx_power() can be a list of 'N/A'. Need to test it before calling self.mw_to_dbm
2. It should be a valid case for either self.get_rx_power() or self.get_tx_power() to return None. Handle other fields instead of returning None in this case

Signed-off-by: Stephen Sun <stephens@nvidia.com>

* Address comments: distinguish scenarios between not supporting and reading failure

Signed-off-by: Stephen Sun <stephens@nvidia.com>

* Adjust unit test case

Signed-off-by: Stephen Sun <stephens@nvidia.com>

* Remove redundant code

Signed-off-by: Stephen Sun <stephens@nvidia.com>

---------

Signed-off-by: Stephen Sun <stephens@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants