-
Notifications
You must be signed in to change notification settings - Fork 529
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ECMP acceleration for physical i/f down events #351
Conversation
orchagent/neighorch.cpp
Outdated
{ | ||
SWSS_LOG_ENTER(); | ||
|
||
for (auto nhop = m_syncdNextHops.begin(); nhop != m_syncdNextHops.end(); ++nhop) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just checking, for any interface change, this will scan the entire nexthop cache and check for interface match. Does this have any performance issues in case of large nexthop set?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How big do you expect the nexthop set to be? Typical data center won't have more than 100.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. Since it is handled in the same context as link down event, my concern was if it could result in some delays/side effects. The other option would have been acting on the link down notification from kernel. Closing this as the assumption is, size of nexthop set would be comparatively smaller.
orchagent/orchdaemon.cpp
Outdated
FdbOrch *gFdbOrch; | ||
NeighOrch *neigh_orch; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like global variables follow name prefixed with "g". Do you want to follow this since it is now moved to global space?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure I can change that.
orchagent/portsorch.cpp
Outdated
} | ||
SWSS_LOG_NOTICE("Set operation status %s to host interface %s", | ||
up ? "UP" : "DOWN", it->second.m_alias.c_str()); | ||
if (neigh_orch->ifChangeInformNextHop(it->second.m_alias, up) == false) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just would like to mention, having the nexthop operation in the same context where we handle link up/down events, may create some delays in subsequent link up/down operations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is per design document. Please refer to it.
} | ||
|
||
next_hop_id = next_hop_group_entry.nhopgroup_members[nhop]; | ||
status = sai_next_hop_group_api->remove_next_hop_group_member(next_hop_id); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO, adding a nexthop_group_member to SAI above and then check for IFDOWN to remove the group_member from here seems inefficient. I think you could check for this in the first for-loop ("for (auto it : next_hop_set)" and create the next_hop_ids vector accordingly. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The orch agent needs to always have a copy of the nexthop since the owner is a routing protocol. Please refer to the design doc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The next_hop_id which was created then deleted, but saved here is only used again when the removeNextHopGroup(). I think what @prsunny suggested is valid. orch agent needs to have a copy of nexthop, but not the obsoleted next_hop_id.
@nikos-github did I miss anything?
orchagent/routeorch.cpp
Outdated
for (auto nhop = next_hop_group_entry.nhopgroup_members.begin(); | ||
nhop != next_hop_group_entry.nhopgroup_members.end(); ++nhop) { | ||
|
||
if (m_neighOrch->isNextHopFlagSet(nhop->first, NHFLAGS_IFDOWN)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you check flags here. I think the request is to delete the nexthop group and irrespective of whether member is "down" or "up", it should be deleted. Also, if not deleted here, when will this be deleted?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the flag is set, the member has already been removed. Calling the SAI API will cause a crash with the current sonic design.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK
orchagent/neighorch.cpp
Outdated
m_syncdNextHops[ipAddress] = next_hop_entry; | ||
|
||
m_intfsOrch->increaseRouterIntfsRefCount(alias); | ||
|
||
return true; | ||
} | ||
|
||
bool | ||
NeighOrch::setNextHopFlag (const IpAddress &ipaddr, const uint32_t nh_flag) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code style doesn't meet SONiC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SONiC has an established code style? Can you provide a pointer to it please where current sonic code style requirements are laid out/specified?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SONiC has established code style. However it is not described in a document as in many other opensource projects. If you take a look to .*h, *.cpp all they look similar. Code that provided here looks completely different.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed. The code here is more readable and makes it easier to identify when any tool is used in the code base (even grep), where the function definition is. If you do insist against those improvements (which should be sonic-wide), I guess I will have to change it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no objections to change code style. But it should be done in all files at once and not in couple places of modified files. In other case we will have a mess. I also think that new code style should be discussed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed. You should apply the same diligence to code coming from your team.
Next week I'm OOO and won't be able to review.
did we have test cases review this feature? |
We had agreed to let the feature in and test cases plus review will follow. Why are we coming back a month later asking for something we had agreed on a course of action already and blocking the commit? Why don't you outline what your expectations are in order for the code to go in and whether with testing you are referring to automation testing of this feature to MSFT test framework or whether you are referring to feature testing and test cases that I performed? |
orchagent/neighorch.cpp
Outdated
{ | ||
SWSS_LOG_ENTER(); | ||
|
||
for (auto nhop = m_syncdNextHops.begin(); nhop != m_syncdNextHops.end(); ++nhop) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. Since it is handled in the same context as link down event, my concern was if it could result in some delays/side effects. The other option would have been acting on the link down notification from kernel. Closing this as the assumption is, size of nexthop set would be comparatively smaller.
orchagent/routeorch.cpp
Outdated
for (auto nhop = next_hop_group_entry.nhopgroup_members.begin(); | ||
nhop != next_hop_group_entry.nhopgroup_members.end(); ++nhop) { | ||
|
||
if (m_neighOrch->isNextHopFlagSet(nhop->first, NHFLAGS_IFDOWN)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK
I do not see the test cases defined in the design document. https://github.com/Azure/SONiC/blob/gh-pages/doc/sonic-ecmp-acceleration.docx This is mostly control plane changes, I think integrating into the swss integration test should be ok, but we need to have the test cases designed and reviewed before merge. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as comments
@lguohan If you are referring to UT test cases, they have been defined in the document. Xin hasn't uploaded yet the updated version I have sent her. If you are referring test cases for the framework, those are being implemented and should be ready soon. Once they are, we will send for review. |
@Nikos-Li, please add swss integration test for this PR. This is a fundamental feature. We need integration test for this one to make sure we won't have regression. |
retest this please |
@Nikos-Li , build fails for vs seems to be related to crm pr. |
@lguohan It's not a build issue. |
@lguohan All checks pass. |
there is potential race condition since ifChangeInformNextHop is called in the context of port notification thread. ifChangeInformNexthop further called validnexthopinNextHopGroup/invalidnexthopinNextHopGroup, to iterate over m_syncdNextHopGroups and modify the next hop group member without lock. orchagent is essential single thread application, we need to schedule the port notification event to the main loop. |
@lguohan validnexthopinNextHopGroup/invalidnexthopinNextHopGroup doesn't add/delete group members at the orchagent level. It does so only in syncd and there is a mutex on the db. I don't see the racing condition you are referring to. |
there are now two threads accessing m_syncdNextHopGroups, one is from main thread, now you add another one from port notification. data needs to be protected. |
@lguohan membership of m_syncdNextHopGroups is not modified. |
does not matter, the main thread could modify, you can still get into trouble. |
did we ever tested the case when all nexthop are down and a route points to a nexthop group with zero nexthop group member? |
Signed-off-by: Guohan Lu <lguohan@gmail.com>
@lguohan Yes 8 months ago during UT. Shuotian also tested this case and verified. |
hmm, sai changed from 1.0 to 1.2. Do we have any further test when sai 1.2 is released? |
@lguohan Did Broadcom SAI implementation change the behavior of the remove_next_hop_group_member API and all of a sudden it deletes the nexthop groups that have no members? I don't see something like that in the code and if that was the case, then the subsequent call to remove the group on a delete after all the members are deleted, would have resulted in a crash and would be visible even without my code changes. A lot more things would be broken by now. |
Signed-off-by: Guohan Lu <lguohan@gmail.com>
retest this please |
* Fix issue in cmis.get_transceiver_bulk_status 1. In case it fails to read EEPROM, either self.get_rx_power() or self.get_tx_power() can be a list of 'N/A'. Need to test it before calling self.mw_to_dbm 2. It should be a valid case for either self.get_rx_power() or self.get_tx_power() to return None. Handle other fields instead of returning None in this case Signed-off-by: Stephen Sun <stephens@nvidia.com> * Address comments: distinguish scenarios between not supporting and reading failure Signed-off-by: Stephen Sun <stephens@nvidia.com> * Adjust unit test case Signed-off-by: Stephen Sun <stephens@nvidia.com> * Remove redundant code Signed-off-by: Stephen Sun <stephens@nvidia.com> --------- Signed-off-by: Stephen Sun <stephens@nvidia.com>
These changes introduce support for ecmp acceleration for physical i/f down events. The design document was reviewed a couple of months back and checked into the repo.
The functionality has been verified through manual testing in a physical topology having 2 T0 switches (asw01, asw02) connected 4-way to 4 T1 switches (csw01, csw02, csw03, csw04). IXIA was connected in the topology to both T0 switches. Prefixes were advertised to asw02 and traffic was towards asw01. The ecmp acceleration was tested by bring down the links in different combinations on the T1 devices and checking the membership through logs on asw01 as well as having IXIA measure the traffic loss duration.
Nikos.-