Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[libteam] Support multiple member ports for LACP fallback #10823

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

lukasstockner
Copy link
Contributor

Why I did it

Fixes #7555 and #8866.

While the LACP fallback HLD correctly describes the behavior for LAGs with multiple ports, the actual implementation only worked for single-port LAGs.

When multiple ports are present, the fallback mode would select all ports, therefore making it useless for the purpose described in the HLD (PXE-booting).

How I did it

This commit adds a patch to implement the desired behavior:
If all non-disabled ports are defaulted, one port goes into fallback mode and gets selected for the LAG. As soon as any port receives a LACPDU, the fallback port goes back to defaulted.
The fallback port is selected by higher port priority or, if both have equal priority (the default), by lower port number (as described in the HLD).

How to verify it

See #7555 - start a DVS container, create a Portchannel with two members and fallback enabled, create a bond on the non-DVS end of the interfaces, bring it up and down and observe the state of the Portchannel.

Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006
  • 202012
  • 202106
  • 202111

Not sure if this counts as a fix or feature, but I'd argue fix since this behavior was described in the HLD all along and the fallback is not particularly useful without it.

Description for the changelog

Support multiple member ports for LACP fallback.

Link to config_db schema for YANG module changes

A picture of a cute animal (not mandatory but encouraged)

IMG-1132

@lukasstockner
Copy link
Contributor Author

@saiarcot895 @vaibhavhd based on recent PRs, you might be the right people to ask for a review here? Any help would be appreciated.

@puffc
Copy link
Contributor

puffc commented Jul 7, 2023

@rodnymolina @liat-grozovik Could you please review this PR?

@saiarcot895 saiarcot895 self-requested a review July 10, 2023 19:09
@puffc
Copy link
Contributor

puffc commented Jul 27, 2023

@saiarcot895 Ping

+ }
+ if (lacp_port->state == PORT_STATE_CURRENT || lacp_port->state == PORT_STATE_EXPIRED) {
+ /* If at least one port is currently not defaulted, don't do any fallback. */
+ have_selected_port = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you rename this to something like have_active_port or similar? have_selected_port sounds like a port was selected in this loop.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@saiarcot895 Acutally "selected" is a terminology used in IEEE 802.3ad. When an aggregator port received LACPDU, the RX state machine will transit to "CURRENT" state and the selection logic update the state machine variable "selected" to "SELECTED", "UNSELECTED" or "STANDBY". See below excerpted from IEEE 802.3ad:
"
A value of SELECTED indicates that the Selection Logic has selected an appropriate
Aggregator. A value of UNSELECTED indicates that no Aggregator is currently selected. A
value of STANDBY indicates that although the Selection Logic has selected an appropriate
Aggregator, aggregation restrictions currently prevent the Aggregation Port from being enabled
as part of the aggregation, and so the Aggregation Port is being held in a standby condition.
This variable can only be set to SELECTED or STANDBY by the operation of the Aggregation
Port’s Selection Logic. It can be set to UNSELECTED by the operation of the Aggregation
Port’s Receive machine, or by the operation of the Selection Logic associated with another
Aggregation Port.
"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not certain that selected is perfectly accurate here, so I've renamed this to do_fallback, which is a better description of the purpose anyways - it's supposed to track whether we want to use fallback logic or not.

@saiarcot895
Copy link
Contributor

Note that you'll need to resolve the merge conflicts as well.

+ }
+ if (lacp_port->state == PORT_STATE_CURRENT || lacp_port->state == PORT_STATE_EXPIRED) {
+ /* If at least one port is currently not defaulted, don't do any fallback. */
+ have_selected_port = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@saiarcot895 Acutally "selected" is a terminology used in IEEE 802.3ad. When an aggregator port received LACPDU, the RX state machine will transit to "CURRENT" state and the selection logic update the state machine variable "selected" to "SELECTED", "UNSELECTED" or "STANDBY". See below excerpted from IEEE 802.3ad:
"
A value of SELECTED indicates that the Selection Logic has selected an appropriate
Aggregator. A value of UNSELECTED indicates that no Aggregator is currently selected. A
value of STANDBY indicates that although the Selection Logic has selected an appropriate
Aggregator, aggregation restrictions currently prevent the Aggregation Port from being enabled
as part of the aggregation, and so the Aggregation Port is being held in a standby condition.
This variable can only be set to SELECTED or STANDBY by the operation of the Aggregation
Port’s Selection Logic. It can be set to UNSELECTED by the operation of the Aggregation
Port’s Receive machine, or by the operation of the Selection Logic associated with another
Aggregation Port.
"

@@ -12,3 +12,4 @@
0012-Increase-min_ports-upper-limit-to-1024.patch
0013-set-port-to-disabled-state-during-removal.patch
0014-dont-move-the-port-state-from-disabled-when-admin-state-is-down.patch
0015-Extend-LACP-fallback-support-to-multiple-ports.patch
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lukasstockner Need your help to modify this line because 0015 had already been used.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rebased now, sorry for the delay.

Fixes sonic-net#7555 and sonic-net#8866.

While the LACP fallback HLD correctly describes the behavior for LAGs with
multiple ports, the actual implementation only worked for single-port LAGs.

When multiple ports are present, the fallback mode would select all ports,
therefore making it useless for the purpose described in the HLD (PXE-booting).

Therefore, this commit adds a patch to implement the desired behavior:
If all non-disabled ports are defaulted, *one* port goes into fallback mode and
gets selected for the LAG. As soon as any port receives a LACPDU, the fallback
port goes back to defaulted.
The fallback port is selected by higher port priority or, if both have equal
priority (the default), by lower port number (as described in the HLD).

Signed-off-by: Lukas Stockner <lstockner@genesiscloud.com>
@puffc
Copy link
Contributor

puffc commented Aug 15, 2023

@saiarcot895 Please review this PR again. Thanks.

@saiarcot895
Copy link
Contributor

/azpw run Azure.sonic-buildimage

@robertlperry
Copy link

Hi @yxieca and @zhangyanzhao this is the PR I mention it today meeting. If it can be backported to release 202305, that would be great. If not, the earliest release available is fine. Thank you.

@robertlperry
Copy link

Hi @saiarcot895, is there anything needed to move forward with this PR. I can confirm this PR works and fixes the issue. Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[LAG] LACP fallback feature not working as expected
5 participants