Add deterministic link bring-up feature for SFF compliant modules #383
@prgeor @mihirpat1 could you please take a look at the PR?
if mask == 0:
    self.log_notice("{}: No change is needed for tx_disable value".format(lport))
    continue
if api.tx_disable_channel(mask, target_tx_disable_flag):
Should we bring the module to high power before enabling TX? In the CMIS state machine this is done as part of the CMIS Data Path SM.
For now, non-CMIS/CCMIS optics are brought to High Power mode by default by the platform.
The missing part is enabling the high power class by setting page 0 byte 93 bit 2 (if the advertised power class >= 5). Yes, this handling will be added to sff_mgr in a separate PR.
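The register update described in this comment can be sketched as follows. This is only an illustration, assuming the current value of page 0 byte 93 has already been read into an integer; the helper name and the way the advertised power class is passed in are hypothetical and are not part of sff_mgr.

```python
# Page 0, byte 93, bit 2, as named in the comment above.
HIGH_POWER_CLASS_ENABLE_BIT = 1 << 2


def byte93_with_high_power_class(current_byte93, advertised_power_class):
    """Return the byte 93 value to write back (hypothetical helper).

    Bit 2 is set only when the advertised power class is 5 or higher.
    All other bits are left untouched.
    """
    if advertised_power_class >= 5:
        return (current_byte93 | HIGH_POWER_CLASS_ENABLE_BIT) & 0xFF
    return current_byte93 & 0xFF
```

A read-modify-write like this avoids disturbing the other control bits that share the same byte.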
When you say "platform", it means there could be different implementations. So, does the SFF standard require the "platform" to set the module to high power? If not, I prefer to add a check here and make sure the module is in high power.
> When you say "platform", it means there could be different implementations. So, does the SFF standard require the "platform" to set the module to high power?
I didn't see the standard mention who (PI or platform) should do it.
In today's SONiC PI code, there is no place that turns the module to high power mode. So I assume the platform side does something (either via SW or HW); otherwise, how could 100G optics (which need high power mode to be functional) come up on today's systems for all platforms/vendors?
> I prefer to add a check here and make sure the module is in high power
Yes, that was part of the plan for high power handling in sff_mgr. Basically, sff_mgr will do two things:
- check and make sure the module is in high power mode if needed
- check and enable the high power class for the module if needed (power class >= 5)
Raised an issue to track high power handling: #414
else:
    self.log_error("{}: Failed to {} TX with channel mask: {}".format(
        lport, "disable" if target_tx_disable_flag else "enable", bin(mask)))
Do you plan to add the custom module SI parameters here?
Yes, custom optics SI is supposed to be handled by sff_mgr. The details are still under discussion. This will also be added via a separate PR.
So, when high power and custom SI are supported, the following sequence should be ensured for bringing a module up:
1. rx_output_disable=1
2. high-power
3. Set custom SI (per lane speed)
4. rx_output_disable=0
5. tx_disable=0
Agree?
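The ordering proposed in this comment can be sketched as a fixed list of steps that stops at the first failure. This is a hedged illustration only: `RecordingModule` and all of its method names are hypothetical stand-ins, not an API from sonic-platform-common or sff_mgr.

```python
class RecordingModule:
    """Hypothetical stand-in that records calls; every call reports success."""

    def __init__(self):
        self.calls = []

    def set_rx_output_disable(self, flag):
        self.calls.append(("rx_output_disable", flag))
        return True

    def set_high_power(self):
        self.calls.append(("high_power", True))
        return True

    def apply_custom_si(self, lane_speed):
        self.calls.append(("custom_si", lane_speed))
        return True

    def set_tx_disable(self, flag):
        self.calls.append(("tx_disable", flag))
        return True


def bring_up(module, lane_speed):
    """Run the five proposed steps in order; return False on the first failure."""
    steps = [
        lambda: module.set_rx_output_disable(True),   # rx_output_disable=1
        lambda: module.set_high_power(),              # high-power
        lambda: module.apply_custom_si(lane_speed),   # custom SI (per lane speed)
        lambda: module.set_rx_output_disable(False),  # rx_output_disable=0
        lambda: module.set_tx_disable(False),         # tx_disable=0
    ]
    for step in steps:
        if not step():
            # Stop early so TX is never enabled after a failed earlier step.
            return False
    return True
```

Keeping TX enable as the last step means a failure in power or SI setup leaves the module with TX still disabled.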
> So, when high power and custom SI are supported, the following sequence should be ensured for bringing a module up:
> 1. rx_output_disable=1
> 2. high-power
> 3. Set custom SI (per lane speed)
> 4. rx_output_disable=0
> 5. tx_disable=0
> Agree?
Once high power and custom SI are supported, 2, 3, and 5 would be ensured.
The SFF spec doesn't seem to mention rx_output_disable/enable as mandatory, and they were not in the original discussion of sff_mgr. But 1 and 4 can be discussed or taken care of via subsequent PRs.
As long as we have the main structure of sff_mgr in place, it will be easy for people to add on top of it.
Raised issue to track SI setting: #415
@mihirpat1 could you review?
@longhuan-cisco could you check the conflict?
Sure, addressed the conflict.
AFAIK, those don't have the ability to do tx_disable/enable.
Updated the diff and raised two other PRs based on the latest comments from Prince. The other PRs:
Please resolve conflicts.
Conflicts resolved.
@prgeor Please merge if the review is completed.
@longhuan-cisco why is AOC outside the scope?
@prgeor This was answered at #383 (comment)
@longhuan-cisco there is some conflict, can you rebase?
@prgeor I didn't see conflicts; it just said the branch is "out-of-date". Anyway, I updated it to the latest master. Please check.
This PR is a dependency of sonic-net/sonic-platform-daemons#383 HLD of sff_mgr: sonic-net/SONiC#1371 Why I did it Add enable_xcvrd_sff_mgr flag support for sff_mgr
…nic-net#383) * SFF manager for handling QSFP+/QSFP28 transceiver modules
The HLD update is in a different PR (link: sonic-net/SONiC#1371)
Description
According to Interface-Link-bring-up-sequence.md, add a new thread sff_mgr under xcvrd to provide a deterministic link bring-up feature for SFF compliant modules (100G/40G).
By default sff_mgr is disabled. To enable sff_mgr on a platform, add 'enable_xcvrd_sff_mgr' to pmon_daemon_control.json and set it to true.
Motivation and Context
Scope of sff_mgr: 100G/40G optics (copper/AOC are not in scope)
Why sff_mgr
The goal of sff_mgr is to make sure SFF compliant modules are brought up in a deterministic way, meaning TX is enabled only after host_tx_ready becomes True, and TX is disabled when host_tx_ready becomes False. This helps eliminate link stability issues and potential interface flaps. Turning off TX also reduces power consumption and avoids lab hazards for admin-shut interfaces.
What sff_mgr does
sff_mgr is a new thread inside xcvrd. It turns on module TX only if both admin_status=up and host_tx_ready=true.
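The rule above, together with the channel mask seen in the reviewed diff, can be sketched as follows. This is a minimal sketch rather than the code from this PR: the function names, the string values "up" and "true", and the per-lane list of booleans are all assumptions made for illustration.

```python
def target_tx_disable(admin_status, host_tx_ready):
    """Return True when TX should be disabled.

    TX is enabled only if admin_status is "up" and host_tx_ready is "true"
    (string values are an assumption of this sketch).
    """
    return not (admin_status == "up" and host_tx_ready == "true")


def channels_to_change(current_tx_disable, target_flag):
    """Build a bit mask with bit i set when lane i differs from the target.

    `current_tx_disable` is assumed to be a list of booleans, one per lane.
    A result of 0 corresponds to the "No change is needed" case.
    """
    mask = 0
    for lane, disabled in enumerate(current_tx_disable):
        if disabled != target_flag:
            mask |= 1 << lane
    return mask
```

For example, with four lanes where only lane 2 already has TX enabled, enabling TX on the port needs to touch lanes 0, 1, and 3, so the mask is `0b1011`.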
sff_mgr takes no action unless one of the following events happens:
- transceiver insertion event (including bootup and hot plug-in)
- host_tx_ready change event
- admin_status change event
All other cases are ignored.
To detect these events, sff_mgr listens to the following DB table fields:
- host_tx_ready field for the host_tx_ready change event.
- type field for the insertion event.
- admin_status field for the admin_status change event, and info such as index/channel/etc.
For platforms/vendors:
This feature is enabled on a per-platform basis. There could be cases where vendor(s)/platform(s) may take time to shift from the existing codebase to this model (workflows).
By default this feature is disabled, so there is no impact on platforms in current deployment. To enable it, a flag needs to be added to pmon_daemon_control.json and set to true.
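As an illustration of the flag, the sketch below parses the contents of a pmon_daemon_control.json file and checks whether 'enable_xcvrd_sff_mgr' is set. It assumes the flag is a top-level boolean key; the helper name is hypothetical, and this is not necessarily how xcvrd or pmon actually consumes the file.

```python
import json


def sff_mgr_enabled(pmon_daemon_control_text):
    """Return True only if the flag is present and set to true.

    Hypothetical helper: a missing flag means the feature stays disabled,
    matching the default described above.
    """
    config = json.loads(pmon_daemon_control_text)
    return config.get("enable_xcvrd_sff_mgr", False) is True
```

With this sketch, a file containing `{"enable_xcvrd_sff_mgr": true}` enables the feature, while a file without the key leaves it disabled.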
How Has This Been Tested?
Additional Information (Optional)