-
Notifications
You must be signed in to change notification settings - Fork 160
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CMIS]Improved 400G link bring up sequence #254
Conversation
sonic-xcvrd/xcvrd/xcvrd.py
Outdated
keys = port_tbl.getKeys() | ||
|
||
if lport in keys: | ||
cmd = 'sonic-db-cli STATE_DB hget "PORT_TABLE|{}" host_tx_ready'.format(lport) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pmon is global container, how this wok under multi NPU? I mean sonic-db-cli does not require to look namesapce db ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pmon is global container, how this wok under multi NPU? I mean sonic-db-cli does not require to look namesapce db ?
@prgeor updated that he would make this thing namespace aware i.e. xcvrd would subscribe/listen to all namespace's stateDB and then act accordingly. This piece yet to be added.
Prince may comment/update further
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sacnaik @shyam77git please check the updated code
# D.2.2 Software Deinitialization | ||
api.set_datapath_deinit(host_lanes) | ||
api.set_lpmode(True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are changing generic code. Isn't this set_lpmode is required function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Each Datapath is independent, this code will bring the whole module to low power 8 times if we have 8 data paths being initialized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It sounds like this PR will perform CMIS application selection one by one for each datapath(i.e. logical port on the MAC) instead of a one-shot configuration for all the datapatches on a transceiver.
What's unclear to me is as follows:
- Is there a switch ASIC available to support asynchronous mixed port mode in a port macro? (i.e. Is it possible to have both 1x100G-NZR + 2x100G-PAM4 on a single QSFPDD port, from MAC perspective?)
- Is there a CMIS transceiver available to support asynchronous mixed port mode in a single module?
In the above scenario, each datapath will require a dedicated clock divisor while they all share the same reference clock source. As far as I know, this is NOT a supported port mode in Broadcom ASIC, but I'm not sure if this is also applicable to other vendors, could you please clarify if there is a such module available for testing? Thank you
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Xcvr is lisenting to various port change events related to a ports speed, lane mapping etc and these events across various logical ports in the case of breakout port (say one physical port mapping to 4 logical ports) will arrive asynchronously, so we need to handle these events independently, even though it belong to same physical port.
self.port_dict[lport]['cmis_state'] = self.CMIS_STATE_READY | ||
continue | ||
|
||
# Skip if it's not a paged memory device | ||
if api.is_flat_memory(): | ||
self.log_notice("{}: skipping CMIS state machine for flat memory xcvr".format(lport)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You may be periodically logging this, then it is not right.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the state machine moves to CMIS_STATE_READY after this, and doesnot come again here unless the state is changed.
sonic-xcvrd/xcvrd/xcvrd.py
Outdated
# NOTE: Some CMIS compliant modules may have 'auto-squelch' feature where | ||
# the module won't take datapaths to Activated state if host tries to enable | ||
# the datapaths while there is no good Tx signal from the host-side. | ||
if not is_host_tx_ready(lport): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be implemented as a event handler?
- Subscribe to PORT_TABLE in state-db, then filter to host_tx_ready field.
- Based on the host_tx_ready status, set the appropriate CMIS module state will be set here in the event handler.
- Since the main CMIS task worker keep checking for CMIS state change for all ports, once the point (2) is set, the CMIS task worker will work on the appropriate state.
- we can introduce a new state between CMIS_STATE_AP_CONF and CMIS_STATE_DP_INIT, say 'CMIS_STATE_WAIT_FOR_HOST' . so during module insertion the CMIS task worker will set to this state once it completes CMIS_STATE_AP_CONF and this state will be changed to CMIS_STATE_DP_INIT only in the host_tx_ready handler. this way it will be synchornized.
Keeping the host_tx_ready processing as event handler will have following benefit.
- Keeping the host_tx_ready out of CMIS task worker to event handler will benefit other speed to follow the deterministic link bringup approach.
- It will avoid continuous polling of host-tx-ready using Redis-DB query, which will save time when scaled to fully populated 48/72 ports.
The host_tx_ready event handler can invoke the xcvr_API_factory API say 'tx_disable' which will set appropriate CMIS state in case of CMIS module and it will disable/enable laser/Tx appropriately in case of non-cmis module, which can be leveraged eventually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jaganbal-a I think event handling may NOT work when Xcvrd restarts. Consider this:-
- 400G Port link is up.
- Xcvrd crashes and in the meantime someone plugs in another 400G module
- Xcvrd won't get another 'host_tx_ready' event because 'host_tx_ready' doesn't depend upon whether module is present or not. So, Xcvrd will keep waiting for 'host_tx_ready' or unless user does admin shut followed by admin up.
Another similar case as above
- Xcvrd is running and got 'host_tx_ready' but module is absent
- Xcvrd crashes and comes up
- Module is plugged in but Xcvrd is not going to receive 'host_tx_ready' and link won't come up
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@prince
It is captured as part of design that the host_tx_ready will be updated regardless of transceiver presence, and it is xcvrd job to check the host_tx_ready during module Init during insertion (a local host_tx_ready dictionary will help here but not polling), so the DP state will be moved to DP_INIT/DP_ACTIVATE state based on host_tx_ready. Above mentioned cases will be covered.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jaganbal-a Xcvrd process don't have a persistent storage to capture the 'host_tx_ready' state, so it depends upon the Redis-DB i.e it needs to read the 'host_tx_ready' state from the DB. Let Xcvrd read the state when it requires rather than being notified. Because if we have to take the approach of notification, then we need to handle the above Xcvrd exit case separately by reading the DB which can be avoided if Xcvrd can read the DB when required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let Xcvrd read the state when it requires rather than being notified.
Can you clarify as to what is meant by 'when it requires'?
Does it mean continuous polling of host-tx-ready using Redis-DB query OR what?
When interface/port moves from config shut to no shut or vice-versa, won't it a notification of host_tx_ready from redisDB otherwise how would it xcvrd be notified of this? repeated/continuous polling indefinitely without any event happening in the system is waste of cpu cycles / system resources.
Also config interface shut/no shut are rare events along with optics removal/insertion
Also, per HLD we decided to go with notification based mechanism of host_tx_ready.
To handle xcvrd restart case, as Jagan mentioned above, better to check the host_tx_ready during module Init during insertion as source of truth.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shyam77git @jaganbal-a please take a look again as i updated the code.
@ds952811 can you review |
sonic-xcvrd/xcvrd/xcvrd.py
Outdated
@@ -870,9 +872,11 @@ def __init__(self, port_mapping): | |||
self.task_process = None | |||
self.port_dict = {} | |||
self.port_mapping = copy.deepcopy(port_mapping) | |||
self.xcvr_table_helper = XcvrTableHelper() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If XcvrTableHelper
is implemented as a class member, it should be initialized in task_worker
instead of __init__
.
This is because CmisManagerTask
is a subprocess wrapper, which means a subprocess will be forked once task_run
is called.
If it is initialized in __init__
, the flow is
- parent creates the
XcvrTableHelper
along with all the low level objects, like the sockets based on which the tables are accessed - and then the child is forked, all resources opened by the parent are duplicated
- now there are two copies of all the resources but the parent's one will not really be used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Of course, initializing it in __init__
and then duplicating it from parent to child should also work. But it looks like taking a detour to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@stephenxs what is the concern here? only the worker thread is going to use the socket, not the parent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
xcvr_table_helper
is initialized in the parent and then duplicated in the client. but it will never be used by the parent.
so the resource allocated in the parent is wasted.
- if it is initialized in
__init__
:xcvr_table_helper
initialized in parent. resource allocated for it but not used.- duplicated in the child during fork
- if it is initialized in
task_worker
xcvr_table_helper
is initialized in the child before it starts to work.- no resource was allocated in the parent.
as I mentioned, it also works in the current way. so I'm ok if you do not decide to change it.
sonic-xcvrd/xcvrd/xcvrd.py
Outdated
return | ||
# 'index' can be -1 if STATE_DB|PORT_TABLE | ||
# TODO:Make sure doesnot breakt non-CMIS | ||
#ptype = _wrapper_get_sfp_type(pport) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe get_sfp_type() returns the sfp_type present in the port and not the port cage type.
So can you rename the variable and uncomment the code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
get_sfp_type() returns only the port/cage type, and it's a constant initialized at chassis.init()
e.g.
https://github.com/Azure/sonic-buildimage/blob/master/platform/broadcom/sonic-platform-modules-dell/z9332f/sonic_platform/chassis.py#L132
And a QSFP28 could be attached to the QSFPDD port on the box, and it's better to skip the CMIS init sequence for it, hence please uncomment the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ds952811 form-factor of the transceiver is not the right check to determine whether the module follows CMIS or not. Even QSFP form-factor may follow CMIS sonic-net/sonic-platform-common#246 that follows CMIS
@@ -1150,6 +1182,14 @@ def task_worker(self): | |||
self.CMIS_STATE_REMOVED]: | |||
continue | |||
|
|||
# Handle the case when Xcvrd was NOT running when 'host_tx_ready' or 'admin_status' | |||
# was updated or this is the first run so reconcile the above two attributes | |||
if 'host_tx_ready' not in self.port_dict[lport]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code will not get hit if the port is CMIS_STATE_READY, so with config-interface shutdown the event will be returned in the above if condition and the modules will not be put in DP-DEACTIVATE.
If any runtime Tx fault handler being implemented and the host_tx_ready is set to false, then the event will be missed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CMIS state reset happens if any port configuration is changed including admin or host_tx_ready status here :https://github.com/Azure/sonic-platform-daemons/blob/master/sonic-xcvrd/xcvrd/xcvrd.py#L919
sonic-xcvrd/xcvrd/xcvrd.py
Outdated
def task_worker(self): | ||
self.log_notice("Starting...") | ||
self.log_notice("Starting CmisTaskMgr with pid:{}".format(self.task_process.pid)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you checked the syslog of pmon#xcvrd, the pid is available over there, on the other hand, we already have the 'CMIS' prefix in the CmisManagerTask.log_notice(), hence this is not necessary
sonic-xcvrd/xcvrd/xcvrd.py
Outdated
|
||
# APPL_DB for CONFIG updates, and STATE_DB for insertion/removal | ||
sel, asic_context = port_mapping.subscribe_port_update_event(['APPL_DB', 'STATE_DB']) | ||
sel, asic_context = port_mapping.subscribe_port_update_event() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While APPL_DB is for monitoring config updates arrived at swss#orchagent, the STATE_DB is for the transceiver insertion/removal detection, and it's not necessary to have both listeners registered all the time.
May I have a clarification on why we need this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sonic-xcvrd/xcvrd/xcvrd.py
Outdated
@@ -1284,6 +1327,13 @@ def task_worker(self): | |||
try: | |||
# CMIS state transitions | |||
if state == self.CMIS_STATE_INSERTED: | |||
if self.port_dict[lport]['host_tx_ready'] != 'True' or \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please change it 'true' and 'false' for host_tx_ready handling through out the file as we discussed.
Also change it in test_xcvrd.py
sonic-xcvrd/xcvrd/xcvrd.py
Outdated
if port_change_event.port_dict is not None and 'host_tx_ready' in port_change_event.port_dict: | ||
self.port_dict[lport]['host_tx_ready'] = port_change_event.port_dict['host_tx_ready'] | ||
if port_change_event.port_dict is not None and 'admin_status' in port_change_event.port_dict: | ||
self.port_dict[lport]['admin_status'] = port_change_event.port_dict['admin_status'] | ||
self.port_dict[lport]['cmis_state'] = self.CMIS_STATE_INSERTED |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change the CMIS state to inserted and invoke reset_cmis() only when there is a change in value for the following fields speed / lanes / host_tx_ready or admin_status.
With current change, the cmis state is set to INSERTED even for oper_state state change in value in the DB for a Port.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the port change event doesn't have a mechanism to filter out the event based upon the corresponding PORT_TABLE. So, until we have this mechanism to filter out, we need this code here otherwise Xcvrd will miss optics jack out/in
sonic-xcvrd/xcvrd/xcvrd.py
Outdated
if port_change_event.port_dict is not None and 'host_tx_ready' in port_change_event.port_dict: | ||
self.port_dict[lport]['host_tx_ready'] = port_change_event.port_dict['host_tx_ready'] | ||
if port_change_event.port_dict is not None and 'admin_status' in port_change_event.port_dict: | ||
self.port_dict[lport]['admin_status'] = port_change_event.port_dict['admin_status'] | ||
self.port_dict[lport]['cmis_state'] = self.CMIS_STATE_INSERTED |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is redundant, reset_cmis() is doing this.
sonic-xcvrd/xcvrd/xcvrd.py
Outdated
if 'admin_status' in port_change_event.port_dict and 'oper_status' in port_change_event.dict: | ||
# At times 'admin_status' is NOT the same in the PORT_TABLE of APPL_DB and STATE_DB | ||
# We dont have better way to check if 'admin_status' is from APPL_DB or STATE_DB so this | ||
# check is put temporarily to list only to APPL_DB's admin_status and ignore that of STATE_DB |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of 'list only' it should be 'listen only'
sonic-xcvrd/xcvrd/xcvrd.py
Outdated
@@ -994,7 +994,10 @@ def on_port_update_event(self, port_change_event): | |||
self.port_dict[lport]['lanes'] = port_change_event.port_dict['lanes'] | |||
if 'host_tx_ready' in port_change_event.port_dict: | |||
self.port_dict[lport]['host_tx_ready'] = port_change_event.port_dict['host_tx_ready'] | |||
if 'admin_status' in port_change_event.port_dict: | |||
if 'admin_status' in port_change_event.port_dict and 'oper_status' in port_change_event.dict: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in port_change_event.dict: -> in port_change_event.port_dict:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
if port_change_event.event_type == port_change_event.PORT_SET: | ||
if pport >= 0: | ||
self.port_dict[lport]['index'] = pport | ||
if port_change_event.port_dict is not None and 'speed' in port_change_event.port_dict: | ||
if 'speed' in port_change_event.port_dict and port_change_event.port_dict['speed'] != 'N/A': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a trace for each event that is processed.
* Improved 400G link bring up sequence * Event based handling * Remove unused functions * Force DP to remain in DpInitialized State state on admin shutdown * Added test case * Skip CMIS task manager based upon flag * Addressed review comments * Fix xcvrd crash * Fix test failure * Listen only to 'APPL_DB's admin_status * Fix typo
Description
Improved 400G link bring up sequence HLD
Motivation and Context
Deterministically improve the link bring up of 400G link that are using CMIS compliant transceivers.
How Has This Been Tested?
Tested on
Additional Information (Optional)