Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix xcvrd to support 400G ZR optic #293

Merged
merged 9 commits into from
Oct 20, 2022
4 changes: 2 additions & 2 deletions sonic-xcvrd/tests/test_xcvrd.py
Original file line number Diff line number Diff line change
Expand Up @@ -704,7 +704,7 @@ def test_SfpStateUpdateTask_handle_port_change_event(self, mock_table_helper):
port_change_event = PortChangeEvent('Ethernet0', 1, 0, PortChangeEvent.PORT_ADD)
wait_time = 5
while wait_time > 0:
task.on_port_config_change(port_change_event)
task.on_port_config_change(None, port_change_event)
if task.port_mapping.logical_port_list:
break
wait_time -= 1
Expand All @@ -717,7 +717,7 @@ def test_SfpStateUpdateTask_handle_port_change_event(self, mock_table_helper):
port_change_event = PortChangeEvent('Ethernet0', 1, 0, PortChangeEvent.PORT_REMOVE)
wait_time = 5
while wait_time > 0:
task.on_port_config_change(port_change_event)
task.on_port_config_change(None, port_change_event)
if not task.port_mapping.logical_port_list:
break
wait_time -= 1
Expand Down
47 changes: 35 additions & 12 deletions sonic-xcvrd/xcvrd/xcvrd.py
Original file line number Diff line number Diff line change
Expand Up @@ -923,6 +923,7 @@ class CmisManagerTask:

CMIS_MAX_RETRIES = 3
CMIS_DEF_EXPIRED = 60 # seconds, default expiration time
CMIS_DEF_EXPIRED_ZR = 200 # seconds, expiration time for ZR module
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we read the module advertised datapath init time and use it so that we have it working for both ZR and non-ZR without hardcoding this here?

Copy link

@shyam77git shyam77git Sep 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avoid hard-coding of timer values is tracked via following git issues:
#290
#294

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To read CMIS dp state duration, I create another PR in sonic-platform-common (sonic-net/sonic-platform-common#312).

Copy link

@shyam77git shyam77git Sep 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@abohanyang - How is #312 (above) different from #290 ?
Looks same

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @shyam77git , it's to address the same issue. but some of my code changes (sonic-net/sonic-platform-common#312 ) are in a different repo. I think we need a separate PR for that part.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @abohanyang
Is #312 sufficient to take care of what's mentioned in #290 or anything more required?
Note that #294 to take care for xcvrd side changes.

Copy link
Contributor Author

@abohanyang abohanyang Sep 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. 312 contains the implementation of API, get_datapath_init_duration. I just commit more changes in this PR.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. 312 contains the implementation of API, get_datapath_init_duration. I just commit more changes in this PR.

Thanks for the clarification.
This PR now is going to take care of git issue #293 - I'm reviewing this further

CMIS_MODULE_TYPES = ['QSFP-DD', 'QSFP_DD', 'OSFP']
CMIS_NUM_CHANNELS = 8

Expand Down Expand Up @@ -986,22 +987,38 @@ def on_port_update_event(self, port_change_event):
return

if port_change_event.event_type == port_change_event.PORT_SET:
need_update = False
if pport >= 0:
self.port_dict[lport]['index'] = pport
if self.port_dict[lport].get('index') != pport:
self.port_dict[lport]['index'] = pport
need_update = True
if 'speed' in port_change_event.port_dict and port_change_event.port_dict['speed'] != 'N/A':
self.port_dict[lport]['speed'] = port_change_event.port_dict['speed']
if self.port_dict[lport].get('speed') != port_change_event.port_dict['speed']:
self.port_dict[lport]['speed'] = port_change_event.port_dict['speed']
need_update = True
if 'lanes' in port_change_event.port_dict:
self.port_dict[lport]['lanes'] = port_change_event.port_dict['lanes']
if self.port_dict[lport].get('lanes') != port_change_event.port_dict['lanes']:
self.port_dict[lport]['lanes'] = port_change_event.port_dict['lanes']
need_update = True
if 'host_tx_ready' in port_change_event.port_dict:
self.port_dict[lport]['host_tx_ready'] = port_change_event.port_dict['host_tx_ready']
if self.port_dict[lport].get('host_tx_ready') != port_change_event.port_dict['host_tx_ready']:
self.port_dict[lport]['host_tx_ready'] = port_change_event.port_dict['host_tx_ready']
need_update = True
if 'admin_status' in port_change_event.port_dict:
self.port_dict[lport]['admin_status'] = port_change_event.port_dict['admin_status']
if self.port_dict[lport].get('admin_status') != port_change_event.port_dict['admin_status']:
self.port_dict[lport]['admin_status'] = port_change_event.port_dict['admin_status']
need_update = True
if 'laser_freq' in port_change_event.port_dict:
self.port_dict[lport]['laser_freq'] = int(port_change_event.port_dict['laser_freq'])
if self.port_dict[lport].get('laser_freq') != int(port_change_event.port_dict['laser_freq']):
self.port_dict[lport]['laser_freq'] = int(port_change_event.port_dict['laser_freq'])
need_update = True
if 'tx_power' in port_change_event.port_dict:
self.port_dict[lport]['tx_power'] = float(port_change_event.port_dict['tx_power'])
if self.port_dict[lport].get('tx_power') != float(port_change_event.port_dict['tx_power']):
self.port_dict[lport]['tx_power'] = float(port_change_event.port_dict['tx_power'])
need_update = True

self.force_cmis_reinit(lport, 0)
if need_update:
self.force_cmis_reinit(lport, 0)
else:
self.port_dict[lport]['cmis_state'] = self.CMIS_STATE_REMOVED

Expand Down Expand Up @@ -1073,6 +1090,12 @@ def get_cmis_application_desired(self, api, channel, speed):

return (appl_code & 0xf)

def get_cmis_expired(self, api):
if api.is_coherent_module():
return self.CMIS_DEF_EXPIRED_ZR
else:
return self.CMIS_DEF_EXPIRED

def is_cmis_application_update_required(self, api, channel, speed):
"""
Check if the CMIS application update is required
Expand Down Expand Up @@ -1458,7 +1481,7 @@ def task_worker(self):
# TODO: Make sure this doesn't impact other datapaths
api.set_lpmode(False)
self.port_dict[lport]['cmis_state'] = self.CMIS_STATE_AP_CONF
self.port_dict[lport]['cmis_expired'] = now + datetime.timedelta(seconds=self.CMIS_DEF_EXPIRED)
self.port_dict[lport]['cmis_expired'] = now + datetime.timedelta(seconds=self.get_cmis_expired(api))
elif state == self.CMIS_STATE_AP_CONF:
# TODO: Use fine grained time when the CMIS memory map is available
if not self.check_module_state(api, ['ModuleReady']):
Expand Down Expand Up @@ -1495,7 +1518,7 @@ def task_worker(self):
continue

# TODO: Use fine grained time when the CMIS memory map is available
abohanyang marked this conversation as resolved.
Show resolved Hide resolved
self.port_dict[lport]['cmis_expired'] = now + datetime.timedelta(seconds=self.CMIS_DEF_EXPIRED)
self.port_dict[lport]['cmis_expired'] = now + datetime.timedelta(seconds=self.get_cmis_expired(api))
self.port_dict[lport]['cmis_state'] = self.CMIS_STATE_DP_INIT
elif state == self.CMIS_STATE_DP_INIT:
if not self.check_config_error(api, host_lanes, ['ConfigSuccess']):
Expand All @@ -1516,7 +1539,7 @@ def task_worker(self):
# D.1.3 Software Configuration and Initialization
api.set_datapath_init(host_lanes)
# TODO: Use fine grained timeout when the CMIS memory map is available
abohanyang marked this conversation as resolved.
Show resolved Hide resolved
self.port_dict[lport]['cmis_expired'] = now + datetime.timedelta(seconds=self.CMIS_DEF_EXPIRED)
self.port_dict[lport]['cmis_expired'] = now + datetime.timedelta(seconds=self.get_cmis_expired(api))
Copy link

@shyam77git shyam77git Sep 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@abohanyang : get_cmis_expired() - is this a temporary function defined to validate this fix - #293?

practically, get_datapath_init_duration() and get_datapath_deinit_duration() to be invoked instead.
Would you be taking care of this as part of this PR? or shall we take care of this in xcvrd.py via git issue #294 ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I just commit more changes in this PR to call get_datapath_init_duration and get the expiration time.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I just commit more changes in this PR to call get_datapath_init_duration and get the expiration time.

OK.
@abohanyang : can you please attach UT cases, logs for this case (400G ZR optics FSM working via xcrd.py as orchestrator with these timeouts all the way to interface in 'operational up' state)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self.port_dict[lport]['cmis_state'] = self.CMIS_STATE_DP_TXON
elif state == self.CMIS_STATE_DP_TXON:
if not self.check_datapath_state(api, host_lanes, ['DataPathInitialized']):
Expand Down Expand Up @@ -1937,7 +1960,7 @@ def task_stop(self):
self.task_stopping_event.set()
os.kill(self.task_process.pid, signal.SIGKILL)

def on_port_config_change(self , port_change_event):
def on_port_config_change(self, stopping_event, port_change_event):
abohanyang marked this conversation as resolved.
Show resolved Hide resolved
if port_change_event.event_type == port_mapping.PortChangeEvent.PORT_REMOVE:
self.on_remove_logical_port(port_change_event)
self.port_mapping.handle_port_change_event(port_change_event)
Expand Down