xcvrd crashes when MediaInterfaceIDApp is not defined #489

Open
AnoopKamath opened this issue Aug 2, 2024 · 6 comments


AnoopKamath commented Aug 2, 2024

xcvrd crashes when MediaInterfaceIDApp is not defined.
It should log an error message instead of crashing.
Because xcvrd crashes, none of the other, valid modules are initialized.

This issue was introduced after #457.
The issue is seen with Passive Copper ELB modules.

Jul 12 07:28:05.978816 sonic ERR pmon#xcvrd[32]: Exception occured at SfpStateUpdateTask thread due to KeyError(1)
Jul 12 07:28:05.981925 sonic ERR pmon#xcvrd[32]: Traceback (most recent call last):
Jul 12 07:28:05.981925 sonic ERR pmon#xcvrd[32]:   File "/usr/local/lib/python3.9/dist-packages/xcvrd/xcvrd.py", line 2208, in run
Jul 12 07:28:05.981925 sonic ERR pmon#xcvrd[32]:     self.task_worker(self.task_stopping_event, self.sfp_error_event)
Jul 12 07:28:05.981925 sonic ERR pmon#xcvrd[32]:   File "/usr/local/lib/python3.9/dist-packages/xcvrd/xcvrd.py", line 2018, in task_worker
Jul 12 07:28:05.982022 sonic ERR pmon#xcvrd[32]:     self.init()
Jul 12 07:28:05.982022 sonic ERR pmon#xcvrd[32]:   File "/usr/local/lib/python3.9/dist-packages/xcvrd/xcvrd.py", line 1936, in init
Jul 12 07:28:05.982058 sonic ERR pmon#xcvrd[32]:     self.retry_eeprom_set = self._post_port_sfp_info_and_dom_thr_to_db_once(port_mapping_data, self.xcvr_table_helper, self.main_thread_stop_event)
Jul 12 07:28:05.982058 sonic ERR pmon#xcvrd[32]:   File "/usr/local/lib/python3.9/dist-packages/xcvrd/xcvrd.py", line 1891, in _post_port_sfp_info_and_dom_thr_to_db_once
Jul 12 07:28:05.982119 sonic ERR pmon#xcvrd[32]:     rc = post_port_sfp_info_to_db(logical_port_name, port_mapping, xcvr_table_helper.get_intf_tbl(asic_index), transceiver_dict, stop_event)
Jul 12 07:28:05.982119 sonic ERR pmon#xcvrd[32]:   File "/usr/local/lib/python3.9/dist-packages/xcvrd/xcvrd.py", line 442, in post_port_sfp_info_to_db
Jul 12 07:28:05.982217 sonic ERR pmon#xcvrd[32]:     port_info_dict = _wrapper_get_transceiver_info(physical_port)
Jul 12 07:28:05.982217 sonic ERR pmon#xcvrd[32]:   File "/usr/local/lib/python3.9/dist-packages/xcvrd/xcvrd.py", line 272, in _wrapper_get_transceiver_info
Jul 12 07:28:05.982217 sonic ERR pmon#xcvrd[32]:     return platform_chassis.get_sfp(physical_port).get_transceiver_info()
Jul 12 07:28:05.982325 sonic ERR pmon#xcvrd[32]:   File "/usr/local/lib/python3.9/dist-packages/sonic_platform_base/sonic_xcvr/sfp_optoe_base.py", line 24, in get_transceiver_info
Jul 12 07:28:05.982325 sonic ERR pmon#xcvrd[32]:     return api.get_transceiver_info() if api is not None else None
Jul 12 07:28:05.982325 sonic ERR pmon#xcvrd[32]:   File "/usr/local/lib/python3.9/dist-packages/sonic_platform_base/sonic_xcvr/api/public/cmis.py", line 171, in get_transceiver_info
Jul 12 07:28:05.982348 sonic ERR pmon#xcvrd[32]:     xcvr_info['media_lane_count'] = self.get_media_lane_count()
Jul 12 07:28:05.982348 sonic ERR pmon#xcvrd[32]:   File "/usr/local/lib/python3.9/dist-packages/sonic_platform_base/sonic_xcvr/api/public/cmis.py", line 784, in get_media_lane_count
Jul 12 07:28:05.982365 sonic ERR pmon#xcvrd[32]:     return appl_advt[appl]['media_lane_count'] if len(appl_advt) >= appl else 0
Jul 12 07:28:05.982422 sonic ERR pmon#xcvrd[32]: KeyError: 1
Jul 12 07:28:05.982422 sonic ERR pmon#xcvrd[32]: Xcvrd: exception found at child thread SfpStateUpdateTask due to KeyError(1)
Jul 12 07:28:05.982422 sonic ERR pmon#xcvrd[32]: Exiting main loop as child thread raised exception!

Failed to get desired application
Jul 12 07:28:06.025757 sonic INFO pmon#supervisord 2024-07-12 07:28:06,025 INFO exited: xcvrd (terminated by SIGKILL; not expected)
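
The last frame of the traceback can be reproduced in isolation. The following is a minimal sketch, assuming the advertisement dictionary is keyed by application number and that the entry for application 1 is absent. The dictionary contents and the two helper names are made up for illustration and are not read from the module or taken from `cmis.py`; only the conditional expression is copied from the traceback.

```python
# Hypothetical advertisement dict: application 1 is missing, application 8 is present.
appl_advt = {8: {"media_lane_count": 8}}
appl = 1  # assumed active application number


def media_lane_count_unsafe(appl_advt, appl):
    # Same expression as in the traceback: the length check passes (1 >= 1),
    # but key 1 does not exist, so the lookup raises KeyError.
    return appl_advt[appl]["media_lane_count"] if len(appl_advt) >= appl else 0


def media_lane_count_safe(appl_advt, appl):
    # Look the key up directly and fall back to 0 when it is absent.
    return appl_advt.get(appl, {}).get("media_lane_count", 0)


try:
    media_lane_count_unsafe(appl_advt, appl)
    raised = False
except KeyError:
    raised = True
```

The length check only guards against a dictionary that is too short. It does not guarantee that a particular key exists, which is why a sparse dictionary can pass the check and still fail the lookup.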

Please check the module EEPROM starting at byte 0x72 (HostInterfaceIDApp8), where MediaInterfaceIDApp8 is "00h Undefined".

 Lower page 0h
        00000000 18 40 00 07 00 00 00 00  00 00 00 00 00 00 17 00 |.@..............|
        00000010 82 00 00 00 00 00 00 00  17 80 00 00 00 00 00 00 |................|
        00000020 00 00 00 00 00 00 00 01  00 00 00 00 00 00 00 00 |................|
        00000030 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00 |................|
        00000040 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00 |................|
        00000050 00 00 00 00 00 03 00 00  00 00 00 00 00 00 00 00 |................|
        00000060 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00 |................|
        00000070 00 00 11 00 88 00 00 00  00 00 00 00 00 00 00 00 |................|

        Upper page 0h
        00000080 18 43 49 53 43 4f 20 20  20 20 20 20 20 20 20 20 |.CISCO          |
        00000090 20 00 06 f6 36 38 2d 31  30 33 32 30 35 2d 30 32 | ...68-103205-02|
        000000a0 20 20 20 20 32 20 46 41  42 32 36 31 31 30 30 43 |    2 FAB261100C|
        000000b0 51 20 20 20 20 20 32 32  31 30 31 38 20 20 00 00 |Q     221018  ..|
        000000c0 00 00 00 00 00 00 00 00  e0 78 00 00 00 00 00 00 |.........x......|
        000000d0 00 00 00 00 00 00 00 00  00 00 00 00 00 00 f9 00 |................|
        000000e0 1b 00 07 00 00 00 00 00  00 00 00 00 00 00 00 00 |................|
        000000f0 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00 |................|
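
To make the bytes around 0x72 easier to read, here is a small sketch that decodes the four-byte descriptor for application 8 from row 0x70 of the dump. The field layout (host interface ID, media interface ID, lane counts packed as two nibbles, host lane assignment) is my reading of the CMIS application descriptor format and is an assumption, not something stated in this thread. The function name is hypothetical.

```python
# Row 0x70 of lower page 0h, copied from the dump above.
row_0x70 = bytes.fromhex("00 00 11 00 88 00 00 00 00 00 00 00 00 00 00 00")


def decode_app_descriptor(desc):
    """Decode one 4-byte application descriptor (assumed CMIS layout)."""
    host_id, media_id, lane_counts, host_lane_assign = desc[:4]
    return {
        "host_interface_id": host_id,
        "media_interface_id": media_id,
        "host_lane_count": lane_counts >> 4,      # upper nibble
        "media_lane_count": lane_counts & 0x0F,   # lower nibble
        "host_lane_assignment": host_lane_assign,
    }


# Bytes 0x72..0x75 are at offsets 2..5 within the row that starts at 0x70.
app8 = decode_app_descriptor(row_0x70[2:6])
for name, value in app8.items():
    print(f"{name}: 0x{value:02x}")
```

Under that assumed layout, the host interface ID is 0x11 while the media interface ID is 0x00, which matches the "00h Undefined" value reported above.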


AnoopKamath commented Aug 2, 2024

@prgeor @mihirpat1 @tshalvi
Can you please check this?


tshalvi commented Aug 6, 2024

PR #457 did slightly modify the functionality of accessing the application advertisement list data, but the way 'Undefined' values are treated was not changed at all by this PR.

Before the changes in this PR were merged, if a module had any application with missing data in its EEPROM, the application lookup would terminate at that application, without proceeding to the next one (which might have had the complete data required for this function). With my changes applied, in such a scenario, applications with missing data would be skipped, allowing the application lookup process to proceed to the next application in order to obtain a complete list of applications that have full data in the module's EEPROM.

Therefore, I don't think this issue is related to my changes. However, if you'd like @prgeor @mihirpat1 @AnoopKamath, I can investigate further to find a proper solution. Could you please provide the PN of the module where you observed this issue?
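
The difference between the two lookup behaviors described above can be sketched as follows. This is an illustration only, not the code from PR #457: the function names, the field names, and the use of `None` to mark missing data are all assumptions.

```python
def has_missing_data(fields):
    # Assumption: missing EEPROM data shows up as None in the decoded fields.
    return any(value is None for value in fields.values())


def collect_stop_at_first_gap(raw_apps):
    """Earlier behavior as described: stop at the first incomplete application."""
    result = {}
    for app_id, fields in raw_apps:
        if has_missing_data(fields):
            break
        result[app_id] = fields
    return result


def collect_skip_gaps(raw_apps):
    """Later behavior as described: skip incomplete applications and continue."""
    result = {}
    for app_id, fields in raw_apps:
        if has_missing_data(fields):
            continue
        result[app_id] = fields
    return result


# Hypothetical decoded applications: application 1 is incomplete, 2 is complete.
raw_apps = [
    (1, {"host": "A", "media": None}),
    (2, {"host": "B", "media": "C"}),
]

stopped = collect_stop_at_first_gap(raw_apps)
skipped = collect_skip_gaps(raw_apps)
```

With skipping, the resulting dictionary can contain application 2 without application 1, so any caller that assumes keys are contiguous from 1 needs to check for the key explicitly.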


rajann commented Aug 9, 2024

> PR #457 did slightly modify the functionality of accessing the application advertisement list data, but the way 'Undefined' values are treated was not changed at all by this PR.
>
> Before the changes in this PR were merged, if a module had any application with missing data in its EEPROM, the application lookup would terminate at that application, without proceeding to the next one (which might have had the complete data required for this function). With my changes applied, in such a scenario, applications with missing data would be skipped, allowing the application lookup process to proceed to the next application in order to obtain a complete list of applications that have full data in the module's EEPROM.
>
> Therefore, I don't think this issue is related to my changes. However, if you'd like @prgeor @mihirpat1 @AnoopKamath, I can investigate further to find a proper solution. Could you please provide the PN of the module where you observed this issue?

    Vendor Name: CISCO           
    Vendor OUI: 00-06-f6
    Vendor PN: 68-103205-02   


tshalvi commented Aug 12, 2024

We currently don't have this module in our lab. We can order it, but that might take some time.

I suspect the issue is because of this line


The value 'Undefined' is not handled here or in one of the other if statements that follow within the get_application_advertisement() function.

Seems like the 'Undefined' value should be treated like it is here:

if val in [None, 'Unknown', 'Undefined']:

Please try changing the code as suggested and let me know if it resolves the issue.
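
A minimal sketch of what the suggested check could look like when applied uniformly. The helper names and the field names are hypothetical and do not come from `get_application_advertisement()`; only the list of sentinel values is taken from the line quoted above.

```python
# Sentinel values taken from the suggested condition.
MISSING_VALUES = (None, "Unknown", "Undefined")


def is_missing(val):
    # Treat 'Undefined' the same way as None and 'Unknown'.
    return val in MISSING_VALUES


def usable_applications(decoded_apps):
    """Keep only applications whose fields are all defined."""
    return {
        app_id: fields
        for app_id, fields in decoded_apps.items()
        if not any(is_missing(v) for v in fields.values())
    }


# Hypothetical decoded data: application 8 has an undefined media interface.
decoded_apps = {
    7: {"host_interface": "some-host-id", "media_interface": "some-media-id"},
    8: {"host_interface": "some-host-id", "media_interface": "Undefined"},
}

usable = usable_applications(decoded_apps)
```

Filtering this way drops the application with the undefined media interface, so callers still need to handle an application number that is absent from the result.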


rajann commented Aug 13, 2024

Thanks. Will try and get back.


tshalvi commented Aug 20, 2024

@AnoopKamath @rajann,
Could you please provide more details about this module, such as a datasheet or any other information that could help us locate it? We need to order it, but our lab team has been unable to find it, and a Google search for 'CISCO 68-103205-02' returns nothing useful.
