[SmartSwitch] Add a new API for the DPU chassis to query dataplane and midplane states #507
…ies in setup.py (sonic-net#106) Remove dependence on the 'enum' package, as we are currently transitioning from Python 2 to Python 3 and there are installation conflict issues between the `enum` package and the `enum34` package. Add 'sonic-py-common' as dependencies in setup.py for xcvrd, also add spaces around "equals" signs.
…onfig (sonic-net#108) Add a check to make sure that initializeGlobalConfig is invoked only on multi-asic platforms. Additionally, remove the initializeGlobalConfig() call in the DomUpdate thread and SFPUpdate process. This is because initializeGlobalConfig() is already invoked and initialized in the parent Xcvrd daemon, which is available to the child thread/process.
…d on Python version (sonic-net#107) Add dependence on 'enum' package back to xcvrd (basically reverting most of sonic-net/sonic-platform-daemons#106). However, in setup.py, we only install the enum34 package if the version of Python we are installing for is < 3.4. Thus, when installing the Python 3 xcvrd package in Python 2.7, the Python 2 version of enum34 will be installed. However, if installing the Python 3 xcvrd package on Python 3.7, enum34 will not be installed, causing xcvrd to import the 'enum' module from the standard library. This should prevent the conflicts that arise when 'enum34' is installed on Python versions >= 3.4.
…d status updates with xcvrd. (sonic-net#105) * [xcvrd] support for integrating y cable within xcvrd This PR provides the necessary infrastructure to initialize the Y cable ports inside SONiC with xcvrd as the platform daemon. There are two parts to the integration: While xcvrd initializes, there is a check within config_db for Y cable presence. This is done by checking the key-value pairs for the presence of the mux_cable identifier as a key. Once a Y cable is found to be attached to a port, State DB is updated with the corresponding data for the Y cable port. Once the init process is done and Y cable presence is established, a thread is run to periodically monitor the APPL DB MUX_CABLE_COMMAND table for updates, along with one that periodically checks for change events. If an update is found, the corresponding changes are made on the MUX using the sonic_y_cable package and reflected in STATE_DB. What is the motivation for this PR? To add the necessary infrastructure for Credo Y cable integration within SONiC. How did you do it? Added the necessary changes and a new xcvrd_utilities subdirectory for utilities of y_cable code. Reorganized the setup.py and sonic-xcvrd code to this form: sonic-xcvrd/setup.py sonic-xcvrd/src/init.py sonic-xcvrd/scripts/xcvrd → sonic-xcvrd/src/xcvrd.py sonic-xcvrd/src/xcvrd_utilities/init.py sonic-xcvrd/src/xcvrd_utilities/y_cable_helper.py Signed-off-by: vaibhav-dahiya <vdahiya@microsoft.com>
Introducing chassisd to monitor status of cards on a modular chassis HLD: sonic-net/SONiC#646 **-What I did** Introducing a new process to monitor status of control, line and fabric cards. **-How I did it** Support of monitoring of line-cards and fabric-cards. This runs in the main thread periodically. It updates the STATE_DB with the status information. 'show platform chassis-modules' will read from the STATE_DB Support of handling configuration of moving the cards to administratively up/down state. The handling happens as part of a separate thread that waits on select() for config event from a CHASSIS_MODULE table in CONFIG_DB.
PSUd changes to computer power-budget for Modular chassis HLD: sonic-net/SONiC#646 PSUd will introduce power requirements calculations. Platform APIs are introduced to provide consumers and total consumed power. Number of PSUs will help provide total supplied power **Output of STATE-DB:** ``` "CHASSIS_INFO|chassis_power_budget 1": { "expireat": 1603182970.639244, "ttl": -0.001, "type": "hash", "value": { "SUPERVISOR consumed_power": "80.0", "FABRIC-CARD consumed_power": "185.0", "FAN consumed_power": "999", "LINE-CARD consumed_power": "1000.0", "PSU supplied_power": "9000.0" } }, ```
Enhance thermalctld to write to chassis state-DB on a modular chassis HLD: sonic-net/SONiC#646 In a modular chassis, the thermal information from all line-cards will be updated to the chassis state-DB in the control-card. Additionally, minimum and maximum temperatures will be recorded. The fan control algorithm used by certain vendors will require this information.
Added changes in the sonic_xcvrd directory of sonic-platform-daemons, changed src dir to xcvrd dir for package generation and changed the setup.py to include the package xcvrd Signed-off-by: vaibhav-dahiya <vdahiya@microsoft.com>
…riable (sonic-net#112) Previously, chassisd and thermalctld assumed that the swsscommon library would not be installed in the unit testing environment. This is not a valid assumption, and would cause unit tests to fail if swsscommon was available in the unit test environment, because it would get imported, but there would be no Redis DB to communicate with. This PR uses environment variables, which are set by the unit tests themselves, to determine whether to load the real or mock libraries. This solution is similar to what is done in sonic-utilities.
…tup_function() (sonic-net#114) Since these tests are run via unittest infrastructure, and not via Pytest, `setup_function()` is not the proper location to set these variables.
…onic-net#117) Previously, psud assumed that the swsscommon library would not be installed in the unit testing environment. This is not a valid assumption, and would cause unit tests to fail if swsscommon was available in the unit test environment, because it would get imported, but there would be no Redis DB to communicate with. This PR uses environment variables, which are set by the unit tests themselves, to determine whether to load the real or mock libraries. This solution is similar to what is done in sonic-utilities.
…variable (sonic-net#120) When the `PSUD_UNIT_TESTING` and `THERMALCTLD_UNIT_TESTING` variables are not set, the following errors occur: ``` psud Traceback (most recent call last): psud File "/usr/local/bin/psud", line 21, in <module> psud if os.environ["PSUD_UNIT_TESTING"] == "1": psud File "/usr/lib/python2.7/UserDict.py", line 40, in __getitem__ psud raise KeyError(key) psud KeyError: 'PSUD_UNIT_TESTING' ``` ``` thermalctld Traceback (most recent call last): thermalctld File "/usr/local/bin/thermalctld", line 19, in <module> thermalctld if os.environ["THERMALCTLD_UNIT_TESTING"] == "1": thermalctld File "/usr/lib/python2.7/UserDict.py", line 40, in __getitem__ thermalctld raise KeyError(key) thermalctld KeyError: 'THERMALCTLD_UNIT_TESTING' ``` Also fixed the same issue in `chassisd`. Signed-off-by: Petro Bratash <petrox.bratash@intel.com>
…or physical entity mib (sonic-net#102) * Update pmon daemons for SONiC Physical Entity MIB feature
Fixes the following crash introduced by sonic-net/sonic-platform-daemons#102: ``` 01:33:00 ______________________ test_updater_thermal_check_min_max ______________________ 01:33:00 01:33:00 def test_updater_thermal_check_min_max(): 01:33:00 chassis = MockChassis() 01:33:00 01:33:00 thermal = MockThermal() 01:33:00 chassis.get_all_thermals().append(thermal) 01:33:00 01:33:00 chassis.set_modular_chassis(True) 01:33:00 chassis.set_my_slot(1) 01:33:00 temperature_updater = TemperatureUpdater(SYSLOG_IDENTIFIER, chassis) 01:33:00 01:33:00 temperature_updater.update() 01:33:00 slot_dict = temperature_updater.chassis_table.get('Thermal 1') 01:33:00 > assert slot_dict['minimum_temperature'] == str(thermal.get_minimum_recorded()) 01:33:00 E TypeError: 'NoneType' object has no attribute '__getitem__' 01:33:00 01:33:00 tests/test_thermalctld.py:341: TypeError ``` Signed-off-by: Petro Bratash <petrox.bratash@intel.com> Signed-off-by: Petro Bratash <petrox.bratash@intel.com>
Without this change, LEDs were only set when an event happened. Given that power supplies are assumed present by default, LEDs would never be set to `green`. Instead they would have been left in the state the platform initialization left them in (e.g. `off`).
…alizeGlobalConfig (sonic-net#130) The check for multiAsic before calling initializeGlobalConfig was done in xcvrd earlier. Adding now to the other processes in sonic-platform-daemons as well.
…r conditions/events (sonic-net#129) * [xcvrd] Fix y_cable state update to unknown on erroneous events This PR replaces the state DB value 'failure' with 'unknown' when an error event occurs in the functioning of the Y cable. What is the motivation for this PR? The schema agreed upon for the linkmgr and orchagent interaction with xcvrd is that, on an error event, xcvrd needs to fill the state DB with 'unknown' as the state value rather than 'failure'. This PR handles that. How did you do it? Identified the error scenarios in the code and made the changes. Signed-off-by: vaibhav-dahiya <vdahiya@microsoft.com>
…on (sonic-net#131) Summary: This PR replaces the logic that checks mux_direction on the y_cable, so that it checks the mux_direction register instead of the actively linked and routing TOR register. Approach: added the changes in y_cable_helper.py by replacing the API. What is the motivation for this PR? check_mux_direction is required as per design to replace active_linked_tor_side (active_linked_tor_side -> check_mux_direction). check_mux_direction will be used to establish the mux direction explicitly. Signed-off-by: vaibhav-dahiya <vdahiya@microsoft.com>
Updating for completeness on how mock objects need to be imported ``` mprabhu@565bc0455e84:/sonic/src/sonic-platform-daemons/sonic-psud$ python2 setup.py test running pytest running egg_info writing sonic_psud.egg-info/PKG-INFO writing top-level names to sonic_psud.egg-info/top_level.txt writing dependency_links to sonic_psud.egg-info/dependency_links.txt reading manifest file 'sonic_psud.egg-info/SOURCES.txt' writing manifest file 'sonic_psud.egg-info/SOURCES.txt' running build_ext ==================================================================================== test session starts ===================================================================================== platform linux2 -- Python 2.7.16, pytest-3.10.1, py-1.7.0, pluggy-0.8.0 rootdir: /sonic/src/sonic-platform-daemons/sonic-psud, inifile: pytest.ini plugins: cov-2.6.0 collected 3 items tests/test_psud.py ... [100%] ---------- coverage: platform linux2, python 2.7.16-final-0 ---------- Name Stmts Miss Cover ---------------------------------- scripts/psud 355 216 39% Coverage HTML written to dir htmlcov Coverage XML written to file coverage.xml ================================================================================== 3 passed in 0.16 seconds ================================================================================== ```
…t#132) python2 is end of life and SONiC is going to support python3. This PR is to change code in xcvrd, psud, thermalctld and syseeprom to make it compatible with both python3 and python2.
Align style with slightly modified PEP8 standards (extend maximum line length to 120 chars). This will also help in the transition to Python 3, where it is more strict about whitespace. Done using `autopep8 --in-place --max-line-length 120` and some manual tweaks.
…obe in mux cable driver (sonic-net#134) Summary: This PR removes the logic that deleted a command probe message received from linkmgr after the message was processed. What is the motivation for this PR? The delete tends to create an error scenario if many probe messages arrive and the redis API fails to retrieve the message contents. Signed-off-by: vaibhav-dahiya <vdahiya@microsoft.com>
…net#133) Summary: This PR provides the necessary infrastructure to add pytest support and integration in the sonic-xcvrd submodule. This PR also adds unit tests for the xcvrd daemon. What is the motivation for this PR? To add pytest unit test support to the sonic-xcvrd daemon in sonic-platform-daemons, as well as to add some unit tests. Signed-off-by: vaibhav-dahiya <vdahiya@microsoft.com>
Enhance chassisd to monitor midplane status of the cards in modular chassis HLD: sonic-net/SONiC#646 -What I did Add monitoring of the midplane or internal ethernet network between supervisor and line-card modules. -How I did it Along with status monitoring, also monitor the midplane reachability between supervisor and modules. It updates the STATE_DB with the status information. 'show chassis-modules midplane-status' will read from the STATE_DB
Why I did this? xcvrd unit test failed when building it with python3: ``` 17:23:50 _____________________ ERROR collecting tests/test_xcvrd.py _____________________ 17:23:50 tests/test_xcvrd.py:36: in <module> 17:23:50 class TestXcvrdScript(object): 17:23:50 tests/test_xcvrd.py:41: in TestXcvrdScript 17:23:50 @patch('xcvrd.xcvrd.logical_port_name_to_physical_port_list', MagicMock(return_value=[0])) 17:23:50 E NameError: name 'patch' is not defined ``` How I did this? import the package patch
…-net#137) - Initialize self.presence and other variables in PsuStatus dunder init to False instead of True. - Import datetime module. - Discussions related to this issue can be seen in sonic-net/sonic-platform-daemons#136
- Add 100% unit test coverage of `PsuStatus` class in psud. - Add skeleton of class to test `DaemonPsud` class - Add test case for `get_psu_key()` and `try_get()` helper functions - Add checks to import 'mock' from the 'unittest' package if running with Python 3 Overall psud unit test coverage increases from 39% to 51%. Previous unit test coverage: ``` ----------- coverage: platform linux, python 3.7.3-final-0 ----------- Name Stmts Miss Cover ---------------------------------- scripts/psud 381 233 39% Coverage HTML written to dir htmlcov Coverage XML written to file coverage.xml ``` Unit test coverage with this patch: ``` ----------- coverage: platform linux, python 3.7.3-final-0 ----------- Name Stmts Miss Cover ---------------------------------- scripts/psud 381 185 51% Coverage HTML written to dir htmlcov Coverage XML written to file coverage.xml ```
Report Pytest unit test coverage for thermalctld. Current coverage: ``` ----------- coverage: platform linux, python 3.7.3-final-0 ----------- Name Stmts Miss Cover ----------------------------------------- scripts/thermalctld 424 113 73% Coverage HTML written to dir htmlcov Coverage XML written to file coverage.xml ``` - Also add check to import 'mock' from the 'unittest' package if running with Python 3
- Refactor ledd: - Remove useless try/catch from around imports - Move argument parsing out of `DaemonLedd.run()` method and into `main()` function, a more appropriate location - Fix LGTM alert for unreachable code - Add unit tests and report coverage: - Test passing good and bad command-line arguments to ledd process Unit test coverage with this patch: ``` ----------- coverage: platform linux, python 3.7.3-final-0 ----------- Name Stmts Miss Cover ---------------------------------- scripts/ledd 66 34 48% Coverage HTML written to dir htmlcov Coverage XML written to file coverage.xml ```
…onic-net#497) * [CMIS] Skip re-init flow for SW-controlled ports in case of fastboot Signed-off-by: vadymhlushko-mlnx <vadymh@nvidia.com> * Change the log message Signed-off-by: Stepan Blyschak <stepanb@nvidia.com> --------- Signed-off-by: vadymhlushko-mlnx <vadymh@nvidia.com> Signed-off-by: Stepan Blyschak <stepanb@nvidia.com> Co-authored-by: vadymhlushko-mlnx <vadymh@nvidia.com>
…ernet application (sonic-net#501)
…media_settings.json (sonic-net#471) * [xcvrd] Modify to support regular expression when parsing the key in media_settings.json * fix unit test error * add unit test for getting media settings value with regular expression * define get_media_settings() * apply the suggestion for if condition
…able (sonic-net#511) * Initialize application specific fields as 'N/A' in TRANSCEIVER_INFO table Signed-off-by: Mihir Patel <patelmi@microsoft.com> * Changed a debug log to warning * Modified log_error to log_warning * Added comment for updating DB after xcvrd restart --------- Signed-off-by: Mihir Patel <patelmi@microsoft.com>
…ng swsscommon table within the context (sonic-net#509) * [ycabled][active-active] Fix in gRPC channel callback logic by creating swsscommon table within the context Signed-off-by: Vaibhav Dahiya <vdahiya@microsoft.com> * fix UT Signed-off-by: Vaibhav Dahiya <vdahiya@microsoft.com> * add more tests Signed-off-by: Vaibhav Dahiya <vdahiya@microsoft.com> * typo Signed-off-by: Vaibhav Dahiya <vdahiya@microsoft.com> * add port Signed-off-by: Vaibhav Dahiya <vdahiya@microsoft.com> * add logging Signed-off-by: Vaibhav Dahiya <vdahiya@microsoft.com> * add tests Signed-off-by: Vaibhav Dahiya <vdahiya@microsoft.com> --------- Signed-off-by: Vaibhav Dahiya <vdahiya@microsoft.com>
Signed-off-by: Vivek Reddy <vkarri@nvidia.com>
…#521) Signed-off-by: Mihir Patel <patelmi@microsoft.com>
…ne log messages with physical slot number (#530) * [chassis][pmon][chassid] Enhance the chassid module on-line or off-line with physical slot num --------- Signed-off-by: mlok <marty.lok@nokia.com>
…(#529) * [PMON][psud] Fix the repeated NOTICE log message on Chassis platform Signed-off-by: mlok <marty.lok@nokia.com> * Fix the Unit test --------- Signed-off-by: mlok <marty.lok@nokia.com>
* [xcvrd] Add logs to improve debugging in xcvrd Signed-off-by: Mihir Patel <patelmi@microsoft.com> * Fixed unit-test failure * Improved code coverage * Changed warning to notice --------- Signed-off-by: Mihir Patel <patelmi@microsoft.com>
…s (#533) * Enhance media_settings_parser for 100G xcvr and DPB etc * Revert space change * Cover corner cases * Change log message level * Fix docstring and update name of get_speed_lane_count_and_subport * Address comment * Change to re.fullmatch for lane_speed key
…ng custom NPU SI settings (#541) * Xcvrd crash and restart should not cause link flap on platforms needing custom SI settings Signed-off-by: Mihir Patel <patelmi@microsoft.com> * Improved code coverage --------- Signed-off-by: Mihir Patel <patelmi@microsoft.com>
sonic_platform_base/chassis_base.py
Outdated
@@ -280,26 +280,64 @@ def get_module_index(self, module_name):
    # SmartSwitch methods
    ##############################################

-    def get_dpu_id(self, name):
+    def get_dpu_id(self, **kwargs):
@oleksandrivantsiv since this is a base class, can we make use of get_module_index()?
get_module_index has a different meaning. It returns "An integer, the index of the ModuleBase object in the module_list". get_dpu_id returns the physical ID of the DPU (its position).
@oleksandrivantsiv can you rebase? It looks like this API already exists: https://github.com/sonic-net/sonic-platform-common/blob/master/sonic_platform_base/chassis_base.py#L283
Yes, get_dpu_id already exists. I changed the parameter from name to **kwargs. This API will have a different meaning for the switch and the DPU. Please check the description.
sonic_platform_base/chassis_base.py
Outdated
        """
        return False

    def get_dpu_dataplane_state(self):
@oleksandrivantsiv do we need to have dpu specified in the function name, or can we keep it generic as get_dataplane_state(), since this is an abstract base class?
sonic_platform_base/chassis_base.py
Outdated
        """
        raise NotImplementedError

    def get_dpu_controlplane_state(self):
@oleksandrivantsiv do we need to have dpu specified in the function name, or can we keep it generic as get_controlplane_state(), since this is an abstract base class?
94a1b5e to 70b3de5
@rameshraghupathy can you review?
sonic_platform_base/chassis_base.py
Outdated
        """
        return False

    def get_dataplane_state(self):
get_state_info(self): API already covers this. Please refer to the HLD. This API appears to be redundant.
@rameshraghupathy we discussed this last week. This is needed for the DPU chassisd to populate the data plane and control plane states from the DPU to the CHASSIS_STATE_DB. This API is defined on the chassis level. What you are referring to is the module-level API that will run on the NPU side and has a completely different meaning.
@oleksandrivantsiv Got it
sonic_platform_base/chassis_base.py
Outdated
        """
        raise NotImplementedError

    def get_controlplane_state(self):
get_state_info(self): API already covers this. Please refer to the HLD. This API appears to be redundant.
70b3de5 to dbdee66
Description
Add a definition for a new DPU chassis API required for querying DPU dataplane and midplane states.
Motivation and Context
A new API is required to enable querying of the DPU dataplane and midplane states. These states will be monitored by the chassisd service running on the DPU and pushed to the CHASSIS_STATE_DB upon any changes. This will allow the NPU to subscribe to DPU state changes.
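The flow described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the chassisd implementation: `SketchDpuChassis`, `StatePublisher`, the `DPU_STATE|DPU0` key, the field names, and the "up"/"down" values are all hypothetical, and a plain dict stands in for a CHASSIS_STATE_DB table. The getter names follow the ones reviewed in this PR's diff; their boolean return type is assumed.

```python
class SketchDpuChassis:
    """Hypothetical DPU chassis exposing the two state getters."""

    def __init__(self):
        self.dataplane_up = False
        self.controlplane_up = False

    def get_dataplane_state(self):
        # Assumed to return a bool; the real return type may differ.
        return self.dataplane_up

    def get_controlplane_state(self):
        return self.controlplane_up


class StatePublisher:
    """Polls the chassis and writes to the table only when a state changes."""

    KEY = "DPU_STATE|DPU0"  # hypothetical key, for illustration only

    def __init__(self, chassis, table):
        self._chassis = chassis
        self._table = table  # dict standing in for a CHASSIS_STATE_DB table
        self._last = None

    def poll_once(self):
        current = {
            "dataplane_state": "up" if self._chassis.get_dataplane_state() else "down",
            "controlplane_state": "up" if self._chassis.get_controlplane_state() else "down",
        }
        if current == self._last:
            return False  # nothing changed, so nothing is pushed
        self._table[self.KEY] = current
        self._last = current
        return True


sketch_chassis = SketchDpuChassis()
state_table = {}
publisher = StatePublisher(sketch_chassis, state_table)
first_push = publisher.poll_once()     # initial state is always written
second_push = publisher.poll_once()    # unchanged, so skipped
sketch_chassis.dataplane_up = True
third_push = publisher.poll_once()     # change detected, written again
```

Writing only on change keeps the table quiet in steady state, so a subscriber on the NPU side would be notified only when a DPU state actually transitions.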
How Has This Been Tested?
The API tests will be added as part of the chassisd changes.