Update for the procedures for insertion/hot swap of Switch Fabric Module(SFM) by using "config chassis modules shutdown/startup" commands #18578

Closed

@JunhongMao JunhongMao commented Apr 5, 2024

Why I did it

For the Nokia SONiC chassis, the procedure for insertion/hot swap of a Switch Fabric Module (SFM)
previously used the command below.

sudo nokia_cmd set shutdown-sfm <SFM-Num/Physical-Slot>

This PR, together with nokia/sonic-platform#6, adds the commands below for the equivalent operations.

sudo config chassis modules shutdown/startup <module name>
Work item tracking
  • Microsoft ADO (number only):

How I did it

  1. Add chassis_module_config.py and its systemd service. The service starts automatically; example status output is shown below.
sudo systemctl status chassis-module.service
● chassis-module.service - Chassis module up & down operation
     Loaded: loaded (/lib/systemd/system/chassis-module.service; enabled-runtime; vendor preset: enabled)
     Active: active (running) since Fri 2024-04-05 19:57:25 UTC; 1h 5min ago
   Main PID: 8856 (python3)
      Tasks: 1 (limit: 38314)
     Memory: 16.2M
     CGroup: /system.slice/chassis-module.service
             └─8856 /usr/bin/python3 /usr/local/bin/chassis_module_config.py

Apr 05 19:57:25 ixre-cpm-chassis15 systemd[1]: Started Chassis module up & down operation.
  2. When the CLI command "sudo config chassis modules startup/shutdown" runs, it calls chassis_module_set_admin_state.py to perform the related operations.
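
The flow in the two steps above can be sketched roughly as follows. This is only an illustration, not the code in this PR. It assumes that "config chassis modules shutdown" stores admin_status "down" for the module under CHASSIS_MODULE in CONFIG_DB and that "startup" removes the entry. The names handle_module_event and set_admin_state are hypothetical, and the CONFIG_DB subscription itself is left out so that the sketch runs standalone.

```python
def desired_admin_state(entry):
    """Map a CHASSIS_MODULE entry to the admin state it requests.

    Assumption: a missing or empty entry means the module should be up.
    """
    if entry and entry.get("admin_status") == "down":
        return "down"
    return "up"


def handle_module_event(name, entry, set_admin_state):
    """Apply one CHASSIS_MODULE change through a platform callback."""
    state = desired_admin_state(entry)
    set_admin_state(name, state)  # hypothetical hook into the platform code
    return state


if __name__ == "__main__":
    applied = []

    def record(name, state):
        # Stand-in for the real platform call; it only records the request.
        applied.append((name, state))

    handle_module_event("FABRIC-CARD3", {"admin_status": "down"}, record)
    handle_module_event("FABRIC-CARD3", None, record)
    print(applied)
```

Keeping the decision (desired_admin_state) separate from the action (the callback) means the same logic could serve both a live table notification and a replay of saved configuration at boot.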

How to verify it

The test below was carried out on the FABRIC-CARD3 module, from the supervisor card.
1. Shutdown
sudo config chassis modules shutdown FABRIC-CARD3

2. Check the status to confirm that FABRIC-CARD3 is down.
$ show chassis modules status
        Name             Description    Physical-Slot    Oper-Status    Admin-Status       Serial
------------  ----------------------  ---------------  -------------  --------------  -----------
...
FABRIC-CARD3             Unavailable                4          Empty            down          N/A

 
3. Start up the module
sudo config chassis modules startup FABRIC-CARD3

4. Check the status
$ show chassis modules status
        Name             Description    Physical-Slot    Oper-Status    Admin-Status       Serial
------------  ----------------------  ---------------  -------------  --------------  -----------
...
FABRIC-CARD3                    SFM4                4         Online              up  01214400362

5. Verify that the state persists across a reboot. For example, shut down the module first,
then save the config and reboot; the module should remain shut down.
$ sudo config save
Existing files will be overwritten, continue? [y/N]: y

After the reboot, check the status to confirm that FABRIC-CARD3 is still down.
$ show chassis modules status
        Name             Description    Physical-Slot    Oper-Status    Admin-Status       Serial
------------  ----------------------  ---------------  -------------  --------------  -----------
...
FABRIC-CARD3             Unavailable                4          Empty            down          N/A
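
The manual checks in steps 2, 4 and 5 could be automated along these lines. This is a minimal sketch that parses text in the layout shown above; it assumes columns are separated by at least two spaces and that no cell is empty. parse_module_status is a hypothetical helper, not part of any SONiC package.

```python
import re


def parse_module_status(text):
    """Parse `show chassis modules status` output into {name: row dict}."""
    header = None
    modules = {}
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("-"):
            continue  # skip blank lines and the dashed separator
        fields = re.split(r"\s{2,}", stripped)
        if header is None:
            # Ignore everything until the header row is found.
            if fields[0] == "Name":
                header = fields
            continue
        if len(fields) == len(header):
            modules[fields[0]] = dict(zip(header, fields))
    return modules


if __name__ == "__main__":
    sample = (
        "        Name             Description    Physical-Slot    Oper-Status    Admin-Status       Serial\n"
        "------------  ----------------------  ---------------  -------------  --------------  -----------\n"
        "FABRIC-CARD3             Unavailable                4          Empty            down          N/A\n"
    )
    row = parse_module_status(sample)["FABRIC-CARD3"]
    print(row["Admin-Status"], row["Oper-Status"])
```

Rows whose field count does not match the header, such as the elided "..." line, are skipped rather than guessed at.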


Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006
  • 202012
  • 202106
  • 202111
  • 202205
  • 202211
  • 202305

Tested branch (Please provide the tested image version)

  • 202205

Description for the changelog

Link to config_db schema for YANG module changes

A picture of a cute animal (not mandatory but encouraged)

[Unit]
Description=Chassis module up & down operation
ConditionPathExists=/etc/sonic/chassisdb.conf
Requires=database.service updategraph.service

updategraph.service is a 202205 service that does not exist in 202405 (master). It is used here so that the change can be cherry-picked to 202205. We will need another PR to change this in master after this fix has been picked up by 202205. Is there a better strategy?


@mlok-nokia, can you comment on this?


No longer applicable.

@mlok-nokia

@arlakshm @judyjoseph For the SFM module shutdown/startup process, we need to create a chassis_module_config.service that runs chassis_module_config.py, which subscribes and listens to the CHASSIS_MODULE table in CONFIG_DB. In 202205 the service is ordered after, and requires, updategraph.service, but in the master branch updategraph.service has been replaced by config-setup.service. We have now created this PR against master with the 202205 backport box checked. Should we still use updategraph.service in the PR and fix it after the 202205 cherry-pick?

@judyjoseph

@JunhongMao I understand that this PR and nokia/sonic-platform#6 aim to implement the shutdown/startup of the SFM and its swss/syncd processes in the Nokia platform submodule.

Can we make this a bit more generic? For example, when the user issues "sudo config chassis modules shutdown FABRIC-CARD3", the implementation in sonic-utilities could start/stop the swss/syncd systemd services and call the Nokia platform API to power the corresponding card up or down.

That way, the command has a common SONiC implementation with a platform hook that actually powers the SFM up or down.

@mlok-nokia

mlok-nokia commented Apr 9, 2024

Hi Judy, there are two reasons why we define a service that subscribes to the "CHASSIS_MODULE" table to shut down or start up an SFM and its related swss/syncd services. First, when a user shuts down an SFM, saves the config, and then reboots the chassis, we need to keep the SFM and its swss/syncd services in the down state, based on the saved configuration, while the chassis boots and loads the config.
Second, the number of swss/syncd instances associated with a particular SFM module can differ between vendors. That is why we think letting the vendor API shut down or start up the SFM card and its related swss/syncd services is more flexible and straightforward.

Should we have a call to talk about this? Thanks.
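
To make the two points above concrete, here is a rough sketch rather than the actual implementation. It assumes the saved configuration is a JSON document with a CHASSIS_MODULE table keyed by module name, and that per-ASIC services are named swss@N.service and syncd@N.service. The SFM_TO_ASICS mapping and both function names are hypothetical; a real mapping would come from the vendor platform code.

```python
import json

# Hypothetical vendor-specific mapping: ASIC instances hosted on each SFM.
SFM_TO_ASICS = {"FABRIC-CARD3": [6, 7]}


def modules_to_keep_down(config):
    """List modules whose saved configuration asks for admin_status down."""
    table = config.get("CHASSIS_MODULE", {})
    return sorted(
        name for name, entry in table.items()
        if entry.get("admin_status") == "down"
    )


def units_for_module(name, sfm_to_asics):
    """Return the swss/syncd unit names tied to one SFM (assumed naming)."""
    units = []
    for asic in sfm_to_asics.get(name, []):
        units.append(f"swss@{asic}.service")
        units.append(f"syncd@{asic}.service")
    return units


if __name__ == "__main__":
    saved = json.loads(
        '{"CHASSIS_MODULE": {"FABRIC-CARD3": {"admin_status": "down"}}}'
    )
    for module in modules_to_keep_down(saved):
        print(module, units_for_module(module, SFM_TO_ASICS))
```

Because the mapping is plain data, each vendor could supply its own without any change to the boot-time logic.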

@@ -0,0 +1,80 @@
#!/usr/bin/env python3

Why do we need another service to handle the CLI command?
Can we call the platform script from the CLI handler?


In this solution, chassis_module_config.py subscribes to the "CHASSIS_MODULE" table to shut down or start up an SFM and its related swss/syncd services, rather than calling the platform script directly from the CLI handler. For the considerations behind this design, please refer to #18578 (comment). Thanks.


@arlakshm Consider this case: the user shuts down the SFM, saves the configuration, and reboots the whole chassis. Do we want to keep the SFM down after the system is up? If we don't need to handle this case, we can call the platform script directly from the CLI handler.

@arlakshm

@JunhongMao and @mlok-nokia, as discussed offline, please update the PR with the latest proposal.

@JunhongMao

JunhongMao commented Apr 23, 2024

This PR (#18578) has been replaced by the following new PRs:
nokia/sonic-platform#6
sonic-net/sonic-utilities#3283
sonic-net/sonic-platform-daemons#475
