sonic-host-services changes for gNOI Warm Reboot #191

Open
wants to merge 2 commits into base: master

@rkavitha-hcl rkavitha-hcl commented Nov 29, 2024

Adding sonic-host-services changes for warm reboot.
Adding HALT method support for sonic-host-services.

linux-foundation-easycla bot commented Nov 29, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.

kishanps commented Dec 6, 2024

@github76543 John, can you PTAL and sign off?

Contributor

@hdwhdw hdwhdw left a comment

Can you please look at ff73070 and see if this is something you can reuse?

@kishanps

> Can you please look at ff73070 and see if this is something you can reuse?

Thanks @hdwhdw for the reference. The reboot dbus service also needs a request/response framework, which is what this PR provides. IIUC, @vvolam went with the other one as a stopgap solution.

Adding @github76543 (John) for additional inputs.

Contributor

hdwhdw commented Dec 27, 2024

@kishanps thanks for clarifying. If so, consider renaming the service to something more general than gnoi_reboot, maybe 'async_system'? Having one module for each gNOI service can clutter the dbus codebase.

Also, does it make sense to add your API to the systemd service and call it async reboot, alongside @vvolam's API?

@mssonicbld

/azp run

Azure Pipelines successfully started running 1 pipeline(s).

@kishanps

> @kishanps thanks for clarifying. If so, consider renaming the service to something more general than gnoi_reboot, maybe 'async_system'? Having one module for each gNOI service can clutter the dbus codebase.
>
> Also, does it make sense to add your API to the systemd service and call it async reboot, alongside @vvolam's API?

@hdwhdw @github76543 @rkavitha-hcl @jaanah-hcl

I discussed this with John, Dawei Huang & @vvolam, and we all agree that reboot will be a separate dbus service. Hence, rename gnoi_reboot to just reboot, and remove commit ff73070 along with this PR to avoid duplication.

@mssonicbld

/azp run

Pull request contains merge conflicts.

@mssonicbld

/azp run

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld

/azp run

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld

/azp run

@mssonicbld

/azp run

Azure Pipelines successfully started running 1 pipeline(s).

Contributor

@vvolam vvolam left a comment

Other than these comments, LGTM. Thanks

"stderr: %s", MOD_NAME, stdout, stderr)
return

"""Wait for the reboot to complete. Here, we expect that SONiC Host Service

Contributor

I just noticed this check!

This check is wrong for the HALT method: in the HALT case, the gnmi container stays alive, and for now only the pmon and syncd containers are killed. Can we change the logic as follows?

Just wait for a 30- or 60-second timeout; if the pmon container has not been killed by then, the HALT method has failed.
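
For illustration, the timeout-based check proposed here could be sketched roughly as follows. This is only a sketch under stated assumptions, not the code in this PR: `wait_until_stopped` and the injected `list_running` callable are hypothetical names, and how the running containers are actually listed on the host is left to the caller.

```python
import time
from typing import Callable, Iterable


def wait_until_stopped(
    container: str,
    list_running: Callable[[], Iterable[str]],  # hypothetical: returns names of running containers
    timeout_s: float = 60.0,
    poll_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Return True if `container` leaves the running set before the timeout expires."""
    deadline = clock() + timeout_s
    while True:
        # Success as soon as the container is no longer reported as running.
        if container not in set(list_running()):
            return True
        # Still running and out of time: treat HALT as failed.
        if clock() >= deadline:
            return False
        sleep(poll_s)
```

Injecting `list_running`, `sleep`, and `clock` keeps the polling logic independent of any particular container runtime and makes the timeout path easy to exercise in a unit test.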

@vvolam I suppose reboot -p will keep gnmi and bring down all other containers. In that case, should we just ensure that gnmi is the only container up in the HALT case? Maybe keep the timeout the same to cover other use cases where more containers need to be brought down.

Contributor

@kishanps Yes, we can do that because we always ensure the gnmi container is up in the HALT case.

Contributor

@rkavitha-hcl could you fix this check for the HALT case, as the gnmi container will still be running?

Kavitha is out sick. @jaanah-hcl, can you please take care of this?

@vvolam One more question on reboot -p: does it also keep the framework container up along with the gnmi container? Don't you need the framework container for the reboot status after the HALT execution?

Contributor

@kishanps As I mentioned, it only kills the syncd and pmon containers as of now, and all remaining containers stay up.

There are two parts here:

  • Which containers do you want to kill? For a smartswitch, pmon & syncd may suffice, but for a regular switch you probably need to kill other running containers as well. In that case, you may want to kill all containers except gnmi & framework (tied to the next part).
  • Do you need the framework container after HALT to query the reboot status? I don't know the use case for HALT, so you may be the better person to make that call. If you intend to make a reboot status call, then you also need the framework container to return the status.
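
To make the two parts concrete, one way to express them is a check parameterized by which containers must be down and which must stay up, so the smartswitch and regular-switch cases differ only in configuration. This is a hedged sketch, not the PR's implementation: `halt_check_failures` and its parameters are hypothetical, and the container sets used in the example are taken from this thread rather than from any confirmed platform policy.

```python
from typing import Iterable, List


def halt_check_failures(
    running: Iterable[str],
    must_stop: Iterable[str],
    must_stay: Iterable[str],
) -> List[str]:
    """Return a list of problems; an empty list means the HALT check passed."""
    running_set = set(running)
    failures = []
    # Part 1: every container that HALT is supposed to kill must be gone.
    for name in sorted(set(must_stop) & running_set):
        failures.append(f"{name} is still running")
    # Part 2: containers needed after HALT (e.g. for a status query) must be up.
    for name in sorted(set(must_stay) - running_set):
        failures.append(f"{name} is not running")
    return failures


# Example with the sets discussed in this thread (assumed, not confirmed policy).
print(halt_check_failures(
    running=["gnmi", "framework", "database"],
    must_stop={"pmon", "syncd"},
    must_stay={"gnmi", "framework"},
))
```

Keeping the two sets as inputs means the open question of whether framework must survive HALT changes only the `must_stay` argument, not the check itself.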

Contributor

@hdwhdw hdwhdw left a comment

Please wait for @vvolam's approval as well.

@mssonicbld

/azp run

Azure Pipelines successfully started running 1 pipeline(s).

Contributor

vvolam commented Feb 27, 2025

@kishanps @rkavitha-hcl Could you fix build failures as well?

@mssonicbld

/azp run

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld

/azp run

Azure Pipelines successfully started running 1 pipeline(s).

@rkavitha-hcl
Author

> @kishanps @rkavitha-hcl Could you fix build failures as well?

The build is fixed and the branch is rebased.

@mssonicbld

/azp run

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld

/azp run

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld

/azp run

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld

/azp run

Azure Pipelines successfully started running 1 pipeline(s).
