Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Mellanox] align platform_reboot to use "hardware reboot" #3320

Merged
merged 3 commits into from
Aug 19, 2019

Conversation

mykolaf
Copy link
Collaborator

@mykolaf mykolaf commented Aug 9, 2019

Signed-off-by: Mykola Faryma mykolaf@mellanox.com

- What I did
Aligned Mellanox specific platform_reboot script to perform a power cycle instead of calling /sbin/reboot. Also added verbosity to the fw upgrade script call.

Motivation: switches were reported "stuck" after a reboot call, most vendors use "hardware reboot" in their platform_reboot scripts. We now go with the same approach of a more harsh reboot.

- How I did it

- How to verify it

  • manual reboot
  • continuous reboot test

- Description for the changelog

Mellanox platform_reboot to use power cycle

- A picture of a cute animal (not mandatory but encouraged)

Signed-off-by: Mykola Faryma <mykolaf@mellanox.com>
@mykolaf
Copy link
Collaborator Author

mykolaf commented Aug 12, 2019

retest this please

@@ -31,4 +31,6 @@ if [[ "${EXIT_CODE}" != "${EXIT_SUCCESS}" ]]; then
fi
fi

exec /sbin/reboot $@
echo 1 > /bsp/system/pwr_cycle
Copy link
Collaborator

@qiluo-msft qiluo-msft Aug 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to fail? If you stop the BSP service, the file will disappear, and finally reboot will not work. #Closed

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, changed to use the sysfs path itself.

@qiluo-msft
Copy link
Collaborator

        -f|--force)

This option is ignored in new code. I remember our discussion concluded that 'force/hardware-reboot' is the default option. When user want to 'normal/graceful-reboot', he/she should provide an option.


Refers to: device/mellanox/x86_64-mlnx_msn2700-r0/platform_reboot:13 in 9b655b1. [](commit_id = 9b655b1, deletion_comment = False)

Copy link
Collaborator

@qiluo-msft qiluo-msft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As comments

@qiluo-msft qiluo-msft requested review from jleveque and yxieca August 12, 2019 22:49
Signed-off-by: Mykola Faryma <mykolaf@mellanox.com>
@mykolaf
Copy link
Collaborator Author

mykolaf commented Aug 13, 2019

@qiluo-msft We decided to align with other vendors in using hw-reboot as the default option. Introducing graceful reboot option was put into backlog. The -f option only provides ability to overcome fw-upgrade failure.

@mykolaf
Copy link
Collaborator Author

mykolaf commented Aug 19, 2019

retest broadcom please

@jleveque jleveque merged commit 8d5e37d into sonic-net:201811 Aug 19, 2019
@keboliu
Copy link
Collaborator

keboliu commented Sep 11, 2019

@jleveque @mykolaf with this change can we know the real reboot reason? User issued a software reboot command, but actually it is triggered by a power cycle behind the scenes, and the "show reboot-cause" will tell the user that the reboot cause is not a "software reboot" but a "power cycle", seems quite confusing, would like to hear from you.

@jleveque
Copy link
Contributor

@keboliu: This change definitely has the potential to provide the incorrect reboot reason. There needs to be accounting performed on the platform side to prevent this (e.g., if there is a way to check whether the software triggered the hardware reset via some register).

@keboliu
Copy link
Collaborator

keboliu commented Sep 12, 2019

@jleveque @stephenxs I think we may need to change current logic, maybe combine check software reboot cause and hardware reboot cause? in this case, we do have a hardware reboot cause - power cycle, with current logic the software reboot cause check will be skipped.

@stephenxs
Copy link
Collaborator

@jleveque @stephenxs I think we may need to change current logic, maybe combine check software reboot cause and hardware reboot cause? in this case, we do have a hardware reboot cause - power cycle, with current logic the software reboot cause check will be skipped.

If most of the vendors implement reboot by using powercycle, is it better to update the logic of process-reboot-cause to checking /host/reboot-cause/reboot-cause.txt first and then checking the platform-dependent hardware reboot cause?

@mykolaf
Copy link
Collaborator Author

mykolaf commented Sep 17, 2019

@qiluo-msft

-f|--force)
This option is ignored in new code.

I don't understand, how this option is ignored in new code?
All the logic related to it is left in place, see line L#29 of changed file.

@mykolaf mykolaf deleted the platform_reboot_201811 branch February 18, 2020 13:14
mssonicbld added a commit that referenced this pull request Nov 15, 2024
…lly (#20748)

#### Why I did it
src/sonic-swss
```
* f650a3bd - (HEAD -> master, origin/master, origin/HEAD) [ACL] Add support to match on Tunnel Termination (#3320) (2 days ago) [Vivek]
* 7db69c1b - VOQ: Set the ECMP group size to 128. (#3351) (3 days ago) [Deepak Singhal]
* 956ebd60 - Handler Port oper down error status notification (#3350) (6 days ago) [Prince George]
```
#### How I did it
#### How to verify it
#### Description for the changelog
aidan-gallagher pushed a commit to aidan-gallagher/sonic-buildimage that referenced this pull request Nov 16, 2024
…lly (sonic-net#20748)

#### Why I did it
src/sonic-swss
```
* f650a3bd - (HEAD -> master, origin/master, origin/HEAD) [ACL] Add support to match on Tunnel Termination (sonic-net#3320) (2 days ago) [Vivek]
* 7db69c1b - VOQ: Set the ECMP group size to 128. (sonic-net#3351) (3 days ago) [Deepak Singhal]
* 956ebd60 - Handler Port oper down error status notification (sonic-net#3350) (6 days ago) [Prince George]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants