Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[warm-reboot] Fix failures of warm reboot on disconnect of ssh session #1529

Merged
merged 3 commits into from
Nov 25, 2021

Conversation

ghost
Copy link

@ghost ghost commented Mar 26, 2021

  • Starting the script in background mode and detaching the process
    from terminal session to prevent failures, caused by closing or
    sudden disconnecting of terminal session.
  • Redirecting output of the script to the file to prevent failures,
    when the script tries to write an output to file descriptor of
    the nonexistent terminal session.
  • Adding new parameter to script to be able to explicitly run the
    script in foreground mode with output to the terminal.
  • Updating the command reference doc according to the changes.

Signed-off-by: Maksym Belei Maksym_Belei@jabil.com

What I did

Resolves sonic-net/sonic-buildimage#7127
Fixed failures of warm reboot, when the SSH session is being disconnected.
As the script will now be executed in background mode by default, added parameter to explicitly run it as usual, in foreground mode.
Updated command reference according to the changes.

How I did it

By restarting the script in background mode with detaching it from the terminal session.
All the output has redirected to file /var/log/warm-reboot.txt for warm-reboot case, or /var/log/fast-reboot.txt for fast-reboot, depends on REBOOT_TYPE. This will prevent crashes of the script in case, when it will try to write some data to the file descriptor of the disconnected terminal session.

How to verify it

  1. Connect to the switch with SSH;
  2. Execute sudo warm-reboot -v;
  3. See the current progress of warm reboot with cat /var/log/warm-reboot.txt;
  4. Close SSH connection before warm reboot finish;
    Warm reboot should finish successfully, in spite of status of the SSH session.

New command output (if the output of a command-line utility has changed)

The script will be running in background detached mode with output to the file. The related log will be shown in terminal before restarting in background mode:

admin@sonic:~$ sudo warm-reboot
Detaching the process from the terminal session. Redirecting output to /var/log/warm-reboot.txt.

All the usual logs will be written to warm-reboot.txt.

@ghost ghost force-pushed the fix_warmboot_ssh branch from 9439573 to 726a54c Compare March 26, 2021 14:21
@ghost ghost marked this pull request as ready for review March 26, 2021 14:23
@ghost
Copy link
Author

ghost commented Mar 30, 2021

retest this please

@liat-grozovik liat-grozovik requested a review from yxieca March 30, 2021 18:49
@liat-grozovik
Copy link
Collaborator

retest this please

@liat-grozovik
Copy link
Collaborator

/AzurePipelines run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

# re-run the script in background mode with detaching from the terminal session
if [[ x"${DETACH}" == x"yes" ]]; then
echo "Detaching the process from the terminal session. Redirecting output to ${LOG_PATH}."
$0 "$@" -d 1>$LOG_PATH 2>$1 & disown %%
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2>$1?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It means redirect stderr to stdout. stdout in our case is directed to the file.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect you wanted to do so, you can use 2>&1 or change the whole thing to

$0 "$@" -d &>$LOG_PATH & disown %%

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -21,6 +21,8 @@ PLATFORM_PLUGIN="${REBOOT_TYPE}_plugin"
LOG_SSD_HEALTH="/usr/local/bin/log_ssd_health"
SSD_FW_UPDATE="ssd-fw-upgrade"
TAG_LATEST=yes
DETACH=yes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think -d should mean detach and default should be DETACH=no

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for late response. The idea of the changes is to prevent failures of warm-reboot/fast-reboot in cases when the ssh session is broken/closed. I made detached mode as default and leave possibility to run the scripts in non-detached mode with -d if the user wants. If to make as you propose, the user could forget to add -d parameter, what may lead to failure of the reboot. Due to this, I prefer to leave parameter -d as is. By the way, parameter -t made in similar way, so, I thought that there is no problem to let -d to run the process in non-detached mode, not wise versa.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@maksymbelei95 I don't think changing default behavior is the right thing. This blocking behavior has been used until now in tests and production. Changing the default behavior could have unexpected consequences. For special scenario, it should not be default.

@ghost ghost force-pushed the fix_warmboot_ssh branch from 726a54c to f44771e Compare April 12, 2021 06:37
@ghost ghost requested a review from yxieca April 12, 2021 09:55
@liat-grozovik
Copy link
Collaborator

@yxieca can you please further review and see if all comments were addressed and it can be merged?

@stepanblyschak
Copy link
Contributor

@maksymbelei95 @liat-grozovik @yxieca What is needed in order to have this fix merged?

@liat-grozovik
Copy link
Collaborator

liat-grozovik commented Sep 2, 2021

it LGTM. i would suggest to add unit test if possible. @yxieca ?

@stepanblyschak
Copy link
Contributor

@maksymbelei95 Could you please resolve the conflict?

* Starting the script in background mode and detaching the process
  from terminal session to prevent failures, caused by closing or
  sudden disconnecting of terminal session.
* Redirecting output of the script to the file to prevent failures,
  when the script tries to write an output to file descriptor of
  the nonexistant terminal session.
* Adding new parameter to script to be able to explicitly run the
  script in foreground mode with output to the terminal.
* Updating the command reference doc according to the changes.

Signed-off-by: Maksym Belei <Maksym_Belei@jabil.com>
@pshulik
Copy link

pshulik commented Oct 20, 2021

@yxieca, Could you please review the latest merge

@yxieca
Copy link
Contributor

yxieca commented Oct 22, 2021

@yxieca, Could you please review the latest merge

I still see detach becomes default behavior. I think the default behavior should be the original behavior: blocking.

Copy link
Contributor

@yxieca yxieca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't change default behavior.

@MaratGubaiev
Copy link
Contributor

Could anybody please rerun the tests?
@yxieca, @liat-grozovik

@liat-grozovik
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@dprital
Copy link
Collaborator

dprital commented Nov 25, 2021

@yxieca I see that this PR was approved by you. Can you please merge it ?

@liat-grozovik liat-grozovik merged commit e63f47e into sonic-net:master Nov 25, 2021
abdosi pushed a commit that referenced this pull request Dec 8, 2021
#1529)

Starting the script in background mode and detaching the process from terminal session to prevent failures, caused by closing or sudden disconnecting of terminal session.
Redirecting output of the script to the file to prevent failures, when the script tries to write an output to file descriptor of
the nonexistent terminal session.
Adding new parameter to script to be able to explicitly run the script in foreground mode with output to the terminal.
Updating the command reference doc according to the changes.

- What I did
Resolves sonic-net/sonic-buildimage#7127
Fixed failures of warm reboot, when the SSH session is being disconnected.
As the script will now be executed in background mode by default, added parameter to explicitly run it as usual, in foreground mode.
Updated command reference according to the changes.

- How I did it
By restarting the script in background mode with detaching it from the terminal session.
All the output has redirected to file /var/log/warm-reboot.txt for warm-reboot case, or /var/log/fast-reboot.txt for fast-reboot, depends on REBOOT_TYPE. This will prevent crashes of the script in case, when it will try to write some data to the file descriptor of the disconnected terminal session.

- How to verify it
1. Connect to the switch with SSH;
2. Execute sudo warm-reboot -v;
3. See the current progress of warm reboot with cat /var/log/warm-reboot.txt;
4. Close SSH connection before warm reboot finish;
    Warm reboot should finish successfully, in spite of status of the SSH session.

- New command output (if the output of a command-line utility has changed)
The script will be running in background detached mode with output to the file. The related log will be shown in terminal before restarting in background mode:

admin@sonic:~$ sudo warm-reboot
Detaching the process from the terminal session. Redirecting output to /var/log/warm-reboot.txt.

All the usual logs will be written to warm-reboot.txt.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[warm-reboot] warm-reboot breaks if ssh session in which it was started drops
6 participants