-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[warm-reboot] warm-reboot breaks if ssh session in which it was started drops #7127
Comments
Investigating the issue. |
Investigation result. @Hedgehog-Guru, could I ask you to clarify what does the next your comment means: This is degradation in comparison to the 201911 version? I have tested the issue on 201911 and the behavior is the same as on master branch. |
@maksymbelei95 thanks for the PR. I think the same issue will happen with the fastboot or any command that may take time and the ssh session can be closed in the middle. |
@liat-grozovik, ok, I will note this for myself. |
@maksymbelei95 , according to my analysis the issue does not exist under 201911 branch. |
@maksymbelei95 This might be more general issue. As I understand any SONiC command will be killed when SSH session drops. @Hedgehog-Guru Need an analisis of disruptive commands that could lead to switch failure in case SSH session that invokes the command drops (e.g does config reload leaves switch in bad shape if interupted?) |
#1529) Starting the script in background mode and detaching the process from terminal session to prevent failures, caused by closing or sudden disconnecting of terminal session. Redirecting output of the script to the file to prevent failures, when the script tries to write an output to file descriptor of the nonexistent terminal session. Adding new parameter to script to be able to explicitly run the script in foreground mode with output to the terminal. Updating the command reference doc according to the changes. - What I did Resolves sonic-net/sonic-buildimage#7127 Fixed failures of warm reboot, when the SSH session is being disconnected. As the script will now be executed in background mode by default, added parameter to explicitly run it as usual, in foreground mode. Updated command reference according to the changes. - How I did it By restarting the script in background mode with detaching it from the terminal session. All the output has redirected to file /var/log/warm-reboot.txt for warm-reboot case, or /var/log/fast-reboot.txt for fast-reboot, depends on REBOOT_TYPE. This will prevent crashes of the script in case, when it will try to write some data to the file descriptor of the disconnected terminal session. - How to verify it 1. Connect to the switch with SSH; 2. Execute sudo warm-reboot -v; 3. See the current progress of warm reboot with cat /var/log/warm-reboot.txt; 4. Close SSH connection before warm reboot finish; Warm reboot should finish successfully, in spite of status of the SSH session. - New command output (if the output of a command-line utility has changed) The script will be running in background detached mode with output to the file. The related log will be shown in terminal before restarting in background mode: admin@sonic:~$ sudo warm-reboot Detaching the process from the terminal session. Redirecting output to /var/log/warm-reboot.txt. All the usual logs will be written to warm-reboot.txt.
#1529) Starting the script in background mode and detaching the process from terminal session to prevent failures, caused by closing or sudden disconnecting of terminal session. Redirecting output of the script to the file to prevent failures, when the script tries to write an output to file descriptor of the nonexistent terminal session. Adding new parameter to script to be able to explicitly run the script in foreground mode with output to the terminal. Updating the command reference doc according to the changes. - What I did Resolves sonic-net/sonic-buildimage#7127 Fixed failures of warm reboot, when the SSH session is being disconnected. As the script will now be executed in background mode by default, added parameter to explicitly run it as usual, in foreground mode. Updated command reference according to the changes. - How I did it By restarting the script in background mode with detaching it from the terminal session. All the output has redirected to file /var/log/warm-reboot.txt for warm-reboot case, or /var/log/fast-reboot.txt for fast-reboot, depends on REBOOT_TYPE. This will prevent crashes of the script in case, when it will try to write some data to the file descriptor of the disconnected terminal session. - How to verify it 1. Connect to the switch with SSH; 2. Execute sudo warm-reboot -v; 3. See the current progress of warm reboot with cat /var/log/warm-reboot.txt; 4. Close SSH connection before warm reboot finish; Warm reboot should finish successfully, in spite of status of the SSH session. - New command output (if the output of a command-line utility has changed) The script will be running in background detached mode with output to the file. The related log will be shown in terminal before restarting in background mode: admin@sonic:~$ sudo warm-reboot Detaching the process from the terminal session. Redirecting output to /var/log/warm-reboot.txt. All the usual logs will be written to warm-reboot.txt.
…n (#1529) Starting the script in background mode and detaching the process from terminal session to prevent failures, caused by closing or sudden disconnecting of terminal session. Redirecting output of the script to the file to prevent failures, when the script tries to write an output to file descriptor of the nonexistent terminal session. Adding new parameter to script to be able to explicitly run the script in foreground mode with output to the terminal. Updating the command reference doc according to the changes. - What I did Resolves sonic-net/sonic-buildimage#7127 Fixed failures of warm reboot, when the SSH session is being disconnected. As the script will now be executed in background mode by default, added parameter to explicitly run it as usual, in foreground mode. Updated command reference according to the changes. - How I did it By restarting the script in background mode with detaching it from the terminal session. All the output has redirected to file /var/log/warm-reboot.txt for warm-reboot case, or /var/log/fast-reboot.txt for fast-reboot, depends on REBOOT_TYPE. This will prevent crashes of the script in case, when it will try to write some data to the file descriptor of the disconnected terminal session. - How to verify it 1. Connect to the switch with SSH; 2. Execute sudo warm-reboot -v; 3. See the current progress of warm reboot with cat /var/log/warm-reboot.txt; 4. Close SSH connection before warm reboot finish; Warm reboot should finish successfully, in spite of status of the SSH session. - New command output (if the output of a command-line utility has changed) The script will be running in background detached mode with output to the file. The related log will be shown in terminal before restarting in background mode: admin@sonic:~$ sudo warm-reboot Detaching the process from the terminal session. Redirecting output to /var/log/warm-reboot.txt. All the usual logs will be written to warm-reboot.txt.
Description
If ssh session in which warm-reboot was started drops, warm-reboot breaks on syncd stage, and the switch become inoperable.
This is degradation in comparison to the 201911 version.
Steps to reproduce the issue
Describe the results you received
After the ssh session disconnect in rcon sessoin we can observe that warm-reboot stops on synd shutdown stage and system became unavailable till reboot.
Describe the results you expected
Warm-reboot should be resistant to ssh disconnects and not stops even in case of ssh session disconnection(ssh client disruption by any reason(server hangs up for example), and not lead to system crash.
For a data center, it is dangerous because it is impossible to confidently update the system
Output of show version
This is degradation in comparison to the 201911 release
sonic_dump_r-qa-sw-eth-2322_20210322_135917.tar.gz
The text was updated successfully, but these errors were encountered: