When OBS stops streaming, SRS sometimes receives a SIGTERM (sig=15) signal and exits. #1634
Comments
From the logs and steps above, there is a reliably reproducible path: if OBS is stopped right after the screenshot FFMPEG process exits, SRS receives SIGTERM and exits.
Specifically, the screenshot FFMPEG process that exits has PID 10, and the FFMPEG process restarted in its place has PID 11. If streaming is stopped at this point, the signal is sent to PID 11. It appears that SIGTERM is sent immediately after the fork, before the child process has fully started, and as a result the parent process receives the signal. One way to avoid this is to wait about 10ms after starting the process before sending the signal.
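The proposed delay can be sketched as follows. This is a minimal illustration, not the SRS code: `spawn_idle_child` and `stop_after_grace` are hypothetical names, and the child simply idles where a real implementation would exec FFMPEG.

```cpp
#include <cassert>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: fork a child that idles, standing in for FFMPEG.
pid_t spawn_idle_child() {
    pid_t pid = fork();
    if (pid == 0) {
        // Child: restore the default SIGTERM action, then idle.
        // A real implementation would exec FFMPEG here.
        signal(SIGTERM, SIG_DFL);
        for (;;) pause();
    }
    return pid; // > 0 in the parent, -1 on failure
}

// Hypothetical helper: give the child a short grace period to finish
// starting, then send SIGTERM and reap it.
// Returns the raw wait status, or -1 on error.
int stop_after_grace(pid_t pid, unsigned grace_ms) {
    assert(pid > 0);
    usleep(grace_ms * 1000); // e.g. 10ms, as suggested above
    if (kill(pid, SIGTERM) != 0) return -1;
    int status = 0;
    if (waitpid(pid, &status, 0) != pid) return -1;
    return status;
}
```

Calling `stop_after_grace(spawn_idle_child(), 10)` should yield a status for which `WIFSIGNALED` is true and `WTERMSIG` equals `SIGTERM`. A fixed delay only narrows the window; it does not prove the child has finished starting.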
|
There is another improvement to make. Before restarting the FFMPEG process, SRS can first check whether the stream has already stopped. Currently it waits 3 seconds, starts the FFMPEG subprocess, and only then checks whether it should stop, so the process has to be stopped immediately after it is started.
As the log above shows, once the stream has stopped there is no need to fork FFMPEG only to send it SIGTERM right away.
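The difference between the two orderings can be sketched with a stand-in object that only counts forks and kills. All names here are hypothetical and do not mirror SRS classes.

```cpp
// Hypothetical stand-in for the FFMPEG subprocess wrapper.
struct Encoder {
    bool running = false;
    int forks = 0; // how many times a child would have been forked
    int kills = 0; // how many times SIGTERM would have been sent

    void start() { running = true; ++forks; }                  // stands in for fork() + exec()
    void stop()  { if (running) { running = false; ++kills; } } // stands in for kill(pid, SIGTERM)
};

// Ordering described above as the current behavior:
// start the child, then notice the stop request and kill it.
void cycle_start_then_check(Encoder& e, bool stop_requested) {
    e.start();
    if (stop_requested) e.stop();
}

// Proposed ordering: check the stop request first and skip the fork entirely.
void cycle_check_then_start(Encoder& e, bool stop_requested) {
    if (stop_requested) return;
    e.start();
}
```

With `stop_requested == true`, the first ordering performs one fork and one kill, while the second performs neither, so the fork-then-SIGTERM window never opens.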
|
There is one more possible improvement: the FFMPEG output path does not support time and date format variables, while HLS and DVR do.
Judging from the configuration in this issue, the FFMPEG screenshot output needs a timestamp in its path.
Currently, because the variables are not replaced, there is still only ever one screenshot file.
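A minimal sketch of such variable replacement is shown below. The bracketed variable names (`[2006]`, `[01]`, `[02]`, `[timestamp]`) are modeled on the style used for HLS and DVR paths and are assumptions here, as are the helper names, the use of UTC, and treating `[timestamp]` as Unix time in milliseconds.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <time.h>

// Replace every occurrence of `key` in `s` with `value`.
static void replace_all(std::string& s, const std::string& key, const std::string& value) {
    assert(!key.empty());
    for (std::size_t pos = s.find(key); pos != std::string::npos;
         pos = s.find(key, pos + value.size())) {
        s.replace(pos, key.size(), value);
    }
}

// Hypothetical helper: expand date variables in an output path for time `now`.
std::string expand_output_path(std::string path, time_t now) {
    struct tm tm_utc{};
    gmtime_r(&now, &tm_utc); // UTC keeps the result independent of the local time zone

    char buf[16];
    std::snprintf(buf, sizeof(buf), "%04d", tm_utc.tm_year + 1900);
    replace_all(path, "[2006]", buf); // four-digit year
    std::snprintf(buf, sizeof(buf), "%02d", tm_utc.tm_mon + 1);
    replace_all(path, "[01]", buf);   // two-digit month
    std::snprintf(buf, sizeof(buf), "%02d", tm_utc.tm_mday);
    replace_all(path, "[02]", buf);   // two-digit day

    // Assumed here: [timestamp] is Unix time in milliseconds.
    replace_all(path, "[timestamp]", std::to_string(static_cast<long long>(now) * 1000));
    return path;
}
```

Expanding the path each time the screenshot process is started would give every capture its own file name instead of overwriting a single file.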
|
Hi, has this issue been resolved? I saw the reason you gave: "It seems that the SIGTERM signal is sent immediately after forking, and at this time the subprocess may not have fully started, so the parent process receives the signal. Changing it to wait for about 10ms after starting can avoid this problem."

I tried to reproduce this by forking and immediately sending SIGTERM to the child process, but I could not get the parent process to receive the signal. Do you have a way to reproduce it?

After fork and before exec, the parent and child processes do share the same signal handler function, SrsServer::on_signal(int signo). However, when the child process runs this handler and sets signal_gracefully_quit = true, the write triggers copy-on-write. I printed signal_gracefully_quit in both processes: it is false in the parent and true in the child.
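The copy-on-write observation can be checked in isolation with a sketch like the one below. `quit_flag` stands in for signal_gracefully_quit, and `FlagView` and `flag_after_child_write` are hypothetical names; the child writes the flag directly rather than from a signal handler.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Stand-in for signal_gracefully_quit.
static volatile bool quit_flag = false;

struct FlagView {
    bool ok;        // false if fork or waitpid failed
    bool in_parent; // value of the flag seen by the parent
    bool in_child;  // value of the flag seen by the child
};

// Fork, let the child set the flag, and report what each process sees.
FlagView flag_after_child_write() {
    FlagView v{false, false, false};
    pid_t pid = fork();
    if (pid < 0) return v;
    if (pid == 0) {
        quit_flag = true;         // the write lands on the child's private copy of the page
        _exit(quit_flag ? 1 : 0); // report the child's view through its exit code
    }
    int status = 0;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status)) return v;
    v.in_child = (WEXITSTATUS(status) == 1);
    v.in_parent = quit_flag;      // unchanged in the parent
    v.ok = true;
    return v;
}
```

The result has `in_child` true and `in_parent` false, so a plain variable written in the child cannot by itself make the parent quit.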
|
This is fixed in SRS3; see the commit above for the detailed changes. The main issue is that SRS stops the FFMPEG process immediately after starting it. Because the FFMPEG subprocess has not fully started yet when SIGTERM is sent to it, the parent process (the SRS process) ends up receiving the signal.
|
Sorry to bother you again. The logs show that at the same moment, 2020-08-06 21:58:55.27, the fork succeeded and kill(303001, SIGTERM) was executed immediately afterwards. This is the reliably reproducible path you mentioned: "Based on the above logs and operations, there is a reliably reproducible path: when the screenshot FFMPEG exits and OBS is stopped immediately, SRS receives SIGTERM and exits."

On this path SRS forks first and then immediately kills the child FFMPEG process. But when it forks and then immediately kills the child, the signal reaches the parent process. What is the underlying principle behind this? Why is it that, after SRS forks at 2020-08-06 21:58:55.270329, obtains the child pid=303001, and sends SIGTERM with kill at 2020-08-06 21:58:55.270608, both the child process and the parent process receive and handle the signal?

The question I want to figure out is:

Thank you once again for your valuable time. I hope to receive your reply.
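One mechanism that would produce this behavior is sketched below. It assumes the signal handler does not just set a variable but forwards the signal number through a pipe created before the fork; that is an assumption for illustration and is not confirmed anywhere in this thread. Under that assumption, a child that has forked but not yet exec'ed still runs the inherited handler and writes into the same pipe the parent reads. All names are hypothetical.

```cpp
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int sig_pipe[2] = {-1, -1};
static volatile sig_atomic_t handled = 0;

// Handler installed before fork, so the child inherits it until exec replaces the image.
static void forward_to_pipe(int signo) {
    handled = 1;
    ssize_t n = write(sig_pipe[1], &signo, sizeof(signo)); // write() is async-signal-safe
    (void)n;
}

// Signal only the child, then return the signal number the parent reads
// from the shared pipe, or -1 on error.
int signal_seen_by_parent() {
    if (pipe(sig_pipe) != 0) return -1;
    struct sigaction sa{}, old{};
    sa.sa_handler = forward_to_pipe;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGTERM, &sa, &old) != 0) return -1;

    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        // Child before exec: same handler, same pipe descriptors as the parent.
        while (!handled) usleep(1000);
        _exit(0);
    }

    kill(pid, SIGTERM); // aimed at the child only
    int signo = -1;
    // The parent now reads a signal number that was never delivered to it.
    ssize_t n = read(sig_pipe[0], &signo, sizeof(signo));

    waitpid(pid, nullptr, 0);
    sigaction(SIGTERM, &old, nullptr); // restore the previous handler
    close(sig_pipe[0]);
    close(sig_pipe[1]);
    return n == static_cast<ssize_t>(sizeof(signo)) ? signo : -1;
}
```

In this sketch the parent reads `SIGTERM` from the pipe even though `kill` targeted only the child. Unlike a copied variable, a pipe descriptor is shared across fork, which would be consistent with both processes appearing to handle the signal. Resetting `SIGTERM` to `SIG_DFL` (or ignoring it) in the child right after fork would keep the inherited handler off the shared pipe.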
|
Description
On an Alibaba Cloud ECS machine, with SRS deployed in a Docker environment and OBS used for streaming, SRS sometimes receives a sig=15 signal and exits when OBS stops streaming (either manually or because of a network reconnection).
The test was repeated on the morning of March 11, 2020, with the following results:
ossrs/srs:v3.0-b2
ossrs/srs:3
CentOS 7.5 public image
CE 19.03.7
There are 10 logs in total, all very similar. Six of them follow. The first four are from runs where SRS exited after the second cycle, and the last two are from runs where SRS exited after the first cycle.
Replay
How to replay bug?
Steps to replay the bug:
tail -f srs.log
Expected behavior: