[Support]: Wifi camera causes "frigate.capture" process to be reaped by OOM-killer and is never restarted #10814

leccelecce · 2024-04-03T16:51:23Z

leccelecce
Apr 3, 2024

Describe the problem you are having

Occasionally, something happens that causes some or all of my cameras to drop offline for a few seconds.
Mostly they all recover cleanly. However, one particular camera has a habit of not recovering. The frigate.capture process for that camera then appears to grow very rapidly to 5GB+ before being killed by the Linux OOM killer. As far as I can tell, Frigate never attempts to restart the capture process, so that camera is then permanently stuck, unable to detect.

hikcam8 = default stream from camera for record
hikcam8_resized = stream reduced to 1080p @ 5 fps for detect

It starts with an IO timeout streaming RTSP from the camera to go2rtc:

21:11:55 WRN github.com/AlexxIT/go2rtc/internal/streams/producer.go:171 > error="read tcp 172.31.6.2:48994->192.168.30.98:554: i/o timeout"

5 seconds later the resized version of the stream throws an EOF:
21:12:00 WRN github.com/AlexxIT/go2rtc/internal/streams/producer.go:171 > error=EOF url="ffmpeg:hikcam8#video=h264#width=1920#height=1080#raw=-fpsmax 5#hardware=vaapi"

3 seconds after that the record process loses the connection:
21:12:03 ffmpeg.indoors_wifi2.record ERROR : rtsp://127.0.0.1:8554/hikcam8: Connection timed out

About 30 seconds later, the frigate.capture process has ballooned in memory usage and been killed:
21:12:32 Memory cgroup out of memory: Killed process 2264357 (frigate.capture) total-vm:5283008kB, anon-rss:3304660kB, file-rss:8976kB, shmem-rss:60kB, UID:0 pgtables:7028kB oom_score_adj:0

I'm interested in whether a) the capture processes could be monitored and restarted if necessary, and b) there is any way to protect against a capture process consuming so much memory when the feed has presumably disappeared.
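To make request a) concrete, here is a minimal sketch of a supervisor loop that restarts a child process when it exits abnormally. It is not Frigate's code: the `supervise` function and its `max_restarts` and `poll_interval` parameters are hypothetical names used only for illustration, and a real capture process would need its own start-up logic.

```python
import subprocess
import sys
import time


def supervise(cmd, max_restarts=3, poll_interval=0.1):
    """Run cmd and restart it whenever it exits abnormally.

    Returns the list of exit codes observed. On POSIX a negative code
    means the child was killed by a signal, for example -9 for SIGKILL,
    which is the signal the OOM killer sends.
    """
    exit_codes = []
    while True:
        proc = subprocess.Popen(cmd)
        # Poll rather than block so a real supervisor could do other work here.
        while proc.poll() is None:
            time.sleep(poll_interval)
        exit_codes.append(proc.returncode)
        # Stop on a clean exit, or once the restart budget is used up.
        if proc.returncode == 0 or len(exit_codes) > max_restarts:
            return exit_codes


if __name__ == "__main__":
    # Stand-in for a capture process that gets SIGKILLed every time it runs.
    doomed = [
        sys.executable,
        "-c",
        "import os, signal; os.kill(os.getpid(), signal.SIGKILL)",
    ]
    print(supervise(doomed, max_restarts=2))
```

The restart budget is there so that a child which dies immediately on every start does not turn the supervisor into a tight restart loop.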

Version

0.13.2-6476F8A

Frigate config file

go2rtc:
  streams:
    hikcam8:
    - rtsp://********@hikcam8.local.lan:554/Streaming/Channels/101
    hikcam8_resized:
    - ffmpeg:hikcam8#video=h264#width=1920#height=1080#raw=-fpsmax 5#hardware=vaapi

Relevant log output

...
2024-03-30 21:12:03.931629636  [2024-03-30 21:12:03] ffmpeg.indoors_wifi2.record    ERROR   : [segment @ 0x555debf80600] Timestamps are unset in a packet for stream 0. This is deprecated and will stop working in the future. Fix your code to set the timestamps properly
2024-03-30 21:12:03.931748590  [2024-03-30 21:12:03] ffmpeg.indoors_wifi2.record    ERROR   : [segment @ 0x555debf80600] Non-monotonous DTS in output stream 0:0; previous: 0, current: 0; changing to 1. This may result in incorrect timestamps in the output file.
2024-03-30 21:12:03.931833076  [2024-03-30 21:12:03] ffmpeg.indoors_wifi2.record    ERROR   : [rtsp @ 0x555debf6b500] RTP: PT=60: bad cseq 33d4 expected=2289
2024-03-30 21:12:03.931909945  [2024-03-30 21:12:03] ffmpeg.indoors_wifi2.record    ERROR   : [rtsp @ 0x555debf6b500] Undefined type (31)
2024-03-30 21:12:03.931980863  [2024-03-30 21:12:03] ffmpeg.indoors_wifi2.record    ERROR   : [rtsp @ 0x555debf6b500] RTP: PT=60: bad cseq f9a3 expected=4b69
2024-03-30 21:12:03.932049878  [2024-03-30 21:12:03] ffmpeg.indoors_wifi2.record    ERROR   : rtsp://127.0.0.1:8554/hikcam8: Connection timed out
2024-03-30 21:12:03.932118270  [2024-03-30 21:12:03] watchdog.indoors_wifi2         INFO    : Terminating the existing ffmpeg process...
2024-03-30 21:12:03.932190434  [2024-03-30 21:12:03] watchdog.indoors_wifi2         INFO    : Waiting for ffmpeg to exit gracefully...
2024-03-30 21:12:13.948634645  [2024-03-30 21:12:13] watchdog.indoors_wifi2         INFO    : indoors_wifi2 exceeded fps limit. Exiting ffmpeg...
2024-03-30 21:12:13.948854872  [2024-03-30 21:12:13] watchdog.indoors_wifi2         INFO    : Waiting for ffmpeg to exit gracefully...
...



GO2RTC Log:

2024-03-30 21:10:55.223026432  21:10:55.222 WRN github.com/AlexxIT/go2rtc/internal/streams/producer.go:171 > error="read tcp 172.31.6.2:58408->192.168.30.98:554: i/o timeout" url=rtsp://frigate:*******@hikcam8.local.lan:554/Streaming/Channels/101
2024-03-30 21:11:55.701453961  21:11:55.701 WRN github.com/AlexxIT/go2rtc/internal/streams/producer.go:171 > error="read tcp 172.31.6.2:48994->192.168.30.98:554: i/o timeout" url=rtsp://frigate::*******@hikcam8.local.lan:554/Streaming/Channels/101
2024-03-30 21:12:00.855711192  21:12:00.855 WRN github.com/AlexxIT/go2rtc/internal/streams/producer.go:171 > error=EOF url="ffmpeg:hikcam8#video=h264#width=1920#height=1080#raw=-fpsmax 5#hardware=vaapi"
2024-03-30 21:12:13.958810293  21:12:13.958 WRN github.com/AlexxIT/go2rtc/internal/streams/producer.go:171 > error="read tcp 172.31.6.2:48906->192.168.30.77:8554: i/o timeout" url=rtsp://:*******@192.168.30.77:8554/profile0
2024-03-30 21:12:13.974419095  21:12:13.974 WRN github.com/AlexxIT/go2rtc/internal/streams/producer.go:171 > error="read tcp 172.31.6.2:51010->192.168.30.98:554: i/o timeout" url=rtsp://:*******@hikcam8.local.lan:554/Streaming/Channels/101
2024-03-30 21:12:13.984674881  21:12:13.984 WRN github.com/AlexxIT/go2rtc/internal/streams/producer.go:171 > error="read tcp 172.31.6.2:37594->192.168.30.95:554: i/o timeout" url=rtsp://:*******@hikcam5.local.lan:554/Streaming/Channels/101
2024-03-30 21:12:13.991298375  21:12:13.991 WRN github.com/AlexxIT/go2rtc/internal/streams/producer.go:171 > error="read tcp 172.31.6.2:48604->192.168.30.92:554: i/o timeout" url=rtsp://********@hikcam2.local.lan:554/Streaming/Channels/101
2024-03-30 21:12:13.998425147  21:12:13.998 WRN github.com/AlexxIT/go2rtc/internal/streams/producer.go:171 > error="read tcp 172.31.6.2:44472->192.168.30.97:554: i/o timeout" url=rtsp://:*******@hikcam7.local.lan:554/Streaming/Channels/101
2024-03-30 21:12:19.189121102  21:12:19.188 WRN github.com/AlexxIT/go2rtc/internal/streams/producer.go:171 > error=EOF url="ffmpeg:hikcam8#video=h264#width=1920#height=1080#raw=-fpsmax 5#hardware=vaapi"
2024-03-30 21:12:19.212183981  21:12:19.212 WRN github.com/AlexxIT/go2rtc/internal/streams/producer.go:171 > error=EOF url="ffmpeg:hikcam7#video=h264#width=1920#height=1080#raw=-fpsmax 5#hardware=vaapi"

FFprobe output from your camera

FFPROBE OUTPUT
Stream 0:
Return Code: 0

Video:

Codec: H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10
Resolution: 2688x1520
FPS: 20/1

Stream 1:
Return Code: 0

Video:

Codec: H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10
Resolution: 1920x1080
FPS: 5/1

Frigate stats

No response

Operating system

Debian

Install method

Docker Compose

Coral version

USB

Network connection

Wired

Camera make and model

Hikvision DS-2CD2442FWD

Any other information that may be helpful

No response

NickM-27 · 2024-04-03T17:02:16Z

NickM-27
Apr 3, 2024
Collaborator Sponsor

I don't think it makes sense to have a watchdog process for the capture process which itself is just running a watchdog for the ffmpeg process.

Most likely what is happening here is that the ffmpeg process itself is erroring out continuously but refusing to exit (we can see this in 2024-03-30 21:12:13.948854872 [2024-03-30 21:12:13] watchdog.indoors_wifi2 INFO : Waiting for ffmpeg to exit gracefully...). During the time when it is supposed to be ending but is not, something like the LogPipe (which has a max length) or some other part is consuming more and more memory while it is stuck waiting for an sp.Timeout that seems to never come.
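For readers unfamiliar with the pattern being described, the following is a generic sketch of "ask the process to exit, wait with a timeout, then force kill", written with Python's standard `subprocess` module. It is an illustration only and not a copy of Frigate's watchdog; `stop_process` and `grace_seconds` are hypothetical names.

```python
import subprocess
import sys


def stop_process(proc, grace_seconds=5.0):
    """Ask proc to exit, then force kill it if it ignores the request."""
    proc.terminate()  # sends SIGTERM on POSIX
    try:
        proc.wait(timeout=grace_seconds)
        return "terminated"
    except subprocess.TimeoutExpired:
        # The child did not exit within the grace period.
        proc.kill()  # sends SIGKILL on POSIX, which cannot be ignored
        proc.wait()  # reap the child so it does not linger as a zombie
        return "killed"


if __name__ == "__main__":
    # Stand-in for an ffmpeg that ignores the polite request to stop.
    stubborn = subprocess.Popen(
        [
            sys.executable,
            "-c",
            "import signal, time; "
            "signal.signal(signal.SIGTERM, signal.SIG_IGN); "
            "print('ready', flush=True); time.sleep(60)",
        ],
        stdout=subprocess.PIPE,
    )
    stubborn.stdout.readline()  # wait until the handler is installed
    print(stop_process(stubborn, grace_seconds=1.0))
```

The key property is that the wait is bounded: if the child never responds, the caller still moves on after `grace_seconds` instead of blocking indefinitely.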


leccelecce · 2024-08-17T11:28:08Z

leccelecce
Aug 17, 2024
Author

Just to state I'm still experiencing this on 0.14.0. It's with one of my Wifi cameras which I think can occasionally drop connection or enough packets to confuse ffmpeg.

I may dig around the source code a bit for anything obvious I can test in a fork. It's a bit of an annoying issue because the huge spike in memory (I had one go to 14GB yesterday) triggers all kinds of alerts on my server monitoring, as it then starts writing to swap etc. It also looks like the server generally starts slowing down on detections.


leccelecce · 2024-11-13T22:47:23Z

leccelecce
Nov 13, 2024
Author

Just to bump this back up. I actually had it occur today on an internal wired camera. I disconnected the network cable to swap it out for a longer one, so it was disconnected for all of about a minute. Shortly after, I noticed alerts on my Frigate box because the OOM-killer had killed the frigate.capture process at 21GB of RAM use.

2024-11-13 18:39:50 - ethernet connection dropped

...
2024-11-13 18:40:03.801243118  [2024-11-13 18:40:03] ffmpeg.indoors_wifi2.record    ERROR   : [segment @ 0x556685369280] Timestamps are unset in a packet for stream 0. This is deprecated and will stop working in the future. Fix your code to set the timestamps properly
2024-11-13 18:40:03.801272301  [2024-11-13 18:40:03] ffmpeg.indoors_wifi2.record    ERROR   : [segment @ 0x556685369280] Non-monotonous DTS in output stream 0:0; previous: 0, current: 0; changing to 1. This may result in incorrect timestamps in the output file.
2024-11-13 18:40:03.801343870  [2024-11-13 18:40:03] ffmpeg.indoors_wifi2.record    ERROR   : rtsp://127.0.0.1:8554/hikcam8: Connection timed out
2024-11-13 18:40:03.801558341  [2024-11-13 18:40:03] watchdog.indoors_wifi2         INFO    : Terminating the existing ffmpeg process...
2024-11-13 18:40:03.801813402  [2024-11-13 18:40:03] watchdog.indoors_wifi2         INFO    : Waiting for ffmpeg to exit gracefully...
2024-11-13 18:40:08.722224978  [2024-11-13 18:40:08] frigate.output.output          WARNING : Failed to retrieve many frames for indoors_wifi2 from SHM, consider increasing SHM size if this continues.
2024-11-13 18:40:08.731904163  [2024-11-13 18:40:08] frigate.output.output          WARNING : Failed to retrieve many frames for indoors_wifi2 from SHM, consider increasing SHM size if this continues.
2024-11-13 18:40:08.742746863  [2024-11-13 18:40:08] frigate.output.output          WARNING : Failed to retrieve many frames for indoors_wifi2 from SHM, consider increasing SHM size if this continues.
2024-11-13 18:40:08.753117456  [2024-11-13 18:40:08] frigate.output.output          WARNING : Failed to retrieve many frames for indoors_wifi2 from SHM, consider increasing SHM size if this continues.
2024-11-13 18:40:08.763347692  [2024-11-13 18:40:08] frigate.output.output          WARNING : Failed to retrieve many frames for indoors_wifi2 from SHM, consider increasing SHM size if this continues.
2024-11-13 18:40:33.823851331  [2024-11-13 18:40:33] watchdog.indoors_wifi2         INFO    : indoors_wifi2 exceeded fps limit. Exiting ffmpeg...
2024-11-13 18:40:33.823931898  [2024-11-13 18:40:33] watchdog.indoors_wifi2         INFO    : Waiting for ffmpeg to exit gracefully...
2024-11-13 18:41:10.809623471  [2024-11-13 18:41:10] watchdog.indoors_wifi2         INFO    : FFmpeg did not exit. Force killing...
2024-11-13 18:41:10.814656752  [2024-11-13 18:41:10] frigate.video                  ERROR   : indoors_wifi2: Unable to read frames from ffmpeg process.
2024-11-13 18:41:10.824424299  [2024-11-13 18:41:10] frigate.video                  ERROR   : indoors_wifi2: ffmpeg process is not running. exiting capture thread...
...

[Wed Nov 13 18:41:10 2024] ffmpeg invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=0
[Wed Nov 13 18:41:10 2024] Memory cgroup out of memory: Killed process 897186 (frigate.capture) total-vm:21474860kB, anon-rss:12947732kB, file-rss:7464kB, shmem-rss:0kB, UID:0 pgtables:26060kB oom_score_adj:0
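As an aside for anyone wanting to alert on these events, the kernel line above can be parsed to pull out the process name and sizes. This is a rough sketch that assumes the message layout shown in this thread and assumes the kernel's "kB" figures are units of 1024 bytes; `parse_oom_kill` is a hypothetical helper, not part of Frigate.

```python
import re

# Matches the "Killed process ..." part of the kernel OOM message shown above.
OOM_RE = re.compile(
    r"Killed process (?P<pid>\d+) \((?P<name>[^)]+)\) "
    r"total-vm:(?P<total_vm>\d+)kB, anon-rss:(?P<anon_rss>\d+)kB"
)


def parse_oom_kill(line):
    """Return pid, name and sizes in GiB from an OOM-killer log line, or None."""
    match = OOM_RE.search(line)
    if match is None:
        return None
    kib_per_gib = 1024 ** 2
    return {
        "pid": int(match["pid"]),
        "name": match["name"],
        "total_vm_gib": int(match["total_vm"]) / kib_per_gib,
        "anon_rss_gib": int(match["anon_rss"]) / kib_per_gib,
    }


if __name__ == "__main__":
    line = (
        "[Wed Nov 13 18:41:10 2024] Memory cgroup out of memory: "
        "Killed process 897186 (frigate.capture) total-vm:21474860kB, "
        "anon-rss:12947732kB, file-rss:7464kB, shmem-rss:0kB, UID:0 "
        "pgtables:26060kB oom_score_adj:0"
    )
    print(parse_oom_kill(line))
```

Under that unit assumption, the line from this incident works out to roughly 20.5 GiB of virtual memory and 12.3 GiB of anonymous resident memory, which suggests the "21GB" figure corresponds to total-vm rather than resident memory.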

I'm assuming this must be something specific to my setup as it doesn't seem to be reported much, if something as simple as a camera going offline could cause it...

