repeated failures with Reolink cameras #144
Comments
Errors like this one?
That's not actually a panic, despite it using the word in the middle. I think this log message is complaining about the thing I mentioned in this Apr 29th comment. I think at the Retina layer I can add a feature that just ignores these (except maybe logging a packet/byte count every minute or something). |
Yes. I apologize; here's what I faced. The web UI rows show intermittent files for a camera: sometimes several hours in succession are recorded sequentially, then there is a gap. I looked at your troubleshooting readme, and when I watched the logs go scrolling by, I immediately focused on creating a log file for the first 2 minutes. After doing that, I searched it for the word "panic", and when the count came back at over 2,000, I just assumed something was horribly wrong and that I'd let you look. I had forgotten about the problem you described that Reolink cameras have. What I'll do now is generate a new log file and try to whittle it down to the entries that coincide with documented gaps in recordings. I'll try to craft a Perl regular expression to cull the entries that have to do with sleep so we can see if there is also something else. The backtrace on each error makes filtering the log a challenge. (I'm hoping I can halt the backtracing for an initial run by setting RUST_BACKTRACE to "0" or removing it, but I suspect that in this Docker instance it's wired to be active.) I'll see if I can assess what a log run contains statistically (are they all sleeps?) and then report back here. Thank you. |
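The filtering plan described above could be sketched roughly as follows. This is a minimal Python illustration rather than the Perl expression mentioned, and it is not tied to Moonfire NVR's real log format: the function name `summarize_log`, the default patterns, and the sample lines are all hypothetical.

```python
import re
from collections import Counter


def summarize_log(lines, keyword="panic", ignore_pattern=r"sleep"):
    """Count lines that mention `keyword`, split by whether they also
    match `ignore_pattern` (the entries we want to cull)."""
    ignore = re.compile(ignore_pattern)
    counts = Counter()
    for line in lines:
        if keyword not in line:
            continue  # not one of the lines the "panic" search would hit
        counts["ignored" if ignore.search(line) else "other"] += 1
    return counts


# Synthetic sample lines, invented purely for illustration.
sample = [
    "line A: panic handler frame",
    "line B: panic handler frame near sleep",
    "line C: unrelated message",
]
counts = summarize_log(sample)
print(counts["ignored"], counts["other"])
```

If `other` stays at zero on a real log, every "panic" hit is one of the sleep-related entries; anything left over is worth reading by hand.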
If you feel like playing with a new tool, this sort of analysis is what lnav is good for. |
1232 "Opening output" entries for 3 cameras out of six in 24 minutes. I ran moonfire-nvr for 24 minutes (no RUST_BACKTRACE). I filtered the log to show only those lines with "Opening output".
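To put that count in perspective, the arithmetic can be written out as a small Python sketch. The even split across the three affected cameras is an assumption made only for illustration; the actual per-camera distribution is not stated here.

```python
def entries_per_minute(total_entries, minutes):
    """Average number of log entries per minute over the run."""
    return total_entries / minutes


overall = entries_per_minute(1232, 24)
# Assumption: entries are spread evenly over the 3 affected cameras.
per_camera = overall / 3

print(f"{overall:.1f} per minute overall")      # 51.3 per minute overall
print(f"{per_camera:.1f} per minute per camera")  # 17.1 per minute per camera
```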
I can provide the log with the additional entry after an "Opening input" that shows what was received; zipped up, that log is 4.8 MB. Question: what can I provide to let you assess the problem more thoroughly? I can redo a run, use RUST_BACKTRACE, or make whatever modifications to the Docker runtime environment you want. The "Garage" cameras may exhibit slower network response times given that they are connected via a Ubiquiti aerial connection. But I do not recall this issue manifesting itself in earlier versions to the degree it does now. The PeckAlley cameras are on a direct connection. I'm going to follow up on your suggestion to replace the SD HC with a USB-driven drive; I want to read your posted recommendation more closely. I'm also seeing that my 5-disk array is no longer available, the company seems to have disappeared, and there are some bad reviews, so the USB circuitry for that array may be a contributing factor to problems. For people reading this, the lesson learned is that video places a very high demand on your hardware, so cutting corners and pushing the envelope may end up costing more in everyone's time than the dollars saved. Note: I did find PeckAlleyEast running at 100% CPU load in the Reolink admin interface, so I rebooted it. |
I think your earlier log has everything I need for now. All of the errors in there look consistent with something that scottlamb/retina#17 should improve significantly. You'll likely still have occasional disconnects (that seems inevitable with a flaky wifi connection), but I think it currently goes into a spiral after a disconnect, whereas after that change it will recover properly. I also (finally) ordered my own Reolink camera so I can keep experimenting with it. It should arrive tomorrow.
The difference is probably the RTSP library change then. You can go back to the old behavior by adding
That's too bad. :-( 2021 is a tough year for ordering hardware too. I think the ODROID-H2+ is an excellent compromise between reliability and performance vs price, power, and space but it's just not available now. When the chip situation gets back to normal, there will likely be some other nice SBCs coming out too, but not much new is being released now and stocks are low / prices are up for stuff that already was released. Some of these problems can be worked around. Eg, I put in the docs several places a recommendation to disable UAS with USB SATA adapters because of problems with my own hardware. Maybe that would help with yours too. |
Okay, I believe if you build from git HEAD (and undo the |
I'll try this evening. I'm on a new image on the Raspberry Pi. I successfully built moonfire-nvr but ran into an error in the npm UI build, though it may be a harmless error. I haven't fired up moonfire-nvr in the new environment yet; I still have to configure and mount my purple drive. I did not try reverting to ffmpeg in the former environment (on the SD HC card). |
While I flopped around trying to get the web UI working, I was starting and stopping nvr many times. I noticed that when I first started up, everything in the log looked clean. I then Control-C'd to stop. And then I started up within a minute or two. On the 2nd run, it looked like every camera was out of sync. I halted nvr. Waited several minutes, and then when I started up for a third time, things looked good. I did this kind of start/stop with interspersed waits and concluded that one has to wait several minutes before restarting to assure a clean start. I suspect this is because the cameras are still sending packets from the prior closed sessions. I'll let it run through the night and update tomorrow. |
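The "wait several minutes before restarting" workaround could be automated along these lines. This is only a sketch: the helper name `wait_for_cooldown` and the 300-second default are assumptions ("several minutes" is not a measured value), and nothing like this exists in Moonfire NVR itself.

```python
import time


def wait_for_cooldown(last_stop, cooldown_s=300.0,
                      now=time.monotonic, sleep=time.sleep):
    """Block until `cooldown_s` seconds have passed since `last_stop`.

    `last_stop` is a timestamp taken from the same clock as `now`.
    Returns the number of seconds actually waited (0.0 if none).
    The 300 s default is an assumed stand-in for "several minutes".
    """
    remaining = max(cooldown_s - (now() - last_stop), 0.0)
    if remaining > 0:
        sleep(remaining)
    return remaining
```

A wrapper script would record `time.monotonic()` when the server stops and call `wait_for_cooldown` with that value before launching it again. The `now` and `sleep` parameters exist only so the helper can be exercised without really waiting.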
Hmm, I'd hoped that cooling-off period would be no longer necessary. The brand-new Reolink camera I just bought—unlike the one I borrowed from a neighbor a while ago—doesn't display exactly these symptoms. So I haven't directly tested it on a problematic camera myself. |
Aug_19_clean_start_then_failures.zip |
Shoot. Thanks for trying it out, and sorry it's still not working well.
Not sure yet if this is a bug in the client (Moonfire/Retina) or server (Reolink). I'll take a look at it. I might need a packet capture or similar to figure this out. The log shows the following bytes (which indeed aren't the start of a correct RTSP message) but I might need the preceding ones also. Maybe Retina erroneously dropped half a message or something. |
Looks like if Reolink is sending malformed RTSP messages, they wouldn't be the first. There's a gstreamer issue at gst-plugins-base#653 to work around a problem like that. They didn't say what brand; my best guess is Uniview from looking at the URLs in the bug reporter's packet capture. |
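For readers unfamiliar with what "the start of a correct RTSP message" looks like on the wire, here is a rough Python sketch based on RFC 2326: a response begins with `RTSP/`, a request begins with a method name followed by a space, and interleaved binary data begins with a `$` byte. The function name `classify_start` is hypothetical, and this is not how Retina or gstreamer actually parse the stream.

```python
# Request methods defined by RFC 2326.
RTSP_METHODS = (
    b"DESCRIBE", b"ANNOUNCE", b"GET_PARAMETER", b"OPTIONS", b"PAUSE",
    b"PLAY", b"RECORD", b"REDIRECT", b"SETUP", b"SET_PARAMETER", b"TEARDOWN",
)


def classify_start(data: bytes) -> str:
    """Guess what kind of RTSP-level unit `data` begins with.

    Simplification: a buffer holding only part of a valid prefix
    (for example b"RTS") is reported as "unknown".
    """
    if not data:
        return "empty"
    if data[:1] == b"$":
        return "interleaved"  # '$', channel id, 2-byte length, payload
    if data.startswith(b"RTSP/"):
        return "response"     # e.g. b"RTSP/1.0 200 OK"
    if any(data.startswith(m + b" ") for m in RTSP_METHODS):
        return "request"
    return "unknown"          # not a valid start; possibly malformed


print(classify_start(b"RTSP/1.0 200 OK\r\n"))
print(classify_start(b"\x00\x01 stray bytes"))
```

Bytes that fall into the `unknown` bucket are the kind of data the log complained about; whether they come from a camera bug or from the client losing its place in the stream is exactly what a packet capture would settle.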
As this is now understood as an RTSP-level interop problem with these buggy cameras, let's continue discussion at scottlamb/retina#17 as necessary. Current best solution is to use the new |
Describe the bug
6 Reolink cameras, 3 models, system on Docker image, within a 2 minute session:
2824 "panics" in the log. [edit by scottlamb: the logs refer to panic handler stack frames but the system didn't actually panic.] Recordings are subject to gaps in time, some of 9 minutes or more. The system looks to be working well, but I started to notice some major gaps, e.g. several minutes, in the recording sequences. Running moonfire-nvr in Docker. I can start and stop it, and through the web interface all seems well; it works for the most part. However, there are gaps in recordings and the log shows panics.
To Reproduce
Steps to reproduce the behavior:
See below "Consoles"
See log attached and end of this entry
Note: nvr2 differs by having additional line for LVM mount
Expected behavior
no "panics" and no gaps of several minutes in recordings
Bug policy https://github.com/scottlamb/moonfire-nvr/blob/master/guide/troubleshooting.md#panic-errors states:
Errors like the one below indicate a serious bug in Moonfire NVR. Please file a bug if you see one.
(Should the word "panic" be the trigger to file a bug, or when "panic" occurs from writer.rs?)
Screenshots
Here is a screenshot showing a raspberry pi console streaming "date" results overlaying a live stream
via the Moonfire-nvr Live mode displaying the camera's white date time stamp.
Server (please complete the following information):
Raspberry Pi 4 8GB Complete Starter Kit - 32GB Storage (purchased May/June 2020)
Western Digital 12TB WD Purple Surveillance Internal Hard Drive HDD - 7200 RPM, SATA 6 Gb/s, 256 MB Cache, 3.5" - WD121PURZ
[Independent Control Switch] SISUN Aluminum USB 3.0 5 Bay 3.5 inch SATA Hard Drive Enclosure Support 5 x 12TB Drive + 2X USB3.0 HUB (USB 3.0/5 Bay Hard Drive Enclosure-Black)
Rebuilt (on same SD HC) August, 2021; crashed Winter of 2021 - defective SD HC??
If using Docker:
docker ps + docker images
jlpoole@raspberrypi:/usr/local/src/moonfire-nvr $ sudo docker ps
[sudo] password for jlpoole:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
a76d874a0eae scottlamb/moonfire-nvr:latest "/usr/local/bin/moon…" 38 hours ago Up 37 minutes moonfire-nvr
jlpoole@raspberrypi:/usr/local/src/moonfire-nvr $ sudo docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
hello-world latest 1a30b4397839 5 weeks ago 4.85kB
scottlamb/moonfire-nvr latest 9732cf869a48 6 weeks ago 582MB
jlpoole@raspberrypi:/usr/local/src/moonfire-nvr $
If building from git:
git describe --dirty + moonfire-nvr --version
jlpoole@raspberrypi:/usr/local/src/moonfire-nvr $ git describe --dirty
v0.6.4-11-gdad9bdc
jlpoole@raspberrypi:/usr/local/src/moonfire-nvr $
Attach a log file. Run with the RUST_BACKTRACE=1 environment variable set if possible.
See Log at end of this entry.
I'm not sure I needed to set RUST_BACKTRACE when launching from Docker as it looks like the Docker
log has backtraces. Nevertheless, I did as requested:
Camera (please complete the following information):
I have six cameras, two each of three different models:
Device Name: GarageWest
Model: Reolink RLC-420
Build No.: build 17112700
Hardware No.: IPC_3816M
Configuration Version: v2.0.0.0
Firmware Version: v2.0.0.889_17112700
Details: IPC_3816M110000000000000
Client Version: v1.0.227
2)
Device Name: GarageEast
Model: RLC-420
Build No.: build 17112700
Hardware No.: IPC_3816M
Configuration Version: v2.0.0.0
Firmware Version: v2.0.0.889_17112700
Details: IPC_3816M110000000000000
Client Version: v1.0.227
Device Name: Peck Alley West
Model: Reolink RLC-420-5MP
Build No.: build 19013001
Hardware No.: IPC_51516M5M
Configuration Version: v2.0.0.0
Firmware Version: v2.0.0.354_19013001
Details: IPC_51516M5M110000000100000
Client Version: v1.0.239
Device Name: Peck Alley East
Model: Reolink RLC-420-5MP
Build No.: build 19013001
Hardware No.: IPC_51516M5M
Configuration Version: v2.0.0.0
Firmware Version: v2.0.0.354_19013001
Details: IPC_51516M5M110000000100000
Client Version: v1.0.239
Camera Name: PeckPear
Model: Reolink RLC-410-5MP
UID: 95270002JS5EWXMZ
Build No.: build 20121100
Hardware No.: IPC_515B16M5M
Config Version: v3.0.0.0
Firmware Version: v3.0.0.136_20121100
Details: IPC_515B16M5MS10E1W01100000001
Camera Name: Maple1
Model: Reolink RLC-410-5MP
UID: 95270002JS5IEVPW
Build No.: build 20121100
Hardware No.: IPC_515B16M5M
Config Version: v3.0.0.0
Firmware Version: v3.0.0.136_20121100
Details: IPC_515B16M5MS10E1W01100000001
Desktop (please complete the following information):
NA
Smartphone (please complete the following information):
NA
Additional context
I've been seeing cameras go offline for several minutes, e.g. 9 minutes or more
Below may be helpful if time synchronization is involved:
I have all the reolink cameras using the time sync feature pointing to time-b.nist.gov.
Log File (numbered columns added)
session_numbered.zip
For others using Docker: I accessed the log using this command:
Use the "--since=[# of seconds back]s" flag; otherwise you may get a dump of a very long log. Think of "--since" as a "tail"-style truncator.
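As an illustration of the "--since" advice, here is a small Python sketch that builds such a `docker logs` invocation. It is not the exact command used in this report. The container name `moonfire-nvr` is taken from the NAMES column of the `docker ps` output above; the helper name and the 120-second window are assumptions.

```python
import shlex


def docker_logs_command(container, seconds_back):
    """Build a `docker logs` command limited to the last `seconds_back` seconds."""
    if seconds_back <= 0:
        raise ValueError("seconds_back must be positive")
    # `docker logs --since` accepts relative durations such as 120s or 42m.
    return ["docker", "logs", f"--since={seconds_back}s", container]


cmd = docker_logs_command("moonfire-nvr", 120)
print(shlex.join(cmd))  # docker logs --since=120s moonfire-nvr
```

The resulting list can be passed to `subprocess.run(cmd)`, prefixed with `sudo` if the Docker daemon requires it, as it did in the console sessions above.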