audio support #34
I've been thinking through a design that addresses these issues. It's certainly not simple, but I'm gaining confidence we can do it.

First, I've rejected the idea of messing with audio durations. I don't want to deal with (lossy, messy, maybe slow) decoding and re-encoding of AAC. I'd rather Moonfire NVR just pass along the original audio as it does video. And obviously I don't want to do weird things to the pitch, so I can't just change the sample rate without redoing that encoding.

So what about frequency adjustment? I don't want to give it up. Sometimes over the course of days/weeks, small drift can add up to tens of seconds, which is really noticeable without this adjustment. But I think we can change how it works: basically, keep the frame indices in terms of uncorrected durations. […]

I've also rejected the idea of adjusting the video durations so that IDR frames land exactly on an audio frame boundary. My Dahua camera (when using AAC at 48 kHz, which seems like the most reasonable of its possible encodings if you want to understand voices) seems to produce an audio frame roughly every 20 ms. Adjusting a single frame's timing by up to 20 ms seems like it'd be pretty noticeable (visibly choppy). Smearing that adjustment over a whole (typically 1- or 2-second) GOP is probably okay timing-wise (~2% rate change), but it'd be weird for live streams. […]

I think that means that recordings from the same RTSP session need to overlap. Specifically, either: […]
I haven't decided between the two yet. I think either could work; it's just a matter of which is easier to understand and implement. Another variation: duplicate the overlapping part of the audio into both recordings, and mark how much needs to be trimmed from the start/end on playback. The extra disk space shouldn't be too noticeable. When composing adjacent recordings into one saved […]
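A minimal sketch of that "duplicate and trim" variation, with hypothetical names (not an actual Moonfire NVR schema), might look like:

```rust
/// Hypothetical trim marks for the "duplicate the overlap" variation.
/// Times are offsets in the recording's timebase from its own start.
struct RecordingAudio {
    /// AAC frames as stored, including overlap duplicated from the
    /// adjacent recordings of the same RTSP session.
    frames: Vec<(i64, Vec<u8>)>, // (pts, frame data)
    /// Drop this much from the start on playback so the audio lines up
    /// with the recording's first video frame.
    trim_start: i64,
    /// Drop this much from the end, mirroring the next recording's copy.
    trim_end: i64,
}

/// Returns the half-open pts range that should actually be audible when
/// this recording is played back on its own.
fn audible_range(r: &RecordingAudio, media_duration: i64) -> (i64, i64) {
    (r.trim_start, media_duration - r.trim_end)
}
```

When composing adjacent recordings, the duplicated spans would be skipped on one side so nothing plays twice.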
Given that this bug may be triggered more rarely than I thought, I'm going to say it's not a blocker. I'd definitely like to either have it fixed or move to a new, pure-Rust RTSP library. But I think there's no reason we can't try out audio support in the meantime. It'd be optional anyway; at worst folks turn it off and are no worse off than they would be otherwise.
I think this option will work fine. We can choose a database row-level timebase that's the least common multiple of all the reasonable audio sampling rates (44.1 kHz, 48 kHz) and, if necessary, adjust the epoch so that the JavaScript precision limit of 2^53 is far enough in the future. Then we can make the video durations fit this timebase or a fraction of it. It's certainly not a problem for a video frame's timing to be off by less than a millisecond.
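To make the numbers concrete, here's what that arithmetic gives (the 7,056,000 Hz figure and the ~40-year horizon follow from the math; they're not stated in the thread):

```rust
/// Greatest common divisor via Euclid's algorithm.
fn gcd(mut a: u64, mut b: u64) -> u64 {
    while b != 0 {
        let t = a % b;
        a = b;
        b = t;
    }
    a
}

/// Least common multiple.
fn lcm(a: u64, b: u64) -> u64 {
    a / gcd(a, b) * b
}

fn main() {
    // Least common multiple of the reasonable audio sampling rates.
    let timebase = lcm(44_100, 48_000);
    assert_eq!(timebase, 7_056_000); // ticks per second

    // JavaScript numbers are exact integers only up to 2^53, so check
    // how long after the epoch timestamps stay exactly representable.
    let horizon_secs = (1u64 << 53) / timebase;
    let years = horizon_secs / (365 * 24 * 60 * 60);
    println!("2^53 ticks is ~{} years after the epoch", years);
    // ~40 years: hence the note about adjusting the epoch so the
    // precision limit stays far enough in the future.
}
```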
We can still seek to anywhere. We do need edit lists even when starting at the "beginning" of a recording, because the video and audio will never begin at exactly the same time. That's apparently just a normal way of doing things when you generate […]
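A sketch of what those edit lists might express, assuming the common case where one track starts slightly after the other (names are illustrative, not Moonfire NVR's actual .mp4 writer, and it assumes movie and media timescales match):

```rust
/// One entry of an ISO BMFF `elst` box: play the track's media starting
/// at `media_time` for `segment_duration` units; media_time of -1 means
/// an "empty edit" during which nothing from the track is presented.
struct ElstEntry {
    segment_duration: u64,
    media_time: i64,
}

/// Builds an edit list for a track whose first sample is at
/// `track_start`, relative to the earliest start of any track,
/// `presentation_start` (track_start >= presentation_start).
fn edit_list(track_start: i64, presentation_start: i64, duration: u64) -> Vec<ElstEntry> {
    let delay = (track_start - presentation_start) as u64;
    let mut entries = Vec::new();
    if delay > 0 {
        // Empty edit: this track stays silent/blank for `delay` units.
        entries.push(ElstEntry { segment_duration: delay, media_time: -1 });
    }
    entries.push(ElstEntry { segment_duration: duration, media_time: 0 });
    entries
}
```

For example, audio that begins 30 ms after the video would get a 30 ms empty edit followed by a normal edit covering the rest of its duration.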
TBD but doable.
It seems easiest if we can store the samples in exactly the order we receive them in the RTP stream, rather than having fancy buffering in the write path. But ISO/IEC 14496-12 section H.3.2 mentions that "Audio and video streams may not be perfectly interleaved in terms of presentation times in transmission order [in the incoming RTP stream]." If the IDR frame can be out of order with respect to the audio sample before it, we'll need to do some buffering to put the audio sample in the right recording. That's annoying (and the write path already feels ugly) but fundamentally still possible. My concern about the blowup was referring to the "slices" part of the […]
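If that reordering does happen, the buffering could look something like this minimal sketch (hypothetical types, not the actual write path), which holds audio back until video timing has caught up, so each audio frame lands on the correct side of a recording split at an IDR frame:

```rust
use std::collections::VecDeque;

/// A frame from the RTP stream, in transmission order.
enum Frame {
    Video { pts: i64, is_idr: bool, data: Vec<u8> },
    Audio { pts: i64, data: Vec<u8> },
}

/// Buffers audio frames until a video frame with an equal or later pts
/// arrives, so each audio frame can be assigned to the recording (GOP)
/// it belongs to. The real write path would also flush on shutdown.
struct Interleaver {
    pending_audio: VecDeque<(i64, Vec<u8>)>,
}

impl Interleaver {
    fn push(&mut self, f: Frame, out: &mut Vec<Frame>) {
        match f {
            Frame::Audio { pts, data } => self.pending_audio.push_back((pts, data)),
            Frame::Video { pts, is_idr, data } => {
                // Buffered audio with pts <= this video frame's pts
                // belongs before it; if this frame is an IDR starting a
                // new recording, that audio stays in the old one.
                while self.pending_audio.front().map_or(false, |&(a, _)| a <= pts) {
                    let (a_pts, a_data) = self.pending_audio.pop_front().unwrap();
                    out.push(Frame::Audio { pts: a_pts, data: a_data });
                }
                out.push(Frame::Video { pts, is_idr, data });
            }
        }
    }
}
```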
No design mystery here; just work to do.
This splits the schema and playback path. The recording path still adjusts the frame durations and always says the wall and media durations are the same. I expect to change that in a following commit. I wouldn't be surprised if that shakes out some bugs in this portion.
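Roughly, the distinction that commit introduces looks like the following (field names are indicative, not necessarily the exact schema):

```rust
/// A recording's two notions of duration, both in 90 kHz units.
struct Recording {
    /// Media duration: the sum of the frame durations as they appear in
    /// the sample indexes and the generated .mp4.
    media_duration_90k: i32,
    /// Wall duration: media time corrected for the camera clock's
    /// measured frequency error.
    wall_duration_90k: i32,
}

impl Recording {
    /// Converts an offset within the recording from media to wall time.
    /// Callers must be careful about which timebase an offset is in;
    /// mixing them up is exactly the live-view panic described below.
    fn media_to_wall(&self, media_off_90k: i32) -> i32 {
        if self.media_duration_90k == 0 {
            return 0;
        }
        (i64::from(media_off_90k) * i64::from(self.wall_duration_90k)
            / i64::from(self.media_duration_90k)) as i32
    }
}
```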
Found my first bug in the new wall vs media duration distinction: the live viewing stuff is mixing them up for the durations within a recording, causing a panic.
Currently the API is that the offsets are given to […]
This broke with the media vs wall duration split, part of #34.
Hi @scottlamb, thank you for the nice software; I'm using it with great success. I really miss the ability to record audio. My cameras encode audio in AAC, so there should be no need for transcoding. Any progress with this feature? I'm a developer; maybe I can help you. Thanks!
It'd be nice to support audio. I think there are some non-trivial things to work out, though:

- […] `recording` rows. In the video index, keep 90 kHz or even less to save storage space. The final frame is implicitly the rest of the duration of the total recording.
- `audio_index` like `video_index` (see the sketch after this list).
- `.mp4` data structures? How do we represent the interleaving?
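For the `audio_index` item, one possibility (purely illustrative, not a committed format) is the same varint-delta style as `video_index`, simplified by AAC's fixed 1,024 samples per frame so that only the byte sizes need encoding:

```rust
/// Appends one AAC frame's byte length to a hypothetical audio_index
/// using LEB128-style unsigned varints. Durations are implicit, since
/// every AAC frame covers a fixed 1,024 samples.
fn append_audio_frame(index: &mut Vec<u8>, mut bytes: u32) {
    loop {
        let more = bytes >= 0x80;
        index.push((bytes as u8 & 0x7f) | if more { 0x80 } else { 0 });
        bytes >>= 7;
        if !more {
            break;
        }
    }
}
```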