Review of C++ decoding entry-points #260
Comments
My understanding of some of what's going on:
Some refactoring I would like to do is to never store pts as a double in seconds. We have to expose pts as a double in seconds, but we should never store it as such. This matters for our seek request:
It makes sense for us to expose that as an API. But as an implementation, we store the value as-is, as a double. This is our only seek function, so we call it everywhere. This then leads to us converting the int-based pts values we get directly from FFmpeg to seconds as doubles in many places, which risks precision loss. We should instead store the pts as an int, so that internally we can seek without a conversion to a double. I'm happy to do this refactoring.
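As a rough illustration of the refactoring proposed above, here is a minimal sketch that keeps the seek target as an integer pts and converts to or from seconds only at the API boundary. All names here (TimeBase, SeekState, ptsToSeconds, secondsToPts) are hypothetical and not taken from the codebase; TimeBase is a stand-in for FFmpeg's AVRational so that the example compiles without FFmpeg.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical stand-in for FFmpeg's AVRational: one pts tick is num/den seconds.
struct TimeBase {
  int num;
  int den;
};

// Integer pts -> seconds. Intended only for values handed back to the user.
inline double ptsToSeconds(int64_t pts, TimeBase tb) {
  return static_cast<double>(pts) * tb.num / tb.den;
}

// Seconds -> integer pts. Done once, when a user-supplied value enters the decoder.
inline int64_t secondsToPts(double seconds, TimeBase tb) {
  return std::llround(seconds * tb.den / tb.num);
}

// Hypothetical seek state: the desired position is stored in integer pts units.
class SeekState {
 public:
  explicit SeekState(TimeBase tb) : timeBase_(tb) {}

  // Public-style entry point: accepts seconds and converts exactly once.
  void setCursorSeconds(double seconds) {
    desiredPts_ = secondsToPts(seconds, timeBase_);
  }

  // Internal entry point: pts values coming from FFmpeg are stored unchanged.
  void setCursorPts(int64_t pts) { desiredPts_ = pts; }

  int64_t desiredPts() const { return desiredPts_; }

 private:
  TimeBase timeBase_;
  int64_t desiredPts_ = 0;
};
```

With this split, internal callers that already hold an integer pts would use the integer setter and never go through a double.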
After discussing this we decided to:
These 2 combined should address both the |
We currently have 4 public C++ entry points for decoding, and 1 additional private outlier. Below is their high-level call graph:
getFrameAtIndex
  -> idx2pts
  -> getDecodedOutputWithFilter
  -> convertAVFrameToDecodedOutput

getFramesInRange
  -> getFrameAtIndex (in a for loop)

getFrameDisplayedAtTimestampNoDemux
  -> getDecodedOutputWithFilter
  -> convertAVFrameToDecodedOutput

getFramesDisplayedByTimestampInRange
  -> pts2idx
  -> getFrameAtIndex (in a for loop)

Note 1: pts2idx and idx2pts aren't existing functions, they're just some inline conversion logic.

Note 2: convertAVFrameToDecodedOutput is responsible for calling either libavfilter or swscale to convert a single frame to RGB.

Then there is a private entry point, which is only exposed as a core API. It is used in the private/deprecated samplers and also used internally:
getFramesAtIndices
  -> idx2pts
  -> getDecodedOutputWithFilter
  -> "batched libavfilter/swscale" logic

Note 3: this "batched libavfilter/swscale" logic is basically the same as what happens in convertAVFrameToDecodedOutput, but with a slight optimization for batched output, avoiding an extra copy.

Some issues
The current state of things is the result of many locally-optimal and sensible decisions. In aggregate, however, it's pretty confusing, especially for someone like me who didn't write most of them. In particular we end up with the following quirks, in increasing order of importance:
- getFramesInRange calls into getFrameAtIndex but getFramesDisplayedByTimestampInRange doesn't call into getFrameDisplayedAtTimestampNoDemux.
- getFramesDisplayedByTimestampInRange converts pts to idx and then idx back to pts.
- Naively, it should make sense for getFramesInRange to call into getFramesAtIndices. It's not immediately obvious why that isn't the case.
- It's not clear why getFramesAtIndices exists at all considering it's not used by any public API.
- [maintenance] The "libavfilter/swscale" logic is implemented twice: in convertAVFrameToDecodedOutput for single frames, and in getFramesAtIndices with a more efficient path for batched frames.
- [perf] The public entry points do not benefit from that efficient batched "libavfilter/swscale" path since it only exists in getFramesAtIndices.

Some more issues
For the samplers we have to implement 2 variants of getFramesAtIndices (potentially extending it): we need to implement the "sort and dedup" logic of frame indices. This has to be done for both indices and pts, noting that the pts will have to be converted to indices so that the dedup can happen (see #256).

This is likely to lead to further fragmentation of the decoding entry points if we don't do anything about it.
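To make the "sort and dedup" requirement concrete, here is a minimal sketch of what such logic could look like. It is not the existing implementation: DecodePlan, planDecode and ptsToIndex are hypothetical names, and the pts-to-index conversion assumes a sorted, non-empty list of per-frame pts values is available (for example from a prior scan of the stream) and that the requested pts is not earlier than the first frame.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical result of sorting and deduplicating the requested frame indices.
struct DecodePlan {
  std::vector<int64_t> uniqueSorted;       // indices to decode: ascending, no repeats
  std::vector<std::size_t> outputToUnique; // per requested slot: position in uniqueSorted
};

// Each distinct frame is decoded once, in ascending order; outputToUnique lets
// the caller copy the decoded frames back into the order the user asked for.
inline DecodePlan planDecode(const std::vector<int64_t>& requested) {
  DecodePlan plan;
  plan.uniqueSorted = requested;
  std::sort(plan.uniqueSorted.begin(), plan.uniqueSorted.end());
  plan.uniqueSorted.erase(
      std::unique(plan.uniqueSorted.begin(), plan.uniqueSorted.end()),
      plan.uniqueSorted.end());

  plan.outputToUnique.reserve(requested.size());
  for (int64_t idx : requested) {
    auto it = std::lower_bound(
        plan.uniqueSorted.begin(), plan.uniqueSorted.end(), idx);
    plan.outputToUnique.push_back(
        static_cast<std::size_t>(it - plan.uniqueSorted.begin()));
  }
  return plan;
}

// Hypothetical pts -> index conversion: index of the last frame whose pts is
// <= the requested pts. Assumes framePts is sorted ascending and non-empty,
// and that pts >= framePts.front().
inline int64_t ptsToIndex(const std::vector<int64_t>& framePts, int64_t pts) {
  auto it = std::upper_bound(framePts.begin(), framePts.end(), pts);
  return static_cast<int64_t>(it - framePts.begin()) - 1;
}
```

For the pts variant, each requested pts would first go through something like ptsToIndex, and the resulting indices would then be passed to planDecode, so that two timestamps falling on the same frame trigger only one decode.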
Before we move forward with the implementation of these new/extended entry-points, do we: