Review of C++ decoding entry-points #260


Closed
NicolasHug opened this issue Oct 14, 2024 · 2 comments



NicolasHug commented Oct 14, 2024

We currently have 4 public C++ entry points for decoding, plus 1 private outlier. Below are their high-level call graphs:

  • getFrameAtIndex -> idx2pts -> getDecodedOutputWithFilter -> convertAVFrameToDecodedOutput

  • getFramesInRange -> getFrameAtIndex (in a for loop)

  • getFrameDisplayedAtTimestampNoDemux -> getDecodedOutputWithFilter -> convertAVFrameToDecodedOutput

  • getFramesDisplayedByTimestampInRange -> pts2idx -> getFrameAtIndex (in a for loop)

Note 1: pts2idx and idx2pts aren't existing functions; they're just inline conversion logic.
Note 2: convertAVFrameToDecodedOutput is responsible for calling either libavfilter or swscale to convert a single frame to RGB.

Then there is a private entry point, which is only exposed as a core API. It is used in the private/deprecated samplers and also used internally:

  • getFramesAtIndices -> idx2pts -> getDecodedOutputWithFilter -> "batched libavfilter/swscale" logic

Note 3: this "batched libavfilter/swscale" logic is basically the same as what happens in convertAVFrameToDecodedOutput, but with a slight optimization for batched output that avoids an extra copy.
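To make Note 3's copy-avoidance concrete, here is a minimal sketch using std::vector as a stand-in for a tensor; all names (convertFrameInto, convertSingle, convertBatch) are illustrative, not the actual torchcodec code. The batched path converts each frame directly into its slice of a pre-allocated N*H*W*C buffer, whereas the single-frame path converts into a fresh buffer that the caller may then have to copy into the batch:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical conversion: fill `dst` (frameSize bytes) with "RGB" data.
void convertFrameInto(uint8_t* dst, size_t frameSize, uint8_t value) {
  std::fill(dst, dst + frameSize, value);
}

// Single-frame path: allocate, convert, return (the caller may copy again
// when assembling a batch).
std::vector<uint8_t> convertSingle(size_t frameSize, uint8_t value) {
  std::vector<uint8_t> out(frameSize);
  convertFrameInto(out.data(), frameSize, value);
  return out;
}

// Batched path: convert each frame straight into the batch buffer,
// skipping the intermediate per-frame allocation and copy.
std::vector<uint8_t> convertBatch(size_t numFrames, size_t frameSize) {
  std::vector<uint8_t> batch(numFrames * frameSize);
  for (size_t i = 0; i < numFrames; ++i) {
    convertFrameInto(batch.data() + i * frameSize, frameSize,
                     static_cast<uint8_t>(i));
  }
  return batch;
}
```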

Some issues

The current state of things is the result of many locally-optimal and sensible decisions. In aggregate, however, it's pretty confusing, especially for someone like me who didn't write most of this code. In particular we end up with the following quirks, in increasing order of importance:

  • getFramesInRange calls into getFrameAtIndex but getFramesDisplayedByTimestampInRange doesn't call into getFrameDisplayedAtTimestampNoDemux

  • getFramesDisplayedByTimestampInRange converts pts to idx and then idx back to pts

  • Naively, it should make sense for getFramesInRange to call into getFramesAtIndices. It's not immediately obvious why that isn't the case.

  • It's not clear why getFramesAtIndices exists at all considering it's not used by any public API.

  • [maintenance] The "libavfilter/swscale" logic is implemented twice: in convertAVFrameToDecodedOutput for single frames, and in getFramesAtIndices with a more efficient path for batched frames.

  • [perf] The public entry points do not benefit from that efficient batched "libavfilter/swscale" path since it only exists in getFramesAtIndices.

Some more issues

For the samplers we have to implement 2 variants of getFramesAtIndices (potentially by extending it): we need "sort and dedup" logic for the frame indices. This has to exist for both indices and pts, noting that the pts will have to be converted to indices so that the dedup can happen (see #256).

This is likely to lead to further fragmentation of the decoding entry points if we don't do anything about it.
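The "sort and dedup" step mentioned above could look like the following sketch: decode each unique index once, in increasing order (which is cheapest for seeking). The function name is illustrative, not an actual torchcodec API:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Sort the requested frame indices and drop duplicates, so each frame is
// decoded exactly once and seeks only ever move forward.
std::vector<int64_t> sortedUniqueIndices(std::vector<int64_t> indices) {
  std::sort(indices.begin(), indices.end());
  indices.erase(std::unique(indices.begin(), indices.end()), indices.end());
  return indices;
}
```

A real implementation would also need to remember the original positions so the decoded frames can be scattered back into the caller's requested order.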


Before we move forward with the implementation of these new/extended entry-points, do we:

  • pay some debt right now and address some of the quirks mentioned above ([perf], [maintenance])
  • yolo it, let's just add more stuff and refactor later, maybe

scotts commented Oct 15, 2024

My understanding of some of what's going on:

  • getFramesAtIndices was indeed implemented for the deprecated samplers. Either it, or something like it, should be exposed publicly in VideoDecoder to support the new samplers or the cases where our users want to implement their own sampling logic.
  • getFramesInRange accepts the range as start, stop and step values. getFramesAtIndices accepts the indices through a vector. Making getFramesInRange call getFramesAtIndices would require creating a vector of indices, which is inefficient. Rather than combining the two, getFramesAtIndices should call getFrameAtIndex. That should eliminate duplicate logic.
  • It seems like it would be logical for there to be some rhyming between indices and timestamps in terms of their range and single entry points. Why they don't:
    • This is the quirk that getFramesDisplayedByTimestampInRange does not call getFrameDisplayedAtTimestampNoDemux.
    • The reason that getFramesDisplayedByTimestampInRange instead calls into getFrameAtIndex is because it's easier and more efficient for us to convert from pts-space to index-space to compute and retrieve the timestamp ranges. Once we've done that, we might as well just call the index-based frame retrieval, rather than converting back to pts and calling timestamp based frame retrieval.
    • As quirky as this is, I don't see this changing unless we change the algorithm for getFramesDisplayedByTimestampInRange. We would have to do something where we call getFrameDisplayedAtTimestampNoDemux on startSeconds, then call getNextDecodedOutputNoDemux in a loop, always evaluating if we have decoded past stopSeconds. That was the approach I was going to take originally, but what we currently have seemed better.
    • Basically, if you're going to pre-compute the frames you want, you're going to need to convert to index-space. I believe we have encountered something similar in the sampler?
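The structure proposed in the first two bullets can be sketched as follows: both batch entry points funnel into a single getFrameAtIndex, and the range variant iterates start/stop/step directly instead of materializing an index vector. The Decoder and Frame types here are placeholders, not the real VideoDecoder signatures:

```cpp
#include <cstdint>
#include <vector>

struct Frame {
  int64_t index = -1;  // stand-in for the decoded frame data
};

struct Decoder {
  // Single source of truth for per-frame decoding.
  Frame getFrameAtIndex(int64_t i) { return Frame{i}; }

  // Vector-of-indices entry point, rewritten to delegate.
  std::vector<Frame> getFramesAtIndices(const std::vector<int64_t>& indices) {
    std::vector<Frame> out;
    out.reserve(indices.size());
    for (int64_t i : indices) out.push_back(getFrameAtIndex(i));
    return out;
  }

  // Range entry point: no intermediate index vector is ever built.
  std::vector<Frame> getFramesInRange(int64_t start, int64_t stop, int64_t step) {
    std::vector<Frame> out;
    for (int64_t i = start; i < stop; i += step) out.push_back(getFrameAtIndex(i));
    return out;
  }
};
```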

Some refactoring I would like to do is to never store pts as a double in seconds. We have to expose pts as a double in seconds, but we should never store it as such. This matters for our seek request:

```cpp
void VideoDecoder::setCursorPtsInSeconds(double seconds) {
  // ...
}
```

It makes sense for us to expose that as an API. But as an implementation, we store the value as-is, as a double. This is our only seek function, so we call it everywhere. This then leads us to convert the int-based pts values we get directly from FFmpeg to seconds as doubles in many places, which risks precision loss. We should instead store the pts as an int, so that internally we can seek without converting to a double. I'm happy to do this refactoring.
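A minimal sketch of that refactoring, assuming a TimeBase struct mirroring FFmpeg's AVRational (the Cursor type and member names are illustrative): pts is stored as the int64_t value FFmpeg hands us, in stream time_base units, and the double-seconds conversion happens only at the public API boundary.

```cpp
#include <cstdint>

struct TimeBase {
  int num;
  int den;  // e.g. {1, 90000} is a common stream time base
};

struct Cursor {
  int64_t pts = 0;  // stored losslessly, in time_base units
  TimeBase timeBase{1, 90000};

  // Public API still takes seconds, but converts to integer pts once,
  // up front, instead of carrying a double around internally.
  void setCursorPtsInSeconds(double seconds) {
    pts = static_cast<int64_t>(seconds * timeBase.den / timeBase.num);
  }

  // Seconds are computed on demand at the boundary; the stored value
  // never round-trips through a double.
  double ptsSeconds() const {
    return static_cast<double>(pts) * timeBase.num / timeBase.den;
  }
};
```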

@NicolasHug

After discussing this we decided to:

  • rewrite getFramesAtIndices to rely on getFrameAtIndex
  • let getFrameAtIndex (and downstream logic) accept an optional pre-allocated output tensor. This will allow batched APIs to pass a 3D view of their already-allocated 4D tensors, and save a copy.

These 2 combined should address both the [maintenance] and [perf] points above.
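A sketch of what the agreed design could look like, with std::vector and raw pointers standing in for the 4D batch tensor and its 3D per-frame views (all names and the kFrameSize constant are illustrative): the core getFrameAtIndex writes into caller-provided memory, and the batched API hands it slices of the batch buffer so no per-frame copy is needed.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr size_t kFrameSize = 12;  // stand-in for H*W*C

// Core version: decodes frame `i` into caller-provided memory
// (kFrameSize bytes). A view into pre-allocated storage goes here.
void getFrameAtIndex(int64_t i, uint8_t* out) {
  for (size_t j = 0; j < kFrameSize; ++j) {
    out[j] = static_cast<uint8_t>(i);  // placeholder for real decoding
  }
}

// Convenience overload: allocates its own output when no buffer is given,
// preserving the existing single-frame behavior.
std::vector<uint8_t> getFrameAtIndex(int64_t i) {
  std::vector<uint8_t> out(kFrameSize);
  getFrameAtIndex(i, out.data());
  return out;
}

// Batched API: allocates the "4D" batch once, then passes per-frame
// slices down to getFrameAtIndex; no extra per-frame copy.
std::vector<uint8_t> getFramesAtIndices(const std::vector<int64_t>& indices) {
  std::vector<uint8_t> batch(indices.size() * kFrameSize);
  for (size_t n = 0; n < indices.size(); ++n) {
    getFrameAtIndex(indices[n], batch.data() + n * kFrameSize);
  }
  return batch;
}
```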
