Description
We create/allocate output frame tensors in different places. In particular, we determine the height and width of the output tensor from different sources:
- For batch APIs (CPU and GPU): from the stream metadata, which itself comes from the CodecContext
- For single frame APIs:
  - CPU (swscale and filtergraph): from the AVFrame
  - GPU: from the CodecContext
The info from the metadata / CodecContext is available as soon as we add a stream, e.g. right after we instantiate a Python VideoDecoder. The AVFrame is only available once we have decoded the frame with FFmpeg (this is the "raw output").
The source of truth really is the AVFrame. The CodecContext may be wrong, and in particular we now know that some streams may have variable height and width #312.
Details:
- For batch APIs: torchcodec/src/torchcodec/decoders/_core/VideoDecoder.cpp, lines 165 to 187 in 41c6491
- For single frame APIs: torchcodec/src/torchcodec/decoders/_core/VideoDecoder.cpp, lines 887 to 890 in 41c6491, and lines 1279 to 1286 in 41c6491
- For CUDA APIs: torchcodec/src/torchcodec/decoders/_core/CudaDevice.cpp, lines 156 to 161 in 41c6491