Towards a clearer logic for determining output height and width #332

NicolasHug · 2024-11-05T11:35:46Z

Background from #269:

We create/allocate output frame tensors in different places. In particular, we determine the height and width of the output tensor from different sources:

- Batch APIs (CPU and GPU): from the stream metadata, which itself comes from the CodecContext
- Single frame APIs:
  - CPU (swscale and filtergraph): from the AVFrame
  - GPU: from the CodecContext

The info from the metadata / CodecContext is available as soon as we add a stream, e.g. right after we instantiate a Python VideoDecoder. The AVFrame is only available once we have decoded the frame with ffmpeg (this is the "raw output").

The source of truth really is the AVFrame. The CodecContext may be wrong, and in particular we now know that some streams may have variable height and width (#312).

What this PR does

- It documents the above logic.
- It makes the above logic easier to understand, in particular which caller uses which strategy to figure out height and width, by introducing getHeightAndWidthFromOptionsOrMetadata() and getHeightAndWidthFromOptionsOrAVFrame().
- It adds a single frame tensor allocation function: allocateEmptyHWCTensor().
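As a rough illustration of the two strategies named above, here is a minimal sketch using simplified stand-in structs. DecoderOptions, StreamMetadata and DecodedFrame are hypothetical placeholders, not the real torchcodec or FFmpeg types, and the bodies only assume the value_or fallback pattern visible in the diff; this is not a copy of the actual implementation.

```cpp
#include <optional>
#include <tuple>

// Hypothetical stand-ins: the real code uses VideoStreamDecoderOptions,
// the stream metadata, and FFmpeg's AVFrame.
struct DecoderOptions {
  std::optional<int> height;
  std::optional<int> width;
};

struct StreamMetadata {
  int height;
  int width;
};

struct DecodedFrame {
  int height;
  int width;
};

// Strategy 1: usable as soon as a stream is added (before any decoding),
// but the metadata may not match what the decoder actually produces.
std::tuple<int, int> getHeightAndWidthFromOptionsOrMetadata(
    const DecoderOptions& options,
    const StreamMetadata& metadata) {
  return std::make_tuple(
      options.height.value_or(metadata.height),
      options.width.value_or(metadata.width));
}

// Strategy 2: only usable once a frame has been decoded, but the frame
// is the source of truth for its own dimensions.
std::tuple<int, int> getHeightAndWidthFromOptionsOrAVFrame(
    const DecoderOptions& options,
    const DecodedFrame& frame) {
  return std::make_tuple(
      options.height.value_or(frame.height),
      options.width.value_or(frame.width));
}
```

In both cases user-supplied options win; the two helpers differ only in where the fallback comes from, which is the distinction the function names are meant to make visible at each call site.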

What this PR does not do

This PR does not change the logic of how height and width are determined, in any of the updated callers.
This PR does not ensure that height and width of the [pre]allocated tensors are as expected. In other words, this PR doesn't prevent potential segfaults from happening. This will come in a follow-up.

NicolasHug · 2024-11-05T13:34:49Z

src/torchcodec/decoders/_core/VideoDecoder.cpp

+  int height = 0, width = 0;
+  std::tie(height, width) = getHeightAndWidthFromOptionsOrAVFrame(
+      streams_[streamIndex].options, filteredFrame.get());
+  std::vector<int64_t> shape = {height, width, 3};


So this is the only place where the logic is slightly changed (but I think the behavior is the same): we go through options first. I think this is correct, because the filteredFrame should respect options itself?
In any case, this is what is done for the swscale case as well. If that's incorrect or a change of behavior, LMK.

Looks correct to me, but @ahmadsharif1 should also reason through it.

Maybe add a TORCH_CHECK to make sure the filteredFrame has the expected dimensions?

I'll add the validity checks as follow up!
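For reference, the kind of validity check discussed in this thread could look roughly like the sketch below. It is only an assumption about the shape of such a check: FrameSize and checkFrameHasExpectedDims are hypothetical names, and a plain std::runtime_error stands in for TORCH_CHECK so that the example compiles without libtorch.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper type holding a frame's dimensions.
struct FrameSize {
  int height;
  int width;
};

// Throws if the frame that came out of the filter does not have the
// dimensions the output tensor was allocated with. In torchcodec this
// would presumably be a TORCH_CHECK; an exception stands in for it here.
void checkFrameHasExpectedDims(
    const FrameSize& expected,
    const FrameSize& actual) {
  if (actual.height != expected.height || actual.width != expected.width) {
    throw std::runtime_error(
        "Expected frame of size " + std::to_string(expected.height) + "x" +
        std::to_string(expected.width) + ", got " +
        std::to_string(actual.height) + "x" + std::to_string(actual.width));
  }
}
```

Failing loudly at this point is preferable to copying frame data into a tensor of a different size, which is where the potential segfaults mentioned in the PR description would come from.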

scotts · 2024-11-05T14:13:17Z

src/torchcodec/decoders/_core/CudaDevice.cpp

@@ -209,8 +197,9 @@ void convertAVFrameToDecodedOutputOnCuda(
      src->format == AV_PIX_FMT_CUDA,
      "Expected format to be AV_PIX_FMT_CUDA, got " +
          std::string(av_get_pix_fmt_name((AVPixelFormat)src->format)));
-  int width = options.width.value_or(codecContext->width);
-  int height = options.height.value_or(codecContext->height);
+  int height = 0, width = 0;


Nit: please declare only one variable per line.

scotts · 2024-11-05T14:14:52Z

src/torchcodec/decoders/_core/VideoDecoder.cpp

-      durationSeconds(torch::empty({numFrames}, {torch::kFloat64})) {}
+    : ptsSeconds(torch::empty({numFrames}, {torch::kFloat64})),
+      durationSeconds(torch::empty({numFrames}, {torch::kFloat64})) {
+  int height = 0, width = 0;


Same nit, and there are a few other places.

scotts · 2024-11-05T14:18:21Z

src/torchcodec/decoders/_core/VideoDecoder.cpp

+      options.height.value_or(avFrame->height),
+      options.width.value_or(avFrame->width));
+}
+


I'd actually prefer that we just call both of these functions getHeightAndWidth and allow C++ function overloading to determine which one to call. In C++, because the types of parameters are formally a part of the function signature, it's less common to encode the type of one of the parameters in the name. This, however, is a question of style, and I know @ahmadsharif1 may feel differently.

Having read the comment below, I get that you're purposefully marking in the code which strategy is used where.
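To make the overloading alternative concrete, here is a minimal sketch with hypothetical stand-in types (DecoderOptions, StreamMetadata, DecodedFrame are placeholders, not the real torchcodec or FFmpeg types). Both functions share the name getHeightAndWidth, and the compiler picks one based on the type of the second argument.

```cpp
#include <optional>
#include <tuple>

// Hypothetical stand-ins for the real option, metadata and frame types.
struct DecoderOptions {
  std::optional<int> height;
  std::optional<int> width;
};

struct StreamMetadata {
  int height;
  int width;
};

struct DecodedFrame {
  int height;
  int width;
};

// Overload selected when the fallback source is the stream metadata.
std::tuple<int, int> getHeightAndWidth(
    const DecoderOptions& options,
    const StreamMetadata& metadata) {
  return std::make_tuple(
      options.height.value_or(metadata.height),
      options.width.value_or(metadata.width));
}

// Overload selected when the fallback source is the decoded frame.
std::tuple<int, int> getHeightAndWidth(
    const DecoderOptions& options,
    const DecodedFrame& frame) {
  return std::make_tuple(
      options.height.value_or(frame.height),
      options.width.value_or(frame.width));
}
```

The trade-off is the one noted in the thread: with overloads, a reader has to look up the argument type to learn which strategy a call site uses, whereas the longer names spell it out.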

scotts · 2024-11-05T14:25:19Z

src/torchcodec/decoders/_core/VideoDecoder.cpp

+  auto tensorOptions = torch::TensorOptions()
+                           .dtype(torch::kUInt8)
+                           .layout(torch::kStrided)
+                           .device(device);


Not critical or blocking, but this may be a good time to put in TORCH_CHECKs to make sure the height and width are not negative. I just looked it up, and I think the tensor API can handle that, but that's probably not going to be what we expect. (I know you plan on a follow-up PR with more error-checking, so this sort of thing may be more appropriate there.)
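A minimal sketch of the non-negativity check suggested here, under a few assumptions: checkedHWCShape is a hypothetical helper (not part of torchcodec), it only builds the shape rather than allocating a tensor, and std::invalid_argument stands in for TORCH_CHECK so the example does not depend on libtorch.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Builds the {height, width, 3} shape of an HWC frame after rejecting
// negative dimensions. Zero is allowed and describes an empty tensor.
std::vector<std::int64_t> checkedHWCShape(int height, int width) {
  if (height < 0) {
    throw std::invalid_argument(
        "height must be >= 0, got " + std::to_string(height));
  }
  if (width < 0) {
    throw std::invalid_argument(
        "width must be >= 0, got " + std::to_string(width));
  }
  return {height, width, 3};
}
```

Checking before allocation keeps the error message close to the cause instead of surfacing later as an unexpected tensor shape.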

scotts · 2024-11-05T14:34:10Z

Almost forgot to mention: great refactoring, and thanks for spending the time to reason through all of this and make sure what we're doing is sensible!

ahmadsharif1 · 2024-11-05T15:01:42Z

src/torchcodec/decoders/_core/CudaDevice.cpp

-  int width = options.width.value_or(codecContext->width);
-  int height = options.height.value_or(codecContext->height);
+  int height = 0, width = 0;
+  std::tie(height, width) =


IMO this is less maintainable/safe than returning a struct with named members.

Someone could write std::tie(width, height) instead of std::tie(height, width), and it would be hard to notice.
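A sketch of the struct-based alternative, for illustration only. The name FrameDims is borrowed from a later commit title in this PR, but its fields and the stand-in DecoderOptions and DecodedFrame types below are assumptions rather than the merged code.

```cpp
#include <optional>

// Named members make it impossible to silently swap height and width
// at the call site, unlike unpacking a tuple by position.
struct FrameDims {
  int height;
  int width;
};

// Hypothetical stand-ins for the real option and frame types.
struct DecoderOptions {
  std::optional<int> height;
  std::optional<int> width;
};

struct DecodedFrame {
  int height;
  int width;
};

FrameDims getHeightAndWidthFromOptionsOrAVFrame(
    const DecoderOptions& options,
    const DecodedFrame& frame) {
  FrameDims dims;
  dims.height = options.height.value_or(frame.height);
  dims.width = options.width.value_or(frame.width);
  return dims;
}
```

A caller then reads dims.height and dims.width by name, so there is no positional order to get wrong.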

ahmadsharif1 · 2024-11-05T15:14:26Z

src/torchcodec/decoders/_core/VideoDecoder.h

+
+std::tuple<int, int> getHeightAndWidthFromOptionsOrAVFrame(
+    const VideoDecoder::VideoStreamDecoderOptions& options,
+    AVFrame* avFrame);


This should be const too

A const reference (const AVFrame&) would be preferred, since this is a mandatory parameter and should never be null.
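The difference at the call site can be sketched as follows, again with hypothetical stand-in types (DecodedFrame plays the role of AVFrame, and a plain std::unique_ptr stands in for whatever smart pointer owns the frame). Taking a const reference documents that the argument is required and is not modified.

```cpp
#include <memory>
#include <optional>
#include <tuple>

// Hypothetical stand-ins for the real option and frame types.
struct DecoderOptions {
  std::optional<int> height;
  std::optional<int> width;
};

struct DecodedFrame {
  int height;
  int width;
};

// A const reference cannot be null and cannot be used to mutate the
// frame, so the signature itself states both guarantees.
std::tuple<int, int> getHeightAndWidthFromOptionsOrAVFrame(
    const DecoderOptions& options,
    const DecodedFrame& frame) {
  return std::make_tuple(
      options.height.value_or(frame.height),
      options.width.value_or(frame.width));
}

// The caller dereferences its owning pointer instead of passing .get().
std::tuple<int, int> dimsOfOwnedFrame(
    const DecoderOptions& options,
    const std::unique_ptr<DecodedFrame>& ownedFrame) {
  return getHeightAndWidthFromOptionsOrAVFrame(options, *ownedFrame);
}
```

The dereference moves the "is this pointer valid?" question to the caller, which is the only place that can answer it.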

ahmadsharif1 · 2024-11-05T15:18:18Z

src/torchcodec/decoders/_core/VideoDecoder.cpp

+  int height = 0, width = 0;
+  std::tie(height, width) = getHeightAndWidthFromOptionsOrAVFrame(
+      streams_[streamIndex].options, filteredFrame.get());
+  std::vector<int64_t> shape = {height, width, 3};


Maybe add a TORCH_CHECK to make sure the filteredFrame has the expected dimensions?

NicolasHug added 3 commits November 4, 2024 16:49

WIP

0256e18

WIP

4a9c00c

Comments

e529764

facebook-github-bot added the CLA Signed label Nov 5, 2024

NicolasHug changed the title ~~WIP~~ Towards a clearer logic for determining output height and width Nov 5, 2024

NicolasHug marked this pull request as ready for review November 5, 2024 11:43

NicolasHug added 4 commits November 5, 2024 11:47

hardcode CPU

dcb50d5

Merge branch 'main' of github.com:pytorch/torchcodec into allocation

a07fb2d

One more

733e62b

Merge branch 'main' of github.com:pytorch/torchcodec into allocation

52abe9b

NicolasHug commented Nov 5, 2024


scotts reviewed Nov 5, 2024


One declaration per line

b178250

scotts reviewed Nov 5, 2024


scotts approved these changes Nov 5, 2024


ahmadsharif1 approved these changes Nov 5, 2024


NicolasHug added 3 commits November 5, 2024 15:41

Ensure H W and N are >=0

e386b56

use const AVFrame& avFrame

2cd2059

Use FrameDims class instead of tuple

26e6203

NicolasHug merged commit 373d1c5 into pytorch:main Nov 5, 2024
37 of 40 checks passed

NicolasHug deleted the allocation branch November 5, 2024 16:42

NicolasHug mentioned this pull request Nov 6, 2024

Validation of allocated output tensor shapes #339

Merged
