
Conversation

Contributor

@NicolasHug NicolasHug commented Oct 18, 2024

This PR simplifies the HWC -> CHW dimension conversion and restricts the expected input/output dimension order of some functions.

  • Batched output tensors are now always created as NHWC. They are converted to NCHW in one single step, instead of converting HWC sub-tensors N times.
  • As a consequence, the expected shape of input/allocated tensors within convertAVFrameToDecodedOutputOnCPU is now always HWC, and we can now enforce it (we couldn't before).
  • convertFrameToTensorUsingFilterGraph now always returns a HWC tensor. Similarly, convertFrameToBufferUsingSwsScale now always expects (a pointer to) a HWC tensor.
  • The [N]HWC -> [N]CHW permutation is now centralized in a new helper function (a rough sketch follows below).
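
For illustration, here is a rough sketch of what such a centralized helper could look like. The real helper in this PR is called MaybeHWC2CHW and takes the stream options, so treat the name maybePermuteHWC2CHW and the wantCHW flag below as stand-ins, not the PR's exact code:

    #include <torch/torch.h>

    // Sketch only: permutes an HWC (3-D) or NHWC (4-D) tensor to CHW / NCHW.
    // `wantCHW` stands in for whatever the stream options actually expose.
    torch::Tensor maybePermuteHWC2CHW(torch::Tensor hwcTensor, bool wantCHW) {
      TORCH_CHECK(
          hwcTensor.dim() == 3 || hwcTensor.dim() == 4,
          "Expected a 3-D or 4-D tensor, got ", hwcTensor.dim(), " dims");
      if (!wantCHW) {
        return hwcTensor;
      }
      return (hwcTensor.dim() == 3)
          ? hwcTensor.permute({2, 0, 1})      // HWC  -> CHW
          : hwcTensor.permute({0, 3, 1, 2});  // NHWC -> NCHW
    }

Note that permute only changes the dimension order and returns a view; the underlying memory stays in HWC order, which ties into the memory-format discussion further down.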

@facebook-github-bot facebook-github-bot added the CLA Signed label (managed by the Meta Open Source bot) Oct 18, 2024
// batch NHWC tensors to be permuted only once, instead of permuting HWC
// tensors N times.
output.frame = MaybeHWC2CHW(streamInfo.options, output.frame);
}
Contributor Author

I still think there's some smell to this. Whether the tensor was pre-allocated and whether it should be permuted should be orthogonal concepts.
I think it would make sense for all the low-level decoding capabilities (including this function, convertAVFrameToDecodedOutputOnCPU) to only ever accept and return HWC tensors.

And it should be up to the higher-level decoding entry-points (basically the moral equivalent of the public methods) to do the conversion. It's not trivial because getFrameAtIndex is both an entry point and a sub-function of other entry-points. Maybe that also means we should already let all entry-points do their own allocation and always pass pre-allocated tensors?

Contributor

@scotts scotts Oct 18, 2024

Agreed on your reasoning and the principles.

One way to square the circle is to split the public facing part of getFrameAtIndex from the actual work being done. We would have a public member function (getFrameAtIndex) and a private member function (getFrameAtIndexInternal, or something like that).

  • getFrameAtIndex would call getFrameAtIndexInternal for the actual work, and afterwards it would do the conversion check and the actual conversion if needed.
  • getFrameAtIndexInternal would just assume HWC tensors and do the real work.

Then all of the internal calls to getFrameAtIndex would become calls to getFrameAtIndexInternal. I've implemented variants of this pattern several times before. We would repeat this for any public entry points that also need to do work for other public entry points, as needed.
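
A minimal sketch of that split, with hypothetical names and a bare-bones DecodedOutput struct standing in for the real types:

    #include <torch/torch.h>

    struct DecodedOutput {
      torch::Tensor frame;  // always HWC inside the decoder, per the discussion above
    };

    class VideoDecoder {
     public:
      // Public entry point: delegates the work, then applies the HWC -> CHW
      // conversion only if the caller asked for CHW output.
      DecodedOutput getFrameAtIndex(int streamIndex, int64_t frameIndex) {
        DecodedOutput output = getFrameAtIndexInternal(streamIndex, frameIndex);
        if (wantCHW_) {
          output.frame = output.frame.permute({2, 0, 1});
        }
        return output;
      }

     private:
      // Does the real work and always returns HWC; other entry points call
      // this directly instead of the public method.
      DecodedOutput getFrameAtIndexInternal(int streamIndex, int64_t frameIndex);
      bool wantCHW_ = true;
    };

Other public entry points that currently call getFrameAtIndex would call getFrameAtIndexInternal instead, so the permutation never runs more than once per request.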

Contributor

I am fine with low-level functions only dealing with HWC. AFAICT, most (all?) low-level code deals with HWC because it has better performance.

Contributor Author

Yeah, that sounds good. I'll try to implement that in a follow-up PR

Contributor Author

AFAICT, most (all?) low-level code deals with HWC

That's not quite the case. In main, convertAVFrameToDecodedOutputOnCPU accepts both HWC and CHW - this is actually what this PR is fixing.

It may still return both HWC and CHW (instead of just HWC), and this is what I want to fix as a follow-up

"x",
width,
"x3, got ",
shape);
Contributor Author

Any idea how to make this single call shorter 🤔 ?

Contributor

I think you can do:

TORCH_CHECK(
    (shape.size() == 3) && shape.equals({height, width, 3}),
    "Expected tensor of shape ",
    height,
    "x",
    width,
    "x3, got ",
    shape);

But I'm not sure. The main thing I'm not sure about is if an array literal will auto-convert into the corresponding ArrayRef: https://pytorch.org/cppdocs/api/classc10_1_1_array_ref.html#_CPPv4NK3c108ArrayRef6equalsE8ArrayRef
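
For what it's worth, c10::ArrayRef does appear to have an std::initializer_list constructor, so the braced list should convert. A small standalone check (illustrative only, with made-up sizes, not the PR's code):

    #include <torch/torch.h>
    #include <iostream>

    int main() {
      int64_t height = 270, width = 480;
      torch::Tensor frame = torch::empty({height, width, 3}, torch::kUInt8);

      // sizes() returns an IntArrayRef; the braced list below goes through
      // ArrayRef's std::initializer_list constructor.
      TORCH_CHECK(
          frame.sizes().equals({height, width, 3}),
          "Expected tensor of shape ", height, "x", width, "x3, got ",
          frame.sizes());

      std::cout << "shape check passed" << std::endl;
      return 0;
    }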

Contributor Author

Ah, I was mainly hoping to avoid the

            height,
            "x",
            width,
            "x3, got ",
            shape);

stack :p

@NicolasHug NicolasHug marked this pull request as ready for review October 18, 2024 12:05
auto numDimensions = hwcTensor.dim();
auto shape = hwcTensor.sizes();
if (numDimensions == 3) {
TORCH_CHECK(shape[2] == 3, "Not a HWC tensor: ", shape);
Contributor

Is this robust if the width/height is 3?

Contributor Author

@NicolasHug NicolasHug Oct 18, 2024

It will have a false positive for the extremely rare (and probably degenerate) case where a video's width is 3.

This check is the very very best we can do at this stage. The alternative is to not check anything.


options.height.value_or(*metadata.height),
options.width.value_or(*metadata.width)},
torch::TensorOptions()
.memory_format(torch::MemoryFormat::ChannelsLast)
Contributor

I am wondering if using this is identical to permuting a NHWC tensor at the end. I am not 100% sure. Do you know?

Contributor

I meant in terms of performance, is it identical? Specifically this is a bit concerning:

Uploading image.png…

We really should have a batch benchmark because the whole point of batch is to do things in a performant way for many frames.

Contributor Author

Sorry I don't understand what you mean. What image did you mean to link to?

Note that this PR should be strictly more efficient:

Batched output tensors are now always created as NHWC. They are converted to NCHW in one single step, instead of converting HWC sub-tensors N times.

Contributor

@ahmadsharif1 ahmadsharif1 Oct 18, 2024

Sorry the image wasn't uploaded properly:

[image: screenshot from the channels-last tutorial linked below]

I am not sure about the performance implications of doing a permute instead of .to(channels_last)

From this page: https://pytorch.org/tutorials/intermediate/memory_format_tutorial.html#:~:text=What%20is%20Channels%20Last,pixel%2Dper%2Dpixel).

Contributor Author

Sorry I still don't really understand where you're coming from. Can you please share a link? Is this relevant for this PR?

Again, the change involved in this PR is:

Batched output tensors are now always created as NHWC. They are converted to NCHW in one single step, instead of converting HWC sub-tensors N times

Contributor Author

Thanks for the link.

We're not concerned about memory format (contiguous vs channels-last) in this PR. That's a related but distinct concern from the dimension order.

Contributor

Link in the edited comment above. It's from the channels-last page.

I am not 100% sure if creating an NHWC tensor and permuting it is the same as creating an NCHW tensor with the channels-last memory format and working with that. The code that you deleted was doing the latter.

A benchmark may show a difference -- or not. Do you know?
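
As an editorial aside on the question above, the two approaches should describe the same memory layout; a small standalone comparison (illustrative only, not code from the PR, and with made-up sizes) makes that concrete:

    #include <torch/torch.h>
    #include <iostream>

    int main() {
      int64_t N = 4, H = 270, W = 480;

      // Approach A (reportedly what main did): allocate NCHW directly in
      // channels-last memory format.
      torch::Tensor a = torch::empty(
          {N, 3, H, W},
          torch::TensorOptions()
              .dtype(torch::kUInt8)
              .memory_format(torch::MemoryFormat::ChannelsLast));

      // Approach B (what this PR's description says): allocate NHWC
      // (contiguous) and permute to NCHW once at the end.
      torch::Tensor b =
          torch::empty({N, H, W, 3}, torch::kUInt8).permute({0, 3, 1, 2});

      // Both end up with NCHW sizes and identical (channels-last) strides,
      // i.e. the same underlying memory layout.
      std::cout << a.sizes() << " / " << a.strides() << std::endl;
      std::cout << b.sizes() << " / " << b.strides() << std::endl;
      return 0;
    }

Whether there is a measurable performance difference between the two allocation paths is exactly what the batch benchmark suggested above would settle.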

@NicolasHug NicolasHug merged commit b82ea81 into meta-pytorch:main Oct 18, 2024
24 checks passed
@NicolasHug NicolasHug deleted the dim_conversion_cleanup branch October 18, 2024 14:59
