Add sort and dedup logic in C++ with new `getFramesDisplayedByTimestamps` method / core API #282

NicolasHug · 2024-10-22T13:23:08Z

Same as #280 , but takes a list of pts as input instead of indices. This is needed for the samplers: #256

…d_dedup

NicolasHug · 2024-10-23T10:59:19Z

src/torchcodec/decoders/_core/VideoDecoder.cpp

@@ -1090,6 +1090,45 @@ VideoDecoder::BatchDecodedOutput VideoDecoder::getFramesAtIndices(
  return output;
 }

+VideoDecoder::BatchDecodedOutput VideoDecoder::getFramesDisplayedByTimestamps(


For the function names, I based it on the existing getFramesDisplayedByTimestampsInRange(). I'm happy to bikeshed, but not on this PR please.
Note that our naming is becoming a bit inconsistent, e.g. the python names, ops names and C++ names are not always aligned. We probably want to clean that up, but that's for later.

We'll also have to re-think our public VideoDecoder method names very soon I think, because adding the 2 new "get_frames_..." that we recently added may conflict with existing names.

NicolasHug · 2024-10-23T11:01:46Z

src/torchcodec/decoders/_core/VideoDecoder.cpp

@@ -1090,6 +1090,45 @@ VideoDecoder::BatchDecodedOutput VideoDecoder::getFramesAtIndices(
  return output;
 }

+VideoDecoder::BatchDecodedOutput VideoDecoder::getFramesDisplayedByTimestamps(
+    int streamIndex,
+    const std::vector<double>& framePtss) {


I'm not super happy about this parameter name framePtss in C++, frame_ptss in Python. It's the mirror of frame_indices. Happy to consider alternatives. The "single frame" equivalent in get_frame_displayed_at is just seconds, so it's hard to align it across all dimensions while at the same time conveying that this is a "plural".

Maybe just timestamps since it's in the function name too?

It should probably be framePtses and frame_ptses, following the English rule of adding "es" to make something that ends in "s" plural. That's still not great, but the current version definitely looks like a typo.

I'm happy with timestamps - there was a race condition in commenting. :)

I think timestamps is a bit more intuitive to me than frame_timestamps because these are random time points given by the user -- not necessarily corresponding to frame.pts or frame.pts + frame.duration (they could be somewhere in between).

Heck they could even be outside the min and max pts.

But whatever you choose is okay

SG, I'll go with timestamps then

ahmadsharif1 · 2024-10-23T13:16:40Z

src/torchcodec/decoders/_core/VideoDecoder.cpp

@@ -1090,6 +1090,45 @@ VideoDecoder::BatchDecodedOutput VideoDecoder::getFramesAtIndices(
  return output;
 }

+VideoDecoder::BatchDecodedOutput VideoDecoder::getFramesDisplayedByTimestamps(
+    int streamIndex,
+    const std::vector<double>& framePtss) {


Maybe just timestamps since it's in the function name too?

ahmadsharif1 · 2024-10-23T13:26:47Z

src/torchcodec/decoders/_core/VideoDecoder.cpp

+  validateUserProvidedStreamIndex(streamIndex);
+  validateScannedAllStreams("getFramesDisplayedByTimestamps");
+
+  // The frame displayed at timestamp t and the one displayed at timestamp `t +
+  // eps` are probably the same frame, with the same index. The easiest way to
+  // avoid decoding that unique frame twice is to convert the input timestamps
+  // to indices, and leverage the de-duplication logic of getFramesAtIndices.


The long term thing to do here is to not require a scan for this function.

Scan should only be required for index-based functions where indexes need to be exact.

The index-based function can then call this function if the de-duping is done here and this doesn't need a scan.

That said you can merge this as-is and do the long-term thing later

@ahmadsharif1 do you think it will be possible to correctly de-dup pts-based frame queries without going through indices?
That is currently the main reason we're converting to indices here.

This is what I mean by having explicit seeking modes (approximate versus exact) is a big conceptual change. :) It might only requires changing a few dozen lines of code, but it will require changes in many places, and we'll always have to be aware of what mode is active.

At the moment, we're implicitly exact when we need to do things with indices. I think we should just keep being that way as needed, and when we implement the different modes, we can reason about what changes to make wholistically.

It should be possible to dedup based on time, but it needs to be done on the fly -- i.e. when you decode a frame you see the extent of that frame on the timeline and can make copies of that for the timepoints that the user specified. It could be a bit more involved than that because you don't know the extent of the frame by looking at the frame itself -- you need to read (not decode) the next frame.

Doing it holistically in a future PR sounds good

ahmadsharif1 · 2024-10-23T14:36:42Z

src/torchcodec/decoders/_core/VideoDecoder.cpp

+VideoDecoder::BatchDecodedOutput VideoDecoder::getFramesDisplayedByTimestamps(
+    int streamIndex,
+    const std::vector<double>& framePtss) {
+  validateUserProvidedStreamIndex(streamIndex);


Maybe add a TODO saying long term we should not require scanning the file for time-based frame extraction

scotts · 2024-10-23T14:39:38Z

src/torchcodec/decoders/_core/VideoDecoder.cpp

+
+    auto it = std::lower_bound(
+        stream.allFrames.begin(),
+        stream.allFrames.end() - 1,


Are you intentionally not including the last frame? Note that by convention, the .end() method on C++ containers is actually an iterator past the end: https://en.cppreference.com/w/cpp/container/vector/end. So this code is actually eliminating the final element from the range.

Thanks for double-checking, that was actually something that confused me a bit.

I originally had just stream.allFrames.end(), since this was copy-pasted from the getFramesDisplayedByTimestampInRange.

But with that, we would end up with an index of 390 on the NASA video for queries that were close to the video duration (~13s), when the last valid index is 389.

My first fix was to do frameIndex = std::min(frameIndex, (int64_t)stream.allFrames.size() - 1);.

But IIUC, the way std::lower_bound() works is that if it finds no frame that satisfies the condition, it will return the value of the second parameter (i.e. stream.allFrames.end() - 1), which is what we want?

(You'll see in the test I added that querying ~13s now properly returns the last frame, at index 389)

There is a very non-zero chance that I'm misunderstanding all of this.

Hmmm. I think you're right. We're eliminating the final element, but if we should have gotten the final element, it will become that anyway. This works for, I think, two resons:

Because we've already eliminated the possibility of a pts value outside of the valid range because of the TORCH_CHECK above.

Because we're converting iterators to indices, and .end() becomes equivalent to .size() with the subtraction method we're using.

But, I think I actually prefer using min to fix things up after. That this works is so subtle, and doing .end() - 1 in an iterator range is so unusual, I'd rather have the min approach with a comment explaining why we need it. I think it makes the code more understandable. The comment would be something like:

// If the frame index is larger than the size of all frames, that means we // couldn't match the pts value to the pts value of a NEXT FRAME. And // that means that this timestamp falls during the time between when the // last frame is displayed, and the video ends. Hence, it should map to the // index of the last frame.

scotts · 2024-10-23T15:54:37Z

src/torchcodec/decoders/_core/VideoDecoder.cpp

+        stream.allFrames.end() - 1,
+        framePts,
+        [&stream](const FrameInfo& info, double start) {
+          return ptsToSeconds(info.nextPts, stream.timeBase) <= start;


Nit: let's rename start to something more indicative of what we're looking for. Using framePts is fine, as it will reinforce that it will take on the framePts value passed into lower_bound.

scotts · 2024-10-23T16:21:17Z

test/decoders/test_video_decoder_ops.py

+        # # first and last frame should be equal, at pts=2 [+ eps]. We then
+        # modify the # first frame and assert that it's now different from the
+        # last frame.  # This ensures a copy was properly made during the
+        # de-duplication logic.


Nit: looks like auto-formatting had some problems when merging line comments.

NicolasHug added 8 commits October 22, 2024 10:04

Let get_frames_at_indices op return a 3-tuple instead of single Tensor

823c8a3

Add deduplication logic

61b4937

Added sorting logic

f7a70ba

minor opt

133c213

Comments

f391582

scaffolding

d475890

Added logic

14e2876

Added test

9c9e462

facebook-github-bot added the CLA Signed This label is managed by the Meta Open Source bot. label Oct 22, 2024

NicolasHug added 6 commits October 22, 2024 17:17

Remove parameter, just sort if not already sorted

b8284cc

Remove parameter

be80996

Merge branch 'main' of github.com:pytorch/torchcodec into pts_sort_an…

12c0e29

…d_dedup

Rename

4dda5b7

Fix last frame request

7d26623

Better fix

3a8839d

NicolasHug marked this pull request as ready for review October 23, 2024 10:57

NicolasHug commented Oct 23, 2024

View reviewed changes

NicolasHug requested review from ahmadsharif1 and scotts October 23, 2024 11:02

NicolasHug changed the title ~~Add sort and dedup logic in C++ with new getFramesAtPtss method / core API~~ Add sort and dedup logic in C++ with new getFramesDisplayedByTimestamps method / core API Oct 23, 2024

ahmadsharif1 reviewed Oct 23, 2024

View reviewed changes

ahmadsharif1 approved these changes Oct 23, 2024

View reviewed changes

scotts reviewed Oct 23, 2024

View reviewed changes

Use timestamps as parameter name

d43dd91

scotts reviewed Oct 23, 2024

View reviewed changes

Address comments

8dd9b0a

NicolasHug mentioned this pull request Oct 23, 2024

Speed-up time-based samplers by 20X and index-based by 1.5X #284

Merged

scotts reviewed Oct 23, 2024

View reviewed changes

scotts approved these changes Oct 23, 2024

View reviewed changes

Use max() approach, and fix comment in tests

52dd9ed

NicolasHug merged commit 875f8bc into pytorch:main Oct 23, 2024
24 checks passed

NicolasHug deleted the pts_sort_and_dedup branch October 23, 2024 16:41

NicolasHug mentioned this pull request Oct 25, 2024

Expose getFramesAtIndices and getFramesDisplayedByTimestamps as public Python methods of VideoDecoder #293

Closed

Add sort and dedup logic in C++ with new getFramesDisplayedByTimestamps method / core API #282

Add sort and dedup logic in C++ with new getFramesDisplayedByTimestamps method / core API #282

Uh oh!

Conversation

NicolasHug commented Oct 22, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

NicolasHug Oct 23, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

NicolasHug Oct 23, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

NicolasHug Oct 23, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

NicolasHug Oct 23, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Add sort and dedup logic in C++ with new `getFramesDisplayedByTimestamps` method / core API #282

Add sort and dedup logic in C++ with new `getFramesDisplayedByTimestamps` method / core API #282

NicolasHug commented Oct 22, 2024 •

edited

Loading

NicolasHug Oct 23, 2024 •

edited

Loading

NicolasHug Oct 23, 2024 •

edited

Loading

NicolasHug Oct 23, 2024 •

edited

Loading

NicolasHug Oct 23, 2024 •

edited

Loading