Add sort and dedup logic in C++ to `getFramesAtIndices` #280

NicolasHug · 2024-10-22T11:25:31Z

Benchmarks show no regression against main when the indices are already ordered, and a roughly 13x speedup when they are shuffled (median 309.52ms on main vs. 22.64ms with this PR).

On main:

```
ordered indices: True
med = 22.55ms +- 1.75

ordered indices: False
med = 309.52ms +- 11.83
```

This PR:

```
ordered indices: True
med = 22.35ms +- 1.59

ordered indices: False
med = 22.64ms +- 1.88
```

```python
import random
from time import perf_counter_ns

import torch

from torchcodec.decoders._core import (
    _add_video_stream,
    create_from_file,
    get_frames_at_indices,
    scan_all_streams_to_update_metadata,
)

VIDEO_PATH = "./test/resources/nasa_13013.mp4"
NUM_EXP = 50


def bench(f, num_exp=100, **kwargs):
    # A fresh decoder is created for every run so each measurement starts
    # from the same decoder state; only the call to f is timed.
    times = []
    for _ in range(num_exp):
        decoder = create_from_file(VIDEO_PATH)
        _add_video_stream(decoder)
        scan_all_streams_to_update_metadata(decoder)

        start = perf_counter_ns()
        f(decoder, **kwargs)
        end = perf_counter_ns()
        times.append(end - start)
    return torch.tensor(times).float()


def report_stats(times, unit="ms", suff=""):
    mul = {
        "ns": 1,
        "µs": 1e-3,
        "ms": 1e-6,
        "s": 1e-9,
    }[unit]
    times = times * mul
    std = times.std().item()
    med = times.median().item()
    print(f"{med = :.2f}{unit} +- {std:.2f} {suff}")
    return med


def _get_frames_at_indices(decoder, **kwargs):
    get_frames_at_indices(decoder=decoder, stream_index=3, **kwargs)


for ordered in (True, False):
    print(f"ordered indices: {ordered}")
    frame_indices = list(range(100))
    if not ordered:
        random.shuffle(frame_indices)
    # The same call is used on main and on this PR: the sort_indices parameter
    # from earlier revisions was removed, and the C++ side now sorts the
    # indices only when they are not already sorted.
    times = bench(_get_frames_at_indices, frame_indices=frame_indices, num_exp=NUM_EXP)
    report_stats(times)
    print()
```
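The sort-and-dedup idea behind this PR can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not torchcodec's implementation: `Frame`, `FakeDecoder`, `decodeFrameAtIndex` and `getFramesAtIndicesSorted` are hypothetical stand-ins. The sketch computes an argsort of the requested indices (skipping the sort when the input is already sorted), decodes in ascending order, and copies the previous result instead of decoding again when an index repeats.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical stand-in for a decoded frame; the real decoder produces tensors.
struct Frame {
  int64_t index = -1;
};

// Hypothetical decoder stub that records which indices it actually decodes.
struct FakeDecoder {
  std::vector<int64_t> decodedOrder;
  Frame decodeFrameAtIndex(int64_t index) {
    decodedOrder.push_back(index);
    return Frame{index};
  }
};

// Returns one frame per requested index, in the caller's original order.
// Each unique index is decoded once, and decoding proceeds in ascending order.
std::vector<Frame> getFramesAtIndicesSorted(
    FakeDecoder& decoder,
    const std::vector<int64_t>& frameIndices) {
  const size_t n = frameIndices.size();

  // argsort[i] is the position in frameIndices of the i-th smallest index.
  std::vector<size_t> argsort(n);
  std::iota(argsort.begin(), argsort.end(), size_t{0});
  if (!std::is_sorted(frameIndices.begin(), frameIndices.end())) {
    std::sort(argsort.begin(), argsort.end(), [&](size_t a, size_t b) {
      return frameIndices[a] < frameIndices[b];
    });
  }

  std::vector<Frame> output(n);
  for (size_t i = 0; i < n; ++i) {
    const size_t indexInOutput = argsort[i];
    const int64_t frameIndex = frameIndices[indexInOutput];
    if (i > 0 && frameIndex == frameIndices[argsort[i - 1]]) {
      // Same index as the previous one in sorted order: copy, don't re-decode.
      output[indexInOutput] = output[argsort[i - 1]];
    } else {
      output[indexInOutput] = decoder.decodeFrameAtIndex(frameIndex);
    }
  }
  return output;
}
```

For a request like `{5, 1, 5, 3, 1}`, the sketch decodes only 1, 3 and 5 (in that order) and still returns five frames in the requested order. Decoding in ascending order typically lets a decoder keep moving forward through the stream instead of seeking back to an earlier keyframe for each out-of-order request, which is consistent with the shuffled-indices numbers above.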

Still TODO for other PRs:

- add similar functionality for pts
- expose publicly as a method in VideoDecoder
- let the samplers rely on those C++ APIs

ahmadsharif1 · 2024-10-22T14:07:08Z

src/torchcodec/decoders/_core/VideoDecoder.cpp

```diff
+      if (options.colorConversionLibrary ==
+          ColorConversionLibrary::FILTERGRAPH) {
+        output.frames[indexInOutput] = singleOut.frame;
+      }
```


Why is this needed when we are passing `output.frames[indexInOutput]` to `getFrameAtIndex`?

It's because the pre-allocated buffer isn't used with filtergraph, only with swscale.

(Just a note that this is not something that was introduced in this PR, you'll see the same pattern in other callers)

This seems like the wrong behavior. When a user passes in a pre-allocated tensor, it should work with either color conversion library.

I guess this behavior was introduced by the PR that added pre-allocated tensors. That PR should have handled both color conversion libraries. Maybe make that change before merging this PR? That would be my vote, because otherwise the caller has to think about which color conversion library was used.
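The suggestion above, that a pre-allocated tensor should be filled regardless of the color conversion library, can be illustrated with a small sketch. Only the `ColorConversionLibrary::FILTERGRAPH` enumerator appears in the diff; the `SWSCALE` enumerator, the `Buffer` alias, and the functions `convertWithSwscale`, `convertWithFiltergraph` and `convertIntoPreAllocated` are hypothetical stand-ins (a byte vector plays the role of a frame tensor), not torchcodec or FFmpeg APIs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins: the real decoder works with torch tensors and
// FFmpeg's swscale / filtergraph machinery, not with these helpers.
enum class ColorConversionLibrary { SWSCALE, FILTERGRAPH };
using Buffer = std::vector<uint8_t>;  // stand-in for a frame tensor

// swscale-style path: writes directly into the caller's buffer.
void convertWithSwscale(uint8_t pixelValue, Buffer& preAllocated) {
  std::fill(preAllocated.begin(), preAllocated.end(), pixelValue);
}

// filtergraph-style path: returns a newly allocated buffer.
Buffer convertWithFiltergraph(uint8_t pixelValue, size_t size) {
  return Buffer(size, pixelValue);
}

// Always fills preAllocated, whichever library is selected, so callers
// never have to branch on the color conversion library.
void convertIntoPreAllocated(
    ColorConversionLibrary library,
    uint8_t pixelValue,
    Buffer& preAllocated) {
  if (library == ColorConversionLibrary::SWSCALE) {
    convertWithSwscale(pixelValue, preAllocated);
  } else {
    Buffer converted = convertWithFiltergraph(pixelValue, preAllocated.size());
    assert(converted.size() == preAllocated.size());
    // One extra copy on this path buys a uniform contract for callers.
    std::copy(converted.begin(), converted.end(), preAllocated.begin());
  }
}
```

The trade-off in this sketch is an extra copy on the filtergraph path; whether the real filtergraph output can be written straight into the caller's tensor instead is not something this thread settles.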

I agree with you that the current behavior is potentially surprising.

I am open to re-working this eventually, but I want us to acknowledge that #266 (and #277) are significantly improving the existing code-base. They're not perfect, but they're clear improvements.

scotts · 2024-10-22T18:12:03Z

src/torchcodec/decoders/_core/VideoDecoder.cpp

```diff
+      std::is_sorted(frameIndices.begin(), frameIndices.end());
+
+  std::vector<size_t> argsort;
+  if (!indicesAreSorted) {
```


Digging a bit, I think we're probably better off not checking whether the sequence is already sorted and just always sorting. Modern implementations of std::sort seem to be based on Introsort, which avoids quicksort's quadratic worst case and should handle an already sorted sequence efficiently. I'm also fine if we commit this as is. We can always investigate more later if it becomes important.
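A sketch of the always-sort variant suggested here, with no `std::is_sorted` branch. `argsortAlways` is a hypothetical helper name, not code from this PR. The C++ standard only requires `std::sort` to make O(n log n) comparisons, so how quickly it handles already sorted input depends on the standard library implementation; `std::is_sorted`, by contrast, is a single linear pass, which is cheap next to decoding.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Returns the permutation that visits frameIndices in ascending order,
// always sorting instead of first checking std::is_sorted.
// std::stable_sort keeps duplicate indices in their original relative order.
std::vector<size_t> argsortAlways(const std::vector<int64_t>& frameIndices) {
  std::vector<size_t> argsort(frameIndices.size());
  std::iota(argsort.begin(), argsort.end(), size_t{0});
  std::stable_sort(argsort.begin(), argsort.end(), [&](size_t a, size_t b) {
    return frameIndices[a] < frameIndices[b];
  });
  return argsort;
}
```

The sketch uses `std::stable_sort` rather than `std::sort` so that the output is deterministic when indices repeat; for deduplication either works, since equal indices end up adjacent. For the record, the PR as merged kept the check ("just sort if not already sorted").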

Nit: I like to keep object definitions as close as possible to their initial use. So I'd prefer to see lines 1037-1040 appear after the sorting, right before we use output in the loop. (I just ran into this trying to find the definition of what output is when reading the loop.)

NicolasHug added 5 commits October 22, 2024 10:04

- Let get_frames_at_indices op return a 3-tuple instead of single Tensor (823c8a3)
- Add deduplication logic (61b4937)
- Added sorting logic (f7a70ba)
- minor opt (133c213)
- Comments (f391582)

facebook-github-bot added the CLA Signed label Oct 22, 2024

NicolasHug commented Oct 22, 2024

src/torchcodec/decoders/_core/VideoDecoder.cpp (outdated)

NicolasHug mentioned this pull request Oct 22, 2024

Add sort and dedup logic in C++ with new getFramesDisplayedByTimestamps method / core API #282

Merged

ahmadsharif1 reviewed Oct 22, 2024

Remove parameter, just sort if not already sorted (b8284cc)

scotts reviewed Oct 22, 2024

scotts approved these changes Oct 22, 2024

Put definition closer to usage (d1f5645)

NicolasHug merged commit c8de21c into meta-pytorch:main Oct 23, 2024
19 of 20 checks passed

NicolasHug deleted the sort_and_dedup branch October 23, 2024 09:49

NicolasHug mentioned this pull request Oct 25, 2024

Expose getFramesAtIndices and getFramesDisplayedByTimestamps as public Python methods of VideoDecoder #293

Closed
