@NicolasHug NicolasHug commented Oct 22, 2024

Benchmark shows no regression against main, and a roughly 13x speedup (median 309.52 ms down to 22.64 ms) when indices are shuffled.

On main:
ordered indices: True
med = 22.55ms +- 1.75 

ordered indices: False
med = 309.52ms +- 11.83 

This PR:
ordered indices: True
med = 22.35ms +- 1.59 

ordered indices: False
med = 22.64ms +- 1.88 

import random
import torch
from time import perf_counter_ns

from torchcodec.decoders._core import (
    _add_video_stream,
    create_from_file,
    get_frames_at_indices,
    scan_all_streams_to_update_metadata,
)


def bench(f, num_exp=100, **kwargs):
    """Time `f(decoder, **kwargs)` over `num_exp` runs; return times in ns."""
    video_path = "./test/resources/nasa_13013.mp4"
    times = []
    for _ in range(num_exp):
        # Build a fresh decoder for every run; setup is excluded from timing.
        decoder = create_from_file(video_path)
        _add_video_stream(decoder)
        scan_all_streams_to_update_metadata(decoder)

        start = perf_counter_ns()
        f(decoder, **kwargs)
        end = perf_counter_ns()
        times.append(end - start)
    return torch.tensor(times).float()

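def report_stats(times, unit="ms", suff=""):
    """Print median and std of `times` (given in ns) in `unit`; return the median."""
    # Multiplier that converts nanoseconds to the requested unit.
    mul = {
        "ns": 1,
        "µs": 1e-3,
        "ms": 1e-6,
        "s": 1e-9,
    }[unit]
    times = times * mul
    std = times.std().item()
    med = times.median().item()
    print(f"{med = :.2f}{unit} +- {std:.2f} {suff}")
    return med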



NUM_EXP = 50

def _get_frames_at_indices(decoder, **kwargs):
    get_frames_at_indices(decoder=decoder, stream_index=3, **kwargs)

for ordered in (True, False):
    print(f"ordered indices: {ordered}")
    frame_indices = list(range(100))
    if not ordered:
        random.shuffle(frame_indices)
    # For main:
    # times = bench(_get_frames_at_indices, frame_indices=frame_indices,  num_exp=NUM_EXP)
    # report_stats(times)
    # For PR:
    for sort_indices in (True, False):
        times = bench(_get_frames_at_indices, frame_indices=frame_indices, sort_indices=sort_indices, num_exp=NUM_EXP)
        report_stats(times, suff=f"{sort_indices = }")
    print()

Still TODO for other PRs:

  • add similar functionality for pts
  • expose publicly as method in VideoDecoder
  • let the samplers rely on those C++ APIs.
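
The idea behind the speedup can be sketched in a few lines of Python. This is an illustration under assumptions, not the PR's code: `decode_frame` is a hypothetical stand-in for the real decoder call, and the actual implementation lives in the C++ layer and may differ in detail. The sketch visits the requested indices in ascending order, decodes each distinct index once, and writes every result back to the position the caller asked for.

```python
def get_frames_sorted(frame_indices, decode_frame):
    """Decode frames in ascending index order, return them in request order.

    `decode_frame` is a hypothetical callable (index -> frame); it is not
    part of the torchcodec API.
    """
    # argsort: positions in the request, ordered by the index they ask for.
    argsort = sorted(range(len(frame_indices)), key=frame_indices.__getitem__)

    output = [None] * len(frame_indices)
    previous_pos = None
    for pos in argsort:
        index = frame_indices[pos]
        if previous_pos is not None and frame_indices[previous_pos] == index:
            # Duplicate request: reuse the frame decoded a moment ago.
            output[pos] = output[previous_pos]
        else:
            output[pos] = decode_frame(index)
        previous_pos = pos
    return output


# Record the order in which the fake decoder is called.
calls = []


def fake_decode(index):
    calls.append(index)
    return f"frame-{index}"


frames = get_frames_sorted([7, 2, 7, 0], fake_decode)
```

Here the fake decoder is only ever asked to move forward through the video, while `frames` keeps the order of the original request, duplicates included.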

@facebook-github-bot added the CLA Signed label Oct 22, 2024
Comment on lines +1078 to +1081
if (options.colorConversionLibrary ==
ColorConversionLibrary::FILTERGRAPH) {
output.frames[indexInOutput] = singleOut.frame;
}

Contributor

Why is this needed when we are already passing output.frames[indexInOutput] to getFrameAtIndex?

Contributor Author

It's because the pre-allocated buffer isn't used with filtergraph, only with swscale.

(Just a note that this is not something that was introduced in this PR, you'll see the same pattern in other callers)
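
The caller-side pattern being discussed can be pictured with a rough Python analogy. It is a sketch under assumptions: the two backends are modelled as hypothetical functions, where the swscale-like one writes into the buffer it is given and the filtergraph-like one ignores that buffer and returns a fresh frame. The real code branches on options.colorConversionLibrary; the identity check below is a simplification of that branch.

```python
def decode_swscale_like(index, out):
    # Hypothetical backend that fills the caller's pre-allocated buffer.
    out[:] = [index] * len(out)
    return out


def decode_filtergraph_like(index, out):
    # Hypothetical backend that ignores `out` and allocates its own frame.
    return [index] * len(out)


def fill_batch(indices, decode, frame_size=4):
    batch = [[0] * frame_size for _ in indices]  # pre-allocated output
    for i, index in enumerate(indices):
        frame = decode(index, batch[i])
        if frame is not batch[i]:
            # The backend did not use our buffer, so store its result.
            batch[i] = frame
    return batch


batch_a = fill_batch([3, 1], decode_swscale_like)
batch_b = fill_batch([3, 1], decode_filtergraph_like)
```

Both calls end up with the same batch contents; the extra assignment is what makes up for the backend that skips the pre-allocated buffer.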

Contributor

This seems like the wrong behavior. When a user passes in a pre-allocated tensor it should work with either color conversion library.

I guess this behavior is introduced by the PR to add pre-allocated tensor. That PR should have done it for either color conversion library. Maybe make that change first before merging in this PR? That would be my vote because otherwise the caller has to think about what color conversion library was used.

Contributor Author

I agree with you that the current behavior is potentially surprising.

I am open to re-working this eventually, but I want us to acknowledge that #266 (and #277) are significantly improving the existing code-base. They're not perfect, but they're clear improvements.

std::is_sorted(frameIndices.begin(), frameIndices.end());

std::vector<size_t> argsort;
if (!indicesAreSorted) {

Contributor

Digging a bit, I think we're probably better off not checking to see if the sequence is already sorted, and just always sorting. Modern implementations of std::sort seem to be Introsort, which was designed to be nearly linear with an already sorted sequence. I'm also fine if we commit this as is. We can always investigate more later if it becomes important.
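
For illustration, the two options under discussion can be compared in a small Python sketch. It is an analogy only: the PR's code is C++ and uses std::is_sorted, and the helper names below are hypothetical. Both variants return the same permutation; the first skips the sort when a linear scan shows the input is already ordered.

```python
def argsort_with_check(indices):
    # Linear scan first; skip sorting when the input is already ordered.
    already_sorted = all(a <= b for a, b in zip(indices, indices[1:]))
    if already_sorted:
        return list(range(len(indices)))
    return sorted(range(len(indices)), key=indices.__getitem__)


def argsort_always(indices):
    # Always sort; simpler, at the cost of sorting already-ordered input.
    return sorted(range(len(indices)), key=indices.__getitem__)


ordered = [0, 1, 2, 3]
shuffled = [2, 0, 3, 1]
```

Which variant is faster in C++ depends on the std::sort implementation and the input sizes, so it would need to be measured rather than assumed.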

Contributor

Nit: I like to keep object definitions as close as possible to their initial use. So I'd prefer to see lines 1037-1040 appear after the sorting, right before we use output in the loop. (I just ran into this trying to find the definition of what output is when reading the loop.)

@NicolasHug NicolasHug merged commit c8de21c into meta-pytorch:main Oct 23, 2024
19 of 20 checks passed
@NicolasHug NicolasHug deleted the sort_and_dedup branch October 23, 2024 09:49