Draft
52 commits
3ff6292
Added doc for nvdec
ahmadsharif1 Nov 5, 2024
1fd5a10
.
ahmadsharif1 Nov 5, 2024
fa3e3b9
.
ahmadsharif1 Nov 5, 2024
36a5420
.
ahmadsharif1 Nov 5, 2024
f49baca
.
ahmadsharif1 Nov 5, 2024
f087a91
.
ahmadsharif1 Nov 5, 2024
5092418
.
ahmadsharif1 Nov 5, 2024
243e2ca
.
ahmadsharif1 Nov 5, 2024
7c6c033
.
ahmadsharif1 Nov 5, 2024
e40ec7a
.
ahmadsharif1 Nov 5, 2024
bb4bff9
.
ahmadsharif1 Nov 5, 2024
e8a5b07
.
ahmadsharif1 Nov 5, 2024
c9d54a4
.
ahmadsharif1 Nov 5, 2024
fb633e4
.
ahmadsharif1 Nov 6, 2024
9e334cd
.
ahmadsharif1 Nov 6, 2024
c107e02
.
ahmadsharif1 Nov 6, 2024
885c43f
.
ahmadsharif1 Nov 6, 2024
dd937c6
.
ahmadsharif1 Nov 6, 2024
bab07db
.
ahmadsharif1 Nov 6, 2024
60b06e1
.
ahmadsharif1 Nov 6, 2024
904bfa3
.
ahmadsharif1 Nov 6, 2024
75e76ee
.
ahmadsharif1 Nov 6, 2024
16218ac
.
ahmadsharif1 Nov 6, 2024
e8f0128
.
ahmadsharif1 Nov 6, 2024
9c36f4e
.
ahmadsharif1 Nov 6, 2024
2406435
.
ahmadsharif1 Nov 6, 2024
7b78be3
.
ahmadsharif1 Nov 6, 2024
20c6fba
.
ahmadsharif1 Nov 6, 2024
7630fdd
.
ahmadsharif1 Nov 6, 2024
37bfa5c
.
ahmadsharif1 Nov 6, 2024
24f2843
.
ahmadsharif1 Nov 6, 2024
4cb95a2
.
ahmadsharif1 Nov 6, 2024
4055346
.
ahmadsharif1 Nov 6, 2024
63bbb9e
.
ahmadsharif1 Nov 6, 2024
51e2308
.
ahmadsharif1 Nov 6, 2024
a926934
.
ahmadsharif1 Nov 6, 2024
400001a
.
ahmadsharif1 Nov 6, 2024
ccf95da
.
ahmadsharif1 Nov 7, 2024
209e746
.
ahmadsharif1 Nov 7, 2024
8d66147
.
ahmadsharif1 Nov 7, 2024
0a8ae5f
.
ahmadsharif1 Nov 7, 2024
8864b30
.
ahmadsharif1 Nov 7, 2024
936cbd1
.
ahmadsharif1 Nov 7, 2024
49197b5
.
ahmadsharif1 Nov 7, 2024
8291aa6
.
ahmadsharif1 Nov 7, 2024
4e10d0b
.
ahmadsharif1 Nov 7, 2024
b90bc7f
.
ahmadsharif1 Nov 7, 2024
2ae49ac
.
ahmadsharif1 Nov 7, 2024
f0444d4
.
ahmadsharif1 Nov 7, 2024
8d2070a
.
ahmadsharif1 Nov 7, 2024
89c380e
.
ahmadsharif1 Nov 7, 2024
1507669
.
ahmadsharif1 Nov 7, 2024
15 changes: 8 additions & 7 deletions .github/workflows/docs.yaml
@@ -11,7 +11,7 @@ defaults:

 jobs:
   build:
-    runs-on: ubuntu-latest
+    runs-on: linux.g5.4xlarge.nvidia.gpu
     strategy:
       fail-fast: false
     steps:
@@ -26,14 +26,15 @@ jobs:
           python-version: '3.12'
       - name: Update pip
         run: python -m pip install --upgrade pip
-      - name: Install dependencies and FFmpeg
+      - name: Install torchcodec from nightly
         run: |
-          # TODO: torchvision and torchaudio shouldn't be needed. They were only added
-          # to silence an error as seen in https://github.com/pytorch/torchcodec/issues/203
-          python -m pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu
-          conda install "ffmpeg=7.0.1" pkg-config -c conda-forge
+          pip3 install --pre torch torchvision torchaudio torchcodec --index-url https://download.pytorch.org/whl/nightly/cu124
+      - name: Install FFMPEG and other deps
+        run: |
+          conda install cuda-nvrtc=12.4 libnpp cuda-nvcc=12.4 cuda-cudart=12.4 -c nvidia
+          conda install ffmpeg=7 cmake pkg-config -c conda-forge
           ffmpeg -version
-      - name: Build and install torchcodec
+      - name: Build torchcodec
         run: |
           python -m pip install -e ".[dev]" --no-build-isolation -vvv
       - name: Install doc dependencies
8 changes: 8 additions & 0 deletions docs/source/index.rst
@@ -50,6 +50,14 @@ We achieve these capabilities through:

How to sample video clips

.. grid-item-card:: :octicon:`file-code;1em`
GPU decoding using TorchCodec
:img-top: _static/img/card-background.svg
:link: generated_examples/basic_cuda_example.html
:link-type: url

A simple example demonstrating Nvidia GPU decoding

.. toctree::
:maxdepth: 1
:caption: TorchCodec documentation
174 changes: 174 additions & 0 deletions examples/basic_cuda_example.py
@@ -0,0 +1,174 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the BSD-style license found in the
# LICENSE file in the root directory of this source tree.
"""
Accelerated video decoding with NVDEC
=====================================

.. _nvdec_tutorial:

**Author**: `Ahmad Sharif <ahmads@meta.com>`__

TorchCodec can use Nvidia hardware to speed up video decoding. This is called "CUDA Decoding".
CUDA Decoding can be faster than CPU Decoding for the actual decoding step and for
subsequent transform steps like scaling, cropping or rotating. This is because the decode step leaves
the decoded tensor in GPU memory, so the GPU doesn't have to fetch it from main memory before
running the transform steps. Encoded packets are often much smaller than decoded frames, so
CUDA Decoding also uses less PCI-e bandwidth.

CUDA Decoding can offer a speed-up over CPU Decoding in a few scenarios:

#. You are decoding a high-resolution video
#. You are decoding a large batch of videos that's saturating the CPU
#. You want to do whole-image transforms like scaling or convolutions on the decoded tensors
   after decoding
#. Your CPU is saturated and you want to free it up for other work

In some scenarios CUDA Decoding can be slower than CPU Decoding, or otherwise a poor fit, for example:

#. Your GPU is already busy and your CPU is not
#. You have small-resolution videos and the PCI-e transfer latency is large
#. You want bit-exact results compared to CPU Decoding

It's best to experiment with CUDA Decoding to see if it improves your use case. With
TorchCodec you can simply pass a ``device`` parameter to the ``VideoDecoder`` class to
use CUDA Decoding.

In order to use CUDA Decoding, you will need the following installed in your environment:

#. CUDA-enabled PyTorch
#. FFmpeg binaries that support NVDEC-enabled codecs
#. libnpp and nvrtc (these are usually installed when you install the full cuda-toolkit)


FFmpeg versions 5, 6 and 7 from conda-forge are built with NVDEC support. For example,
to install FFmpeg version 7 together with libnpp and nvrtc, run:

.. code-block:: bash

   conda install ffmpeg=7 -c conda-forge
   conda install libnpp cuda-nvrtc -c nvidia
"""

# %%
#
# .. note::
#
#    This tutorial requires FFmpeg libraries compiled with CUDA support.
#
#
import torch

print(f"{torch.__version__=}")
print(f"{torch.cuda.is_available()=}")
print(f"{torch.cuda.get_device_properties(0)=}")
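The note above says FFmpeg must be built with CUDA support. As a rough, best-effort sketch, one way to check this is to ask the `ffmpeg` command-line tool which hardware acceleration methods it was built with (`ffmpeg -hwaccels`) and look for `cuda`. The helper names `parse_hwaccels` and `ffmpeg_has_cuda_hwaccel` are hypothetical and not part of TorchCodec. The check also assumes that the `ffmpeg` binary on your `PATH` comes from the same installation as the FFmpeg libraries TorchCodec loads, which may not hold in every environment.

```python
import shutil
import subprocess


def parse_hwaccels(output: str) -> list[str]:
    """Return the method names listed in the output of `ffmpeg -hwaccels`."""
    lines = [line.strip() for line in output.splitlines()]
    try:
        # The method names follow this header line, one per line.
        start = lines.index("Hardware acceleration methods:") + 1
    except ValueError:
        return []
    return [line for line in lines[start:] if line]


def ffmpeg_has_cuda_hwaccel() -> bool:
    """Best-effort check: False if ffmpeg is missing, fails, or lacks "cuda"."""
    ffmpeg = shutil.which("ffmpeg")
    if ffmpeg is None:
        return False
    try:
        result = subprocess.run(
            [ffmpeg, "-hide_banner", "-hwaccels"],
            capture_output=True,
            text=True,
            check=False,
            timeout=30,
        )
    except (OSError, subprocess.SubprocessError):
        return False
    return "cuda" in parse_hwaccels(result.stdout)


print(f"{ffmpeg_has_cuda_hwaccel()=}")
```

A `False` result here is only a hint: it can also mean that `ffmpeg` is simply not on the `PATH`.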


# %%
######################################################################
# Downloading the video
######################################################################
#
# We will use the following video, which has these properties:
#
# - Codec: H.264
# - Resolution: 960x540
# - FPS: 29.97
# - Pixel format: YUV420P
#
# .. raw:: html
#
#    <video style="max-width: 100%" controls>
#      <source src="https://download.pytorch.org/torchaudio/tutorial-assets/stream-api/NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4_small.mp4" type="video/mp4">
#    </video>
import urllib.request

video_file = "video.mp4"
urllib.request.urlretrieve(
    "https://download.pytorch.org/torchaudio/tutorial-assets/stream-api/NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4_small.mp4",
    video_file,
)
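To put the PCI-e bandwidth remark from the introduction in concrete terms, here is a back-of-the-envelope calculation of the decoded size of this video, using the resolution and frame rate listed above. It assumes frames are decoded to 3-channel (RGB) `uint8` tensors, which is an assumption of this sketch and not something stated in the property list. The encoded bitrate is not given, so only the decoded side is computed.

```python
# Properties listed above for the tutorial video.
width, height = 960, 540
fps = 29.97

# Assumption: 3 channels (RGB), 1 byte per value (uint8).
channels, bytes_per_value = 3, 1

# Size of a single decoded frame and of one second of decoded frames.
frame_bytes = channels * height * width * bytes_per_value
decoded_bytes_per_second = frame_bytes * fps

print(f"{frame_bytes=}")  # frame_bytes=1555200
print(f"decoded stream: {decoded_bytes_per_second / 1e6:.1f} MB/s")
```

With CPU Decoding followed by GPU transforms, this decoded stream is what would have to cross the PCI-e bus. With CUDA Decoding, only the much smaller encoded packets do.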


# %%
######################################################################
# Decoding with CUDA
######################################################################
#
# To use the CUDA decoder, pass a CUDA device to the decoder.
#
from torchcodec.decoders import VideoDecoder

vd = VideoDecoder(video_file, device="cuda:0")
frame = vd[0]

# %%
#
# Indexing with an integer returns a single frame, so the tensor is in CHW
# format (the channel-first layout of NCHW without the batch dimension).

print(frame.data.shape, frame.data.dtype)

# %%
#
# The video frames are left in GPU memory.

print(frame.data.device)


# %%
######################################################################
# Visualizing Frames
######################################################################
#
# Let's look at the frames decoded by the CUDA decoder and compare them
# against the equivalent results from the CPU decoder.
import matplotlib.pyplot as plt


def get_frames(timestamps: list[float], device: str):
    decoder = VideoDecoder(video_file, device=device)
    return [decoder.get_frame_played_at(ts) for ts in timestamps]


def get_numpy_images(frames):
    numpy_images = []
    for frame in frames:
        # We transfer to the CPU so they can be visualized by matplotlib.
        numpy_image = frame.data.to("cpu").permute(1, 2, 0).numpy()
        numpy_images.append(numpy_image)
    return numpy_images


timestamps = [12, 19, 45, 131, 180]
cpu_frames = get_frames(timestamps, device="cpu")
cuda_frames = get_frames(timestamps, device="cuda:0")
cpu_numpy_images = get_numpy_images(cpu_frames)
cuda_numpy_images = get_numpy_images(cuda_frames)
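Since the introduction recommends experimenting to see whether CUDA Decoding helps, a small timing helper can make that comparison less ad hoc. The sketch below is not part of TorchCodec; `time_call` is a hypothetical helper. CUDA work is queued asynchronously, so the helper accepts an optional `sync` callable (for example `torch.cuda.synchronize`) that is invoked before the clock starts and before it stops.

```python
import time
from typing import Callable, Optional


def time_call(
    fn: Callable[[], object],
    repeats: int = 3,
    sync: Optional[Callable[[], None]] = None,
) -> float:
    """Return the best wall-clock time in seconds over `repeats` calls of `fn`.

    If `sync` is given, it is called before starting and before stopping the
    clock, so that asynchronously queued GPU work is included in the timing.
    """
    best = float("inf")
    for _ in range(repeats):
        if sync is not None:
            sync()
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()
        best = min(best, time.perf_counter() - start)
    return best


# Hypothetical usage with the names defined in this tutorial:
#   cpu_s = time_call(lambda: get_frames(timestamps, device="cpu"))
#   cuda_s = time_call(
#       lambda: get_frames(timestamps, device="cuda:0"),
#       sync=torch.cuda.synchronize,
#   )
```

Taking the best of several runs reduces noise from one-time costs such as CUDA context creation, but a handful of frames is still a very small sample. Real workloads should be timed at their actual batch sizes.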


def plot_cpu_and_cuda_images():
    n_rows = len(timestamps)
    fig, axes = plt.subplots(n_rows, 2, figsize=[12.8, 16.0])
    for i in range(n_rows):
        axes[i][0].imshow(cpu_numpy_images[i])
        axes[i][1].imshow(cuda_numpy_images[i])

    axes[0][0].set_title("CPU decoder")
    axes[0][1].set_title("CUDA decoder")
    plt.setp(axes, xticks=[], yticks=[])
    plt.tight_layout()


plot_cpu_and_cuda_images()

# %%
#
# They look similar to the human eye, but there may be subtle
# differences because CUDA math is not bit-exact with respect to CPU math.
#
first_cpu_frame = cpu_frames[0].data.to("cpu")
first_cuda_frame = cuda_frames[0].data.to("cpu")
frames_equal = torch.equal(first_cpu_frame, first_cuda_frame)
print(f"{frames_equal=}")
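`torch.equal` only says whether the frames match exactly. To see how far apart they are, one option is the largest per-pixel absolute difference. The sketch below uses small synthetic `uint8` tensors as stand-ins for decoded frames so that it runs without a GPU; `max_abs_diff` is a hypothetical helper, not a TorchCodec function.

```python
import torch


def max_abs_diff(a: torch.Tensor, b: torch.Tensor) -> int:
    """Largest per-element absolute difference between two uint8 tensors."""
    # Widen to int16 first: subtracting uint8 tensors directly wraps around
    # (3 - 5 would give 254 instead of -2).
    return int((a.to(torch.int16) - b.to(torch.int16)).abs().max())


# Synthetic stand-ins for a CPU-decoded and a CUDA-decoded frame (CHW, uint8).
cpu_like = torch.tensor([[[10, 200], [3, 255]]], dtype=torch.uint8)
cuda_like = torch.tensor([[[11, 198], [5, 255]]], dtype=torch.uint8)

print(max_abs_diff(cpu_like, cuda_like))  # 2

# Hypothetical usage with the tensors from this tutorial:
#   print(max_abs_diff(first_cpu_frame, first_cuda_frame))
```

A small maximum difference would be consistent with rounding differences between the CPU and CUDA code paths. A large one would suggest the two decoders returned different frames.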