
Some H265 encoded videos return an error when seeking to particular points in time #179

Closed
@ahmadsharif1


🐛 Describe the bug

# First install x265, which is needed to generate the test video:
conda install -c conda-forge x265

# Download and build ffmpeg
git clone https://git.ffmpeg.org/ffmpeg.git
cd ffmpeg
./configure --enable-nonfree --enable-gpl --prefix=$(readlink -f ../bin) --enable-libx265 --enable-rpath --extra-ldflags=-Wl,-rpath=$CONDA_PREFIX/lib --enable-filter=drawtext --enable-libfontconfig --enable-libfreetype --enable-libharfbuzz
make -j"$(nproc)" && make install

# Generate a 1-second, 10 fps video whose frames are labeled "Frame 0" through "Frame 9"
# (use the ffmpeg binary built above)
ffmpeg -f lavfi -i color=size=128x128:duration=1:rate=10:color=blue -vf "drawtext=fontsize=30:fontcolor=white:x=(w-text_w)/2:y=(h-text_h)/2:text='Frame %{frame_num}'" -vcodec libx265 -pix_fmt yuv420p -g 2 -crf 10 h265_video.mp4 -y

# Now use torchcodec to seek into this file at timestamp 0.5 and write to a bmp file:
$ cat test.py

from torchcodec.decoders._simple_video_decoder import SimpleVideoDecoder
import sys
from PIL import Image

# Assume `tensor` is a PyTorch tensor with shape (3, H, W)
# The values in `tensor` should be integers in the range [0, 255]
def save_tensor_as_bmp(tensor, filename):
    # Convert the tensor to a uint8 numpy array
    numpy_array = tensor.byte().cpu().numpy()

    # Reorder dimensions from (3, H, W) to (H, W, 3)
    numpy_array = numpy_array.transpose(1, 2, 0)

    # Create a PIL image from the numpy array
    image = Image.fromarray(numpy_array)

    # Save the image as a BMP file
    image.save(filename, format='BMP')



def main():
    video_path = sys.argv[1]
    ts = float(sys.argv[2])
    print(video_path)
    decoder = SimpleVideoDecoder(video_path)
    print(f"Getting frame at {ts=}")
    frame = decoder.get_frame_displayed_at(seconds=ts).data
    bmp_file = f"{video_path}.time{ts}.bmp"
    print(f"Saving to bmp file: {bmp_file}")
    save_tensor_as_bmp(frame, bmp_file)


if __name__ == "__main__":
    main()

# Run the test script like so:

python test.py h265_video.mp4 0.5

This currently fails: it throws the exception "no more frames to decode".

With #178 it will get "fixed" only in the sense that we won't throw an exception; we will return the wrong frame instead. If you run the script, you will get a bmp file showing "Frame 6" instead of "Frame 5". That is a bug because "Frame 5" is the frame displayed from timestamp=0.5 (inclusive) to timestamp=0.6 (exclusive).
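To make the expected result concrete, here is a minimal sketch of the timestamp-to-frame mapping for a constant-frame-rate video like the test file (10 fps, 10 frames, labels starting at "Frame 0"). The helper name `displayed_frame_index` is hypothetical and is not part of torchcodec.

```python
from fractions import Fraction


def displayed_frame_index(seconds, fps, num_frames):
    """Index of the frame on screen at `seconds`, assuming a constant frame rate.

    Frame i is displayed on the half-open interval [i / fps, (i + 1) / fps).
    """
    # Go through str() so that 0.6 is treated as exactly 6/10, not its binary approximation.
    ts = Fraction(str(seconds))
    if ts < 0:
        raise ValueError("timestamp must be non-negative")
    index = int(ts * fps)  # floor, because ts is non-negative
    if index >= num_frames:
        raise ValueError("timestamp is past the end of the video")
    return index


# The test video has 10 frames at 10 fps.
for ts in (0.5, 0.59, 0.6):
    print(ts, "->", f"Frame {displayed_frame_index(ts, fps=10, num_frames=10)}")
```

Under these assumptions, any timestamp in [0.5, 0.6) maps to "Frame 5", and 0.6 is the first timestamp that maps to "Frame 6".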

The underlying cause of this behavior is an FFmpeg bug with H265 videos. When we call avformat_seek_file() with max_ts set to the int64 time-base value corresponding to time=0.5, it seeks past our frame to the next frame.
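The sketch below models, in plain Python, what we expect from a seek bounded by max_ts: it should land on the latest keyframe whose pts does not exceed max_ts. It does not call FFmpeg. The time base of 1/10240 and the keyframe positions (every second frame, as suggested by `-g 2`) are assumptions for illustration only; the real file may differ.

```python
from fractions import Fraction


def seconds_to_pts(seconds, time_base):
    """Convert seconds to an integer timestamp in units of `time_base`."""
    return int(Fraction(str(seconds)) / time_base)


def expected_seek_target(keyframe_pts, max_ts):
    """Latest keyframe whose pts is <= max_ts (what a bounded seek should pick)."""
    candidates = [pts for pts in keyframe_pts if pts <= max_ts]
    if not candidates:
        raise ValueError("no keyframe at or before max_ts")
    return max(candidates)


# Assumed values, not read from the real file:
TIME_BASE = Fraction(1, 10240)   # hypothetical stream time base
FRAME_DURATION = 1024            # 0.1 s per frame in that time base
keyframe_pts = [i * FRAME_DURATION for i in range(0, 10, 2)]  # frames 0, 2, 4, 6, 8

max_ts = seconds_to_pts(0.5, TIME_BASE)
target = expected_seek_target(keyframe_pts, max_ts)
print(f"max_ts={max_ts}, expected keyframe pts={target} (frame {target // FRAME_DURATION})")
```

In this model the seek should stop at frame 4, from which frame 5 can be reached by decoding forward. Landing on frame 6 instead is consistent with getting "Frame 6" back.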

I have filed a bug upstream about this:

https://trac.ffmpeg.org/ticket/11137

Until that bug is resolved, we can use our own index to seek into the file instead of letting FFmpeg seek for us. I will do that in a subsequent PR.
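As a rough illustration of the index-based approach, the sketch below looks up the target frame in a pts-sorted index, walks back to the nearest keyframe, and reports how many frames must be decoded from there. `FrameEntry` and `plan_seek` are hypothetical names, the index contents are made up to match the test video, and the sketch ignores the difference between decode order and display order. It is not the implementation planned for the follow-up PR.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class FrameEntry:
    pts: int       # presentation timestamp in stream time-base units
    is_key: bool   # True if decoding can start at this frame


def plan_seek(index, target_pts):
    """Return (keyframe_pts, frames_to_decode) for the frame displayed at target_pts.

    `index` must be sorted by pts. The last of the decoded frames is the one wanted.
    """
    pts_list = [entry.pts for entry in index]
    pos = bisect_right(pts_list, target_pts) - 1  # last frame with pts <= target_pts
    if pos < 0:
        raise ValueError("target_pts is before the first frame")
    key_pos = pos
    while key_pos > 0 and not index[key_pos].is_key:
        key_pos -= 1
    if not index[key_pos].is_key:
        raise ValueError("no keyframe at or before the target frame")
    return index[key_pos].pts, pos - key_pos + 1


# Made-up index: 10 frames, 1024 ticks apart, keyframe on every second frame.
index = [FrameEntry(pts=i * 1024, is_key=(i % 2 == 0)) for i in range(10)]
print(plan_seek(index, 5120))
```

For target pts 5120 (time=0.5 under the assumed time base), this picks the keyframe at pts 4096 and decodes two frames, the second of which is "Frame 5".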

Versions

This bug is for torchcodec v0.0.2
