Some H265 encoded videos return an error when seeking to particular points in time

### 🐛 Describe the bug

```
# First install x265, which is needed to build ffmpeg with libx265 support:
conda install -c conda-forge x265

# Download and build ffmpeg
git clone https://git.ffmpeg.org/ffmpeg.git
cd ffmpeg
./configure --enable-nonfree --enable-gpl --prefix=$(readlink -f ../bin) --enable-libx265  --enable-rpath --extra-ldflags=-Wl,-rpath=$CONDA_PREFIX/lib --enable-filter=drawtext --enable-libfontconfig --enable-libfreetype --enable-libharfbuzz
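# (The build/install step is not shown here; this assumes the resulting ffmpeg binary is on your PATH.)

# Generate a 1-second, 10 fps test video with a keyframe every 2 frames (-g 2).
# Each frame is labeled with its frame number:
ffmpeg -f lavfi -i color=size=128x128:duration=1:rate=10:color=blue -vf "drawtext=fontsize=30:fontcolor=white:x=(w-text_w)/2:y=(h-text_h)/2:text='Frame %{frame_num}'" -vcodec libx265 -pix_fmt yuv420p -g 2 -crf 10 h265_video.mp4 -y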

# Now use torchcodec to seek into this file at timestamp 0.5 and write to a bmp file:
$ cat test.py

from torchcodec.decoders._simple_video_decoder import SimpleVideoDecoder
import sys
from PIL import Image

# `tensor` is the decoded RGB frame: a uint8 tensor with shape (3, H, W)
# and values in the range [0, 255]
def save_tensor_as_bmp(tensor, filename):
    # Move the tensor to the CPU and convert it to a numpy array
    numpy_array = tensor.byte().cpu().numpy()

    # Reorder dimensions from (3, H, W) to (H, W, 3)
    numpy_array = numpy_array.transpose(1, 2, 0)

    # Create a PIL image from the numpy array
    image = Image.fromarray(numpy_array)

    # Save the image as a BMP file
    image.save(filename, format='BMP')



def main():
    video_path = sys.argv[1]
    ts = float(sys.argv[2])
    print(video_path)
    decoder = SimpleVideoDecoder(video_path)
    print(f"Getting frame at {ts=}")
    frame = decoder.get_frame_displayed_at(seconds=ts).data
    bmp_file = f"{video_path}.time{ts}.bmp"
    print(f"Saving to bmp file: {bmp_file}")
    save_tensor_as_bmp(frame, bmp_file)


if __name__ == "__main__":
    main()

# Run the test script like so:

python test.py h265_video.mp4 0.5
```


This currently fails: it throws an exception with the message "no more frames to decode".

With https://github.com/pytorch/torchcodec/pull/178 it will get "fixed" in the sense that we won't throw an exception anymore, but we will return the wrong frame: if you run the script you will get a bmp file showing "Frame 6" instead of "Frame 5". That is a bug, because the video is 10 fps, so the frame labeled "Frame 5" is the one displayed from timestamp=0.5 (inclusive) to timestamp=0.6 (exclusive).
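To make the expected result concrete, here is a minimal sketch of the display-interval rule for a constant-frame-rate video like the test file. The helper name `frame_index_displayed_at` is made up for illustration and is not part of torchcodec; it assumes frames are numbered from 0 and that frame `i` is shown over `[i / fps, (i + 1) / fps)`.

```python
from fractions import Fraction
import math


def frame_index_displayed_at(seconds, fps):
    """Index of the frame shown at `seconds` in a constant-frame-rate video.

    Hypothetical helper: assumes 0-based frame numbers and that frame i is
    displayed over the half-open interval [i / fps, (i + 1) / fps).
    """
    # Go through the decimal string so that 0.6 is treated as exactly 3/5
    # rather than its binary floating-point approximation.
    ts = Fraction(str(seconds))
    return math.floor(ts * fps)


for ts in (0.5, 0.59, 0.6):
    print(ts, frame_index_displayed_at(ts, fps=10))
```

For the 10 fps test video this gives frame 5 for both 0.5 and 0.59, and frame 6 only from 0.6 onward.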

The underlying cause of this behavior is an FFmpeg bug with H265 videos. When we call `avformat_seek_file()` with `max_ts` set to the int64 timestamp (in stream time base units) corresponding to time=0.5, it seeks past our frame and lands on the next one, even though `max_ts` is supposed to be an upper bound on where the seek lands.
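As a rough illustration of the bound being violated, the sketch below converts seconds into integer time base units and checks whether a landing position honors `max_ts`. The time base `1/10240` is an assumption picked for the example, not a value read from the file, and both helper functions are hypothetical rather than torchcodec or FFmpeg API.

```python
from fractions import Fraction
import math

# ASSUMPTION: illustrative time base only; read the real one from the stream.
TIME_BASE = Fraction(1, 10240)


def seconds_to_pts(seconds, time_base=TIME_BASE):
    """Convert seconds to an integer timestamp in time base units."""
    return math.floor(Fraction(str(seconds)) / time_base)


def seek_respects_max_ts(landed_pts, max_ts):
    """True if the position a seek landed on honors the max_ts upper bound."""
    return landed_pts <= max_ts


max_ts = seconds_to_pts(0.5)          # upper bound passed to the seek
next_frame_pts = seconds_to_pts(0.6)  # start of the following frame
print(max_ts, next_frame_pts)
print(seek_respects_max_ts(next_frame_pts, max_ts))
```

Under the assumed time base, landing on the frame that starts at 0.6 fails the check, which is the situation described above.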

I have filed a bug upstream about this:

https://trac.ffmpeg.org/ticket/11137

Until that bug is resolved, we can work around it by using our own index to decide where to seek in the file, instead of letting FFmpeg pick the seek target for us. I will do that in a subsequent PR.
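One possible shape for that lookup, as a sketch only: given a sorted list of keyframe timestamps collected while scanning the file, pick the last keyframe at or before the target and decode forward from there. The function name and the index contents are hypothetical (the index simply mirrors `-g 2` with timestamps expressed as frame numbers), and this is not necessarily how the follow-up PR will implement it.

```python
import bisect


def keyframe_to_seek_to(keyframe_pts, target_pts):
    """Return the last keyframe timestamp that is <= target_pts.

    `keyframe_pts` must be sorted in ascending order.
    """
    i = bisect.bisect_right(keyframe_pts, target_pts)
    if i == 0:
        raise ValueError("target precedes the first keyframe")
    return keyframe_pts[i - 1]


# Hypothetical index: a keyframe every 2 frames, timestamps in frame units.
keyframes = [0, 2, 4, 6, 8]
print(keyframe_to_seek_to(keyframes, 5))
```

With this index, a request for frame 5 starts decoding at keyframe 4 instead of jumping ahead to keyframe 6.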



### Versions

This bug is for torchcodec v0.0.2
