Allow ffmpeg-python backend for torchvision.io.write_video? #8569

Closed
adaGrad1 opened this issue Aug 8, 2024 · 1 comment · Fixed by #8576

Comments

adaGrad1 commented Aug 8, 2024

🚀 The feature

Add an alternative backend for torchvision.io.write_video that uses ffmpeg-python but otherwise keeps exactly the same interface and functionality.

Motivation, pitch

torchvision.io.write_video currently calls PyAV, which in turn is a wrapper for ffmpeg. PyAV has a seemingly unresolved issue where setting the CRF (constant rate factor) through the options has no effect. That issue was referenced as recently as March 2024. As far as I can tell, adjusting CRF is the canonical way to tune a video's level of compression. Adding ffmpeg-python as a backend would let users set CRF and so choose the compression level they need.

Alternatives

If there is some other set of options that can be passed to write_video to alter the level of compression, that would be an acceptable alternative (at least for my use case). In that case, it would be ideal to include those options as an example in the write_video documentation.

Additional context

I already kind of got it working in a notebook, but it's missing support for audio and such.

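import ffmpeg  # the ffmpeg-python package

# video_array is assumed to be a uint8 NumPy array of shape (T, H, W, 3) in RGB order

# Define output video parameters
output_filename = 'output_video.mp4'
fps = 30
codec = 'libx264'

# Start an ffmpeg process that reads raw RGB frames from stdin.
# The frame rate is set on the input so the raw frames are read at `fps`
# instead of the rawvideo default of 25 fps.
process1 = (
    ffmpeg
    .input('pipe:', format='rawvideo', pix_fmt='rgb24', r=fps, s='{}x{}'.format(video_array.shape[2], video_array.shape[1]))
    .output(output_filename, pix_fmt='yuv420p', vcodec=codec, crf=10)
    .overwrite_output()
    .run_async(pipe_stdin=True)
)

# Write the NumPy array to the input pipe, one frame at a time
for frame in video_array:
    process1.stdin.write(frame.tobytes())

# Close the input pipe so ffmpeg sees end-of-stream
process1.stdin.close()

# Wait for the ffmpeg process to finish
process1.wait()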

crf=10 produces something good-looking, while crf=50 produces something very compressed-looking as expected.
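For anyone who wants the same effect without the ffmpeg-python dependency, here is a rough sketch that drives the ffmpeg executable directly through subprocess. The function names build_ffmpeg_cmd and write_video_ffmpeg are hypothetical (they are not part of torchvision or ffmpeg-python), the sketch assumes ffmpeg is on the PATH and that video_array is a uint8 NumPy array of shape (T, H, W, 3), and like the notebook version it ignores audio.

```python
import subprocess
from typing import List


def build_ffmpeg_cmd(width: int, height: int, fps: float, out_path: str,
                     codec: str = "libx264", crf: int = 23) -> List[str]:
    """Build the ffmpeg argv for encoding raw RGB frames read from stdin.

    Hypothetical helper for illustration; not a torchvision API.
    """
    return [
        "ffmpeg",
        "-y",                       # overwrite the output file if it exists
        "-f", "rawvideo",           # input is headerless raw frames
        "-pix_fmt", "rgb24",        # 3 bytes per pixel, RGB order
        "-s", f"{width}x{height}",  # frame size, required for raw input
        "-r", str(fps),             # input frame rate
        "-i", "pipe:",              # read frames from stdin
        "-pix_fmt", "yuv420p",      # widely supported output pixel format
        "-vcodec", codec,
        "-crf", str(crf),           # lower CRF means higher quality
        out_path,
    ]


def write_video_ffmpeg(out_path, video_array, fps, codec="libx264", crf=23):
    """Encode a (T, H, W, 3) uint8 array to a video file via the ffmpeg CLI."""
    if video_array.ndim != 4 or video_array.shape[-1] != 3:
        raise ValueError("expected an array of shape (T, H, W, 3)")
    if video_array.dtype != "uint8":
        raise ValueError("expected uint8 pixel values")
    _, height, width, _ = video_array.shape
    if width % 2 or height % 2:
        # yuv420p subsamples chroma 2x2, so libx264 rejects odd dimensions
        raise ValueError("width and height must be even for yuv420p")

    cmd = build_ffmpeg_cmd(width, height, fps, out_path, codec=codec, crf=crf)
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE)
    try:
        for frame in video_array:
            proc.stdin.write(frame.tobytes())
    finally:
        proc.stdin.close()  # signal end-of-stream to ffmpeg
    if proc.wait() != 0:
        raise RuntimeError("ffmpeg exited with a non-zero status")
```

Calling write_video_ffmpeg('output_video.mp4', video_array, fps=30, crf=10) should behave like the notebook snippet. Keeping the command construction in its own function makes the CRF handling easy to inspect without actually running ffmpeg.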

@NicolasHug
Member

Hi @adaGrad1, and thank you for the feature request. We'll be making a wider announcement soon, but we plan to migrate video decoding/encoding efforts away from torchvision/torchaudio and consolidate all of that within https://github.com/pytorch/torchcodec/. At this time video encoding isn't implemented in torchcodec, but it can be in scope.
It does mean, however, that we won't be able to add further video encoding capabilities to torchvision, so I'm afraid we won't be adding the ffmpeg-python backend in vision.
We'll definitely keep that CRF issue in mind while working on the torchcodec encoder, though.
