Create Python API for VideoEncoder #990
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1 +1,2 @@ | ||
| from ._audio_encoder import AudioEncoder # noqa | ||
| from ._video_encoder import VideoEncoder # noqa |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,97 @@ | ||
| from pathlib import Path | ||
| from typing import Union | ||
|
|
||
| import torch | ||
| from torch import Tensor | ||
|
|
||
| from torchcodec import _core | ||
|
|
||
|
|
||
| class VideoEncoder: | ||
| """A video encoder. | ||
| Args: | ||
| frames (``torch.Tensor``): The frames to encode. This must be a 4D | ||
| tensor of shape ``(N, C, H, W)`` where N is the number of frames, | ||
| C is 3 channels (RGB), H is height, and W is width. | ||
| A 3D tensor of shape ``(C, H, W)`` is also accepted as a single RGB frame. | ||
| Values must be uint8 in the range ``[0, 255]``. | ||
| frame_rate (int): The frame rate to use when encoding the | ||
| **input** ``frames``. | ||
|
Comment on lines +19 to +20
My interpretation of this description is that this parameter actually defines the frame rate of the encoded output, because of the "frame rate to use when encoding" part. I think it might be less ambiguous as
|
||
| """ | ||
|
|
||
| def __init__(self, frames: Tensor, *, frame_rate: int): | ||
| torch._C._log_api_usage_once("torchcodec.encoders.VideoEncoder") | ||
| if not isinstance(frames, Tensor): | ||
| raise ValueError(f"Expected frames to be a Tensor, got {type(frames) = }.") | ||
| if frames.ndim == 3: | ||
| # make it 4D and assume single RGB frame, CHW -> NCHW | ||
| frames = torch.unsqueeze(frames, 0) | ||
| if frames.ndim != 4: | ||
| raise ValueError(f"Expected 3D or 4D frames, got {frames.shape = }.") | ||
| if frames.dtype != torch.uint8: | ||
| raise ValueError(f"Expected uint8 frames, got {frames.dtype = }.") | ||
| if frame_rate <= 0: | ||
| raise ValueError(f"{frame_rate = } must be > 0.") | ||
|
|
||
| self._frames = frames | ||
| self._frame_rate = frame_rate | ||
|
|
||
| def to_file( | ||
| self, | ||
| dest: Union[str, Path], | ||
| ) -> None: | ||
| """Encode frames into a file. | ||
| Args: | ||
| dest (str or ``pathlib.Path``): The path to the output file, e.g. | ||
| ``video.mp4``. The extension of the file determines the video | ||
| format and container. | ||
|
What's the distinction between "format" and "container" here? I would just use one or the other?
The terms are used interchangeably, so I included both here to be understandable to users of both terms. Let me know if that is actually more confusing than just naming one term. I might have added the distinction after discovering the format
|
||
| """ | ||
| _core.encode_video_to_file( | ||
| frames=self._frames, | ||
| frame_rate=self._frame_rate, | ||
| filename=str(dest), | ||
| ) | ||
|
|
||
| def to_tensor( | ||
| self, | ||
| format: str, | ||
| ) -> Tensor: | ||
| """Encode frames into raw bytes, as a 1D uint8 Tensor. | ||
| Args: | ||
| format (str): The format of the encoded frames, e.g. "mp4", "mov", | ||
| "mkv", "avi", "webm", "flv", or "gif". | ||
| Returns: | ||
| Tensor: The raw encoded bytes as 1D uint8 Tensor. | ||
| """ | ||
| return _core.encode_video_to_tensor( | ||
| frames=self._frames, | ||
| frame_rate=self._frame_rate, | ||
| format=format, | ||
| ) | ||
|
|
||
| def to_file_like( | ||
| self, | ||
| file_like, | ||
| format: str, | ||
| ) -> None: | ||
| """Encode frames into a file-like object. | ||
| Args: | ||
| file_like: A file-like object that supports ``write()`` and | ||
| ``seek()`` methods, such as io.BytesIO(), an open file in binary | ||
| write mode, etc. Methods must have the following signature: | ||
| ``write(data: bytes) -> int`` and ``seek(offset: int, whence: | ||
| int = 0) -> int``. | ||
| format (str): The format of the encoded frames, e.g. "mp4", "mov", | ||
| "mkv", "avi", "webm", "flv", or "gif". | ||
| """ | ||
| _core.encode_video_to_file_like( | ||
| frames=self._frames, | ||
| frame_rate=self._frame_rate, | ||
| format=format, | ||
| file_like=file_like, | ||
| ) | ||
Q - Do we need to support that? I'm wondering if it makes a lot of sense to just encode a single image as a video. I suspect this was made to mimic the AudioEncoder behavior but that was a different use-case. In the AudioEncoder we want to allow for 1D audio to be supported as it's still a valid waveform. But I don't think we need to treat a single frame as a valid video.
I'm not sure what the use case is for encoding an image as a video, but since FFmpeg allows encoding an image to video, I believe we can retain this functionality for a relatively low cost.
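For reference, the `write()`/`seek()` contract stated in the `to_file_like` docstring can be illustrated with a small standard-library sketch. `RecordingSink` and `fake_encode` are hypothetical names that are not part of this PR. `fake_encode` only stands in for an encoder that writes a placeholder and later seeks back to patch it, which is presumably why `seek()` is required. It does not call torchcodec.

```python
import io


class RecordingSink:
    """Hypothetical file-like object with the two methods the docstring asks for."""

    def __init__(self) -> None:
        self._buffer = io.BytesIO()
        self.write_calls = 0
        self.seek_calls = 0

    def write(self, data: bytes) -> int:
        # Must return the number of bytes written.
        self.write_calls += 1
        return self._buffer.write(data)

    def seek(self, offset: int, whence: int = 0) -> int:
        # Must return the new absolute position in the stream.
        self.seek_calls += 1
        return self._buffer.seek(offset, whence)

    def getvalue(self) -> bytes:
        return self._buffer.getvalue()


def fake_encode(sink) -> None:
    """Stand-in for an encoder (assumption): write, then seek back and patch."""
    sink.write(b"\x00\x00\x00\x00")  # placeholder for a 4-byte length field
    payload = b"frame-data"
    sink.write(payload)
    sink.seek(0)  # go back to the placeholder
    sink.write(len(payload).to_bytes(4, "big"))
    sink.seek(0, io.SEEK_END)  # return to the end of the stream


sink = RecordingSink()
fake_encode(sink)
print(sink.write_calls, sink.seek_calls, len(sink.getvalue()))

# With the API in this diff, the same object would be passed as (not run here):
#   VideoEncoder(frames, frame_rate=30).to_file_like(sink, format="mp4")
```

An `io.BytesIO()` or a file opened in binary write mode already satisfies this contract. A wrapper like the one above is only useful when the calls need to be observed or redirected.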