Add an operator for receiving video metadata #5630

treasan · 2024-09-10T15:39:43Z

Is this a new feature, an improvement, or a change to existing functionality?

New Feature

How would you describe the priority of this feature request

Should have (e.g. Adoption is possible, but the performance shortcomings make the solution inferior).

Please provide a clear description of problem this feature solves

The sample rate (fps) of videos may vary, and hence the time period that a fixed number of frames represents also varies. Having access to the fps, the duration, or even the concrete timestep of each frame is crucial in many tasks where the actual duration in seconds matters more than the number of frames. For example, I am decoding raw video bytes from a web dataset using the experimental video decoder, and I am forced to fall back on other libraries that can extract this kind of information from the raw video bytes (specifically, PyTorch's VideoReader API).

Feature Description

As a user I want to be able to extract information about the sample rate of a video alongside its decoded frames.

Describe your ideal solution

A new DALI operator that extracts the desired metadata from raw video bytes.
An example video decoding pipeline reading from a webdataset (raw video bytes could also come from an external source):

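from nvidia.dali import fn, pipeline_def

@pipeline_def
def pipeline(tar_paths):
    # Read the raw (still encoded) video bytes from the webdataset archives.
    raw_video = fn.readers.webdataset(tar_paths, ...)
    # Proposed operator (does not exist yet): peek metadata from the raw bytes.
    duration, fps = fn.get_video_metadata(...)
    # Decode the frames with the experimental video decoder.
    video = fn.experimental.decoders.video(raw_video)
    return video, duration, fps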

Describe any alternatives you have considered

No response

Additional context

No response

Check for duplicates

I have searched the open bugs/issues and have found no duplicates for this bug report

JanuszL · 2024-09-10T16:37:14Z

Hi @treasan,

Thank you for reaching out. Yes, that sounds like a good feature to add. Let us add this to our ToDo list.
Could you also tell me how you want to use this data further? To drive transformations or to feed the model?

treasan · 2024-09-10T18:08:31Z

Hey @JanuszL

I am training a model, which expects video snippets with a certain duration (in seconds). Furthermore it expects a timestep for each frame, which is used for a temporal positional encoding.
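
As a rough illustration of the per-frame timesteps mentioned above, here is a minimal sketch that derives them from the fps. It assumes a constant frame rate and a first frame at t = 0; the helper name `frame_timesteps` is hypothetical and not part of DALI. For variable-frame-rate videos the real timestamps would have to come from the container instead.

```python
def frame_timesteps(num_frames: int, fps: float) -> list[float]:
    """Return the timestamp in seconds of each frame.

    Assumption: constant frame rate, first frame at t = 0.
    """
    if fps <= 0:
        raise ValueError("fps must be positive")
    # Frame i is shown i / fps seconds after the start of the video.
    return [i / fps for i in range(num_frames)]
```

For example, `frame_timesteps(4, 2.0)` gives one timestep every half second, starting at zero.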

JanuszL · 2024-09-10T18:24:05Z

Thank you for the clarification. In this case, I think it would be best to return this data directly from the video decoder (at least the timesteps for each frame), and/or to extend the decoder so that it decodes a given duration rather than a given number of frames.

awolant · 2024-09-10T19:08:27Z

Hello @treasan

Thanks for creating the issue. To better understand the requirement: does your use case expect all samples to have the same number of frames, or does the number of frames vary per sample? If it varies, is that due to variable frame rates in the videos, variable duration in seconds, or both? And if it varies, what type and shape do you expect the output to have in your target framework?

treasan · 2024-09-10T19:28:05Z

Please have a look at another issue/question I submitted, #5626, where I explain my pipeline in more detail.

tl;dr:

1. DALI pipeline: Load raw video bytes from the webdataset.
2. Python function: Peek the duration and fps metadata from the raw video bytes and filter out unwanted videos beforehand (e.g. ones that are too short).
3. DALI pipeline: Get raw video bytes, duration, and fps from an external source --> decode video --> return decoded video, duration, fps.
4. Python function: Cut multiple consecutive snippets of a certain duration (e.g. 3 secs) out of each video based on the fps/duration metadata. These snippets constitute one training sample. They get batched and fed to the model alongside their timesteps, which are also calculated from the fps/duration metadata.

So the optimal solution for my use case would be a DALI operator that peeks this metadata from the raw video bytes, since I could then filter out unwanted videos before the decoding step (more efficient). This would be similar to the peek_image_shape operator, which reports the shape of an encoded image.
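
The filtering and snippet-cutting steps in the tl;dr above could look roughly like the following sketch. It assumes a constant frame rate and non-overlapping snippets; the helper names `is_long_enough` and `snippet_frame_ranges` are hypothetical and only illustrate the arithmetic, not an existing DALI API.

```python
def is_long_enough(duration_s: float, min_duration_s: float) -> bool:
    """Filter step: keep only videos that last at least min_duration_s."""
    return duration_s >= min_duration_s


def snippet_frame_ranges(
    num_frames: int, fps: float, snippet_s: float
) -> list[tuple[int, int]]:
    """Cut step: return (start, end) frame index pairs, end exclusive,
    for consecutive snippets of snippet_s seconds each.

    Assumption: constant frame rate; trailing frames that do not fill
    a whole snippet are dropped.
    """
    if fps <= 0 or snippet_s <= 0:
        raise ValueError("fps and snippet_s must be positive")
    frames_per_snippet = round(fps * snippet_s)
    if frames_per_snippet == 0:
        return []
    num_snippets = num_frames // frames_per_snippet
    return [
        (i * frames_per_snippet, (i + 1) * frames_per_snippet)
        for i in range(num_snippets)
    ]
```

With a 10 second video at 25 fps (250 frames) and 3 second snippets, this yields three snippets of 75 frames each and drops the last 25 frames. Running the filter on the peeked duration before decoding is what makes a metadata-only operator attractive here.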

treasan added the enhancement New feature or request label Sep 10, 2024

dali-automaton assigned szkarpinski Sep 10, 2024

JanuszL assigned awolant and unassigned szkarpinski Sep 10, 2024

awolant added the Video Video related feature/question label Sep 10, 2024
