Add an operator for receiving video metadata #5630
Comments
Hi @treasan, thank you for reaching out. Yes, that sounds like a good feature to add. Let us add this to our to-do list.
Hey @JanuszL, I am training a model that expects video snippets of a certain duration (in seconds). It also expects a timestep for each frame, which is used for a temporal positional encoding.
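To make the requirement concrete, here is a minimal sketch of how per-frame timesteps for a fixed-duration snippet could be derived once the frame rate is known. It is plain Python, not a DALI API; the name `snippet_timesteps` and its parameters are hypothetical, and it assumes a constant frame rate (variable-frame-rate videos would need the per-frame presentation timestamps from the container instead).

```python
import math


def snippet_timesteps(fps: float, duration_s: float, start_frame: int = 0):
    """Return (num_frames, timesteps) for a snippet lasting `duration_s` seconds.

    Hypothetical helper for illustration only.
    Assumption: constant frame rate, so frame i starts at i / fps seconds.
    """
    if fps <= 0 or duration_s <= 0:
        raise ValueError("fps and duration_s must be positive")
    # Round before ceil so float noise (e.g. 0.1 * 30 = 3.0000000000000004)
    # does not add a spurious extra frame.
    num_frames = math.ceil(round(duration_s * fps, 6))
    timesteps = [(start_frame + i) / fps for i in range(num_frames)]
    return num_frames, timesteps


num_frames, timesteps = snippet_timesteps(fps=25.0, duration_s=2.0)
print(num_frames)     # 50
print(timesteps[:3])  # [0.0, 0.04, 0.08]
```

The timesteps are in seconds relative to the start of the video, which is the form a temporal positional encoding would typically consume.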
Thank you for the clarification. In this case, I think it would be best to return this data directly from the video decoder (at least the timesteps for each frame), and/or to extend the decoder so that it decodes a given duration rather than a given number of frames.
Hello @treasan, thanks for creating the issue. To better understand the requirement, I wanted to ask: does your use case expect all samples to have the same number of frames, or does the number of frames vary per sample? If it varies, is that due to variable frame rates in the videos, variable snippet durations in seconds, or both? And if it varies, what type and shape of output do you expect in your target framework?
Please have a look at another issue/question I have submitted #5626. I explain my pipeline there in more detail. tl;dr:
So, optimal for my use case would be a DALI operator that peeks this metadata from raw video bytes, as I am then able to filter them out before the decoding step (more efficient). This might be similar to the
Is this a new feature, an improvement, or a change to existing functionality?
New Feature
How would you describe the priority of this feature request
Should have (e.g. Adoption is possible, but the performance shortcomings make the solution inferior).
Please provide a clear description of problem this feature solves
The sample rate (fps) of videos may vary, and hence the time period that a fixed number of frames represents also varies. Access to the fps, the duration, or even the concrete timestep of each frame is crucial in many tasks where the actual duration in seconds matters more than the number of frames. For example, I am decoding raw video bytes from a web dataset using the experimental video decoder, and I am forced to fall back on other libraries that can give me this kind of information from the raw video bytes (specifically, PyTorch's VideoReader API).
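As a small worked example of the problem, the sketch below shows how the same number of decoded frames covers different time spans depending on the frame rate. The helper `clip_duration_s` is a hypothetical name used only for illustration, and a constant frame rate is assumed.

```python
def clip_duration_s(num_frames: int, fps: float) -> float:
    """Seconds spanned by `num_frames` frames, assuming a constant frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return num_frames / fps


# The same 16-frame clip at three common frame rates.
for fps in (24.0, 30.0, 60.0):
    print(f"{fps:g} fps -> {clip_duration_s(16, fps):.3f} s")
# 24 fps -> 0.667 s
# 30 fps -> 0.533 s
# 60 fps -> 0.267 s
```

Without the fps, a consumer of the decoded frames cannot tell these three cases apart.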
Feature Description
As a user I want to be able to extract information about the sample rate of a video alongside its decoded frames.
Describe your ideal solution
A new DALI operator that extracts the desired metadata from raw video bytes.
An example video decoding pipeline reading from a webdataset (raw video bytes could also come from an external source):
Describe any alternatives you have considered
No response
Additional context
No response
Check for duplicates