Avoid relying on PyAV-provided video frame count #6929
Merged
Motivation and context
PyAV-provided frame counts are not reliable.
In particular, MP4 has a feature called "edit lists" that lets a file define a custom playback order for its media data. With an edit list, you can specify that only a particular range of frames should be played, that a range should be played multiple times, and so on. See the following for technical details:
https://developer.apple.com/documentation/quicktime-file-format/edit_list_atom
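As a rough illustration of why an edit list changes the frame count, here is a toy model. It is a simplification and not how MP4 or FFmpeg represent edit lists: real edit lists are expressed in time units, while this sketch uses hypothetical `(start_frame, frame_count)` pairs, and `apply_edit_list` is a made-up helper name.

```python
from typing import List, Sequence, Tuple


def apply_edit_list(raw_frames: Sequence[int],
                    edits: Sequence[Tuple[int, int]]) -> List[int]:
    """Return the frames that would be presented, in playback order.

    Each edit is a hypothetical (start_frame, frame_count) pair that selects
    a range of the raw media data. Ranges may overlap or repeat.
    """
    presented: List[int] = []
    for start, count in edits:
        presented.extend(raw_frames[start:start + count])
    return presented


raw = list(range(10))                           # 10 frames in the raw media data
trimmed = apply_edit_list(raw, [(2, 3)])        # play only frames 2..4
looped = apply_edit_list(raw, [(0, 4), (0, 4)])  # play frames 0..3 twice
```

In this model the raw data always holds 10 frames, but `trimmed` has 3 and `looped` has 8, which is the kind of mismatch described here between the raw count and what a decoder that follows the edit list produces.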
FFmpeg follows edit lists when decoding videos. However, the frame count returned by PyAV's `Stream.frames` property is the number of frames in the raw media data and does not reflect the modifications applied by an edit list.

When we build a video manifest, we use `Stream.frames` if it's non-zero. Therefore, in the presence of an edit list we will obtain a frame count that does not match the actual number of frames that we can get out of the video.

FWIW, edit lists are probably not the only way that `Stream.frames` could be inaccurate; it's just the reason behind a specific problem I encountered.

Since we already have to handle the situation where `Stream.frames` is not available, just pretend it doesn't exist and always count frames by traversing the entire video. I don't think it even matters much, since we have to do it anyway to build the rest of the manifest.

We also have to stop validating the frame count in a user-provided manifest, which is unfortunate, but it doesn't seem worthwhile to decode the entire video just for that.
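Counting frames by traversing the video could look roughly like the sketch below. This is not the code from this PR: the function names are hypothetical, and it assumes the first video stream is the one of interest. It uses PyAV's `av.open` and `container.decode(video=0)`.

```python
def count_decoded_frames(container) -> int:
    """Count frames by decoding the first video stream to the end.

    `container` is assumed to behave like the object returned by av.open():
    calling decode(video=0) yields one decoded frame at a time.
    """
    return sum(1 for _ in container.decode(video=0))


def count_frames_in_file(path: str) -> int:
    """Open `path` with PyAV and count the frames that actually decode."""
    import av  # PyAV; imported here so the helper above has no hard dependency

    with av.open(path) as container:
        return count_decoded_frames(container)
```

Because the count comes from the frames the decoder actually emits, it does not depend on `Stream.frames` at all. The cost is a full decode of the video.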
How has this been tested?
I checked that `dataset_manifest/create.py` now calculates the correct number of frames for a file with an edit list. I also tested the same file by uploading it to CVAT.
Checklist
`develop` branch
[ ] I have updated the documentation accordingly
[ ] I have added tests to cover my changes
[ ] I have linked related issues (see GitHub docs)
[ ] I have increased versions of npm packages if it is necessary (cvat-canvas, cvat-core, cvat-data and cvat-ui)
License
Feel free to contact the maintainers if that's a concern.