Avoid relying on PyAV-provided video frame count
PyAV-provided frame counts are not reliable.

In particular, MP4 has a feature called "edit lists" that allows you to set
a custom playback order for the media data. With edit lists, you can specify
that only a particular range of frames should be played, or that a range
should be played multiple times, and so on. See the following for technical
details:

https://developer.apple.com/documentation/quicktime-file-format/edit_list_atom
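
To make the effect concrete, here is a deliberately simplified sketch of how an
edit list changes the number of frames that get played. It is a toy model, not
the MP4 format: real edit list entries are expressed in media times and
durations, whereas the hypothetical `apply_edit_list` helper below works on
frame indices for brevity.

```python
from typing import List, Sequence, Tuple

# Hypothetical, simplified edit entry: (first_frame_index, frame_count).
# Real MP4 edit lists use media times and durations, not frame indices.
Edit = Tuple[int, int]


def apply_edit_list(raw_frames: Sequence[str], edits: List[Edit]) -> List[str]:
    """Return the frames in playback order after applying the edits."""
    played: List[str] = []
    for start, count in edits:
        played.extend(raw_frames[start:start + count])
    return played


raw = [f"frame{i}" for i in range(10)]             # what the container stores

trimmed = apply_edit_list(raw, [(2, 5)])           # play only frames 2..6
repeated = apply_edit_list(raw, [(0, 3), (0, 3)])  # play frames 0..2 twice

print(len(raw), len(trimmed), len(repeated))       # prints: 10 5 6
```

In both cases the count of stored frames (10) differs from the count of frames
a player would actually show.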

FFmpeg follows edit lists when decoding videos. However, the frame count
returned by PyAV's `Stream.frames` property is the number of frames in the
raw media data and does not reflect the modifications applied by an edit
list.
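
One way to observe the discrepancy is to compare `Stream.frames` with the
number of frames that decoding actually yields. The sketch below assumes PyAV
is installed and that the file has at least one video stream; the helper names
`count_by_traversal` and `declared_and_decoded_counts` are made up for
illustration and are not part of this change.

```python
from typing import Iterable, Tuple


def count_by_traversal(frames: Iterable[object]) -> int:
    """Count items by consuming the iterable exactly once."""
    return sum(1 for _ in frames)


def declared_and_decoded_counts(path: str) -> Tuple[int, int]:
    """Return (Stream.frames, number of frames actually decoded)."""
    import av  # PyAV; imported lazily so the helper above works without it

    with av.open(path) as container:
        stream = container.streams.video[0]
        declared = stream.frames  # 0 when the count is not known
        decoded = count_by_traversal(container.decode(stream))
    return declared, decoded
```

For a video with an edit list, the two numbers returned by
`declared_and_decoded_counts` may differ; only the second one reflects what
can really be read from the file.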

When we build a video manifest, we use `Stream.frames` if it's non-zero.
Therefore, in the presence of an edit list we will obtain a frame count that
does not match the actual number of frames that we can get out of the video.

FWIW, edit lists are probably not the only way that `Stream.frames` could be
inaccurate; they are just the reason behind a specific problem I encountered.

Since we already have to handle the situation where `Stream.frames` is not
available, just pretend it doesn't exist and always count frames by
traversing the entire video. I don't think this costs much, since we have to
traverse the video anyway to build the rest of the manifest.
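
The idea of a reader whose length only becomes known after a full pass can be
sketched as follows. `CountingReader` is a hypothetical stand-in, not the
actual reader class, and it raises `RuntimeError` where the real code uses an
`assert`.

```python
from typing import Iterable, Iterator, Optional


class CountingReader:
    """Hypothetical reader that learns its length only by being iterated."""

    def __init__(self, frames: Iterable[object]) -> None:
        self._frames = frames
        self._frames_number: Optional[int] = None

    def __iter__(self) -> Iterator[object]:
        count = 0
        for frame in self._frames:
            count += 1
            yield frame
        # Reached only when iteration ran all the way to the end.
        self._frames_number = count

    def __len__(self) -> int:
        if self._frames_number is None:
            raise RuntimeError(
                "length is unknown until the reader has been fully iterated"
            )
        return self._frames_number


reader = CountingReader(["a", "b", "c"])
for _ in reader:  # one full pass, e.g. while writing the manifest
    pass
print(len(reader))  # prints: 3
```

A consequence of this design is that anything needing the total up front, such
as a progress bar, cannot ask the reader for it before the first pass has
finished.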

We also have to stop validating the frame count in a user-provided manifest,
which is unfortunate, but it doesn't seem worthwhile to decode the entire
video just for that.
SpecLad committed Sep 29, 2023
1 parent d497bb6 commit b98f3d3
Showing 3 changed files with 11 additions and 26 deletions.
3 changes: 2 additions & 1 deletion CHANGELOG.md
@@ -26,7 +26,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

 ### Fixed

-- TDB
+- Incorrectly determined video frame count when the video contains an MP4 edit list
+  (<https://github.com/opencv/cvat/pull/6929>)

 ### Security

1 change: 0 additions & 1 deletion cvat/apps/engine/task.py
@@ -932,7 +932,6 @@ def _update_status(msg):
 manifest_path=db_data.get_manifest_path())
 manifest.init_index()
 manifest.validate_seek_key_frames()
-manifest.validate_frame_numbers()
 assert len(manifest) > 0, 'No key frames.'

 all_frames = manifest.video_length
33 changes: 9 additions & 24 deletions utils/dataset_manifest/core.py
@@ -46,10 +46,6 @@ def __init__(self, source_path, chunk_size, force):
                         )
                     self.height, self.width = (frame.height, frame.width)

-                    # not all videos contain information about numbers of frames
-                    if video_stream.frames:
-                        self._frames_number = video_stream.frames
-
                     return

     @property
@@ -63,6 +59,9 @@ def _get_video_stream(container):
         return video_stream

     def __len__(self):
+        assert self._frames_number is not None, \
+            "The length will not be available until the reader is iterated all the way through at least once"
+
         return self._frames_number

     @property
@@ -498,7 +497,7 @@ def _write_base_information(self, file):

     def _write_core_part(self, file, _tqdm):
         iterable_obj = self._reader if _tqdm is None else \
-            _tqdm(self._reader, desc="Manifest creating", total=len(self._reader))
+            _tqdm(self._reader, desc="Manifest creating", total=float("inf"))
         for item in iterable_obj:
             if isinstance(item, tuple):
                 json_item = json.dumps({
@@ -510,17 +509,12 @@ def _write_core_part(self, file, _tqdm):

     def create(self, *, _tqdm=None): # pylint: disable=arguments-differ
         """ Creating and saving a manifest file """
-        if not len(self._reader):
-            tmp_file = StringIO()
-            self._write_core_part(tmp_file, _tqdm)
+        tmp_file = StringIO()
+        self._write_core_part(tmp_file, _tqdm)

-            with open(self._manifest.path, 'w') as manifest_file:
-                self._write_base_information(manifest_file)
-                manifest_file.write(tmp_file.getvalue())
-        else:
-            with open(self._manifest.path, 'w') as manifest_file:
-                self._write_base_information(manifest_file)
-                self._write_core_part(manifest_file, _tqdm)
+        with open(self._manifest.path, 'w') as manifest_file:
+            self._write_base_information(manifest_file)
+            manifest_file.write(tmp_file.getvalue())

         self.set_index()

@@ -576,15 +570,6 @@ def validate_seek_key_frames(self):
                 self.validate_key_frame(container, video_stream, key_frame)
                 last_key_frame = key_frame

-    def validate_frame_numbers(self):
-        with closing(av.open(self._source_path, mode='r')) as container:
-            video_stream = self._get_video_stream(container)
-            # not all videos contain information about numbers of frames
-            frames = video_stream.frames
-            if frames:
-                assert frames == self.video_length, "The uploaded manifest does not match the video"
-            return
-
 class ImageProperties(dict):
     @property
     def full_name(self):