diff --git a/.coveragerc b/.coveragerc
index 6e95757cd008..c669baf71266 100644
--- a/.coveragerc
+++ b/.coveragerc
@@ -5,6 +5,7 @@ branch = true
 source =
     cvat/apps/
     utils/cli/
+    utils/dataset_manifest
 omit =
     cvat/settings/*
diff --git a/CHANGELOG.md b/CHANGELOG.md
index 42b78c7eb4cd..d80d97301cfb 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -26,6 +26,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - [Backup/Restore guide](cvat/apps/documentation/backup_guide.md) ()
 - Label deletion from tasks and projects ()
 - [Market-1501](https://www.aitribune.com/dataset/2018051063) format support ()
+- Ability to upload a manifest for a dataset of images ()
 - Annotations filters UI using react-awesome-query-builder (https://github.com/openvinotoolkit/cvat/issues/1418)

 ### Changed
@@ -40,6 +41,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - Image visualizations settings on canvas for faster access ()
 - Better scale management of left panel when screen is too small ()
 - Improved error messages for annotation import ()
+- A manifest is used instead of video meta information and dummy chunks ()

 ### Deprecated
diff --git a/cvat/apps/documentation/data_on_fly.md b/cvat/apps/documentation/data_on_fly.md
index 5963bdf30133..c60263d933fa 100644
--- a/cvat/apps/documentation/data_on_fly.md
+++ b/cvat/apps/documentation/data_on_fly.md
@@ -2,40 +2,25 @@

 ## Description

-Data on the fly processing is a way of working with data, the main idea of which is as follows:
-Minimum necessary meta information is collected, when task is created.
-This meta information allows in the future to create a necessary chunks when receiving a request from a client.
+Data on the fly processing is a way of working with data whose main idea is as follows: when a task is created,
+only the minimum necessary meta information is collected. This meta information makes it possible to create the
+necessary chunks later, when a request is received from a client.

-Generated chunks are stored in a cache of limited size with a policy of evicting less popular items.
+Generated chunks are stored in a limited-size cache with a policy of evicting the less popular items.

-When a request received from a client, the required chunk is searched for in the cache.
-If the chunk does not exist yet, it is created using a prepared meta information and then put into the cache.
+When a request is received from a client, the required chunk is searched for in the cache. If the chunk does not exist
+yet, it is created using the prepared meta information and then put into the cache.

 This method of working with data allows:

 - reduce the task creation time.
-- store data in a cache of limited size with a policy of evicting less popular items.
+- store data in a limited-size cache with a policy of evicting the less popular items.

-## Prepare meta information
+Unfortunately, this method will not work for every video, even with a valid manifest file. If there are not enough
+keyframes in the video for smooth video decoding, the task will be created in another way: all chunks will be
+prepared during task creation, which may take some time.

-Different meta information is collected for different types of uploaded data.
+#### Uploading a manifest with data

-### Video
-
-For video, this is a valid mapping of key frame numbers and their timestamps. This information is saved to `meta_info.txt`.
-
-Unfortunately, this method will not work for all videos with valid meta information.
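+A `manifest.jsonl` file describes the uploaded media in JSON Lines format. For example, the first lines of a video
+manifest look like this (the values are illustrative; the full format is described in the README referenced below):
+
+```json
+{"version":"1.0"}
+{"type":"video"}
+{"properties":{"name":"video.mp4","resolution":[1280,720],"length":778}}
+{"number":0,"pts":0,"checksum":"17bb40d76887b56fe8213c6fded3d540"}
+```
+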
-If there are not enough keyframes in the video for smooth video decoding, the task will be created in the old way.
-
-#### Uploading meta information along with data
-
-When creating a task, you can upload a file with meta information along with the video,
-which will further reduce the time for creating a task.
-You can see how to prepare meta information [here](/utils/prepare_meta_information/README.md).
-
-It is worth noting that the generated file also contains information about the number of frames in the video at the end.
-
-### Images
-
-Mapping of chunk number and paths to images that should enter the chunk
-is saved at the time of creating a task in a files `dummy_{chunk_number}.txt`
+When creating a task, you can upload a `manifest.jsonl` file along with the video or a dataset of images.
+You can see how to prepare it [here](/utils/dataset_manifest/README.md).
diff --git a/cvat/apps/documentation/faq.md b/cvat/apps/documentation/faq.md
index 16cfd620cac1..87b778ca4308 100644
--- a/cvat/apps/documentation/faq.md
+++ b/cvat/apps/documentation/faq.md
@@ -15,7 +15,6 @@
 - [How to create a task with multiple jobs](#how-to-create-a-task-with-multiple-jobs)
 - [How to transfer CVAT to another machine](#how-to-transfer-cvat-to-another-machine)
-
 ## How to update CVAT

 Before upgrading, please follow the [backup guide](backup_guide.md) and backup all CVAT volumes.
@@ -151,4 +150,5 @@
 Set the segment size when you create a new task, this option is available in the
 [Advanced configuration](user_guide.md#advanced-configuration) section.
 ## How to transfer CVAT to another machine
+
 Follow the [backup/restore guide](backup_guide.md#how-to-backup-all-cvat-data).
diff --git a/cvat/apps/documentation/user_guide.md b/cvat/apps/documentation/user_guide.md
index 3904b0158328..77b334f14a91 100644
--- a/cvat/apps/documentation/user_guide.md
+++ b/cvat/apps/documentation/user_guide.md
@@ -153,8 +153,8 @@ Go to the [Django administration panel](http://localhost:8080/admin). There you
    **Select files**. Press tab `My computer` to choose some files for annotation from your PC.
    If you select tab `Connected file share` you can choose files for annotation from your network.
    If you select ` Remote source` , you'll see a field where you can enter a list of URLs (one URL per line).
-   If you upload a video data and select `Use cache` option, you can along with the video file attach a file with meta information.
-   You can find how to prepare it [here](/utils/prepare_meta_information/README.md).
+   If you upload a video or a dataset of images and select the `Use cache` option, you can attach a `manifest.jsonl` file.
+   You can find how to prepare it [here](/utils/dataset_manifest/README.md).

    ![](static/documentation/images/image127.jpg)
@@ -1157,8 +1157,6 @@ Intelligent scissors is an CV method of creating a polygon by placing points wit
 the threshold of action, displayed as a red square which is tied to the cursor.
-
-
 - First, select the label and then click on the `intelligent scissors` button.
![](static/documentation/images/image199.jpg) diff --git a/cvat/apps/engine/cache.py b/cvat/apps/engine/cache.py index 5ea9a1e87ccd..077e6ef14fe9 100644 --- a/cvat/apps/engine/cache.py +++ b/cvat/apps/engine/cache.py @@ -1,4 +1,4 @@ -# Copyright (C) 2020 Intel Corporation +# Copyright (C) 2020-2021 Intel Corporation # # SPDX-License-Identifier: MIT @@ -9,9 +9,9 @@ from django.conf import settings from cvat.apps.engine.media_extractors import (Mpeg4ChunkWriter, - Mpeg4CompressedChunkWriter, ZipChunkWriter, ZipCompressedChunkWriter) + Mpeg4CompressedChunkWriter, ZipChunkWriter, ZipCompressedChunkWriter, + ImageDatasetManifestReader, VideoDatasetManifestReader) from cvat.apps.engine.models import DataChoice, StorageChoice -from cvat.apps.engine.prepare import PrepareInfo from cvat.apps.engine.models import DimensionType class CacheInteraction: @@ -51,17 +51,24 @@ def prepare_chunk_buff(self, db_data, quality, chunk_number): StorageChoice.LOCAL: db_data.get_upload_dirname(), StorageChoice.SHARE: settings.SHARE_ROOT }[db_data.storage] - if os.path.exists(db_data.get_meta_path()): + if hasattr(db_data, 'video'): source_path = os.path.join(upload_dir, db_data.video.path) - meta = PrepareInfo(source_path=source_path, meta_path=db_data.get_meta_path()) - for frame in meta.decode_needed_frames(chunk_number, db_data): - images.append(frame) - writer.save_as_chunk([(image, source_path, None) for image in images], buff) + reader = VideoDatasetManifestReader(manifest_path=db_data.get_manifest_path(), + source_path=source_path, chunk_number=chunk_number, + chunk_size=db_data.chunk_size, start=db_data.start_frame, + stop=db_data.stop_frame, step=db_data.get_frame_step()) + for frame in reader: + images.append((frame, source_path, None)) else: - with open(db_data.get_dummy_chunk_path(chunk_number), 'r') as dummy_file: - images = [os.path.join(upload_dir, line.strip()) for line in dummy_file] - writer.save_as_chunk([(image, image, None) for image in images], buff) + reader = ImageDatasetManifestReader(manifest_path=db_data.get_manifest_path(), + chunk_number=chunk_number, chunk_size=db_data.chunk_size, + start=db_data.start_frame, stop=db_data.stop_frame, + step=db_data.get_frame_step()) + for item in reader: + source_path = os.path.join(upload_dir, f"{item['name']}{item['extension']}") + images.append((source_path, source_path, None)) + writer.save_as_chunk(images, buff) buff.seek(0) return buff, mime_type diff --git a/cvat/apps/engine/media_extractors.py b/cvat/apps/engine/media_extractors.py index 20b2d2d16a35..22503f38be16 100644 --- a/cvat/apps/engine/media_extractors.py +++ b/cvat/apps/engine/media_extractors.py @@ -11,6 +11,7 @@ import struct import re from abc import ABC, abstractmethod +from contextlib import closing import av import numpy as np @@ -25,6 +26,7 @@ ImageFile.LOAD_TRUNCATED_IMAGES = True from cvat.apps.engine.mime_types import mimetypes +from utils.dataset_manifest import VideoManifestManager, ImageManifestManager def get_mime(name): for type_name, type_def in MEDIA_TYPES.items(): @@ -121,6 +123,10 @@ def get_image_size(self, i): img = Image.open(self._source_path[i]) return img.width, img.height + @property + def absolute_source_paths(self): + return [self.get_path(idx) for idx, _ in enumerate(self._source_path)] + class DirectoryReader(ImageListReader): def __init__(self, source_path, step=1, start=0, stop=None): image_paths = [] @@ -311,6 +317,103 @@ def get_image_size(self, i): image = (next(iter(self)))[0] return image.width, image.height +class FragmentMediaReader: + def 
__init__(self, chunk_number, chunk_size, start, stop, step=1):
+        self._start = start
+        self._stop = stop + 1 # the upper bound is inclusive, so add 1
+        self._step = step
+        self._chunk_number = chunk_number
+        self._chunk_size = chunk_size
+        self._start_chunk_frame_number = \
+            self._start + self._chunk_number * self._chunk_size * self._step
+        self._end_chunk_frame_number = min(self._start_chunk_frame_number \
+            + (self._chunk_size - 1) * self._step + 1, self._stop)
+        self._frame_range = self._get_frame_range()
+
+    @property
+    def frame_range(self):
+        return self._frame_range
+
+    def _get_frame_range(self):
+        frame_range = []
+        for idx in range(self._start, self._stop, self._step):
+            if idx < self._start_chunk_frame_number:
+                continue
+            elif idx < self._end_chunk_frame_number and \
+                    not ((idx - self._start_chunk_frame_number) % self._step):
+                frame_range.append(idx)
+            elif (idx - self._start_chunk_frame_number) % self._step:
+                continue
+            else:
+                break
+        return frame_range
+
+class ImageDatasetManifestReader(FragmentMediaReader):
+    def __init__(self, manifest_path, **kwargs):
+        super().__init__(**kwargs)
+        self._manifest = ImageManifestManager(manifest_path)
+        self._manifest.init_index()
+
+    def __iter__(self):
+        for idx in self._frame_range:
+            yield self._manifest[idx]
+
+class VideoDatasetManifestReader(FragmentMediaReader):
+    def __init__(self, manifest_path, **kwargs):
+        self.source_path = kwargs.pop('source_path')
+        super().__init__(**kwargs)
+        self._manifest = VideoManifestManager(manifest_path)
+        self._manifest.init_index()
+
+    def _get_nearest_left_key_frame(self):
+        # binary search for the last key frame whose number does not exceed the first frame of the chunk
+        if self._start_chunk_frame_number >= \
+                self._manifest[len(self._manifest) - 1].get('number'):
+            left_border = len(self._manifest) - 1
+        else:
+            left_border = 0
+            delta = len(self._manifest)
+            while delta:
+                step = delta // 2
+                cur_position = left_border + step
+                if self._manifest[cur_position].get('number') < self._start_chunk_frame_number:
+                    cur_position += 1
+                    left_border = cur_position
+                    delta -= step + 1
+                else:
+                    delta = step
+            if self._manifest[cur_position].get('number') > self._start_chunk_frame_number:
+                left_border -= 1
+        frame_number = self._manifest[left_border].get('number')
+        timestamp = self._manifest[left_border].get('pts')
+        return frame_number, timestamp
+
+    def __iter__(self):
+        start_decode_frame_number, start_decode_timestamp = self._get_nearest_left_key_frame()
+        with closing(av.open(self.source_path, mode='r')) as container:
+            video_stream = next(stream for stream in container.streams if stream.type == 'video')
+            video_stream.thread_type = 'AUTO'
+
+            container.seek(offset=start_decode_timestamp, stream=video_stream)
+
+            frame_number = start_decode_frame_number - 1
+            for packet in container.demux(video_stream):
+                for frame in packet.decode():
+                    frame_number += 1
+                    if frame_number in self._frame_range:
+                        if video_stream.metadata.get('rotate'):
+                            frame = av.VideoFrame().from_ndarray(
+                                rotate_image(
+                                    frame.to_ndarray(format='bgr24'),
+                                    360 - int(container.streams.video[0].metadata.get('rotate'))
+                                ),
+                                format='bgr24'
+                            )
+                        yield frame
+                    elif frame_number < self._frame_range[-1]:
+                        continue
+                    else:
+                        return
+
 class IChunkWriter(ABC):
     def __init__(self, quality, dimension=DimensionType.DIM_2D):
         self._image_quality = quality
diff --git a/cvat/apps/engine/migrations/0038_manifest.py b/cvat/apps/engine/migrations/0038_manifest.py
new file mode 100644
index 000000000000..7447aa6f5740
--- /dev/null
+++ b/cvat/apps/engine/migrations/0038_manifest.py
@@ -0,0 +1,83 @@
+# Generated by Django 3.1.1 on 2021-02-20 08:36
+
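+# This data migration replaces the legacy cache metadata of existing tasks
+# ('meta_info.txt' for videos and 'dummy_{chunk_number}.txt' for image data)
+# with the new 'manifest.jsonl' file and its 'index.json'.
+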
+import glob
+import os
+from re import search
+
+from django.conf import settings
+from django.db import migrations
+
+from cvat.apps.engine.models import (DimensionType, StorageChoice,
+    StorageMethodChoice)
+from utils.dataset_manifest import ImageManifestManager, VideoManifestManager
+
+def migrate_data(apps, schema_editor):
+    Data = apps.get_model("engine", "Data")
+    query_set = Data.objects.filter(storage_method=StorageMethodChoice.CACHE)
+    for db_data in query_set:
+        try:
+            upload_dir = '{}/{}/raw'.format(settings.MEDIA_DATA_ROOT, db_data.id)
+            if os.path.exists(os.path.join(upload_dir, 'meta_info.txt')):
+                os.remove(os.path.join(upload_dir, 'meta_info.txt'))
+            else:
+                for path in glob.glob(f'{upload_dir}/dummy_*.txt'):
+                    os.remove(path)
+            # necessary for the case of a long data migration
+            if os.path.exists(os.path.join(upload_dir, 'manifest.jsonl')):
+                continue
+            data_dir = upload_dir if db_data.storage == StorageChoice.LOCAL else settings.SHARE_ROOT
+            if hasattr(db_data, 'video'):
+                media_file = os.path.join(data_dir, db_data.video.path)
+                manifest = VideoManifestManager(manifest_path=upload_dir)
+                meta_info = manifest.prepare_meta(media_file=media_file)
+                manifest.create(meta_info)
+                manifest.init_index()
+            else:
+                manifest = ImageManifestManager(manifest_path=upload_dir)
+                sources = []
+                if db_data.storage == StorageChoice.LOCAL:
+                    for (root, _, files) in os.walk(data_dir):
+                        sources.extend([os.path.join(root, f) for f in files])
+                    sources.sort()
+                # when using a share, we cannot explicitly restore the entire data structure
+                else:
+                    sources = [os.path.join(data_dir, db_image.path) for db_image in db_data.images.all().order_by('frame')]
+                if any(list(filter(lambda x: x.dimension == DimensionType.DIM_3D, db_data.tasks.all()))):
+                    content = []
+                    for source in sources:
+                        name, ext = os.path.splitext(os.path.relpath(source, upload_dir))
+                        content.append({
+                            'name': name,
+                            'extension': ext
+                        })
+                else:
+                    meta_info = manifest.prepare_meta(sources=sources, data_dir=data_dir)
+                    content = meta_info.content
+
+                if db_data.storage == StorageChoice.SHARE:
+                    def _get_frame_step(str_):
+                        match = search(r"step\s*=\s*([1-9]\d*)", str_)
+                        return int(match.group(1)) if match else 1
+                    step = _get_frame_step(db_data.frame_filter)
+                    start = db_data.start_frame
+                    stop = db_data.stop_frame + 1
+                    images_range = range(start, stop, step)
+                    result_content = []
+                    for i in range(stop):
+                        item = content.pop(0) if i in images_range else dict()
+                        result_content.append(item)
+                    content = result_content
+                manifest.create(content)
+                manifest.init_index()
+        except Exception as ex:
+            print(str(ex))
+
+class Migration(migrations.Migration):
+
+    dependencies = [
+        ('engine', '0037_task_subset'),
+    ]
+
+    operations = [
+        migrations.RunPython(migrate_data)
+    ]
diff --git a/cvat/apps/engine/models.py b/cvat/apps/engine/models.py
index 8e4f558cedc5..d9fcda7743e8 100644
--- a/cvat/apps/engine/models.py
+++ b/cvat/apps/engine/models.py
@@ -138,11 +138,10 @@ def get_compressed_chunk_path(self, chunk_number):
     def get_preview_path(self):
        return os.path.join(self.get_data_dirname(), 'preview.jpeg')

-    def get_meta_path(self):
-        return os.path.join(self.get_upload_dirname(), 'meta_info.txt')
-
-    def get_dummy_chunk_path(self, chunk_number):
-        return os.path.join(self.get_upload_dirname(), 'dummy_{}.txt'.format(chunk_number))
+    def get_manifest_path(self):
+        return os.path.join(self.get_upload_dirname(), 'manifest.jsonl')
+
+    def get_index_path(self):
+        return os.path.join(self.get_upload_dirname(), 'index.json')

 class 
Video(models.Model): data = models.OneToOneField(Data, on_delete=models.CASCADE, related_name="video", null=True) diff --git a/cvat/apps/engine/prepare.py b/cvat/apps/engine/prepare.py deleted file mode 100644 index 4cedf4ab0175..000000000000 --- a/cvat/apps/engine/prepare.py +++ /dev/null @@ -1,277 +0,0 @@ -# Copyright (C) 2020 Intel Corporation -# -# SPDX-License-Identifier: MIT - -import av -from collections import OrderedDict -import hashlib -import os -from cvat.apps.engine.utils import rotate_image - -class WorkWithVideo: - def __init__(self, **kwargs): - if not kwargs.get('source_path'): - raise Exception('No sourse path') - self.source_path = kwargs.get('source_path') - - @staticmethod - def _open_video_container(sourse_path, mode, options=None): - return av.open(sourse_path, mode=mode, options=options) - - @staticmethod - def _close_video_container(container): - container.close() - - @staticmethod - def _get_video_stream(container): - video_stream = next(stream for stream in container.streams if stream.type == 'video') - video_stream.thread_type = 'AUTO' - return video_stream - - @staticmethod - def _get_frame_size(container): - video_stream = WorkWithVideo._get_video_stream(container) - for packet in container.demux(video_stream): - for frame in packet.decode(): - if video_stream.metadata.get('rotate'): - frame = av.VideoFrame().from_ndarray( - rotate_image( - frame.to_ndarray(format='bgr24'), - 360 - int(container.streams.video[0].metadata.get('rotate')), - ), - format ='bgr24', - ) - return frame.width, frame.height - -class AnalyzeVideo(WorkWithVideo): - def check_type_first_frame(self): - container = self._open_video_container(self.source_path, mode='r') - video_stream = self._get_video_stream(container) - - for packet in container.demux(video_stream): - for frame in packet.decode(): - self._close_video_container(container) - assert frame.pict_type.name == 'I', 'First frame is not key frame' - return - - def check_video_timestamps_sequences(self): - container = self._open_video_container(self.source_path, mode='r') - video_stream = self._get_video_stream(container) - - frame_pts = -1 - frame_dts = -1 - for packet in container.demux(video_stream): - for frame in packet.decode(): - - if None not in [frame.pts, frame_pts] and frame.pts <= frame_pts: - self._close_video_container(container) - raise Exception('Invalid pts sequences') - - if None not in [frame.dts, frame_dts] and frame.dts <= frame_dts: - self._close_video_container(container) - raise Exception('Invalid dts sequences') - - frame_pts, frame_dts = frame.pts, frame.dts - self._close_video_container(container) - -def md5_hash(frame): - return hashlib.md5(frame.to_image().tobytes()).hexdigest() - -class PrepareInfo(WorkWithVideo): - - def __init__(self, **kwargs): - super().__init__(**kwargs) - - if not kwargs.get('meta_path'): - raise Exception('No meta path') - - self.meta_path = kwargs.get('meta_path') - self.key_frames = {} - self.frames = 0 - - container = self._open_video_container(self.source_path, 'r') - self.width, self.height = self._get_frame_size(container) - self._close_video_container(container) - - def get_task_size(self): - return self.frames - - @property - def frame_sizes(self): - return (self.width, self.height) - - def check_key_frame(self, container, video_stream, key_frame): - for packet in container.demux(video_stream): - for frame in packet.decode(): - if md5_hash(frame) != key_frame[1]['md5'] or frame.pts != key_frame[1]['pts']: - self.key_frames.pop(key_frame[0]) - return - - def 
check_seek_key_frames(self): - container = self._open_video_container(self.source_path, mode='r') - video_stream = self._get_video_stream(container) - - key_frames_copy = self.key_frames.copy() - - for key_frame in key_frames_copy.items(): - container.seek(offset=key_frame[1]['pts'], stream=video_stream) - self.check_key_frame(container, video_stream, key_frame) - - def check_frames_ratio(self, chunk_size): - return (len(self.key_frames) and (self.frames // len(self.key_frames)) <= 2 * chunk_size) - - def save_key_frames(self): - container = self._open_video_container(self.source_path, mode='r') - video_stream = self._get_video_stream(container) - frame_number = 0 - - for packet in container.demux(video_stream): - for frame in packet.decode(): - if frame.key_frame: - self.key_frames[frame_number] = { - 'pts': frame.pts, - 'md5': md5_hash(frame), - } - frame_number += 1 - - self.frames = frame_number - self._close_video_container(container) - - def save_meta_info(self): - with open(self.meta_path, 'w') as meta_file: - for index, frame in self.key_frames.items(): - meta_file.write('{} {}\n'.format(index, frame['pts'])) - - def get_nearest_left_key_frame(self, start_chunk_frame_number): - start_decode_frame_number = 0 - start_decode_timestamp = 0 - - with open(self.meta_path, 'r') as file: - for line in file: - frame_number, timestamp = line.strip().split(' ') - - if int(frame_number) <= start_chunk_frame_number: - start_decode_frame_number = frame_number - start_decode_timestamp = timestamp - else: - break - - return int(start_decode_frame_number), int(start_decode_timestamp) - - def decode_needed_frames(self, chunk_number, db_data): - step = db_data.get_frame_step() - start_chunk_frame_number = db_data.start_frame + chunk_number * db_data.chunk_size * step - end_chunk_frame_number = min(start_chunk_frame_number + (db_data.chunk_size - 1) * step + 1, db_data.stop_frame + 1) - start_decode_frame_number, start_decode_timestamp = self.get_nearest_left_key_frame(start_chunk_frame_number) - container = self._open_video_container(self.source_path, mode='r') - video_stream = self._get_video_stream(container) - container.seek(offset=start_decode_timestamp, stream=video_stream) - - frame_number = start_decode_frame_number - 1 - for packet in container.demux(video_stream): - for frame in packet.decode(): - frame_number += 1 - if frame_number < start_chunk_frame_number: - continue - elif frame_number < end_chunk_frame_number and not ((frame_number - start_chunk_frame_number) % step): - if video_stream.metadata.get('rotate'): - frame = av.VideoFrame().from_ndarray( - rotate_image( - frame.to_ndarray(format='bgr24'), - 360 - int(container.streams.video[0].metadata.get('rotate')) - ), - format ='bgr24' - ) - yield frame - elif (frame_number - start_chunk_frame_number) % step: - continue - else: - self._close_video_container(container) - return - - self._close_video_container(container) - -class UploadedMeta(PrepareInfo): - def __init__(self, **kwargs): - super().__init__(**kwargs) - uploaded_meta = kwargs.get('uploaded_meta') - assert uploaded_meta is not None , 'No uploaded meta path' - - with open(uploaded_meta, 'r') as meta_file: - lines = meta_file.read().strip().split('\n') - self.frames = int(lines.pop()) - - key_frames = {int(line.split()[0]): int(line.split()[1]) for line in lines} - self.key_frames = OrderedDict(sorted(key_frames.items(), key=lambda x: x[0])) - - @property - def frame_sizes(self): - container = self._open_video_container(self.source_path, 'r') - video_stream = 
self._get_video_stream(container) - container.seek(offset=next(iter(self.key_frames.values())), stream=video_stream) - for packet in container.demux(video_stream): - for frame in packet.decode(): - if video_stream.metadata.get('rotate'): - frame = av.VideoFrame().from_ndarray( - rotate_image( - frame.to_ndarray(format='bgr24'), - 360 - int(container.streams.video[0].metadata.get('rotate')) - ), - format ='bgr24' - ) - self._close_video_container(container) - return (frame.width, frame.height) - - def save_meta_info(self): - with open(self.meta_path, 'w') as meta_file: - for index, pts in self.key_frames.items(): - meta_file.write('{} {}\n'.format(index, pts)) - - def check_key_frame(self, container, video_stream, key_frame): - for packet in container.demux(video_stream): - for frame in packet.decode(): - assert frame.pts == key_frame[1], "Uploaded meta information does not match the video" - return - - def check_seek_key_frames(self): - container = self._open_video_container(self.source_path, mode='r') - video_stream = self._get_video_stream(container) - - for key_frame in self.key_frames.items(): - container.seek(offset=key_frame[1], stream=video_stream) - self.check_key_frame(container, video_stream, key_frame) - - self._close_video_container(container) - - def check_frames_numbers(self): - container = self._open_video_container(self.source_path, mode='r') - video_stream = self._get_video_stream(container) - # not all videos contain information about numbers of frames - if video_stream.frames: - self._close_video_container(container) - assert video_stream.frames == self.frames, "Uploaded meta information does not match the video" - return - self._close_video_container(container) - -def prepare_meta(media_file, upload_dir=None, meta_dir=None, chunk_size=None): - paths = { - 'source_path': os.path.join(upload_dir, media_file) if upload_dir else media_file, - 'meta_path': os.path.join(meta_dir, 'meta_info.txt') if meta_dir else os.path.join(upload_dir, 'meta_info.txt'), - } - analyzer = AnalyzeVideo(source_path=paths.get('source_path')) - analyzer.check_type_first_frame() - analyzer.check_video_timestamps_sequences() - - meta_info = PrepareInfo(source_path=paths.get('source_path'), - meta_path=paths.get('meta_path')) - meta_info.save_key_frames() - meta_info.check_seek_key_frames() - meta_info.save_meta_info() - smooth_decoding = meta_info.check_frames_ratio(chunk_size) if chunk_size else None - return (meta_info, smooth_decoding) - -def prepare_meta_for_upload(func, *args): - meta_info, smooth_decoding = func(*args) - with open(meta_info.meta_path, 'a') as meta_file: - meta_file.write(str(meta_info.get_task_size())) - return smooth_decoding diff --git a/cvat/apps/engine/task.py b/cvat/apps/engine/task.py index b54b3af95581..e24865c73690 100644 --- a/cvat/apps/engine/task.py +++ b/cvat/apps/engine/task.py @@ -1,12 +1,11 @@ -# Copyright (C) 2018-2020 Intel Corporation +# Copyright (C) 2018-2021 Intel Corporation # # SPDX-License-Identifier: MIT import itertools import os import sys -from re import findall import rq import shutil from traceback import print_exception @@ -17,8 +16,9 @@ from cvat.apps.engine.media_extractors import get_mime, MEDIA_TYPES, Mpeg4ChunkWriter, ZipChunkWriter, Mpeg4CompressedChunkWriter, ZipCompressedChunkWriter, ValidateDimension from cvat.apps.engine.models import DataChoice, StorageMethodChoice, StorageChoice, RelatedFile from cvat.apps.engine.utils import av_scan_paths -from cvat.apps.engine.prepare import prepare_meta from cvat.apps.engine.models import 
DimensionType
+from utils.dataset_manifest import ImageManifestManager, VideoManifestManager
+from utils.dataset_manifest.core import VideoManifestValidator

 import django_rq
 from django.conf import settings
@@ -107,7 +107,7 @@ def _save_task_to_db(db_task):
     db_task.data.save()
     db_task.save()

-def _count_files(data, meta_info_file=None):
+def _count_files(data, manifest_file=None):
     share_root = settings.SHARE_ROOT
     server_files = []
@@ -134,8 +134,8 @@ def count_files(file_mapping, counter):
             mime = get_mime(full_path)
             if mime in counter:
                 counter[mime].append(rel_path)
-            elif findall('meta_info.txt$', rel_path):
-                meta_info_file.append(rel_path)
+            elif 'manifest.jsonl' == os.path.basename(rel_path):
+                manifest_file.append(rel_path)
             else:
                 slogger.glob.warn("Skip '{}' file (its mime type doesn't "
                     "correspond to a video or an image file)".format(full_path))
@@ -154,7 +154,7 @@ def count_files(file_mapping, counter):

     return counter

-def _validate_data(counter, meta_info_file=None):
+def _validate_data(counter, manifest_file=None):
     unique_entries = 0
     multiple_entries = 0
     for media_type, media_config in MEDIA_TYPES.items():
@@ -164,8 +164,8 @@ def _validate_data(counter, manifest_file=None):
         else:
             multiple_entries += len(counter[media_type])

-        if meta_info_file and media_type != 'video':
-            raise Exception('File with meta information can only be uploaded with video file')
+        if manifest_file and media_type not in ('video', 'image'):
+            raise Exception('A manifest file can only be uploaded together with video or image files')

     if unique_entries == 1 and multiple_entries > 0 or unique_entries > 1:
         unique_types = ', '.join([k for k, v in MEDIA_TYPES.items() if v['unique']])
@@ -221,10 +221,10 @@ def _create_thread(tid, data):
     if data['remote_files']:
         data['remote_files'] = _download_data(data['remote_files'], upload_dir)

-    meta_info_file = []
-    media = _count_files(data, meta_info_file)
-    media, task_mode = _validate_data(media, meta_info_file)
-    if meta_info_file:
+    manifest_file = []
+    media = _count_files(data, manifest_file)
+    media, task_mode = _validate_data(media, manifest_file)
+    if manifest_file:
         assert settings.USE_CACHE and db_data.storage_method == StorageMethodChoice.CACHE, \
             "File with meta information can be uploaded if 'Use cache' option is also selected"
@@ -248,8 +248,10 @@ def _create_thread(tid, data):
         if extractor is not None:
             raise Exception('Combined data types are not supported')
         source_paths=[os.path.join(upload_dir, f) for f in media_files]
-        if media_type in ('archive', 'zip') and db_data.storage == StorageChoice.SHARE:
+        if media_type in {'archive', 'zip'} and db_data.storage == StorageChoice.SHARE:
             source_paths.append(db_data.get_upload_dirname())
+            upload_dir = db_data.get_upload_dirname()
+            db_data.storage = StorageChoice.LOCAL
         extractor = MEDIA_TYPES[media_type]['extractor'](
             source_path=source_paths,
             step=db_data.get_frame_step(),
@@ -322,68 +324,108 @@ def update_progress(progress):
     video_path = ""
     video_size = (0, 0)

+    def _update_status(msg):
+        job.meta['status'] = msg
+        job.save_meta()
+
     if settings.USE_CACHE and db_data.storage_method == StorageMethodChoice.CACHE:
         for media_type, media_files in media.items():
             if not media_files:
                 continue

+            # replace the manifest file if it was uploaded to a subdirectory (e.g. 'subdir/manifest.jsonl')
+            if manifest_file and not os.path.exists(db_data.get_manifest_path()):
+                shutil.copyfile(os.path.join(upload_dir, manifest_file[0]),
+                    db_data.get_manifest_path())
+                if upload_dir != settings.SHARE_ROOT:
+                    os.remove(os.path.join(upload_dir, manifest_file[0]))
+
            if task_mode == MEDIA_TYPES['video']['mode']:
                try:
-                    if meta_info_file:
+                    manifest_is_prepared = False
+                    if manifest_file:
                         try:
-                            from cvat.apps.engine.prepare import UploadedMeta
-                            meta_info = UploadedMeta(source_path=os.path.join(upload_dir, media_files[0]),
-                                meta_path=db_data.get_meta_path(),
-                                uploaded_meta=os.path.join(upload_dir, meta_info_file[0]))
-                            meta_info.check_seek_key_frames()
-                            meta_info.check_frames_numbers()
-                            meta_info.save_meta_info()
-                            assert len(meta_info.key_frames) > 0, 'No key frames.'
+                            manifest = VideoManifestValidator(source_path=os.path.join(upload_dir, media_files[0]),
+                                manifest_path=db_data.get_manifest_path())
+                            manifest.init_index()
+                            manifest.validate_seek_key_frames()
+                            manifest.validate_frame_numbers()
+                            assert len(manifest) > 0, 'No key frames.'
+
+                            all_frames = manifest['properties']['length']
+                            video_size = manifest['properties']['resolution']
+                            manifest_is_prepared = True
                         except Exception as ex:
-                            base_msg = str(ex) if isinstance(ex, AssertionError) else \
-                                'Invalid meta information was upload.'
-                            job.meta['status'] = '{} Start prepare valid meta information.'.format(base_msg)
-                            job.save_meta()
-                            meta_info, smooth_decoding = prepare_meta(
-                                media_file=media_files[0],
-                                upload_dir=upload_dir,
-                                meta_dir=os.path.dirname(db_data.get_meta_path()),
-                                chunk_size=db_data.chunk_size
-                            )
-                            assert smooth_decoding == True, 'Too few keyframes for smooth video decoding.'
-                    else:
-                        meta_info, smooth_decoding = prepare_meta(
+                            if os.path.exists(db_data.get_index_path()):
+                                os.remove(db_data.get_index_path())
+                            if isinstance(ex, AssertionError):
+                                base_msg = str(ex)
+                            else:
+                                base_msg = 'An invalid manifest file was uploaded.'
+                                slogger.glob.warning(str(ex))
+                            _update_status('{} Starting to prepare a valid manifest file.'.format(base_msg))
+
+                    if not manifest_is_prepared:
+                        _update_status('Starting to prepare a manifest file')
+                        manifest = VideoManifestManager(db_data.get_manifest_path())
+                        meta_info = manifest.prepare_meta(
                             media_file=media_files[0],
                             upload_dir=upload_dir,
-                            meta_dir=os.path.dirname(db_data.get_meta_path()),
                             chunk_size=db_data.chunk_size
                         )
-                        assert smooth_decoding == True, 'Too few keyframes for smooth video decoding.'
+                        manifest.create(meta_info)
+                        manifest.init_index()
+                        _update_status('A manifest has been created')

-                    all_frames = meta_info.get_task_size()
-                    video_size = meta_info.frame_sizes
+                        all_frames = meta_info.get_size()
+                        video_size = meta_info.frame_sizes
+                        manifest_is_prepared = True

-                    db_data.size = len(range(db_data.start_frame, min(data['stop_frame'] + 1 if data['stop_frame'] else all_frames, all_frames), db_data.get_frame_step()))
+                    db_data.size = len(range(db_data.start_frame, min(data['stop_frame'] + 1 \
+                        if data['stop_frame'] else all_frames, all_frames), db_data.get_frame_step()))
                     video_path = os.path.join(upload_dir, media_files[0])
                 except Exception as ex:
                     db_data.storage_method = StorageMethodChoice.FILE_SYSTEM
-                    if os.path.exists(db_data.get_meta_path()):
-                        os.remove(db_data.get_meta_path())
-                    base_msg = str(ex) if isinstance(ex, AssertionError) else "Uploaded video does not support a quick way of task creating."
-                    job.meta['status'] = "{} The task will be created using the old method".format(base_msg)
-                    job.save_meta()
+                    if os.path.exists(db_data.get_manifest_path()):
+                        os.remove(db_data.get_manifest_path())
+                    if os.path.exists(db_data.get_index_path()):
+                        os.remove(db_data.get_index_path())
+                    base_msg = str(ex) if isinstance(ex, AssertionError) \
+                        else "The uploaded video does not support a quick way of task creation."
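+                    # fall back to the FILE_SYSTEM method: all chunks will be prepared during task creation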
+ _update_status("{} The task will be created using the old method".format(base_msg)) + else:# images, archive, pdf db_data.size = len(extractor) - + manifest = ImageManifestManager(db_data.get_manifest_path()) + if not manifest_file: + if db_task.dimension == DimensionType.DIM_2D: + meta_info = manifest.prepare_meta( + sources=extractor.absolute_source_paths, + data_dir=upload_dir + ) + content = meta_info.content + else: + content = [] + for source in extractor.absolute_source_paths: + name, ext = os.path.splitext(os.path.relpath(source, upload_dir)) + content.append({ + 'name': name, + 'extension': ext + }) + manifest.create(content) + manifest.init_index() counter = itertools.count() - for chunk_number, chunk_frames in itertools.groupby(extractor.frame_range, lambda x: next(counter) // db_data.chunk_size): + for _, chunk_frames in itertools.groupby(extractor.frame_range, lambda x: next(counter) // db_data.chunk_size): chunk_paths = [(extractor.get_path(i), i) for i in chunk_frames] img_sizes = [] - with open(db_data.get_dummy_chunk_path(chunk_number), 'w') as dummy_chunk: - for path, frame_id in chunk_paths: - dummy_chunk.write(os.path.relpath(path, upload_dir) + '\n') - img_sizes.append(extractor.get_image_size(frame_id)) + + for _, frame_id in chunk_paths: + properties = manifest[frame_id] + if db_task.dimension == DimensionType.DIM_2D: + resolution = (properties['width'], properties['height']) + else: + resolution = extractor.get_image_size(frame_id) + img_sizes.append(resolution) db_images.extend([ models.Image(data=db_data, @@ -453,6 +495,10 @@ def update_progress(progress): if db_data.stop_frame == 0: db_data.stop_frame = db_data.start_frame + (db_data.size - 1) * db_data.get_frame_step() + else: + # validate stop_frame + db_data.stop_frame = min(db_data.stop_frame, \ + db_data.start_frame + (db_data.size - 1) * db_data.get_frame_step()) preview = extractor.get_preview() preview.save(db_data.get_preview_path()) diff --git a/cvat/apps/engine/tests/test_rest_api.py b/cvat/apps/engine/tests/test_rest_api.py index f39bf95acf75..67fb41b5fa90 100644 --- a/cvat/apps/engine/tests/test_rest_api.py +++ b/cvat/apps/engine/tests/test_rest_api.py @@ -30,9 +30,9 @@ from cvat.apps.engine.models import (AttributeSpec, AttributeType, Data, Job, Project, Segment, StatusChoice, Task, Label, StorageMethodChoice, StorageChoice) -from cvat.apps.engine.prepare import prepare_meta, prepare_meta_for_upload from cvat.apps.engine.media_extractors import ValidateDimension from cvat.apps.engine.models import DimensionType +from utils.dataset_manifest import ImageManifestManager, VideoManifestManager def create_db_users(cls): (group_admin, _) = Group.objects.get_or_create(name="admin") @@ -1971,6 +1971,26 @@ def generate_pdf_file(filename, page_count=1): file_buf.seek(0) return image_sizes, file_buf +def generate_manifest_file(data_type, manifest_path, sources): + kwargs = { + 'images': { + 'sources': sources, + 'is_sorted': False, + }, + 'video': { + 'media_file': sources[0], + 'upload_dir': os.path.dirname(sources[0]), + 'force': True + } + } + + if data_type == 'video': + manifest = VideoManifestManager(manifest_path) + else: + manifest = ImageManifestManager(manifest_path) + prepared_meta = manifest.prepare_meta(**kwargs[data_type]) + manifest.create(prepared_meta) + class TaskDataAPITestCase(APITestCase): _image_sizes = {} @@ -2093,6 +2113,12 @@ def setUpClass(cls): shutil.rmtree(root_path) cls._image_sizes[filename] = image_sizes + generate_manifest_file(data_type='video', 
manifest_path=os.path.join(settings.SHARE_ROOT, 'videos', 'manifest.jsonl'), + sources=[os.path.join(settings.SHARE_ROOT, 'videos', 'test_video_1.mp4')]) + + generate_manifest_file(data_type='images', manifest_path=os.path.join(settings.SHARE_ROOT, 'manifest.jsonl'), + sources=[os.path.join(settings.SHARE_ROOT, f'test_{i}.jpg') for i in range(1,4)]) + @classmethod def tearDownClass(cls): super().tearDownClass() @@ -2114,7 +2140,10 @@ def tearDownClass(cls): path = os.path.join(settings.SHARE_ROOT, "videos", "test_video_1.mp4") os.remove(path) - path = os.path.join(settings.SHARE_ROOT, "videos", "meta_info.txt") + path = os.path.join(settings.SHARE_ROOT, "videos", "manifest.jsonl") + os.remove(path) + + path = os.path.join(settings.SHARE_ROOT, "manifest.jsonl") os.remove(path) def _run_api_v1_tasks_id_data_post(self, tid, user, data): @@ -2257,7 +2286,7 @@ def _test_api_v1_tasks_id_data_spec(self, user, spec, data, expected_compressed_ self.assertEqual(len(images), min(task["data_chunk_size"], len(image_sizes))) if task["data_original_chunk_type"] == self.ChunkType.IMAGESET: - server_files = [img for key, img in data.items() if key.startswith("server_files")] + server_files = [img for key, img in data.items() if key.startswith("server_files") and not img.endswith("manifest.jsonl")] client_files = [img for key, img in data.items() if key.startswith("client_files")] if server_files: @@ -2446,7 +2475,7 @@ def _test_api_v1_tasks_id_data(self, user): image_sizes = self._image_sizes[task_data["server_files[0]"]] self._test_api_v1_tasks_id_data_spec(user, task_spec, task_data, self.ChunkType.IMAGESET, self.ChunkType.IMAGESET, image_sizes, - expected_uploaded_data_location=StorageChoice.SHARE) + expected_uploaded_data_location=StorageChoice.LOCAL) task_spec.update([('name', 'my archive task #12')]) task_data.update([('copy_data', True)]) @@ -2546,7 +2575,7 @@ def _test_api_v1_tasks_id_data(self, user): image_sizes = self._image_sizes[task_data["server_files[0]"]] self._test_api_v1_tasks_id_data_spec(user, task_spec, task_data, self.ChunkType.IMAGESET, - self.ChunkType.IMAGESET, image_sizes, StorageMethodChoice.CACHE, StorageChoice.SHARE) + self.ChunkType.IMAGESET, image_sizes, StorageMethodChoice.CACHE, StorageChoice.LOCAL) task_spec.update([('name', 'my cached zip archive task #19')]) task_data.update([('copy_data', True)]) @@ -2595,11 +2624,6 @@ def _test_api_v1_tasks_id_data(self, user): self._test_api_v1_tasks_id_data_spec(user, task_spec, task_data, self.ChunkType.IMAGESET, self.ChunkType.IMAGESET, image_sizes) - prepare_meta_for_upload( - prepare_meta, - os.path.join(settings.SHARE_ROOT, "videos", "test_video_1.mp4"), - os.path.join(settings.SHARE_ROOT, "videos") - ) task_spec = { "name": "my video with meta info task without copying #22", "overlap": 0, @@ -2611,7 +2635,7 @@ def _test_api_v1_tasks_id_data(self, user): } task_data = { "server_files[0]": os.path.join("videos", "test_video_1.mp4"), - "server_files[1]": os.path.join("videos", "meta_info.txt"), + "server_files[1]": os.path.join("videos", "manifest.jsonl"), "image_quality": 70, "use_cache": True } @@ -2723,6 +2747,38 @@ def _test_api_v1_tasks_id_data(self, user): self.ChunkType.IMAGESET, image_sizes, dimension=DimensionType.DIM_3D) + task_spec = { + "name": "my images+manifest without copying #26", + "overlap": 0, + "segment_size": 0, + "labels": [ + {"name": "car"}, + {"name": "person"}, + ] + } + + task_data = { + "server_files[0]": "test_1.jpg", + "server_files[1]": "test_2.jpg", + "server_files[2]": "test_3.jpg", + 
"server_files[3]": "manifest.jsonl", + "image_quality": 70, + "use_cache": True + } + image_sizes = [ + self._image_sizes[task_data["server_files[0]"]], + self._image_sizes[task_data["server_files[1]"]], + self._image_sizes[task_data["server_files[2]"]], + ] + + self._test_api_v1_tasks_id_data_spec(user, task_spec, task_data, self.ChunkType.IMAGESET, self.ChunkType.IMAGESET, + image_sizes, StorageMethodChoice.CACHE, StorageChoice.SHARE) + + task_spec.update([('name', 'my images+manifest #27')]) + task_data.update([('copy_data', True)]) + self._test_api_v1_tasks_id_data_spec(user, task_spec, task_data, self.ChunkType.IMAGESET, self.ChunkType.IMAGESET, + image_sizes, StorageMethodChoice.CACHE, StorageChoice.LOCAL) + def test_api_v1_tasks_id_data_admin(self): self._test_api_v1_tasks_id_data(self.admin) diff --git a/cvat/apps/engine/utils.py b/cvat/apps/engine/utils.py index 854393cfa75f..f37440731281 100644 --- a/cvat/apps/engine/utils.py +++ b/cvat/apps/engine/utils.py @@ -1,15 +1,17 @@ -# Copyright (C) 2020 Intel Corporation +# Copyright (C) 2020-2021 Intel Corporation # # SPDX-License-Identifier: MIT import ast import cv2 as cv from collections import namedtuple +import hashlib import importlib import sys import traceback import subprocess import os +from av import VideoFrame from django.core.exceptions import ValidationError @@ -51,6 +53,7 @@ class InterpreterError(Exception): def execute_python_code(source_code, global_vars=None, local_vars=None): try: + # pylint: disable=exec-used exec(source_code, global_vars, local_vars) except SyntaxError as err: error_class = err.__class__.__name__ @@ -72,7 +75,7 @@ def av_scan_paths(*paths): if 'yes' == os.environ.get('CLAM_AV'): command = ['clamscan', '--no-summary', '-i', '-o'] command.extend(paths) - res = subprocess.run(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE) + res = subprocess.run(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE) # nosec if res.returncode: raise ValidationError(res.stdout) @@ -88,3 +91,8 @@ def rotate_image(image, angle): matrix[1, 2] += bound_h/2 - image_center[1] matrix = cv.warpAffine(image, matrix, (bound_w, bound_h)) return matrix + +def md5_hash(frame): + if isinstance(frame, VideoFrame): + frame = frame.to_image() + return hashlib.md5(frame.tobytes()).hexdigest() # nosec \ No newline at end of file diff --git a/utils/dataset_manifest/README.md b/utils/dataset_manifest/README.md new file mode 100644 index 000000000000..4a16f6151712 --- /dev/null +++ b/utils/dataset_manifest/README.md @@ -0,0 +1,118 @@ +## Simple command line to prepare dataset manifest file + +### Steps before use + +When used separately from Computer Vision Annotation Tool(CVAT), the required dependencies must be installed + +#### Ubuntu:20.04 + +Install dependencies: + +```bash +# General +sudo apt-get update && sudo apt-get --no-install-recommends install -y \ + python3-dev python3-pip python3-venv pkg-config +``` + +```bash +# Library components +sudo apt-get install --no-install-recommends -y \ + libavformat-dev libavcodec-dev libavdevice-dev \ + libavutil-dev libswscale-dev libswresample-dev libavfilter-dev +``` + +Create an environment and install the necessary python modules: + +```bash +python3 -m venv .env +. .env/bin/activate +pip install -U pip +pip install -r requirements.txt +``` + +### Using + +```bash +usage: python create.py [-h] [--force] [--output-dir .] 
+
+positional arguments:
+  source                Source path
+
+optional arguments:
+  -h, --help            show this help message and exit
+  --force               Use this flag to prepare a manifest file for video data even if, by default,
+                        the video does not meet the requirements and a manifest file would not be prepared
+  --output-dir OUTPUT_DIR
+                        Directory where the manifest file will be saved
+```
+
+### Alternative way to use with openvino/cvat_server
+
+```bash
+docker run -it --entrypoint python3 -v /path/to/host/data/:/path/inside/container/:rw openvino/cvat_server \
+    utils/dataset_manifest/create.py --output-dir /path/to/manifest/directory/ /path/to/data/
+```
+
+### Usage examples
+
+Create a dataset manifest in the current directory from a video that contains enough keyframes:
+
+```bash
+python create.py ~/Documents/video.mp4
+```
+
+Create a dataset manifest from a video that does not contain enough keyframes:
+
+```bash
+python create.py --force --output-dir ~/Documents ~/Documents/video.mp4
+```
+
+Create a dataset manifest from a directory of images:
+
+```bash
+python create.py --output-dir ~/Documents ~/Documents/images/
+```
+
+Create a dataset manifest from a pattern (`*`, `?` and `[]` may be used):
+
+```bash
+python create.py --output-dir ~/Documents "/home/${USER}/Documents/**/image*.jpeg"
+```
+
+Create a dataset manifest with `openvino/cvat_server`:
+
+```bash
+docker run -it --entrypoint python3 -v ~/Documents/data/:${HOME}/manifest/:rw openvino/cvat_server \
+    utils/dataset_manifest/create.py --output-dir ~/manifest/ ~/manifest/images/
+```
+
+### Examples of generated `manifest.jsonl` files
+
+A manifest file contains some intuitive information and some specific fields:
+
+- `pts` - the time at which the frame should be shown to the user
+- `checksum` - the `md5` hash sum of the specific image/frame
+
+#### For a video
+
+```json
+{"version":"1.0"}
+{"type":"video"}
+{"properties":{"name":"video.mp4","resolution":[1280,720],"length":778}}
+{"number":0,"pts":0,"checksum":"17bb40d76887b56fe8213c6fded3d540"}
+{"number":135,"pts":486000,"checksum":"9da9b4d42c1206d71bf17a7070a05847"}
+{"number":270,"pts":972000,"checksum":"a1c3a61814f9b58b00a795fa18bb6d3e"}
+{"number":405,"pts":1458000,"checksum":"18c0803b3cc1aa62ac75b112439d2b62"}
+{"number":540,"pts":1944000,"checksum":"4551ecea0f80e95a6c32c32e70cac59e"}
+{"number":675,"pts":2430000,"checksum":"0e72faf67e5218c70b506445ac91cdd7"}
+```
+
+#### For a dataset with images
+
+```json
+{"version":"1.0"}
+{"type":"images"}
+{"name":"image1","extension":".jpg","width":720,"height":405,"checksum":"548918ec4b56132a5cff1d4acabe9947"}
+{"name":"image2","extension":".jpg","width":183,"height":275,"checksum":"4b4eefd03cc6a45c1c068b98477fb639"}
+{"name":"image3","extension":".jpg","width":301,"height":167,"checksum":"0e454a6f4a13d56c82890c98be063663"}
+```
diff --git a/utils/dataset_manifest/__init__.py b/utils/dataset_manifest/__init__.py
new file mode 100644
index 000000000000..f6547acf3583
--- /dev/null
+++ b/utils/dataset_manifest/__init__.py
@@ -0,0 +1,4 @@
+# Copyright (C) 2021 Intel Corporation
+#
+# SPDX-License-Identifier: MIT
+from .core import VideoManifestManager, ImageManifestManager
\ No newline at end of file
diff --git a/utils/dataset_manifest/core.py b/utils/dataset_manifest/core.py
new file mode 100644
index 000000000000..78a00b0b98bf
--- /dev/null
+++ b/utils/dataset_manifest/core.py
@@ -0,0 +1,446 @@
+# Copyright (C) 2021 Intel Corporation
+#
+# SPDX-License-Identifier: MIT
+
+import av
+import json
+import os
+from abc import ABC, abstractmethod
+from collections import OrderedDict
+from contextlib import closing
+from PIL import Image
+from .utils import md5_hash, rotate_image
+
+class VideoStreamReader:
+    def __init__(self, source_path):
+        self.source_path = source_path
+        self._key_frames = OrderedDict()
+        self.frames = 0
+
+        with closing(av.open(self.source_path, mode='r')) as container:
+            self.width, self.height = self._get_frame_size(container)
+
+    @staticmethod
+    def _get_video_stream(container):
+        video_stream = next(stream for stream in container.streams if stream.type == 'video')
+        video_stream.thread_type = 'AUTO'
+        return video_stream
+
+    @staticmethod
+    def _get_frame_size(container):
+        video_stream = VideoStreamReader._get_video_stream(container)
+        for packet in container.demux(video_stream):
+            for frame in packet.decode():
+                if video_stream.metadata.get('rotate'):
+                    frame = av.VideoFrame().from_ndarray(
+                        rotate_image(
+                            frame.to_ndarray(format='bgr24'),
+                            360 - int(container.streams.video[0].metadata.get('rotate')),
+                        ),
+                        format='bgr24',
+                    )
+                return frame.width, frame.height
+
+    def check_type_first_frame(self):
+        with closing(av.open(self.source_path, mode='r')) as container:
+            video_stream = self._get_video_stream(container)
+
+            for packet in container.demux(video_stream):
+                for frame in packet.decode():
+                    if frame.pict_type.name != 'I':
+                        raise Exception('The first frame is not a key frame')
+                    return
+
+    def check_video_timestamps_sequences(self):
+        with closing(av.open(self.source_path, mode='r')) as container:
+            video_stream = self._get_video_stream(container)
+
+            frame_pts = -1
+            frame_dts = -1
+            for packet in container.demux(video_stream):
+                for frame in packet.decode():
+
+                    if None not in {frame.pts, frame_pts} and frame.pts <= frame_pts:
+                        raise Exception('Invalid pts sequences')
+
+                    if None not in {frame.dts, frame_dts} and frame.dts <= frame_dts:
+                        raise Exception('Invalid dts sequences')
+
+                    frame_pts, frame_dts = frame.pts, frame.dts
+
+    def rough_estimate_frames_ratio(self, upper_bound):
+        analyzed_frames_number, key_frames_number = 0, 0
+        _processing_end = False
+
+        with closing(av.open(self.source_path, mode='r')) as container:
+            video_stream = self._get_video_stream(container)
+            for packet in container.demux(video_stream):
+                for frame in packet.decode():
+                    if frame.key_frame:
+                        key_frames_number += 1
+                    analyzed_frames_number += 1
+                    if upper_bound == analyzed_frames_number:
+                        _processing_end = True
+                        break
+                if _processing_end:
+                    break
+        # check_type_first_frame() has already ensured that the first frame is a key frame,
+        # so at least one key frame is guaranteed and the division below is safe
+        return analyzed_frames_number // key_frames_number
+
+    def validate_frames_ratio(self, chunk_size):
+        upper_bound = 3 * chunk_size
+        ratio = self.rough_estimate_frames_ratio(upper_bound + 1)
+        assert ratio < upper_bound, 'Too few keyframes'
+
+    def get_size(self):
+        return self.frames
+
+    @property
+    def frame_sizes(self):
+        return (self.width, self.height)
+
+    def validate_key_frame(self, container, video_stream, key_frame):
+        for packet in container.demux(video_stream):
+            for frame in packet.decode():
+                if md5_hash(frame) != key_frame[1]['md5'] or frame.pts != key_frame[1]['pts']:
+                    self._key_frames.pop(key_frame[0])
+                return
+
+    def validate_seek_key_frames(self):
+        with closing(av.open(self.source_path, mode='r')) as container:
+            video_stream = self._get_video_stream(container)
+
+            key_frames_copy = self._key_frames.copy()
+
+            for key_frame in key_frames_copy.items():
+                container.seek(offset=key_frame[1]['pts'], stream=video_stream)
+                self.validate_key_frame(container, video_stream, key_frame)
+
+    def save_key_frames(self):
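+        """Collect the pts and md5 checksum of every key frame and count the total number of frames."""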
+        with closing(av.open(self.source_path, mode='r')) as container:
+            video_stream = self._get_video_stream(container)
+            frame_number = 0
+
+            for packet in container.demux(video_stream):
+                for frame in packet.decode():
+                    if frame.key_frame:
+                        self._key_frames[frame_number] = {
+                            'pts': frame.pts,
+                            'md5': md5_hash(frame),
+                        }
+                    frame_number += 1
+            self.frames = frame_number
+
+    @property
+    def key_frames(self):
+        return self._key_frames
+
+    def __len__(self):
+        return len(self._key_frames)
+
+    #TODO: needs to be changed in the future
+    def __iter__(self):
+        for idx, key_frame in self._key_frames.items():
+            yield (idx, key_frame['pts'], key_frame['md5'])
+
+
+class DatasetImagesReader:
+    def __init__(self, sources, is_sorted=True, use_image_hash=False, *args, **kwargs):
+        self._sources = sources if is_sorted else sorted(sources)
+        self._content = []
+        self._data_dir = kwargs.get('data_dir', None)
+        self._use_image_hash = use_image_hash
+
+    def __iter__(self):
+        for image in self._sources:
+            img = Image.open(image, mode='r')
+            img_name = os.path.relpath(image, self._data_dir) if self._data_dir \
+                else os.path.basename(image)
+            name, extension = os.path.splitext(img_name)
+            image_properties = {
+                'name': name,
+                'extension': extension,
+                'width': img.width,
+                'height': img.height,
+            }
+            if self._use_image_hash:
+                image_properties['checksum'] = md5_hash(img)
+            yield image_properties
+
+    def create(self):
+        for item in self:
+            self._content.append(item)
+
+    @property
+    def content(self):
+        return self._content
+
+class _Manifest:
+    FILE_NAME = 'manifest.jsonl'
+    VERSION = '1.0'
+
+    def __init__(self, path, is_created=False):
+        assert path, 'No path to the manifest file was provided'
+        self._path = os.path.join(path, self.FILE_NAME) if os.path.isdir(path) else path
+        self._is_created = is_created
+
+    @property
+    def path(self):
+        return self._path
+
+    @property
+    def is_created(self):
+        return self._is_created
+
+    @is_created.setter
+    def is_created(self, value):
+        assert isinstance(value, bool)
+        self._is_created = value
+
+# Needed for faster iteration over the manifest file. The index is generated when the manifest
+# is used inside CVAT and is not generated when a manifest is created manually.
+class _Index:
+    FILE_NAME = 'index.json'
+
+    def __init__(self, path):
+        assert path and os.path.isdir(path), 'The index directory path does not exist'
+        self._path = os.path.join(path, self.FILE_NAME)
+        self._index = {}
+
+    @property
+    def path(self):
+        return self._path
+
+    def dump(self):
+        with open(self._path, 'w') as index_file:
+            json.dump(self._index, index_file, separators=(',', ':'))
+
+    def load(self):
+        with open(self._path, 'r') as index_file:
+            self._index = json.load(index_file,
+                object_hook=lambda d: {int(k): v for k, v in d.items()})
+
+    def create(self, manifest, skip):
+        assert os.path.exists(manifest), 'The manifest file does not exist, the index cannot be created'
+        with open(manifest, 'r') as manifest_file:
+            while skip:
+                manifest_file.readline()
+                skip -= 1
+            image_number = 0
+            position = manifest_file.tell()
+            line = manifest_file.readline()
+            while line:
+                if line.strip():
+                    self._index[image_number] = position
+                    image_number += 1
+                position = manifest_file.tell()
+                line = manifest_file.readline()
+
+    def partial_update(self, manifest, number):
+        assert os.path.exists(manifest), 'The manifest file does not exist, the index cannot be updated'
+        with open(manifest, 'r') as manifest_file:
+            manifest_file.seek(self._index[number])
+            position = manifest_file.tell()
+            line = manifest_file.readline()
+            while line:
+                if line.strip():
+                    self._index[number] = position
+                    number += 1
+                position = manifest_file.tell()
+                line = manifest_file.readline()
+
+    def __getitem__(self, number):
+        assert 0 <= number < len(self), \
+            'An invalid index number: {}\nMax: {}'.format(number, len(self))
+        return self._index[number]
+
+    def __len__(self):
+        return len(self._index)
+
+class _ManifestManager(ABC):
+    BASE_INFORMATION = {
+        'version': 1,
+        'type': 2,
+    }
+    def __init__(self, path, *args, **kwargs):
+        self._manifest = _Manifest(path)
+
+    def _parse_line(self, line):
+        """ Get an arbitrary line from the manifest file """
+        with open(self._manifest.path, 'r') as manifest_file:
+            if isinstance(line, str):
+                assert line in self.BASE_INFORMATION.keys(), \
+                    'An attempt to get non-existent information from the manifest'
+                for _ in range(self.BASE_INFORMATION[line]):
+                    fline = manifest_file.readline()
+                return json.loads(fline)[line]
+            else:
+                assert self._index, 'No prepared index'
+                offset = self._index[line]
+                manifest_file.seek(offset)
+                properties = manifest_file.readline()
+                return json.loads(properties)
+
+    def init_index(self):
+        self._index = _Index(os.path.dirname(self._manifest.path))
+        if os.path.exists(self._index.path):
+            self._index.load()
+        else:
+            self._index.create(self._manifest.path, 3 if self._manifest.TYPE == 'video' else 2)
+            self._index.dump()
+
+    @abstractmethod
+    def create(self, content, **kwargs):
+        pass
+
+    @abstractmethod
+    def partial_update(self, number, properties):
+        pass
+
+    def __iter__(self):
+        with open(self._manifest.path, 'r') as manifest_file:
+            manifest_file.seek(self._index[0])
+            image_number = 0
+            line = manifest_file.readline()
+            while line:
+                if line.strip():
+                    yield (image_number, json.loads(line))
+                    image_number += 1
+                line = manifest_file.readline()
+
+    @property
+    def manifest(self):
+        return self._manifest
+
+    def __len__(self):
+        if hasattr(self, '_index'):
+            return len(self._index)
+        else:
+            return None
+
+    def __getitem__(self, item):
+        return self._parse_line(item)
+
+    @property
+    def index(self):
+        return self._index
+
+class VideoManifestManager(_ManifestManager):
+    def __init__(self, manifest_path, *args, **kwargs):
+        super().__init__(manifest_path)
+        setattr(self._manifest, 'TYPE', 'video')
+        self.BASE_INFORMATION['properties'] = 3
+
+    def create(self, content, **kwargs):
+        """ Create and save a manifest file """
+        with open(self._manifest.path, 'w') as manifest_file:
+            base_info = {
+                'version': self._manifest.VERSION,
+                'type': self._manifest.TYPE,
+                'properties': {
+                    'name': os.path.basename(content.source_path),
+                    'resolution': content.frame_sizes,
+                    'length': content.get_size(),
+                },
+            }
+            for key, value in base_info.items():
+                json_item = json.dumps({key: value}, separators=(',', ':'))
+                manifest_file.write(f'{json_item}\n')
+
+            for item in content:
+                json_item = json.dumps({
+                    'number': item[0],
+                    'pts': item[1],
+                    'checksum': item[2]
+                }, separators=(',', ':'))
+                manifest_file.write(f"{json_item}\n")
+        self._manifest.is_created = True
+
+    def partial_update(self, number, properties):
+        pass
+
+    @staticmethod
+    def prepare_meta(media_file, upload_dir=None, chunk_size=36, force=False):
+        source_path = os.path.join(upload_dir, media_file) if upload_dir else media_file
+        meta_info = VideoStreamReader(source_path=source_path)
+        meta_info.check_type_first_frame()
+        try:
+            meta_info.validate_frames_ratio(chunk_size)
+        except AssertionError:
+            if not force:
+                raise
+        meta_info.check_video_timestamps_sequences()
+        meta_info.save_key_frames()
+        meta_info.validate_seek_key_frames()
+        return meta_info
+
+#TODO: add generic validation of the manifest file structure
+#TODO: add generic manifest structure file validation
+class ManifestValidator:
+    def validate_base_info(self):
+        with open(self._manifest.path, 'r') as manifest_file:
+            assert self._manifest.VERSION == json.loads(manifest_file.readline())['version']
+            assert self._manifest.TYPE == json.loads(manifest_file.readline())['type']
+
+class VideoManifestValidator(VideoManifestManager):
+    def __init__(self, **kwargs):
+        self.source_path = kwargs.pop('source_path')
+        super().__init__(**kwargs)
+
+    def validate_key_frame(self, container, video_stream, key_frame):
+        for packet in container.demux(video_stream):
+            for frame in packet.decode():
+                assert frame.pts == key_frame['pts'], "The uploaded manifest does not match the video"
+                return
+
+    def validate_seek_key_frames(self):
+        with closing(av.open(self.source_path, mode='r')) as container:
+            video_stream = self._get_video_stream(container)
+            last_key_frame = None
+
+            for _, key_frame in self:
+                # check that the key frame sequence is sorted by frame number
+                if last_key_frame and last_key_frame['number'] >= key_frame['number']:
+                    raise AssertionError('Invalid saved key frames sequence in manifest file')
+                container.seek(offset=key_frame['pts'], stream=video_stream)
+                self.validate_key_frame(container, video_stream, key_frame)
+                last_key_frame = key_frame
+
+    def validate_frame_numbers(self):
+        with closing(av.open(self.source_path, mode='r')) as container:
+            video_stream = self._get_video_stream(container)
+            # not all videos contain information about the number of frames
+            frames = video_stream.frames
+            if frames:
+                assert frames == self['properties']['length'], "The uploaded manifest does not match the video"
+                return
+
+class ImageManifestManager(_ManifestManager):
+    def __init__(self, manifest_path):
+        super().__init__(manifest_path)
+        setattr(self._manifest, 'TYPE', 'images')
+
+    def create(self, content, **kwargs):
+        """ Creating and saving a manifest file """
+        with open(self._manifest.path, 'w') as manifest_file:
+            base_info = {
+                'version': self._manifest.VERSION,
+                'type': self._manifest.TYPE,
+            }
+            for key, value in base_info.items():
+                json_item = json.dumps({key: value}, separators=(',', ':'))
+                manifest_file.write(f'{json_item}\n')
+
+            for item in content:
+                json_item = json.dumps({
+                    key: value for key, value in item.items()
+                }, separators=(',', ':'))
+                manifest_file.write(f"{json_item}\n")
+        self._manifest.is_created = True
+
+    def partial_update(self, number, properties):
+        pass
+
+    @staticmethod
+    def prepare_meta(sources, **kwargs):
+        meta_info = DatasetImagesReader(sources=sources, **kwargs)
+        meta_info.create()
+        return meta_info
\ No newline at end of file
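The image manager follows the same two-phase layout as the video one: `create()` first writes the base information, one JSON object per line, and then one line per item, which is exactly what `_Index` later seeks into. A minimal sketch of the image workflow, mirroring how the `create.py` script below drives it (paths are hypothetical; the per-image fields come from `DatasetImagesReader`, which is defined elsewhere):

```python
# A sketch only: paths are hypothetical and the import assumes utils/ is on sys.path.
from dataset_manifest.core import ImageManifestManager

sources = ['/data/images/img_0001.jpg', '/data/images/img_0002.jpg']
manifest = ImageManifestManager(manifest_path='/tmp/manifests')
meta_info = manifest.prepare_meta(sources=sources, is_sorted=False,
                                  use_image_hash=True, data_dir='/data/images')
manifest.create(meta_info)

# The resulting manifest.jsonl begins with
#   {"version":...}
#   {"type":"images"}
# followed by one JSON object per image with the fields produced by DatasetImagesReader.
```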
diff --git a/utils/dataset_manifest/create.py b/utils/dataset_manifest/create.py
new file mode 100644
index 000000000000..680052f0cf8a
--- /dev/null
+++ b/utils/dataset_manifest/create.py
@@ -0,0 +1,91 @@
+# Copyright (C) 2021 Intel Corporation
+#
+# SPDX-License-Identifier: MIT
+import argparse
+import mimetypes
+import os
+import sys
+from glob import glob
+
+def _define_data_type(media):
+    media_type, _ = mimetypes.guess_type(media)
+    if media_type:
+        return media_type.split('/')[0]
+
+def _is_video(media_file):
+    return _define_data_type(media_file) == 'video'
+
+def _is_image(media_file):
+    return _define_data_type(media_file) == 'image'
+
+def get_args():
+    parser = argparse.ArgumentParser()
+    parser.add_argument('--force', action='store_true',
+        help='Use this flag to prepare the manifest file for video data '
+             'if by default the video does not meet the requirements and a manifest file is not prepared')
+    parser.add_argument('--output-dir', type=str,
+        help='Directory where the manifest file will be saved',
+        default=os.getcwd())
+    parser.add_argument('source', type=str, help='A source path: a video file, a directory with images or a glob pattern')
+    return parser.parse_args()
+
+def main():
+    args = get_args()
+
+    manifest_directory = os.path.abspath(args.output_dir)
+    os.makedirs(manifest_directory, exist_ok=True)
+    source = os.path.abspath(args.source)
+
+    sources = []
+    if not os.path.isfile(source): # directory/pattern with images
+        data_dir = None
+        if os.path.isdir(source):
+            data_dir = source
+            for root, _, files in os.walk(source):
+                sources.extend([os.path.join(root, f) for f in files if _is_image(f)])
+        else:
+            items = source.lstrip('/').split('/')
+            position = 0
+            try:
+                for item in items:
+                    if set(item) & {'*', '?', '[', ']'}:
+                        break
+                    position += 1
+                else:
+                    raise Exception('Wrong positional argument: a video file, a directory or a pattern was expected')
+                assert position != 0, 'Wrong pattern: there must be a common root'
+                data_dir = source.split(items[position])[0]
+            except Exception as ex:
+                sys.exit(str(ex))
+            sources = list(filter(_is_image, glob(source, recursive=True)))
+        try:
+            assert len(sources), 'No images were found'
+            manifest = ImageManifestManager(manifest_path=manifest_directory)
+            meta_info = manifest.prepare_meta(sources=sources, is_sorted=False,
+                use_image_hash=True, data_dir=data_dir)
+            manifest.create(meta_info)
+        except Exception as ex:
+            sys.exit(str(ex))
+    else: # video
+        try:
+            assert _is_video(source), 'You can specify a video path or a directory/pattern with images'
+            manifest = VideoManifestManager(manifest_path=manifest_directory)
+            try:
+                meta_info = manifest.prepare_meta(media_file=source, force=args.force)
+            except AssertionError as ex:
+                if str(ex) == 'Too few keyframes':
+                    msg = 'NOTE: prepared manifest file contains too few key frames for smooth decoding.\n' \
+                          'Use --force flag if you still want to prepare a manifest file.'
+                    print(msg)
+                    sys.exit(2)
+                else:
+                    raise
+            manifest.create(meta_info)
+        except Exception as ex:
+            sys.exit(str(ex))
+
+    print('The manifest file has been prepared')
+if __name__ == "__main__":
+    base_dir = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
+    sys.path.append(base_dir)
+    from dataset_manifest.core import VideoManifestManager, ImageManifestManager
+    main()
\ No newline at end of file
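The least obvious part of `main()` is how a common root (`data_dir`) is carved out of a glob pattern, e.g. when invoked as `python create.py --output-dir ~/manifests '/data/dataset/**/*.jpg'`: the loop walks the path components up to the first one containing a wildcard, and everything before it becomes the root against which image names are later stored. A condensed, stand-alone illustration of the same logic (the pattern is hypothetical):

```python
# Stand-alone illustration of the data_dir derivation in main() above.
source = '/data/dataset/**/*.jpg'
items = source.lstrip('/').split('/')          # ['data', 'dataset', '**', '*.jpg']
# index of the first path component that contains a glob wildcard
position = next(i for i, item in enumerate(items) if set(item) & {'*', '?', '[', ']'})
assert position != 0, 'Wrong pattern: there must be a common root'
data_dir = source.split(items[position])[0]    # everything before the wildcard
print(data_dir)                                # '/data/dataset/'
```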
diff --git a/utils/dataset_manifest/requirements.txt b/utils/dataset_manifest/requirements.txt
new file mode 100644
index 000000000000..1089a5f0a331
--- /dev/null
+++ b/utils/dataset_manifest/requirements.txt
@@ -0,0 +1,3 @@
+av==8.0.2 --no-binary=av
+opencv-python-headless==4.4.0.42
+Pillow==7.2.0
\ No newline at end of file
diff --git a/utils/dataset_manifest/utils.py b/utils/dataset_manifest/utils.py
new file mode 100644
index 000000000000..c5e9feeac1d1
--- /dev/null
+++ b/utils/dataset_manifest/utils.py
@@ -0,0 +1,24 @@
+# Copyright (C) 2021 Intel Corporation
+#
+# SPDX-License-Identifier: MIT
+import hashlib
+import cv2 as cv
+from av import VideoFrame
+
+def rotate_image(image, angle):
+    height, width = image.shape[:2]
+    image_center = (width / 2, height / 2)
+    matrix = cv.getRotationMatrix2D(image_center, angle, 1.)
+    abs_cos = abs(matrix[0, 0])
+    abs_sin = abs(matrix[0, 1])
+    bound_w = int(height * abs_sin + width * abs_cos)
+    bound_h = int(height * abs_cos + width * abs_sin)
+    matrix[0, 2] += bound_w / 2 - image_center[0]
+    matrix[1, 2] += bound_h / 2 - image_center[1]
+    rotated = cv.warpAffine(image, matrix, (bound_w, bound_h))
+    return rotated
+
+def md5_hash(frame):
+    if isinstance(frame, VideoFrame):
+        frame = frame.to_image()
+    return hashlib.md5(frame.tobytes()).hexdigest() # nosec
\ No newline at end of file
diff --git a/utils/prepare_meta_information/README.md b/utils/prepare_meta_information/README.md
deleted file mode 100644
index 67f6fff7146f..000000000000
--- a/utils/prepare_meta_information/README.md
+++ /dev/null
@@ -1,30 +0,0 @@
-# Simple command line for prepare meta information for video data
-
-**Usage**
-
-```bash
-usage: prepare.py [-h] [-chunk_size CHUNK_SIZE] video_file meta_directory
-
-positional arguments:
-  video_file            Path to video file
-  meta_directory        Directory where the file with meta information will be saved
-
-optional arguments:
-  -h, --help            show this help message and exit
-  -chunk_size CHUNK_SIZE
-                        Chunk size that will be specified when creating the task with specified video and generated meta information
-```
-
-**NOTE**: For smooth video decoding, the `chunk size` must be greater than or equal to the ratio of number of frames
-to a number of key frames.
-You can understand the approximate `chunk size` by preparing and looking at the file with meta information.
-
-**NOTE**: If ratio of number of frames to number of key frames is small compared to the `chunk size`,
-then when creating a task with prepared meta information, you should expect that the waiting time for some chunks
-will be longer than the waiting time for other chunks. (At the first iteration, when there is no chunk in the cache)
-
-**Examples**
-
-```bash
-python prepare.py ~/Documents/some_video.mp4 ~/Documents
-```
diff --git a/utils/prepare_meta_information/prepare.py b/utils/prepare_meta_information/prepare.py
deleted file mode 100644
index 0cd200a0c866..000000000000
--- a/utils/prepare_meta_information/prepare.py
+++ /dev/null
@@ -1,37 +0,0 @@
-# Copyright (C) 2020 Intel Corporation
-#
-# SPDX-License-Identifier: MIT
-import argparse
-import sys
-import os
-
-def get_args():
-    parser = argparse.ArgumentParser()
-    parser.add_argument('video_file',
-                        type=str,
-                        help='Path to video file')
-    parser.add_argument('meta_directory',
-                        type=str,
-                        help='Directory where the file with meta information will be saved')
-    parser.add_argument('-chunk_size',
-                        type=int,
-                        help='Chunk size that will be specified when creating the task with specified video and generated meta information')
-
-    return parser.parse_args()
-
-def main():
-    args = get_args()
-    try:
-        smooth_decoding = prepare_meta_for_upload(prepare_meta, args.video_file, None, args.meta_directory, args.chunk_size)
-        print('Meta information for video has been prepared')
-
-        if smooth_decoding != None and not smooth_decoding:
-            print('NOTE: prepared meta information contains too few key frames for smooth decoding.')
-    except Exception:
-        print('Impossible to prepare meta information')
-
-if __name__ == "__main__":
-    base_dir = os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
-    sys.path.append(base_dir)
-    from cvat.apps.engine.prepare import prepare_meta, prepare_meta_for_upload
-    main()
\ No newline at end of file
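The deleted `prepare_meta_information` tool is fully superseded by the files above. To close, a quick sanity check of the two helpers in `utils/dataset_manifest/utils.py`: a sketch only, assuming numpy (pulled in by opencv-python-headless) and the pinned Pillow from requirements.txt, with a synthetic input image and the same `sys.path` arrangement as `create.py`:

```python
# A sketch only: the image is synthetic; real callers pass decoded video/image frames.
import numpy as np
from PIL import Image
from dataset_manifest.utils import md5_hash, rotate_image

image = np.zeros((480, 640, 3), dtype=np.uint8)  # height x width x channels
rotated = rotate_image(image, 90)
print(rotated.shape)  # (640, 480, 3): the bounds swap for a 90-degree rotation

# md5_hash accepts a PIL image directly (an av.VideoFrame is converted first)
print(md5_hash(Image.fromarray(image)))
```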