Support ownCloud as cloud storage provider #6028

bruluk5w · 2023-04-14T13:01:04Z

My actions before raising this issue

[y] Read/searched the docs
[y] Searched past issues

We are annotating a dataset that contains several potentially large videos. Task creation takes very long for large videos (>5GB). I saw that CVAT uses manifest files, which can be generated in advance, and thought that generating them would solve my problem. However, the dataset currently lives on a shared volume, and it does not seem possible to select a manifest file when creating a task. I had a look at the code, and indeed the UI only allows manifest files for image datasets (is this intended, or should this also work for videos?).
Reading further through the documentation made me realize that the intended way is apparently to keep the manifest files and videos on cloud storage, which would also be beneficial for us.
However, we use an internal ownCloud instance.

If using cloud storage is the intended way to benefit from the optimization that manifest files provide for videos, would it make sense to implement support for ownCloud as a cloud storage provider?

I am by no means good at frontend work, but other cloud providers are already implemented, and I may have some time on my hands to do it as part of a university project.

Thank you and looking forward to your feedback.

Marishka17 · 2023-05-08T12:49:25Z

@bruluk5w, hello,

It is possible to use a manifest file for a video from the connected share.
However, this does not suit every video, only those that have an appropriate number of I-frames.
You can generate the manifest with this command:

docker run -it --rm -u "$(id -u)":"$(id -g)" \
  -v ~/Documents/data/:${HOME}/dir_with_video/:rw \
  --entrypoint python3 \
  cvat/server \
  utils/dataset_manifest/create.py --output-dir ~/dir_with_video/ ~/dir_with_video/video.mp4

You can use the --force option if the video doesn't contain enough key frames, but the result may not work optimally.
Please see the documentation about manifest here.
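
Before relying on a manifest, it can help to check how many key-frame entries it actually contains. The following is a minimal sketch, not part of CVAT: `summarize_manifest` is a hypothetical helper, and it assumes that each frame entry in `manifest.jsonl` is a JSON object with a `number` key holding the frame index, while header lines do not have that key. Check a manifest generated by your CVAT version before trusting this assumption.

```python
import json
from pathlib import Path


def summarize_manifest(manifest_path):
    """Count frame entries in a JSON Lines manifest and find the largest gap.

    Assumption (not confirmed in this thread): every frame entry is a JSON
    object with a "number" key; header lines lack that key and are skipped.
    """
    frame_numbers = []
    with Path(manifest_path).open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            record = json.loads(line)
            if isinstance(record, dict) and "number" in record:
                frame_numbers.append(record["number"])

    # Distance in frames between consecutive listed entries.
    gaps = [b - a for a, b in zip(frame_numbers, frame_numbers[1:])]
    return {
        "entries": len(frame_numbers),
        "largest_gap": max(gaps) if gaps else 0,
    }
```

A small number of entries or a very large gap between entries would suggest the video is one of those that need `--force`.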

Once you've created the manifest file for your video, you'll need to move it to the share.
Then you can create a task, e.g. using the CVAT SDK, this way:

from cvat_sdk import make_client
from cvat_sdk.core.proxies.tasks import ResourceType

with make_client(host="localhost", port=8080, credentials=('user', 'password')) as client:
    task_spec = {
        "name": "example task",
        "labels": [
            {
                "name": "car",
            }
        ],
    }

    task = client.tasks.create_from_data(
        spec=task_spec,
        resource_type=ResourceType.SHARE,
        resources=['video.mp4', 'manifest.jsonl'],
    )
    print(task)

Currently you cannot create such a task in the CVAT UI; it's a bug and will be fixed in PR.
Please, see the SDK documentation here.

bruluk5w · 2023-05-19T09:55:44Z

Hi,
Thank you for the hint!
I tried creating the tasks with the SDK.
However, when I create a task, my RAM consumption explodes and I run out of memory (my machine has 32GB).
I created the tasks roughly like this:

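from os import path, walk
from pathlib import Path

import cvat_sdk
from cvat_sdk.core.proxies.tasks import ResourceType

# get_project_id() is the poster's own helper and is not shown in the thread.
OVERLAP = 1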
SEGMENT_SIZE = 10 * 60 * 20  # 10 minute chunks at 20fps

def main(dataset_directory: Path, url: str, user: str, pwd: str):
    with cvat_sdk.make_client(host=url, credentials=(user, pwd)) as client:
        project_id = get_project_id(client)

        for dirpath, dirnames, filenames in walk(dataset_directory):
            for filename in filenames:
                if path.splitext(filename)[1] == '.mp4':
                    full_path = path.join(dirpath, filename)
                    view_name = path.basename(dirpath)
                    take_name = path.basename(path.dirname(dirpath))

                    task_spec = {
                        'name': 'test{}_{}'.format(take_name, view_name),  # e.g.: 'test1_center'
                        'project_id': project_id,
                        'overlap': OVERLAP,
                        'segment_size': SEGMENT_SIZE + OVERLAP
                    }

                    manifest_file_path = path.join(dirpath, 'manifest.jsonl')
                    if not path.exists(manifest_file_path):
                        print("ERROR: Could not find manifest.jsonl next to media file {}! Skipping...".format(
                            full_path))
                        continue

                    relative_path_media_file = path.relpath(full_path, dataset_directory)
                    relative_path_manifest_file = path.relpath(manifest_file_path, dataset_directory)

                    task = client.tasks.create_from_data(
                        spec=task_spec,
                        resource_type=ResourceType.SHARE,
                        resources=[relative_path_media_file, relative_path_manifest_file],
                        data_params={
                            'image_quality': 50,
                            'chunk_size': SEGMENT_SIZE
                        }
                    )

I tried to create the tasks without compression by setting image_quality to 100, because I thought that the data duplication for the compressed video might be the issue. But I ran into this bug:
#4461
I removed the 'profile' argument to the encoder as suggested, and it did create the task; even the preview image on the task page appears properly.
However, when I open the task (in Chrome), the RAM usage jumps by >16GB and the browser tab crashes.
It happens on postMessage to the H264Decoder here:

The error is:

Uncaught (in promise) DOMException: Failed to execute 'postMessage' on 'Worker': Data cannot be cloned, out of memory.
    at http://localhost:8080/assets/cvat-ui.bc7ca4a9b695ca2ae70e.min.js:2:4673214
    at Array.forEach (<anonymous>)
    at nn.startDecode (http://localhost:8080/assets/cvat-ui.bc7ca4a9b695ca2ae70e.min.js:2:4673199)

I then tried to reduce the number of frames by setting frame_step to 5 and choosing only 1-minute segments (10x fewer frames), but got the same out-of-memory error. Then I tried setting chunk_size to a lower value (20 frames), and the error went away. But loading the task page still takes up roughly 2GB, and the page loads indefinitely...
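
To put the numbers from this comment side by side, here is a small arithmetic sketch. It only counts frames and chunks from the values quoted above (20 fps, 10-minute and 1-minute segments, `frame_step`, `chunk_size`); the function names are hypothetical, and it makes no claim about how CVAT buffers or decodes a chunk internally.

```python
import math

FPS = 20  # frame rate stated in the script above


def frames_in_segment(minutes, fps=FPS, frame_step=1):
    """Frames kept from a segment when only every frame_step-th frame is used."""
    return math.ceil(minutes * 60 * fps / frame_step)


def chunks_per_segment(segment_frames, chunk_size):
    """Number of chunks needed to cover a segment of segment_frames frames."""
    return math.ceil(segment_frames / chunk_size)


if __name__ == "__main__":
    ten_minutes = frames_in_segment(10)
    print("frames in a 10-minute segment:", ten_minutes)
    print("chunks at chunk_size=12000:", chunks_per_segment(ten_minutes, 12000))
    print("chunks at chunk_size=20:", chunks_per_segment(ten_minutes, 20))
    print("frames in 1 minute at frame_step=5:", frames_in_segment(1, frame_step=5))
```

With `chunk_size` equal to `SEGMENT_SIZE`, a whole 10-minute segment of 12000 frames lands in a single chunk, whereas `chunk_size=20` splits the same segment into 600 chunks. That is consistent with the observation that only lowering `chunk_size` made the error disappear.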

I am a little bit at my wit's end and will probably move to another tool to get my job done for now, but if I can help with something to make CVAT work better with longer videos then I would like to help.

### Motivation and context
Resolved #6037
Related #4400
Related #6028
![image](https://user-images.githubusercontent.com/49038720/236890662-c44b578e-5808-4fde-a216-2dcab6e95ab0.png)
Co-authored-by: Boris Sekachev <boris.sekachev@yandex.ru>
Co-authored-by: Boris Sekachev <sekachev.bs@gmail.com>
Co-authored-by: Roman Donchenko <roman@cvat.ai>
Co-authored-by: Nikita Manovich <nikita@cvat.ai>

pktiuk · 2023-07-20T10:52:12Z

Maybe instead of supporting ownCloud specifically, it would be better to add support for a more universal protocol that ownCloud uses, like WebDAV?
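
For context on what a WebDAV-based provider would have to do, listing a folder is a `PROPFIND` request with a `Depth` header, answered with a 207 Multi-Status XML body. The sketch below uses only the Python standard library and is not CVAT code: `build_propfind` and `list_hrefs` are hypothetical helpers, the request is built but never sent, and authentication is left out.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Minimal PROPFIND body asking only for the size of each resource.
PROPFIND_BODY = (
    '<d:propfind xmlns:d="DAV:">'
    "<d:prop><d:getcontentlength/></d:prop>"
    "</d:propfind>"
)


def build_propfind(url, depth="1"):
    """Build (but do not send) a WebDAV PROPFIND request for a collection URL."""
    return urllib.request.Request(
        url,
        data=PROPFIND_BODY.encode("utf-8"),
        method="PROPFIND",
        headers={"Depth": depth, "Content-Type": "application/xml"},
    )


def list_hrefs(multistatus_xml):
    """Return the href of every resource in a Multi-Status response body."""
    root = ET.fromstring(multistatus_xml)
    return [element.text for element in root.iter("{DAV:}href")]
```

A caller would pass the server's WebDAV collection URL (for example a hypothetical `https://cloud.example.org/remote.php/dav/files/alice/videos/`), send the request with credentials, and feed the response body to `list_hrefs` to find the video and its `manifest.jsonl`.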

pktiuk · 2024-03-05T11:07:01Z

closed this as completed

@bsekachev Is it implemented?

Marishka17 mentioned this issue May 8, 2023

Improve task creation with cloud storage and share data #6074

Merged

16 tasks

AnnaPetrovicheva mentioned this issue Oct 27, 2023

[Snyk] Security upgrade axios from 0.27.2 to 1.6.0 #7077

Closed

bsekachev closed this as completed Feb 20, 2024
