Support ownCloud as cloud storage provider #6028

Closed
bruluk5w opened this issue Apr 14, 2023 · 4 comments

@bruluk5w

My actions before raising this issue

We are annotating a dataset with multiple, possibly large, videos. Task creation takes forever for large videos (>5GB). I saw that CVAT uses manifest files, which can be generated in advance, and I thought that generating these would be my solution. However, the dataset is currently on a shared volume, and it does not seem possible to select the manifest file when creating a task. I had a look at the code, and indeed the UI only allows manifest files for image datasets (is this intended, or should this also work for videos?).
Further reading through the documentation made me realize that the intended way is apparently to have the manifest files and videos on some cloud storage, which would also be beneficial for us.
However, we use an internal ownCloud instance.

If using cloud storage is the intended way to benefit from the optimization that manifest files provide for videos, would it make sense to implement support for ownCloud as a cloud storage provider?

I am by no means good at frontend work, but other cloud providers are already implemented, and I may have some time on my hands to do it as part of a university project.

Thank you and looking forward to your feedback.

@Marishka17
Contributor

Marishka17 commented May 8, 2023

@bruluk5w, Hello,

It's possible to use a manifest file for a video stored on the share point.
However, this does not suit all videos, only those that have an appropriate number of I-frames.
You can generate the manifest with this command:

docker run -it --rm -u "$(id -u)":"$(id -g)" \
  -v ~/Documents/data/:${HOME}/dir_with_video/:rw \
  --entrypoint python3 \
  cvat/server \
  utils/dataset_manifest/create.py --output-dir ~/dir_with_video/ ~/dir_with_video/video.mp4

If a video doesn't contain enough key frames, you can use the --force option, but the resulting task may not work optimally.
Please see the documentation about manifests here.
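
To get a feel for whether a video has "an appropriate number of I-frames", one option is to inspect the generated manifest. The following is a minimal sketch, assuming the JSON Lines layout described in the manifest documentation: a few header records (including a `properties` record with a `length` field) followed by one record per key frame with a `number` field. The function name `summarize_manifest` is hypothetical, and the field names should be checked against a manifest you generated yourself.

```python
import json
from pathlib import Path


def summarize_manifest(manifest_path):
    """Count key-frame records in a video manifest and report the largest gap.

    Assumption: header records come first, and every key frame is stored as
    a JSON object that carries a "number" field (its frame index).
    """
    total_frames = None
    key_frames = []
    for line in Path(manifest_path).read_text().splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        record = json.loads(line)
        if "properties" in record:
            total_frames = record["properties"].get("length")
        elif "number" in record:
            key_frames.append(record["number"])

    key_frames.sort()
    # Distance in frames between consecutive key frames.
    gaps = [b - a for a, b in zip(key_frames, key_frames[1:])]
    return {
        "total_frames": total_frames,
        "key_frames": len(key_frames),
        "largest_gap": max(gaps) if gaps else None,
    }
```

A large `largest_gap` relative to the chunk size would suggest that seeking in this video is expensive, which is the situation where `--force` is needed.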

Once you've created the manifest file for your video, you'll need to move it to the share point.
Then you can create a task, e.g. using the CVAT SDK, this way:

from cvat_sdk import make_client
from cvat_sdk.core.proxies.tasks import ResourceType

with make_client(host="localhost", port=8080, credentials=('user', 'password')) as client:
    task_spec = {
        "name": "example task",
        "labels": [
            {
                "name": "car",
            }
        ],
    }

    task = client.tasks.create_from_data(
        spec=task_spec,
        resource_type=ResourceType.SHARE,
        resources=['video.mp4', 'manifest.jsonl'],
    )
    print(task)

Currently you cannot create such a task in the CVAT UI; this is a bug and will be fixed in a PR.
Please, see the SDK documentation here.

@bruluk5w
Author

bruluk5w commented May 19, 2023

Hi,
Thank you for the hint!
I tried creating the tasks with the SDK.
However, when I create a task, my RAM consumption explodes and I run out of memory (my machine has 32GB).
I created the task roughly like this:

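from os import path, walk
from pathlib import Path

import cvat_sdk
from cvat_sdk.core.proxies.tasks import ResourceType

OVERLAP = 1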
SEGMENT_SIZE = 10 * 60 * 20  # 10 minute chunks at 20fps

def main(dataset_directory: Path, url: str, user: str, pwd: str):
    with cvat_sdk.make_client(host=url, credentials=(user, pwd)) as client:
        # get_project_id is a helper defined elsewhere (not shown here).
        project_id = get_project_id(client)

        for dirpath, dirnames, filenames in walk(dataset_directory):
            for filename in filenames:
                if path.splitext(filename)[1] == '.mp4':
                    full_path = path.join(dirpath, filename)
                    view_name = path.basename(dirpath)
                    take_name = path.basename(path.dirname(dirpath))

                    task_spec = {
                        'name': 'test{}_{}'.format(take_name, view_name),  # e.g.: 'test1_center'
                        'project_id': project_id,
                        'overlap': OVERLAP,
                        'segment_size': SEGMENT_SIZE + OVERLAP
                    }

                    manifest_file_path = path.join(dirpath, 'manifest.jsonl')
                    if not path.exists(manifest_file_path):
                        print("ERROR: Could not find manifest.jsonl next to media file {}! Skipping...".format(
                            full_path))
                        continue

                    relative_path_media_file = path.relpath(full_path, dataset_directory)
                    relative_path_manifest_file = path.relpath(manifest_file_path, dataset_directory)

                    task = client.tasks.create_from_data(
                        spec=task_spec,
                        resource_type=ResourceType.SHARE,
                        resources=[relative_path_media_file, relative_path_manifest_file],
                        data_params={
                            'image_quality': 50,
                            'chunk_size': SEGMENT_SIZE
                        }
                    )

I tried to create the tasks without compression by setting image_quality to 100, because I thought that the data duplication for the compressed video might be the issue. But I ran into this bug:
#4461
I removed the 'profile' argument to the encoder as suggested and it indeed created the task and even the preview image on the task page appears properly.
However, when I open the task (in Chrome), the RAM usage jumps by >16GB and the browser tab crashes.
It happens on postMessage to the H264Decoder here:
[Screenshot from 2023-05-19 11-29-44]

The error is:

Uncaught (in promise) DOMException: Failed to execute 'postMessage' on 'Worker': Data cannot be cloned, out of memory.
    at http://localhost:8080/assets/cvat-ui.bc7ca4a9b695ca2ae70e.min.js:2:4673214
    at Array.forEach (<anonymous>)
    at nn.startDecode (http://localhost:8080/assets/cvat-ui.bc7ca4a9b695ca2ae70e.min.js:2:4673199)

I then tried to reduce the number of frames by setting frame_step to 5 and choosing only 1-minute segments (10x fewer frames), but I got the same out-of-memory error. Then I tried setting chunk_size to a lower value (20 frames), and the error went away. But loading the task page still takes up roughly 2GB, and the page keeps loading indefinitely...
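
As a rough illustration of why chunk_size matters here, the following back-of-the-envelope sketch computes how many frames land in one chunk for the settings used in the script above. The helper names and the 1920x1080, 4-bytes-per-pixel figures are hypothetical placeholders, and the raw-size estimate is only an upper bound under the assumption that a whole chunk is held as decoded pixels. It is not a description of how CVAT's decoder actually allocates memory.

```python
import math


def chunk_layout(total_frames, chunk_size, frame_step=1):
    """Return (frames kept after applying frame_step, number of chunks)."""
    frames = math.ceil(total_frames / frame_step)
    return frames, math.ceil(frames / chunk_size)


def raw_chunk_bytes(chunk_size, width, height, bytes_per_pixel=4):
    """Size of one chunk if every frame in it were held as raw pixels."""
    return chunk_size * width * height * bytes_per_pixel


# 10 minutes at 20 fps, as in the script above.
segment_frames = 10 * 60 * 20

# chunk_size equal to the segment size puts all 12000 frames in one chunk.
print(chunk_layout(segment_frames, chunk_size=segment_frames))

# chunk_size=20 splits the same segment into 600 small chunks.
print(chunk_layout(segment_frames, chunk_size=20))

# Hypothetical resolution: compare the raw size of a large and a small chunk.
for size in (segment_frames, 20):
    gib = raw_chunk_bytes(size, 1920, 1080) / 2**30
    print(f"chunk_size={size}: about {gib:.2f} GiB of raw pixels")
```

Under these assumptions, the amount of data handed to the decoder in a single message scales linearly with chunk_size, which is consistent with the error disappearing at chunk_size=20.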

I am a little bit at my wit's end and will probably move to another tool to get my job done for now, but if I can help with something to make CVAT work better with longer videos then I would like to help.

nmanovic added a commit that referenced this issue May 20, 2023
### Motivation and context
Resolved #6037
Related #4400
Related #6028


![image](https://user-images.githubusercontent.com/49038720/236890662-c44b578e-5808-4fde-a216-2dcab6e95ab0.png)

Co-authored-by: Boris Sekachev <boris.sekachev@yandex.ru>
Co-authored-by: Boris Sekachev <sekachev.bs@gmail.com>
Co-authored-by: Roman Donchenko <roman@cvat.ai>
Co-authored-by: Nikita Manovich <nikita@cvat.ai>
@pktiuk
Contributor

pktiuk commented Jul 20, 2023

Maybe instead of supporting ownCloud specifically, it would be better to add support for a more universal protocol that ownCloud uses, such as WebDAV?
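
For context, listing a directory over WebDAV comes down to a PROPFIND request with a Depth header and parsing the 207 Multi-Status XML that comes back. The following is a minimal sketch using only the Python standard library. The function names and the example URL are hypothetical, the request is built but never sent, and nothing here reflects an existing CVAT storage backend.

```python
import urllib.request
import xml.etree.ElementTree as ET

DAV_NS = {"d": "DAV:"}


def build_propfind(url, depth="1"):
    """Build (but do not send) a PROPFIND request for a WebDAV collection."""
    body = (
        '<?xml version="1.0" encoding="utf-8"?>'
        '<d:propfind xmlns:d="DAV:"><d:prop>'
        "<d:getcontentlength/><d:resourcetype/>"
        "</d:prop></d:propfind>"
    ).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PROPFIND",
        headers={"Depth": depth, "Content-Type": "application/xml"},
    )


def parse_multistatus(xml_text):
    """Return the href of every resource in a Multi-Status response body."""
    root = ET.fromstring(xml_text)
    return [
        response.findtext("d:href", namespaces=DAV_NS)
        for response in root.findall("d:response", DAV_NS)
    ]


# Hypothetical server URL, for illustration only.
request = build_propfind("https://cloud.example.org/remote.php/dav/files/user/videos/")
print(request.get_method(), request.full_url)
```

A provider built on this would work with any WebDAV server, not only ownCloud, which is the point of the suggestion above.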

@pktiuk
Contributor

pktiuk commented Mar 5, 2024

closed this as completed

@bsekachev Is it implemented?
