[Feature Request] Cloud manifest improvements [Duplicate] #4951

Closed
2 tasks done
ehofesmann opened this issue Sep 14, 2022 · 5 comments

Comments

@ehofesmann
Contributor

My actions before raising this issue

The current way of generating cloud manifests for connecting to cloud storage is difficult to use in practice, because the data has to be pulled locally before a manifest can be generated.

Current Behaviour

One of the appeals of being able to use cloud-backed media is that you can create a task in CVAT that directly points to the media in a cloud bucket without ever needing to worry about downloading the media locally. The current manifest approach requiring you to download media locally to generate a manifest removes this benefit.

Expected Behaviour/Possible Solution

  1. Create a standalone pip package allowing you to generate cloud manifests. This would at least allow other packages to generate manifests automatically for tasks that are being created in CVAT without needing to clone the repo or access the server to use the create.py file.

  2. Allow cloud manifests to be computed directly from media in the cloud, without needing to pull large datasets locally. This may require some refactoring to remove the need for checksums which have to touch all of the pixels of the media.

  3. (Ideally) Remove the need for cloud manifests entirely. This would require quite a bit of refactoring, but since CVAT is storing the cloud credentials anyway, it is possible to parse a cloud bucket and access media directly when needed rather than requiring the user to perform any pre-processing. This is by far the most user-friendly way to connect cloud media to CVAT.

Context

Many of the users of FiftyOne use the integration it provides with CVAT. FiftyOne provides a way to manage datasets, some of which can be in the cloud. For some users with large amounts of data, there may be thousands of new media samples being generated each day that need to be processed and annotated. They use FiftyOne to explore these datasets and find subsets that need to be sent for annotation (generally through CVAT).

Currently, most users just let FiftyOne download the media for that annotation run locally, then upload that media to CVAT. This is not ideal since it requires both a download and upload of media to two tools (FiftyOne Teams/CVAT) that both support accessing data directly from the cloud. Ideally, it would be possible for users to create a CVAT annotation task for a cloud-backed FiftyOne dataset, then only transfer the paths to the relevant cloud media for that task from FiftyOne to CVAT without ever needing to download the data itself.

FiftyOne Teams does support the current implementation of cloud support in CVAT, but it assumes that the user has already generated a manifest file for the required media. This means that the user has to manually download data they want to annotate and generate the manifest files prior to creating any annotation tasks, which becomes difficult for users that have large amounts of data regularly being added to their cloud buckets.

@ehofesmann
Contributor Author

Ah, just looked again and found this duplicate: #4400

I will close this.

@ehofesmann changed the title from "[Feature Request] Cloud manifest improvements" to "[Feature Request] Cloud manifest improvements [Duplicate]" on Sep 14, 2022
@nmanovic
Contributor

@ehofesmann, in any case it is great to hear your opinion on this. We plan to fix it in the future. Our team has discussed the problem multiple times, and we want to make the manifest optional. To be honest, the biggest problem is getting the height and width (HxW) of each image. We need to get them on the fly.

BTW, one of our engineers wrote you about cvat-sdk. Has somebody from FiftyOne team had a chance to look at it?

@ehofesmann
Contributor Author

@nmanovic Awesome, glad to know that you're thinking the same thing! We also have the same issue of needing to get HxW of cloud-backed media in order to properly visualize samples in the FiftyOne App. It is possible to just stream media from a cloud bucket to get the metadata without needing to download the whole thing.
This may be helpful in that regard to automatically be able to get metadata without the user needing to generate it themselves. We built a way to stream videos as well and access metadata like number of frames.
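
As a rough illustration of the streaming idea above, here is a minimal sketch that reads an image's width and height from only its first 24 bytes. It handles PNG only (other formats such as JPEG need more involved header parsing), and it is not how FiftyOne or CVAT implement this. `fetch_range` is a hypothetical callable standing in for whatever ranged read the storage client offers, for example an HTTP `Range` request against the bucket.

```python
import struct
from typing import Callable, Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size_from_header(header: bytes) -> Tuple[int, int]:
    """Return (width, height) parsed from the first 24 bytes of a PNG file."""
    if len(header) < 24 or header[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG header")
    # Bytes 8-11 hold the chunk length and bytes 12-15 the chunk type,
    # which must be IHDR for the first chunk of a valid PNG.
    if header[12:16] != b"IHDR":
        raise ValueError("IHDR chunk missing")
    # Width and height are big-endian unsigned 32-bit integers.
    width, height = struct.unpack(">II", header[16:24])
    return width, height


def remote_png_size(
    fetch_range: Callable[[str, int, int], bytes], key: str
) -> Tuple[int, int]:
    """Read only the leading bytes of a remote object to get its size.

    fetch_range(key, start, end) is a hypothetical helper (an assumption,
    not a real API) that returns bytes start..end inclusive of the object.
    """
    return png_size_from_header(fetch_range(key, 0, 23))
```

With a helper like this, only 24 bytes per PNG would be transferred instead of the whole file, which is the property that matters for large buckets.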

I was actually opening these issues to respond to the engineer that reached out, haha. I am really excited to potentially switch over to the CVAT sdk, but it will take a pretty big rewrite of the integration on the FiftyOne side.

My main question is whether the SDK is backwards compatible with CVATv1 servers. We still have a number of users that haven't upgraded to CVATv2.

@nmanovic
Contributor

@ehofesmann, it isn't. We will need to move all these users to the new version of CVAT. We don't support old versions for now, and they may have some security problems, so it is better to upgrade.

