[FR] Avoid uploading data if samples exist in CVAT connected fileshare #1235

Open
ehofesmann opened this issue Aug 30, 2021 · 2 comments
Labels
enhancement Code enhancement

Comments

@ehofesmann
Member

CVAT allows files to be accessed in three ways:

  1. Uploading local files (what is currently done)
  2. Providing remote URLs
  3. Accessing them directly through a mounted file share


If the data that is being uploaded exists in a file share connected to CVAT, then it would be preferable to not upload the data to the server. This is especially important in cases where a large number of images or videos are being annotated at one time.

Adding this should be fairly simple. It would require updating the following code to allow for shared files:

files = {}
for idx, path in enumerate(paths):
    # IMPORTANT: CVAT organizes media within a task alphabetically by
    # filename, so we must give CVAT filenames whose alphabetical order
    # matches the order of `paths`
    filename = "%06d%s" % (idx, os.path.splitext(path)[1])
    files["client_files[%d]" % idx] = (filename, open(path, "rb"))

For example, it could become:

files = {}
for idx, path in enumerate(paths):
    # IMPORTANT: CVAT organizes media within a task alphabetically by
    # filename, so we must give CVAT filenames whose alphabetical order
    # matches the order of `paths`
    filename = "%06d%s" % (idx, os.path.splitext(path)[1])
    if use_fileshare:
        data["server_files[%d]" % idx] = (filename, path)
    else:
        files["client_files[%d]" % idx] = (filename, open(path, "rb"))

To use the correct path, it would be straightforward to follow the Alternate media workflow and store each sample's file share path in a field on the FiftyOne dataset.
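As a rough illustration of how such a per-sample path could be derived, here is a minimal sketch. It assumes the file share is also mounted locally and that CVAT wants paths relative to the share root; `to_share_path` and the `/mnt/share` mount point are hypothetical names, not part of FiftyOne or CVAT.

```python
from pathlib import PurePosixPath


def to_share_path(local_path, mount_root):
    """Return `local_path` relative to `mount_root`, with forward slashes.

    Hypothetical helper: raises ValueError if `local_path` is not located
    inside `mount_root`.
    """
    return PurePosixPath(local_path).relative_to(mount_root).as_posix()


# Assumed layout: the share is mounted locally at /mnt/share
local_paths = ["/mnt/share/images/a.jpg", "/mnt/share/images/b.jpg"]
share_paths = [to_share_path(p, "/mnt/share") for p in local_paths]
```

The resulting strings could then be stored in a sample field of your choosing and read back when building the upload request.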

Other points to consider are the copy_data and use_cache options, which will likely need to be incorporated to avoid copying data even for media in the file share (see cvat-ai/cvat#3544).
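To make the idea concrete, here is a sketch of how the two cases might be separated while carrying the copy_data and use_cache options along. `build_upload_payload` and its parameters are hypothetical. It also assumes that `server_files` entries are plain path strings, and that setting `copy_data` to false and `use_cache` to true is what avoids the copy. Neither assumption has been verified against a particular CVAT version.

```python
import os


def build_upload_payload(paths, share_paths=None, copy_data=False, use_cache=True):
    """Build the (data, files) dicts for a task data request.

    Hypothetical helper. If `share_paths` is given, nothing is uploaded and
    the media are referenced by their file share paths instead.
    """
    data = {}
    files = {}

    if share_paths is not None:
        if len(share_paths) != len(paths):
            raise ValueError("Expected one file share path per sample")

        for idx, share_path in enumerate(share_paths):
            # Assumption: the server expects a path string here
            data["server_files[%d]" % idx] = share_path

        # Assumption: these flags keep the server from copying the media
        data["copy_data"] = copy_data
        data["use_cache"] = use_cache
    else:
        for idx, path in enumerate(paths):
            # Zero-padded names keep alphabetical order equal to `paths` order
            filename = "%06d%s" % (idx, os.path.splitext(path)[1])
            # The caller is responsible for closing these file handles
            files["client_files[%d]" % idx] = (filename, open(path, "rb"))

    return data, files


data, files = build_upload_payload(
    ["/mnt/share/images/a.jpg"], share_paths=["images/a.jpg"]
)
```

Note that the filename renaming trick only applies to uploaded files. Media referenced on the share keep their own names, so their order within the task would need to be handled separately.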

@Huy2122k

Huy2122k commented Sep 10, 2022

@ehofesmann
I think CVAT added a new sorting feature for tasks in PR cvat-ai/cvat#3937.

We can use the Predefined sorting method with server_files, storage = "share", and storage_method = "cache" to avoid uploading or copying any files.
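A minimal sketch of what those request fields might look like, using the values named above. `share_task_data` is a hypothetical helper, and whether a given CVAT version accepts all of these fields in this form is an assumption.

```python
def share_task_data(share_paths):
    """Build request fields for media that already live on the file share.

    Hypothetical helper; field names and values follow the comment above.
    """
    data = {
        "sorting_method": "predefined",  # keep the order of `share_paths`
        "storage": "share",
        "storage_method": "cache",
    }

    for idx, share_path in enumerate(share_paths):
        data["server_files[%d]" % idx] = share_path

    return data


fields = share_task_data(["images/000001.jpg", "images/000002.jpg"])
```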

But I found a (possible) bug when using the Predefined sorting method: the order of the files in the created task was reversed, which is confusing.

Finally

We can simply modify the function upload_data in utils/cvat.py like this:
[image]

where shared_path is the path to the shared folder containing the images.

It works for me!

Hope it helps anyone who has problems uploading large quantities of images.

@thiagoribeirodamotta

thiagoribeirodamotta commented Jul 4, 2024

Was this ever integrated into the main branch?
