Import dataset from FiftyOne #1344

Closed
mmoollllee opened this issue Mar 14, 2024 · 2 comments
@mmoollllee

Is your feature request related to a problem? Please describe.

This is my first time using Datumaro. My goal is to tile my dataset, but I'm already failing at importing it. So far I have exported it from FiftyOne to COCO and CVAT Images. Both import attempts in Datumaro Python failed with errors like these.
What is the best workflow for transferring a dataset from Voxel51 FiftyOne to Datumaro?

Dataset.import_from(args.path, "cvat")
# datumaro.components.errors.DatasetImportError: Failed to import dataset 'cvat' at '/Users/user/Desktop/dataset-folder'.
Dataset.import_from(args.path, "coco_instances")
# datumaro.components.errors.DatasetImportError: Failed to import dataset 'coco_instances' at '/Users/user/Desktop/dataset-folder'.

Describe the solution you'd like

Best case
Provide an API to download data directly from FiftyOne.

Or
An example of how to successfully import a dataset after exporting it from FiftyOne.

Additional context

I've tried the COCO, VOC, CVAT, and YOLO formats. None worked, but each gave different output with datum dinfo:

# YOLO
length: 157
categories: 
subsets: default
  'default':
    length: 157
    categories: 
# CVAT
Failed to import dataset 'cvat' at '/Users/user/Desktop/dataset-folder'.
# VOC
length: 157
categories: 
subsets: default
  'default':
    length: 157
    categories: 

As it doesn't print any categories, I checked the dataset.yaml in the YOLO dataset:

names:
  0: van
  1: person
  ...
path: /Users/user/Desktop/dataset-folder
val: ./images/val/

After changing the folder structure and renaming some files, I get these errors with datum dinfo and the COCO format.

2024-03-14 10:31:50,910 WARNING: Category id of '0' is reserved for no class (background) but category named 'bike' with id of '0' is found in /Users/user/Desktop/dataset-folder/annotations/instances_default.json. Please be warned that annotations with category id of '0' would have `None` as label. (https://openvinotoolkit.github.io/datumaro/latest/docs/explanation/formats/coco.html#import-coco-dataset)
2024-03-14 10:31:50,911 ERROR: Failed to parse revspec:
  Can't find project at 'Failed to find project at '/Users/user/Desktop/dataset-folder. Specify project path with '-p/--project' or in the target pathspec.'
  Failed to import dataset 'coco' at '/Users/user/Desktop/dataset-folder'.
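
The warning above says that category id 0 is reserved for the background class, so annotations that use it would end up with `None` as their label. One possible way to avoid that collision is to renumber the categories before importing. The following is a minimal sketch, not part of Datumaro or FiftyOne; the function name `shift_category_ids` and the file paths are hypothetical, and it assumes a standard COCO JSON with `categories[].id` and `annotations[].category_id`.

```python
import json
from pathlib import Path


def shift_category_ids(src_json, dst_json, offset=1):
    """Write a copy of a COCO file whose category ids no longer include 0.

    Hypothetical helper: ids are shifted by `offset` only when some
    category currently uses id 0; otherwise the content is copied as-is.
    """
    coco = json.loads(Path(src_json).read_text())
    categories = coco.get("categories", [])

    # Only renumber when the reserved id 0 is actually in use.
    if any(cat["id"] == 0 for cat in categories):
        for cat in categories:
            cat["id"] += offset
        for ann in coco.get("annotations", []):
            ann["category_id"] += offset

    Path(dst_json).write_text(json.dumps(coco))
    return coco
```

It could be called as `shift_category_ids("annotations/instances_default.json", "annotations/instances_default_fixed.json")`, where both paths are placeholders. Whether this alone is enough for the import to succeed is not confirmed here.
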
@wonjuleee
Contributor

Hi @mmoollllee, thank you for raising this issue!
From our investigation, there are two problems when importing a FiftyOne-exported dataset into Datumaro:

  1. As you observed, the directory structure of a FiftyOne-exported dataset is not the same as the original COCO one.
    The COCO directory structure is documented at https://openvinotoolkit.github.io/datumaro/latest/docs/data-formats/formats/coco.html.
  2. The Datumaro COCO importer expects a "segmentation" field in the annotation file, but fo.types.COCODetectionDataset contains only a "bbox". The Datumaro code therefore needs a small fix. The fix is provided in the PR "Enable coco format to import bbox annotations" (#1360) and will be released in 2.0.0 within 2 weeks.
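
To check whether a given export is affected by problem 2, one can count how many annotations carry a "bbox" and how many carry a "segmentation". This is only a diagnostic sketch in plain Python; the function name `count_annotation_fields` is hypothetical, and it assumes the annotation file is a standard COCO JSON with a top-level `annotations` list.

```python
import json
from pathlib import Path


def count_annotation_fields(labels_path):
    """Count annotations that contain the 'bbox' and 'segmentation' keys.

    Hypothetical helper for inspecting a COCO annotation file; it does not
    use or modify Datumaro.
    """
    coco = json.loads(Path(labels_path).read_text())
    annotations = coco.get("annotations", [])
    return {
        "total": len(annotations),
        "bbox": sum(1 for ann in annotations if "bbox" in ann),
        "segmentation": sum(1 for ann in annotations if "segmentation" in ann),
    }
```

For example, `count_annotation_fields("./fo_cocodet/labels.json")` returning a "segmentation" count of 0 alongside a non-zero "bbox" count would match the bbox-only case described above.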

To resolve problem 1, there are two options.
The first is to modify the directory structure and the name of the annotation file as follows:
[image: modified directory structure and annotation file name]
Then the dataset can be imported as

import datumaro as dm
dm_dataset = dm.Dataset.import_from("fo_cocodet_modified", "coco_instances")
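
The screenshot of the modified layout is not reproduced here, so the following is a sketch under assumptions rather than the exact structure from the image. It assumes the FiftyOne export contains a `data/` folder and a `labels.json` file (as in the Kaggle example in this comment) and that the target layout is `images/<subset>/` plus `annotations/instances_<subset>.json`, which matches the `annotations/instances_default.json` path in the log above. The function name `restructure_fiftyone_coco` and the `subset` parameter are hypothetical.

```python
import shutil
from pathlib import Path


def restructure_fiftyone_coco(src, dst, subset="default"):
    """Copy a FiftyOne COCO export into a COCO-style directory layout.

    Assumed input:  <src>/data/*  and  <src>/labels.json
    Assumed output: <dst>/images/<subset>/*  and
                    <dst>/annotations/instances_<subset>.json
    """
    src, dst = Path(src), Path(dst)
    images_dir = dst / "images" / subset
    ann_dir = dst / "annotations"
    images_dir.mkdir(parents=True, exist_ok=True)
    ann_dir.mkdir(parents=True, exist_ok=True)

    # Copy every image file; the source export is left untouched.
    for image in sorted((src / "data").iterdir()):
        if image.is_file():
            shutil.copy2(image, images_dir / image.name)

    # Rename the annotation file to the name the COCO layout uses.
    shutil.copy2(src / "labels.json", ann_dir / f"instances_{subset}.json")
    return dst
```

Calling `restructure_fiftyone_coco("fo_cocodet", "fo_cocodet_modified")` would then produce a folder that could be passed to the import call above, provided the assumed layout is the one the importer expects.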

The second is to use the Kaggle format, newly introduced in Datumaro 2.0.0, which can import data that does not follow a fixed layout, such as a FiftyOne export. In return, the Kaggle importer needs an additional argument, as below.

dm_dataset = dm.Dataset.import_from(path="./fo_cocodet/data", format="kaggle_coco", ann_file="./fo_cocodet/labels.json")

In summary, please use Datumaro 2.0.0 to import a FiftyOne-exported dataset with one of the code lines above.
Thank you for your attention!

@mmoollllee
Author

Awesome! Thanks for your feedback. I'll retry as soon as v2.0.0 is out :)
