-
Notifications
You must be signed in to change notification settings - Fork 135
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Detection for nested folders #623
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please also add a test and an error, if the directory does not exist
def test_can_detect_with_nested_folder(self): | ||
env = Environment() | ||
env.importers.items = {DEFAULT_FORMAT: env.importers[DEFAULT_FORMAT]} | ||
env.extractors.items = {DEFAULT_FORMAT: env.extractors[DEFAULT_FORMAT]} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test is generally fine, but consider extending it with a generic "fits everything" format (like image dir
) to test for collisions and that we continue search till we find one match. Maybe, it should be in another test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's look that such situation when we have more than one match and continue search till we find one match, will occurs never. Because we continue search if our path has only one file system object and this object it's a directory, but it for this path we have more than one match it's mean that this path most likely contains more than one file system object, so that these two situations are contradictory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For example, we could detect "image_dir" and "imagenet" in any root image directory, but only if we go deeper, we we'll detect a COCO dataset, for example. I know it is implemented already, just asking for a test that ensures this logic works.
Example:
a/annotations/x_instances.json
a/images/y.jpg
^
here we detect few general matching formats (currently, imagedir, imagenet, celeba, align_celeba, cifar etc.)
And inside a
we detect COCO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added this test.
datumaro/components/dataset.py
Outdated
@@ -911,6 +911,9 @@ def detect(path: str, env: Optional[Environment] = None, | |||
if env is None: | |||
env = Environment() | |||
|
|||
if not osp.exists(path): | |||
raise FileNotFoundError(path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider moving it to Environment.detect_dataset()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved it, but it looks like it will affect the #576 a bit.
Summary
Datasets are often contains in archives and users usually extract them in a separate directory. This leads to the fact that the dataset can be stored in subfolders, so this PR resolved problem when Datumaro cannot detect dataset format with such folder structure.
How to test
Checklist
develop
branchLicense
Feel free to contact the maintainers if that's a concern.