Add iNaturalist dataset #4123

Merged (7 commits) on Jul 1, 2021

Contributor

@dgenzel dgenzel commented Jun 25, 2021

Adding iNaturalist dataset from https://github.com/visipedia/inat_comp
This relies on the data files only, not using annotations.

Resolves #3292

@pmeier pmeier mentioned this pull request Jun 28, 2021
17 tasks
Collaborator

@pmeier pmeier left a comment

Hey @dgenzel2 and thanks for the PR! While INaturalist is on the list of potential new datasets in #3562, I don't recall any decision on this. Did I miss something?

Member

@fmassa fmassa left a comment

Thanks a lot for the quick PR Dmitriy!

@pmeier I discussed this dataset with Dmitriy as a good onboarding task. We decided to provide only the labels for now, not the bounding boxes.

I've done an initial pass and the PR looks good to me.

I made a few minor comments, but I'll let @pmeier do a more thorough review.

Collaborator

@pmeier pmeier left a comment

Looking good so far. I have some inline comments. Also, could you add the dataset to the documentation?

Contributor Author

dgenzel commented Jun 30, 2021

It turned out that the format for earlier years was different, so I had to make some changes. But now download is supported, and I verified it manually.

Collaborator

@pmeier pmeier left a comment

Checking image / video folders for integrity is not feasible, so we normally go another way: we skip the integrity check completely and bail out if we encounter already extracted folders together with download=True:

if path.exists(self.split_folder):
    raise RuntimeError(
        f"The directory {self.split_folder} already exists. "
        f"If you want to re-download or re-extract the images, delete the directory."
    )

IMO we should adopt the same approach here, to avoid accidentally downloading again.
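
The guard described in this comment can be sketched in isolation as follows. This is a minimal sketch, not the code merged in this PR: `prepare_split` and its `fetch_and_extract` callback are hypothetical names introduced for illustration, and the message for the missing-folder case is an assumption. Only the "already exists" message mirrors the snippet above.

```python
import os
from typing import Callable


def prepare_split(
    split_folder: str,
    download: bool,
    fetch_and_extract: Callable[[str], None],
) -> None:
    """Hypothetical helper: never download over an already extracted folder."""
    if download:
        # No integrity check of the extracted images is attempted here.
        # The mere presence of the folder is treated as "already prepared".
        if os.path.exists(split_folder):
            raise RuntimeError(
                f"The directory {split_folder} already exists. "
                "If you want to re-download or re-extract the images, delete the directory."
            )
        fetch_and_extract(split_folder)
    elif not os.path.exists(split_folder):
        # Assumed behaviour for the download=False case.
        raise RuntimeError(f"Dataset not found at {split_folder}. Pass download=True to fetch it.")
```

With this shape, a second call with `download=True` fails fast instead of silently fetching a large archive again, which seems to be the behaviour the comment is asking for.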

Member

@fmassa fmassa left a comment

I think this is looking pretty good, thanks a lot Dmitriy!

I've left a minor comment that can be addressed in follow-up PRs. @pmeier I'm merging this PR, but let us know if you have further comments and we can address them in a follow-up PR.


ADDITIONAL_CONFIGS = datasets_utils.combinations_grid(
    target_type=("kingdom", "full", "genus", ["kingdom", "phylum", "class", "order", "family", "genus", "full"]),
    version=("2021_train",),
Member

For the future, it would be good to also test the other years, as they exercise different code paths in the initialization phase.
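
One way the test grid could be extended to other years is sketched below. Everything here is an assumption made for illustration: the local `combinations_grid` is a stand-in that builds a cartesian product (it is not the real `datasets_utils` helper), the version identifiers `"2017"`, `"2018"` and `"2019"` are guesses based on the years mentioned in this thread, and restricting earlier years to `target_type="full"` is a guess motivated by the remark that their format differs.

```python
import itertools
from typing import Any, Dict, List


def combinations_grid(**kwargs: Any) -> List[Dict[str, Any]]:
    """Stand-in helper: one config dict per element of the cartesian product."""
    return [dict(zip(kwargs.keys(), values)) for values in itertools.product(*kwargs.values())]


# Configurations for the 2021 layout, following the snippet under review.
CONFIGS_2021 = combinations_grid(
    target_type=("kingdom", "full", "genus"),
    version=("2021_train",),
)

# Assumed identifiers for earlier years; verify them against the dataset class.
CONFIGS_EARLIER = combinations_grid(
    target_type=("full",),
    version=("2017", "2018", "2019"),
)

ADDITIONAL_CONFIGS = CONFIGS_2021 + CONFIGS_EARLIER
```

Keeping the two grids separate avoids pairing a 2021-only target type with an earlier year, which a single product over all values would do.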

@fmassa fmassa merged commit ef71159 into master Jul 1, 2021
@fmassa fmassa deleted the inat branch July 1, 2021 18:33
Member

fmassa commented Jul 1, 2021

Failures are unrelated, merging


github-actions bot commented Jul 1, 2021

Hey @fmassa!

You merged this PR, but no labels were added.

facebook-github-bot pushed a commit that referenced this pull request Jul 12, 2021
Summary:
* Add iNaturalist dataset

* Add download support

* address comments

Reviewed By: fmassa

Differential Revision: D29659493

fbshipit-source-id: 9bdb53c24aeb6fdba9cf0604f1f824ed506d3c89

Co-authored-by: dgenzel <dgenzel@fb.com>
Co-authored-by: Francisco Massa <fvsmassa@gmail.com>
@ferreirafabio

The torchvision iNaturalist dataset code does not allow loading the test split, e.g. the 2017 or 2018 test split. What is the suggested way to use the torchvision code when one also needs the test split?

Collaborator

pmeier commented Sep 19, 2022

What is the suggested way to use the torchvision code when one also needs the test split?

Unfortunately, there is none at the moment. We are working on revamping our datasets API after which all splits will be supported. But this is not ready yet.

We could introduce the test splits on the current API by returning None for the labels. Some of our datasets already do this, but this is not supported by the default collation. @ferreirafabio Could you open an issue, so we can discuss there?
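
If a test split did return `None` labels, a custom collate function could work around the default collation. The sketch below is an assumption-laden illustration, not torchvision code: `collate_allow_none` is a hypothetical name, and it keeps images as a plain list rather than stacking them into a tensor.

```python
from typing import Any, List, Optional, Sequence, Tuple


def collate_allow_none(
    batch: Sequence[Tuple[Any, Optional[Any]]],
) -> Tuple[List[Any], Optional[List[Any]]]:
    """Group (image, target) samples into lists, tolerating missing labels."""
    images = [image for image, _ in batch]
    targets = [target for _, target in batch]
    # An unlabeled split yields a single None instead of a list of Nones.
    if all(target is None for target in targets):
        return images, None
    return images, targets
```

Such a function could be passed as the `collate_fn` argument of `torch.utils.data.DataLoader`. Stacking the images into a batch tensor is left to the caller.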
