EuroSAT100: add new dataset/datamodule #1130
Merged
github-actions
bot
added
datasets
Geospatial or benchmark datasets
testing
Continuous integration testing
labels
Feb 21, 2023
adamjstewart
changed the title
EuroSAT100: add new dataset
EuroSAT100: add new dataset/datamodule
Feb 21, 2023
calebrob6
reviewed
Feb 23, 2023
Could not reproduce the 1 second download; it took my laptop 1.971 seconds.
I tried the download several times and it's always between 0.8s and 1.8s on my laptop. Either way, drastic improvement over the previous 3m download for EuroSAT and 10+ min download for Cyclone.
calebrob6
approved these changes
Feb 23, 2023
That's more like it! We emphasize test metric distributions in torchgeo!
calebrob6
pushed a commit
that referenced
this pull request
Apr 10, 2023
* EuroSAT100: add new dataset
* Fix type hints
* Add EuroSAT100DataModule
* Isort and test fixes
* Add disclaimer, remove duplicate code
yichiac
pushed a commit
to yichiac/torchgeo
that referenced
this pull request
Apr 29, 2023
* EuroSAT100: add new dataset
* Fix type hints
* Add EuroSAT100DataModule
* Isort and test fixes
* Add disclaimer, remove duplicate code
Labels
datamodules
PyTorch Lightning datamodules
datasets
Geospatial or benchmark datasets
testing
Continuous integration testing
What?
This is a subset of the EuroSAT MS dataset containing only 100 images. Yes, you heard that right, 100 images. This dataset maintains the 60-20-20 train-val-test split from https://arxiv.org/abs/1911.06721. There are 10 images (6 train, 2 val, 2 test) for each of the 10 classes, for a total of 100 images.
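To make the arithmetic concrete, here is a minimal sketch of a per-class 6/2/2 subsample. It is not the script used to build EuroSAT100: the `subsample_split` helper, the image IDs, and the seed are all hypothetical, and only the class count and split sizes come from the description above.

```python
import random

# The 10 EuroSAT land-cover classes.
CLASSES = [
    "AnnualCrop", "Forest", "HerbaceousVegetation", "Highway", "Industrial",
    "Pasture", "PermanentCrop", "Residential", "River", "SeaLake",
]

# Images kept per class in each split (60-20-20 of 10 images).
PER_CLASS = {"train": 6, "val": 2, "test": 2}


def subsample_split(seed=0):
    """Assign 10 placeholder image IDs per class to train/val/test.

    The IDs are hypothetical stand-ins, not real EuroSAT file names.
    """
    rng = random.Random(seed)
    splits = {name: [] for name in PER_CLASS}
    for label in CLASSES:
        ids = [f"{label}_{i}" for i in range(1, sum(PER_CLASS.values()) + 1)]
        rng.shuffle(ids)
        start = 0
        for name, count in PER_CLASS.items():
            splits[name].extend(ids[start:start + count])
            start += count
    return splits


splits = subsample_split()
sizes = {name: len(items) for name, items in splits.items()}
print(sizes)  # {'train': 60, 'val': 20, 'test': 20}
```

Splitting within each class, rather than over the pooled 100 images, is what guarantees that every class appears in every split.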
But why?
Our tutorials download a lot of data. So much, in fact, that we fill up the meager 14 GB SSD in our GitHub Actions runners, which has caused our notebook tests to fail for the last year or so. Between NAIP, Chesapeake, Sentinel, EuroSAT, and TropicalCyclone, we're talking several minutes of downloads. The TropicalCyclone download doesn't even work unless you sign up for an API key (#1074). A quick comparison:
Yes, 1 second, down to the millisecond:
Our tutorials should not take several minutes just on data prep, nor should they fill up your hard drive, nor should they require an API key.
Why EuroSAT?
EuroSAT was chosen for the following reasons:
The milestone says 0.4.1 but versionadded says 0.5...
Technically, new features shouldn't be added in patch releases. But this may fix our chronically failing notebook tests. So I would like to add this to the next patch release for the sake of testing, but we can just pretend that it was actually added in 0.5.
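For readers unfamiliar with the marker in question: `versionadded` is a Sphinx directive that usually lives in a class or function docstring. A minimal sketch of what that looks like follows. The `ExampleDataset` class is a hypothetical placeholder, not the actual torchgeo source, and only the `0.5` value comes from this PR.

```python
class ExampleDataset:
    """Hypothetical dataset class used only to illustrate the directive.

    .. versionadded:: 0.5
    """


# The directive is plain docstring text until Sphinx renders it.
print(ExampleDataset.__doc__)
```

Sphinx turns the directive into a "New in version 0.5" note in the built docs, regardless of which release the code actually ships in.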
Hey, you forgot to update the docs!
I'll open a separate PR to add this to the docs that we can save for the 0.5 release.