Add ETCI2021 Dataset #119

Merged Sep 12, 2021 (12 commits)

Conversation

isaaccorley
Collaborator

  • Adds torchgeo.datasets.ETCI2021 from the ETCI 2021 Flood Detection Challenge
  • Updated docs
  • Added tests/datasets/test_etci2021.py
  • Added sample data for tests under tests/data/etci2021/

Questions:

  • If the dataset splits (train/val/test) can be downloaded as separate files from Google Drive (train.zip, val.zip, test.zip), should we download only the file for the desired split, or the entire dataset?
  • The dataset has 2 masks (water body and flood). For the train/val splits I load them into a single mask of shape (2, h, w). However, the test set flood masks were never released, so only the water mask is loaded, with shape (h, w). Does this seem right, or should I add the extra dimension, giving (1, h, w), for consistency?

Closes #118

@adamjstewart
Collaborator

should we only download the file for the desired split or should we download the entire dataset?

Just that split

Does this seem right or should I add the additional dimension (1, h, w) for consistency?

Probably (1, h, w)
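A minimal sketch of the shape convention discussed above, not the code from this PR. To stay dependency-free it uses nested lists in place of tensors; in the dataset itself the equivalent step would presumably be something like `torch.stack` over the loaded masks. The helper names `stack_masks` and `shape` are hypothetical.

```python
from typing import List, Tuple

Mask = List[List[int]]  # one (h, w) mask, represented as nested lists


def stack_masks(masks: List[Mask]) -> List[Mask]:
    """Stack (h, w) masks along a new leading axis, giving (c, h, w)."""
    if not masks:
        raise ValueError("expected at least one mask")
    h, w = len(masks[0]), len(masks[0][0])
    for m in masks:
        if len(m) != h or any(len(row) != w for row in m):
            raise ValueError("all masks must share the same (h, w)")
    # Copy rows so the stacked result does not alias the inputs.
    return [[list(row) for row in m] for m in masks]


def shape(stacked: List[Mask]) -> Tuple[int, int, int]:
    """Return the (c, h, w) shape of a stacked mask."""
    return (len(stacked), len(stacked[0]), len(stacked[0][0]))


# Toy 2x3 masks standing in for real data.
water = [[0, 1, 1], [0, 0, 1]]
flood = [[0, 0, 1], [0, 0, 0]]

train_mask = stack_masks([water, flood])  # train/val: water + flood
test_mask = stack_masks([water])          # test: water only, still 3-D

print(shape(train_mask), shape(test_mask))
```

Keeping the leading axis for the test split means downstream code can always index the mask as (channel, row, column), regardless of split.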

@isaaccorley
Collaborator Author

I refactored the filenames, urls, md5s, and directories into a metadata attribute which can be indexed by the split. Let me know if you have a more standardized approach to this already. Also updated per your suggestions.
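One way a per-split metadata attribute like this could look is sketched below. This is an illustration under assumptions, not the structure used in the PR: the archive names follow the train.zip/val.zip/test.zip files mentioned earlier, while the URLs, MD5 strings, directory names, and the helper `archive_for_split` are placeholders.

```python
import os
from typing import Dict

SPLITS = ("train", "val", "test")

# Placeholder values: the real URLs, MD5s, and directory names
# are not given in this thread.
METADATA: Dict[str, Dict[str, str]] = {
    split: {
        "filename": f"{split}.zip",
        "url": f"https://example.com/{split}.zip",
        "md5": "<md5-of-archive>",
        "directory": split,
    }
    for split in SPLITS
}


def archive_for_split(root: str, split: str) -> Dict[str, str]:
    """Look up the metadata for one split and add its local archive path."""
    if split not in METADATA:
        raise ValueError(f"split must be one of {SPLITS}, got {split!r}")
    entry = dict(METADATA[split])  # copy so the table is not mutated
    entry["path"] = os.path.join(root, entry["filename"])
    return entry


# Only the requested split's archive is looked up (and would be downloaded).
print(archive_for_split("data", "val"))
```

Indexing by split keeps the download logic to a single lookup, which fits the suggestion above to fetch only the requested split.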

@adamjstewart adamjstewart added the datasets Geospatial or benchmark datasets label Sep 10, 2021
Collaborator

@adamjstewart adamjstewart left a comment

Mostly looks good, just had a few minor comments.

adamjstewart
adamjstewart previously approved these changes Sep 10, 2021
Co-authored-by: Adam J. Stewart <ajstewart426@gmail.com>
@adamjstewart
Collaborator

Looks like there are conflicts preventing this from being rebased and merged. Personally I never use merge commits for this very reason. I always rebase manually to resolve merge conflicts. For now I'll squash and merge.

@adamjstewart adamjstewart merged commit 67f7d8a into microsoft:main Sep 12, 2021
@isaaccorley isaaccorley deleted the datasets/etci2021 branch September 12, 2021 02:26
Comment on lines +255 to +256
if os.path.exists(os.path.join(self.root, "__MACOSX")):
    shutil.rmtree(os.path.join(self.root, "__MACOSX"))
Collaborator

@isaaccorley is this strictly needed? It's currently not covered by our unit tests and I'm wondering if we can just remove it.
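For context, zip archives created on macOS often include a `__MACOSX` folder of metadata next to the real contents. If the cleanup were kept, one option would be to factor it into a small function that a unit test can exercise directly. The sketch below assumes a hypothetical helper named `remove_macosx_dir`; it is not part of torchgeo.

```python
import os
import shutil
import tempfile


def remove_macosx_dir(root: str) -> bool:
    """Delete root/__MACOSX if it exists; return True if it was removed."""
    path = os.path.join(root, "__MACOSX")
    if os.path.isdir(path):
        shutil.rmtree(path)
        return True
    return False


# Exercise both branches against a throwaway directory.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "__MACOSX", "train"))
    removed_first = remove_macosx_dir(root)   # folder present
    removed_second = remove_macosx_dir(root)  # folder already gone
    still_there = os.path.exists(os.path.join(root, "__MACOSX"))
```

A test along these lines would cover both the "folder present" and "folder absent" paths without needing a real archive.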

@adamjstewart adamjstewart added this to the 0.1.0 milestone Nov 20, 2021
yichiac pushed a commit to yichiac/torchgeo that referenced this pull request Apr 29, 2023
* add dataset to docs

* add sample test data

* add dataset unit tests

* add etci2021 dataset

* updated tests

* updated dataset to download only desired split file

* removed flood mask from file list for test set and other formatting

* Update torchgeo/datasets/etci2021.py

Co-authored-by: Adam J. Stewart <ajstewart426@gmail.com>

* fixed doc formatting

Co-authored-by: Adam J. Stewart <ajstewart426@gmail.com>