Add BigEarthNet dataset #197

Merged
merged 7 commits into from
Oct 17, 2021

Conversation

isaaccorley
Collaborator

@isaaccorley isaaccorley commented Oct 13, 2021

  • Adds torchgeo.datasets.BigEarthNet
  • BigEarthNet was recently updated with Sentinel-1 patches coregistered to the original Sentinel-2 patches; the combined dataset is referred to as BigEarthNet-MM. This dataset class can load Sentinel-1 bands, Sentinel-2 bands, or both.

Notes:

  • I'm not using Radiant MLHub's formatted version of the dataset because it doesn't include the Sentinel-1 patches that were added to create the BigEarthNet-MM dataset. See bigearth.net for more info.
  • BigEarthNet._load_paths is not very elegant, but that's because the Sentinel-1 -> Sentinel-2 patch mapping is stored in the individual JSON file for each Sentinel-1 patch. I compute the mapping on indexing rather than on instantiation of the dataset because opening 590k JSON files is too time-consuming for instantiating a dataset.
  • I made non-georeferenced dummy images for the tests, so rasterio throws a warning about this during tests.
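
The lazy mapping described in the second note can be sketched roughly as follows. This is a minimal illustration, not the code in BigEarthNet._load_paths: the class name `LazyPatchPairs`, the metadata file name pattern `<patch>_labels_metadata.json`, and the JSON key `corresponding_s2_patch` are all assumptions made for the example. The point is only that instantiation lists directories, while each JSON file is opened when its index is requested.

```python
import json
from pathlib import Path


class LazyPatchPairs:
    """Hypothetical helper: resolve the S2 patch for an S1 patch on demand."""

    def __init__(self, s1_root, s2_root, key="corresponding_s2_patch"):
        self.s1_root = Path(s1_root)
        self.s2_root = Path(s2_root)
        # Assumed name of the JSON field that points at the Sentinel-2 patch.
        self.key = key
        # Listing directories is cheap; no JSON file is opened here.
        self.s1_patches = sorted(p for p in self.s1_root.iterdir() if p.is_dir())

    def __len__(self):
        return len(self.s1_patches)

    def __getitem__(self, index):
        s1_dir = self.s1_patches[index]
        # Assumed layout: one metadata file per Sentinel-1 patch directory.
        metadata_path = s1_dir / f"{s1_dir.name}_labels_metadata.json"
        with open(metadata_path) as f:
            metadata = json.load(f)
        # Only this one JSON file is read for this index.
        return s1_dir, self.s2_root / metadata[self.key]
```

With this shape, building the object costs one directory listing, and the per-patch JSON read is paid only for the samples that are actually indexed.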

Closes #63

@isaaccorley isaaccorley marked this pull request as draft October 13, 2021 22:46
@isaaccorley isaaccorley marked this pull request as ready for review October 16, 2021 02:56
@adamjstewart adamjstewart added the datasets Geospatial or benchmark datasets label Oct 16, 2021
@adamjstewart adamjstewart merged commit 7d1ff80 into microsoft:main Oct 17, 2021
@isaaccorley isaaccorley deleted the datasets/bigearthnet branch October 17, 2021 16:21
@adamjstewart
Collaborator

I see the following warnings in our tests:

tests/datasets/test_bigearthnet.py::TestBigEarthNet::test_getitem[all]
tests/datasets/test_bigearthnet.py::TestBigEarthNet::test_getitem[s1]
tests/datasets/test_bigearthnet.py::TestBigEarthNet::test_getitem[s2]
  /opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/rasterio/__init__.py:220: NotGeoreferencedWarning: Dataset has no geotransform, gcps, or rpcs. The identity matrix be returned.
    s = DatasetReader(path, driver=driver, sharing=sharing, **kwargs)

Can you look into this? It seems like the test files are not actually GeoTIFFs but we're still using rasterio to load them. Are the actual data files GeoTIFFs?

@isaaccorley
Collaborator Author

isaaccorley commented Oct 26, 2021

They are just dummy TIFFs without georeferencing metadata. The actual files are GeoTIFFs.

@adamjstewart
Collaborator

Okay, so we just need to create real GeoTIFFs using https://github.com/microsoft/torchgeo/tree/main/tests/data#raster-data for these tests. Can you do that?

@isaaccorley
Collaborator Author

Yes, I'll make a PR updating them.

@adamjstewart adamjstewart added this to the 0.1.0 milestone Nov 20, 2021
yichiac pushed a commit to yichiac/torchgeo that referenced this pull request Apr 29, 2023
* add bigearthnet dataset

* add dummy data for bigearthnet tests

* add bigearthnet unit tests

* updated bigearthnet dataset and tests with s1 imagery

* add bigearthnet to docs

* mypy fixes

* updated docstrings
Labels
datasets Geospatial or benchmark datasets
Development

Successfully merging this pull request may close these issues.

BigEarthNet dataset
3 participants