NASAMarineDebris Dataset download process changed #1101

Closed
SpontaneousDuck opened this issue Feb 9, 2023 · 2 comments · Fixed by #1102
Labels
dependencies Packaging and dependencies

Comments

@SpontaneousDuck
Contributor

Description

It appears that the format of what radiant_mlhub downloads has changed: the extracted datasets are now downloaded directly instead of the tarballed datasets. When torchgeo.datasets.NASAMarineDebris.__init__() calls self._verify(), the verification fails because the files it checks for (filenames = ["nasa_marine_debris_source.tar.gz", "nasa_marine_debris_labels.tar.gz"]) are not being created; their contents are just downloaded directly.

Steps to reproduce

  1. Try to instantiate NASAMarineDebris with download=True
from torchgeo.datasets import NASAMarineDebris
NASAMarineDebris(root='test', download=True, api_key = "key")
  2. The following error occurs after downloading:
File torchgeo/datasets/nasa_marine_debris.py:86, in NASAMarineDebris.__init__(self, root, transforms, download, api_key, checksum, verbose)
     84 self.checksum = checksum
     85 self.verbose = verbose
---> 86 self._verify()
     87 self.files = self._load_files()

File torchgeo/datasets/nasa_marine_debris.py:216, in NASAMarineDebris._verify(self)
    214 for filename in self.filenames:
    215     filepath = os.path.join(self.root, filename)
--> 216     extract_archive(filepath)

File torchgeo/datasets/utils.py:124, in extract_archive(src, dst)
    122 for suffix, extractor in suffix_and_extractor:
    123     if src.endswith(suffix):
--> 124         with extractor(src, "r") as f:
    125             f.extractall(dst)
    126         return

File lib/python3.10/tarfile.py:1632, in TarFile.open(cls, name, mode, fileobj, bufsize, **kwargs)
   1630     saved_pos = fileobj.tell()
...
--> 174     fileobj = self.myfileobj = builtins.open(filename, mode or 'rb')
    175 if filename is None:
    176     filename = getattr(fileobj, 'name', '')
  3. Inspecting the download code and its output shows that when the following download code is called:
    collection = radiant_mlhub.Collection.fetch(collection_id, api_key=api_key)
    collection.download(output_dir=download_root, api_key=api_key)

    the tree below is created:
test
├── nasa_marine_debris
│   ├── catalog.json
│   ├── err_report.csv
│   ├── mlhub_stac_assets.db
│   ├── nasa_marine_debris_labels
│   └── nasa_marine_debris_source
└── nasa_marine_debris.tar.gz
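
For illustration, here is a minimal sketch of a check that tells the two layouts apart. The archive names come from the `filenames` list quoted above and the directory names from the tree; `detect_layout` is a hypothetical helper, not part of TorchGeo, and it assumes that `nasa_marine_debris_source` and `nasa_marine_debris_labels` are directories.

```python
import os

# Names taken from this issue; how TorchGeo should handle them is an assumption.
ARCHIVES = ["nasa_marine_debris_source.tar.gz", "nasa_marine_debris_labels.tar.gz"]
EXTRACTED_DIRS = ["nasa_marine_debris_source", "nasa_marine_debris_labels"]
COLLECTION_DIR = "nasa_marine_debris"


def detect_layout(root: str) -> str:
    """Return 'extracted', 'archives', or 'missing' for a dataset root.

    Hypothetical helper: it only inspects the filesystem and never downloads.
    """
    collection = os.path.join(root, COLLECTION_DIR)
    # Newer layout: the collection is already unpacked into subdirectories.
    if all(os.path.isdir(os.path.join(collection, d)) for d in EXTRACTED_DIRS):
        return "extracted"
    # Older layout: two tarballs sit directly in the root and still need extracting.
    if all(os.path.isfile(os.path.join(root, a)) for a in ARCHIVES):
        return "archives"
    return "missing"
```

A verification step could call something like this before attempting to extract anything, so that extraction is only attempted when the tarballs actually exist.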

Version

0.4.0

@adamjstewart
Collaborator

Yes, TorchGeo is not compatible with Radiant MLHub 0.5+, see #711. If you install torchgeo[datasets], the correct version should be installed, but if you manually install radiant-mlhub on your own, you may get an incompatible version. Quick solution is to downgrade:

$ pip install 'radiant-mlhub<0.5'

We had some Radiant Earth folks who were contributing to TorchGeo but one of them moved on to greener pastures. If anyone would like to submit a PR to add support for Radiant MLHub 0.5+, I would love to review it!
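
To check whether an environment already has a compatible version, something like the sketch below may help. It uses only the standard library; `is_pre_0_5` and `installed_mlhub_is_compatible` are hypothetical helper names, and the 0.5 cutoff is taken from the comment above rather than from any packaging metadata.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def is_pre_0_5(version_string: str) -> bool:
    """Return True if the version's major.minor is below 0.5 (hypothetical helper)."""
    match = re.match(r"(\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError(f"unrecognized version string: {version_string!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) < (0, 5)


def installed_mlhub_is_compatible() -> Optional[bool]:
    """Check the installed radiant-mlhub; return None if it is not installed."""
    try:
        return is_pre_0_5(version("radiant-mlhub"))
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print(installed_mlhub_is_compatible())
```

If this prints `False`, the downgrade command above applies; `None` means radiant-mlhub is not installed in the current environment.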

@adamjstewart adamjstewart added the dependencies Packaging and dependencies label Feb 9, 2023
@SpontaneousDuck
Contributor Author

Great quick fix! I'll work on the code to update it to work with the new structure as well but that will probably end up being tomorrow.
