@calebrob6 flagged an issue while merging #510: the utils.download_radiant_mlhub_collection(...) method raises a FileNotFoundError. This happens because the method in torchgeo.utils expects the radiant_mlhub.Dataset.download(...) method to fetch a .tar.gz archive of either the entire dataset or an individual collection. As of v0.5.0 of the radiant-mlhub Python Client, this is no longer the case: the only .tar.gz archive downloaded is the one containing the STAC catalog. This change was made to support asset-level filtering and downloading (please see https://github.com/radiantearth/radiant-mlhub/pull/104).
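To make the failure mode concrete, here is a minimal sketch that checks which expected archives are actually present after a download. The helper missing_archives and the archive names are hypothetical, not torchgeo code; the simulated directory only mimics the v0.5+ behavior described above (one STAC catalog archive plus individually downloaded assets). Any name the helper returns is a file that later extraction would likely fail to open, surfacing as a FileNotFoundError.

```python
import pathlib
import tempfile
from typing import List


def missing_archives(root: str, expected: List[str]) -> List[str]:
    """Return the expected archive names that cannot be found anywhere under root.

    Hypothetical helper: the MLHub dataset classes assume one .tar.gz per dataset
    or per collection, so each name returned here is an archive that a later
    extraction step would not find.
    """
    present = {path.name for path in pathlib.Path(root).rglob("*.tar.gz")}
    return [name for name in expected if name not in present]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        base = pathlib.Path(root)
        # Simulated v0.5+ layout: only the STAC catalog arrives as an archive,
        # while the assets are downloaded as individual files.
        (base / "example_catalog.tar.gz").touch()
        (base / "assets").mkdir()
        (base / "assets" / "B02.tif").touch()
        # Prints both names: neither collection archive exists in this layout.
        print(missing_archives(root, ["example_source.tar.gz", "example_labels.tar.gz"]))
```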
This applies to all MLHub dataset classes in torchgeo (BeninSmallHolderCashews, CV4AKenyaCropType, TropicalCycloneWindEstimation, NASAMarineDebris): each expects either an archive of the entire dataset or one archive per collection. For example, NASAMarineDebris, like CloudCoverDetection, assigns the archive names and their md5 checksums as class properties:
Proposed Short Term Solution:
It seems the easiest solution is to pin the radiant-mlhub Python Client to v0.4.x, the release series prior to the one that introduced the bug in torchgeo.
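A dependency specifier such as radiant-mlhub<0.5 is one way to express this pin; the exact specifier torchgeo adopted isn't shown in this issue. The sketch below is a hypothetical runtime guard under that assumption: it reads the installed version with importlib.metadata and compares (major, minor) as integer tuples, so that a release like 0.10 is not mistaken for being older than 0.5, as a plain string comparison would suggest.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple


def parse_major_minor(text: str) -> Tuple[int, int]:
    """Extract (major, minor) from a version string such as '0.4.1' or '0.5.0rc1'."""
    match = re.match(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return int(match.group(1)), int(match.group(2))


def uses_archive_downloads(text: str) -> bool:
    """True for releases before v0.5.0, which still download .tar.gz archives."""
    return parse_major_minor(text) < (0, 5)


def installed_mlhub_version() -> Optional[str]:
    """Version of the installed radiant-mlhub distribution, or None if absent."""
    try:
        return version("radiant-mlhub")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_mlhub_version()
    if found is None:
        print("radiant-mlhub is not installed")
    elif uses_archive_downloads(found):
        print(f"radiant-mlhub {found} works with the archive-based download code")
    else:
        print(f"radiant-mlhub {found} needs the new download logic; pin to <0.5")
```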
Long Term Solution:
Ideally, any existing and new torchgeo datasets sourced from Radiant MLHub will be updated to reflect the latest Dataset.download(...) functionality. Calling that method now performs the following sequence of steps:
1. Download the catalog archive with all STAC objects for the dataset
2. Uncompress the STAC catalog archive directory
3. Load the STAC catalog into a SQLite database on disk
4. Apply spatial, temporal and band filters to the SQL table
5. Construct the list of assets to download from the query
6. Download all assets locally into the catalog directory
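Steps 3 to 5 can be sketched with Python's built-in sqlite3 module. This is an illustration of the idea, not the radiant-mlhub client's implementation: the single assets table, its columns, and the function name select_asset_hrefs are hypothetical, the database lives in memory rather than on disk, the spatial filter is omitted, and the actual download (step 6) is left out.

```python
import sqlite3
from typing import Iterable, List, Sequence, Tuple

# Hypothetical flattened STAC asset record: (item_id, band, ISO-8601 date, href).
AssetRecord = Tuple[str, str, str, str]


def select_asset_hrefs(
    records: Iterable[AssetRecord], bands: Sequence[str], start: str, end: str
) -> List[str]:
    """Load records into SQLite, filter by band and date range, return the hrefs."""
    if not bands:
        return []
    # In-memory for brevity; the client described above keeps its database on disk.
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE assets (item_id TEXT, band TEXT, datetime TEXT, href TEXT)"
        )
        conn.executemany("INSERT INTO assets VALUES (?, ?, ?, ?)", records)
        placeholders = ", ".join("?" for _ in bands)
        # ISO-8601 dates in one format compare correctly as strings.
        query = (
            f"SELECT href FROM assets WHERE band IN ({placeholders}) "
            "AND datetime BETWEEN ? AND ? ORDER BY item_id, band"
        )
        rows = conn.execute(query, [*bands, start, end]).fetchall()
        return [href for (href,) in rows]
    finally:
        conn.close()


if __name__ == "__main__":
    records = [
        ("tile_1", "B02", "2020-01-01", "https://example.com/tile_1/B02.tif"),
        ("tile_1", "B08", "2020-01-01", "https://example.com/tile_1/B08.tif"),
        ("tile_2", "B03", "2021-06-01", "https://example.com/tile_2/B03.tif"),
    ]
    # Only tile_1/B02 matches both the band filter and the 2020 date range.
    print(select_asset_hrefs(records, ["B02", "B03"], "2020-01-01", "2020-12-31"))
```

Filtering before downloading is what lets a caller such as BeninSmallHolderCashews with bands=('B02', 'B03', 'B04') fetch only the assets it needs, which is also why there is no longer a single per-collection archive to extract.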
Steps to reproduce
Run the following code to reproduce:
from torchgeo.datasets import BeninSmallHolderCashews

cashews = BeninSmallHolderCashews(
    root='/data',
    bands=('B02', 'B03', 'B04'),
    download=True
)
# the same result will happen with CV4AKenyaCropType, TropicalCycloneWindEstimation, NASAMarineDebris, CloudCoverDetection
Version
0.4.0.dev0
Fixed in 51b4d6d (probably should have opened a PR first, but I accidentally pushed to main). Unfortunately, I don't know of an easy way to test this. We try to avoid tests that require internet access, so we currently monkeypatch Dataset.download() to simply copy a local file. That's why our tests didn't catch the bug.
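The monkeypatching approach described above can be sketched with unittest.mock; pytest's monkeypatch fixture works the same way. This is a hypothetical, minimal example, not torchgeo's test code: Dataset here is a stand-in for radiant_mlhub.Dataset, and fetch and fake_download are illustrative names. Because the fake always produces exactly the archive the code expects, the test passes regardless of how the real client lays out its downloads, which is how a change like the one in v0.5.0 can go unnoticed.

```python
import shutil
import tempfile
from pathlib import Path
from unittest import mock


class Dataset:
    """Stand-in for radiant_mlhub.Dataset (hypothetical; no network access)."""

    def download(self, output_dir: str) -> None:
        raise RuntimeError("a real download requires internet access")


def fetch(dataset: Dataset, root: str, archive_name: str) -> Path:
    """Code under test: expects download() to leave a .tar.gz archive in root."""
    dataset.download(output_dir=root)
    archive = Path(root) / archive_name
    if not archive.exists():
        raise FileNotFoundError(archive)
    return archive


def fake_download(fixture: Path):
    """Build a replacement download() that copies a local fixture into output_dir."""

    def _download(self, output_dir: str) -> None:
        shutil.copy(fixture, output_dir)

    return _download


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as fixtures, tempfile.TemporaryDirectory() as root:
        fixture = Path(fixtures) / "collection.tar.gz"
        fixture.touch()
        # Patch the class attribute so every instance uses the offline fake.
        with mock.patch.object(Dataset, "download", fake_download(fixture)):
            print(fetch(Dataset(), root, "collection.tar.gz").name)
```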
@KennSmithDS any updates on a Radiant MLHub dataset? We're planning a 0.4.0 release by the end of the month and would love to see support for the latest version of radiant-mlhub. Not sure what your work schedule is like before the holidays.
Hi @adamjstewart, sorry for the belated response. We have been in a mad push during Q4, especially during November and December, to finish updating all of our STAC catalogs to comply with the STAC specification and to add metadata/extensions where they weren't used before. Then I was on holiday break, and today is my last day at Radiant Earth. I will not be able to continue supporting the development of this dataset class for Radiant MLHub's datasets going forward, but I look forward to contributing to torchgeo in other capacities.