Support new datasets from remote (cloud, ftp, etc.) stores #296

Closed
4 tasks
hboisgon opened this issue Mar 3, 2023 · 7 comments
Labels
Blocked (An issue that cannot be progressed right now), Datasets (request to update or add new datasets)

Comments

@hboisgon
Contributor

hboisgon commented Mar 3, 2023

This is a general issue to list and investigate interesting datasets from remote stores (cloud, ftp, etc.) to support with HydroMT.
You can edit this post directly to update the list or add below in comments.
To request a new dataset, add the name and link to where the dataset can be found.
For data that needs to be downloaded, please use the separate issue #295.

Datasets:

@hboisgon
Contributor Author

hboisgon commented Jun 1, 2023

Put on high priority, especially for ERA5.

@savente93
Contributor

Move to a discussion? Then we could have separate issues for the datasets we want to integrate.

@alimeshgi added the Needs refinement label Oct 5, 2023
@alimeshgi assigned alimeshgi and savente93 and unassigned alimeshgi Oct 5, 2023
@savente93 removed the Needs refinement label Oct 5, 2023
@savente93 added this to the Q4 milestone Oct 5, 2023
@savente93 added the Blocked label Oct 9, 2023
@savente93
Contributor

I have looked into this issue, but the linked datasets have problems that prevent them from being translated directly into data catalog entries. For example, ERA5 does not have a standard CRS and might need regridding, and when I try to turn ESA WorldCover into a dataset, I get errors such as "MergeError: Geotransform and/or shape do not match" that I can't interpret, so I will need help from a hydrologist to progress with this.
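
It isn't clear from the thread exactly which check raises that MergeError, but a common precondition for merging tiles into one dataset is that they lie on the same grid: identical pixel size, and origins offset by a whole number of pixels. The sketch below illustrates that general condition only. It assumes north-up rasters without rotation terms, and `GeoTransform` and `on_common_grid` are hypothetical names, not HydroMT or GDAL API.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class GeoTransform:
    """Hypothetical minimal north-up geotransform (no rotation terms).

    x0, y0: coordinates of the upper-left corner of the raster.
    dx, dy: pixel size; dy is typically negative for north-up rasters.
    """
    x0: float
    dx: float
    y0: float
    dy: float


def on_common_grid(a: GeoTransform, b: GeoTransform, tol: float = 1e-6) -> bool:
    """Return True if rasters a and b can be mosaicked without resampling.

    Both must have the same pixel size, and their origins must be offset
    by a whole number of pixels in x and in y.
    """
    # Pixel sizes must match (compared relative to the pixel size itself).
    if not (math.isclose(a.dx, b.dx, rel_tol=tol) and math.isclose(a.dy, b.dy, rel_tol=tol)):
        return False
    # Origin offsets, expressed in pixels, must be (close to) integers.
    for offset, res in ((b.x0 - a.x0, a.dx), (b.y0 - a.y0, a.dy)):
        n_pixels = offset / res
        if not math.isclose(n_pixels, round(n_pixels), abs_tol=tol):
            return False
    return True
```

Two tiles that fail this check would need resampling onto a shared grid before they can be merged, which is one plausible source of a "geotransform and/or shape" mismatch.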

@savente93 removed the Blocked label Oct 10, 2023
@savente93
Contributor

savente93 commented Oct 13, 2023

  • ESA WorldCover v200: this does not have a VRT file, so custom logic would be needed to map bounding boxes to the correct files. This could be done by one of the custom drivers, but is out of scope for this issue.
  • ERA5 (GS): ERA5 uses a Gaussian grid that isn't regular, so to include it we'd either have to add support for reading mesh catalog entries from remote files, or regrid it ourselves, both of which are out of scope for this issue.
  • Copernicus DEM (S3): same as ESA WorldCover.
  • CHIRPS Africa (S3): multiple TIFF files segmented by time rather than by spatial index. More processing would be needed to be able to query this along spatial dimensions, probably some conversion to Zarr.
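
To give a feel for what the "regrid it ourselves" option for ERA5 could involve, here is a minimal nearest-neighbour sketch from a reduced Gaussian grid to a regular lat/lon grid. It assumes the classic reduced Gaussian layout (rows stored north to south, each row's points equally spaced in longitude starting at 0° E); whether the ERA5 data in the GS store actually arrives in this layout would need to be checked. `reduced_gaussian_to_regular` is a hypothetical helper, not part of HydroMT. The row-then-column lookup is an approximation of true great-circle nearest neighbour, and for fluxes such as precipitation a conservative method would be more appropriate.

```python
import numpy as np


def reduced_gaussian_to_regular(values, row_lats, row_npts, target_lats, target_lons):
    """Row-wise nearest-neighbour regrid from a reduced Gaussian grid.

    values: 1-D array of all grid-point values, rows stored consecutively.
    row_lats: latitude (degrees) of each Gaussian row.
    row_npts: number of points in each row (equally spaced, starting at 0° E).
    Returns an array of shape (len(target_lats), len(target_lons)).
    """
    values = np.asarray(values)
    row_lats = np.asarray(row_lats, dtype=float)
    row_npts = np.asarray(row_npts, dtype=int)
    # Offset of the first point of each row in the flattened values array.
    row_start = np.concatenate(([0], np.cumsum(row_npts)[:-1]))

    target_lats = np.asarray(target_lats, dtype=float)
    target_lons = np.mod(np.asarray(target_lons, dtype=float), 360.0)

    # Nearest source row for each target latitude.
    row_idx = np.abs(target_lats[:, None] - row_lats[None, :]).argmin(axis=1)
    n = row_npts[row_idx][:, None]  # points in the chosen row, shape (nlat, 1)
    # Nearest longitude index within that row, wrapping around at 360°.
    col = np.rint(target_lons[None, :] / 360.0 * n).astype(int) % n
    return values[row_start[row_idx][:, None] + col]
```

For example, a toy grid with a 4-point row at 45° N and a 2-point row at 45° S maps target longitudes 0, 80, 180 and 350 to the points at 0°, 90°, 180° and 0° (wrapped) on the northern row.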

Not quite sure where this leaves us in terms of moving forward. @DirkEilander Given our earlier discussions, I don't think we can move forward on any of these at the moment. I think it would be best to create tickets describing what would need to be done to add these, and either close this issue so we can make new ones when those conditions are met, or take it out of the backlog until then.

@savente93 added the Blocked label Oct 13, 2023
@DirkEilander
Contributor

Thanks for looking into this @savente93. There seems to be a general case of tiled data without a VRT file (ESA WorldCover and Copernicus DEM). This could potentially be solved by a new tiled_raster driver to read and merge this type of data. This would also benefit many datasets that are available on the Delft-Dashboard file server, see #52. The other datasets would require custom drivers, which I think is currently out of scope for the core. If you all agree, we can open a new issue for that new driver.

@savente93
Contributor

Yeah, I think that's the correct way to go, so I'll close this one. I'll try to open a new issue for the new driver sometime today.
