Store projection metadata in rtree index of GeoDataset #411
Creating a namedtuple called 'GeoMetaData' that stores the original filepath and CRS of the GeoDataset.
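The commit above describes a namedtuple payload. The sketch below is a minimal illustration of that idea, not the PR's actual code. It assumes the two field names from the description (`filepath`, `crs`) and uses an EPSG string as a stand-in for the CRS; a real dataset would more likely carry a rasterio CRS object. Note that namedtuples are immutable and take a fixed set of fields, which is the limitation raised later in the thread.

```python
from collections import namedtuple

# Hypothetical sketch: field names mirror the PR description (filepath and crs).
GeoMetaData = namedtuple("GeoMetaData", ["filepath", "crs"])

# The CRS is an EPSG string purely for illustration (assumption).
meta = GeoMetaData(filepath="data/scene_01.tif", crs="EPSG:32618")

print(meta.filepath)  # data/scene_01.tif
print(meta.crs)       # EPSG:32618
```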
…perly" This reverts commit ca9b3e7.
This looks good to me. I don't see why we wouldn't want this.
Yes, I'm hoping this can be extended to store other bits of metadata in the future (e.g. spatial/radiometric resolution, % cloud cover, etc.)! But the key thing is to have a path forward on the big elephant in the room: #278/#409. I can rebase on or merge from main to bring this branch up to date. Is there anything else in the implementation that you think could be improved?
If we want to be able to store additional metadata, we definitely don't want to use a namedtuple; a dict would be better. Not all datasets will have things like cloud cover.
Ok, cloud cover was definitely a bad example. But CRS and things like resolution would be mostly universal for raster datasets. The main difference between a namedtuple/dataclass and a dict is the way attributes are accessed: a namedtuple/dataclass uses dot access (which allows tab completion), whereas a dict uses key access.
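To make the access-style difference concrete, here is a small comparison. It is a sketch only: the `GeoMetaData` dataclass and its fields are hypothetical stand-ins for the metadata discussed in this thread, and the CRS is again an EPSG string for illustration.

```python
from dataclasses import dataclass


@dataclass
class GeoMetaData:
    """Hypothetical metadata record with a fixed, typed set of fields."""

    filepath: str
    crs: str


as_dataclass = GeoMetaData(filepath="data/scene_01.tif", crs="EPSG:32618")
as_dict = {"filepath": "data/scene_01.tif", "crs": "EPSG:32618"}

# Dot access: field names are attributes, so editors can tab-complete them.
print(as_dataclass.crs)

# Key access: field names are strings, so keys can vary from entry to entry.
print(as_dict["crs"])
```

Both return the same value; the trade-off is fixed, typed fields versus free-form keys.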
All of the attribute access is internal to TorchGeo; users will almost never use this themselves. The more important difference to me is whether or not fields can be optional and whether or not type hints are supported. Not trying to shut down any of the ideas here, just playing devil's advocate for how this data structure could fail.
Ok, I see where you're coming from now. I had the impression that dataclass attributes could simply use …
Oh wait, I just re-read your comment more closely; you prefer a …
I don't think TypedDict supports optional keys, so a regular dict would be better.
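With a regular dict, optional metadata is handled by simply leaving a key out and reading it with `dict.get`. The sketch below assumes `filepath` and `crs` are always present and treats a resolution key `res` as optional; `describe` and `res` are hypothetical names for illustration. (For reference, `typing.TypedDict` can mark keys optional with `total=False`, but it only landed in the standard library in Python 3.8.)

```python
from typing import Any, Dict


def describe(meta: Dict[str, Any]) -> str:
    """Hypothetical helper: format a metadata dict that may lack optional keys."""
    # Required keys are indexed directly; the optional "res" key uses .get().
    res = meta.get("res")
    res_text = f"{res} m" if res is not None else "unknown resolution"
    return f"{meta['filepath']} ({meta['crs']}, {res_text})"


raster = {"filepath": "data/scene_01.tif", "crs": "EPSG:32618", "res": 30.0}
vector = {"filepath": "data/roads.shp", "crs": "EPSG:4326"}  # no "res" key

print(describe(raster))
print(describe(vector))
```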
…ata" This reverts commit 847998e.
Ok, and it seems TypedDict isn't available on Python 3.7 either. Reverted in 801d09c.
Can you rebase to run the new Python 3.10 and minimum version tests?
Actually, let me see if closing and reopening will run the new tests...
That ran the new tests, but you'll need to rebase or add a merge commit to pick up the Sphinx changes that fix the RtD test. It also looks like there is a problem with the new tests we added to check the minimum versions of our dependencies that we support. Looks like the … Happy to make these changes for you if you're busy, but I'll need push access to your branch.
After a bit more thought, I agree with the comment made in #409 (comment) that this PR won't make sense until we sort out the downstream task of how to actually use the stored CRS information. I'll close this PR, as I'm very low on bandwidth for the next two months and won't be able to resolve the merge conflicts anytime soon. Maybe someone can revisit this and/or come up with a better implementation later.
Currently, the GeoDataset's rtree index stores only the path of the file the data was loaded from (e.g. the GeoTIFF or vector file). This pull request extends that to also store the projection (CRS) information. The filepath and CRS are stored in a Python dictionary so that more metadata fields can be added in the future.
Note that this PR is mostly standalone and can be merged to handle implementation points 1 and 2 of #409.
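A rough sketch of the dictionary payload described above, plus one possible downstream use: checking whether the entries matched by a query span more than one CRS, which is the situation #409 is concerned with. In rtree, a payload like this can be attached through the `obj` argument of `Index.insert` and read back from each item's `.object` when calling `Index.intersection(..., objects=True)`; the example below skips the index and works on plain dicts so it stays self-contained. `make_metadata`, `distinct_crs`, and the `res` key are hypothetical names, not the PR's code.

```python
from typing import Any, Dict, List


def make_metadata(filepath: str, crs: str, **extra: Any) -> Dict[str, Any]:
    """Hypothetical helper: build the per-file payload for one index entry."""
    meta: Dict[str, Any] = {"filepath": filepath, "crs": crs}
    meta.update(extra)  # room for future fields, e.g. resolution
    return meta


def distinct_crs(hits: List[Dict[str, Any]]) -> List[str]:
    """Return the sorted, de-duplicated CRSs among the matched entries."""
    return sorted({hit["crs"] for hit in hits})


# Stand-ins for the objects returned by a spatial query (assumed values).
hits = [
    make_metadata("data/scene_01.tif", "EPSG:32618", res=30.0),
    make_metadata("data/scene_02.tif", "EPSG:32619"),
]

if len(distinct_crs(hits)) > 1:
    print("query spans multiple CRSs; reprojection would be needed")
```

Keeping extra fields as optional keyword arguments means files without a known resolution simply omit the key.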