Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RasterDataset: res does not change when crs changes #2190

Closed
erob-archim opened this issue Jul 24, 2024 · 12 comments · Fixed by #2193
Closed

RasterDataset: res does not change when crs changes #2190

erob-archim opened this issue Jul 24, 2024 · 12 comments · Fixed by #2193
Assignees
Labels
datasets Geospatial or benchmark datasets

Comments

@erob-archim
Copy link

erob-archim commented Jul 24, 2024

Description

It is possible to create an invalid combination of GridGeoSampler and RasterDataset. This is because .res can be in units which do not match the CRS, specifically when the CRS is angular (e.g. EPSG:4326) rather than a planar grid.

It seems like this was somewhat known when pixel space sizes was introduced (#279 (comment)).

I think these are the relevant lines

Steps to reproduce

import os
import tempfile

from torch.utils.data import DataLoader
from torchgeo.datasets import NAIP, stack_samples
from torchgeo.datasets.utils import download_url
from torchgeo.samplers import GridGeoSampler
from torchgeo.samplers.constants import Units

naip_root = os.path.join(tempfile.gettempdir(), "naip")
naip_url = (
    "https://naipeuwest.blob.core.windows.net/naip/v002/de/2018/de_060cm_2018/38075/"
)
tile = "m_3807511_ne_18_060_20181104.tif"
download_url(naip_url + tile, naip_root)

naip = NAIP(naip_root, crs="EPSG:4326")  # This doesn't work
# naip = NAIP(naip_root)  # This works

sampler = GridGeoSampler(naip, size=(1000, 1000), stride=(1000, 1000), units=Units.PIXELS)
dataloader = DataLoader(naip, sampler=sampler, collate_fn=stack_samples)

first_batch = next(iter(dataloader))
print(first_batch["crs"])
print(first_batch["bbox"])
print(first_batch["image"].shape)

Version

0.5.2

@adamjstewart adamjstewart added datasets Geospatial or benchmark datasets samplers Samplers for indexing datasets labels Jul 25, 2024
@adamjstewart
Copy link
Collaborator

adamjstewart commented Jul 25, 2024

Thanks for the code to reproduce, this is extremely useful! Looking into this now. Discoveries I've made so far:

  • It isn't specific to GridGeoSampler, RandomGeoSampler also fails
  • The reason for the failure is that the sampler and data loader are of length 0
  • Length is 0 because CRS changes from meter units to degree units but res does not change
  • It works if you use GDAL to reproject manually

@adamjstewart
Copy link
Collaborator

adamjstewart commented Jul 25, 2024

Current workaround is to specify the resolution yourself:

naip = NAIP(naip_root, crs="EPSG:4326", res=6.052321988618891e-06)

Let me see if I can figure out how to compute the new resolution automatically.

@adamjstewart
Copy link
Collaborator

For RasterDataset, I think the ideal situation (depending on user input) would be:

crs=CRS crs=None
res=RES set both set res, detect crs
res=None set crs, calculate res detect both

The only thing we're currently missing is "calculate res", which led to the issue you saw. Note that this does not apply to VectorDataset where res=None is not allowed. I don't think this matters for Intersection/UnionDataset but this is worth thinking about too.

@adamjstewart
Copy link
Collaborator

Another related bug is that changing the CRS after initialization also doesn't update res (not sure if it should though):

naip = NAIP(naip_root)
print(naip.crs, naip.res)
naip.crs = "EPSG:4326"
print(naip.crs, naip.res)  # crs changed, but not res

@adamjstewart
Copy link
Collaborator

I will probably ignore this related bug until someone complains about it. I don't know off the top of my head how to compute resolution in the new coordinate system, and it seems like it depends on the location, which we know for a single image, but not for the entire R-tree.

@adamjstewart adamjstewart removed the samplers Samplers for indexing datasets label Jul 26, 2024
@adamjstewart adamjstewart changed the title GridGeoSampler doesn't work as expected for datasets with angular CRS RasterDataset: res does not change when crs changes Jul 26, 2024
@erob-archim
Copy link
Author

Perhaps there shouldn't be a recalculation in this case, but using Units.PIXELS should raise an Exception when the resolution is invalid?

@adamjstewart
Copy link
Collaborator

I just don't know how to tell if the resolution is invalid.

@erob-archim
Copy link
Author

Any change from a Cartesian to an angular crs?

@adamjstewart
Copy link
Collaborator

But what range of resolutions are valid for Cartesian and what range of resolutions are valid for angular? res is just a float.

@erob-archim
Copy link
Author

For any resolution for a cartesian crs, there is no simple analogue in a angular crs, right?
So my suggestion would be that any time we detect a change between those types, we should null the resolution.

i.e.

def set_crs(self, crs):
    old_crs = self.crs
    self.crs = crs
    
    old_system = old_crs.coordinate_system.name
    new_system = self.crs.coordinate_system.name

    if old_system == "cartesian" and new_system == "ellipsoidal":
        self.res = None

@adamjstewart
Copy link
Collaborator

Note to self: crs.coordinate_system.name only works for pyproj, not rasterio. We currently use rasterio, but we could certainly switch to pyproj. It's already a dependency anyway.

I'm a little worried about self.res = None. Things like data loading should work (rasterio.merge.merge supports res=None) but other stuff like IntersectionDataset will probably break. I would be more comfortable with a warning message ("Hey, you might want to change your res too"). It won't be easy to tell if the user is changing crs&res or just crs without some hacking.

I think I'm still leaning towards ignoring this until someone complains, but I could be persuaded.

@erob-archim
Copy link
Author

erob-archim commented Jul 31, 2024

Happy with your decision to ignore this, but in case you come back to it some day, the check could be for units in the rasterio CRS. They're given in CRS.units_factor.
e.g.

CRS.from_epsg(32631).units_factor
# ('metre', 1.0)

CRS.from_epsg(4326).units_factor
# ('degree', 0.017453292519943295)

@adamjstewart adamjstewart modified the milestones: 0.5.3, 0.6.0 Aug 6, 2024
@adamjstewart adamjstewart removed this from the 0.6.0 milestone Aug 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
datasets Geospatial or benchmark datasets
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants