Samplers not completely respecting RoI #260

robertomest · 2021-11-23T19:30:11Z

Hi Folks,
I've been using torchgeo for loading data and it's working really well. There seems to be a bug in the samplers (I verified it in both RandomGeoSampler and GridGeoSampler). When a sampler is created with an RoI, it currently uses the RoI only to select among the available files in the rtree index. If one of the selected files extends beyond the RoI, the part of the file outside the RoI will be sampled as well. This is especially problematic in datasets composed of large tiffs (like CDL).

Example:

from torchgeo.samplers import RandomGeoSampler
from torchgeo.datasets import CDL
from torchgeo.datasets import BoundingBox
from shapely import geometry as shpg
import geopandas as gpd

dataset = CDL("/tmp/cdl2")
minx, maxx, miny, maxy, mint, maxt = dataset.bounds
roi = BoundingBox(minx, (minx + maxx) / 2, miny, (miny + maxy) / 2, mint, maxt)
sampler = RandomGeoSampler(dataset, size=1e5, length=200, roi=roi)

dataset_bounds = gpd.GeoSeries(shpg.box(minx, miny, maxx, maxy))
roi_bounds = gpd.GeoSeries(shpg.box(minx, miny, (minx + maxx) / 2, (miny + maxy) / 2))
# Sample some bounding boxes
samples = gpd.GeoSeries([shpg.box(b[0], b[2], b[1], b[3]) for b in sampler])

ax = dataset_bounds.boundary.plot(color="black")
roi_bounds.boundary.plot(ax=ax, color="green")
samples.boundary.plot(ax=ax, color="red")
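The plot shows the problem visually; the same check can also be done numerically. Below is a minimal sketch, assuming each box can be indexed like a tuple in the (minx, maxx, miny, maxy, mint, maxt) order used above. `is_within` and `count_outside` are hypothetical helpers for illustration, not part of torchgeo.

```python
from typing import Iterable, Sequence


def is_within(box: Sequence[float], roi: Sequence[float]) -> bool:
    """Return True if box lies entirely inside roi (spatial extent only).

    Both arguments are assumed to be ordered (minx, maxx, miny, maxy, ...).
    """
    minx, maxx, miny, maxy = box[0], box[1], box[2], box[3]
    rminx, rmaxx, rminy, rmaxy = roi[0], roi[1], roi[2], roi[3]
    return rminx <= minx and maxx <= rmaxx and rminy <= miny and maxy <= rmaxy


def count_outside(boxes: Iterable[Sequence[float]], roi: Sequence[float]) -> int:
    """Count the boxes that are not fully contained in roi."""
    return sum(not is_within(box, roi) for box in boxes)


# Toy data standing in for the sampler output (plain tuples, not torchgeo objects)
toy_roi = (0.0, 5.0, 0.0, 5.0, 0.0, 1.0)
toy_samples = [
    (1.0, 2.0, 1.0, 2.0, 0.0, 1.0),  # inside the RoI
    (4.0, 6.0, 1.0, 2.0, 0.0, 1.0),  # crosses the RoI's maxx edge
    (7.0, 8.0, 7.0, 8.0, 0.0, 1.0),  # entirely outside the RoI
]
n_outside = count_outside(toy_samples, toy_roi)
```

With the real objects, `count_outside(sampler, roi)` would be expected to return 0 if the sampler respected the RoI.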

I think the problem would be fixed by computing the sampling bounds as the intersection of the hit bounds and the RoI:

bounds = intersection(BoundingBox(*hit.bounds), self.roi)
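The `intersection` call above is pseudocode. As a rough sketch of what it could compute, assuming axis-aligned boxes in the (minx, maxx, miny, maxy, mint, maxt) order: the `Box` type and the `intersection` function here are hypothetical stand-ins for illustration, not torchgeo's actual `BoundingBox` API.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Hypothetical stand-in for a bounding box (not torchgeo's BoundingBox)."""

    minx: float
    maxx: float
    miny: float
    maxy: float
    mint: float
    maxt: float


def intersection(a: Box, b: Box) -> Box:
    """Return the overlap of two boxes, or raise ValueError if they are disjoint."""
    out = Box(
        max(a.minx, b.minx),
        min(a.maxx, b.maxx),
        max(a.miny, b.miny),
        min(a.maxy, b.maxy),
        max(a.mint, b.mint),
        min(a.maxt, b.maxt),
    )
    # A lower bound above its upper bound means no overlap on that axis
    if out.minx > out.maxx or out.miny > out.maxy or out.mint > out.maxt:
        raise ValueError("boxes do not overlap")
    return out


# A file ("hit") covering the full extent, clipped to an RoI covering one quadrant
hit = Box(0.0, 10.0, 0.0, 10.0, 0.0, 1.0)
roi = Box(0.0, 5.0, 0.0, 5.0, 0.0, 1.0)
bounds = intersection(hit, roi)
```

Sampling from `bounds` instead of the full hit bounds would keep every sample origin inside the RoI, provided the patch size is also accounted for when drawing the box.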

Let me know if you would like me to open a PR and help out on this.


adamjstewart · 2021-11-23T19:43:52Z

Duplicate of #149, will be fixed by #144

calebrob6 · 2021-11-23T20:21:41Z

@robertomest, thanks for opening an issue! Please let us know if you run into any others or want to contribute in another way. I'll ping you here when #144 is merged.

adamjstewart · 2021-11-27T05:02:44Z

@robertomest #144 is now ready if you want to test it out. You can clone the repo, check out the feature/zipdatasets branch, and run your code from the same directory or add the directory to your PYTHONPATH. Let me know if you notice any bugs!

robertomest closed this as completed Nov 23, 2021

adamjstewart mentioned this issue Nov 23, 2021

Overhaul BoundingBox and ZipDataset classes #144

Merged

11 tasks
