Zero area intersections in IntersectionDataset result in unexpected dataset lengths #1270

calebrob6 · 2023-04-20T20:18:11Z

Issue

We create an IntersectionDataset like this:

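from torchgeo.datasets import RasterDataset

train_image_ds = RasterDataset(
    'data/processed/images/',
)
train_mask_ds = RasterDataset(
    'data/processed/masks/',
)
# Return the second dataset's pixels under "mask" instead of "image"
train_mask_ds.is_image = False
# `&` builds an IntersectionDataset over the spatial overlap of both indices
train_ds = train_image_ds & train_mask_ds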

Here, train_image_ds and train_mask_ds both have a length of 22 and cover exactly the same spatial areas (i.e., there is a 1-to-1 pairing between a tile in train_image_ds and a tile in train_mask_ds). It looks something like this:

(image not preserved)

The issue is that train_ds unexpectedly has a length of 140. Specifically, the merged index has 140 entries; however, only 22 of them (as expected) have an area > 0. I'm guessing this is why we filter out intersections with area <= 0 in the samplers, but I don't remember the details!

I recommend that we filter out zero-area intersections when merging datasets.
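A minimal sketch of both the symptom and the proposed filter, assuming the spatial index reports boxes that merely share an edge or a corner as hits. It uses plain `(minx, miny, maxx, maxy)` tuples and a hypothetical 3x3 grid of unit tiles rather than torchgeo's real classes; `merge_indices` and `drop_zero_area` are illustrative names, not library API. With identical image and mask grids, the unfiltered merge yields 49 entries for 9 tiles, and dropping zero-area overlaps brings it back to 9.

```python
# Hypothetical sketch, not torchgeo code: boxes are (minx, miny, maxx, maxy).
from itertools import product

Box = tuple[float, float, float, float]


def touches_or_overlaps(a: Box, b: Box) -> bool:
    """Closed-interval test: boxes sharing only an edge or a corner count as hits."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def intersection(a: Box, b: Box) -> Box:
    return (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))


def area(box: Box) -> float:
    return (box[2] - box[0]) * (box[3] - box[1])


def merge_indices(tiles_a: list[Box], tiles_b: list[Box], drop_zero_area: bool) -> list[Box]:
    """Pair every hit from the two indices, optionally skipping zero-area overlaps."""
    merged = []
    for a, b in product(tiles_a, tiles_b):
        if not touches_or_overlaps(a, b):
            continue
        box = intersection(a, b)
        if drop_zero_area and area(box) <= 0:
            continue  # the tiles only share an edge or a corner
        merged.append(box)
    return merged


# Hypothetical 3x3 grid of unit tiles; images and masks use the same grid.
grid = [(x, y, x + 1, y + 1) for x in range(3) for y in range(3)]

print(len(merge_indices(grid, grid, drop_zero_area=False)))  # 49
print(len(merge_indices(grid, grid, drop_zero_area=True)))   # 9
```

Each tile "hits" itself plus every neighbor it touches, including diagonal neighbors that meet only at a corner, which is why the unfiltered count grows so quickly.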


adamjstewart · 2023-04-20T21:09:23Z

Related to #737, #319, #376, etc.

The reason for this issue is that rtree considers two bounding boxes to be overlapping even if the area of overlap is 0, i.e., even if they only share an edge or a corner.

It isn't hard to add a check for this and remove such overlaps from the intersection, or from the sampler. The reason we haven't done this already is that some datasets have 0 area on purpose. We have several point GeoDatasets, including GBIF, iNaturalist, and EDDMapS, and I plan to add others for air pollution as well. I'm not actively using these datasets, and I'm not even sure our built-in samplers would be useful for them, but that's the reason things are the way they are. I would be open to changing this, but would need to think about how else we could use point datasets without 0-area entries. We could add a parameter to control this, I suppose.
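One possible way to reconcile both needs, sketched here as an assumption rather than what was eventually merged: drop a zero-area overlap only when both inputs have positive area, so two raster tiles that merely share an edge are discarded while a point observation inside a tile is kept. A `drop_zero_area` switch, in the spirit of the parameter mentioned above, turns the check off entirely. `keep_hit` and `drop_zero_area` are hypothetical names, and the function assumes its two boxes were already reported as touching or overlapping by the spatial index.

```python
# Hypothetical sketch, not torchgeo code: boxes are (minx, miny, maxx, maxy).
Box = tuple[float, float, float, float]


def area(box: Box) -> float:
    return (box[2] - box[0]) * (box[3] - box[1])


def keep_hit(a: Box, b: Box, drop_zero_area: bool = True) -> bool:
    """Decide whether the overlap of two already-touching boxes is kept."""
    if not drop_zero_area:
        return True
    overlap = (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))
    # A zero-area overlap is meaningful only if an input is itself zero-area
    # (e.g., a point); between two rasters it is just a shared edge or corner.
    return area(overlap) > 0 or area(a) == 0 or area(b) == 0


tile = (0.0, 0.0, 1.0, 1.0)
neighbor = (1.0, 0.0, 2.0, 1.0)  # shares an edge with `tile`
point = (0.5, 0.5, 0.5, 0.5)     # a point observation inside `tile`

print(keep_hit(tile, neighbor))                        # False
print(keep_hit(tile, point))                           # True
print(keep_hit(tile, neighbor, drop_zero_area=False))  # True
```

One caveat of this criterion: a point lying exactly on a shared tile boundary still pairs with every tile it touches.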

calebrob6 · 2023-04-20T21:15:46Z

Just clarified the title to emphasize that the problem is that the reported length of the IntersectionDataset does not match the expected length, which is confusing to users.

adamjstewart · 2024-04-04T13:37:40Z

@yichiac has this same problem in his dataset

calebrob6 added the documentation, datasets, and samplers labels and removed the documentation label Apr 20, 2023

calebrob6 changed the title from "Zero area intersections in IntersectionDataset" to "Zero area intersections in IntersectionDataset result in unexpected dataset lengths" Apr 20, 2023

adamjstewart self-assigned this Apr 4, 2024

adamjstewart added this to the 0.6.0 milestone Apr 4, 2024

adamjstewart mentioned this issue Apr 4, 2024

IntersectionDataset: ignore 0 area overlap #1985 (merged)

adamjstewart closed this as completed in #1985 Apr 19, 2024

adamjstewart removed this from the 0.6.0 milestone Aug 29, 2024
