ValueError: empty range for randrange() #319

RitwikGupta · 2021-12-21T09:24:35Z

When using RandomBatchGeoSampler, 50% of the time the following error will occur. With no code change, this runs perfectly fine the other 50% of the time.

code:

sampler = RandomBatchGeoSampler(ds, size=1024, batch_size=5, length=5 * 5)
dl = DataLoader(ds, batch_sampler=sampler, collate_fn=stack_samples)

for idx, batch in enumerate(dl):
    for idx_s, image in enumerate(batch['image']):
        image = torch.squeeze(image)

error:

  File "/shared/ritwik/miniconda3/envs/dino/lib/python3.7/site-packages/torchgeo/samplers/batch.py", line 115, in __iter__
    bounding_box = get_random_bounding_box(bounds, self.size, self.res)
  File "/shared/ritwik/miniconda3/envs/dino/lib/python3.7/site-packages/torchgeo/samplers/utils.py", line 49, in get_random_bounding_box
    minx = random.randrange(int(width)) * res + bounds.minx
  File "/shared/ritwik/miniconda3/envs/dino/lib/python3.7/random.py", line 190, in randrange
    raise ValueError("empty range for randrange()")
ValueError: empty range for randrange()

The text was updated successfully, but these errors were encountered:

adamjstewart · 2021-12-21T17:19:34Z

What version of TorchGeo are you using that gets this bug, 0.1.2 or main? Also, can you share a minimum reproducing script?

RitwikGupta · 2021-12-22T00:49:44Z

main, 0.2.0dev0. And I don't think the test script is really meaningful without the backing data. it's as simple as:

# Custom RasterDataset defined simply with the filename_glob
ds = WatchRaster(root=Path('/home/ritwik/dataset/'))

sampler = RandomBatchGeoSampler(ds, size=1024, batch_size=5, length=5 * 5)
dl = DataLoader(ds, batch_sampler=sampler, collate_fn=stack_samples)


fig, axs = plt.subplots(5, 5)
for idx, batch in enumerate(dl):
    for idx_s, image in enumerate(batch['image']):
        image = torch.squeeze(image)
        axs[idx, idx_s].imshow(image, cmap='inferno')
        axs[idx, idx_s].axis('off')

adamjstewart · 2021-12-23T21:35:02Z

Can't reproduce with any of the data I have lying around, and obviously none of our CI tests hit this bug. What CRS/units are you files in? How large is 1024 in your CRS? Are any of your files very small?

I can't really debug this if I can't reproduce it. You can try creating an R-tree with the bounding box of some fake files, then I wouldn't actually need the data to reproduce. There are some examples of this in our unit tests.

calebrob6 · 2021-12-24T23:09:39Z

The trace from above means that width (https://github.com/microsoft/torchgeo/blob/main/torchgeo/samplers/utils.py#L48) is <= 0.

tritolol · 2022-01-18T14:04:44Z

I can provide data to reproduce this.

data_root = "dops/"
dop_ds = DopData(data_root)

data_root = "dsms/"
dsm_ds = DsmData(data_root)

intersect_ds = dop_ds & dsm_ds

sampler = RandomBatchGeoSampler(intersect_ds, size=100, batch_size=8, length=100)
dl = DataLoader(intersect_ds, batch_sampler=sampler, collate_fn=stack_samples)

for sample in dl:
    image = sample['image']
    print(image.shape)

DopData and DsmData are two simple custom RasterDatasets with the corresponding filename_glob.
I attached the raster files that cause the issue (sorry for multi plart zip, github doesn't accept a large zip).

Simply put the files starting with "dop" in the "dops" folder and the others in the "dsms" folder.

Data removed, see new comment.

adamjstewart · 2022-01-18T17:13:30Z

Looks like the UNIX zip tool doesn't support multi-part zip files. Can you upload separate zip files containing only some of the files? Or maybe just the single GeoTIFF needed to reproduce the issue? Also, can you post the code defining DopData and DsmData?

tritolol · 2022-01-19T08:08:56Z

Here are the files zipped individually. I had to downsample two of the raster files and the error still remains.
The for-loop now iterates from one to four times before the error triggers.
All of the files are required to reproduce the error.

dsm_data.py:

from torchgeo.datasets import RasterDataset

class DsmData(RasterDataset):
    filename_glob = "*.tif"

dop_data.py:

from torchgeo.datasets import RasterDataset

class DopData(RasterDataset):
    filename_glob = "*.tif"

dop10rgbi_32_338_5677_1_nw_0.5.zip
dop10rgbi_32_338_5678_1_nw_0.5.zip
ndom50_32338_5677_1_nw_2019.zip
ndom50_32338_5678_1_nw_2019.zip

adamjstewart · 2022-01-19T17:36:17Z

Thanks, I can reproduce this now. Will let you know when I figure out what's happening here.

adamjstewart · 2022-01-19T17:55:04Z

I think there are two issues here:

Our intersection logic considers tile A and tile B to intersect when A.maxx == B.minx even though the area of the intersection is 0
Our sampler logic doesn't work if the area of intersection is less than the area of the bounding box to sample

1 is easy to solve (replace <=/>= with </>), but 2 is a bit harder to solve. It's important to solve 2 (not just 1) because some tiles may have a very small non-zero region of overlap.

I'll submit a PR to fix this and test it by this afternoon. Thanks for the bug report!

adamjstewart · 2022-01-19T19:42:11Z

1 might not actually be that simple. The intersection logic used to decide how two R-trees are combined comes from rtree, not torchgeo, and rtree considers bounding boxes with a shared edge as having an intersection. We don't want to stray too far from rtree because that will potentially introduce additional bugs. We could forbid bounding boxes with zero area, but that prevents use cases like point data (single x, y, t location) which I want to use torchgeo for.

isaaccorley · 2022-01-19T20:05:11Z

Can we just add a check in the sampler constructor to compare the box size to the area of intersection?

adamjstewart · 2022-01-19T20:29:41Z

@isaaccorley Can we just add a check in the sampler constructor to compare the box size to the area of intersection?

We could, but what would you do with that information? Particularly in regards to:

@adamjstewart We could forbid bounding boxes with zero area, but that prevents use cases like point data (single x, y, t location) which I want to use torchgeo for.

If you skip intersections with zero area, you can't use samplers for point data.

isaaccorley · 2022-01-19T20:33:46Z

I'm thinking maybe for now just a warning that a user requested a box size that is larger than the area of intersection. In the meantime this will give users a more informative response than the above error.

adamjstewart · 2022-01-19T20:41:33Z

I don't think that will help for this particular issue. To help visualize the data that @tritolol shared, we have two datasets: 1 and 2. Both datasets include two tiles, A and B, that are adjacent like so:

+-----+-----+
|     |     |
|  A  |  B  |
|     |     |
+-----+-----+

Let's say A1 is tile A from dataset 1. In this case, the dataset intersection we compute includes four tiles:

A1 & A2: same area as A
B1 & B2: same area as B
A1 & B2: zero area
B1 & A2: zero area

The user didn't request a bounding box larger than the area of intersection because there isn't a single area of intersection, there are multiple. I think it will almost always be the case that some area of intersection in any large dataset will have an area smaller than the requested bounding box. For example, we could also have:

+----++----+
|    ||    |
|  A || B  |
|    ||    |
+----++----+

where A and B have slight overlap but not big enough for a bounding box. We could try to fuse these into a single tile C but that won't work for cases like:

+-----+
|     |
|  A  |
|    ++----+
+----++    |
     |  B  |
     |     |
     +-----+

isaaccorley · 2022-01-20T02:13:15Z

Can't we just compare H/W of intersection to H/W of user defined bounding box? I.e. if a user is requesting a 256x256 size box but the intersection height or width is less than the box shape then we should warn the user about this. Sorry haven't had a chance to sit down and look at this in depth.

Edit: if there are multiple areas of overlap can we not just filter them prior to sampling? I admit I haven't had an in depth use case for rtree so might be ignorant on how simple or not this might be.

adamjstewart · 2022-01-20T02:28:49Z

Can't we just compare H/W of intersection to H/W of user defined bounding box?

You could (it's easy to do), but I'm not sure where to do that and how to handle it.

we should warn the user about this

This is the part that doesn't make sense. With the dataset that @tritolol shared, tiles A and B are much larger than the requested bounding box, yet you still end up with regions of overlap with zero area. In general, just about every dataset will encounter this issue. Warning is useless, we need to avoid crashing, not just warn for every dataset.

adamjstewart · 2022-01-20T03:00:28Z

I think there are a few possible places we could address this issue:

GeoDataset

The index is first created when you instantiate a GeoDataset (usually via RasterDataset or VectorDataset). In the case of adjacent tiles, one solution would be to merge those tiles into a single bounding box. However, this won't work since:

A GeoDataset index entry also needs to point to a file (although we could change to a list of files)
In general, we can't predict a situation like dataset 1 having a single tile A and dataset 2 having a single tile B with overlapping edges/faces but zero area of overlap

So this location won't work.

IntersectionDataset

The first time we recompute the index is when we compute the intersection of two datasets. With the example data above, this is where we get those pesky intersection bounding boxes with zero area. Many users might consider this to be a bug, and it might make sense to remove bounding boxes with zero area. However, there are a couple of problems with this:

Not all datasets will include volumetric/areal data, some may involve point data. We don't want all intersections with these datasets to be empty
Even if we remove bounding boxes with zero area, we will still have very small intersection bboxes (smaller than the query bbox), but we don't know the size of the query bbox until sampling time
Not everyone will use IntersectionDataset, they might only need UnionDataset or not need to combine datasets at all (ChesapeakeCVPR)

So this location won't work either.

GeoSampler

We compute the index that we sample from in the GeoSampler base class, but we can't filter out intersections with a smaller H/W in this class because:

We don't know the size at that time
Not all geospatial samplers will want to do this (i.e. for point data)

So this location won't work either.

GeoSampler subclasses

I think this is where we'll have to do things (remove intersection bboxes smaller than the query bbox). This is the first time we know the size of the query bbox, and these samplers are already specific enough that they only work for volumetric/areal data. If someone wants to work with point data, they would already need to create a custom sampler. It's a shame we need to iterate through the R-tree in 4 different places just to get a list of locations to sample from, but I don't know of a different way to do this.

Another question is what to do when the area of intersection is 0 < bbox < size. Do we throw away those small regions of overlap, or do we still support sampling from them? The former is probably more efficient, but the latter may be important for some applications. It isn't clear to me what a good default would be. For example, if I'm using GridGeoSampler, I probably want to make predictions for all regions of data, even if they are smaller than the query bbox.

adamjstewart · 2022-01-22T19:54:45Z

I think we also have a case of strong sampler bias. Since we first choose a random tile and then choose a random bounding box, very small regions of intersection will have the same number of hits as large areas. I'm not sure how big of an issue this is.

adamjstewart added the samplers Samplers for indexing datasets label Dec 21, 2021

adamjstewart added this to the 0.1.2 milestone Dec 21, 2021

adamjstewart modified the milestones: 0.1.2, 0.2.1 Jan 1, 2022

This was referenced Jan 28, 2022

BoundingBox: add area/volume attributes #375

Merged

Samplers: avoid bounding boxes smaller than patch size #376

Merged

adamjstewart closed this as completed in #376 Feb 1, 2022

remtav mentioned this issue Feb 24, 2022

GridGeoSampler may never reach end of image #431

Closed

sofstef mentioned this issue Feb 28, 2022

Dataloader returning 0x0 images/masks with dev version #439

Closed

adamjstewart mentioned this issue Aug 22, 2022

Different sizes when combining prechipped dataset #737

Closed

adamjstewart mentioned this issue Apr 20, 2023

Zero area intersections in IntersectionDataset result in unexpected dataset lengths #1270

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ValueError: empty range for randrange() #319

ValueError: empty range for randrange() #319

RitwikGupta commented Dec 21, 2021

adamjstewart commented Dec 21, 2021

RitwikGupta commented Dec 22, 2021

adamjstewart commented Dec 23, 2021

calebrob6 commented Dec 24, 2021

tritolol commented Jan 18, 2022 •

edited

Loading

adamjstewart commented Jan 18, 2022

tritolol commented Jan 19, 2022 •

edited

Loading

adamjstewart commented Jan 19, 2022

adamjstewart commented Jan 19, 2022

adamjstewart commented Jan 19, 2022

isaaccorley commented Jan 19, 2022

adamjstewart commented Jan 19, 2022 •

edited

Loading

isaaccorley commented Jan 19, 2022

adamjstewart commented Jan 19, 2022 •

edited

Loading

isaaccorley commented Jan 20, 2022 •

edited

Loading

adamjstewart commented Jan 20, 2022

adamjstewart commented Jan 20, 2022 •

edited

Loading

adamjstewart commented Jan 22, 2022

ValueError: empty range for randrange() #319

ValueError: empty range for randrange() #319

Comments

RitwikGupta commented Dec 21, 2021

adamjstewart commented Dec 21, 2021

RitwikGupta commented Dec 22, 2021

adamjstewart commented Dec 23, 2021

calebrob6 commented Dec 24, 2021

tritolol commented Jan 18, 2022 • edited Loading

adamjstewart commented Jan 18, 2022

tritolol commented Jan 19, 2022 • edited Loading

adamjstewart commented Jan 19, 2022

adamjstewart commented Jan 19, 2022

adamjstewart commented Jan 19, 2022

isaaccorley commented Jan 19, 2022

adamjstewart commented Jan 19, 2022 • edited Loading

isaaccorley commented Jan 19, 2022

adamjstewart commented Jan 19, 2022 • edited Loading

isaaccorley commented Jan 20, 2022 • edited Loading

adamjstewart commented Jan 20, 2022

adamjstewart commented Jan 20, 2022 • edited Loading

GeoDataset

IntersectionDataset

GeoSampler

GeoSampler subclasses

adamjstewart commented Jan 22, 2022

tritolol commented Jan 18, 2022 •

edited

Loading

tritolol commented Jan 19, 2022 •

edited

Loading

adamjstewart commented Jan 19, 2022 •

edited

Loading

adamjstewart commented Jan 19, 2022 •

edited

Loading

isaaccorley commented Jan 20, 2022 •

edited

Loading

adamjstewart commented Jan 20, 2022 •

edited

Loading