-
Notifications
You must be signed in to change notification settings - Fork 378
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixes VectorDataset rounding bug causing sample mask size mismatch #675
Fixes VectorDataset rounding bug causing sample mask size mismatch #675
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can confirm that this fixes the issue, thanks for the PR!
torchgeo/datasets/geo.py
Outdated
@@ -611,12 +611,12 @@ def __getitem__(self, query: BoundingBox) -> Dict[str, Any]: | |||
) | |||
if shapes: | |||
masks = rasterio.features.rasterize( | |||
shapes, out_shape=(int(height), int(width)), transform=transform | |||
shapes, out_shape=(int(round(height)), int(round(width))), transform=transform |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line is too long now. Can you run black torchgeo/datasets/geo.py
to fix this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no need to cast it to int
again if you're not specifying the number of decimals to round to.
There are several other places where we aren't rounding before casting to an int. I'm not worried about the str -> int conversions, but some of these float -> int conversions could be problematic. Not sure if integer division includes rounding or not. I wonder if any of these should also be fixed: $ grep -PR '\bint\((?!round)'
tests/models/test_changestar.py: assert y[0].shape == (3, 1, int(32 * sf), int(32 * sf))
tests/samplers/test_single.py: rows = int((100 - sampler.size[0]) // sampler.stride[0]) + 1
tests/samplers/test_single.py: cols = int((100 - sampler.size[1]) // sampler.stride[1]) + 1
tests/data/forestdamage/data.py: ET.SubElement(bbox, "xmin").text = str(0 + int(SIZE / 4))
tests/data/forestdamage/data.py: ET.SubElement(bbox, "ymin").text = str(0 + int(SIZE / 4))
tests/data/forestdamage/data.py: ET.SubElement(bbox, "xmax").text = str(SIZE - int(SIZE / 4))
tests/data/forestdamage/data.py: ET.SubElement(bbox, "ymax").text = str(SIZE - int(SIZE / 4))
torchgeo/datasets/forestdamage.py: int(bndbox.find("xmin").text), # type: ignore[union-attr, arg-type]
torchgeo/datasets/forestdamage.py: int(bndbox.find("ymin").text), # type: ignore[union-attr, arg-type]
torchgeo/datasets/forestdamage.py: int(bndbox.find("xmax").text), # type: ignore[union-attr, arg-type]
torchgeo/datasets/forestdamage.py: int(bndbox.find("ymax").text), # type: ignore[union-attr, arg-type]
torchgeo/datasets/so2sat.py: self.size = int(f["label"].shape[0])
torchgeo/datasets/spacenet.py: [int(math.ceil(s / bin_size_mph)) for s in speed_arr_bin]
torchgeo/datasets/spacenet.py: int(feature["properties"]["inferred_speed_mph"]) - 1
torchgeo/datasets/gbif.py: mint = datetime(int(year), 1, 1)
torchgeo/datasets/gbif.py: maxt = datetime(int(year) + 1, 1, 1)
torchgeo/datasets/gbif.py: mint = datetime(int(year), int(month), 1)
torchgeo/datasets/gbif.py: maxt = datetime(int(year) + 1, 1, 1)
torchgeo/datasets/gbif.py: maxt = datetime(int(year), int(month) + 1, 1)
torchgeo/datasets/gbif.py: mint = datetime(int(year), int(month), int(day))
torchgeo/datasets/cowc.py: target = int(self.targets[index])
torchgeo/datasets/idtrees.py: [self.idx2class[int(i)] for i in sample["label"]]
torchgeo/datasets/idtrees.py: [self.idx2class[int(i)] for i in sample["prediction_label"]]
torchgeo/datasets/openbuildings.py: shapes, out_shape=(int(height), int(width)), transform=transform
torchgeo/datasets/openbuildings.py: masks = torch.zeros(size=(1, int(height), int(width)))
torchgeo/datasets/cv4a_kenya_crop_type.py: train_field_ids.append(int(row[0]))
torchgeo/datasets/cv4a_kenya_crop_type.py: test_field_ids.append(int(row[1]))
torchgeo/datasets/cyclone.py: features["relative_time"] = int(features["relative_time"])
torchgeo/datasets/cyclone.py: features["ocean"] = int(features["ocean"])
torchgeo/datasets/cyclone.py: features["label"] = int(features["wind_speed"])
torchgeo/datamodules/etci2021.py: size_train = int(0.8 * size_train_val)
torchgeo/datamodules/chesapeake.py: self.original_patch_size = int(patch_size * 2.0)
torchgeo/datamodules/sen12ms.py: scene_id = int(parts[3])
torchgeo/datamodules/utils.py: val_length = int(len(dataset) * val_pct)
torchgeo/datamodules/utils.py: val_length = int(len(dataset) * val_pct)
torchgeo/datamodules/utils.py: test_length = int(len(dataset) * test_pct)
torchgeo/models/farseg.py: num_upsample = int(math.log2(int(in_feat_os))) - int(
torchgeo/models/farseg.py: math.log2(int(out_feature_output_stride))
torchgeo/samplers/single.py: rows = int((bounds.maxy - bounds.miny - self.size[0]) // self.stride[0]) + 1
torchgeo/samplers/single.py: cols = int((bounds.maxx - bounds.minx - self.size[1]) // self.stride[1]) + 1
torchgeo/samplers/single.py: rows = int((bounds.maxy - bounds.miny - self.size[0]) // self.stride[0]) + 1
torchgeo/samplers/single.py: cols = int((bounds.maxx - bounds.minx - self.size[1]) // self.stride[1]) + 1 |
torchgeo/datasets/geo.py
Outdated
) | ||
else: | ||
# If no features are found in this query, return an empty mask | ||
# with the default fill value and dtype used by rasterize | ||
masks = np.zeros((int(height), int(width)), dtype=np.uint8) | ||
masks = np.zeros((int(round(height)), int(round(width))), dtype=np.uint8) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as above. Casting to int not required.
I had encountered a related issue with Fix using if test_pct is None:
- val_length = int(len(dataset) * val_pct)
+ val_length = round(len(dataset) * val_pct)
train_length = len(dataset) - val_length
return random_split(dataset, [train_length, val_length])
else:
- val_length = int(len(dataset) * val_pct)
- test_length = int(len(dataset) * test_pct)
+ val_length = round(len(dataset) * val_pct)
+ test_length = round(len(dataset) * test_pct) |
I don't understand how |
My bad. Meant less than 1. The comment is unrelated to this PR. I was just replying to @adamjstewart's comment on other places where round vs int errors can occur. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can fix other instances of this bug in a separate PR.
…icrosoft#675) * Fixes VectorDataset rounding bug causing sample mask size mismatch * removes unnecessary casting to int
I've had to adjust the test to also account for the extra pixel generated, which seems right to me (
1.9-1.1 = 0.8
which would be 8 pixels with ares=0.1
).Fixes #674