
GeneralizedRCNN raises an exception when encountering a bad bounding box during forward #6787

Open
jangop opened this issue Oct 18, 2022 · 2 comments

Comments

@jangop

jangop commented Oct 18, 2022

The current implementation of GeneralizedRCNN checks for degenerate bounding boxes during its forward pass. While that is, in principle, a good safeguard, it can make training brittle, especially when randomized data augmentation is involved, because an exception is raised (via torch's _assert). The current implementation appears to be provisional (a # TODO: Move this to a function comment sits a couple of lines up), but I am wondering what more helpful ways of dealing with those bad bounding boxes might be:

  1. Raise an explicit exception that can be handled during training to, for example, skip the current batch?
  2. Filter out degenerate bounding boxes (a sketch follows this list)?
  3. Keep the current implementation because torch's _assert is great and I just don't understand it properly? 🙄
@datumbox
Contributor

datumbox commented Oct 20, 2022

Degenerate bboxes can mess up the training phase of the models, leading to situations that are very hard to debug; I believe that is why this assertion was originally added. We typically avoid such assertions when possible to improve performance, but we keep them in cases where debugging would otherwise become extremely difficult.

I think the best way to avoid the issue is to filter out degenerate bboxes. Our new Transforms V2 prototype offers a built-in way of doing so: ClampBoundingBoxes followed by RemoveSmallBoundingBoxes. The first clamps the bboxes to valid values (ensuring xy1 <= xy2), and the latter eliminates any small bboxes that don't pass the expected threshold.

We are preparing a blogpost to announce the Transforms V2 API. Some initial info is recorded in #6753

@jangop
Author

jangop commented Oct 21, 2022

Agreed, “the best way” surely is filtering out degenerate bounding boxes. But the problem of bad data, and how to deal with it, is an old one and will likely stick around. See, for example, nonechucks. Its latest release is from 2019, but I find the concept interesting, and its README outlines a few reasons why one might want to deal with bad data on the fly.

I struggle with such issues when trying to rely on frameworks intended to make life easier. The pipeline in which I noticed the degenerate bounding boxes loads data using activeloop deeplake, augments it using albumentations, and performs training using pytorch lightning. In theory, everything should just work. In practice, training crashes because of an odd bounding box, and there is nothing I can do about it.

Supposedly, by the way, albumentations filters out bounding boxes, too:

https://albumentations.ai/docs/getting_started/bounding_boxes_augmentation/#min_area-and-min_visibility

This is not so easy, though:

albumentations-team/albumentations#1322

If Transforms V2 obsoletes albumentations and turns out to be more robust: 👍
