IDTReeS bugs #684

ashnair1 · 2022-07-21T12:28:28Z

Description

While using IDTReeS to train an object detector, I noticed quite a few bugs within this dataset.

1. Coordinates are swapped

Fixed by #683

The above image is a comparison of the actual ground truth of OSBS_27 and the plot produced by torchgeo.

2. Missing polygon

Shapefile contains 5 polygons whereas the plot only shows 4. This issue seems to be at the dataset level.

3. Negative box coordinates in pixel space

This is a weird one. I noticed that some boxes had negative pixel coordinates. Observed this in MLBS_5 but its possible this could be present in others as well.

Comparing QGIS and the corrected torchgeo plot for MLBS_5 didn't reveal much

As you can see, the top 4 polygons are missing. But unlike the previous issue, these polygons do exist. However they seem to be outside the bounds of the image.

This could explain the negative coordinates but I'm not sure as to why there is a difference between the plots.

Steps to reproduce

Most of these issues can be seen by downloading the IDTReeS dataset and comparing the torchgeo plots vs the shapefiles.

Version

0.4.0.dev0

adamjstewart · 2022-08-21T19:17:48Z

@ashnair1 @isaaccorley any thoughts on 2 and 3? I would like to fix these issues in the 0.3.1 (coming in approximately 1 week).

Could 3 explain 2? In example 2, the missing patch seems to be right on the edge of the image.

isaaccorley · 2022-08-21T23:13:22Z

I think fixing the coordinates would likely fix the issue. I would bet that our bbox plotting method doesn't support boxes that are out of image bounds

ashnair1 · 2022-08-22T10:38:27Z

Point 2 (Missing polygons) is caused by the absence of some boxes in train_data.csv which means they have no classification.

For example, the missing box here has ID OSBS00011 which is not present in train_data.csv. itc_rsFile.csv contains 1180 records while train_data.csv only contains 1166.

At any rate, this is unrelated to Point 3.

adamjstewart · 2022-09-03T17:25:21Z

@ethanwhite do you know anything about points 2 and 3?

It seems that not all polygons have a class label
It seems that some polygons have coordinates outside of the bounds of the image

Are these expected? For 2, should we just ignore those polygons and pretend that they don't exist? For 3, I'm not sure how a model would be able to predict those full polygons. We could either ignore them or crop them to the bounds of the image.

isaaccorley · 2022-09-04T05:08:00Z

I wrote the dataset so I can provide some insight.

Task1 of the dataset is simply to predict the bbox of the tree crown. Task 2 is to perform both (predict + classify the species). For task1 we shouldn't worry about filtering the boxes without labels. For task2 we should filter them.
I think cropping to the image bounds would be better because technically it is still a tree crown it would just be a partial which would definitely be encountered during inference when tiling a larger ortho if the model was deployed. When writing the dataset I assumed all the geometries would be within the image bounds but looks like this was not the case.

adamjstewart · 2022-09-04T14:03:28Z

@isaaccorley sounds good to me, can you make a PR with those changes?

ethanwhite · 2022-09-05T12:58:29Z

@adamjstewart - I've pinged the folks on our team who can best address this. Hopefully they'll chime in after the holiday and once we're all on the same page we can fix things in the core data source as well.

isaaccorley · 2022-09-05T20:52:43Z

Quick update:

I've clipped the boxes to the image bounds and seems that the boxes with negative pixel coordinates touch the image bounds but have an area of zero.

I think the best option is to clip and remove any boxes with an area of zero. Let me know if this works for everyone.

Edit: Did a quick analysis and this affects 52/1129=~5% of boxes in the train set

adamjstewart · 2022-09-05T21:14:18Z

Hmm, that's quite a lot of images. Maybe wait for @ethanwhite's team to get back to us.

isaaccorley · 2022-09-05T21:42:40Z

It's 52 boxes. It affects 19 images. These are the images with boxes that only touch the image bounds.

[
 'train/RemoteSensing/RGB/MLBS_25.tif',
 'train/RemoteSensing/RGB/MLBS_27.tif',
 'train/RemoteSensing/RGB/MLBS_32.tif',
 'train/RemoteSensing/RGB/MLBS_34.tif',
 'train/RemoteSensing/RGB/MLBS_35.tif',
 'train/RemoteSensing/RGB/MLBS_37.tif',
 'train/RemoteSensing/RGB/MLBS_39.tif',
 'train/RemoteSensing/RGB/MLBS_41.tif',
 'train/RemoteSensing/RGB/MLBS_43.tif',
 'train/RemoteSensing/RGB/MLBS_45.tif',
 'train/RemoteSensing/RGB/MLBS_5.tif',
 'train/RemoteSensing/RGB/OSBS_12.tif',
 'train/RemoteSensing/RGB/OSBS_15.tif',
 'train/RemoteSensing/RGB/OSBS_18.tif',
 'train/RemoteSensing/RGB/OSBS_2.tif',
 'train/RemoteSensing/RGB/OSBS_24.tif',
 'train/RemoteSensing/RGB/OSBS_29.tif',
 'train/RemoteSensing/RGB/OSBS_36.tif',
 'train/RemoteSensing/RGB/OSBS_38.tif'
]

adamjstewart · 2022-09-06T21:17:46Z

@isaaccorley does #760 only fix 3 or does it also fix 2? Did you want to submit a separate PR to fix 2 or should we do it in the same PR? No preference from me.

isaaccorley · 2022-09-06T22:09:11Z

The dataset already performs (2) by filtering boxes without labels. So this isn't technically a bug, it was just a matter of deciding to not refactor to load boxes that don't have labels.

#760 cleans up (3) though.

adamjstewart · 2022-09-07T02:24:27Z

The dataset already performs (2) by filtering boxes without labels.

So you're saying the dataset already does:

For task1 we shouldn't worry about filtering the boxes without labels. For task2 we should filter them.

So issue (2) only exists for task2, but that's okay, and issue (2) doesn't exist for task1 because we aren't filtering them?

isaaccorley · 2022-09-07T12:56:33Z

Yes. Right now as it's implemented, the train set doesn't have separate setup for task1/task2. It simply loads all boxes and labels and filters the rest. The test set does however support separate task1/task2.

MarconiS · 2022-09-07T14:36:55Z

Hi, sorry for the delay. I can help with understanding the data and figuring how to fix (2) and (3).

Point 2 (Missing polygons) is caused by the absence of some boxes in train_data.csv which means they have no classification.

Yes, I can confirm that for some individual trees we could not provide a species label. That is because the field sampling was performed in 20x20m quadrants and stems outside the area of interest were not identified from the field. This allowed for multiple crown objects to make it into the images while their stems did not make into the field sampling area. Since we were testing for both detection and classification, we kept these crown objects, but assigned them to no label.

For OSBS_27 although, there seems to be another issue. The field data only cover the top-left quadrant (~10x10m), while in fact it should cover the entire image. I just checked with the field stem position data retrieved from NEON dataset, and it seems to confirm the presence of stems only where the boxes are. I am looking into it to assess whether there was an issue in retrieving stem coordinates for that specific plot. For now, I'd probably suggest to remove plot OSBS_27.

So issue (2) only exists for task2, but that's okay, and issue (2) doesn't exist for task1 because we aren't filtering them?

This sounds correct to me.

It's 52 boxes. It affects 19 images. These are the images with boxes that only touch the image bounds.

I'll try to look into it today and touch back. I need to dive into the function you are using but I believe it has to with the fact that in some cases the images may have been clipped at the exact border of some crowns.

adamjstewart · 2022-09-07T15:57:28Z

Right now as it's implemented, the train set doesn't have separate setup for task1/task2

Why is that the case? I can imagine that if you're only goal is performing well on test task1, you would want to keep all boxes for train even if they don't have labels. Doesn't having different logic for train and test actually make the dataset more complicated?

isaaccorley · 2022-09-07T16:02:16Z

You are right. However when I initially wrote the dataset I assumed that there weren't a few boxes without labels and that boxes wouldn't be outside the bounds of some images. I'll make an issue to rewrite the dataset to have separate task1 and task2 for the train set as well.

adamjstewart · 2022-09-07T17:06:26Z

Cool, I think that can wait until 0.4 since that's an actual API change. Since you're making a new issue for that, I think we can consider this particular issue solved by #683 and #760. Thanks for all of the hard work everyone! Datasets are complicated!!

adamjstewart added the datasets Geospatial or benchmark datasets label Jul 21, 2022

adamjstewart added this to the 0.3.1 milestone Jul 21, 2022

isaaccorley mentioned this issue Sep 6, 2022

IDTReeS - Clip boxes outside of image bounds #760

Merged

adamjstewart closed this as completed Sep 7, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

IDTReeS bugs #684

IDTReeS bugs #684

ashnair1 commented Jul 21, 2022 •

edited

Loading

adamjstewart commented Aug 21, 2022

isaaccorley commented Aug 21, 2022

ashnair1 commented Aug 22, 2022 •

edited

Loading

adamjstewart commented Sep 3, 2022

isaaccorley commented Sep 4, 2022

adamjstewart commented Sep 4, 2022

ethanwhite commented Sep 5, 2022 •

edited

Loading

isaaccorley commented Sep 5, 2022 •

edited

Loading

adamjstewart commented Sep 5, 2022

isaaccorley commented Sep 5, 2022 •

edited

Loading

adamjstewart commented Sep 6, 2022

isaaccorley commented Sep 6, 2022

adamjstewart commented Sep 7, 2022

isaaccorley commented Sep 7, 2022

MarconiS commented Sep 7, 2022

adamjstewart commented Sep 7, 2022

isaaccorley commented Sep 7, 2022

adamjstewart commented Sep 7, 2022

IDTReeS bugs #684

IDTReeS bugs #684

Comments

ashnair1 commented Jul 21, 2022 • edited Loading

Description

1. Coordinates are swapped

2. Missing polygon

3. Negative box coordinates in pixel space

Steps to reproduce

Version

adamjstewart commented Aug 21, 2022

isaaccorley commented Aug 21, 2022

ashnair1 commented Aug 22, 2022 • edited Loading

adamjstewart commented Sep 3, 2022

isaaccorley commented Sep 4, 2022

adamjstewart commented Sep 4, 2022

ethanwhite commented Sep 5, 2022 • edited Loading

isaaccorley commented Sep 5, 2022 • edited Loading

adamjstewart commented Sep 5, 2022

isaaccorley commented Sep 5, 2022 • edited Loading

adamjstewart commented Sep 6, 2022

isaaccorley commented Sep 6, 2022

adamjstewart commented Sep 7, 2022

isaaccorley commented Sep 7, 2022

MarconiS commented Sep 7, 2022

adamjstewart commented Sep 7, 2022

isaaccorley commented Sep 7, 2022

adamjstewart commented Sep 7, 2022

ashnair1 commented Jul 21, 2022 •

edited

Loading

ashnair1 commented Aug 22, 2022 •

edited

Loading

ethanwhite commented Sep 5, 2022 •

edited

Loading

isaaccorley commented Sep 5, 2022 •

edited

Loading

isaaccorley commented Sep 5, 2022 •

edited

Loading