
IDTReeS bugs #684

Closed
ashnair1 opened this issue Jul 21, 2022 · 18 comments
Labels
datasets Geospatial or benchmark datasets
@ashnair1
Collaborator

ashnair1 commented Jul 21, 2022

Description

While using IDTReeS to train an object detector, I noticed quite a few bugs within this dataset.

1. Coordinates are swapped

Fixed by #683

[Image: OSBS_27_before]

The above image is a comparison of the actual ground truth of OSBS_27 and the plot produced by torchgeo.

2. Missing polygon

The shapefile contains 5 polygons, whereas the plot only shows 4. This issue seems to be at the dataset level.

[Image: OSBS_27_missingpoly]

3. Negative box coordinates in pixel space

This is a weird one. I noticed that some boxes had negative pixel coordinates. I observed this in MLBS_5, but it's possible this is present in other images as well.

Comparing QGIS and the corrected torchgeo plot for MLBS_5 didn't reveal much.

[Image: MLBS_5_initial]

As you can see, the top 4 polygons are missing. But unlike the previous issue, these polygons do exist; they just seem to lie outside the bounds of the image.

[Image: MLBS_5_second]

This could explain the negative coordinates, but I'm not sure why there is a difference between the plots.
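The out-of-bounds symptom can be reproduced with a small check. A minimal sketch, assuming a north-up image with a top-left origin and square pixels of size `res`; the function names here are illustrative, not torchgeo's actual API:

```python
# Hypothetical check for point 3: convert a CRS-space box to pixel space and
# flag it if any corner falls outside the image extent.

def geo_to_pixel(x, y, origin_x, origin_y, res):
    """Convert CRS coordinates to fractional (col, row) pixel coordinates."""
    col = (x - origin_x) / res
    row = (origin_y - y) / res  # row index grows as y decreases
    return col, row

def box_out_of_bounds(box, origin_x, origin_y, res, width, height):
    """True if the (xmin, ymin, xmax, ymax) CRS box extends past the image."""
    xmin, ymin, xmax, ymax = box
    # Top-left and bottom-right corners in pixel space
    corners = [
        geo_to_pixel(xmin, ymax, origin_x, origin_y, res),
        geo_to_pixel(xmax, ymin, origin_x, origin_y, res),
    ]
    return any(
        col < 0 or col > width or row < 0 or row > height
        for col, row in corners
    )
```

A box whose western edge lies west of the raster origin maps to a negative column index, which matches the negative pixel coordinates observed here.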

Steps to reproduce

Most of these issues can be seen by downloading the IDTReeS dataset and comparing the torchgeo plots against the shapefiles.

Version

0.4.0.dev0

@adamjstewart adamjstewart added the datasets Geospatial or benchmark datasets label Jul 21, 2022
@adamjstewart adamjstewart added this to the 0.3.1 milestone Jul 21, 2022
@adamjstewart
Collaborator

@ashnair1 @isaaccorley any thoughts on 2 and 3? I would like to fix these issues in 0.3.1 (coming in approximately 1 week).

Could 3 explain 2? In example 2, the missing patch seems to be right on the edge of the image.

@isaaccorley
Collaborator

I think fixing the coordinates would likely fix the issue. I would bet that our bbox plotting method doesn't support boxes that are out of image bounds.

@ashnair1
Collaborator Author

ashnair1 commented Aug 22, 2022

Point 2 (Missing polygons) is caused by the absence of some boxes in train_data.csv which means they have no classification.

For example, the missing box here has ID OSBS00011, which is not present in train_data.csv. itc_rsFile.csv contains 1180 records, while train_data.csv only contains 1166.

At any rate, this is unrelated to Point 3.
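The mismatch described above boils down to a set difference between the ID columns of the two CSVs. A minimal sketch (the CSV parsing is left out; pass in the ID columns from itc_rsFile.csv and train_data.csv however you load them):

```python
# Sketch of the check above: crown IDs that have a polygon record but no
# entry in train_data.csv, i.e. no classification label.

def unlabeled_ids(itc_ids, train_ids):
    """Return IDs present in itc_rsFile.csv but absent from train_data.csv."""
    return sorted(set(itc_ids) - set(train_ids))
```

Given the counts reported above, OSBS00011 would be among the IDs returned.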

@adamjstewart
Collaborator

@ethanwhite do you know anything about points 2 and 3?

  2. It seems that not all polygons have a class label
  3. It seems that some polygons have coordinates outside of the bounds of the image

Are these expected? For 2, should we just ignore those polygons and pretend that they don't exist? For 3, I'm not sure how a model would be able to predict those full polygons. We could either ignore them or crop them to the bounds of the image.

@isaaccorley
Collaborator

I wrote the dataset so I can provide some insight.

  1. Task 1 of the dataset is simply to predict the bbox of the tree crown. Task 2 is to perform both (predict + classify the species). For task 1 we shouldn't worry about filtering out the boxes without labels. For task 2 we should filter them.

  2. I think cropping to the image bounds would be better because technically it is still a tree crown; it would just be a partial one, which would definitely be encountered during inference when tiling a larger ortho if the model were deployed. When writing the dataset I assumed all the geometries would be within the image bounds, but it looks like this was not the case.
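The cropping suggested above amounts to clamping each box corner to the image extent. A minimal sketch, assuming boxes in pixel coordinates as (xmin, ymin, xmax, ymax) tuples:

```python
def clip_box(box, width, height):
    """Clamp a (xmin, ymin, xmax, ymax) pixel-space box to the image extent,
    so out-of-bounds (including negative) coordinates are snapped to the edge."""
    xmin, ymin, xmax, ymax = box
    return (
        max(0.0, min(xmin, width)),
        max(0.0, min(ymin, height)),
        max(0.0, min(xmax, width)),
        max(0.0, min(ymax, height)),
    )
```

This is the same behavior as torchvision-style box clipping: a partially visible crown keeps its in-image portion rather than being dropped.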

@adamjstewart
Collaborator

@isaaccorley sounds good to me, can you make a PR with those changes?

@ethanwhite
Contributor

ethanwhite commented Sep 5, 2022

@adamjstewart - I've pinged the folks on our team who can best address this. Hopefully they'll chime in after the holiday and once we're all on the same page we can fix things in the core data source as well.

@isaaccorley
Collaborator

isaaccorley commented Sep 5, 2022

Quick update:

I've clipped the boxes to the image bounds, and it seems that the boxes with negative pixel coordinates touch the image bounds but have an area of zero.

[Image: Screen Shot 2022-09-05 at 3 50 56 PM]

I think the best option is to clip and remove any boxes with an area of zero. Let me know if this works for everyone.

Edit: I did a quick analysis and this affects 52/1129 (≈5%) of the boxes in the train set.
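The clip-then-filter step proposed above can be sketched as a standalone illustration (not torchgeo's actual implementation; boxes are (xmin, ymin, xmax, ymax) in pixel coordinates):

```python
def clip_and_filter(boxes, width, height):
    """Clip boxes to the image extent and drop any whose clipped area is zero,
    i.e. boxes that only touch the image bounds."""
    kept = []
    for xmin, ymin, xmax, ymax in boxes:
        cxmin = max(0.0, min(xmin, width))
        cymin = max(0.0, min(ymin, height))
        cxmax = max(0.0, min(xmax, width))
        cymax = max(0.0, min(ymax, height))
        # A degenerate box collapses to zero width or height after clipping
        if (cxmax - cxmin) > 0 and (cymax - cymin) > 0:
            kept.append((cxmin, cymin, cxmax, cymax))
    return kept
```

A box lying entirely outside the image clips to a zero-area sliver on the edge and is removed; partially visible boxes survive with their in-image portion.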

@adamjstewart
Collaborator

Hmm, that's quite a lot of images. Maybe wait for @ethanwhite's team to get back to us.

@isaaccorley
Collaborator

isaaccorley commented Sep 5, 2022

It's 52 boxes. It affects 19 images. These are the images with boxes that only touch the image bounds.

[
 'train/RemoteSensing/RGB/MLBS_25.tif',
 'train/RemoteSensing/RGB/MLBS_27.tif',
 'train/RemoteSensing/RGB/MLBS_32.tif',
 'train/RemoteSensing/RGB/MLBS_34.tif',
 'train/RemoteSensing/RGB/MLBS_35.tif',
 'train/RemoteSensing/RGB/MLBS_37.tif',
 'train/RemoteSensing/RGB/MLBS_39.tif',
 'train/RemoteSensing/RGB/MLBS_41.tif',
 'train/RemoteSensing/RGB/MLBS_43.tif',
 'train/RemoteSensing/RGB/MLBS_45.tif',
 'train/RemoteSensing/RGB/MLBS_5.tif',
 'train/RemoteSensing/RGB/OSBS_12.tif',
 'train/RemoteSensing/RGB/OSBS_15.tif',
 'train/RemoteSensing/RGB/OSBS_18.tif',
 'train/RemoteSensing/RGB/OSBS_2.tif',
 'train/RemoteSensing/RGB/OSBS_24.tif',
 'train/RemoteSensing/RGB/OSBS_29.tif',
 'train/RemoteSensing/RGB/OSBS_36.tif',
 'train/RemoteSensing/RGB/OSBS_38.tif'
]

@adamjstewart
Collaborator

@isaaccorley does #760 only fix 3 or does it also fix 2? Did you want to submit a separate PR to fix 2 or should we do it in the same PR? No preference from me.

@isaaccorley
Collaborator

The dataset already performs (2) by filtering boxes without labels. So this isn't technically a bug; it was just a matter of deciding not to refactor to load boxes that don't have labels.

#760 cleans up (3) though.

@adamjstewart
Collaborator

The dataset already performs (2) by filtering boxes without labels.

So you're saying the dataset already does:

For task1 we shouldn't worry about filtering the boxes without labels. For task2 we should filter them.

So issue (2) only exists for task2, but that's okay, and issue (2) doesn't exist for task1 because we aren't filtering them?

@isaaccorley
Collaborator

Yes. Right now, as implemented, the train set doesn't have a separate setup for task 1/task 2. It simply loads all boxes that have labels and filters out the rest. The test set does, however, support task 1 and task 2 separately.

@MarconiS
Copy link

MarconiS commented Sep 7, 2022

Hi, sorry for the delay. I can help with understanding the data and figuring out how to fix (2) and (3).

Point 2 (Missing polygons) is caused by the absence of some boxes in train_data.csv which means they have no classification.

Yes, I can confirm that for some individual trees we could not provide a species label. That is because the field sampling was performed in 20x20 m quadrants, and stems outside the area of interest were not identified in the field. This allowed multiple crown objects to make it into the images while their stems did not make it into the field sampling area. Since we were testing for both detection and classification, we kept these crown objects but assigned them no label.

For OSBS_27, however, there seems to be another issue. The field data only cover the top-left quadrant (~10x10 m), when in fact they should cover the entire image. I just checked against the field stem position data retrieved from the NEON dataset, and it seems to confirm the presence of stems only where the boxes are. I am looking into whether there was an issue in retrieving stem coordinates for that specific plot. For now, I'd suggest removing plot OSBS_27.

So issue (2) only exists for task2, but that's okay, and issue (2) doesn't exist for task1 because we aren't filtering them?

This sounds correct to me.

It's 52 boxes. It affects 19 images. These are the images with boxes that only touch the image bounds.

I'll try to look into it today and report back. I need to dive into the function you are using, but I believe it has to do with the fact that in some cases the images may have been clipped at the exact border of some crowns.

@adamjstewart
Collaborator

Right now as it's implemented, the train set doesn't have separate setup for task1/task2

Why is that the case? I can imagine that if your only goal is performing well on test task 1, you would want to keep all boxes for train even if they don't have labels. Doesn't having different logic for train and test actually make the dataset more complicated?

@isaaccorley
Collaborator

You are right. However, when I initially wrote the dataset I assumed that there were no boxes without labels and that no boxes would fall outside the bounds of the images. I'll make an issue to rewrite the dataset to have separate task 1 and task 2 setups for the train set as well.

@adamjstewart
Collaborator

Cool, I think that can wait until 0.4 since that's an actual API change. Since you're making a new issue for that, I think we can consider this particular issue solved by #683 and #760. Thanks for all of the hard work everyone! Datasets are complicated!!
