-
Notifications
You must be signed in to change notification settings - Fork 378
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
IDTReeS bugs #684
Comments
@ashnair1 @isaaccorley any thoughts on 2 and 3? I would like to fix these issues in the 0.3.1 (coming in approximately 1 week). Could 3 explain 2? In example 2, the missing patch seems to be right on the edge of the image. |
I think fixing the coordinates would likely fix the issue. I would bet that our bbox plotting method doesn't support boxes that are out of image bounds |
Point 2 (Missing polygons) is caused by the absence of some boxes in For example, the missing box here has ID At any rate, this is unrelated to Point 3. |
@ethanwhite do you know anything about points 2 and 3?
Are these expected? For 2, should we just ignore those polygons and pretend that they don't exist? For 3, I'm not sure how a model would be able to predict those full polygons. We could either ignore them or crop them to the bounds of the image. |
I wrote the dataset so I can provide some insight.
|
@isaaccorley sounds good to me, can you make a PR with those changes? |
@adamjstewart - I've pinged the folks on our team who can best address this. Hopefully they'll chime in after the holiday and once we're all on the same page we can fix things in the core data source as well. |
Quick update: I've clipped the boxes to the image bounds and seems that the boxes with negative pixel coordinates touch the image bounds but have an area of zero. I think the best option is to clip and remove any boxes with an area of zero. Let me know if this works for everyone. Edit: Did a quick analysis and this affects 52/1129=~5% of boxes in the train set |
Hmm, that's quite a lot of images. Maybe wait for @ethanwhite's team to get back to us. |
It's 52 boxes. It affects 19 images. These are the images with boxes that only touch the image bounds. [
'train/RemoteSensing/RGB/MLBS_25.tif',
'train/RemoteSensing/RGB/MLBS_27.tif',
'train/RemoteSensing/RGB/MLBS_32.tif',
'train/RemoteSensing/RGB/MLBS_34.tif',
'train/RemoteSensing/RGB/MLBS_35.tif',
'train/RemoteSensing/RGB/MLBS_37.tif',
'train/RemoteSensing/RGB/MLBS_39.tif',
'train/RemoteSensing/RGB/MLBS_41.tif',
'train/RemoteSensing/RGB/MLBS_43.tif',
'train/RemoteSensing/RGB/MLBS_45.tif',
'train/RemoteSensing/RGB/MLBS_5.tif',
'train/RemoteSensing/RGB/OSBS_12.tif',
'train/RemoteSensing/RGB/OSBS_15.tif',
'train/RemoteSensing/RGB/OSBS_18.tif',
'train/RemoteSensing/RGB/OSBS_2.tif',
'train/RemoteSensing/RGB/OSBS_24.tif',
'train/RemoteSensing/RGB/OSBS_29.tif',
'train/RemoteSensing/RGB/OSBS_36.tif',
'train/RemoteSensing/RGB/OSBS_38.tif'
] |
@isaaccorley does #760 only fix 3 or does it also fix 2? Did you want to submit a separate PR to fix 2 or should we do it in the same PR? No preference from me. |
The dataset already performs (2) by filtering boxes without labels. So this isn't technically a bug, it was just a matter of deciding to not refactor to load boxes that don't have labels. #760 cleans up (3) though. |
So you're saying the dataset already does:
So issue (2) only exists for task2, but that's okay, and issue (2) doesn't exist for task1 because we aren't filtering them? |
Yes. Right now as it's implemented, the train set doesn't have separate setup for task1/task2. It simply loads all boxes and labels and filters the rest. The test set does however support separate task1/task2. |
Hi, sorry for the delay. I can help with understanding the data and figuring how to fix (2) and (3).
Yes, I can confirm that for some individual trees we could not provide a species label. That is because the field sampling was performed in 20x20m quadrants and stems outside the area of interest were not identified from the field. This allowed for multiple crown objects to make it into the images while their stems did not make into the field sampling area. Since we were testing for both detection and classification, we kept these crown objects, but assigned them to no label. For OSBS_27 although, there seems to be another issue. The field data only cover the top-left quadrant (~10x10m), while in fact it should cover the entire image. I just checked with the field stem position data retrieved from NEON dataset, and it seems to confirm the presence of stems only where the boxes are. I am looking into it to assess whether there was an issue in retrieving stem coordinates for that specific plot. For now, I'd probably suggest to remove plot OSBS_27.
This sounds correct to me.
I'll try to look into it today and touch back. I need to dive into the function you are using but I believe it has to with the fact that in some cases the images may have been clipped at the exact border of some crowns. |
Why is that the case? I can imagine that if you're only goal is performing well on test task1, you would want to keep all boxes for train even if they don't have labels. Doesn't having different logic for train and test actually make the dataset more complicated? |
You are right. However when I initially wrote the dataset I assumed that there weren't a few boxes without labels and that boxes wouldn't be outside the bounds of some images. I'll make an issue to rewrite the dataset to have separate task1 and task2 for the train set as well. |
Description
While using IDTReeS to train an object detector, I noticed quite a few bugs within this dataset.
1. Coordinates are swapped
Fixed by #683
The above image is a comparison of the actual ground truth of OSBS_27 and the plot produced by torchgeo.
2. Missing polygon
Shapefile contains 5 polygons whereas the plot only shows 4. This issue seems to be at the dataset level.
3. Negative box coordinates in pixel space
This is a weird one. I noticed that some boxes had negative pixel coordinates. Observed this in MLBS_5 but its possible this could be present in others as well.
Comparing QGIS and the corrected torchgeo plot for MLBS_5 didn't reveal much
As you can see, the top 4 polygons are missing. But unlike the previous issue, these polygons do exist. However they seem to be outside the bounds of the image.
This could explain the negative coordinates but I'm not sure as to why there is a difference between the plots.
Steps to reproduce
Most of these issues can be seen by downloading the IDTReeS dataset and comparing the torchgeo plots vs the shapefiles.
Version
0.4.0.dev0
The text was updated successfully, but these errors were encountered: