Add IDTReeS dataset #201
I'm giving a talk to a group of biologists who use drone imagery for spectral biology on Monday. If this PR is close to being ready, it would be great to get it merged.
Working on it tonight actually.
OK, looks like tests are passing now. I had to skip the lidar plot test on Python 3.9 (until open3d v0.14) and on Windows and macOS due to segmentation faults, but it works fine on Ubuntu with Python 3.6-3.8.
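For reference, a minimal sketch of how such a platform- and version-dependent skip can be expressed. It uses the standard library's `unittest.skipIf` so that it runs without extra dependencies; `pytest.mark.skipif` accepts an equivalent condition. The predicate `open3d_plot_supported` and the test class are hypothetical names for illustration, not code from this PR.

```python
import sys
import unittest
from typing import Tuple


def open3d_plot_supported(
    platform: str = sys.platform,
    version: Tuple[int, int] = sys.version_info[:2],
) -> bool:
    """Hypothetical predicate mirroring the skip conditions described above.

    Assumption: plotting is only exercised on Linux with Python < 3.9.
    """
    return platform.startswith("linux") and version < (3, 9)


class TestPlotLas(unittest.TestCase):
    @unittest.skipIf(
        not open3d_plot_supported(),
        "open3d plotting is unavailable or segfaults on this platform",
    )
    def test_plot_las(self) -> None:
        # Placeholder body: the real test would call the dataset's plot method.
        pass
```

Keeping the condition in one named predicate makes it easy to relax later, for example once an open3d release supports Python 3.9.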
I assume the `.las` files are manually created random noise and not real data? Can you add a section to `tests/data/README.md` explaining how to create this data for the next person who needs to?
I think my only remaining concerns relate to minimum dependency versions, everything else looks good!
    else:
        directory = os.path.join(root, self.task)
        if self.task == "task1":
            geoms = None  # type: ignore[assignment]
While debugging the recent mypy issues I noticed a lot of typing errors in this file. For example, the `_load` function says that it returns `Tuple[List[str], Dict[int, Dict[str, Any]], Any]`, but many of these return values can also be `None`. The `type: ignore` here is masking this for `geoms`, and the `Any` is masking this for `labels`, but the real issue is that the function is not typed correctly. @isaaccorley can you fix this? We should be ignoring typing errors very sparingly, for issues that are beyond our control (PyTorch is missing type hints for large parts of the library), not for things like this. For the same reason, we should try to avoid `Any`, as it basically turns off all type checking.
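To illustrate the point, here is a minimal sketch of a loader whose signature states which values may be `None`, so that neither `type: ignore` nor `Any` is needed. The function `load_split`, the aliases `Boxes` and `Labels`, and the placeholder data are hypothetical and much simpler than the real dataset's nested dictionaries.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical, simplified shapes for illustration only.
Boxes = Dict[str, List[List[int]]]
Labels = Dict[str, str]


def load_split(
    split: str, task: str = "task1"
) -> Tuple[List[str], Optional[Boxes], Optional[Labels]]:
    """Return image names plus boxes and labels where the split provides them."""
    images = [f"{split}_0.tif"]
    if split == "train":
        # train: boxes and a label per box
        return images, {"train_0": [[0, 0, 10, 10]]}, {"train_0": "species_a"}
    if task == "task1":
        # test/task1: images only
        return images, None, None
    # test/task2: boxes but no labels
    return images, {"test_0": [[2, 3, 8, 9]]}, None
```

With this signature, mypy forces every caller to handle the `None` cases explicitly instead of discovering them at runtime.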
I'll refactor, but just so you know, this dataset has no structure to the nested dictionaries whatsoever, so I opted for `Dict[str, Any]` in most cases. Also, some of the methods return `Any` because they return an open3d object or a pandas DataFrame, and I'm not able to use those types because I'm not importing them at the top level, only inside the method.
See https://mypy.readthedocs.io/en/stable/runtime_troubles.html, it may be possible to declare the type in a string or comment instead to avoid the top-level import.
Nice, ok I'll submit a PR.
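A minimal sketch of the string-annotation approach from the linked mypy page, assuming pandas is the optional dependency. The import under `typing.TYPE_CHECKING` is seen only by the type checker, and the quoted annotation is never evaluated at runtime, so pandas is still imported only inside the function. The helper `load_labels` is a hypothetical name, not a function from this PR.

```python
from typing import TYPE_CHECKING, Optional

if TYPE_CHECKING:
    # Seen by mypy only; this branch never runs at runtime.
    import pandas as pd


def load_labels(path: Optional[str]) -> "Optional[pd.DataFrame]":
    """Hypothetical helper: read a label CSV, or return None if there is none."""
    if path is None:
        return None
    # Lazy import keeps pandas an optional dependency.
    import pandas as pd

    return pd.read_csv(path)
```

The same pattern would apply to open3d return types, which removes the need for `Any` on those methods.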
    coords = [f.index(x, y) for x, y in geom]
    xmin = min([coord[0] for coord in coords])
    xmax = max([coord[0] for coord in coords])
    ymin = min([coord[1] for coord in coords])
    ymax = max([coord[1] for coord in coords])
    boxes.append([xmin, ymin, xmax, ymax])
@isaaccorley I had a question about the bounding box representation here. rasterio's `index` method is used to convert from (lon, lat) to pixel coordinates. However, I noticed that this returns negative bbox coordinates for certain samples. torchvision's `draw_bounding_boxes` states that box coordinates must be relative to the image, i.e. they must satisfy `0 <= xmin < xmax < W` and `0 <= ymin < ymax < H`. Is this not an issue?
I looked into this because I recently tried training an object detector (torchvision's Faster R-CNN) on this dataset and the results looked a bit weird. The docs mention that bboxes need to satisfy the above constraint, so I was wondering if this was the cause of my problem.
Seems like you're right on the coords being swapped. Some of the boxes look transposed and others look like they've been cut off because they probably were outside the bounds of the image.
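A minimal sketch of one possible fix, not the change that was actually merged. rasterio's `index(x, y)` returns `(row, col)`, so the first element is the vertical pixel coordinate and the second is the horizontal one. To run without rasterio, the sketch uses a hypothetical pure-Python stand-in, `world_to_pixel`, that assumes a north-up raster with square pixels. The helper `polygon_to_box` and its clipping policy are also assumptions.

```python
import math
from typing import List, Optional, Sequence, Tuple


def world_to_pixel(
    x: float, y: float, x_origin: float, y_origin: float, pixel_size: float
) -> Tuple[int, int]:
    """Stand-in for rasterio's index(x, y); returns (row, col) like rasterio."""
    col = math.floor((x - x_origin) / pixel_size)
    row = math.floor((y_origin - y) / pixel_size)
    return row, col


def polygon_to_box(
    geom: Sequence[Tuple[float, float]],
    height: int,
    width: int,
    x_origin: float = 0.0,
    y_origin: float = 0.0,
    pixel_size: float = 1.0,
) -> Optional[List[int]]:
    """Convert a polygon to [xmin, ymin, xmax, ymax] in pixels, clipped to the image."""
    coords = [world_to_pixel(x, y, x_origin, y_origin, pixel_size) for x, y in geom]
    rows = [row for row, _ in coords]  # vertical axis -> y
    cols = [col for _, col in coords]  # horizontal axis -> x
    xmin = min(max(min(cols), 0), width - 1)
    xmax = min(max(max(cols), 0), width - 1)
    ymin = min(max(min(rows), 0), height - 1)
    ymax = min(max(max(rows), 0), height - 1)
    if xmin >= xmax or ymin >= ymax:
        return None  # box lies outside the image or collapsed after clipping
    return [xmin, ymin, xmax, ymax]
```

Returning `None` for boxes that collapse after clipping lets the caller drop them rather than pass degenerate boxes to a detector.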
* add IDTReeS dataset
* dataset loads data now
* add optional laspy and pandas dependencies
* fixed docs failing
* format
* refactor verify and resample chm/hsi to 200x200
* add open3d optional dep
* overhaul
* temporarily remove open3d install bc their pypi is broken
* mypy fixes
* fixes per suggestions
* general cleanup
* test passing
* add min version for laspy and pandas
* add open3d dependency
* add open3d to mypy tests
* add hard install for python 3.9 open3d to actions
* attempt microsoft#2
* I think I got it now
* updated tests.yaml
* make open3d dep require python<3.9
* open3d has issues with macos python 3.6
* same for 3.7
* skip open3d plot test for macos
* formatting
* skip open3d plot test for windows
* update per suggestions
* update test data readme for las files
* updated per suggestions
* more changes per suggestions
* last change per suggestion
* Grammar fix in pandas dep requirement comment
Co-authored-by: Adam J. Stewart <ajstewart426@gmail.com>
This PR adds the IDTReeS dataset for tree crown detection and classification. Link to paper
torchgeo.datasets.IDTReeS
Some notes on the dataset:
* The `train` split contains both bounding boxes and a label per box. In the `test` split, `task1` contains only images and `task2` contains bounding boxes but no labels.
* The `test` split `task2` images seem to be only partially labeled (many boxes are missing), and some images have no boxes at all. This would cause issues when benchmarking, since your false positives may actually be true positives. I'm guessing the assumption when labeling was that you need to extract the ground truth boxes and build a classifier on top of the subset images. Who knows. See below:
TODO:
* upsample CHM and HSI to same res as RGB (200x200)
* figure out how to deal with test set being formatted dramatically differently than train set (will refactor in future PR if needed)
* add plot method
Usage (for clarity):
Example images:
Example point cloud:
plot_las_example.mp4
plot_las_example_no_colors.mp4
plot_las_example_with_colors.mp4