Add IDTReeS dataset #201

Merged on Dec 5, 2021 (32 commits)

Conversation

isaaccorley
Collaborator

@isaaccorley isaaccorley commented Oct 17, 2021

This PR adds the IDTReeS dataset for tree crown detection and classification. Link to paper

  • Adds torchgeo.datasets.IDTReeS

Some notes on the dataset:

  • The train split contains both bbox and labels per bbox. The test split task1 contains only images and task2 contains bounding boxes but no labels.
  • Many of the RGB images appear to have this warped effect. I opened several in QGIS and verified this is in the raw image.
  • Bounding boxes are very noisy: some contain open ground, while others contain only a small part of the tree. Maybe it's just not clear to me what part of the image is a "tree crown".
  • The test split task2 images seem to be only partially labeled (missing many boxes), and some images have no boxes at all. This would cause issues when benchmarking, since your false positives may actually be true positives. I'm guessing the assumption when labeling was that you need to extract the ground truth boxes and build a classifier on top of the subset images. Who knows. See below:

image
image
image

TODO:

  • tests
  • upsample CHM and HSI to same res as RGB (200x200)
  • figure out how to deal with the test set being formatted dramatically differently from the train set (will refactor in a future PR if needed)
  • add plot method

Usage (for clarity):

from torchgeo.datasets import IDTReeS

dataset = IDTReeS(root, split="train", task="task1")
sample = dataset[0]

for k, v in sample.items():
   print(k, v.shape, v.dtype)

# image torch.Size([3, 200, 200]) torch.uint8     <- RGB image
# hsi torch.Size([369, 200, 200]) torch.int16     <- Hyperspectral image
# chm torch.Size([1, 200, 200]) torch.float32     <- Canopy Height Model (CHM)
# las torch.Size([3, 5739]) torch.float64         <- Point Cloud (x, y, z)
# boxes torch.Size([12, 4]) torch.int64           <- Object Bounding Boxes (xyxy)
# label torch.Size([12]) torch.int64              <- object labels

Example images:

# without predictions
dataset.plot(sample, hsi_indices=(0, 1, 2)) # use 3 hyperspectral indices to make a false color image

# with predictions
sample["prediction_boxes"] = sample["boxes"]
sample["prediction_label"] = sample["label"]
dataset.plot(sample, hsi_indices=(0, 1, 2))

sample

sample_with_preds

Example point cloud:

# with colormap
dataset.plot_las(index=3, colormap="BrBG")

# no colormap and no colors in .las file
dataset.plot_las(index=3, colormap=None)

# no colormap and colors exist in .las file
dataset.plot_las(index=70, colormap=None)
plot_las_example.mp4
plot_las_example_no_colors.mp4
plot_las_example_with_colors.mp4

@isaaccorley isaaccorley marked this pull request as draft October 17, 2021 19:27
@adamjstewart adamjstewart added the datasets Geospatial or benchmark datasets label Oct 19, 2021
@adamjstewart adamjstewart added this to the 0.2.0 milestone Nov 20, 2021
@adamjstewart
Collaborator

I'm giving a talk to a group of biologists who use drone imagery for spectral biology on Monday. If this PR is close to being ready it would be great to get this merged.

@isaaccorley
Collaborator Author

Working on it tonight actually.

@isaaccorley isaaccorley marked this pull request as ready for review November 22, 2021 07:02
@isaaccorley isaaccorley reopened this Nov 22, 2021
@isaaccorley isaaccorley force-pushed the datasets/idtrees branch 2 times, most recently from 91bfeaf to e2e89d4 Compare November 28, 2021 01:27
@isaaccorley
Collaborator Author

OK, looks like tests are passing now. I had to skip the lidar plot test on Python 3.9 (until open3d v0.14) and on Windows and macOS due to segmentation faults, but it works fine on Ubuntu with Python 3.6-3.8.
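
The skip rule described in this comment can be written as a small predicate. This is a hedged sketch and not the code from the PR: the function name `should_skip_lidar_plot` is hypothetical, and treating every Python version from 3.9 upward as unsupported is an assumption based on the comment.

```python
import sys
from typing import Tuple


def should_skip_lidar_plot(
    platform: str = sys.platform,
    version: Tuple[int, int] = sys.version_info[:2],
) -> bool:
    """Return True when the lidar plot test should be skipped.

    Hypothetical helper: skips on Windows and macOS (segmentation faults)
    and on Python 3.9 or newer (assumed to lack a compatible open3d build).
    """
    unsupported_os = platform in ("win32", "darwin")
    unsupported_python = version >= (3, 9)
    return unsupported_os or unsupported_python
```

With pytest, a predicate like this would typically be passed to `pytest.mark.skipif(should_skip_lidar_plot(), reason="...")` on the test function.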

Collaborator

@adamjstewart adamjstewart left a comment

I assume the .las files are manually created random noise and not real data? Can you add a section to tests/data/README.md explaining how to create this data for the next person who needs to?

Collaborator

@adamjstewart adamjstewart left a comment

I think my only remaining concerns relate to minimum dependency versions, everything else looks good!

@adamjstewart adamjstewart enabled auto-merge (squash) December 5, 2021 22:27
@adamjstewart adamjstewart merged commit 0434f3c into microsoft:main Dec 5, 2021
@isaaccorley isaaccorley deleted the datasets/idtrees branch December 6, 2021 03:56
else:
    directory = os.path.join(root, self.task)
    if self.task == "task1":
        geoms = None  # type: ignore[assignment]
Collaborator

While debugging the recent mypy issues I noticed a lot of typing errors in this file. For example, the _load function says that it returns Tuple[List[str], Dict[int, Dict[str, Any]], Any], but many of these return values can also be None. The type: ignore here is masking this for geoms, and the Any is masking this for labels, but the real issue is that the function is not typed correctly. @isaaccorley can you fix this? We should be ignoring typing errors very sparingly for issues that are beyond our control (PyTorch is missing type hints for large parts of the library), not for things like this. For the same reason, we should try to avoid Any as it basically turns off all type checking.
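
One way to address the point above is to say in the signature that some return values may be missing, rather than silencing mypy. The sketch below is illustrative only and is not the actual `_load` implementation: the function name `load_test_split`, the `Geometry` alias, and the placeholder data are all hypothetical. It relies on what the PR description states, namely that test task1 has only images and task2 has boxes but no labels.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical alias for one geometry record; `object` keeps mypy strict,
# unlike `Any`, which disables checking on every use of the value.
Geometry = Dict[str, object]


def load_test_split(
    task: str,
) -> Tuple[List[str], Optional[Dict[int, Geometry]], Optional[List[int]]]:
    """Return (images, geoms, labels) for a test task; missing parts are None."""
    images = ["example_1.tif"]  # placeholder file name for illustration
    if task == "task1":
        # task1 has images only, so both geoms and labels are absent
        return images, None, None
    # task2 has bounding boxes but no labels
    geoms: Dict[int, Geometry] = {0: {"coordinates": [(0.0, 0.0), (1.0, 1.0)]}}
    return images, geoms, None
```

With `Optional` in the return type, mypy forces callers to check for `None` before using `geoms` or `labels`, so no `type: ignore` is needed at the assignment.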

Collaborator Author

@isaaccorley isaaccorley Dec 17, 2021

I'll refactor but so you know, this dataset has no structure to the nested dictionaries whatsoever so I opted for Dict[str, Any] in most cases. Also some of the methods return Any due to them returning an open3d object or a pandas DataFrame but I'm not able to use those types because I'm not importing them at the top level, only inside the method.

Collaborator

See https://mypy.readthedocs.io/en/stable/runtime_troubles.html, it may be possible to declare the type in a string or comment instead to avoid the top-level import.
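
A minimal sketch of that approach, assuming pandas is the optional dependency in question: the import is guarded by `typing.TYPE_CHECKING`, so it is only seen by the type checker, and the annotation is written as a string. The function name `load_labels` is hypothetical.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Only evaluated by type checkers such as mypy, never at runtime,
    # so pandas stays an optional dependency.
    import pandas as pd


def load_labels(path: str) -> "pd.DataFrame":
    """Read a CSV of labels, importing pandas lazily inside the function."""
    import pandas as pd  # runtime import happens only when the function is called

    return pd.read_csv(path)
```

Because the return annotation is a string, defining the function does not require pandas to be installed; only calling it does.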

Collaborator Author

Nice, ok I'll submit a PR.

@adamjstewart adamjstewart added utilities Utilities for working with geospatial data and removed utilities Utilities for working with geospatial data labels Jan 2, 2022
Comment on lines +290 to +295
coords = [f.index(x, y) for x, y in geom]
xmin = min([coord[0] for coord in coords])
xmax = max([coord[0] for coord in coords])
ymin = min([coord[1] for coord in coords])
ymax = max([coord[1] for coord in coords])
boxes.append([xmin, ymin, xmax, ymax])
Collaborator

@ashnair1 ashnair1 Jul 5, 2022

@isaaccorley Had a question about the bounding box representation here. rasterio's index method is used to convert from (lon,lat) to pixel coordinates. However I noticed that this returns negative bbox coordinates for certain samples.

torchvision's draw_bounding_boxes states that box coordinates must be given relative to the image, i.e. they must satisfy 0 <= xmin < xmax < W and 0 <= ymin < ymax < H. Is this not an issue?

I looked into this because I recently tried training an object detector (torchvision's Faster RCNN) on this dataset and the result looked a bit weird. In the docs they mention they need bboxes to satisfy the above constraint so I was wondering if this was the cause of my problem.
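
A quick way to test that hypothesis is to scan the boxes for violations of the constraint quoted above. This is a sketch under stated assumptions: the helper name `find_invalid_boxes` is hypothetical, boxes are plain `[xmin, ymin, xmax, ymax]` lists, and the bounds are exactly those quoted from the torchvision docs.

```python
from typing import List, Sequence


def find_invalid_boxes(
    boxes: Sequence[Sequence[float]], width: int, height: int
) -> List[int]:
    """Return indices of xyxy boxes that violate
    0 <= xmin < xmax < width or 0 <= ymin < ymax < height."""
    invalid = []
    for i, (xmin, ymin, xmax, ymax) in enumerate(boxes):
        x_ok = 0 <= xmin < xmax < width
        y_ok = 0 <= ymin < ymax < height
        if not (x_ok and y_ok):
            invalid.append(i)
    return invalid
```

For a 200x200 image, a box with a negative coordinate or with zero width would be reported by index, which makes it easy to list the affected samples.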

Collaborator

Think there might be an issue.

The following is a comparison of the image - MLBS_5

Left is QGIS (polygons from train_MLBS), right is plot by torchgeo

image

Collaborator Author

Seems like you're right on the coords being swapped. Some of the boxes look transposed and others look like they've been cut off because they probably were outside the bounds of the image.
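
The swap follows from rasterio's `index(x, y)` returning `(row, col)`, so the first element is the y pixel coordinate, not x. The sketch below shows one possible fix and is not the change that was eventually merged: `world_to_pixel` is a hypothetical stand-in for `index` on a north-up raster with square pixels, and clamping boxes to the image extent is an assumption about the desired behavior.

```python
import math
from typing import List, Sequence, Tuple


def world_to_pixel(
    x: float, y: float, origin_x: float, origin_y: float, pixel_size: float
) -> Tuple[int, int]:
    """Hypothetical stand-in for rasterio's index(x, y): returns (row, col)."""
    col = math.floor((x - origin_x) / pixel_size)
    row = math.floor((origin_y - y) / pixel_size)  # rows grow downward
    return row, col


def polygon_to_xyxy(
    geom: Sequence[Tuple[float, float]],
    origin_x: float,
    origin_y: float,
    pixel_size: float,
    width: int,
    height: int,
) -> List[int]:
    """Convert polygon vertices to an [xmin, ymin, xmax, ymax] pixel box."""
    rows_cols = [world_to_pixel(x, y, origin_x, origin_y, pixel_size) for x, y in geom]
    rows = [rc[0] for rc in rows_cols]  # rows are y pixel coordinates
    cols = [rc[1] for rc in rows_cols]  # cols are x pixel coordinates
    # Clamp to the image extent (assumption: out-of-bounds parts are cut off)
    xmin = min(max(min(cols), 0), width)
    xmax = min(max(max(cols), 0), width)
    ymin = min(max(min(rows), 0), height)
    ymax = min(max(max(rows), 0), height)
    return [xmin, ymin, xmax, ymax]
```

Whether to clamp or to drop boxes that fall partly outside the image is a design choice; clamping keeps every annotation but can produce very thin boxes at the image border.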

yichiac pushed a commit to yichiac/torchgeo that referenced this pull request Apr 29, 2023
* add IDTReeS dataset

* dataset loads data now

* add optional laspy and pandas dependencies

* fixed docs failing

* format

* refactor verify and resample chm/hsi to 200x200

* add open3d optional dep

* overhaul

* temporarily remove open3d install bc their pypi is broken

* mypy fixes

* fixes per suggestions

* general cleanup

* test passing

* add min version for laspy and pandas

* add open3d dependency

* add open3d to mypy tests

* add hard install for python 3.9 open3d to actions

* attempt #2

* I think I got it now

* updated tests.yaml

* make open3d dep require python<3.9

* open3d has issues with macos python 3.6

* same for 3.7

* skip open3d plot test for macos

* formatting

* skip open3d plot test for windows

* update per suggestions

* update test data readme for las files

* updated per suggestions

* more changes per suggestions

* last change per suggestion

* Grammar fix in pandas dep requirement comment

Co-authored-by: Adam J. Stewart <ajstewart426@gmail.com>