Add IDTReeS dataset #201

isaaccorley · 2021-10-17T19:27:42Z

This PR adds the IDTReeS dataset for tree crown detection and classification. Link to paper

Adds torchgeo.datasets.IDTReeS

Some notes on the dataset:

The train split contains both bounding boxes and a label per box. In the test split, task1 contains only images and task2 contains bounding boxes but no labels.
Many of the RGB images appear to have this warped effect. I opened several in QGIS and verified that it is present in the raw images.
Bounding boxes are very noisy: some contain open ground, and others contain only a small part of the tree. Maybe it's just not clear to me what part of the image is a "tree crown".
The test split task2 images seem to be only partially labeled (many boxes are missing), and some images have no boxes at all. This would cause issues when benchmarking, since your false positives may actually be true positives. I'm guessing the assumption when labeling was that you extract the ground truth boxes and build a classifier on top of the subset images. Who knows.
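To make the benchmarking concern concrete, here is a minimal sketch with made-up boxes (not taken from the dataset). The `iou` and `precision` helpers are hypothetical and use a simple greedy match; they are not part of torchgeo. The same set of perfect predictions scores very differently once two of the three ground truth boxes are missing:

```python
from typing import List, Sequence


def iou(a: Sequence[float], b: Sequence[float]) -> float:
    """Intersection over union of two xyxy boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def precision(
    preds: List[Sequence[float]], truths: List[Sequence[float]], thresh: float = 0.5
) -> float:
    """Fraction of predictions greedily matched to an unused ground truth box."""
    matched = set()
    true_positives = 0
    for pred in preds:
        for i, truth in enumerate(truths):
            if i not in matched and iou(pred, truth) >= thresh:
                matched.add(i)
                true_positives += 1
                break
    return true_positives / len(preds) if preds else 0.0


# Made-up example: three real crowns, and a detector that finds all of them
all_crowns = [[0, 0, 10, 10], [20, 20, 30, 30], [40, 40, 50, 50]]
preds = [list(box) for box in all_crowns]

full_precision = precision(preds, all_crowns)         # every crown is labeled
partial_precision = precision(preds, all_crowns[:1])  # only one crown is labeled
print(full_precision, round(partial_precision, 2))
```

With complete labels the precision is 1.0; with only one of the three crowns labeled, two correct detections are counted as false positives and the precision drops to about 0.33.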

TODO:

tests
~~upsample CHM and HSI to same res as RGB (200x200)~~
~~figure out how to deal with test set being formatted dramatically different than train set~~ will refactor in future PR if needed
~~add plot method~~

Usage (for clarity):

from torchgeo.datasets import IDTReeS

dataset = IDTReeS(root, split="train", task="task1")
sample = dataset[0]

for k, v in sample.items():
    print(k, v.shape, v.dtype)

# image torch.Size([3, 200, 200]) torch.uint8     <- RGB image
# hsi torch.Size([369, 200, 200]) torch.int16     <- Hyperspectral image
# chm torch.Size([1, 200, 200]) torch.float32     <- Canopy Height Model (CHM)
# las torch.Size([3, 5739]) torch.float64         <- Point Cloud (x, y, z)
# boxes torch.Size([12, 4]) torch.int64           <- Object Bounding Boxes (xyxy)
# label torch.Size([12]) torch.int64              <- object labels

Example images:

# without predictions
dataset.plot(sample, hsi_indices=(0, 1, 2)) # use 3 hyperspectral indices to make a false color image

# with predictions
sample["prediction_boxes"] = sample["boxes"]
sample["prediction_label"] = sample["label"]
dataset.plot(sample, hsi_indices=(0, 1, 2))

Example point cloud:

# with colormap
dataset.plot_las(index=3, colormap="BrBG")

# no colormap and no colors in .las file
dataset.plot_las(index=3, colormap=None)

# no colormap and colors exist in .las file
dataset.plot_las(index=70, colormap=None)

plot_las_example.mp4

plot_las_example_no_colors.mp4

plot_las_example_with_colors.mp4

adamjstewart · 2021-11-20T23:11:55Z

I'm giving a talk to a group of biologists who use drone imagery for spectral biology on Monday. If this PR is close to being ready it would be great to get this merged.

isaaccorley · 2021-11-21T01:10:44Z

Working on it tonight actually.

isaaccorley · 2021-11-28T04:05:56Z

OK, it looks like the tests are passing now. I had to skip the lidar plot test on Python 3.9 (until open3d v0.14) and on Windows and macOS due to segmentation faults, but it works fine on Ubuntu with Python 3.6-3.8.

adamjstewart

I assume the .las files are manually created random noise and not real data? Can you add a section to tests/data/README.md explaining how to create this data for the next person who needs to?

adamjstewart

I think my only remaining concerns relate to minimum dependency versions, everything else looks good!

adamjstewart · 2021-12-17T16:15:36Z

torchgeo/datasets/idtrees.py

+        else:
+            directory = os.path.join(root, self.task)
+            if self.task == "task1":
+                geoms = None  # type: ignore[assignment]


While debugging the recent mypy issues I noticed a lot of typing errors in this file. For example, the _load function says that it returns Tuple[List[str], Dict[int, Dict[str, Any]], Any], but many of these return values can also be None. The type: ignore here is masking this for geoms, and the Any is masking this for labels, but the real issue is that the function is not typed correctly. @isaaccorley can you fix this? We should be ignoring typing errors very sparingly for issues that are beyond our control (PyTorch is missing type hints for large parts of the library), not for things like this. For the same reason, we should try to avoid Any as it basically turns off all type checking.

isaaccorley · 2021-12-17

I'll refactor, but so you know, this dataset has no structure to the nested dictionaries whatsoever, so I opted for Dict[str, Any] in most cases. Also, some of the methods return Any because they return an open3d object or a pandas DataFrame, and I'm not able to use those types because I'm not importing them at the top level, only inside the method.

adamjstewart · 2021-12-17

See https://mypy.readthedocs.io/en/stable/runtime_troubles.html, it may be possible to declare the type in a string or comment instead to avoid the top-level import.

isaaccorley · 2021-12-17

Nice, OK, I'll submit a PR.
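For reference, a minimal sketch of the approach described on that mypy page, assuming pandas is the lazily imported dependency. The function names `read_labels` and `load_geometries` and the placeholder return data are hypothetical and are not the actual torchgeo code. The import under `TYPE_CHECKING` is only seen by the type checker, the string annotation is never evaluated at runtime, and `Optional` states the possible `None` return explicitly instead of hiding it behind `Any` or `type: ignore`:

```python
from typing import TYPE_CHECKING, Dict, List, Optional

if TYPE_CHECKING:
    # Only evaluated by type checkers such as mypy, never at runtime,
    # so pandas stays an optional dependency.
    import pandas as pd


def read_labels(path: str) -> "pd.DataFrame":
    """Hypothetical helper: read a label CSV with a lazily imported pandas."""
    import pandas as pd  # imported only when the function is actually called

    return pd.read_csv(path)


def load_geometries(task: str) -> Optional[Dict[int, List[float]]]:
    """Hypothetical helper: splits without boxes return None explicitly."""
    if task == "task1":
        return None
    # Placeholder data for illustration only
    return {0: [0.0, 0.0, 1.0, 1.0]}
```

Callers of `load_geometries` are then forced by mypy to handle the `None` case, which is the error that the `type: ignore` was masking.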

ashnair1 · 2022-07-05T10:00:22Z

torchgeo/datasets/idtrees.py

+                coords = [f.index(x, y) for x, y in geom]
+                xmin = min([coord[0] for coord in coords])
+                xmax = max([coord[0] for coord in coords])
+                ymin = min([coord[1] for coord in coords])
+                ymax = max([coord[1] for coord in coords])
+                boxes.append([xmin, ymin, xmax, ymax])


@isaaccorley Had a question about the bounding box representation here. rasterio's index method is used to convert from (lon,lat) to pixel coordinates. However I noticed that this returns negative bbox coordinates for certain samples.

torchvision's draw_bounding_boxes states that box coordinates must be given with respect to the image, i.e. they must satisfy 0 <= xmin < xmax < W and 0 <= ymin < ymax < H. Is this not an issue?

I looked into this because I recently tried training an object detector (torchvision's Faster RCNN) on this dataset and the result looked a bit weird. In the docs they mention they need bboxes to satisfy the above constraint so I was wondering if this was the cause of my problem.

ashnair1 · 2022-07-05

I think there might be an issue. The following is a comparison for the image MLBS_5: left is QGIS (polygons from train_MLBS), right is the plot by torchgeo.

isaaccorley · 2022-07-05

Seems like you're right about the coords being swapped. Some of the boxes look transposed, and others look like they've been cut off, probably because they were outside the bounds of the image.
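A possible fix, sketched under the assumption that the swap comes from rasterio's `index(x, y)` returning `(row, col)` rather than `(x, y)`. The helper `rowcols_to_xyxy` is hypothetical and the vertices are made up; it takes columns as x and rows as y, then clamps to the valid pixel range so that polygons partly outside the image still satisfy the constraint quoted above:

```python
from typing import List, Sequence, Tuple


def rowcols_to_xyxy(
    rowcols: Sequence[Tuple[int, int]], height: int, width: int
) -> List[int]:
    """Convert (row, col) pixel indices to a clamped [xmin, ymin, xmax, ymax] box."""
    # index(x, y) yields (row, col): the column is the x axis, the row is the y axis
    xs = [col for _, col in rowcols]
    ys = [row for row, _ in rowcols]
    # Clamp to the valid pixel index range [0, width - 1] x [0, height - 1]
    xmin = max(0, min(xs))
    ymin = max(0, min(ys))
    xmax = min(width - 1, max(xs))
    ymax = min(height - 1, max(ys))
    return [xmin, ymin, xmax, ymax]


# Made-up polygon vertices as (row, col) pairs, partly above a 200x200 image
rowcols = [(-5, 10), (-5, 60), (40, 60), (40, 10)]
box = rowcols_to_xyxy(rowcols, height=200, width=200)
print(box)  # [10, 0, 60, 40]
```

Boxes that lie entirely outside the image collapse to zero width or height after clamping, so they would still need to be filtered out before training a detector.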

* add IDTReeS dataset
* dataset loads data now
* add optional laspy and pandas dependencies
* fixed docs failing
* format
* refactor verify and resample chm/hsi to 200x200
* add open3d optional dep
* overhaul
* temporarily remove open3d install bc their pypi is broken
* mypy fixes
* fixes per suggestions
* general cleanup
* test passing
* add min version for laspy and pandas
* add open3d dependency
* add open3d to mypy tests
* add hard install for python 3.9 open3d to actions
* attempt #2
* I think I got it now
* updated tests.yaml
* make open3d dep require python<3.9
* open3d has issues with macos python 3.6
* same for 3.7
* skip open3d plot test for macos
* formatting
* skip open3d plot test for windows
* update per suggestions
* update test data readme for las files
* updated per suggestions
* more changes per suggestions
* last change per suggestion
* Grammar fix in pandas dep requirement comment

Co-authored-by: Adam J. Stewart <ajstewart426@gmail.com>

isaaccorley assigned calebrob6 and isaaccorley Oct 17, 2021

isaaccorley marked this pull request as draft October 17, 2021 19:27

adamjstewart added the datasets Geospatial or benchmark datasets label Oct 19, 2021

isaaccorley force-pushed the datasets/idtrees branch from 02b26cb to 150cd61 Compare October 27, 2021 01:18

adamjstewart added this to the 0.2.0 milestone Nov 20, 2021

isaaccorley force-pushed the datasets/idtrees branch from 150cd61 to bc02b3f Compare November 21, 2021 05:28

isaaccorley unassigned calebrob6 Nov 21, 2021

isaaccorley requested review from adamjstewart and calebrob6 November 21, 2021 20:59

adamjstewart reviewed Nov 21, 2021

isaaccorley marked this pull request as ready for review November 22, 2021 07:02

isaaccorley closed this Nov 22, 2021

isaaccorley reopened this Nov 22, 2021

isaaccorley force-pushed the datasets/idtrees branch 2 times, most recently from 91bfeaf to e2e89d4 Compare November 28, 2021 01:27

adamjstewart reviewed Nov 28, 2021

isaaccorley force-pushed the datasets/idtrees branch from 4946516 to 9c922de Compare December 1, 2021 02:38

adamjstewart reviewed Dec 1, 2021

isaaccorley force-pushed the datasets/idtrees branch from 8317b95 to 33ddf1a Compare December 5, 2021 19:30

adamjstewart reviewed Dec 5, 2021

adamjstewart reviewed Dec 5, 2021

adamjstewart reviewed Dec 5, 2021

isaaccorley added 4 commits December 5, 2021 15:09

add IDTReeS dataset

9a96ddf

dataset loads data now

6e3fc0f

add optional laspy and pandas dependencies

6307d23

fixed docs failing

33ac168

isaaccorley added 18 commits December 5, 2021 15:09

add min version for laspy and pandas

9eb5f6b

add open3d dependency

86e912a

add open3d to mypy tests

8134fef

add hard install for python 3.9 open3d to actions

9417629

attempt #2

6c1e8ab

I think I got it now

ad6b9c6

updated tests.yaml

6e8ecfa

make open3d dep require python<3.9

6c14acb

open3d has issues with macos python 3.6

e19e902

same for 3.7

8453866

skip open3d plot test for macos

c3a9bf5

formatting

d5de83b

skip open3d plot test for windows

1e6df71

update per suggestions

365b6af

update test data readme for las files

6df667d

updated per suggestions

d042b75

more changes per suggestions

58a59fe

last change per suggestion

e05cd51

isaaccorley force-pushed the datasets/idtrees branch from a78faf3 to e05cd51 Compare December 5, 2021 21:09

Grammar fix in pandas dep requirement comment

54cfabf

adamjstewart approved these changes Dec 5, 2021

adamjstewart enabled auto-merge (squash) December 5, 2021 22:27

adamjstewart merged commit 0434f3c into microsoft:main Dec 5, 2021

isaaccorley deleted the datasets/idtrees branch December 6, 2021 03:56

adamjstewart reviewed Dec 17, 2021

adamjstewart added utilities Utilities for working with geospatial data and removed utilities Utilities for working with geospatial data labels Jan 2, 2022

ashnair1 reviewed Jul 5, 2022

weiji14 mentioned this pull request Sep 9, 2022

VectorShapesDataset for loading geometries from vector files #458

Closed

adamjstewart Dec 17, 2021

isaaccorley Dec 17, 2021 (edited)

adamjstewart Dec 17, 2021

isaaccorley Dec 17, 2021

ashnair1 Jul 5, 2022 (edited)

ashnair1 Jul 5, 2022

isaaccorley Jul 5, 2022
