Add CanadianBuildingFootprints dataset #69

Merged: 8 commits, Aug 4, 2021
2 changes: 1 addition & 1 deletion .gitattributes
@@ -1,2 +1,2 @@
# Do not change line endings on test data, it will change the MD5
/tests/data/** binary
/tests/data/*/** binary
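The comment in this file is easy to verify: converting a test file's line endings changes its bytes and therefore its MD5 digest, which is why the test data must be marked binary. A minimal sketch (not part of the diff, standard library only):

```python
import hashlib

# The same logical content with LF vs. CRLF line endings hashes differently.
lf = b'{"type": "FeatureCollection"}\n'
crlf = lf.replace(b"\n", b"\r\n")

print(hashlib.md5(lf).hexdigest())
print(hashlib.md5(crlf).hexdigest())  # differs, so checksum verification would fail
```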
2 changes: 1 addition & 1 deletion .github/workflows/tests.yaml
@@ -62,7 +62,7 @@ jobs:
if: ${{ runner.os == 'Windows' }}
- name: Install conda dependencies (Windows)
run: |
conda install h5py 'rasterio>=1.0'
conda install fiona h5py 'rasterio>=1.0'
conda list
conda info
if: ${{ runner.os == 'Windows' }}
5 changes: 5 additions & 0 deletions docs/datasets.rst
@@ -10,6 +10,11 @@ Geospatial Datasets

:class:`GeoDataset` is designed for datasets that contain geospatial information, like latitude, longitude, coordinate system, and projection. Datasets containing this kind of information can be combined using :class:`ZipDataset`.

Canadian Building Footprints
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. autoclass:: CanadianBuildingFootprints
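For orientation, a minimal usage sketch for the new class, mirroring the constructor arguments and indexing behavior exercised in tests/datasets/test_cbf.py below; the exact keyword names are taken from the tests and should be treated as assumptions:

```python
from torchgeo.datasets import CanadianBuildingFootprints

# Download the provincial building footprint files and verify their checksums.
ds = CanadianBuildingFootprints(root="data", download=True, checksum=True)

# Indexing with a BoundingBox returns a sample dict with a CRS and a rasterized mask.
sample = ds[ds.bounds]
print(sample["crs"], sample["masks"].shape)

# GeoDatasets can be combined; adding two of them yields a ZipDataset.
combined = ds + ds
```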

Chesapeake Bay High-Resolution Land Cover Project
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

276 changes: 276 additions & 0 deletions docs/notebooks/Canadian Building Footprints Dataset.ipynb

Large diffs are not rendered by default.

2 changes: 2 additions & 0 deletions environment.yml
@@ -3,6 +3,7 @@ channels:
- conda-forge
dependencies:
- cudatoolkit
- fiona
- h5py
- numpy
- pip
@@ -17,6 +18,7 @@ dependencies:
- black[colorama]>=21b
- flake8
- isort[colors]>=4.3.5
- jupyterlab
- mypy>=0.900
- omegaconf
- opencv-python
2 changes: 2 additions & 0 deletions requirements.txt
@@ -1,8 +1,10 @@
affine
black[colorama]>=21b
fiona
flake8
h5py
isort[colors]>=4.3.5
jupyterlab
matplotlib
mypy>=0.900
numpy
2 changes: 2 additions & 0 deletions setup.cfg
@@ -26,6 +26,7 @@ setup_requires =
setuptools>=42
install_requires =
affine
fiona
matplotlib
numpy
pillow
@@ -50,6 +51,7 @@ datasets =

# Optional developer requirements
docs =
jupyterlab
sphinx
pydocstyle[toml]>=6.1
pytorch-sphinx-theme
2 changes: 2 additions & 0 deletions spack.yaml
@@ -5,9 +5,11 @@ spack:
- "python@3.7:+bz2"
- py-affine
- "py-black@21:+colorama"
- py-fiona
- py-flake8
- py-h5py
- "py-isort@4.3.5:+colors"
- py-jupyterlab
- py-matplotlib
- "py-mypy@0.900:"
- py-numpy
20 changes: 18 additions & 2 deletions tests/data/README.md
@@ -1,9 +1,11 @@
This directory contains fake data used to test torchgeo. Depending on the type of dataset, fake data can be created in one of two ways:
This directory contains fake data used to test torchgeo. Depending on the type of dataset, fake data can be created in multiple ways:

## GeoDataset

GeoDataset data can be created as follows. We first open an existing data file and copy its driver/CRS/transform to the fake data.

### Raster data

```python
import os

@@ -25,7 +27,21 @@ cmap = src.colormap(1)
dst.write_colormap(1, cmap)
```
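The raster example above is collapsed by the diff viewer, so only the colormap lines are visible. Following the description (open a real file, reuse its driver/CRS/transform for a tiny fake file), the pattern presumably looks roughly like this sketch; the paths and the exact profile tweaks are assumptions:

```python
import os

import numpy as np
import rasterio

ROOT = "/path/to/real/data"  # hypothetical location of a real sample
FILENAME = "example.tif"     # hypothetical file name

src = rasterio.open(os.path.join(ROOT, FILENAME))

# Reuse the real driver/CRS/transform, but shrink the fake raster to a single pixel.
profile = src.profile
profile.update(width=1, height=1, count=1)

dst = rasterio.open(FILENAME, "w", **profile)
dst.write(np.zeros((1, 1), dtype=profile["dtype"]), 1)

# These two lines match the visible tail of the collapsed example.
cmap = src.colormap(1)
dst.write_colormap(1, cmap)
```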

If the dataset expects multiple files, you can simply copy and rename the file you created.
### Vector data

```python
import os

import fiona

ROOT = "/mnt/blobfuse/adam-scratch/cbf"
FILENAME = "Ontario.geojson"

src = fiona.open(os.path.join(ROOT, FILENAME))
dst = fiona.open(FILENAME, "w", **src.meta)
rec = next(iter(src))
dst.write(rec)
```
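The fixture in tests/datasets/test_cbf.py expects each fake province file as a zip archive under tests/data/cbf/ with a known MD5 (the monkeypatched md5s list). A minimal sketch of producing one such archive from the fake GeoJSON above, using only the standard library:

```python
import hashlib
import zipfile

FILENAME = "Ontario.geojson"

# Package the single-record GeoJSON the way the test data is laid out (e.g. Ontario.zip).
with zipfile.ZipFile("Ontario.zip", "w") as zf:
    zf.write(FILENAME)

# This digest is what goes into the md5s list used by the test fixture.
with open("Ontario.zip", "rb") as f:
    print(hashlib.md5(f.read()).hexdigest())
```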

## VisionDataset

Binary file added tests/data/cbf/Alberta.zip
Binary file not shown.
Binary file added tests/data/cbf/BritishColumbia.zip
Binary file not shown.
Binary file added tests/data/cbf/Manitoba.zip
Binary file not shown.
Binary file added tests/data/cbf/NewBrunswick.zip
Binary file not shown.
Binary file added tests/data/cbf/NewfoundlandAndLabrador.zip
Binary file not shown.
Binary file added tests/data/cbf/NorthwestTerritories.zip
Binary file not shown.
Binary file added tests/data/cbf/NovaScotia.zip
Binary file not shown.
Binary file added tests/data/cbf/Nunavut.zip
Binary file not shown.
Binary file added tests/data/cbf/Ontario.zip
Binary file not shown.
Binary file added tests/data/cbf/PrinceEdwardIsland.zip
Binary file not shown.
Binary file added tests/data/cbf/Quebec.zip
Binary file not shown.
Binary file added tests/data/cbf/Saskatchewan.zip
Binary file not shown.
Binary file added tests/data/cbf/YukonTerritory.zip
Binary file not shown.
92 changes: 92 additions & 0 deletions tests/datasets/test_cbf.py
@@ -0,0 +1,92 @@
import os
import shutil
from pathlib import Path
from typing import Generator

import matplotlib.pyplot as plt
import pytest
import torch
from _pytest.fixtures import SubRequest
from pytest import MonkeyPatch
from rasterio.crs import CRS

import torchgeo.datasets.utils
from torchgeo.datasets import BoundingBox, CanadianBuildingFootprints, ZipDataset
from torchgeo.transforms import Identity


def download_url(url: str, root: str, *args: str) -> None:
    shutil.copy(url, root)


class TestCanadianBuildingFootprints:
    @pytest.fixture
    def dataset(
        self,
        monkeypatch: Generator[MonkeyPatch, None, None],
        tmp_path: Path,
        request: SubRequest,
    ) -> CanadianBuildingFootprints:
        monkeypatch.setattr(  # type: ignore[attr-defined]
            torchgeo.datasets.utils, "download_url", download_url
        )
        md5s = [
            "aef9a3deb3297f225d6cdb221cb48527",
            "2b7872c4121116fda8f96490daf89d29",
            "c71ded923e22569b62b00da2d2a61076",
            "75a8f652531790c3c3aefc0655400d6d",
            "89ff9c6257efa99365a8b709dde9579b",
            "d4d6a36ed834df5cbf5254effca78a4d",
            "cce85f6183427e3034704cf35919c985",
            "0149c7ec5101c0309c79b7e695dcb394",
            "b05216155725f48937804371b945f8ae",
            "72d0e6d7196345ca520c825697cc4947",
            "77e1c6c71ff0efbdd221b7e7d4a5f2df",
            "86e32374f068c7bbb76aa81af0736733",
            "5e453a3426b0bb986b2837b85e8b8850",
        ]
        monkeypatch.setattr(  # type: ignore[attr-defined]
            CanadianBuildingFootprints, "md5s", md5s
        )
        url = os.path.join("tests", "data", "cbf") + os.sep
        monkeypatch.setattr(  # type: ignore[attr-defined]
            CanadianBuildingFootprints, "url", url
        )
        monkeypatch.setattr(  # type: ignore[attr-defined]
            plt, "show", lambda *args: None
        )
        (tmp_path / "cbf").mkdir()
        root = str(tmp_path)
        transforms = Identity()
        return CanadianBuildingFootprints(
            root, transforms=transforms, download=True, checksum=True
        )

    def test_getitem(self, dataset: CanadianBuildingFootprints) -> None:
        x = dataset[dataset.bounds]
        assert isinstance(x, dict)
        assert isinstance(x["crs"], CRS)
        assert isinstance(x["masks"], torch.Tensor)

    def test_add(self, dataset: CanadianBuildingFootprints) -> None:
        ds = dataset + dataset
        assert isinstance(ds, ZipDataset)

    def test_already_downloaded(self, dataset: CanadianBuildingFootprints) -> None:
        CanadianBuildingFootprints(root=dataset.root, download=True)

    def test_plot(self, dataset: CanadianBuildingFootprints) -> None:
        query = dataset.bounds
        x = dataset[query]
        dataset.plot(x["masks"])

    def test_not_downloaded(self, tmp_path: Path) -> None:
        with pytest.raises(RuntimeError, match="Dataset not found or corrupted."):
            CanadianBuildingFootprints(str(tmp_path))

    def test_invalid_query(self, dataset: CanadianBuildingFootprints) -> None:
        query = BoundingBox(0, 0, 0, 0, 0, 0)
        with pytest.raises(
            IndexError, match="query: .* is not within bounds of the index:"
        ):
            dataset[query]
2 changes: 2 additions & 0 deletions torchgeo/datasets/__init__.py
@@ -1,6 +1,7 @@
"""TorchGeo datasets."""

from .benin_cashews import BeninSmallHolderCashews
from .cbf import CanadianBuildingFootprints
from .cdl import CDL
from .chesapeake import (
Chesapeake,
@@ -42,6 +43,7 @@
__all__ = (
"BeninSmallHolderCashews",
"BoundingBox",
"CanadianBuildingFootprints",
"CDL",
"collate_dict",
"Chesapeake",