Merge branch 'develop' into hydra
LucaRom authored May 16, 2024
2 parents 7bbad2f + 41c7710 commit b151a79
Showing 23 changed files with 404 additions and 113 deletions.
23 changes: 20 additions & 3 deletions README.md
@@ -10,20 +10,37 @@ In **Geo-Deep-Learning**, the learning process comprises two broad stages: tilin
## **Requirement**
This project comprises a set of commands to be run at a shell command prompt. Examples used here are for a bash shell in an Ubuntu GNU/Linux environment.

- [Python 3.9](https://www.python.org/downloads/release/python-390/), see the full list of dependencies in [environment.yml](environment.yml)
- [Python 3.10](https://www.python.org/downloads/release/python-3100/), see the full list of dependencies in [environment.yml](environment.yml)
- [hydra](https://hydra.cc/docs/intro/)
- [mlflow](https://mlflow.org/)
- [miniconda](https://docs.conda.io/en/latest/miniconda.html) (highly recommended)
- nvidia GPU (highly recommended)

## **Installation**
Miniconda is suggested as the package manager for GDL. However, users are advised to [switch to libmamba](https://github.com/NRCan/geo-deep-learning#quickstart-with-conda) as conda's default solver or to __directly use mamba__ instead of conda if they are facing extended installation time or other issues. Additional problems are grouped in the [troubleshooting section](https://github.com/NRCan/geo-deep-learning#troubleshooting). If issues persist, users are encouraged to open a new issue for assistance.

> Tested on Ubuntu 20.04, Windows 10 and WSL 2.
### Quickstart with conda
To execute scripts in this project, first create and activate your python environment with the following commands:
```shell
$ conda env create -f environment.yml
$ conda activate geo_deep_env
```
> Tested on Ubuntu 20.04 and Windows 10 using miniconda.
>
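After activating `geo_deep_env`, a quick sanity check can confirm that the interpreter and the tools listed under Requirement are available. This is a minimal sketch, not part of GDL: the helper names are hypothetical, treating 3.10 as a *minimum* version is an assumption, and the import names `hydra` and `mlflow` are assumed to match the packages.

```python
import importlib.util
import sys

# Hypothetical helper, not part of Geo-Deep-Learning.
REQUIRED_PYTHON = (3, 10)                 # README lists Python 3.10 (assumed minimum)
REQUIRED_MODULES = ("hydra", "mlflow")    # assumed top-level import names


def missing_modules(names):
    """Return the top-level module names the import system cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def check_environment(min_python=REQUIRED_PYTHON, modules=REQUIRED_MODULES):
    """Return a list of human-readable problems (empty if everything is found)."""
    problems = []
    if sys.version_info[:2] < min_python:
        problems.append(
            f"Python {min_python[0]}.{min_python[1]}+ expected, found "
            f"{sys.version_info.major}.{sys.version_info.minor}"
        )
    for name in missing_modules(modules):
        problems.append(f"module not importable: {name}")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("\n".join(issues) if issues else "environment looks OK")
```

`find_spec` only locates modules without importing them, so the check stays fast even for heavy packages.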

### Change conda's default solver for faster install (__Optional__)
```shell
conda install -n base conda-libmamba-solver
conda config --set solver libmamba
```

### Troubleshooting
- *ImportError: /lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.29' not found*
- Export the library path, or set it permanently in your .bashrc file (example with conda):
```bash
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/
```
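Before changing `LD_LIBRARY_PATH`, it can help to confirm which `libstdc++` actually provides the missing symbol version. The sketch below is a Python stand-in for `strings libstdc++.so.6 | grep GLIBCXX`: it simply scans the library's raw bytes for the version tag. The candidate paths and the helper name are assumptions for illustration.

```python
import os
from pathlib import Path


def provides_symbol_version(lib_path, tag="GLIBCXX_3.4.29"):
    """Return True if the version tag appears in the library's raw bytes."""
    path = Path(lib_path)
    if not path.is_file():  # follows symlinks such as libstdc++.so.6
        return False
    return tag.encode("ascii") in path.read_bytes()


if __name__ == "__main__":
    # Assumed locations: the system library from the error message and the conda one.
    candidates = [Path("/lib/x86_64-linux-gnu/libstdc++.so.6")]
    conda_prefix = os.environ.get("CONDA_PREFIX")
    if conda_prefix:
        candidates.append(Path(conda_prefix) / "lib" / "libstdc++.so.6")
    for lib in candidates:
        print(f"{lib}: {'yes' if provides_symbol_version(lib) else 'no / not found'}")
```

If only the conda copy reports `yes`, appending `$CONDA_PREFIX/lib/` to `LD_LIBRARY_PATH` as shown above is the likely fix.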

## **How to use?**
This is an example of how to run GDL with hydra in simple steps with the _**massachusetts buildings**_ dataset in the `tests/data/` folder, for segmentation on buildings:

1 change: 1 addition & 0 deletions config/training/default_training.yaml
@@ -13,6 +13,7 @@ training:
max_used_perc:
state_dict_path:
state_dict_strict_load: True
script_model: False
compute_sampler_weights: False

# precision: 16
178 changes: 114 additions & 64 deletions dataset/create_dataset.py
@@ -1,119 +1,169 @@
import numpy as np
from pathlib import Path
from typing import Any, Dict, cast
import sys
from pathlib import Path
from typing import Any, Dict, List, cast

from rasterio.windows import from_bounds
import kornia as K
import numpy as np
import pandas as pd
import rasterio
import torch
from affine import Affine
from osgeo import ogr
# These two import statements prevent an exception when using eval(metadata) in SegmentationDataset()'s __init__()
from rasterio.crs import CRS
from rasterio.io import DatasetReader
from rasterio.plot import reshape_as_image
from rasterio.vrt import WarpedVRT
from rasterio.windows import from_bounds
from torch.utils.data import Dataset
from torchgeo.datasets import GeoDataset
from rasterio.vrt import WarpedVRT
from torchgeo.datasets.utils import BoundingBox
import torch
from osgeo import ogr

from utils.logger import get_logger

# These two import statements prevent an exception when using eval(metadata) in SegmentationDataset()'s __init__()
from rasterio.crs import CRS
from affine import Affine

# Set the logging file
logging = get_logger(__name__) # import logging


def append_to_dataset(dataset, sample):
"""
Append a new sample to a provided dataset. The dataset has to be expanded before a value can be added to it.
:param dataset: resizable dataset to append to (expanded by one along the first axis)
:param sample: data to append
:return: Index of the newly added sample.
"""
old_size = dataset.shape[0] # this function always appends samples on the first axis
dataset.resize(old_size + 1, axis=0)
dataset[old_size, ...] = sample
return old_size


class SegmentationDataset(Dataset):
"""Semantic segmentation dataset based on input csvs listing pairs of imagery and ground truth patches as .tif."""
"""Semantic segmentation dataset based on input csvs listing pairs of imagery and ground truth patches as .tif.
Args:
dataset_list_path (str): The path to the dataset list file.
num_bands (int): The number of bands in the imagery.
dontcare (Optional[int]): The value to be ignored in the label.
max_sample_count (Optional[int]): The maximum number of samples to load from the dataset.
radiom_transform (Optional[Callable]): The radiometric transform function to be applied to the samples.
geom_transform (Optional[Callable]): The geometric transform function to be applied to the samples.
totensor_transform (Optional[Callable]): The transform function to convert samples to tensors.
debug (bool): Whether to enable debug mode.
Attributes:
max_sample_count (int): The maximum number of samples to load from the dataset.
num_bands (int): The number of bands in the imagery.
radiom_transform (Optional[Callable]): The radiometric transform function to be applied to the samples.
geom_transform (Optional[Callable]): The geometric transform function to be applied to the samples.
totensor_transform (Optional[Callable]): The transform function to convert samples to tensors.
debug (bool): Whether debug mode is enabled.
dontcare (Optional[int]): The value to be ignored in the label.
list_path (str): The path to the dataset list file.
assets (List[Dict[str, str]]): The list of filepaths to images and labels.
"""

def __init__(self,
dataset_list_path,
dataset_type,
num_bands,
dontcare=None,
max_sample_count=None,
radiom_transform=None,
geom_transform=None,
totensor_transform=None,
debug=False):
# note: if 'max_sample_count' is None, then it will be read from the dataset at runtime
self.max_sample_count = max_sample_count
self.dataset_type = dataset_type
self.num_bands = num_bands
self.radiom_transform = radiom_transform
self.geom_transform = geom_transform
self.totensor_transform = totensor_transform
self.debug = debug
self.dontcare = dontcare
self.list_path = dataset_list_path
self.parent_folder = Path(self.list_path).parent

if not Path(self.list_path).is_file():
logging.error(f"Couldn't locate dataset list file: {self.list_path}.\n"
f"If purposely omitting test set, this error can be ignored")
self.max_sample_count = 0
else:
with open(self.list_path, 'r') as datafile:
datalist = datafile.readlines()
if self.max_sample_count is None:
self.max_sample_count = len(datalist)

self.assets = self._load_data()

def __len__(self):
return self.max_sample_count

return len(self.assets)
def __getitem__(self, index):
with open(self.list_path, 'r') as datafile:
datalist = datafile.readlines()
data_line = datalist[index]
with rasterio.open(data_line.split(';')[0], 'r') as sat_handle:
sat_img = reshape_as_image(sat_handle.read())
metadata = sat_handle.meta
with rasterio.open(data_line.split(';')[1].rstrip('\n'), 'r') as label_handle:
map_img = reshape_as_image(label_handle.read())
map_img = map_img[..., 0]

assert self.num_bands <= sat_img.shape[-1]

if isinstance(metadata, np.ndarray) and len(metadata) == 1:
metadata = metadata[0]
elif isinstance(metadata, bytes):
metadata = metadata.decode('UTF-8')
try:
metadata = eval(metadata)
except TypeError:
pass


sat_img, metadata = self._load_image(index)
map_img = self._load_label(index)

if isinstance(metadata, np.ndarray) and len(metadata) == 1:
metadata = metadata[0]
elif isinstance(metadata, bytes):
metadata = metadata.decode('UTF-8')
try:
metadata = eval(metadata)
except TypeError:
pass

sample = {"image": sat_img, "mask": map_img, "metadata": metadata, "list_path": self.list_path}

if self.radiom_transform: # radiometric transforms should always precede geometric ones
# radiometric transforms should always precede geometric ones
if self.radiom_transform:
sample = self.radiom_transform(sample)
if self.geom_transform: # rotation, geometric scaling, flip and crop. Will also put channels first and convert to torch tensor from numpy.
# rotation, geometric scaling, flip and crop.
# Will also put channels first and convert to torch tensor from numpy.
if self.geom_transform:
sample = self.geom_transform(sample)

sample = self.totensor_transform(sample)
if self.totensor_transform:
sample = self.totensor_transform(sample)

if self.debug:
# assert no new class values in map_img
initial_class_ids = set(np.unique(map_img))
final_class_ids = set(np.unique(sample["mask"].numpy()))
if self.dontcare is not None:
initial_class_ids.add(self.dontcare)
final_class_ids = set(np.unique(sample['mask'].numpy()))
if not final_class_ids.issubset(initial_class_ids):
logging.debug(f"WARNING: Class ids for label before and after augmentations don't match. "
f"Ignore if overwriting ignore_index in ToTensorTarget")
logging.warning(f"\nWARNING: Class values for label before and after augmentations don't match."
f"\nUnique values before: {initial_class_ids}"
f"\nUnique values after: {final_class_ids}"
f"\nIgnore if some augmentations have padded with dontcare value.")
sample['index'] = index

return sample

def _load_data(self) -> List[Dict[str, str]]:
"""Load the filepaths to images and labels
Returns:
List[Dict[str, str]]: a list of dicts holding the image and label filepaths of train/test data
"""
df = pd.read_csv(self.list_path, sep=';', header=None, usecols=[0, 1])
assets = [{"image": x, "label": y} for x, y in zip(df[0], df[1])]

return assets

def _load_image(self, index: int):
""" Load image
Args:
index: position of image
Returns:
image array and metadata
"""
image_path = self.parent_folder.joinpath(self.assets[index]["image"])
with rasterio.open(image_path, 'r') as image_handle:
image = reshape_as_image(image_handle.read())
metadata = image_handle.meta
assert self.num_bands <= image.shape[-1]

return image, metadata

def _load_label(self, index: int):
""" Load label
Args:
index: position of label
Returns:
label array
"""
label_path = self.parent_folder.joinpath(self.assets[index]["label"])
with rasterio.open(label_path, 'r') as label_handle:
label = reshape_as_image(label_handle.read())
label = label[..., 0]

return label


class DRDataset(GeoDataset):
def __init__(self, dr_ds: DatasetReader) -> None:
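The reworked `SegmentationDataset` reads the `;`-separated list once into `assets`, resolves each path against the list file's folder, applies the optional transforms in a fixed order and, in debug mode, checks that augmentation introduced no new label values. The sketch below mirrors that flow under stated assumptions: it uses the standard `csv` module instead of pandas, skips rows with fewer than two columns (an assumption, pandas would behave differently), and all function names are hypothetical.

```python
import csv
from pathlib import Path


def load_assets(list_path):
    """Read image/label pairs from the first two ';'-separated columns.

    Relative paths are resolved against the list file's folder, like
    parent_folder.joinpath(...) in _load_image() / _load_label().
    """
    list_path = Path(list_path)
    parent = list_path.parent
    assets = []
    with open(list_path, newline="") as f:
        for row in csv.reader(f, delimiter=";"):
            if len(row) < 2:
                continue  # assumption: ignore blank or malformed lines
            assets.append({"image": parent / row[0], "label": parent / row[1]})
    return assets


def apply_transforms(sample, radiom=None, geom=None, totensor=None):
    """Apply optional transforms: radiometric first, then geometric, then to-tensor."""
    for transform in (radiom, geom, totensor):
        if transform is not None:
            sample = transform(sample)
    return sample


def class_values_consistent(before, after, dontcare=None):
    """Debug check: every label value after augmentation must exist before it."""
    allowed = set(before)
    if dontcare is not None:
        allowed.add(dontcare)  # padding with the dontcare value is tolerated
    return set(after).issubset(allowed)
```

With this layout, `__len__` can simply return `len(assets)`, which matches the new `return len(self.assets)`.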
13 changes: 13 additions & 0 deletions docs/.readthedocs.yaml
@@ -0,0 +1,13 @@
version: "2"

build:
  os: "ubuntu-22.04"
  tools:
    python: "3.10"

python:
  install:
    - requirements: docs/requirements.txt

sphinx:
  configuration: docs/source/conf.py
Binary file added docs/img/overview.png
2 changes: 2 additions & 0 deletions docs/requirements.txt
@@ -0,0 +1,2 @@
sphinx==7.1.2
sphinx-rtd-theme==1.3.0rc1
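Read the Docs installs these pins before building with `docs/source/conf.py`. When reproducing the docs build locally, a small check can flag packages whose installed version differs from the pin. This is a hypothetical helper using the standard `importlib.metadata` module; it only understands plain `name==version` lines.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_pins(text):
    """Map package name -> pinned version for lines of the form 'name==version'."""
    pins = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if "==" in line:
            name, pinned = line.split("==", 1)
            pins[name.strip()] = pinned.strip()
    return pins


def mismatches(pins):
    """Return {name: (pinned, installed_or_None)} for packages that differ from the pin."""
    out = {}
    for name, pinned in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        if installed != pinned:
            out[name] = (pinned, installed)
    return out


if __name__ == "__main__":
    reqs = "sphinx==7.1.2\nsphinx-rtd-theme==1.3.0rc1\n"
    print(mismatches(parse_pins(reqs)))
```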
39 changes: 27 additions & 12 deletions docs/source/dataset.rst
@@ -3,11 +3,10 @@
Dataset
+++++++

The dataset configuration can be found :ref:`here <configurationdefaultparam>`; this file contains
specific information about input data parameters for the execution of your command. The documentation
on parameter usage is explained in the :ref:`yaml <yamlparameters>` section.
The dataset configuration defines the data (images, ground truth) and their parameters. The documentation
on the parameters used is explained in the :ref:`yaml <yamlparameters>` section.

The sampling and inference steps require a csv referencing input data. An example input csv for the
The tiling and inference steps require a csv referencing input data. An example input csv for the
massachusetts buildings dataset can be found in
`tests <https://github.com/NRCan/geo-deep-learning/blob/develop/tests/tiling/tiling_segmentation_binary_ci.csv>`_.
Each row of this csv is considered, in geo-deep-learning terms, to be an
@@ -46,7 +45,22 @@ Dataset splits

Split in csv should be either "trn", "tst" or "inference". The tiling script outputs lists of
patches for "trn", "val" and "tst" and these lists are used as is during training.
Its proportion is set by the :ref:`tiling config <datatiling>`.

AOI
---
An AOI (area of interest) is defined as an image (single imagery scene or mosaic), its content and metadata, and the associated ground truth vector (optional).

.. note::

An AOI without a ground truth vector can only be used for inference.
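The two rules stated so far (the csv split must be "trn", "tst" or "inference", and an AOI without ground truth is usable only for inference) can be checked per csv row before tiling. This is a hypothetical sketch that encodes only those rules; it is not GDL's `AOI` class, and how the split and ground truth path are read from a csv row is left as an assumption.

```python
VALID_SPLITS = {"trn", "tst", "inference"}  # "val" is produced by tiling, not set in the csv


def validate_aoi_row(split, gt_path=None):
    """Return a list of problems for one csv row (empty list if valid)."""
    problems = []
    if split not in VALID_SPLITS:
        problems.append(f"invalid split {split!r}; expected one of {sorted(VALID_SPLITS)}")
    if not gt_path and split in {"trn", "tst"}:
        problems.append(f"split {split!r} requires a ground truth vector")
    return problems
```

For example, `validate_aoi_row("inference")` returns an empty list, while `validate_aoi_row("tst")` reports the missing ground truth vector.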


The AOI is implemented in the code as follows:

.. autoclass:: dataset.aoi.AOI
:members:
:special-members:

Raster and vector file compatibility
------------------------------------
@@ -66,15 +80,15 @@ Remote sensing is known to deal with raster files from a wide variety of formats
To provide as much
flexibility as possible with variable input formats for raster data, geo-deep-learning supports:

#. Multi-band raster files, to be used as is (all bands needed, all bands in the expected order)
#. Multi-band raster files with more bands than needed (e.g. Actual is "BGRN", needed is "BGR")
#. Multi-band raster files with bands in different order than needed (e.g. Actual is "BGR", needed is "RGB")
#. Single-band raster files, identified with a common string pattern (see details below)
#. Single-band raster files, identified as assets in a stac item (see details below)
#. :ref:`Multi-band raster files, used as is <datasetmultiband>` (all bands needed, all bands are in the expected order)
#. :ref:`Multi-band raster files with more bands or different order than needed <datasetmultibandmorebands>` (e.g. Actual is "BGRN", needed is "BGR" OR Actual is "BGR", needed is "RGB")
#. :ref:`Single-band raster files, identified with a common string pattern <datasetsingleband>` (see details below)
#. :ref:`Single-band raster files, identified as assets in a stac item <datasetstacitem>` (see details below)

To support these variable inputs, geo-deep-learning expects the first column of an input csv to be in the
following formats.

.. _datasetmultiband:

Multi-band raster files, used as is
====================================

@@ -87,7 +101,8 @@ This is the default and basic use.
- ...
* - my_dir/my_multiband_geofile.tif
- ...


.. _datasetmultibandmorebands:

Multi-band raster files with more bands or different order than needed
======================================================================

@@ -116,7 +131,7 @@ The ``bands`` parameter is set in the
indexed from 1
(`docs <https://rasterio.readthedocs.io/en/latest/quickstart.html#reading-raster-data>`_).
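Because rasterio indexes bands from 1, selecting or reordering bands amounts to mapping each needed band to its 1-based position in the actual file. The sketch below illustrates this with the two examples from the list above; the helper name and the single-letter band codes are assumptions, not GDL's implementation.

```python
def band_indexes(actual, needed):
    """Return 1-based band indexes (rasterio convention) that select `needed` from `actual`.

    Hypothetical helper: `actual` and `needed` are strings of single-letter band codes.
    """
    missing = [b for b in needed if b not in actual]
    if missing:
        raise ValueError(f"bands {missing} not found in {actual!r}")
    return [actual.index(b) + 1 for b in needed]
```

Here `band_indexes("BGRN", "BGR")` gives `[1, 2, 3]` and `band_indexes("BGR", "RGB")` gives `[3, 2, 1]`. With rasterio, such a list could be passed to `DatasetReader.read()`, which accepts a list of 1-based band indexes.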


.. _datasetsingleband:

Single-band raster files, identified with a common string pattern
=================================================================
