Commit

Merge pull request #110 from IGNF/workflow-thresolds-optim
Workflow for building validation thresholds optimization
leavauchier authored May 13, 2024
2 parents 969019c + f9a51a3 commit fff0149
Showing 10 changed files with 204 additions and 4 deletions.
92 changes: 92 additions & 0 deletions .github/workflows/building_validation_thresholds_optimization.yaml
@@ -0,0 +1,92 @@
# Workflow name
name: "Building validation thresholds optimization"

on:
  # Run workflow on user request
  workflow_dispatch:
    inputs:
      sampling_name:
        description: |
          Sampling name:
          Name of the dataset the model was trained on.
          Used to generate a standard path for inputs and outputs in the
          store's IA folder (projet-LHD/IA/LIDAR-PROD-OPTIMIZATION/$SAMPLING_NAME/$MODEL_ID)
          E.g. YYYYMMDD_MonBeauDataset
        required: true
      model_id:
        description: |
          Model identifier:
          Used to generate a standard path for inputs and outputs in the
          store's IA folder (projet-LHD/IA/LIDAR-PROD-OPTIMIZATION/$SAMPLING_NAME/$MODEL_ID)
          Example: YYYYMMDD_MonBeauSampling_epochXXX_Myria3Dx.y.z
        required: true

jobs:
  optimize-building-validation-thresholds:
    runs-on: self-hosted
    env:
      WORKDIR: /var/data/LIDAR-PROD-OPTIMIZATION/
      IO_DIR: /var/data/LIDAR-PROD-OPTIMIZATION/${{ github.event.inputs.sampling_name }}/${{ github.event.inputs.model_id }}/
      DATA: /var/data/LIDAR-PROD-OPTIMIZATION/20221018_lidar-prod-optimization-on-151-proto/Comparison/
      THRESHOLDS_FILE: valset-opti-results/optimized_thresholds.yaml
      OUTPUT_CONFIG_FILE: LIDAR-PROD-${{ github.event.inputs.model_id }}.yaml
      nexus_server: docker-registry.ign.fr

    steps:
      - name: Log configuration
        run: |
          echo "Optimize building validation threshold for a given trained model"
          echo "Model ID ${{ github.event.inputs.model_id }}"
          echo "input/output dir: ${{env.IO_DIR}}"
          echo "data: ${{env.DATA}}"
          echo "validation input_las_dir: ${{env.IO_DIR}}/preds-valset/"
          echo "test input_las_dir: ${{env.IO_DIR}}/preds-testset/"
          echo "output thresholds file: ${{env.IO_DIR}}/${{env.THRESHOLDS_FILE}}"
          echo "output config file: ${{env.IO_DIR}}/${{env.OUTPUT_CONFIG_FILE}}"
          echo "evaluation metrics (on test dataset): ${{env.IO_DIR}}/preds-testset/evaluation.yaml"
      - name: Checkout branch
        uses: actions/checkout@v4

      # get version number, to retrieve the docker image corresponding to the current version
      - name: Get version number
        run: |
          echo "VERSION=$(docker run lidar_prod python -m lidar_prod.version)" >> $GITHUB_ENV
      - name: pull docker image tagged with current version
        run: |
          docker pull ${{ env.nexus_server }}/lidar_hd/lidar_prod:${{ env.VERSION }}
      - name: Optimization and evaluation on validation dataset
        run: >
          docker run --network host
          -v ${{env.IO_DIR}}:/io_dir
          ${{ env.nexus_server }}/lidar_hd/lidar_prod:${{ env.VERSION }}
          python lidar_prod/run.py
          ++task=optimize_building
          building_validation.optimization.todo='prepare+optimize+evaluate+update'
          building_validation.optimization.paths.input_las_dir=/io_dir/preds-valset/
          building_validation.optimization.paths.results_output_dir=/io_dir/valset-opti-results/
          building_validation.optimization.paths.output_optimized_config=/io_dir/${{env.OUTPUT_CONFIG_FILE}}
          hydra.run.dir=/io_dir/valset-opti-results/
      - name: Evaluation on test dataset
        run: >
          docker run --network=host
          -v ${{env.IO_DIR}}:/io_dir
          ${{ env.nexus_server }}/lidar_hd/lidar_prod:${{ env.VERSION }}
          python lidar_prod/run.py
          ++task=optimize_building
          building_validation.optimization.todo='prepare+evaluate+update'
          building_validation.optimization.paths.input_las_dir=/io_dir/preds-testset/
          building_validation.optimization.paths.results_output_dir=/io_dir/testset-opti-results/
          building_validation.optimization.paths.building_validation_thresholds=/io_dir/${{env.THRESHOLDS_FILE}}
          building_validation.optimization.paths.evaluation_results_yaml=/io_dir/preds-testset/evaluation.yaml
          hydra.run.dir=/io_dir/testset-opti-results/
      - name: Log evaluation results on test dataset
        run: |
          echo "Evaluation results on the test dataset"
          echo "The most important metric to inspect is: p_auto (automation proportion)"
          echo ""
          cat ${{env.IO_DIR}}/preds-testset/evaluation.yaml
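The two docker steps differ only in their Hydra overrides: the validation step runs the full `prepare+optimize+evaluate+update` sequence, while the test step drops `optimize` and reads the thresholds saved by the validation step. As a sketch for checking the commands by hand, the hypothetical helper below (not part of lidar-prod) rebuilds both override lists from the values used in the workflow; the strings reproduce the text of the `run:` blocks, including the shell quotes around `todo`.

```python
from typing import List

IO = "/io_dir"  # mount point of IO_DIR inside the container (docker -v option)
BVO = "building_validation.optimization"
THRESHOLDS_FILE = "valset-opti-results/optimized_thresholds.yaml"


def hydra_overrides(split: str, model_id: str) -> List[str]:
    """Rebuild the run.py overrides of one workflow step (hypothetical helper).

    split="valset": search the thresholds and write the updated config file.
    split="testset": evaluate, on the test set, the thresholds found on the validation set.
    """
    results_dir = f"{IO}/{split}-opti-results/"
    overrides = ["++task=optimize_building"]
    if split == "valset":
        overrides += [
            f"{BVO}.todo='prepare+optimize+evaluate+update'",
            f"{BVO}.paths.input_las_dir={IO}/preds-valset/",
            f"{BVO}.paths.results_output_dir={results_dir}",
            f"{BVO}.paths.output_optimized_config={IO}/LIDAR-PROD-{model_id}.yaml",
        ]
    elif split == "testset":
        overrides += [
            # No "optimize" step: thresholds are read from the validation results
            f"{BVO}.todo='prepare+evaluate+update'",
            f"{BVO}.paths.input_las_dir={IO}/preds-testset/",
            f"{BVO}.paths.results_output_dir={results_dir}",
            f"{BVO}.paths.building_validation_thresholds={IO}/{THRESHOLDS_FILE}",
            f"{BVO}.paths.evaluation_results_yaml={IO}/preds-testset/evaluation.yaml",
        ]
    else:
        raise ValueError(f"unknown split: {split!r}")
    overrides.append(f"hydra.run.dir={results_dir}")
    return overrides


if __name__ == "__main__":
    for split in ("valset", "testset"):
        print("python lidar_prod/run.py " + " ".join(hydra_overrides(split, "MODEL_ID")))
```

Keeping the thresholds path of the test step pointed at the validation results directory is what keeps the test evaluation honest: the test set never influences the threshold search.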
4 changes: 4 additions & 0 deletions .github/workflows/cicd.yaml
@@ -96,10 +96,14 @@ jobs:
      - name: push main docker on nexus (tagged with a date)
        # we push on nexus an image from the main branch when it has been updated (push or accepted pull request)
        # The image is tagged twice: once with the version only, so that the latest build of a version can be retrieved
        # without knowing when it was published, and once with version + date, to ensure a unique tag when needed
        if: ((github.ref_name == 'main') && (github.event_name == 'push'))
        run: |
          docker tag lidar_prod $nexus_server/lidar_hd/lidar_prod:${{ env.VERSION }}
          docker tag lidar_prod $nexus_server/lidar_hd/lidar_prod:${{ env.VERSION }}-${{ env.DATE }}
          docker login $nexus_server --username svc_lidarhd --password ${{ secrets.PASSWORD_SVC_LIDARHD }}
          docker push $nexus_server/lidar_hd/lidar_prod:${{ env.VERSION }}
          docker push $nexus_server/lidar_hd/lidar_prod:${{ env.VERSION }}-${{ env.DATE }}
      - name: push branch docker on nexus (tagged with the branch name)
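The version-only tag is what lets the thresholds optimization workflow pull `lidar_prod:${{ env.VERSION }}` without knowing the build date. A minimal sketch of the two image references pushed for a main-branch build, using a hypothetical helper; the `DATE` value below is made up, since its actual format is set elsewhere in `cicd.yaml`:

```python
from typing import List


def image_refs(registry: str, version: str, date: str) -> List[str]:
    """List the two tags pushed for a main-branch build (hypothetical helper).

    Re-pushing the version-only tag moves it to the latest build of that
    version; the version-date tag gives each build a distinct tag, as long
    as DATE differs between builds.
    """
    repo = f"{registry}/lidar_hd/lidar_prod"
    return [f"{repo}:{version}", f"{repo}:{version}-{date}"]


if __name__ == "__main__":
    # "2024.05.13" is an illustrative DATE value, not the real format
    for ref in image_refs("docker-registry.ign.fr", "V1.10.3", "2024.05.13"):
        print(ref)
```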
4 changes: 3 additions & 1 deletion CHANGELOG.md
@@ -1,6 +1,8 @@
# main

Save optimized thresholds as yaml instead of pickle to make it easier to read
### 1.10.3
- Save optimized thresholds as yaml instead of pickle to make it easier to read
- Save updated config file during building validation thresholds optimization

### 1.10.2
- Add support for metadata propagation through compound pdal pipelines:
4 changes: 3 additions & 1 deletion configs/building_validation/optimization/default.yaml
@@ -13,7 +13,9 @@ paths:
  group_info_pickle_path: ${.results_output_dir}/group_info.pickle
  prepared_las_dir: ${.results_output_dir}/prepared/
  updated_las_dir: ${.results_output_dir}/updated/
  building_validation_thresholds: ${.results_output_dir}/optimized_thresholds.yaml # Wher
  evaluation_results_yaml: ${.results_output_dir}/evaluation.yaml
  building_validation_thresholds: ${.results_output_dir}/optimized_thresholds.yaml
  output_optimized_config: ${.results_output_dir}/config_with_optimized_thresholds.yaml

# CLASSIFICATION CODES of a dataset which was inspected
# and labeled post TerraSolid macro
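Every entry of this `paths` section is derived from `results_output_dir` through a relative interpolation (`${.results_output_dir}` refers to a sibling key), so overriding that single key relocates all outputs, including the two added here. This is why `test_optimization.py` finds the thresholds and the optimized config under `out_dir`, while the workflow overrides `output_optimized_config` and `evaluation_results_yaml` explicitly to store them elsewhere. A plain-Python sketch of what the interpolations evaluate to when nothing else is overridden (the helper name is hypothetical):

```python
def default_optimization_paths(results_output_dir: str) -> dict:
    """Values of the interpolated `paths` entries when only results_output_dir
    is set (hypothetical helper).

    "${.results_output_dir}/x" is string interpolation, i.e. concatenation.
    """
    r = results_output_dir
    return {
        "group_info_pickle_path": f"{r}/group_info.pickle",
        "prepared_las_dir": f"{r}/prepared/",
        "updated_las_dir": f"{r}/updated/",
        "evaluation_results_yaml": f"{r}/evaluation.yaml",
        "building_validation_thresholds": f"{r}/optimized_thresholds.yaml",
        "output_optimized_config": f"{r}/config_with_optimized_thresholds.yaml",
    }


if __name__ == "__main__":
    # Same output directory as in test_optimization.py
    for key, value in default_optimization_paths("tmp/lidar_prod/optimization/subset").items():
        print(f"{key}: {value}")
```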
3 changes: 2 additions & 1 deletion docs/source/guides/thresholds_optimization.md
@@ -48,7 +48,8 @@ python lidar_prod/run.py \
building_validation.optimization.todo='prepare+evaluate+update' \
building_validation.optimization.paths.input_las_dir=[path/to/labelled/test/dataset/] \
building_validation.optimization.paths.results_output_dir=[path/to/save/results] \
building_validation.optimization.paths.building_validation_thresholds=[path/to/optimized_thresholds.yaml]
building_validation.optimization.paths.building_validation_thresholds=[path/to/optimized_thresholds.yaml] \
building_validation.optimization.paths.evaluation_results_yaml=[path/to/saved/metrics.yaml]
```

### Utils
5 changes: 5 additions & 0 deletions lidar_prod/optimization.py
@@ -26,6 +26,8 @@ def optimize_building(
        config (DictConfig): Hydra config passed from run.py
    """
    # Copy config before resolving it to be able to save it unresolved
    config_to_save = config.copy()
    commons.extras(config)

    bvo: BuildingValidationOptimizer = hydra.utils.instantiate(
@@ -37,6 +39,9 @@ def optimize_building(
    bvo.bv.bd_uni_connection_params = bd_uni_connection_params
    bvo.run()

    # Save output config with updated thresholds
    bvo.save_config_with_optimized_thresolds(config_to_save)


def optimize_vegetation(
    config: DictConfig,
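`optimize_building` now copies the config before `commons.extras(config)` so that an unresolved version can be saved afterwards. The toy example below illustrates the general pattern with plain dicts and a hypothetical in-place resolver; it is not how OmegaConf or `commons.extras` work internally. It shows that a deep copy keeps the interpolation strings, whereas a shallow copy still shares nested sections with the object being mutated, so whether a shallow copy is enough depends on what the resolving step modifies.

```python
import copy
import re

_REF = re.compile(r"\$\{([^}]+)\}")  # matches ${key}


def resolve_in_place(node: dict, root: dict) -> None:
    """Toy resolver (hypothetical): replace every ${key} in string values with
    root[key], mutating the nested dicts the way an in-place step would."""
    for key, value in node.items():
        if isinstance(value, dict):
            resolve_in_place(value, root)
        elif isinstance(value, str):
            node[key] = _REF.sub(lambda m: str(root[m.group(1)]), value)


config = {"root": "/data", "paths": {"out": "${root}/results"}}
shallow = config.copy()       # top-level copy: "paths" is still shared
deep = copy.deepcopy(config)  # independent snapshot, interpolations kept
resolve_in_place(config, config)

print(config["paths"]["out"])   # /data/results
print(shallow["paths"]["out"])  # /data/results (nested dict was shared)
print(deep["paths"]["out"])     # ${root}/results
```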
25 changes: 25 additions & 0 deletions lidar_prod/tasks/building_validation_optimization.py
@@ -10,6 +10,8 @@

import numpy as np
import optuna
import yaml
from omegaconf import DictConfig, OmegaConf
from sklearn.metrics import confusion_matrix
from tqdm import tqdm

@@ -190,6 +192,8 @@ def evaluate(self) -> dict:
        mts_gt = np.array([c.target for c in clusters])
        metrics_dict = self.evaluate_decisions(mts_gt, decisions)
        log.info(f"\n Results:\n{self._get_results_logs_str(metrics_dict)}")
        self._save_results_to_yaml(metrics_dict)

        return metrics_dict

    def _set_thresholds_from_file_if_available(self):
@@ -526,3 +530,24 @@ def _get_results_logs_str(self, metrics_dict: dict):
            + str(metrics_dict[self.design.metrics.confusion_matrix_norm].round(3))
        )
        return results_logs

    def _save_results_to_yaml(self, metrics_dict: dict):
        out_dict = metrics_dict.copy()
        for k, v in out_dict.items():
            if isinstance(v, np.ndarray):
                out_dict[k] = v.tolist()
            elif isinstance(v, np.float64):
                out_dict[k] = float(v)

        if self.paths.evaluation_results_yaml:
            with open(self.paths.evaluation_results_yaml, "w") as f:
                yaml.safe_dump(out_dict, f)

    def save_config_with_optimized_thresolds(self, config: DictConfig):
        """Save the config with the thresholds in the building_validation.application
        part replaced by the optimized thresholds"""
        if "optimize" in self.todo:
            optimized_cfg = config.copy()
            optimized_cfg.building_validation.application.thresholds = self.thresholds
            out_path = config.building_validation.optimization.paths.output_optimized_config
            OmegaConf.save(config=optimized_cfg, f=out_path, resolve=False)
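`_save_results_to_yaml` converts NumPy arrays and `np.float64` values because `yaml.safe_dump` only represents standard Python types by default; other NumPy scalar types (e.g. `np.int64`) would still reach it unconverted. A slightly more general variant is sketched below, assuming every non-builtin value in the metrics is a NumPy object: it relies on the `.tolist()` method shared by arrays and NumPy scalars, and also walks nested containers. The name `to_builtin` is hypothetical.

```python
from typing import Any


def to_builtin(value: Any) -> Any:
    """Recursively convert a metrics structure into plain Python types that
    yaml.safe_dump can serialize (sketch, not the committed implementation).

    NumPy arrays and NumPy scalars (np.float64, np.int64, np.bool_, ...) all
    expose .tolist(), which returns nested lists or a Python scalar.
    """
    if isinstance(value, dict):
        return {k: to_builtin(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_builtin(v) for v in value]
    if hasattr(value, "tolist"):
        return value.tolist()
    return value


if __name__ == "__main__":
    print(to_builtin({"p_auto": 0.5, "counts": [[1, 2], (3, 4)]}))
```

With NumPy installed, `to_builtin({"cm": np.eye(2), "p_auto": np.float64(0.9)})` yields nested lists and a plain `float`, ready for `yaml.safe_dump`.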
2 changes: 1 addition & 1 deletion lidar_prod/version.py
@@ -1,4 +1,4 @@
__version__ = "V1.10.2"
__version__ = "V1.10.3"


if __name__ == "__main__":
20 changes: 20 additions & 0 deletions tests/lidar_prod/tasks/test_building_validation_optimization.py
@@ -6,6 +6,7 @@
import hydra
import numpy as np
import pytest
import yaml

from lidar_prod.tasks.building_validation import thresholds
from lidar_prod.tasks.building_validation_optimization import (
@@ -28,6 +29,14 @@
TMP_DIR = Path("tmp/lidar_prod/tasks/building_validation_optimization")


def setup_module(module):
    try:
        shutil.rmtree(TMP_DIR)
    except FileNotFoundError:
        pass
    TMP_DIR.mkdir(parents=True, exist_ok=True)


# Small LAS, for which we optimize thresholds and reach perfect validation,
# to quickly check optimization logic.
LAS_SUBSET_FILE = "tests/files/870000_6618000.subset.postIA.corrected.las"
@@ -100,6 +109,17 @@ def test_BVOptimization_on_subset(hydra_cfg):
    # prepared data and the threshold from previous run
    metrics_dict = bvo.evaluate()
    print(metrics_dict)

    # Check that metrics are correctly saved to file
    assert os.path.isfile(bvo.paths.evaluation_results_yaml)
    with open(bvo.paths.evaluation_results_yaml, "r") as f:
        saved_metrics_dict = yaml.safe_load(f)
    for k, v in metrics_dict.items():
        if isinstance(v, np.ndarray):
            assert saved_metrics_dict[k] == v.tolist()
        else:
            assert saved_metrics_dict[k] == v

    # Assert inclusion
    assert SUBSET_EXPECTED_METRICS["exact"].items() <= metrics_dict.items()
    # Assert <= with a relative tolerance
49 changes: 49 additions & 0 deletions tests/lidar_prod/test_optimization.py
@@ -0,0 +1,49 @@
import os
import os.path as osp
import shutil
from pathlib import Path

from omegaconf import OmegaConf

from lidar_prod.optimization import optimize_building

TMP_DIR = Path("tmp/lidar_prod/optimization")
LAS_SUBSET_FILE = "tests/files/870000_6618000.subset.postIA.corrected.las"


def setup_module(module):
    try:
        shutil.rmtree(TMP_DIR)
    except FileNotFoundError:
        pass
    TMP_DIR.mkdir(parents=True, exist_ok=True)


def test_optimize_building_on_subset(hydra_cfg):
    out_dir = str(TMP_DIR / "subset")
    # Optimization output (thresholds and prepared/updated LAS files) saved to out_dir
    hydra_cfg.building_validation.optimization.paths.results_output_dir = out_dir

    # We isolate the input file in a subdir, and prepare it for optimization
    input_las_dir = osp.join(out_dir, "inputs/")
    hydra_cfg.building_validation.optimization.paths.input_las_dir = input_las_dir
    hydra_cfg.building_validation.application.thresholds = "NO THRESHOLDS"
    os.makedirs(input_las_dir, exist_ok=False)
    src_las_copy_path = osp.join(input_las_dir, "copy.las")
    shutil.copy(LAS_SUBSET_FILE, src_las_copy_path)

    optimize_building(hydra_cfg)

    # Check that the expected outputs are saved successfully
    th_yaml = hydra_cfg.building_validation.optimization.paths.building_validation_thresholds
    assert os.path.isfile(th_yaml)
    cfg_yaml = hydra_cfg.building_validation.optimization.paths.output_optimized_config
    assert os.path.isfile(cfg_yaml)

    assert os.path.isfile(osp.join(out_dir, "prepared", osp.basename(src_las_copy_path)))
    updated_las_path = osp.join(out_dir, "updated", osp.basename(src_las_copy_path))
    assert os.path.isfile(updated_las_path)

    # Check that the thresholds are saved correctly in output config file
    out_cfg = OmegaConf.load(cfg_yaml)
    assert out_cfg.building_validation.application.thresholds != "NO THRESHOLDS"
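Both test modules start from an empty output directory through identical `setup_module` functions. As a sketch, that logic could be shared through a small helper; `reset_dir` is hypothetical and not in the repository, and it keeps the same behavior (only a missing directory is tolerated, other errors still propagate).

```python
import shutil
from pathlib import Path

TMP_DIR = Path("tmp/lidar_prod/optimization")


def reset_dir(path: Path) -> Path:
    """Delete `path` if it exists, then recreate it empty (same logic as setup_module)."""
    try:
        shutil.rmtree(path)
    except FileNotFoundError:
        pass  # first run: nothing to delete
    path.mkdir(parents=True, exist_ok=True)
    return path


def setup_module(module):
    # pytest calls this once, before the tests of the module run
    reset_dir(TMP_DIR)
```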
