Implement optuna hyperparameter optimization #278
Conversation
…on (add returns to train_segmentation.py and run_gdl())
metrics.py: add try/except if a key is missing
model_choice.py: temporarily raise GPU threshold to 100% until issue NRCan#246 is addressed
train_segmentation.py:
- fix bug for dontcare value in metrics
- always calculate metrics at test time
inference_segmentation.py: warn if inference has only background values (see the sketch below)
test hyperparameter optimization in CI with GitHub Actions
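As an illustration of the background-only warning mentioned above, here is a minimal sketch; the actual check in inference_segmentation.py may differ, and the function name, array argument and background value are assumptions:

import logging

import numpy as np

def warn_if_only_background(pred: np.ndarray, background_value: int = 0) -> None:
    # Warn when the inference output contains nothing but the background class,
    # which usually means the model predicted an empty map.
    if np.all(pred == background_value):
        logging.warning("Inference contains only background values (%s)", background_value)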
models/model_choice.py
Outdated
@@ -244,7 +244,7 @@ def net(model_name: str,
     else:
         checkpoint = None
     # list of GPU devices that are available and unused. If no GPUs, returns empty list
-    gpu_devices_dict = get_device_ids(num_devices)
+    gpu_devices_dict = get_device_ids(num_devices, max_used_perc=100, max_used_ram_perc=100)  # FIXME: set back to default after issue #246
The GPUs don't empty between runs, so the GPU usage threshold must be set to 100%; otherwise every training run after the first one falls back to the CPU (i.e. the GPUs get excluded).
Or get_device_ids should be called once, before all runs.
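For illustration, a minimal sketch of that suggestion; the import path and the idea of passing the precomputed dict into each run are assumptions, only get_device_ids and its max_used_perc/max_used_ram_perc keywords appear in this diff:

from utils.utils import get_device_ids  # assumed import path

num_devices = 1

# Query available and unused GPUs once, with the default usage thresholds,
# before any trial has had a chance to allocate memory on them.
gpu_devices_dict = get_device_ids(num_devices)

for trial_cfg in trial_configs:           # e.g. configs proposed by optuna (placeholder)
    run_gdl(trial_cfg, gpu_devices_dict)  # hypothetical: reuse the same dict for every run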
@@ -432,12 +435,13 @@ def evaluation(eval_loader,
             a, segmentation = torch.max(outputs_flatten, dim=1)
             eval_metrics = iou(segmentation, labels_flatten, batch_size, num_classes, eval_metrics)
             eval_metrics = report_classification(segmentation, labels_flatten, batch_size, eval_metrics,
-                                                 ignore_index=eval_loader.dataset.dontcare)
-        elif (dataset == 'tst') and (batch_metrics is not None):
+                                                 ignore_index=dontcare)
Fixes a missing attribute bug. The dontcare value was previously stored in the dataloader object, but isn't anymore (since when?).
-        elif (dataset == 'tst') and (batch_metrics is not None):
+                                                 ignore_index=dontcare)
+        elif dataset == 'tst':
+            batch_metrics = True
Force metrics at test time. I don't see any reason why we shouldn't systematically output metrics at test time.
# Conflicts:
#	gdl_hyperopt_template.py
#	train_segmentation.py
Can we get a mini tutorial on how optuna works?
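Not a full tutorial, but the core optuna loop is small. A minimal sketch, assuming run_gdl() returns the validation metric to optimize; the parameter names and search spaces below are placeholders, and the real gdl_hyperopt_template.py will differ:

import optuna

def objective(trial):
    # optuna draws one value per trial from each search space declared here
    lr = trial.suggest_float("learning_rate", 1e-5, 1e-2, log=True)
    batch_size = trial.suggest_categorical("batch_size", [4, 8, 16])
    # hypothetical call: train once with these values and return the metric to minimize
    return run_gdl(learning_rate=lr, batch_size=batch_size)

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=20)   # each call to objective() is one trial
print(study.best_params, study.best_value)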
inception: irrelevant without classification
ternausnet: unused
fcn_resnet101: not adapted for >3 bands
…ce.py to DeepLabV3_dualhead class (NRCan#290)
* replace torchvision's version of deeplabv3 with smp's version (>3 band implementation, reuse of pretrained weights, etc.)
* encapsulate dual head deeplabv3 in deeplabv3_dualhead.py
* create temporary unit test in model_choice.py
* fcn_resnet101: not adapted for >3 bands
* for baseline deeplabv3, use smp's version as is
* oops, forgot to commit deeplabv3_dualhead.py
- set GPU max RAM threshold to 100% only when using optuna (don't hardcode it in model_choice.py)
@@ -1,5 +1,5 @@
 # @package _global_
 model:
-  model_name: deeplabv3_resnet101
+  model_name: deeplabv3_pretrained
I think keeping the encoder in the name would be nice.
Once it passes the tests, I will approve.
It seems geo-deep-learning will rely more and more on PyTorch Lightning (PL) and TorchGeo (TG) for training. The hyperparameter optimization add-on is not a priority until the transition to PL and TG is complete.
closes #243