
Implement optuna hyperparameter optimization #278

Closed
wants to merge 11 commits

Conversation

remtav (Collaborator) commented Feb 11, 2022

…on (add returns to train_segmentation.py and run_gdl())

metrics.py: add try/except in case of a missing key
model_choice.py: temporarily raise the GPU usage threshold to 100% until issue #246 is addressed
train_segmentation.py:
  • fix a bug with the dontcare value in metrics
  • always calculate metrics at test time
inference_segmentation.py: warn if an inference contains only background values
test hyperparameter optimization in CI with GitHub Actions

closes #243
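
The "add returns" part is what makes the optimization possible: the training entry points must return a scalar the optimizer can score. A minimal sketch of that idea, assuming a best-validation-loss criterion (the helper functions and config keys below are placeholders, not the actual code in train_segmentation.py):

```python
import random

def train_one_epoch(cfg: dict) -> None:
    """Hypothetical stand-in for one training epoch."""

def validate(cfg: dict) -> float:
    """Hypothetical stand-in for a validation pass; returns a loss."""
    return random.random()

def main(cfg: dict) -> float:
    """Train, then return the best validation loss so a caller can score the run."""
    best_val_loss = float("inf")
    for _ in range(cfg.get("num_epochs", 1)):
        train_one_epoch(cfg)
        best_val_loss = min(best_val_loss, validate(cfg))
    return best_val_loss  # the "add returns" part of the commit message

def run_gdl(cfg: dict) -> float:
    """Entry point; forwards the training score to the hyperparameter loop."""
    return main(cfg)
```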

remtav mentioned this pull request Feb 11, 2022
```diff
@@ -244,7 +244,7 @@ def net(model_name: str,
     else:
         checkpoint = None
     # list of GPU devices that are available and unused. If no GPUs, returns empty list
-    gpu_devices_dict = get_device_ids(num_devices)
+    gpu_devices_dict = get_device_ids(num_devices, max_used_perc=100, max_used_ram_perc=100)  # FIXME: set back to default after issue #246
```
remtav (Collaborator, Author) commented:
The GPUs don't empty between runs, so the GPU usage threshold must be 100%; otherwise, every training after the first runs on CPU (i.e., the GPUs get excluded).

A collaborator replied:

Or get_device_ids should be called once, before all runs.
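
A sketch of that suggestion, assuming the per-trial loop lives in the hyperopt script; `run_trial`, `trial_configs`, and the wrapper signature are placeholders, not this PR's actual code:

```python
def run_trial(cfg: dict, gpu_devices: dict) -> None:
    """Hypothetical per-trial training call that reuses the precomputed device list."""

def run_all_trials(trial_configs: list, get_device_ids, num_devices: int) -> None:
    # Query the GPUs once, before any trial has allocated memory, so the
    # default usage thresholds still see the devices as free.
    gpu_devices = get_device_ids(num_devices)
    for cfg in trial_configs:
        run_trial(cfg, gpu_devices)
```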

```diff
@@ -432,12 +435,13 @@ def evaluation(eval_loader,
     a, segmentation = torch.max(outputs_flatten, dim=1)
     eval_metrics = iou(segmentation, labels_flatten, batch_size, num_classes, eval_metrics)
     eval_metrics = report_classification(segmentation, labels_flatten, batch_size, eval_metrics,
-                                         ignore_index=eval_loader.dataset.dontcare)
+                                         ignore_index=dontcare)
     elif (dataset == 'tst') and (batch_metrics is not None):
```
remtav (Collaborator, Author) commented:
Fixes a missing-attribute bug. The dontcare value was previously stored on the dataloader object, but isn't anymore (since when?).

```diff
                                              ignore_index=dontcare)
-        elif (dataset == 'tst') and (batch_metrics is not None):
+        elif dataset == 'tst':
+            batch_metrics = True
```
remtav (Collaborator, Author) commented:
Force metrics at test time. I don't see any reason why we shouldn't systematically output metrics at test time.

remtav marked this pull request as ready for review March 16, 2022 18:36
# Conflicts:
#	gdl_hyperopt_template.py
#	train_segmentation.py
mpelchat04 previously approved these changes Mar 17, 2022

valhassan previously approved these changes Mar 17, 2022

valhassan (Collaborator) left a comment:
Can we get a mini tutorial on how optuna works?
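
For reference, a minimal, self-contained optuna loop looks like the sketch below. It is generic, not this PR's gdl_hyperopt_template.py; the search space and the quadratic objective are placeholders:

```python
import optuna

def objective(trial: optuna.Trial) -> float:
    # optuna samples each hyperparameter from the ranges declared here.
    lr = trial.suggest_float("learning_rate", 1e-5, 1e-1, log=True)
    batch_size = trial.suggest_categorical("batch_size", [4, 8, 16])
    # Placeholder score; in geo-deep-learning this would be the value
    # returned by run_gdl() after training with these hyperparameters.
    return (lr - 1e-3) ** 2 + batch_size * 1e-6

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=20)
print(study.best_params, study.best_value)
```

Each call to `objective` is one trial; optuna's sampler (TPE by default) uses the history of returned values to choose the next hyperparameters to try.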

remtav and others added 3 commits March 17, 2022 14:47
inception: irrelevant without classification
ternausnet: unused
fcn_resnet101: not adapted for >3 bands
…ce.py to DeepLabV3_dualhead class (NRCan#290)

* replace torchvision's version of deeplabv3 with smp's version (>3-band implementation, reuse of pretrained weights, etc.)
encapsulate dual-head deeplabv3 in deeplabv3_dualhead.py
create a temporary unit test in model_choice.py

fcn_resnet101: not adapted for >3 bands
for baseline deeplabv3, use smp's version as is

* oops, forgot to commit deeplabv3_dualhead.py
- set the GPU max RAM threshold to 100% only when using optuna (don't hardcode it in model_choice.py)
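
A sketch of that last change's intent, assuming a `use_optuna` flag at the call site (the wrapper below is illustrative, not the PR's code; when the flag is off, the library defaults apply):

```python
def pick_gpus(get_device_ids, num_devices: int, use_optuna: bool) -> dict:
    """Raise the usage thresholds only for optuna runs, where GPUs stay
    partially allocated between trials; otherwise keep the library defaults."""
    kwargs = {"max_used_perc": 100, "max_used_ram_perc": 100} if use_optuna else {}
    return get_device_ids(num_devices, **kwargs)
```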
remtav dismissed stale reviews from valhassan and mpelchat04 via 10ddc7a March 17, 2022 18:48
```diff
@@ -1,5 +1,5 @@
 # @package _global_
 model:
-  model_name: deeplabv3_resnet101
+  model_name: deeplabv3_pretrained
```
A collaborator commented:
I think keeping the encoder in the name would be nice.

CharlesAuthier previously approved these changes Mar 21, 2022

CharlesAuthier (Collaborator) left a comment:
I will approve once the tests pass.

remtav (Collaborator, Author) commented Sep 12, 2022

It seems geo-deep-learning will rely more and more on PyTorch Lightning (PL) and torchgeo (TG) for training. The hyperparameter optimization add-on is not a priority until the transition to PL and TG is complete.

remtav closed this Sep 12, 2022
Successfully merging this pull request may close these issues:

Reimplement hyperparameter optimization with hyperopt, optuna or other