Implement optuna hyperparameter optimization #278
Conversation
…on (add returns to train_segmentation.py and run_gdl())
metrics.py: add try/except if a key is missing
model_choice.py: temporarily raise GPU threshold to 100% until issue NRCan#246 is addressed
train_segmentation.py:
- fix bug for dontcare value in metrics
- always calculate metrics at test time
inference_segmentation.py: warn if inference has only background values (see the sketch below)
test hyperparameter optimization in CI with GitHub Actions
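As an illustration of the background-only warning mentioned above, here is a minimal sketch; the actual check in inference_segmentation.py may differ, and the function name, array argument and background value are assumptions:

import logging

import numpy as np

def warn_if_only_background(pred: np.ndarray, background_value: int = 0) -> None:
    # Warn when the inference output contains nothing but the background class,
    # which usually means the model predicted an empty map.
    if np.all(pred == background_value):
        logging.warning("Inference contains only background values (%s)", background_value)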
models/model_choice.py
Outdated
@@ -244,7 +244,7 @@ def net(model_name: str,
     else:
         checkpoint = None
     # list of GPU devices that are available and unused. If no GPUs, returns empty list
-    gpu_devices_dict = get_device_ids(num_devices)
+    gpu_devices_dict = get_device_ids(num_devices, max_used_perc=100, max_used_ram_perc=100)  # FIXME: set back to default after issue #246
The GPUs don't empty between runs, so the GPU usage threshold must be set to 100%; otherwise every training run after the first one falls back to the CPU (i.e. the GPUs get excluded).
Or get_device_ids should be called once, before all runs.
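For illustration, a minimal sketch of that suggestion; the import path and the idea of passing the precomputed dict into each run are assumptions, only get_device_ids and its max_used_perc/max_used_ram_perc keywords appear in this diff:

from utils.utils import get_device_ids  # assumed import path

num_devices = 1

# Query available and unused GPUs once, with the default usage thresholds,
# before any trial has had a chance to allocate memory on them.
gpu_devices_dict = get_device_ids(num_devices)

for trial_cfg in trial_configs:           # e.g. configs proposed by optuna (placeholder)
    run_gdl(trial_cfg, gpu_devices_dict)  # hypothetical: reuse the same dict for every run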
@@ -432,12 +435,13 @@ def evaluation(eval_loader,
             a, segmentation = torch.max(outputs_flatten, dim=1)
             eval_metrics = iou(segmentation, labels_flatten, batch_size, num_classes, eval_metrics)
             eval_metrics = report_classification(segmentation, labels_flatten, batch_size, eval_metrics,
-                                                 ignore_index=eval_loader.dataset.dontcare)
-        elif (dataset == 'tst') and (batch_metrics is not None):
+                                                 ignore_index=dontcare)
Fixes a missing attribute bug. The dontcare value was previously stored in the dataloader object, but isn't anymore (since when?).
-        elif (dataset == 'tst') and (batch_metrics is not None):
+                                                 ignore_index=dontcare)
+        elif dataset == 'tst':
+            batch_metrics = True
Force metrics at test time. I don't see any reason why we shouldn't systematically output metrics at test time.
# Conflicts:
#	gdl_hyperopt_template.py
#	train_segmentation.py
Can we get a mini tutorial on how optuna works?
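Not a full tutorial, but the core optuna loop is small. A minimal sketch, assuming run_gdl() returns the validation metric to optimize; the parameter names and search spaces below are placeholders, and the real gdl_hyperopt_template.py will differ:

import optuna

def objective(trial):
    # optuna draws one value per trial from each search space declared here
    lr = trial.suggest_float("learning_rate", 1e-5, 1e-2, log=True)
    batch_size = trial.suggest_categorical("batch_size", [4, 8, 16])
    # hypothetical call: train once with these values and return the metric to minimize
    return run_gdl(learning_rate=lr, batch_size=batch_size)

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=20)   # each call to objective() is one trial
print(study.best_params, study.best_value)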
inception: irrelevant without classification
ternausnet: unused
fcn_resnet101: not adapted for >3 bands
…ce.py to DeepLabV3_dualhead class (NRCan#290)
* replace torchvision's version of deeplabv3 with smp's version (>3 band implementation, reuse of pretrained weights, etc.)
* encapsulate dual head deeplabv3 in deeplabv3_dualhead.py
* create temporary unit test in model_choice.py
* fcn_resnet101: not adapted for >3 bands
* for baseline deeplabv3, use smp's version as is
* oops, forgot to commit deeplabv3_dualhead.py
- set GPU max RAM threshold to 100% only when using optuna (don't hardcode it in model_choice.py)
@@ -1,5 +1,5 @@
 # @package _global_
 model:
-  model_name: deeplabv3_resnet101
+  model_name: deeplabv3_pretrained
I think keeping the encoder in the name would be nice.
Once it passes the tests, I will approve.
It seems geo-deep-learning will rely more and more on PyTorch Lightning (PL) and TorchGeo (TG) for training. The hyperparameter optimization add-on is not a priority until the transition to PL and TG is complete.
closes #243