
Manage device choice at a single place with set_device #272

Merged: 4 commits merged into NRCan:develop on Feb 15, 2022

Conversation

@remtav (Collaborator) commented Feb 9, 2022:

closes #269

- remove ruamel_yaml import from active scripts
- fix dontcare2background related to PR NRCan#256
- create set_device function: from a dictionary of available devices, sets the device to be used
- check if model can be pushed to device; otherwise catch the exception and retry with cuda, not cuda:0 (HPC bug)
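The device-selection behaviour described above can be sketched roughly as follows. This is a hypothetical reconstruction from the commit messages, not the PR's actual code: the `check` callback, the dict shape, and the string return type are all assumptions (the real function would return a `torch.device`).

```python
# Hypothetical sketch of the set_device logic described in this PR.
# It avoids importing torch so it stays runnable anywhere.

def set_device(gpu_devices_dict, check=None):
    """Pick a device string from a dict of available GPUs.

    gpu_devices_dict maps GPU ids to usage stats; an empty dict means
    no GPU is available. `check` stands in for pushing a small model to
    the device (e.g. models.resnet18().to(device)); on some HPC nodes
    'cuda:0' raises "invalid device ordinal", hence the 'cuda' fallback.
    """
    if not gpu_devices_dict:
        # No CUDA device available: this process runs on CPU.
        return "cpu"
    first_gpu = sorted(gpu_devices_dict)[0]
    device = f"cuda:{first_gpu}"
    if check is not None:
        try:
            check(device)
        except (RuntimeError, AssertionError):
            # HPC bug: device ordinal not addressable from this process.
            return "cuda"
    return device
```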
utils/logger.py Outdated
logger = logging.getLogger(name)
logger.setLevel(level)

# this ensures all logging levels get marked with the rank zero decorator
A collaborator commented:

cool!

@remtav (Collaborator, author) commented Feb 11, 2022:

this was @CharlesAuthier's addition though. I'm just moving it from utils.py to logger.py :)
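The "rank zero decorator" pattern in the excerpt above (in the spirit of PyTorch Lightning's `rank_zero_only`) can be sketched without any dependency. The `RANK` environment variable and the loop over level names are assumptions for illustration, not the repository's exact code:

```python
import logging
import os
from functools import wraps

def rank_zero_only(fn):
    """Call fn only on the rank-0 process (sketch of the Lightning-style idea)."""
    @wraps(fn)
    def wrapped(*args, **kwargs):
        if int(os.environ.get("RANK", "0")) == 0:
            return fn(*args, **kwargs)
    return wrapped

def get_logger(name, level=logging.INFO):
    logger = logging.getLogger(name)
    logger.setLevel(level)
    # this ensures all logging levels get marked with the rank zero decorator,
    # so only the rank-0 process emits log records in distributed runs
    for lvl in ("debug", "info", "warning", "error", "critical"):
        setattr(logger, lvl, rank_zero_only(getattr(logger, lvl)))
    return logger
```

In a multi-GPU run, every process still constructs the logger, but only rank 0 actually writes records, which keeps distributed logs from being duplicated once per worker.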

utils/utils.py Outdated
else:
logging.warning(f"\nNo Cuda device available. This process will only run on CPU")
device = torch.device('cpu')
try:
A collaborator commented:

I don't think this try statement is necessary.

utils/utils.py Outdated
device = torch.device('cpu')
try:
models.resnet18().to(device) # test with a small model
except (RuntimeError, AssertionError): # HPC: when device 0 not available. Error: Cuda invalid device ordinal.
A collaborator commented:

It seems that comment is no longer valid there; over the summer we solved the "Cuda invalid device ordinal" problem. We can just say # Error!

@remtav (Collaborator, author) commented:

Ok! If you say the problem is solved, I can simply remove the try/except altogether. Thanks!

# list of GPU devices that are available and unused. If no GPUs, returns empty dict
gpu_devices_dict = get_device_ids(num_devices, max_used_ram_perc=max_used_ram, max_used_perc=max_used_perc)
chunk_size = get_key_def('chunk_size', params['inference'], default=512, expected_type=int)
chunk_size = calc_inference_chunk_size(gpu_devices_dict=gpu_devices_dict, max_pix_per_mb_gpu=50, default=chunk_size)
@valhassan (Collaborator) commented Feb 11, 2022:

Just a note: calculating chunk size dynamically here, specifically at inference, may cause memory problems with test-time augmentation and smoothing. A default size of 512 is advised; 512 is padded to 512 * 2 during test-time routines.
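To make the memory concern concrete, here is the back-of-envelope arithmetic, under the rough assumption that memory use scales with pixel count per chunk:

```python
chunk_size = 512
padded = chunk_size * 2           # test-time routines pad each side to 512 * 2

pixels = chunk_size ** 2          # pixels per chunk before padding
padded_pixels = padded ** 2       # pixels per chunk after padding

# Doubling the side length quadruples the pixels, so test-time
# augmentation roughly quadruples per-chunk memory.
ratio = padded_pixels / pixels    # -> 4.0
```

An automatically inflated chunk size therefore leaves much less headroom than it appears to once the test-time padding kicks in.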

@remtav (Collaborator, author) commented:

Ok, but do you agree to keep the "automatic chunk_size" option? It's meant to be tunable, i.e. max_pix_per_mb_gpu can be lowered if the automatic setting busts memory.

A collaborator commented:

As discussed, we can lower max_pix_per_mb_gpu for now, keeping in mind that we could later optimize its definition during inference.

@remtav (Collaborator, author) commented:

True. I'll address this in PR #276
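One plausible shape for the automatic chunk-size heuristic discussed in this thread is sketched below. This is a guess at the idea only, not the repository's actual `calc_inference_chunk_size`: the `free_mb` key, the square-root scaling, and the rounding to a multiple of 256 are all assumptions.

```python
import math

def calc_inference_chunk_size(gpu_devices_dict, max_pix_per_mb_gpu=50, default=512):
    """Scale chunk size with free GPU memory (hypothetical sketch).

    gpu_devices_dict is assumed to map GPU ids to stats that include
    free memory in MB; with no GPUs, fall back to the default.
    """
    if not gpu_devices_dict:
        return default
    free_mb = min(stats["free_mb"] for stats in gpu_devices_dict.values())
    # Total pixels the GPU can hold, then the side length of a square
    # chunk, rounded down to a multiple of 256 for tiling convenience.
    side = int(math.sqrt(free_mb * max_pix_per_mb_gpu))
    return max(256, side - side % 256)
```

Under this sketch, lowering `max_pix_per_mb_gpu` directly shrinks the computed chunk, which is the tuning knob the discussion above refers to.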

@remtav merged commit 685f960 into NRCan:develop on Feb 15, 2022
@remtav deleted the 269-set_device branch on February 16, 2022
remtav added a commit to remtav/geo-deep-learning that referenced this pull request Jul 5, 2022
* - remove unused functions
- remove ruamel_yaml import from active scripts
- fix dontcare2background related to PR NRCan#256

* - create set_device function: from dictionary of available devices, sets the device to be used
- check if model can be pushed to device, else catch exception and try with cuda, not cuda:0 (HPC bug)

* remove try/except statement for old HPC bug (device ordinal error)
Successfully merging this pull request may close these issues.

Refactor: manage device selection and error-handling (HPC) with set_device function