switch to hydra #208

CharlesAuthier · 2021-09-16T20:58:50Z

New features introduced by the Hydra structure:

Logs generated by the code are monitored and saved to a file.
New documentation on this structure.
A config file structured by sections and compartments, which makes it easier to find a given parameter or to add new features to GDL.
A new way to launch tasks: a main script redirects execution to the requested task.
An early stopping option on plateau.
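
As an illustration of the early stopping option listed above, here is a minimal sketch of stopping on a plateau. It is not the code from this PR: the `EarlyStopping` class, the `patience` and `min_delta` parameters, and the loss values are all hypothetical and only show the general idea of halting once the validation loss stops improving.

```python
class EarlyStopping:
    """Stop training once the monitored loss has not improved for `patience` epochs."""

    def __init__(self, patience: int = 3, min_delta: float = 0.0):
        self.patience = patience      # hypothetical parameter name
        self.min_delta = min_delta    # minimum decrease that counts as an improvement
        self.best = float("inf")
        self.stale_epochs = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.stale_epochs = 0
        else:
            self.stale_epochs += 1
        return self.stale_epochs >= self.patience


# Made-up validation losses that flatten out after the third epoch.
losses = [0.90, 0.70, 0.65, 0.66, 0.65, 0.67, 0.64]
stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, loss in enumerate(losses):
    if stopper.step(loss):
        stopped_at = epoch
        break
```

With these values the loop stops at epoch index 5, before the last loss is ever seen, which is the usual trade-off of a small `patience`.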

Removed features (to be reimplemented if desired):

In sampling mode, choose a subset of classes from the ground truth based on values for a given attribute in the gpkg (see parameter "target_ids").
In inference mode, if a state_dict is provided from the command line, set the architecture accordingly (i.e. overwrite the default architecture). For example, the following command doesn't work yet:
python GDL.py mode=inference inference.state_dict_path="/path/to/custom_checkpoint.pth.tar"
Use of hyperopt.
Use of the metamap and coordconv features from PR "CoordConv and spatial resolution" #82.
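
To illustrate how a `mode=...` override like the one in the command above can steer a single entry point to the wanted task, here is a minimal plain-Python sketch. It is not GDL's or Hydra's implementation: Hydra parses overrides itself, and `parse_overrides`, `TASKS` and the `run_*` functions are hypothetical names. Only `mode=inference` and `inference.state_dict_path` come from this PR; the other mode names are assumptions.

```python
from typing import Callable, Dict, List


def parse_overrides(args: List[str]) -> dict:
    """Turn ["mode=inference", "inference.state_dict_path=x"] into a nested dict."""
    config: dict = {}
    for arg in args:
        key, _, value = arg.partition("=")
        *parents, leaf = key.split(".")
        node = config
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value.strip('"')
    return config


# Hypothetical task functions; the real ones would run sampling, training or inference.
def run_sampling(cfg: dict) -> str:
    return "sampling"


def run_training(cfg: dict) -> str:
    return "train"


def run_inference(cfg: dict) -> str:
    checkpoint = cfg.get("inference", {}).get("state_dict_path")
    return f"inference with {checkpoint}"


# Mode names other than "inference" are assumptions made for this sketch.
TASKS: Dict[str, Callable[[dict], str]] = {
    "sampling": run_sampling,
    "train": run_training,
    "inference": run_inference,
}


def main(args: List[str]) -> str:
    """Single entry point: read the `mode` key and redirect to the matching task."""
    cfg = parse_overrides(args)
    mode = cfg.get("mode")
    if mode not in TASKS:
        raise ValueError(f"Unknown mode: {mode!r}")
    return TASKS[mode](cfg)


result = main(["mode=inference",
               "inference.state_dict_path=/path/to/custom_checkpoint.pth.tar"])
```

Keeping the mapping in one dictionary means a new task only needs one new entry, which is one possible reading of "a main code to redirect the code to the wanted task".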

* Adding files to create a docker image
* - arbitrary band support
  - minor refactor
* - url assets support added specifically for inference
  - minor fixes and changes
* - support for arbitrary number of bands
* - fixed! support for arbitrary number of bands
* - cuda device fix
* - added vectorisation
* - added module collections
  - removed redundant log_artifact function
* - added new sample creation script
* - fixed tst set data allocation
* - changes to inference.py
* - minor fix for debugging
* - minor fix
* - fix num_classes/class weight mismatch
* - added sample creation by sensorID filter
* - minor fix, get_num_samples function
* - stratification trial fix
* - debug print statements added
* removed debug print statements
* - hyperopt template modified
  - dice loss fixed
  - added weight to duo loss
* - minor change, file location
* - commented to solve file permission errors
* - target_size modified to solve out of memory issues
* - temporary template added for training on HPC
  - customizable file paths added for hyperopt assets
* - minor fix
* - temporary fix for TypeError: load_state_dict() got an unexpected keyword argument 'strict'
* - updated
* - added pretrained weights param Hpc, local
* - model space added
* - minor fix to input tensor mismatch
* minor fixes: addressing code review
  - losses: __init__.py
  - utils: geoutils.py, visualization.py
  - gdl_hyperopt_HPC removed
  - README.md correction
  - train_segmentation.py: space indent fixes, checkpoint path logged on mlflow
* fixes and new features:
  - models: model_choice.py #fixed cuda runtime error
  - utils: augmentation.py #logging info statement removed, visualization.py #indentation fix
  - inference.py # new memory management features added, smoothing func improved
  - train_segmentation.py: space indent fixes, cuda runtime error fixes
* - fix: param dict passed explicitly to avoid global calls.
* - fix, clipped raster
* - minor fixes: removed dask, added time checks
* Fixed suggested changes by reviewers
  - gdl_hyperopt_template.py
  - inference.py

Co-authored-by: Yves Choquette <yves.choquette@canada.ca>
Co-authored-by: valhassan <victor.alhassan@canada.ca>

remtav · 2021-10-06T16:10:38Z

models/model_choice.py

            del checkpoint
            checkpoint = temp_checkpoint
        return checkpoint
    except FileNotFoundError:
-        raise FileNotFoundError(f"=> No model found at '{filename}'")
+        raise logging.critical(FileNotFoundError(f"\n=> No model found at '{filename}'"))


Are you sure this works?
On my side, at least, it doesn't.

You're showing me an error message, but it's not on the line where you put your comment. I'll look into it.

That was in my code, just an example, but I think it will do the same thing in your code.
I think it needs to be split into 2 lines:

    except FileNotFoundError as e:
        logging.error(message)
        raise e

Does it work in the end?

Not really, it still shows the same thing.
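
For context on the thread above, a likely reason the one-line form fails is that `logging.critical()` returns `None`, so `raise logging.critical(...)` ends up raising a `TypeError` rather than the intended `FileNotFoundError`. The sketch below reproduces both patterns in isolation. The `load_broken`, `load_fixed` and `raised_type` helpers are hypothetical stand-ins and not code from `models/model_choice.py`, and the exact error seen by the participants is not shown in the thread.

```python
import logging


def load_broken(filename: str):
    """Mirrors the one-line pattern from the diff under review."""
    try:
        raise FileNotFoundError(filename)  # stand-in for a failed checkpoint load
    except FileNotFoundError:
        # logging.critical() logs the message and returns None,
        # so this statement is effectively `raise None`.
        raise logging.critical(FileNotFoundError(f"\n=> No model found at '{filename}'"))


def load_fixed(filename: str):
    """Two-step pattern suggested in the review: log first, then re-raise."""
    try:
        raise FileNotFoundError(filename)  # stand-in for a failed checkpoint load
    except FileNotFoundError as e:
        logging.error(f"=> No model found at '{filename}'")
        raise e


def raised_type(func, filename: str):
    """Return the name of the exception type that `func` raises, or None."""
    try:
        func(filename)
    except Exception as exc:
        return type(exc).__name__
    return None


broken = raised_type(load_broken, "missing.pth.tar")  # "TypeError"
fixed = raised_type(load_fixed, "missing.pth.tar")    # "FileNotFoundError"
```

A bare `raise` inside the `except` block would also re-raise the original exception and keeps the traceback unchanged.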

remtav

Some lessons will have to be learned from this PR:

1. Plan major developments like this one and request team members' approval before implementation: how hydra will change the way GDL works (classes and functions affected, changes in the use of major scripts within GDL, the major steps of implementing hydra in GDL, diagrams to create and discuss, etc.).
2. Share a detailed timeline for developments, with regular updates. Adjust the dev plan if there are unexpected delays.
3. Break developments into smaller pieces that can be tested by other team members (and merged) one at a time. This would also help share the workload across the team if a single developer has difficulties with implementation.
4. Focus on one feature (hydra) and keep other features for future PRs.
5. Prevent major conflicts in code by making sure points 1, 2, 3 and 4 are followed.
6. Test extensively via unit tests or at least a few manual tests (bugs will need to be addressed in the coming days/weeks).

mpelchat04 · 2021-12-17T18:15:06Z

closes #179
closes #189
closes #157
closes #142

* - remove unused functions
  - remove ruamel_yaml import from active scripts
  - fix dontcare2background related to PR #256
* - create set_device function: from dictionary of available devices, sets the device to be used
  - check if model can be pushed to device, else catch exception and try with cuda, not cuda:0 (HPC bug)
* manage tracker initialization with set_tracker() function in utils.py, adapt get_key_def() to recursively check for parameter value in dictionary of dictionary
* - use get_key_def() to validate path existence and to convert to a pathlib.Path object
  - remove error-handling with try2read_csv and in_case_of_path
  - use hydra's to_absolute_path utils (remove most calls to ${hydra:runtime.cwd} in yamls)
  - revert usage of paths to before PR #208 (remove error-handling, remove find_first_file(), set unique model directory at train)
  - replace warnings with logging.warning
  - replace assert with raise
* - verifications.py: validate_raster() -> add extended check; move input_band_count == num_bands assertion to separate function
  - refactor segmentation() function
  - refactor gen_img_sample() function
  - use itertools.product in evaluate_segmentation
  - inference: refactor num_devices, default_max_ram_used
  - default_inference.yaml: update parameters with current usage
* softcode max_pix_per_mb_gpu and default to 25 in default_inference.yaml
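
The commit message above mentions adapting get_key_def() to "recursively check for parameter value in dictionary of dictionary". As a rough sketch of what such a recursive lookup could look like, the snippet below searches nested dictionaries depth-first. It is not GDL's get_key_def(): the `find_key` name, its signature and the example config keys (other than `max_pix_per_mb_gpu` and its default of 25) are assumptions made for illustration.

```python
from typing import Any

_MISSING = object()  # sentinel, so that a stored value of None still counts as found


def _search(key: str, config: dict) -> Any:
    """Depth-first search for `key` in a dictionary of dictionaries."""
    if key in config:
        return config[key]
    for value in config.values():
        if isinstance(value, dict):
            found = _search(key, value)
            if found is not _MISSING:
                return found
    return _MISSING


def find_key(key: str, config: dict, default: Any = None) -> Any:
    """Return the first value stored under `key` at any depth, else `default`."""
    found = _search(key, config)
    return default if found is _MISSING else found


# Hypothetical config layout; only max_pix_per_mb_gpu = 25 comes from the commit message.
cfg = {
    "general": {"task": "segmentation"},
    "inference": {"max_pix_per_mb_gpu": 25, "state_dict_path": None},
}

chunk_density = find_key("max_pix_per_mb_gpu", cfg)   # found one level down
fallback = find_key("not_a_key", cfg, default="n/a")  # falls back to the default
explicit_none = find_key("state_dict_path", cfg, default="n/a")
```

One design point worth noting: a lookup like this returns the first match it meets, so two sections that reuse the same key name would be ambiguous.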

CharlesAuthier added 9 commits July 12, 2021 16:42

config template

86f6c8a

make files for new config

87d6afe

readme and some changes

5b877cb

build the base of GDL with hydra

8bd0de9

commit befor vacation

4d9bd43

commit on sampling segmentation

7ab547d

test with sampling segmentation pass

fbb4d23

start on read parameters for segmentation

ff0bc40

pass inference_segmentation.py test

99e1fdd

CharlesAuthier requested review from ymoisan, valhassan, remtav and mpelchat04 September 16, 2021 20:58

remtav reviewed Oct 6, 2021

CharlesAuthier added 15 commits October 6, 2021 15:23

light correction and adding travis

2704f09

remove files

54356f5

remove files

11c0ee4

clean up plus logging test with sampling_segmentation.py

297b38e

clean up plus logging test with training_segmentation.py plus mlflow tracker

fc8c34c

clean up plus logging test with training_segmentation.py plus mlflow tracker

edcacb2

clean up plus logging test with inference_segmentation.py

c7a0b7a

correction in travis

d441f66

correction in travis

8ce1d7f

correction in travis

cd4e9d3

travis env update

9eb1817

update env yaml

3ec6927

update

1e9a380

update

f94275e

update

c02cb5e

CharlesAuthier and others added 5 commits December 16, 2021 14:10

ls to lr

3a93651

fix conflicts with PR NRCan#205 and 206

413382e

fix conflicts with PR NRCan#205 and 206 (inference and sampling)

16cb523

Merge branch 'CharlesAuthier-develop' into develop

9cd2d21

add commonly used models (until new PR for model_choice.py)

e97ea1d

fix bugs readd calc_inference_chunk_size

remtav requested review from mpelchat04 and valhassan December 17, 2021 17:00

remtav approved these changes Dec 17, 2021

mpelchat04 approved these changes Dec 17, 2021

CharlesAuthier merged commit f7e6033 into NRCan:develop Dec 17, 2021

This was referenced Dec 20, 2021

normalization parameters should be in global section of yaml, not training #198

Closed

ignore_index parameter should be in "global" section, not "training" #194

Closed

This was referenced Feb 3, 2022

Reimplement classification task #260

Closed

Inference: refactoring needed #265

Closed

This was referenced Feb 9, 2022

Refactor path management and restore some interface usage to pre #208 #274

Merged

Housecleaning #270

Merged

#274 Refactor path management and restore some interface usage to pre #208 remtav/geo-deep-learning#9

Closed

CharlesAuthier commented Sep 16, 2021 (edited by remtav)

remtav Oct 6, 2021 (edited)

CharlesAuthier Oct 6, 2021

remtav Oct 18, 2021

remtav Nov 10, 2021

CharlesAuthier Nov 12, 2021

remtav left a comment (edited)

mpelchat04 commented Dec 17, 2021