Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

switch to hydra #208

Merged
merged 53 commits into from
Dec 17, 2021
Merged

switch to hydra #208

merged 53 commits into from
Dec 17, 2021

Conversation

CharlesAuthier
Copy link
Collaborator

@CharlesAuthier CharlesAuthier commented Sep 16, 2021

New features introduced by the Hydra structure:

  • monitor the logs generated by the code and save it inside a file.
  • new documentations on this new structure.
  • a config file structured by sections and compartiments to easily find the information wanted or to add new features to GDL.
  • a new way to launch task using a main code to redirect the code to the wanted task.
  • introduce the early stopping option on plateau

Removed features (to be reimplemented if desired):

  • In sampling mode, choose a subset of classes from ground truth based on values for given attribute in gpkg (see parameter "target_ids")
  • In inference mode, if state_dict provided from command line, set architecture accordingly (i.e. overwrite default architecture). For example, the following command doesn't work yet:
    python GDL.py mode=inference inference.state_dict_path="/path/to/custom_checkpoint.pth.tar"
  • use of hyperopt
  • use of metamap and coordconv features from PR CoordConv and spatial resolution #82

* Adding files to create a docker image

* - arbitrary band support
- minor refactor

* - url assets support added specifically for inference
- minor fixes and changes

* - support for arbitrary number of bands

* - fixed! support for arbitrary number of bands

* - cuda device fix

* - added vectorisation

* - added module collections
- removed redundant log_artifact function

* - added new sample creation script

* - fixed tst set data allocation

* - changes to inference.py

* - minor fix for debugging

* - minor fix

* - fix num_classes/class weight mismatch

* - added sample creation by sensorID filter

* - minor fix, get_num_samples function

* - stratification trial fix

* - debug print statements added

* removed debug print statements

* - hyperopt template modified
- dice loss fixed
- added weight to duo loss

* - minor change, file location

* - commented to solve file permission errors

* - target_size modified to solve out of memory issues

* - temporary template added for training on HPC
- customizable file paths added for hyperopt assets

* - minor fix

* - temporary fix for
TypeError: load_state_dict() got an unexpected keyword argument 'strict'

* - updated

* - added pretrained weights param Hpc, local

* - model space added

* - minor fix to input tensor mismatch

* minor fixes: addressing code review
- losses: __init__.py
- utils: geoutils.py, visualization.py
- gdl_hyperopt_HPC removed
- README.md correction
- train_segmentation.py: space indent fixes, checkpoint path logged on mlflow

* fixes and new features:
- models: model_choice.py #fixed cuda runtime error
- utils: augmentation.py #logging info statement removed, visualization.py #indentation fix
- inference.py # new memory management features added, smoothing func improved
- train_segmentation.py: space indent fixes, cuda runtime error fixes

* - fix: param dict passed explicitly to avoid global calls.

* - fix, clipped raster

* - minor fixes: removed dask, added time checks

* Fixed sugggested changes by reviewers
- gdl_hyperopt_template.py
- inference.py

Co-authored-by: Yves Choquette <yves.choquette@canada.ca>
Co-authored-by: valhassan <victor.alhassan@canada.ca>
del checkpoint
checkpoint = temp_checkpoint
return checkpoint
except FileNotFoundError:
raise FileNotFoundError(f"=> No model found at '{filename}'")
raise logging.critical(FileNotFoundError(f"\n=> No model found at '{filename}'"))
Copy link
Collaborator

@remtav remtav Oct 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Es-tu certain que ça fonctionne ?
De mon côté, non en tout cas
Screenshot from 2021-10-06 12-08-04

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tu me montre un message derreure mais c'est pas la bonne ligne ou tu as mit ton commentaire, je regarde sa

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

C'était dans mon code, juste un exemple, mais je crois que ça fera la même chose dans ton code.
Je crois qu'il faut séparer en 2 lignes:

except FileNotFoundError as e:
    logging.error(message)
    raise e

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Est-ce que ça fonctionne finalement?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really still show the same think

Copy link
Collaborator

@remtav remtav left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some lessons will have to be learned from this PR:

  1. plan major developments like this one and request team members' approval before implementation: how will hydra change the way GDL works (classes and functions affected, changes in the use of major scripts within GDL, what are the major steps of implementing hydra in GDL, create and discuss over diagrams, etc.);
  2. Share a detailed timeline for developments, with regulard updates. Adjust dev plan if unexpected delays.
  3. breaking developments into smaller pieces than can be tested by other team members (and merged) one at the time. This would also help share the workload across the team if a single developer has difficulties with implementation;
  4. focusing on one feature (hydra) and keeping other features for future PRs;
  5. prevent major conflits in code by making sure points 1,2, 3 and 4 are followed;
  6. extensive testing via unit tests or at least a few manuals tests (bugs will be need to be addressed in the coming days/weeks).

@mpelchat04
Copy link
Collaborator

closes #179
closes #189
closes #157
closes #142

@CharlesAuthier CharlesAuthier merged commit f7e6033 into NRCan:develop Dec 17, 2021
remtav added a commit to remtav/geo-deep-learning that referenced this pull request Feb 9, 2022
…thlib.Path object

- remove error-handling with try2read_csv and in_case_of_path
- use hydra's to_absolute_path utils (remove most calls to ${hydra:runtime.cwd} in yamls
- revert usage of paths to before PR NRCan#208 (remove error-handling, remove find_first_file(), set unique model directory at train)
- replace warnings with logging.warning
- replace assert with raise
remtav added a commit that referenced this pull request Feb 16, 2022
…#274)

* - remove unused functions
- remove ruamel_yaml import from active scripts
- fix dontcare2background related to PR #256

* - create set_device function: rom dictionary of available devices, sets the device to be used
- check if model can be pushed to device, else catch exception and try with cuda, not cuda:0 (HPC bug)

* manage tracker initialization with set_tracker() function
in utils.py, adapt get_key_def() to recursively check for parameter value in dictionary of dictionary

* - use get_key_def() to validate path existence and to convert to a pathlib.Path object
- remove error-handling with try2read_csv and in_case_of_path
- use hydra's to_absolute_path utils (remove most calls to ${hydra:runtime.cwd} in yamls
- revert usage of paths to before PR #208 (remove error-handling, remove find_first_file(), set unique model directory at train)
- replace warnings with logging.warning
- replace assert with raise
remtav added a commit that referenced this pull request Feb 18, 2022
* - remove unused functions
- remove ruamel_yaml import from active scripts
- fix dontcare2background related to PR #256

* - create set_device function: rom dictionary of available devices, sets the device to be used
- check if model can be pushed to device, else catch exception and try with cuda, not cuda:0 (HPC bug)

* manage tracker initialization with set_tracker() function
in utils.py, adapt get_key_def() to recursively check for parameter value in dictionary of dictionary

* - use get_key_def() to validate path existence and to convert to a pathlib.Path object
- remove error-handling with try2read_csv and in_case_of_path
- use hydra's to_absolute_path utils (remove most calls to ${hydra:runtime.cwd} in yamls
- revert usage of paths to before PR #208 (remove error-handling, remove find_first_file(), set unique model directory at train)
- replace warnings with logging.warning
- replace assert with raise

* - verifications.py: validate_raster() -> add extended check move input_band_count == num_bands assertion to separate function
- refactor segmentation() function
- refactor gen_img_sample() function
- use itetools.product in evaluate_segmentation
- inference: refactor num_devices,default_max_ram_used
- default_inference.yaml: update parameters with current usage

* softcode max_pix_per_mb_gpu
and default to 25 in default_inference.yaml
remtav added a commit to remtav/geo-deep-learning that referenced this pull request Jul 5, 2022
…n#208 (NRCan#274)

* - remove unused functions
- remove ruamel_yaml import from active scripts
- fix dontcare2background related to PR NRCan#256

* - create set_device function: rom dictionary of available devices, sets the device to be used
- check if model can be pushed to device, else catch exception and try with cuda, not cuda:0 (HPC bug)

* manage tracker initialization with set_tracker() function
in utils.py, adapt get_key_def() to recursively check for parameter value in dictionary of dictionary

* - use get_key_def() to validate path existence and to convert to a pathlib.Path object
- remove error-handling with try2read_csv and in_case_of_path
- use hydra's to_absolute_path utils (remove most calls to ${hydra:runtime.cwd} in yamls
- revert usage of paths to before PR NRCan#208 (remove error-handling, remove find_first_file(), set unique model directory at train)
- replace warnings with logging.warning
- replace assert with raise
remtav added a commit to remtav/geo-deep-learning that referenced this pull request Jul 5, 2022
…an#276)

* - remove unused functions
- remove ruamel_yaml import from active scripts
- fix dontcare2background related to PR NRCan#256

* - create set_device function: rom dictionary of available devices, sets the device to be used
- check if model can be pushed to device, else catch exception and try with cuda, not cuda:0 (HPC bug)

* manage tracker initialization with set_tracker() function
in utils.py, adapt get_key_def() to recursively check for parameter value in dictionary of dictionary

* - use get_key_def() to validate path existence and to convert to a pathlib.Path object
- remove error-handling with try2read_csv and in_case_of_path
- use hydra's to_absolute_path utils (remove most calls to ${hydra:runtime.cwd} in yamls
- revert usage of paths to before PR NRCan#208 (remove error-handling, remove find_first_file(), set unique model directory at train)
- replace warnings with logging.warning
- replace assert with raise

* - verifications.py: validate_raster() -> add extended check move input_band_count == num_bands assertion to separate function
- refactor segmentation() function
- refactor gen_img_sample() function
- use itetools.product in evaluate_segmentation
- inference: refactor num_devices,default_max_ram_used
- default_inference.yaml: update parameters with current usage

* softcode max_pix_per_mb_gpu
and default to 25 in default_inference.yaml
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P1 High priority
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants