switch to hydra #208
* Adding files to create a docker image
* - arbitrary band support
  - minor refactor
* - url assets support added specifically for inference
  - minor fixes and changes
* - support for arbitrary number of bands
* - fixed! support for arbitrary number of bands
* - cuda device fix
* - added vectorisation
* - added module collections
  - removed redundant log_artifact function
* - added new sample creation script
* - fixed tst set data allocation
* - changes to inference.py
* - minor fix for debugging
* - minor fix
* - fix num_classes/class weight mismatch
* - added sample creation by sensorID filter
* - minor fix, get_num_samples function
* - stratification trial fix
* - debug print statements added
* removed debug print statements
* - hyperopt template modified
  - dice loss fixed
  - added weight to duo loss
* - minor change, file location
* - commented to solve file permission errors
* - target_size modified to solve out of memory issues
* - temporary template added for training on HPC
  - customizable file paths added for hyperopt assets
* - minor fix
* - temporary fix for TypeError: load_state_dict() got an unexpected keyword argument 'strict'
* - updated
* - added pretrained weights param (HPC, local)
* - model space added
* - minor fix to input tensor mismatch
* minor fixes addressing code review:
  - losses: __init__.py
  - utils: geoutils.py, visualization.py
  - gdl_hyperopt_HPC removed
  - README.md correction
  - train_segmentation.py: space indent fixes, checkpoint path logged on mlflow
* fixes and new features:
  - models: model_choice.py (fixed cuda runtime error)
  - utils: augmentation.py (logging info statement removed), visualization.py (indentation fix)
  - inference.py: new memory management features added, smoothing func improved
  - train_segmentation.py: space indent fixes, cuda runtime error fixes
* - fix: param dict passed explicitly to avoid global calls
* - fix, clipped raster
* - minor fixes: removed dask, added time checks
* Fixed changes suggested by reviewers:
  - gdl_hyperopt_template.py
  - inference.py

Co-authored-by: Yves Choquette <yves.choquette@canada.ca>
Co-authored-by: valhassan <victor.alhassan@canada.ca>
```diff
     del checkpoint
     checkpoint = temp_checkpoint
     return checkpoint
 except FileNotFoundError:
-    raise FileNotFoundError(f"=> No model found at '{filename}'")
+    raise logging.critical(FileNotFoundError(f"\n=> No model found at '{filename}'"))
```
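Why the added line cannot work as intended: `logging.critical()` logs its argument and returns `None`, so `raise logging.critical(...)` amounts to `raise None`, and the caller receives a `TypeError` instead of the intended `FileNotFoundError`. A minimal, self-contained reproduction of the pattern in the diff above; `load_missing_model` is a hypothetical name, not a GDL function:

```python
import logging


def load_missing_model(filename):
    """Hypothetical reproduction of the `raise logging.critical(...)` pattern."""
    try:
        # Opening a non-existent file raises FileNotFoundError.
        with open(filename, "rb") as f:
            return f.read()
    except FileNotFoundError:
        # logging.critical() returns None, so this is effectively `raise None`,
        # which Python rejects with a TypeError.
        raise logging.critical(FileNotFoundError(f"\n=> No model found at '{filename}'"))


try:
    load_missing_model("missing_checkpoint.pth.tar")
except TypeError as err:
    # The caller sees a TypeError, not the FileNotFoundError it expects.
    print(type(err).__name__, "-", err)
```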
You're showing me an error message, but it's not on the line where you put your comment. I'm looking into it.
It was in my code, just as an example, but I think the same thing will happen in your code.
I think it needs to be split into two lines (log, then re-raise):
```python
except FileNotFoundError as e:
    logging.error(f"=> No model found at '{filename}'")
    raise e
```
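The reviewer's suggestion, fleshed out as a minimal sketch: log first, then re-raise the original exception so callers still receive a `FileNotFoundError`. `load_checkpoint` is a hypothetical stand-in that only reads raw bytes; in GDL the `try` body would hold the actual checkpoint loading, which is not reproduced here.

```python
import logging
from pathlib import Path


def load_checkpoint(filename):
    """Hypothetical loader: logs a missing checkpoint, then re-raises."""
    try:
        # Stand-in for the real checkpoint loading; raises FileNotFoundError if absent.
        return Path(filename).read_bytes()
    except FileNotFoundError:
        # Step 1: log the message. Logging calls return None, so never `raise` their result.
        logging.critical(f"\n=> No model found at '{filename}'")
        # Step 2: a bare `raise` re-raises the active FileNotFoundError.
        raise


if __name__ == "__main__":
    try:
        load_checkpoint("missing_checkpoint.pth.tar")
    except FileNotFoundError:
        print("Caller received FileNotFoundError, as expected")
```

A bare `raise` is used here instead of `raise e`; both re-raise the same exception object, and the bare form is the usual idiom inside an `except` block.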
Does it work in the end?
Not really, it still shows the same thing.
fix bugs, re-add calc_inference_chunk_size
Some lessons will have to be learned from this PR:
- plan major developments like this one and request team members' approval before implementation: how will hydra change the way GDL works (classes and functions affected, changes in the use of major scripts within GDL, major steps of implementing hydra in GDL, diagrams to create and discuss, etc.);
- share a detailed timeline for developments, with regular updates, and adjust the dev plan in case of unexpected delays;
- break developments into smaller pieces that can be tested by other team members (and merged) one at a time. This would also help share the workload across the team if a single developer has difficulties with implementation;
- focus on one feature (hydra) and keep other features for future PRs;
- prevent major conflicts in code by making sure the first four points above are followed;
- test extensively via unit tests, or at least a few manual tests (bugs will need to be addressed in the coming days/weeks).
…#274)
* - remove unused functions
  - remove ruamel_yaml import from active scripts
  - fix dontcare2background related to PR #256
* - create set_device function: from a dictionary of available devices, sets the device to be used
  - check if the model can be pushed to the device, else catch the exception and try with cuda, not cuda:0 (HPC bug)
* manage tracker initialization with set_tracker() function in utils.py; adapt get_key_def() to recursively check for a parameter value in a dictionary of dictionaries
* - use get_key_def() to validate path existence and to convert to a pathlib.Path object
  - remove error-handling with try2read_csv and in_case_of_path
  - use hydra's to_absolute_path utils (remove most calls to ${hydra:runtime.cwd} in yamls)
  - revert usage of paths to before PR #208 (remove error-handling, remove find_first_file(), set unique model directory at train)
  - replace warnings with logging.warning
  - replace assert with raise
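The commit above adapts `get_key_def()` to search nested dictionaries and to validate paths. GDL's actual signature is not shown here, so the following is only a sketch of the two ideas under hypothetical names (`find_key_recursive`, `to_existing_path`) and an illustrative config: a depth-first search for a key in a dictionary of dictionaries, and a helper that converts a value to a `pathlib.Path` and raises an error when the path does not exist.

```python
from pathlib import Path

_MISSING = object()  # sentinel distinguishing "not found" from a stored None


def find_key_recursive(key, config, default=None):
    """Hypothetical sketch: depth-first search for `key` in nested dicts."""
    if not isinstance(config, dict):
        return default
    if key in config:
        return config[key]
    for value in config.values():
        found = find_key_recursive(key, value, _MISSING)
        if found is not _MISSING:
            return found
    return default


def to_existing_path(value):
    """Hypothetical sketch: convert to pathlib.Path and fail loudly if absent."""
    path = Path(value)
    if not path.exists():
        # Explicit error instead of an assert, per the commit above.
        raise FileNotFoundError(f"Path does not exist: {path}")
    return path


# Illustrative config only; not GDL's actual structure.
cfg = {"general": {"seed": 123}, "inference": {"state_dict_path": "."}}
print(find_key_recursive("seed", cfg))
print(to_existing_path(find_key_recursive("state_dict_path", cfg)))
```

The sentinel lets the search tell "key absent" apart from "key present with value `None`", so a stored `None` is returned rather than replaced by the default.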
…an#276)
* - remove unused functions
  - remove ruamel_yaml import from active scripts
  - fix dontcare2background related to PR NRCan#256
* - create set_device function: from a dictionary of available devices, sets the device to be used
  - check if the model can be pushed to the device, else catch the exception and try with cuda, not cuda:0 (HPC bug)
* manage tracker initialization with set_tracker() function in utils.py; adapt get_key_def() to recursively check for a parameter value in a dictionary of dictionaries
* - use get_key_def() to validate path existence and to convert to a pathlib.Path object
  - remove error-handling with try2read_csv and in_case_of_path
  - use hydra's to_absolute_path utils (remove most calls to ${hydra:runtime.cwd} in yamls)
  - revert usage of paths to before PR NRCan#208 (remove error-handling, remove find_first_file(), set unique model directory at train)
  - replace warnings with logging.warning
  - replace assert with raise
* - verifications.py: validate_raster(): add extended check
  - move input_band_count == num_bands assertion to separate function
  - refactor segmentation() function
  - refactor gen_img_sample() function
  - use itertools.product in evaluate_segmentation
  - inference: refactor num_devices, default_max_ram_used
  - default_inference.yaml: update parameters with current usage
* softcode max_pix_per_mb_gpu and default to 25 in default_inference.yaml
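This commit moves the `input_band_count == num_bands` assertion into a separate function, in line with the "replace assert with raise" item in the same message. A minimal sketch of what such a check could look like; the name `validate_num_bands` and the choice of `ValueError` are assumptions, not the code from the PR:

```python
import logging


def validate_num_bands(input_band_count: int, num_bands: int) -> None:
    """Hypothetical check: the raster band count must match the model input."""
    if input_band_count != num_bands:
        message = (
            f"Raster has {input_band_count} band(s) but the model expects "
            f"{num_bands} band(s)"
        )
        logging.critical(message)
        # Unlike `assert`, this check is not stripped when Python runs with -O.
        raise ValueError(message)


validate_num_bands(3, 3)  # matching counts: returns None silently
```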
New features introduced by the Hydra structure:
Removed features (to be reimplemented if desired):
```bash
python GDL.py mode=inference inference.state_dict_path="/path/to/custom_checkpoint.pth.tar"
```
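In this command, `mode=inference` and `inference.state_dict_path=...` are Hydra command-line overrides: each dotted key addresses an entry of the composed configuration. Hydra and OmegaConf handle this internally (with type conversion, config groups and validation); the stdlib-only sketch below, with a hypothetical `apply_override` helper and an illustrative starting config, only shows how a dotted override maps onto nested dictionaries.

```python
def apply_override(config: dict, override: str) -> dict:
    """Hypothetical sketch: apply one 'a.b.c=value' override to nested dicts."""
    dotted_key, sep, value = override.partition("=")
    if not sep:
        raise ValueError(f"Override must look like key=value: {override!r}")
    *parents, leaf = dotted_key.split(".")
    node = config
    for name in parents:
        # This sketch creates missing sections; Hydra normally rejects keys that
        # are absent from the config unless the override is prefixed with "+".
        node = node.setdefault(name, {})
    node[leaf] = value  # values stay strings here; Hydra also parses types
    return config


# Illustrative defaults only; not GDL's actual config.
cfg = {"mode": "train", "inference": {"state_dict_path": None}}
for arg in ["mode=inference", "inference.state_dict_path=/path/to/custom_checkpoint.pth.tar"]:
    apply_override(cfg, arg)
print(cfg)
```

The shell strips the quotes around the path before Python sees the argument, which is why the sketch receives the bare value.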