
Major and Minor updates to GDL #206

Merged: 42 commits, Oct 5, 2021

Conversation

valhassan
Collaborator

Main Highlights

  • New sample creation script
  • Major updates to inference.py
  • Minor fixes across GDL

ychoquet and others added 30 commits March 18, 2021 09:23
Docker image automation
- minor refactor
- removed redundant log_artifact function
…into develop

# Conflicts:
#	inference.py
#	models/model_choice.py
#	train_segmentation.py
#	utils/utils.py

added sample_creation.py script
- dice loss fixed
- added weight to duo loss
- customizable file paths added for hyperopt assets
TypeError: load_state_dict() got an unexpected keyword argument 'strict'
valhassan added 5 commits July 22, 2021 12:18
- unet_pretrained_101 model added
- minor changes to geoutils.py
- updated to metrics.py
- minor fix to windowing func utils.py
- update to images_to_samples.py
- major changes to inference.py
- minor changes to sample_creation.py
…into develop

# Conflicts:
#	inference.py
#	train_segmentation.py
> For Windows OS:
##### For Docker
Collaborator

Thanks, that's nice!

be modified here, as well as GDL config modification logic within the objective_with_args
function.

"""
Collaborator

Can you add a section to the README.md on how to launch it, please? :)

Collaborator Author

will be removed

# 'optimizer': hp.choice('optimizer', ['adam', 'adabound']),
# 'learning_rate': hp.loguniform('learning_rate', np.log(1e-7), np.log(0.1))}

my_space = {'loss_fn': hp.choice('loss_fn', ['CrossEntropy', 'Lovasz', 'Duo']),
Collaborator

Can that dict be a YAML or a CSV? Maybe in a future PR otherwise.

Collaborator Author

This template will be removed; it is experiment-specific and supposed to live on a private branch.

'rivers_weight': hp.uniform('rivers_weight', 1.0, 10.0),
'flood_weight': hp.uniform('flood_weight', 1.0, 10.0),
'noise': hp.choice('noise', [0.0, 1.0])}
my_space = {'model_name': hp.choice('model_name', ['unet_pretrained', 'deeplabv3_resnet101']),
Collaborator

Same comment as for the HPC one.

Collaborator Author

This will be tackled in a future PR.

@@ -123,7 +124,8 @@ def vis(vis_params,
assert vis_path.parent.is_dir()
vis_path.mkdir(exist_ok=True)

if not vis_params['inference_input_path']: # FIXME: function parameters should not come in as different types if inference or not.
if not vis_params[
Collaborator

Why? Just put the comment above the if instead of wrapping it onto a second line.

Collaborator Author

I pulled and merged this change, will fix

@@ -71,8 +73,8 @@ def load_from_checkpoint(checkpoint, model, optimizer=None, inference:str=''):
strict_loading = False if not inference else True
model.load_state_dict(checkpoint['model'], strict=strict_loading)
logging.info(f"=> loaded model\n")
if optimizer and 'optimizer' in checkpoint.keys(): # 2nd condition if loading a model without optimizer
optimizer.load_state_dict(checkpoint['optimizer'], strict=False)
# if optimizer and 'optimizer' in checkpoint.keys(): # 2nd condition if loading a model without optimizer
Collaborator

Can we delete it?

Collaborator
@mwesleyj Aug 25, 2021

I think this should stay (although I don't feel strongly either way here, so feel free to disagree):

  • If we keep it:
    if optimizer == None, it will not change anything;
    if optimizer != None, it will load a better/more accurate/more pre-trained checkpoint
    ^ for lack of better words
  • If we remove it:
    we should remove the optimizer var from the params, docs, & return line
    (vic's PR's Lines 55, 60, 78)

@victorlazio109 I was having issues with this line too, and the way I was able to solve it was to remove strict=False from line 77.

My code:

    if optimizer and 'optimizer' in checkpoint.keys():    # 2nd condition if loading a model without optimizer
        optimizer.load_state_dict(checkpoint['optimizer'])  # , strict=False
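
(Side note for context, a sketch of the underlying API difference as I understand it: torch.nn.Module.load_state_dict accepts a strict keyword, but torch.optim.Optimizer.load_state_dict does not, which is why passing strict=False to the optimizer raised the TypeError quoted in the commit messages above.)

    import torch

    model = torch.nn.Linear(4, 2)
    optimizer = torch.optim.Adam(model.parameters())
    checkpoint = {'model': model.state_dict(), 'optimizer': optimizer.state_dict()}

    model.load_state_dict(checkpoint['model'], strict=False)  # OK: Module supports strict
    optimizer.load_state_dict(checkpoint['optimizer'])        # OK: Optimizer takes no strict kwarg
    # optimizer.load_state_dict(checkpoint['optimizer'], strict=False)
    # ^ raises: TypeError: load_state_dict() got an unexpected keyword argument 'strict'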

Collaborator Author

Is there a difference between your code and the commented lines?

Collaborator Author

Oh, never mind: you commented out strict=False. I will try that.

@@ -411,14 +411,15 @@ def evaluation(eval_loader,
if batch_index in range(min_vis_batch, max_vis_batch, increment):
vis_path = progress_log.parent.joinpath('visualization')
if ep_idx == 0 and batch_index == min_vis_batch:
logging.info(f'Visualizing on {dataset} outputs for batches in range {vis_params["vis_batch_range"]}. All '
f'images will be saved to {vis_path}\n')
logging.info(f'Visualizing on {dataset} outputs for batches in range {vis_params[
Collaborator

Can you do that instead? (A little tic of mine, sorry.)

    logging.info(
        f'Visualizing on {dataset} outputs for batches in range {vis_params["vis_batch_range"]}. '
        f'All images will be saved to {vis_path}\n')

@@ -317,13 +316,14 @@ def train(train_loader,
if batch_index in range(min_vis_batch, max_vis_batch, increment):
vis_path = progress_log.parent.joinpath('visualization')
if ep_idx == 0:
logging.info(f'Visualizing on train outputs for batches in range {vis_params["vis_batch_range"]}. All images will be saved to {vis_path}\n')
logging.info(f'Visualizing on train outputs for batches in range {vis_params[
Collaborator

Can you do that instead? (A little tic of mine, sorry.)

    logging.info(
        f'Visualizing on train outputs for batches in range {vis_params["vis_batch_range"]}. '
        f'All images will be saved to {vis_path}\n')

README.md Outdated
@@ -32,7 +32,7 @@ The final step in the process is to assign every pixel in the original image a v
## **Requirement**
This project comprises a set of commands to be run at a shell command prompt. Examples used here are for a bash shell in an Ubuntu GNU/Linux environment.

- [Python 3.6](https://www.python.org/downloads/release/python-360/), see the full list of dependencies in [requirements.txt](requirements.txt)
- [Python 3.7.6](https://www.python.org/downloads/release/python-376/), see the full list of dependencies in [environment.yml](environment.yml)
- [mlflow](https://mlflow.org/)
- [minicanda](https://docs.conda.io/en/latest/miniconda.html) (highly recommended)
Contributor

Typo: miniconda

params['global']['model_name'] = hparams['model_name']
# ToDo: Should adjust batch size as a function of model and target size...
params['training']['class_weights'] = [1.0, hparams['permanent_water_weight'], hparams['rivers_weight'],
Contributor

A (hyperopt) template should not have parameters clearly linked to a specific application, e.g. floods in this case. "Specific structure of the GDL config file" indeed... We may need a structure of YAML files as a function of application that a template could make use of.

# Get extent of gpkg data with fiona
with fiona.open(gpkg, 'r') as src:
gpkg_crs = src.crs
assert gpkg_crs == raster.crs
Collaborator

Maybe use utils.verification.assert_crs_match(...) instead? A few times I have noticed they don't always match, or I need to check src.crs['init'] instead.

Collaborator

...I realize src.crs['init'] opens up a whole new can of worms, so maybe ignore this for now (I have put it on my # ToDo: list).
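
(For what it's worth, one possible shape for such a check, sketched as an assumption rather than existing GDL code: compare the two CRS through pyproj instead of the deprecated crs['init'] lookup.)

    from pyproj import CRS

    def crs_match(vector_crs, raster_crs):
        """Hypothetical helper: True if both describe the same reference system.

        pyproj normalizes the many equivalent ways a CRS can be expressed
        (EPSG code, proj4 dict, WKT), so this is more robust than comparing
        src.crs['init'] strings directly.
        """
        return CRS.from_user_input(vector_crs) == CRS.from_user_input(raster_crs)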

Collaborator Author

I will remove that assertion; it isn't meant to be there.

dest.write(out_img)
return out_tif
except ValueError as e: # if gpkg's extent outside raster: "ValueError: Input shapes do not overlap raster."
# TODO: warning or exception? if warning, except must be set in images_to_samples
Collaborator

Does utils.readers.image_reader_as_array(...) still need to call clip_raster_with_gpkg(...)? Because it still expects 2 return values (not 1).

Collaborator
@mwesleyj Aug 25, 2021

Warning (I think). But this (Lines 102-104) should probably also return something.

Thinking off the top of my head, to fix it I would say:
return False,
pass it through samples.creation.process_raster_img,
then after rst_pth, r_ = process_raster_img(img_pan, gpkg), test if rst_pth == False to trigger a flag,
and have the flag break out of the loop.

@@ -38,7 +38,7 @@ def forward(self, preds, labels):
cals = []
for obj in self.criterion:
cals.append(obj(preds, labels))
loss = sum(cals)
loss = sum(cals) * 0.5
Collaborator

Just to add a little more 'robustness', I would suggest maybe:

loss = sum(cals) / len(self.criterion)
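
(For illustration, a minimal sketch of that suggestion applied to a combined-loss module; the class and attribute names are assumptions patterned on the diff above.)

    import torch.nn as nn

    class DuoLoss(nn.Module):
        """Hypothetical combined loss that averages over its member criteria."""
        def __init__(self, criteria):
            super().__init__()
            self.criterion = nn.ModuleList(criteria)

        def forward(self, preds, labels):
            cals = [obj(preds, labels) for obj in self.criterion]
            # Averaging keeps the scale consistent if a third criterion
            # is ever added, unlike a hard-coded * 0.5.
            return sum(cals) / len(self.criterion)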

valhassan added 2 commits September 3, 2021 12:47
- losses: __init__.py
- utils: geoutils.py, visualization.py
- gdl_hyperopt_HPC removed
- README.md correction
- train_segmentation.py: space indent fixes, checkpoint path logged on mlflow
- models: model_choice.py #fixed cuda runtime error
- utils: augmentation.py #logging info statement removed, visualization.py #indentation fix
- inference.py # new memory management features added, smoothing func improved
- train_segmentation.py: space indent fixes, cuda runtime error fixes
inference.py Outdated
raster_info={})

sample['metadata'] = image_metadata
totensor_transform = augmentation.compose_transforms(params,
Collaborator

params is referenced but never assigned.

Collaborator Author

I investigated this behavior: params is not passed to the local function but resolved globally. It is difficult to detect because no error is thrown. This is not ideal and has been fixed with the latest commit. Thanks for pointing it out.
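
(For readers following along, a minimal reproduction of the pitfall with hypothetical names: a function that forgets to take params as a parameter silently falls back on the module-level global, so no error is thrown.)

    params = {'mean': [0.5]}  # module-level global

    def compose_broken(sample):
        # 'params' was never passed in; Python resolves it to the global
        # above, so this runs without error but ignores the caller's config.
        return sample, params['mean']

    def compose_fixed(params, sample):
        # Explicit parameter: the caller's config is actually used.
        return sample, params['mean']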

@@ -0,0 +1,569 @@
import argparse
Collaborator

Is this script a replacement for images_to_samples.py? If so, I don't think it's very useful to have both.

Collaborator Author

Not necessarily; it exists to support the new way our images (single bands) are read and processed. Let me know if you have suggestions on where it should live.


try:
mlrun = get_latest_mlrun(params)
run_name_split = mlrun.data.tags['mlflow.runName'].split('_')
params['global']['mlflow_run_name'] = run_name_split[0] + f'_{int(run_name_split[1])+1}'
params['global']['mlflow_run_name'] = run_name_split[0] + f'_{int(run_name_split[1]) + 1}'
except:
pass
Collaborator

Why a try/except statement here? If necessary, maybe it should be narrowed down to catch only known errors. Not a priority.

Collaborator Author

It can be worked on; it is previous work from an intern.
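
(One way it could be narrowed, sketched under the assumption that the expected failure modes are a missing tag or a run name not in '<name>_<int>' form:)

    try:
        mlrun = get_latest_mlrun(params)
        run_name_split = mlrun.data.tags['mlflow.runName'].split('_')
        params['global']['mlflow_run_name'] = run_name_split[0] + f'_{int(run_name_split[1]) + 1}'
    except (KeyError, IndexError, ValueError) as e:
        # KeyError: 'mlflow.runName' tag missing; IndexError/ValueError:
        # name not in '<name>_<int>' form. Anything else still surfaces.
        logging.warning(f'Could not increment mlflow run name: {e}')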

Comment on lines 78 to 80
params['training']['state_dict_path'] = params['training']['dict_unet']
elif params['global']['model_name'] == "deeplabv3_resnet101":
params['training']['state_dict_path'] = params['training']['dict_deeplab']
Collaborator

Where does the dict_[model] parameter come from? Is it an output of a previous run? Maybe a comment or two would help here.

Collaborator Author

This will be taken out; it is unique to my particular use case of hyperopt.

@@ -247,6 +248,7 @@ def samples_preparation(in_img_array,
# Stratification bias
if (stratd is not None) and (dataset == 'trn'):
tile_size = target.size
u, count = np.unique(target, return_counts=True)
Collaborator

Thanks for fixing this bug.

@@ -43,112 +45,223 @@
logging.getLogger(__name__)


def calc_inference_chunk_size(gpu_devices_dict: dict, max_pix_per_mb_gpu: int = 350):
Collaborator

Why remove this function? Did it cause problems?

Collaborator Author

Yeah, it does not work well with the revised smoothing function; I can explain further. It has to do with the image being padded to twice its dimensions, so you have to provide a static chunk_size that will not exhaust memory during operations.
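
(A rough back-of-the-envelope sketch of why, with assumed numbers: padding to twice each dimension quadruples the pixel count, so the float16 accumulation buffer alone grows quickly.)

    h, w, num_classes = 10_000, 10_000, 4  # hypothetical raster and class count
    bytes_per_value = 2                    # float16
    buffer_mb = (2 * h) * (2 * w) * num_classes * bytes_per_value / 1024 ** 2
    print(f'~{buffer_mb:.0f} MB')          # ~3052 MB for the padded buffer alone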

step = int(chunk_size / subdiv)
for row in range(0, src.height, step):
for column in range(0, src.width, step):
window = Window.from_slices(slice(row, row + chunk_size),
Collaborator

yeah Window!

print("Number of features written: {}".format(i))


def gen_img_samples(src, chunk_size, *band_order):
Collaborator

Nice generator function. Please add a docstring to this new function :)
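
(As a starting point, a sketch of what that docstring could say, inferred from the windowing snippet above; the parameter descriptions are assumptions.)

    def gen_img_samples(src, chunk_size, *band_order):
        """Yield half-overlapping square chips from an open raster for inference.

        Args:
            src: open rasterio DatasetReader over the image to be inferred.
            chunk_size: side length, in pixels, of each square window.
            *band_order: optional 1-based band indices to read, in that order.

        Yields:
            One image chip per window, stepping by chunk_size / 2 in each
            direction so adjacent chips overlap for seam-free blending.
        """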



@torch.no_grad()
def segmentation(img_array,
def segmentation(param,
Collaborator

Here again, docstrings would help clarify what arguments are expected. I'm not sure I understand the tp_mem argument. Does it stand for "temporary memory"? Is it the image to be inferred?

Collaborator Author

I will add docstrings and proper comments. tp_mem is a temporary file used to write very large arrays to disk at inference time.

WINDOW_SPLINE_2D = torch.as_tensor(np.moveaxis(WINDOW_SPLINE_2D, 2, 0), ).type(torch.float)
WINDOW_SPLINE_2D = WINDOW_SPLINE_2D.to(device)

fp = np.memmap(tp_mem, dtype='float16', mode='w+', shape=(h_, w_, num_classes))
Collaborator

Can you add more comments from here on? Why do you create a memory map from tp_mem? What does it become further on?

Collaborator Author

Basically, a temporary memory-mapped file was introduced to aid inference on very large TIFFs.
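
(To summarize the pattern for other readers, a condensed sketch with assumed shapes: class scores are accumulated into a disk-backed float16 array, then streamed back and argmaxed chunk by chunk, so the full probability cube never has to fit in RAM.)

    import numpy as np

    h_, w_, num_classes, chunk_size = 2048, 2048, 4, 512  # hypothetical sizes
    tp_mem = '/tmp/inference.dat'                          # hypothetical temp file

    # Write phase: accumulate windowed class scores on disk.
    fp = np.memmap(tp_mem, dtype='float16', mode='w+', shape=(h_, w_, num_classes))
    # ... per-window model outputs get added into fp[row:row + chunk_size, ...] here
    fp.flush()
    del fp

    # Read phase: stream the cube back and collapse it to a uint8 class map.
    fp = np.memmap(tp_mem, dtype='float16', mode='r', shape=(h_, w_, num_classes))
    pred_img = np.zeros((h_, w_), dtype=np.uint8)
    for row in range(0, h_, chunk_size // 2):
        for col in range(0, w_, chunk_size // 2):
            arr = fp[row:row + chunk_size, col:col + chunk_size, :]
            pred_img[row:row + chunk_size, col:col + chunk_size] = \
                arr.argmax(axis=-1).astype('uint8')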

Comment on lines +243 to +253
fp = np.memmap(tp_mem, dtype='float16', mode='r', shape=(h_, w_, num_classes))
subdiv = 2.0
step = int(chunk_size / subdiv)
pred_img = np.zeros((h_, w_), dtype=np.uint8)
for row in tqdm(range(0, input_image.height, step), position=2, leave=False):
for col in tqdm(range(0, input_image.width, step), position=3, leave=False):
arr1 = fp[row:row + chunk_size, col:col + chunk_size, :] / (2 ** 2)
arr1 = arr1.argmax(axis=-1).astype('uint8')
pred_img[row:row + chunk_size, col:col + chunk_size] = arr1
pred_img = pred_img[:h, :w]
end_seg = time.time() - start_seg
Collaborator

Here again, comments would help follow what's going on. Many lines are similar to those in gen_img_samples(). Could this be refactored?

Collaborator Author

Most probably; code improvement is always a work in progress.

@@ -487,37 +527,6 @@ def ordereddict_eval(str_to_eval: str):
return str_to_eval


def defaults_from_params(params, key=None):
Collaborator

This was added by Blaise, right? Why remove it?

Collaborator Author

I think this was refactored somewhere else and became redundant.

- gdl_hyperopt_template.py
- inference.py
@remtav remtav merged commit d00c443 into NRCan:develop Oct 5, 2021
Comment on lines -619 to -628
# list of GPU devices that are available and unused. If no GPUs, returns empty list
gpu_devices_dict = get_device_ids(num_devices,
max_used_ram_perc=max_used_ram,
max_used_perc=max_used_perc)
logging.info(f'GPUs devices available: {gpu_devices_dict}')
num_devices = len(gpu_devices_dict.keys())
device = torch.device(f'cuda:{list(gpu_devices_dict.keys())[0]}' if gpu_devices_dict else 'cpu')

logging.info(f'Creating dataloaders from data in {samples_folder}...\n')

Collaborator

Vic, any reason why you removed this? Were you having trouble with it?

Collaborator Author
@valhassan Oct 6, 2021

Moved to model_choice.py. I also made a fix for the frequent bug on HPC where GPUs with IDs other than 0 could not be used.
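
(For context, a sketch of the usual shape of that fix, with assumed variable names: when the first free GPU is not id 0, the process default device has to be set explicitly, otherwise bare .cuda() calls elsewhere still land on cuda:0.)

    import torch

    gpu_devices_dict = {1: 'free'}  # hypothetical: only GPU 1 is available
    device = torch.device(f'cuda:{list(gpu_devices_dict.keys())[0]}'
                          if gpu_devices_dict else 'cpu')
    if device.type == 'cuda':
        # Without this, tensors moved with bare .cuda() default to cuda:0,
        # which fails or collides when device 0 is busy.
        torch.cuda.set_device(device)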

logging.debug(f'Unique values in loaded raster: {np.unique(img_array)}\n'
f'Shape of raster: {img_array.shape}')
for info in tqdm(list_img, desc='Inferring from images', position=0, leave=True):
with start_run(run_name=Path(info['tif']).name, nested=True):
Collaborator

There's a little bug here: at line 475, start_run is imported only if an mlflow URI is provided. We can just import everything at the top of the script whether or not the URI is given. I can address this bugfix in my next PR.
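
(A minimal sketch of the fix described, moving the import to the top; infer_one and its arguments are hypothetical names following the snippet above.)

    from pathlib import Path
    from mlflow import start_run  # imported unconditionally at the top of the script

    def infer_one(info, mlflow_uri=None):
        """Hypothetical wrapper: track the run only when a tracking URI is set."""
        if mlflow_uri:
            with start_run(run_name=Path(info['tif']).name, nested=True):
                ...  # inference body
        else:
            ...  # same inference body, untracked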
