
Major and Minor updates to GDL #206

Merged: 42 commits, Oct 5, 2021

Conversation

valhassan
Collaborator

Main Highlights

  • New sample creation script
  • Major updates to inference.py
  • Minor fixes across GDL

ychoquet and others added 30 commits March 18, 2021 09:23
Docker image automation
- minor refactor
- removed redundant log_artifact function
…into develop

# Conflicts:
#	inference.py
#	models/model_choice.py
#	train_segmentation.py
#	utils/utils.py

added sample_creation.py script
- dice loss fixed
- added weight to duo loss
- customizable file paths added for hyperopt assets
TypeError: load_state_dict() got an unexpected keyword argument 'strict'
valhassan added 5 commits July 22, 2021 12:18
- unet_pretrained_101 model added
- minor changes to geoutils.py
- updated to metrics.py
- minor fix to windowing func utils.py
- update to images_to_samples.py
- major changes to inference.py
- minor changes to sample_creation.py
…into develop

# Conflicts:
#	inference.py
#	train_segmentation.py
> For Windows OS:
##### For Docker
Collaborator

Thanks, that's nice!

be modified here, as well as GDL config modification logic within the objective_with_args
function.

"""
Collaborator

Can you add a section to the README.md on how to launch it, please? :)

Collaborator Author

will be removed

# 'optimizer': hp.choice('optimizer', ['adam', 'adabound']),
# 'learning_rate': hp.loguniform('learning_rate', np.log(1e-7), np.log(0.1))}

my_space = {'loss_fn': hp.choice('loss_fn', ['CrossEntropy', 'Lovasz', 'Duo']),
Collaborator

Can that dict be a YAML or a CSV? Maybe in a future PR otherwise.

Collaborator Author

This template will be removed; it is experiment-specific and supposed to live on a private branch.

'rivers_weight': hp.uniform('rivers_weight', 1.0, 10.0),
'flood_weight': hp.uniform('flood_weight', 1.0, 10.0),
'noise': hp.choice('noise', [0.0, 1.0])}
my_space = {'model_name': hp.choice('model_name', ['unet_pretrained', 'deeplabv3_resnet101']),
Collaborator

Same comment as for the HPC one.

Collaborator Author

This will be tackled in a future PR.

@@ -123,7 +124,8 @@ def vis(vis_params,
assert vis_path.parent.is_dir()
vis_path.mkdir(exist_ok=True)

if not vis_params['inference_input_path']: # FIXME: function parameters should not come in as different types if inference or not.
if not vis_params[
Collaborator

Why? Just put the comment above the if instead of wrapping it onto a second line.

Collaborator Author

I pulled and merged this change, will fix

@@ -71,8 +73,8 @@ def load_from_checkpoint(checkpoint, model, optimizer=None, inference:str=''):
strict_loading = False if not inference else True
model.load_state_dict(checkpoint['model'], strict=strict_loading)
logging.info(f"=> loaded model\n")
if optimizer and 'optimizer' in checkpoint.keys(): # 2nd condition if loading a model without optimizer
optimizer.load_state_dict(checkpoint['optimizer'], strict=False)
# if optimizer and 'optimizer' in checkpoint.keys(): # 2nd condition if loading a model without optimizer
Collaborator

Can we delete it?

Collaborator
@mwesleyj Aug 25, 2021

I think this should stay (although I don't feel strongly either way here, so feel free to disagree):

  • If we keep it:
    if optimizer == None, it will not change anything;
    if optimizer != None, it will load a better/more accurate/more pre-trained checkpoint
    ^ for lack of better words
  • If we remove it:
    we should remove the optimizer var from the params, docs, & return line
    (vic's PR's Lines 55, 60, 78)

@victorlazio109 I was having issues with this line too, and the way I was able to solve it was to remove strict=False from line 77.

My code:

    if optimizer and 'optimizer' in checkpoint.keys():    # 2nd condition if loading a model without optimizer
        optimizer.load_state_dict(checkpoint['optimizer'])  # , strict=False
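
(Side note for context, a sketch of the underlying API difference as I understand it: torch.nn.Module.load_state_dict accepts a strict keyword, but torch.optim.Optimizer.load_state_dict does not, which is why passing strict=False to the optimizer raised the TypeError quoted in the commit messages above.)

    import torch

    model = torch.nn.Linear(4, 2)
    optimizer = torch.optim.Adam(model.parameters())
    checkpoint = {'model': model.state_dict(), 'optimizer': optimizer.state_dict()}

    model.load_state_dict(checkpoint['model'], strict=False)  # OK: Module supports strict
    optimizer.load_state_dict(checkpoint['optimizer'])        # OK: Optimizer takes no strict kwarg
    # optimizer.load_state_dict(checkpoint['optimizer'], strict=False)
    # ^ raises: TypeError: load_state_dict() got an unexpected keyword argument 'strict'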

Collaborator Author

Is there a difference between your code and the commented lines?

Collaborator Author

Oh, never mind: you commented out strict=False. I will try that.

@@ -411,14 +411,15 @@ def evaluation(eval_loader,
if batch_index in range(min_vis_batch, max_vis_batch, increment):
vis_path = progress_log.parent.joinpath('visualization')
if ep_idx == 0 and batch_index == min_vis_batch:
logging.info(f'Visualizing on {dataset} outputs for batches in range {vis_params["vis_batch_range"]}. All '
f'images will be saved to {vis_path}\n')
logging.info(f'Visualizing on {dataset} outputs for batches in range {vis_params[
Collaborator

Can you do that instead? (A little tic of mine, sorry.)

    logging.info(
        f'Visualizing on {dataset} outputs for batches in range {vis_params["vis_batch_range"]}. '
        f'All images will be saved to {vis_path}\n')

@@ -317,13 +316,14 @@ def train(train_loader,
if batch_index in range(min_vis_batch, max_vis_batch, increment):
vis_path = progress_log.parent.joinpath('visualization')
if ep_idx == 0:
logging.info(f'Visualizing on train outputs for batches in range {vis_params["vis_batch_range"]}. All images will be saved to {vis_path}\n')
logging.info(f'Visualizing on train outputs for batches in range {vis_params[
Collaborator

Can you do that instead? (A little tic of mine, sorry.)

    logging.info(
        f'Visualizing on train outputs for batches in range {vis_params["vis_batch_range"]}. '
        f'All images will be saved to {vis_path}\n')

README.md Outdated
@@ -32,7 +32,7 @@ The final step in the process is to assign every pixel in the original image a v
## **Requirement**
This project comprises a set of commands to be run at a shell command prompt. Examples used here are for a bash shell in an Ubuntu GNU/Linux environment.

- [Python 3.6](https://www.python.org/downloads/release/python-360/), see the full list of dependencies in [requirements.txt](requirements.txt)
- [Python 3.7.6](https://www.python.org/downloads/release/python-376/), see the full list of dependencies in [environment.yml](environment.yml)
- [mlflow](https://mlflow.org/)
- [minicanda](https://docs.conda.io/en/latest/miniconda.html) (highly recommended)
Contributor

Typo: miniconda

params['global']['model_name'] = hparams['model_name']
# ToDo: Should adjust batch size as a function of model and target size...
params['training']['class_weights'] = [1.0, hparams['permanent_water_weight'], hparams['rivers_weight'],
Contributor

A (hyperopt) template should not have parameters clearly linked to a specific application, e.g. floods in this case. "Specific structure of the GDL config file" indeed... We may need a structure of YAML files as a function of application that a template could make use of.

# Get extent of gpkg data with fiona
with fiona.open(gpkg, 'r') as src:
gpkg_crs = src.crs
assert gpkg_crs == raster.crs
Collaborator

Maybe use utils.verification.assert_crs_match(...) instead? A few times I have noticed they don't always match, or I need to check src.crs['init'] instead.

Collaborator

...I realize src.crs['init'] opens up a whole new can of worms, so maybe ignore this for now (I have put it on my # ToDo: list).
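
(For what it's worth, one possible shape for such a check, sketched as an assumption rather than existing GDL code: compare the two CRS through pyproj instead of the deprecated crs['init'] lookup.)

    from pyproj import CRS

    def crs_match(vector_crs, raster_crs):
        """Hypothetical helper: True if both describe the same reference system.

        pyproj normalizes the many equivalent ways a CRS can be expressed
        (EPSG code, proj4 dict, WKT), so this is more robust than comparing
        src.crs['init'] strings directly.
        """
        return CRS.from_user_input(vector_crs) == CRS.from_user_input(raster_crs)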

Collaborator Author

I will remove that assertion; it isn't meant to be there.

dest.write(out_img)
return out_tif
except ValueError as e: # if gpkg's extent outside raster: "ValueError: Input shapes do not overlap raster."
# TODO: warning or exception? if warning, except must be set in images_to_samples
Collaborator

Does utils.readers.image_reader_as_array(...) still need to call clip_raster_with_gpkg(...)? Because it still expects 2 return values (not 1).

Collaborator
@mwesleyj Aug 25, 2021

Warning (I think). But this (Lines 102-104) should probably also return something.

Thinking off the top of my head, to fix it I would say:
return False,
pass it through samples.creation.process_raster_img,
then after rst_pth, r_ = process_raster_img(img_pan, gpkg), test if rst_pth == False to trigger a flag,
and have the flag break out of the loop.

@@ -38,7 +38,7 @@ def forward(self, preds, labels):
cals = []
for obj in self.criterion:
cals.append(obj(preds, labels))
loss = sum(cals)
loss = sum(cals) * 0.5
Collaborator

Just to add a little more 'robustness', I would suggest maybe:

loss = sum(cals) / len(self.criterion)
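
(For illustration, a minimal sketch of that suggestion applied to a combined-loss module; the class and attribute names are assumptions patterned on the diff above.)

    import torch.nn as nn

    class DuoLoss(nn.Module):
        """Hypothetical combined loss that averages over its member criteria."""
        def __init__(self, criteria):
            super().__init__()
            self.criterion = nn.ModuleList(criteria)

        def forward(self, preds, labels):
            cals = [obj(preds, labels) for obj in self.criterion]
            # Averaging keeps the scale consistent if a third criterion
            # is ever added, unlike a hard-coded * 0.5.
            return sum(cals) / len(self.criterion)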

valhassan added 2 commits September 3, 2021 12:47
- losses: __init__.py
- utils: geoutils.py, visualization.py
- gdl_hyperopt_HPC removed
- README.md correction
- train_segmentation.py: space indent fixes, checkpoint path logged on mlflow
- models: model_choice.py #fixed cuda runtime error
- utils: augmentation.py #logging info statement removed, visualization.py #indentation fix
- inference.py # new memory management features added, smoothing func improved
- train_segmentation.py: space indent fixes, cuda runtime error fixes
inference.py Outdated
raster_info={})

sample['metadata'] = image_metadata
totensor_transform = augmentation.compose_transforms(params,
Collaborator

params is referenced but never assigned.

Collaborator Author

I investigated this behavior: params is not passed to the local function but resolved globally. It is difficult to detect because no error is thrown. This is not ideal and has been fixed with the latest commit. Thanks for pointing it out.
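
(For readers following along, a minimal reproduction of the pitfall with hypothetical names: a function that forgets to take params as a parameter silently falls back on the module-level global, so no error is thrown.)

    params = {'mean': [0.5]}  # module-level global

    def compose_broken(sample):
        # 'params' was never passed in; Python resolves it to the global
        # above, so this runs without error but ignores the caller's config.
        return sample, params['mean']

    def compose_fixed(params, sample):
        # Explicit parameter: the caller's config is actually used.
        return sample, params['mean']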

@@ -0,0 +1,569 @@
import argparse
Collaborator

Is this script a replacement for images_to_samples.py? If so, I don't think it's very useful to have both.

Collaborator Author

Not necessarily; it exists to support the new way our images (single bands) are read and processed. Let me know if you have suggestions on where it should live.


try:
mlrun = get_latest_mlrun(params)
run_name_split = mlrun.data.tags['mlflow.runName'].split('_')
params['global']['mlflow_run_name'] = run_name_split[0] + f'_{int(run_name_split[1])+1}'
params['global']['mlflow_run_name'] = run_name_split[0] + f'_{int(run_name_split[1]) + 1}'
except:
pass
Collaborator

Why a try/except statement here? If necessary, maybe it should be narrowed down to catch only known errors. Not a priority.

Collaborator Author

It can be worked on; it is previous work from an intern.
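
(One way it could be narrowed, sketched under the assumption that the expected failure modes are a missing tag or a run name not in '<name>_<int>' form:)

    try:
        mlrun = get_latest_mlrun(params)
        run_name_split = mlrun.data.tags['mlflow.runName'].split('_')
        params['global']['mlflow_run_name'] = run_name_split[0] + f'_{int(run_name_split[1]) + 1}'
    except (KeyError, IndexError, ValueError) as e:
        # KeyError: 'mlflow.runName' tag missing; IndexError/ValueError:
        # name not in '<name>_<int>' form. Anything else still surfaces.
        logging.warning(f'Could not increment mlflow run name: {e}')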

Comment on lines 78 to 80
params['training']['state_dict_path'] = params['training']['dict_unet']
elif params['global']['model_name'] == "deeplabv3_resnet101":
params['training']['state_dict_path'] = params['training']['dict_deeplab']
Collaborator

Where does the dict_[model] parameter come from? Is it an output of a previous run? Maybe a comment or two would help here.

Collaborator Author

This will be taken out; it is unique to my particular use case of hyperopt.

@@ -247,6 +248,7 @@ def samples_preparation(in_img_array,
# Stratification bias
if (stratd is not None) and (dataset == 'trn'):
tile_size = target.size
u, count = np.unique(target, return_counts=True)
Collaborator

Thanks for fixing this bug.

@@ -43,112 +45,223 @@
logging.getLogger(__name__)


def calc_inference_chunk_size(gpu_devices_dict: dict, max_pix_per_mb_gpu: int = 350):
Collaborator

Why remove this function? Did it cause problems?

Collaborator Author

Yeah, it does not work well with the revised smoothing function; I can explain further. It has to do with the image being padded to twice its dimensions, so you have to provide a static chunk_size that will not exhaust memory during operations.
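
(A rough back-of-the-envelope sketch of why, with assumed numbers: padding to twice each dimension quadruples the pixel count, so the float16 accumulation buffer alone grows quickly.)

    h, w, num_classes = 10_000, 10_000, 4  # hypothetical raster and class count
    bytes_per_value = 2                    # float16
    buffer_mb = (2 * h) * (2 * w) * num_classes * bytes_per_value / 1024 ** 2
    print(f'~{buffer_mb:.0f} MB')          # ~3052 MB for the padded buffer alone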

step = int(chunk_size / subdiv)
for row in range(0, src.height, step):
for column in range(0, src.width, step):
window = Window.from_slices(slice(row, row + chunk_size),
Collaborator

yeah Window!

print("Number of features written: {}".format(i))


def gen_img_samples(src, chunk_size, *band_order):
Collaborator

Nice generator function. Please add a docstring to this new function :)
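
(As a starting point, a sketch of what that docstring could say, inferred from the windowing snippet above; the parameter descriptions are assumptions.)

    def gen_img_samples(src, chunk_size, *band_order):
        """Yield half-overlapping square chips from an open raster for inference.

        Args:
            src: open rasterio DatasetReader over the image to be inferred.
            chunk_size: side length, in pixels, of each square window.
            *band_order: optional 1-based band indices to read, in that order.

        Yields:
            One image chip per window, stepping by chunk_size / 2 in each
            direction so adjacent chips overlap for seam-free blending.
        """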



@torch.no_grad()
def segmentation(img_array,
def segmentation(param,
Collaborator

Here again, docstrings would help clarify what arguments are expected. I'm not sure I understand the tp_mem argument. Does it stand for "temporary memory"? Is it the image to be inferred?

Collaborator Author

I will add docstrings and proper comments. tp_mem is a temporary file used to write very large arrays to disk at inference time.

WINDOW_SPLINE_2D = torch.as_tensor(np.moveaxis(WINDOW_SPLINE_2D, 2, 0), ).type(torch.float)
WINDOW_SPLINE_2D = WINDOW_SPLINE_2D.to(device)

fp = np.memmap(tp_mem, dtype='float16', mode='w+', shape=(h_, w_, num_classes))
Collaborator

Can you add more comments from here on? Why do you create a memory map from tp_mem? What does it become further on?

Collaborator Author

Basically, a temporary memory-mapped file was introduced to aid inference on very large TIFFs.
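
(To summarize the pattern for other readers, a condensed sketch with assumed shapes: class scores are accumulated into a disk-backed float16 array, then streamed back and argmaxed chunk by chunk, so the full probability cube never has to fit in RAM.)

    import numpy as np

    h_, w_, num_classes, chunk_size = 2048, 2048, 4, 512  # hypothetical sizes
    tp_mem = '/tmp/inference.dat'                          # hypothetical temp file

    # Write phase: accumulate windowed class scores on disk.
    fp = np.memmap(tp_mem, dtype='float16', mode='w+', shape=(h_, w_, num_classes))
    # ... per-window model outputs get added into fp[row:row + chunk_size, ...] here
    fp.flush()
    del fp

    # Read phase: stream the cube back and collapse it to a uint8 class map.
    fp = np.memmap(tp_mem, dtype='float16', mode='r', shape=(h_, w_, num_classes))
    pred_img = np.zeros((h_, w_), dtype=np.uint8)
    for row in range(0, h_, chunk_size // 2):
        for col in range(0, w_, chunk_size // 2):
            arr = fp[row:row + chunk_size, col:col + chunk_size, :]
            pred_img[row:row + chunk_size, col:col + chunk_size] = \
                arr.argmax(axis=-1).astype('uint8')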

Comment on lines +243 to +253
fp = np.memmap(tp_mem, dtype='float16', mode='r', shape=(h_, w_, num_classes))
subdiv = 2.0
step = int(chunk_size / subdiv)
pred_img = np.zeros((h_, w_), dtype=np.uint8)
for row in tqdm(range(0, input_image.height, step), position=2, leave=False):
for col in tqdm(range(0, input_image.width, step), position=3, leave=False):
arr1 = fp[row:row + chunk_size, col:col + chunk_size, :] / (2 ** 2)
arr1 = arr1.argmax(axis=-1).astype('uint8')
pred_img[row:row + chunk_size, col:col + chunk_size] = arr1
pred_img = pred_img[:h, :w]
end_seg = time.time() - start_seg
Collaborator

Here again, comments would help follow what's going on. Many lines are similar to those in gen_img_samples(). Could this be refactored?

Collaborator Author

Most probably; code improvement is always a work in progress.

@@ -487,37 +527,6 @@ def ordereddict_eval(str_to_eval: str):
return str_to_eval


def defaults_from_params(params, key=None):
Collaborator

This was added by Blaise, right? Why remove it?

Collaborator Author

I think this was refactored somewhere else and became redundant.

- gdl_hyperopt_template.py
- inference.py
@remtav remtav merged commit d00c443 into NRCan:develop Oct 5, 2021
Comment on lines -619 to -628
# list of GPU devices that are available and unused. If no GPUs, returns empty list
gpu_devices_dict = get_device_ids(num_devices,
max_used_ram_perc=max_used_ram,
max_used_perc=max_used_perc)
logging.info(f'GPUs devices available: {gpu_devices_dict}')
num_devices = len(gpu_devices_dict.keys())
device = torch.device(f'cuda:{list(gpu_devices_dict.keys())[0]}' if gpu_devices_dict else 'cpu')

logging.info(f'Creating dataloaders from data in {samples_folder}...\n')

Collaborator

Vic, any reason why you removed this? Were you having trouble with it?

Collaborator Author
@valhassan Oct 6, 2021

Moved to model_choice.py. I also made a fix for the frequent bug on HPC where GPUs with IDs other than 0 could not be used.
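
(For context, a sketch of the usual shape of that fix, with assumed variable names: when the first free GPU is not id 0, the process default device has to be set explicitly, otherwise bare .cuda() calls elsewhere still land on cuda:0.)

    import torch

    gpu_devices_dict = {1: 'free'}  # hypothetical: only GPU 1 is available
    device = torch.device(f'cuda:{list(gpu_devices_dict.keys())[0]}'
                          if gpu_devices_dict else 'cpu')
    if device.type == 'cuda':
        # Without this, tensors moved with bare .cuda() default to cuda:0,
        # which fails or collides when device 0 is busy.
        torch.cuda.set_device(device)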

logging.debug(f'Unique values in loaded raster: {np.unique(img_array)}\n'
f'Shape of raster: {img_array.shape}')
for info in tqdm(list_img, desc='Inferring from images', position=0, leave=True):
with start_run(run_name=Path(info['tif']).name, nested=True):
Collaborator

There's a little bug here: at line 475, start_run is imported only if an mlflow URI is provided. We can just import everything at the top of the script whether or not the URI is given. I can address this bugfix in my next PR.
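
(A minimal sketch of the fix described, moving the import to the top; infer_one and its arguments are hypothetical names following the snippet above.)

    from pathlib import Path
    from mlflow import start_run  # imported unconditionally at the top of the script

    def infer_one(info, mlflow_uri=None):
        """Hypothetical wrapper: track the run only when a tracking URI is set."""
        if mlflow_uri:
            with start_run(run_name=Path(info['tif']).name, nested=True):
                ...  # inference body
        else:
            ...  # same inference body, untracked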
