Major and Minor updates to GDL #206
Conversation
Docker image automation
- minor refactor
- minor fixes and changes
- removed redundant log_artifact function
…into develop (conflicts: inference.py, models/model_choice.py, train_segmentation.py, utils/utils.py); added sample_creation.py script
- dice loss fixed - added weight to duo loss
- customizable file paths added for hyperopt assets
TypeError: load_state_dict() got an unexpected keyword argument 'strict'
- unet_pretrained_101 model added - minor changes to geoutils.py - updates to metrics.py - minor fix to windowing func in utils.py - update to images_to_samples.py - major changes to inference.py - minor changes to sample_creation.py
…into develop (conflicts: inference.py, train_segmentation.py)
```
> For Windows OS:
##### For Docker
thanks, that's nice
gdl_hyperopt_HPC.py
Outdated
be modified here, as well as GDL config modification logic within the objective_with_args
function.

"""
Can you put a section in the README.md on how to launch it please? :)
will be removed
gdl_hyperopt_HPC.py
Outdated
# 'optimizer': hp.choice('optimizer', ['adam', 'adabound']),
# 'learning_rate': hp.loguniform('learning_rate', np.log(1e-7), np.log(0.1))}

my_space = {'loss_fn': hp.choice('loss_fn', ['CrossEntropy', 'Lovasz', 'Duo']),
Can that dict be a YAML or a CSV? Maybe in a future PR otherwise.
this template will be removed; it is experiment-specific and supposed to live on a private branch
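For reference, a minimal sketch of what a YAML-driven search space could look like (the file name search_space.yml and its schema are assumptions, not existing GDL conventions):

# Hypothetical sketch: build the hyperopt space from a YAML spec instead of a
# hard-coded dict. File name and schema are assumptions, not part of GDL.
import yaml
from hyperopt import hp

def space_from_yaml(path):
    """Translate a simple YAML spec into a hyperopt search space."""
    with open(path) as f:
        spec = yaml.safe_load(f)
    space = {}
    for name, cfg in spec.items():
        if cfg['type'] == 'choice':
            space[name] = hp.choice(name, cfg['options'])
        elif cfg['type'] == 'uniform':
            space[name] = hp.uniform(name, cfg['low'], cfg['high'])
    return space

# search_space.yml (assumed format):
#   loss_fn:       {type: choice, options: [CrossEntropy, Lovasz, Duo]}
#   rivers_weight: {type: uniform, low: 1.0, high: 10.0}
my_space = space_from_yaml('search_space.yml')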
            'rivers_weight': hp.uniform('rivers_weight', 1.0, 10.0),
            'flood_weight': hp.uniform('flood_weight', 1.0, 10.0),
            'noise': hp.choice('noise', [0.0, 1.0])}
my_space = {'model_name': hp.choice('model_name', ['unet_pretrained', 'deeplabv3_resnet101']),
Same comment as for the HPC one.
this would be tackled in a future PR
@@ -123,7 +124,8 @@ def vis(vis_params,
    assert vis_path.parent.is_dir()
    vis_path.mkdir(exist_ok=True)

    if not vis_params['inference_input_path']:  # FIXME: function parameters should not come in as different types if inference or not.
    if not vis_params[
Why? Just put the comment above the if instead of wrapping it onto a second line.
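That is, a sketch of the suggested layout (the if body itself is unchanged):

# FIXME: function parameters should not come in as different types
# depending on whether we are in inference mode or not.
if not vis_params['inference_input_path']:
    ...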
I pulled and merged this change, will fix
@@ -71,8 +73,8 @@ def load_from_checkpoint(checkpoint, model, optimizer=None, inference:str=''):
    strict_loading = False if not inference else True
    model.load_state_dict(checkpoint['model'], strict=strict_loading)
    logging.info(f"=> loaded model\n")
    if optimizer and 'optimizer' in checkpoint.keys():  # 2nd condition if loading a model without optimizer
        optimizer.load_state_dict(checkpoint['optimizer'], strict=False)
    # if optimizer and 'optimizer' in checkpoint.keys():  # 2nd condition if loading a model without optimizer
Can we delete it?
I think this should stay (although I don't feel strongly either way here, so feel free to disagree).
- If we keep it: when `optimizer == None` it will not change anything; when `optimizer != None` it will load a better/more accurate/more pre-trained checkpoint (for lack of better words).
- If we remove it, we should also remove the optimizer var from the params, docs, & return line (vic's PR's Lines 55, 60, 78).

@victorlazio109 I was having issues with this line too, and the way I was able to solve it was to remove the `strict=False` from line 77. My code:

if optimizer and 'optimizer' in checkpoint.keys():  # 2nd condition if loading a model without optimizer
    optimizer.load_state_dict(checkpoint['optimizer'])  # , strict=False

- Lastly, if it is removed to be able to load checkpoints with different optimizers, maybe we could add an `if` statement? Here is a helpful link if this is the case:
https://stackoverflow.com/questions/64301566/how-to-check-if-an-object-is-a-certain-pytorch-optimizer
is there a difference between your code and the commented lines?
oh never mind, you commented out `strict=False`, I will try
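On the suggested `if` statement: a hedged sketch of what such a guard might look like (the helper name and warning message are made up; only the `torch.optim` types and `load_state_dict` call are real PyTorch API):

import logging
import torch

def maybe_load_optimizer_state(optimizer, checkpoint):
    """Illustrative helper: restore optimizer state only when it is safe."""
    if optimizer is None or 'optimizer' not in checkpoint:
        return
    # Per the Stack Overflow link above: check the optimizer's type before
    # loading, e.g. skip restoring saved Adam state into a different class.
    if isinstance(optimizer, torch.optim.Adam):
        optimizer.load_state_dict(checkpoint['optimizer'])
    else:
        logging.warning('Optimizer type differs from checkpoint; state not restored')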
train_segmentation.py
Outdated
@@ -411,14 +411,15 @@ def evaluation(eval_loader,
    if batch_index in range(min_vis_batch, max_vis_batch, increment):
        vis_path = progress_log.parent.joinpath('visualization')
        if ep_idx == 0 and batch_index == min_vis_batch:
            logging.info(f'Visualizing on {dataset} outputs for batches in range {vis_params["vis_batch_range"]}. All '
                         f'images will be saved to {vis_path}\n')
            logging.info(f'Visualizing on {dataset} outputs for batches in range {vis_params[
Can you do that instead? (Little tic from me, sorry)

logging.info(
    f'Visualizing on {dataset} outputs for batches in range {vis_params["vis_batch_range"]}. '
    f'All images will be saved to {vis_path}\n')
train_segmentation.py
Outdated
@@ -317,13 +316,14 @@ def train(train_loader,
    if batch_index in range(min_vis_batch, max_vis_batch, increment):
        vis_path = progress_log.parent.joinpath('visualization')
        if ep_idx == 0:
            logging.info(f'Visualizing on train outputs for batches in range {vis_params["vis_batch_range"]}. All images will be saved to {vis_path}\n')
            logging.info(f'Visualizing on train outputs for batches in range {vis_params[
Can you do that instead? (Little tic from me, sorry)

logging.info(
    f'Visualizing on train outputs for batches in range {vis_params["vis_batch_range"]}.'
    f' All images will be saved to {vis_path}\n')
README.md
Outdated
@@ -32,7 +32,7 @@ The final step in the process is to assign every pixel in the original image a v
## **Requirement**
This project comprises a set of commands to be run at a shell command prompt. Examples used here are for a bash shell in an Ubuntu GNU/Linux environment.

- [Python 3.6](https://www.python.org/downloads/release/python-360/), see the full list of dependencies in [requirements.txt](requirements.txt)
- [Python 3.7.6](https://www.python.org/downloads/release/python-376/), see the full list of dependencies in [environment.yml](environment.yml)
- [mlflow](https://mlflow.org/)
- [minicanda](https://docs.conda.io/en/latest/miniconda.html) (highly recommended)
typo: miniconda
params['global']['model_name'] = hparams['model_name']
# ToDo: Should adjust batch size as a function of model and target size...
params['training']['class_weights'] = [1.0, hparams['permanent_water_weight'], hparams['rivers_weight'],
A (hyperopt) template should not have parameters clearly linked to a specific application, e.g. floods as in this case. "Specific structure of the GDL config file" indeed... We may need a structure of YAML files as a function of application that a template could make use of.
utils/geoutils.py
Outdated
# Get extent of gpkg data with fiona
with fiona.open(gpkg, 'r') as src:
    gpkg_crs = src.crs
assert gpkg_crs == raster.crs
maybe use `utils.verification.assert_crs_match(...)` instead? As a few times I have noticed they don't always match, or I need to check `src.crs['init']` instead.
... I realize `src.crs['init']` opens up a whole new can of worms, so maybe ignore for now (I have put it on my # ToDo: list).
I will remove that assertion, it isn't meant to be there
        dest.write(out_img)
    return out_tif
except ValueError as e:  # if gpkg's extent outside raster: "ValueError: Input shapes do not overlap raster."
    # TODO: warning or exception? if warning, except must be set in images_to_samples
does `utils.readers.image_reader_as_array(...)` still need to call `clip_raster_with_gpkg(...)`? Because it still expects 2 returns (not 1).
Warning (I think). But this (Lines 102-104) should probably also return something. Thinking off the top of my head to fix it: I would `return False`, pass that through `samples.creation.process_raster_img`, and after `rst_pth, r_ = process_raster_img(img_pan, gpkg)` test if `rst_pth == False`, then trigger a flag and have the flag `break` out of the loop.
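A rough sketch of that flow (names are taken from the comment above; the surrounding loop and `process_raster_img` returning `(False, None)` on failure are assumptions):

import logging

# Sketch only: on failure process_raster_img would return (False, None),
# so the caller can test rst_pth before continuing.
for img_pan, gpkg in image_gpkg_pairs:
    rst_pth, r_ = process_raster_img(img_pan, gpkg)
    if rst_pth is False:
        logging.warning(f'gpkg extent outside raster for {img_pan}; stopping')
        break  # or `continue`, if the remaining pairs should still be processed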
losses/__init__.py
Outdated
@@ -38,7 +38,7 @@ def forward(self, preds, labels):
    cals = []
    for obj in self.criterion:
        cals.append(obj(preds, labels))
    loss = sum(cals)
    loss = sum(cals) * 0.5
Just to add a little more 'robustness', I would suggest maybe:
loss = sum(cals) / len(self.criterion)
- losses: __init__.py - utils: geoutils.py, visualization.py - gdl_hyperopt_HPC removed - README.md correction - train_segmentation.py: space indent fixes, checkpoint path logged on mlflow
- models: model_choice.py #fixed cuda runtime error - utils: augmentation.py #logging info statement removed, visualization.py #indentation fix - inference.py # new memory management features added, smoothing func improved - train_segmentation.py: space indent fixes, cuda runtime error fixes
inference.py
Outdated
raster_info={})

sample['metadata'] = image_metadata
totensor_transform = augmentation.compose_transforms(params,
`params` is called but never assigned.
I investigated this behavior; `params` is not passed to the local function but resolved globally. Difficult to detect because no error is thrown. This is not ideal and has been fixed with the latest commit. Thanks for pointing it out.
@@ -0,0 +1,569 @@
import argparse
Is this script a replacement for images_to_samples.py? If so, I don't think it's very useful to have both.
not necessarily; it exists to support the new way our images (single bands) are called and processed. Let me know if you have suggestions on where you want it to live.
try:
    mlrun = get_latest_mlrun(params)
    run_name_split = mlrun.data.tags['mlflow.runName'].split('_')
    params['global']['mlflow_run_name'] = run_name_split[0] + f'_{int(run_name_split[1])+1}'
    params['global']['mlflow_run_name'] = run_name_split[0] + f'_{int(run_name_split[1]) + 1}'
except:
    pass
why the try/except statement here? If necessary, maybe it should be narrowed down to catch only known errors. Not a priority.
can be worked on, previous work from an intern
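For reference, a sketch of how the except could be narrowed, assuming the known failure modes are a missing tag, a suffix-less run name, or a non-numeric suffix (`get_latest_mlrun` and `logging` as in the existing script):

try:
    mlrun = get_latest_mlrun(params)
    run_name_split = mlrun.data.tags['mlflow.runName'].split('_')
    params['global']['mlflow_run_name'] = run_name_split[0] + f'_{int(run_name_split[1]) + 1}'
except (KeyError, IndexError, ValueError) as e:
    # KeyError: no 'mlflow.runName' tag; IndexError: no '_'-separated suffix;
    # ValueError: suffix is not an int. Anything unexpected should surface.
    logging.warning(f'Could not increment mlflow run name: {e}')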
gdl_hyperopt_template.py
Outdated
    params['training']['state_dict_path'] = params['training']['dict_unet']
elif params['global']['model_name'] == "deeplabv3_resnet101":
    params['training']['state_dict_path'] = params['training']['dict_deeplab']
where does the dict_[model] parameter come from? Is it an output of a previous run? Maybe a comment or two would help here.
This will be taken out; it is unique to my special use case of hyperopt.
@@ -247,6 +248,7 @@ def samples_preparation(in_img_array,
    # Stratification bias
    if (stratd is not None) and (dataset == 'trn'):
        tile_size = target.size
        u, count = np.unique(target, return_counts=True)
thanks for fixing this bug.
@@ -43,112 +45,223 @@
logging.getLogger(__name__)


def calc_inference_chunk_size(gpu_devices_dict: dict, max_pix_per_mb_gpu: int = 350):
why remove this function? Did it cause problems?
Yeah, it does not work well with the revised smoothing function; I can explain further. It has to do with the image being padded to twice its dimensions, so you have to provide a static chunk_size that will not burst memory during operations.
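To make the memory point concrete (numbers assumed), padding a chunk to twice its side quadruples the pixels held per forward pass:

chunk_size = 512                         # side of one inference chunk, in pixels
padded = 2 * chunk_size                  # image padded to 2x its dimension
ratio = padded ** 2 / chunk_size ** 2    # 4.0: four times the pixels per pass
print(f'{ratio:.0f}x memory per chunk after padding')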
step = int(chunk_size / subdiv)
for row in range(0, src.height, step):
    for column in range(0, src.width, step):
        window = Window.from_slices(slice(row, row + chunk_size),
yeah Window!
print("Number of features written: {}".format(i)) | ||
|
||
|
||
def gen_img_samples(src, chunk_size, *band_order): |
nice generator function. please add docstring to this new function :)
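A possible docstring, inferred from the signature and the windowed reads above (parameter semantics are guesses, not confirmed by the PR):

def gen_img_samples(src, chunk_size, *band_order):
    """Yield fixed-size chunks of a raster for tiled inference.

    Args:
        src: open rasterio dataset to read windows from.
        chunk_size: side length, in pixels, of each square chunk.
        *band_order: optional band indices controlling read order.

    Yields:
        One image chunk per window, for the caller to run inference on
        and stitch back into the full-size prediction.
    """
    ...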
@torch.no_grad()
def segmentation(img_array,
def segmentation(param,
here again, docstrings could help understand what arguments are expected. Not sure I understand the tp_mem argument. Does it stand for "temporary memory"? It's the image to be inferred?
I would add docstrings and make proper comments. tp_mem is a temp file used to write very large arrays to disk at inference.
WINDOW_SPLINE_2D = torch.as_tensor(np.moveaxis(WINDOW_SPLINE_2D, 2, 0), ).type(torch.float)
WINDOW_SPLINE_2D = WINDOW_SPLINE_2D.to(device)

fp = np.memmap(tp_mem, dtype='float16', mode='w+', shape=(h_, w_, num_classes))
can you add more comments from here on? Why do you create a memory map from tp_mem? What does it become further on?
basically, a temporary memory-map file was introduced to aid inference on very large tifs
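In short, the pattern is roughly the following (dimensions and file name are placeholders):

import numpy as np

# Accumulate class scores on disk so the full-resolution probability cube
# (h_ x w_ x num_classes at float16) never has to fit in RAM.
h_, w_, num_classes = 8192, 8192, 4
fp = np.memmap('preds.tmp', dtype='float16', mode='w+', shape=(h_, w_, num_classes))
fp[0:512, 0:512, :] += 1.0  # each inferred chunk writes scores into its own window
fp.flush()                  # persist to disk; reopen with mode='r' for the argmax pass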
fp = np.memmap(tp_mem, dtype='float16', mode='r', shape=(h_, w_, num_classes))
subdiv = 2.0
step = int(chunk_size / subdiv)
pred_img = np.zeros((h_, w_), dtype=np.uint8)
for row in tqdm(range(0, input_image.height, step), position=2, leave=False):
    for col in tqdm(range(0, input_image.width, step), position=3, leave=False):
        arr1 = fp[row:row + chunk_size, col:col + chunk_size, :] / (2 ** 2)
        arr1 = arr1.argmax(axis=-1).astype('uint8')
        pred_img[row:row + chunk_size, col:col + chunk_size] = arr1
pred_img = pred_img[:h, :w]
end_seg = time.time() - start_seg
here again, comments would help follow what's going on. Many lines are similar to those in gen_img_samples(). Could this be refactored?
Most probably; code improvement is always work in progress.
@@ -487,37 +527,6 @@ def ordereddict_eval(str_to_eval: str):
    return str_to_eval


def defaults_from_params(params, key=None):
this was added by Blaise, right? Why remove it?
I think this was refactored somewhere else and it became redundant
- gdl_hyperopt_template.py - inference.py
# list of GPU devices that are available and unused. If no GPUs, returns empty list
gpu_devices_dict = get_device_ids(num_devices,
                                  max_used_ram_perc=max_used_ram,
                                  max_used_perc=max_used_perc)
logging.info(f'GPUs devices available: {gpu_devices_dict}')
num_devices = len(gpu_devices_dict.keys())
device = torch.device(f'cuda:{list(gpu_devices_dict.keys())[0]}' if gpu_devices_dict else 'cpu')

logging.info(f'Creating dataloaders from data in {samples_folder}...\n')
Vic, any reason why you removed this? Were you having trouble with it?
moved to model_choice.py. Also made a fix for the frequent bug on HPC of not being able to use GPUs with ids other than 0.
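A hedged sketch of that kind of fix (the actual model_choice.py code may differ; the devices dict is a placeholder for `get_device_ids(...)` output):

import torch

gpu_devices_dict = {2: 'free'}  # placeholder: first free GPU has id 2, not 0
if gpu_devices_dict:
    gpu_id = list(gpu_devices_dict.keys())[0]
    torch.cuda.set_device(gpu_id)            # make e.g. cuda:2 current, so ops
    device = torch.device(f'cuda:{gpu_id}')  # don't silently land on cuda:0
else:
    device = torch.device('cpu')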
logging.debug(f'Unique values in loaded raster: {np.unique(img_array)}\n'
              f'Shape of raster: {img_array.shape}')
for info in tqdm(list_img, desc='Inferring from images', position=0, leave=True):
    with start_run(run_name=Path(info['tif']).name, nested=True):
There's a little bug here: at line 475, start_run is imported only if an mlflow URI is provided. We can just import everything at the top of the script whether the URI is given or not. I can address this bugfix in my next PR.
Main Highlights