Model.Train and Cross Validation #147

dannyhow12 · 2022-04-29T15:52:05Z

Hi and good day,

Thank you for the wonderful repo which has been super user friendly.

I have a question about model.train versus cross-validation. Cross-validation is used in the KiTS19.ipynb example, but because of the usage limits of the free Google Colab tier, I could not run it to completion. I am therefore trying the alternative, model.train, which I believe is shown in the BRATS2020.ipynb example and which I also looked up in model.py.

However, when I run the code on Google Colab, training never seems to start; the cell just keeps loading indefinitely. Could you please tell me whether the way I call model.train in the code below is correct? Many thanks.

```python
import tensorflow as tf
import os
from tensorflow.python.keras.saving.saving_utils import model_metadata
from miscnn.data_loading.interfaces.nifti_io import NIFTI_interface
from miscnn.data_loading.data_io import Data_IO
from miscnn.processing.data_augmentation import Data_Augmentation
from miscnn.processing.subfunctions.normalization import Normalization
from miscnn.processing.subfunctions.clipping import Clipping
from miscnn.processing.subfunctions.resampling import Resampling
from miscnn.processing.preprocessor import Preprocessor
from miscnn.neural_network.model import Neural_Network
from miscnn.neural_network.architecture.unet.standard import Architecture
from miscnn.neural_network.metrics import dice_soft, dice_crossentropy, tversky_loss
from tensorflow.keras.callbacks import ReduceLROnPlateau
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.callbacks import ModelCheckpoint

#os.environ["CUDA_VISIBLE_DEVICES"] = "0"

# Initialize the NIfTI I/O interface and configure the images as one channel (grayscale) and three segmentation classes (background, kidney, tumor)
interface = NIFTI_interface(pattern="case_00[0-9]*", channels=1, classes=3)

# Specify the kits19 data directory
data_path = "/content/drive/MyDrive/data1/"
# Create the Data I/O object
data_io = Data_IO(interface, data_path)

sample_list = data_io.get_indiceslist()
sample_list.sort()

# Create and configure the Data Augmentation class
#data_aug = Data_Augmentation(cycles=2, scaling=True, rotations=True, elastic_deform=True, mirror=True, brightness=True, contrast=True, gamma=True, gaussian_noise=True)

# Create a pixel value normalization Subfunction through Z-Score
sf_normalize = Normalization(mode='z-score')
# Create a clipping Subfunction between -79 and 304
#sf_clipping = Clipping(min=-79, max=304)
# Create a resampling Subfunction to voxel spacing 3.22 x 1.62 x 1.62
sf_resample = Resampling((3.22, 1.62, 1.62))

# Assemble Subfunction classes into a list
# Be aware that the Subfunctions will be executed according to the list order!
# 29042022 version: removed sf_clipping
subfunctions = [sf_resample, sf_normalize]

# data_aug=data_aug Add inside Preprocessor 29042022 11.44pm removed
# Create and configure the Preprocessor class
pp = Preprocessor(data_io, batch_size=4, subfunctions=subfunctions, prepare_subfunctions=True,
                  prepare_batches=False, analysis="patchwise-crop", patch_shape=(80, 160, 160),
                  use_multiprocessing=True)

# Adjust the patch overlap for predictions
pp.patchwise_overlap = (40, 80, 80)

# Create the Neural Network model
unet_standard = Architecture(depth=4, activation="softmax", batch_normalization=True)
model = Neural_Network(preprocessor=pp, architecture=unet_standard, loss=tversky_loss, metrics=[dice_soft, dice_crossentropy], learning_rate=0.0001)

# Define Callbacks
cb_lr = ReduceLROnPlateau(monitor='loss', factor=0.1, patience=20, verbose=1, mode='min', min_delta=0.0001, cooldown=1, min_lr=0.00001)
cb_es = EarlyStopping(monitor='loss', min_delta=0, patience=150, verbose=1, mode='min')
#cb_cp = ModelCheckpoint("models/kits_unet.{epoch:02d}.hdf5", monitor='val_loss', verbose=1, save_freq=90*20)

model.train(sample_list, epochs=10, iterations=5, callbacks=[cb_lr, cb_es])
```

At the terminal, it just shows

    /usr/local/lib/python3.7/dist-packages/keras/optimizer_v2/adam.py:105: UserWarning: The lr argument is deprecated, use learning_rate instead.
      super(Adam, self).__init__(name, **kwargs)

and nothing more.

Regards,
Danny

Edit: Quite interestingly, the terminal started to show output, including the epoch progress, after use_multiprocessing was set to False.

muellerdo · 2022-05-06T11:58:04Z

Hello @dannyhow12,

thank you for your kind words!

Sorry for the late reply. Did you already find a solution for this issue?

Your code looks fine and shouldn't be the problem.

> Quite interestingly, the terminal was able to show something and its epoch, after the use_multiprocessing was set to False.

That would also be one of my first recommendations: turn off multiprocessing (use_multiprocessing=False).
TensorFlow is extremely CPU-hungry by default, but I do not have much experience with multiprocessing in the Google Colab environment.

Be aware: with prepare_subfunctions=True, training only starts after the complete dataset has been preprocessed, which takes a while for a larger 3D dataset like kits19 (though I would guess no longer than 30 minutes; on our workstation it took about 10 minutes).

Also be aware that, on Google Colab, MIScnn currently works only with the dev branch because of the Python 3.8 requirement; see issue #146.

Cheers,
Dominik

dannyhow12 · 2022-05-06T14:39:38Z

Hello @muellerdo,

Yep, I found the solution: after setting use_multiprocessing to False, it was able to train on Google Colab.

Interestingly, however, on Google Colab it took around 1 hour 30 minutes to preprocess all 300 images of the KiTS21 dataset. Also, if I want to reuse the same batches for subsequent training runs, is there any way to do that within the MIScnn framework? Because of the Google Colab usage limits, my current approach is to set ModelCheckpoint as a callback.

There is a drawback to this, though: every time training is started, the dataset has to be preprocessed again, which takes quite a long time. Is there a way to load the generated pickle files directly and use them in the training?
While reviewing data_io.py, I saw that delete_batchDir is documented as follows:
True = delete temporary batches directory
False = delete only the batch data for the current seed
I would appreciate more clarification on this. Thank you.
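
To make the request concrete, the caching pattern I have in mind can be sketched independently of MIScnn. This is only a minimal illustration using the standard library, not MIScnn's API: `load_or_preprocess`, `expensive_preprocess`, and the one-pickle-per-sample layout are hypothetical names and choices made up for this example.

```python
import pickle
import tempfile
from pathlib import Path


def load_or_preprocess(index, cache_dir, preprocess):
    """Return the preprocessed sample for `index`, reusing a pickle if one exists."""
    cache_file = Path(cache_dir) / f"{index}.pickle"
    if cache_file.exists():
        # Cache hit: skip the expensive preprocessing entirely.
        with cache_file.open("rb") as handle:
            return pickle.load(handle)
    # Cache miss: preprocess once and store the result for later runs.
    sample = preprocess(index)
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    with cache_file.open("wb") as handle:
        pickle.dump(sample, handle)
    return sample


def expensive_preprocess(index):
    """Hypothetical stand-in for resampling and normalizing one volume."""
    return {"index": index, "voxels": [0.0, 0.5, 1.0]}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as cache_dir:
        first = load_or_preprocess("case_00000", cache_dir, expensive_preprocess)
        second = load_or_preprocess("case_00000", cache_dir, expensive_preprocess)
        print(first == second)
```

The second call reads the pickle instead of calling the preprocessing function again. If the cache directory were on storage that survives a Colab session, such as a mounted Drive folder, the same idea would presumably carry over between sessions.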

Yep, I am aware of the issue raised in #146. Thank you for the reminder.

Thank you for the response!

Best regards,
Danny

riki-igarashi · 2023-02-28T00:41:40Z

Thank you for providing us with a great library!

I think this problem is caused by using multiprocessing on Windows.
On Windows, child processes are started with spawn rather than fork, so they do not inherit the parent's global state; it has to be passed explicitly when the child process is created.
Therefore, in the current code, the seed value of Data_IO changes each time a child process is created.

ref) Python Multiprocess diff between Windows and Linux
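
The general behaviour can be reproduced without MIScnn. The following is a minimal standard-library sketch, not MIScnn code: the module-level `SEED` only stands in for a randomly initialised attribute such as the Data_IO seed, and all function names are hypothetical. It forces the spawn start method, so it behaves the same way on Linux as on Windows.

```python
import multiprocessing as mp
import random
from functools import partial

# Module-level state, drawn at import time. Under "spawn", every child
# process re-imports this module and therefore draws its own value.
SEED = random.randint(0, 99_999_999)


def tag_with_global(index):
    """Relies on module state, which a spawned child re-creates on import."""
    return index, SEED


def tag_with_argument(index, seed):
    """Receives the seed explicitly, so every worker uses the parent's value."""
    return index, seed


def seeds_seen(func, indices, start_method="spawn"):
    """Map `func` over `indices` in a pool and return the set of seeds observed."""
    ctx = mp.get_context(start_method)
    with ctx.Pool(2) as pool:
        return {seed for _, seed in pool.map(func, indices)}


def demo():
    """Compare the implicit (global) and explicit (argument) ways of sharing the seed."""
    indices = list(range(8))
    print("parent seed:       ", SEED)
    print("seeds via global:  ", seeds_seen(tag_with_global, indices))
    print("seeds via argument:", seeds_seen(partial(tag_with_argument, seed=SEED), indices))
```

When `demo()` is called from a script under an `if __name__ == "__main__":` guard (which spawn requires), the seeds reported via the global should typically differ from the parent's, while the seeds passed through `partial` contain only the parent's value. That is the same idea as the `seed=self.data_io.seed` argument in the fix below.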

If only Google Colab is supported, my code may be meaningless, but here is a fix that will work on Windows with use_multiprocessing=True.

miscnn/processing/preprocessor.py

    def run_subfunctions(self, indices_list, training=True):
        # Prepare subfunctions using single threading
        if not self.use_multiprocessing or not training:
            for index in indices_list:
                self.prepare_sample_subfunctions(index, training)
        # Prepare subfunctions using multiprocessing
        else:
            pool = mp.Pool(int(self.mp_threads))
            pool.map(partial(self.prepare_sample_subfunctions,
                             training=training, seed=self.data_io.seed),  # change here !
                     indices_list)
            pool.close()
            pool.join()

    # Wrapper function to process subfunctions for a single sample
    def prepare_sample_subfunctions(self, index, training, seed=None):    # change here !
        # Load sample
        if seed is not None:                                              # change here !
            self.data_io.seed = seed
        sample = self.data_io.sample_loader(index, load_seg=training)
        # Run provided subfunctions on imaging data
        for sf in self.subfunctions:
            sf.preprocessing(sample, training=training)
        # Transform array data types in order to save disk space
        sample.img_data = np.array(sample.img_data, dtype=np.float32)
        if training:
            sample.seg_data = np.array(sample.seg_data, dtype=np.uint8)
        # Backup sample as pickle to disk
        self.data_io.backup_sample(sample)

It is a bit oddly written, but it works in my environment.

Hope this helps someone out there!

ps) I think this issue (#77) is caused by the same problem

muellerdo · 2023-02-28T16:01:40Z

Hey @riki-igarashi,

thank you for this contribution!
Definitely helpful for Windows users :)

Best Regards,
Dominik

muellerdo self-assigned this May 6, 2022

muellerdo added the bug label May 6, 2022

muellerdo added the information label Feb 28, 2023
