Model.Train and Cross Validation #147

Open
dannyhow12 opened this issue Apr 29, 2022 · 4 comments
Labels: bug (Something isn't working), information


dannyhow12 commented Apr 29, 2022

Hi and good day,

Thank you for the wonderful repo, which has been super user-friendly.

I would like to ask a question about model.train vs. cross-validation. Cross-validation is used in the KiTS19.ipynb example, but due to the usage limit of the free Google Colab tier it could not be completed. I am therefore attempting to train with the alternative method, model.train, which I believe is shown in the BRATS2020.ipynb example, and which I also referenced from model.py.

However, despite running the code below on Google Colab, training does not seem to start at all; it just seems to load forever. Could you please point out whether my way of calling model.train in this source code is correct? Many thanks.

```python
import tensorflow as tf
import os
from tensorflow.python.keras.saving.saving_utils import model_metadata
from miscnn.data_loading.interfaces.nifti_io import NIFTI_interface
from miscnn.data_loading.data_io import Data_IO
from miscnn.processing.data_augmentation import Data_Augmentation
from miscnn.processing.subfunctions.normalization import Normalization
from miscnn.processing.subfunctions.clipping import Clipping
from miscnn.processing.subfunctions.resampling import Resampling
from miscnn.processing.preprocessor import Preprocessor
from miscnn.neural_network.model import Neural_Network
from miscnn.neural_network.architecture.unet.standard import Architecture
from miscnn.neural_network.metrics import dice_soft, dice_crossentropy, tversky_loss
from tensorflow.keras.callbacks import ReduceLROnPlateau
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.callbacks import ModelCheckpoint

#os.environ["CUDA_VISIBLE_DEVICES"] = "0"

#Initialize the NIfTI I/O interface and configure the images as one channel (grayscale) and three segmentation classes (background, kidney, tumor)
interface = NIFTI_interface(pattern="case_00[0-9]*", channels=1, classes=3)

#Specify the kits19 data directory
data_path = "/content/drive/MyDrive/data1/"
#Create the Data I/O object
data_io = Data_IO(interface, data_path)

sample_list = data_io.get_indiceslist()
sample_list.sort()

#Create and configure the Data Augmentation class
#data_aug = Data_Augmentation(cycles=2, scaling=True, rotations=True, elastic_deform=True, mirror=True, brightness=True, contrast=True, gamma=True, gaussian_noise=True)

#Create a pixel value normalization Subfunction through Z-Score
sf_normalize = Normalization(mode='z-score')
#Create a clipping Subfunction between -79 and 304
#sf_clipping = Clipping(min=-79, max=304)
#Create a resampling Subfunction to voxel spacing 3.22 x 1.62 x 1.62
sf_resample = Resampling((3.22, 1.62, 1.62))

#Assemble Subfunction classes into a list
#Be aware that the Subfunctions will be executed according to the list order!
#29042022 version: removed sf_clipping
subfunctions = [sf_resample, sf_normalize]

#data_aug=data_aug Add inside Preprocessor 29042022 11.44pm removed
#Create and configure the Preprocessor class
pp = Preprocessor(data_io, batch_size=4, subfunctions=subfunctions, prepare_subfunctions=True,
                  prepare_batches=False, analysis="patchwise-crop", patch_shape=(80, 160, 160),
                  use_multiprocessing=True)

#Adjust the patch overlap for predictions
pp.patchwise_overlap = (40, 80, 80)

#Create the Neural Network model
unet_standard = Architecture(depth=4, activation="softmax", batch_normalization=True)
model = Neural_Network(preprocessor=pp, architecture=unet_standard, loss=tversky_loss, metrics=[dice_soft, dice_crossentropy], learning_rate=0.0001)

#Define Callbacks
cb_lr = ReduceLROnPlateau(monitor='loss', factor=0.1, patience=20, verbose=1, mode='min', min_delta=0.0001, cooldown=1, min_lr=0.00001)
cb_es = EarlyStopping(monitor='loss', min_delta=0, patience=150, verbose=1, mode='min')
#cb_cp = ModelCheckpoint("models/kits_unet.{epoch:02d}.hdf5", monitor='val_loss', verbose=1, save_freq=90*20)

model.train(sample_list, epochs=10, iterations=5, callbacks=[cb_lr, cb_es])
```

At the terminal, it just shows

    /usr/local/lib/python3.7/dist-packages/keras/optimizer_v2/adam.py:105: UserWarning: The `lr` argument is deprecated, use `learning_rate` instead.
      super(Adam, self).__init__(name, **kwargs)

and nothing more.

Regards,
Danny

**Quite interestingly, the terminal was able to show output and the epoch progress after use_multiprocessing was set to False.**

@muellerdo muellerdo self-assigned this May 6, 2022
@muellerdo muellerdo added the bug Something isn't working label May 6, 2022
@muellerdo (Member)

Hello @dannyhow12,

thank you for your kind words!

Sorry for the late reply. Did you already find a solution for this issue?

Your code looks fine and shouldn't be the problem.

> Quite interestingly, the terminal was able to show output and the epoch progress after use_multiprocessing was set to False.

That would also have been one of my first recommendations: turn off multiprocessing (use_multiprocessing=False).
TensorFlow is extremely CPU-hungry by default, but I do not have much experience with multiprocessing in the Google Colab environment.

Be aware: if you have prepare_subfunctions=True, training will only start after the complete dataset has been preprocessed, which takes a while for a larger 3D dataset like KiTS19 (but I guess it should not take longer than 30 minutes; on our workstation it was about 10 minutes).
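
For reference, a minimal sketch of your Preprocessor call from above with multiprocessing turned off (everything else unchanged):

```python
# Same Preprocessor configuration as in the script above, but with multiprocessing
# disabled so that subfunction preparation runs single-threaded.
pp = Preprocessor(data_io, batch_size=4, subfunctions=subfunctions,
                  prepare_subfunctions=True,   # preprocess the complete dataset before training starts
                  prepare_batches=False,
                  analysis="patchwise-crop", patch_shape=(80, 160, 160),
                  use_multiprocessing=False)   # single-threaded preparation
```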

Also be aware that MIScnn currently only works with the dev branch on Google Colab, due to the requirement of Python 3.8: check out issue #146.

Cheers,
Dominik


dannyhow12 commented May 6, 2022

Hello @muellerdo,

Yep, I found the solution: after setting use_multiprocessing to False, it was able to train on Google Colab.

Interestingly, on Google Colab it took around 1 hour 30 minutes to preprocess all 300 images of the KiTS21 dataset. Also, if I wanted to reuse the same batches for subsequent training runs, is there any way to do that within the MIScnn framework? Because of the Google Colab usage limits, the current implementation I am working with sets ModelCheckpoint as a callback.
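
For reference, a rough sketch of that checkpoint setup, building on the script above (the file path and options are just placeholder choices on my side, not anything prescribed by MIScnn; ModelCheckpoint is the standard Keras callback):

```python
# Sketch: periodically save the model to Drive so a run can be resumed after a Colab
# disconnect. The file path and options below are placeholder choices, not MIScnn defaults.
from tensorflow.keras.callbacks import ModelCheckpoint

cb_cp = ModelCheckpoint("/content/drive/MyDrive/models/kits_unet.{epoch:02d}.hdf5",
                        monitor="loss", verbose=1, save_best_only=False)

model.train(sample_list, epochs=10, iterations=5, callbacks=[cb_lr, cb_es, cb_cp])
```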

However, this approach has a drawback: each time training is started, the dataset needs to be preprocessed again, which takes quite a long time. Is there a way to load the generated pickle files directly and use them in training?
Upon reviewing data_io.py, the delete_batchDir argument denotes:
- True = delete the temporary batches directory
- False = delete only the batch data for the current seed

I am hoping for more clarification regarding this. Thank you.
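
To make the question concrete, here is a sketch of how I currently read the two options; whether delete_batchDir can simply be passed to the Data_IO constructor like this is an assumption on my side based on data_io.py, so please correct me if I am wrong:

```python
# My reading of the delete_batchDir flag (assumption based on data_io.py, please correct me):
# - cleanup removes the whole temporary batches directory
data_io = Data_IO(interface, data_path, delete_batchDir=True)
# - cleanup removes only the batch data belonging to the current seed
data_io = Data_IO(interface, data_path, delete_batchDir=False)
```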

Yep, I am aware of the issue raised in #146. Thank you for the reminder.

Thank you for the response!

Best regards,
Danny

@riki-igarashi

Thank you for providing us with a great library!

I think this problem is caused by using multiprocessing on Windows.
On Windows, multiprocessing spawns new child processes instead of forking, so the parent's state (such as a global variable or instance attribute) has to be passed to the child explicitly.
Therefore, in the current code, the seed value of Data_IO changes each time a child process is created.

ref) Python Multiprocess diff between Windows and Linux

If only Google Colab is supported, my fix may not be relevant, but here is a change that makes use_multiprocessing=True work on Windows.

miscnn/processing/preprocessor.py

    def run_subfunctions(self, indices_list, training=True):
        # Prepare subfunctions using single threading
        if not self.use_multiprocessing or not training:
            for index in indices_list:
                self.prepare_sample_subfunctions(index, training)
        # Prepare subfunctions using multiprocessing
        else:
            pool = mp.Pool(int(self.mp_threads))
            pool.map(partial(self.prepare_sample_subfunctions,
                             training=training, seed=self.data_io.seed),  # change here: pass the seed explicitly to the child process
                     indices_list)
            pool.close()
            pool.join()

    # Wrapper function to process subfunctions for a single sample
    def prepare_sample_subfunctions(self, index, training, seed=None):    # change here: accept the parent's seed
        # Load sample
        if seed is not None:                                              # change here: restore the seed inside the child process
            self.data_io.seed = seed
        sample = self.data_io.sample_loader(index, load_seg=training)
        # Run provided subfunctions on imaging data
        for sf in self.subfunctions:
            sf.preprocessing(sample, training=training)
        # Transform array data types in order to save disk space
        sample.img_data = np.array(sample.img_data, dtype=np.float32)
        if training:
            sample.seg_data = np.array(sample.seg_data, dtype=np.uint8)
        # Backup sample as pickle to disk
        self.data_io.backup_sample(sample)

It is a bit oddly written, but it works in my environment.

Hope this helps someone out there!

P.S. I think this issue (#77) is caused by the same problem.

@muellerdo (Member)

Hey @riki-igarashi,

thank you for this contribution!
Definitely helpful for Windows users :)

Best Regards,
Dominik
