This document is intended to show how to get started with a training experiment.
In the current setup, training is driven by a data file and a config file. The data file specifies the training data, and the config file specifies the training procedure. Right now, the config file has options for every training procedure implemented, even though many are only relevant to a specific training algorithm, model, etc.
DeepInPy expects a complex-valued, multi-channel MRI format. Even when the data are single-coil, the format should be followed.
The data format is an h5 file consisting of the following fields:
```
imgs:  [Ntraining, N1, N2, ..., NT, X, Y, Z]: np.complex
masks: [Ntraining, N1, N2, ..., NT, X, Y, Z]: np.float
maps:  [Ntraining, Ncoil, N1, N2, ..., NT, X, Y, Z]: np.complex
ksp:   [Ntraining, Ncoil, N1, N2, ..., NT, X, Y, Z]: np.complex
```
`Ntraining` is the number of training examples. If `Ntraining=1`, it should still be included as a singleton dimension. `N1`, `N2`, ..., `NT` are higher-order dimensions, and can be used for multi-phase data (e.g. temporal, contrast, coefficients, phases, etc.). These dimensions are optional and can be excluded. `X`, `Y`, and `Z` are spatial dimensions. In the case of 2D data, the `Z` dimension can be excluded.

- "imgs" are the fully sampled images, used as ground truth for calculating NRMSE and for supervised learning
- "ksp" is the kspace that will be down-sampled by the mask. It should be fully sampled, but it technically doesn’t have to be.
- "masks" will be multiplied by ksp each training round
Except for the masks, all data should be stored as complex-valued arrays.
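To illustrate how the masks are used, here is a minimal sketch (not DeepInPy's internal code; broadcasting the mask over the coil dimension is an assumption based on the shapes above):

```python
import numpy as np

# sketch shapes for the 2D multi-coil case: ksp [Ntraining, Ncoil, X, Y], masks [Ntraining, X, Y]
ksp = np.ones((10, 8, 256, 256), dtype=np.complex64)
masks = (np.random.rand(10, 256, 256) > 0.5).astype(np.float32)

# retrospective undersampling: zero out k-space samples where the mask is 0,
# broadcasting the mask across the coil dimension
ksp_under = masks[:, None, :, :] * ksp
```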
For example, a 2D multi-coil dataset with 100 training examples, 8 coils, and a 256x256 matrix size:

```
imgs:  [100, 256, 256]: np.complex
masks: [100, 256, 256]: np.float
maps:  [100, 8, 256, 256]: np.complex
ksp:   [100, 8, 256, 256]: np.complex
```
A 2D single-coil dataset with a single training example:

```
imgs:  [1, 256, 256]: np.complex
masks: [1, 256, 256]: np.float
maps:  [1, 1, 256, 256]: np.complex
ksp:   [1, 1, 256, 256]: np.complex
```
Note that the `maps` array can be all-ones in this case.
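For single-coil data, the trivial all-ones map can be constructed directly (a sketch; `np.complex64` is one valid complex dtype):

```python
import numpy as np

# trivial sensitivity map for single-coil data: [Ntraining=1, Ncoil=1, X, Y]
maps = np.ones((1, 1, 256, 256), dtype=np.complex64)
```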
A 2D multi-coil dataset with 100 training examples, 8 coils, and 20 phases (a higher-order dimension, e.g. time):

```
imgs:  [100, 20, 256, 256]: np.complex
masks: [100, 20, 256, 256]: np.float
maps:  [100, 8, 20, 256, 256]: np.complex
ksp:   [100, 8, 20, 256, 256]: np.complex
```
Multi-coil data can also be handled without coil sensitivity maps. We use the same interface by treating the coil dimension as a higher-order dimension and creating an all-ones `maps` array. We tell the code that it is "one-channel" data with a higher-order dimension equal to 8:
```
imgs:  [100, 8, 256, 256]: np.complex
masks: [100, 8, 256, 256]: np.float
maps:  [100, 1, 8, 256, 256]: np.complex
ksp:   [100, 1, 8, 256, 256]: np.complex
```
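A sketch of constructing the all-ones `maps` array for this layout:

```python
import numpy as np

# one channel, with the 8 coils treated as a higher-order dimension
maps = np.ones((100, 1, 8, 256, 256), dtype=np.complex64)
```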
To write a data file, you can use the `deepinpy.utils.utils.h5_write` function. The function takes the path to the target h5 file and a dictionary of key-value pairs:
```python
# example data writer for 2D images with 10 training examples and 8 coils
import numpy as np

from deepinpy.utils.utils import h5_write

# imgs, maps, and ksp are complex-valued; masks are real-valued
imgs = (np.random.randn(10, 256, 256) + 1j * np.random.randn(10, 256, 256)).astype(np.complex64)
masks = (np.random.rand(10, 256, 256) > 0.5).astype(np.float32)  # random binary sampling masks
maps = (np.random.randn(10, 8, 256, 256) + 1j * np.random.randn(10, 8, 256, 256)).astype(np.complex64)
ksp = (np.random.randn(10, 8, 256, 256) + 1j * np.random.randn(10, 8, 256, 256)).astype(np.complex64)

data = {'imgs': imgs, 'masks': masks, 'maps': maps, 'ksp': ksp}
h5_write('mydata.h5', data)
```
There is also a similar `h5_read` function to load the training set.
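For example, a minimal sketch of loading the data back, assuming `h5_read` mirrors `h5_write` by taking the file path and returning a dictionary of arrays:

```python
from deepinpy.utils.utils import h5_read

data = h5_read('mydata.h5')  # assumption: returns a dict of numpy arrays
imgs, masks = data['imgs'], data['masks']
print(imgs.shape, imgs.dtype)
```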
DeepInPy is controlled by passing command-line arguments to the `main.py` script. To view the command-line args, you can run:

```
python main.py --help
```
The recommended way to pass command-line args is through the use of a config file:
```
python main.py --config configs/example.json
```
The config file is a JSON-formatted file containing the names of the command-line args, and the values to pass. These args will automatically be logged to tensorboard, so that they can be queried/reused.
Note: Not all command-line args will be used, as it depends on the specific model that you use. (TODO: organize command-line args by model/module).
Note 2: By default, DeepInPy will use the CPU for training. To train on GPU, you should specify the GPUs to use (see the `gpu` option below).
The main config parameters that are necessary to run a training:
- `data_file`: specifies the path to the data file in hdf5 format (see the section above)
- `recon`: the reconstruction method to use (for example, "modl", "cgsense", "resnet", etc.)
- `network`: the neural network to use within the recon, if applicable (for example, "ResNet")
Other config parameters that are not required, but strongly recommended to set:
- `name`: name of the experiment, which will be tracked in tensorboard
- `gpu`: a string of comma-separated GPU numbers for training (e.g. "0" or "0, 1")
- `step`: step size, or learning rate, for training
- `num_epochs`: number of training epochs to run
- `shuffle`: true to shuffle the dataset
- `num_data_sets`: controls the number of training samples to use for training
- `stdev`: set to non-zero to add complex-valued white Gaussian noise to the data
- `self_supervised`: set to true to evaluate the loss in the measurement domain
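Putting these together, a config might look like the following (the values are illustrative, not recommended defaults):

```json
{
    "data_file": "mydata.h5",
    "recon": "modl",
    "network": "ResNet",
    "name": "example_experiment",
    "gpu": "0",
    "step": 0.001,
    "num_epochs": 100,
    "shuffle": true,
    "num_data_sets": 100,
    "stdev": 0.0,
    "self_supervised": false
}
```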
It is possible to run simple distributed training, by splitting the training epoch over multiple GPUs/CPUs. For example, if the training set contains 100 samples and four GPUs are used, then each GPU will receive 25 training samples each epoch.
- To run distributed training on GPU, simply specify multiple GPUs in the config: `"gpu": "0, 1, 2, 3"`
- To run distributed training on CPU, do not set the `gpu` variable, and instead export the OpenMP environment variable before running the code: `export OMP_NUM_THREADS=20`
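For example, combining the two steps for a distributed CPU run (the thread count is illustrative):

```
export OMP_NUM_THREADS=20
python main.py --config configs/example.json
```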
The config can be used to enable hyperparameter optimization/tuning with support for parallelization across CPUs/GPUs. To enable hyperparameter optimization:
- set `hyperopt` to true
- set `num_workers` to the number of experiments to run in parallel. For example, with four GPUs, set `num_workers` to 4.
- set `gpu` to the list of GPUs to use (or leave blank to use CPU)
- set `num_trials` to the number of experiments to run. For example, set `num_trials` to 10 to run 10 experiments with different hyperparameters

Example: 100 trials using 4 GPUs, with two experiments per GPU running at once:
"hyperopt": true,
"num_workers": 8,
"gpu": "0, 1, 2, 3",
"num_trials": 100
Currently, one must manually set which config options are tunable via hyperparameter optimization. DeepInPy uses TestTube to control this. By default, the step size is the only tunable parameter, defined in `main.py`:
```python
parser.opt_range('--step', type=float, dest='step', default=.001, help='step size/learning rate', tunable=True, nb_samples=100, low=.0001, high=.001)
```
Notice that it is an `opt_range`, meaning that values will be sampled between `low` and `high`. Also notice that `nb_samples=100`, meaning at most 100 different values will be sampled for this hyperparameter. Finally, notice that `tunable=True`; if we change this to `False`, then it will not be used for hyperparameter optimization.
For example, currently the solver is not tunable:
```python
parser.opt_list('--solver', action='store', dest='solver', type=str, tunable=False, options=['sgd', 'adam'], help='optimizer/solver ("adam", "sgd")', default="sgd")
```
If we change `tunable` to `True`, then hyperopt will choose between the values under `options` for each experiment.
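For instance, the same line with `tunable=True`, so that hyperopt samples the solver from `options`:

```python
parser.opt_list('--solver', action='store', dest='solver', type=str, tunable=True, options=['sgd', 'adam'], help='optimizer/solver ("adam", "sgd")', default="sgd")
```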
In this way, we can set multiple hyperparameters to `tunable=True`. Then, each hyperopt experiment will choose one value from each parameter, and we can sweep a large number of parameters at once.
The default policy is to use random search. This can be modified by changing the strategy to grid search in the `HyperOptArgumentParser` arguments:
From:

```python
parser = HyperOptArgumentParser(usage=usage_str, description=description_str, formatter_class=argparse.ArgumentDefaultsHelpFormatter, strategy='random_search')
```

to:

```python
parser = HyperOptArgumentParser(usage=usage_str, description=description_str, formatter_class=argparse.ArgumentDefaultsHelpFormatter, strategy='grid_search')
```
DeepInPy has capabilities for learning rate scheduling. An example usage has been included in the default config. The general use is:
"lr_scheduler": [x,y]
where "x" is the epoch when the multiplicative factor will be applied and "y" is the multiplicative factor that scales the current learning rate. Each successive x number of epochs, e.g. 2x, 3x, 4x etc will also scale the learning rate.