Before starting code-level work on GaNDLF, please follow the instructions to install GaNDLF from sources. Once that's done, please verify the installation using the following command:
```bash
# continue from previous shell
# you should be in the "GaNDLF" git repo
(venv_gandlf) $> gandlf verify-install
```
- The following flowcharts are intended to provide a high-level overview of the different submodules in GaNDLF.
- Navigate to the `README.md` file in each submodule folder for details.
- Command-line parsing: `gandlf run`
- Parameters from the training configuration get passed as a `dict` via the config manager
- Training Manager:
  - Handles k-fold training
  - Main entry point from CLI
- Training Function:
  - Performs actual training
- Inference Manager:
  - Handles inference functionality
  - Main entry point from CLI
- Inference Function:
  - Performs actual inference
To update/change/add a dependency in setup, please ensure at least the following conditions are met:
- The package is being actively maintained.
- The new dependency has been tested against the minimum Python version supported by GaNDLF (see the `python_requires` variable in setup).
- It does not clash with any existing dependencies.
- For details, please see the README for the `GANDLF.models` submodule.
- Update Tests
- Update or add dependency in setup, if appropriate.
- Add the transformation to `global_augs_dict`, defined in `GANDLF/data/augmentation/__init__.py`
- Ensure probability is used as an input; probability is not used for any preprocessing operations
- For details, please see the README for the `GANDLF.data.augmentation` submodule.
- Update Tests
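The probability-as-input convention above can be sketched in plain Python. This is a hypothetical illustration of the pattern only: the function name `noise_augmentation` is invented, and GaNDLF's real `global_augs_dict` entries wrap torchio transforms rather than plain callables.

```python
import random

# Hypothetical sketch: an augmentation factory that takes a probability,
# mirroring how entries in global_augs_dict are configured. Not GaNDLF code.
def noise_augmentation(probability: float, seed=None):
    rng = random.Random(seed)

    def _apply(values):
        # the augmentation fires only with the configured probability;
        # otherwise the input passes through unchanged
        if rng.random() >= probability:
            return values
        return [v + rng.gauss(0, 0.1) for v in values]

    return _apply
```

With `probability=0.0` the transform is a no-op; with `probability=1.0` it always perturbs the input. Preprocessing transforms, by contrast, never take a probability.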
- Update or add dependency in setup, if appropriate; see section on Dependency Management for details.
- All transforms should be defined by inheriting from `torchio.transforms.intensity_transform.IntensityTransform`. For example, please see the threshold/clip functionality in the `GANDLF/data/preprocessing/threshold_and_clip.py` file.
- Define each option in the configuration file under the correct key (again, see threshold/clip as examples)
- Add the transformation to `global_preprocessing_dict`, defined in `GANDLF/data/preprocessing/__init__.py`
- For details, please see the README for the `GANDLF.data.preprocessing` submodule.
- Update Tests
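The registration pattern above can be sketched without any dependencies. This is a minimal, hypothetical stand-in: a real GaNDLF transform inherits from `torchio.transforms.intensity_transform.IntensityTransform` (see `threshold_and_clip.py`), whereas here a plain callable keeps the sketch self-contained.

```python
# Hypothetical sketch of the registration pattern used by
# GANDLF/data/preprocessing/__init__.py; not the actual GaNDLF code.
def threshold_transform(min_value, max_value):
    """Return a transform that zeroes intensities outside [min_value, max_value]."""
    def _apply(values):
        return [v if min_value <= v <= max_value else 0 for v in values]
    return _apply

# keyed by the name users put in their configuration file; the config
# manager would supply the params dict parsed from the YAML config
global_preprocessing_dict = {
    "threshold": lambda params: threshold_transform(params["min"], params["max"]),
}

transform = global_preprocessing_dict["threshold"]({"min": 0, "max": 10})
```

The key of the dict entry is what users reference in the configuration file, which is why each new option must also be documented under the correct configuration key.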
- Update Training Function
- Update Training Manager, if any training API has changed
- Update Tests
- Update Inference Function
- Update Inference Manager, if any inference API has changed
- Update Tests
Example: `gandlf config-generator` CLI command
- Implement the function and wrap it with `@click.command()` + `@click.option()`
- Add it to the `cli_subcommands` dict

The command would then be available under the `gandlf your-subcommand-name` CLI command.
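A minimal sketch of those two steps, assuming a made-up subcommand called `hello` (the name, option, and greeting are invented for illustration; the real `cli_subcommands` dict lives in GaNDLF's entrypoints):

```python
import click

# hypothetical subcommand following the pattern above
@click.command()
@click.option("--name", default="GaNDLF user", help="Who to greet.")
def hello(name):
    """Greet the user from the CLI."""
    click.echo(f"Hello, {name}!")

# analogous to registering the command in the cli_subcommands dict;
# it would then be reachable as `gandlf hello`
cli_subcommands = {"hello": hello}
```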
For any new feature, please ensure the corresponding option in the sample configuration is added, so that others can review/use/extend it as needed.
Once you have made changes to functionality, it is imperative that the unit tests be updated to cover the new code. Please see the full testing suite for details and examples.
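As a sketch of what such a unit test can look like (the helper under test, `min_max_normalize`, is invented here and is not part of GaNDLF):

```python
# Hypothetical example of a pytest-style unit test for a new helper.
def min_max_normalize(values):
    """Scale a list of numbers into the [0, 1] range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def test_min_max_normalize():
    # cover both the general case and the degenerate constant-input case
    assert min_max_normalize([0, 5, 10]) == [0.0, 0.5, 1.0]
    assert min_max_normalize([7, 7]) == [0.0, 0.0]
```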
There are two types of tests: unit tests for GaNDLF code, which test the functionality, and integration tests for deploying and running MLCubes. Some additional steps are required to run the tests:
- Ensure that the optional dependencies [ref] have been installed.
- Tests use sample data, which gets downloaded and prepared automatically when you run the unit tests. Prepared data is stored in the `${GaNDLF_root_dir}/testing/data/` folder. However, you may want to download and explore the data yourself.
Once you have the virtual environment set up, tests can be run using the following command:
```bash
# continue from previous shell
(venv_gandlf) $> pytest --device cuda # can be cuda or cpu, defaults to cpu
```
Any failures will be reported in the file `${GANDLF_HOME}/testing/failures.log`.
All integration tests are combined into one shell script:

```bash
# it's assumed you are in `GaNDLF/` repo root directory
cd testing/
./test_deploy.sh
```
The code coverage for the unit tests can be obtained with the following command:

```bash
# continue from previous shell
(venv_gandlf) $> coverage run -m pytest --device cuda; coverage report -m
```
We use the native `logging` library for log management. It is configured automatically when GaNDLF is launched, so if you are extending the code, please use loggers instead of `print` statements.

Here is an example of how the root logger can be used:
```python
import logging
import pandas as pd

def my_new_cool_function(df: pd.DataFrame):
    logging.debug("Message for debug file only")
    logging.info("Hi GaNDLF user, I greet you in the CLI output")
    try:
        ...  # actual work happens here
    except Exception as e:
        logging.error(f"A detailed message about any error. Exception: {e}, df shape: {df.shape}")
    # do NOT use normal print statements
    # print("Hi GaNDLF user!")
```
Here is an example of how a named logger can be used:

```python
import logging
import pandas as pd

def my_new_cool_function(df: pd.DataFrame):
    # use any logger name of your own, or just pass the current module name
    logger = logging.getLogger(__name__)
    logger.debug("Message for debug file only")
    logger.info("Hi GaNDLF user, I greet you in the CLI output")
    try:
        ...  # actual work happens here
    except Exception as e:
        logger.error(f"A detailed message about any error. Exception: {e}, df shape: {df.shape}")
    # print("Hi GaNDLF user!")  # don't use prints, please
```
GaNDLF logs are split into multiple parts:
- CLI output: only `info` messages are shown here
- debug file: all messages are shown
- stderr: displays `warning`, `error`, or `critical` messages
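The three-way split above can be sketched with standard-library `logging` handlers. This is a minimal illustration of the idea only, not GaNDLF's actual configuration (which lives in `GANDLF/logging_config.yaml`); the logger name and temp-file path here are invented.

```python
import logging
import sys
import tempfile

debug_path = tempfile.NamedTemporaryFile(suffix=".log", delete=False).name
logger = logging.getLogger("split_demo")  # hypothetical logger name
logger.setLevel(logging.DEBUG)

# CLI output: info messages only (warnings and above are filtered out)
cli_handler = logging.StreamHandler(sys.stdout)
cli_handler.setLevel(logging.INFO)
cli_handler.addFilter(lambda record: record.levelno < logging.WARNING)

# debug file: all messages, with the detailed format
file_handler = logging.FileHandler(debug_path)
file_handler.setLevel(logging.DEBUG)
file_handler.setFormatter(logging.Formatter(
    "%(asctime)s - %(name)s - %(levelname)s - %(pathname)s:%(lineno)d - %(message)s"
))

# stderr: warning, error, or critical messages
stderr_handler = logging.StreamHandler(sys.stderr)
stderr_handler.setLevel(logging.WARNING)

for handler in (cli_handler, file_handler, stderr_handler):
    logger.addHandler(handler)

logger.debug("debug file only")
logger.info("CLI output and debug file")
logger.warning("stderr and debug file")
```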
By default, the logs are saved in the `/tmp/.gandlf` directory. If the `--log-file` parameter is passed to a CLI command, the logs are saved to that path instead.
Example of a log message:

```bash
# format: "%(asctime)s - %(name)s - %(levelname)s - %(pathname)s:%(lineno)d - %(message)s"
2024-07-03 13:05:51,642 - root - DEBUG - GaNDLF/GANDLF/entrypoints/anonymizer.py:28 - input_dir='.'
```
You can create and configure your own logger by updating the `GANDLF/logging_config.yaml` file.