Neural network for classifying rock clasts in computed tomography (CT) scans of sediment cores.
Launch in Binder (an interactive JupyterLab environment in the cloud).
To help out with development, start by cloning this repository:
git clone <repo-url>
Then I recommend using conda to install the dependencies. This also creates a conda virtual environment with Python and JupyterLab installed.
cd ctcorenet
conda env create --file=environment.yml --solver=libmamba
Next, activate the virtual environment:
conda activate ctcorenet
Finally, double-check that the libraries have been installed:
conda list
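As an extra check, you can confirm from Python that the deep learning stack imports cleanly. A minimal sketch, assuming the environment provides PyTorch and PyTorch Lightning (as the trainer flags further below suggest):

```python
# check_env.py - hypothetical sanity check, not part of the repo
import torch
import pytorch_lightning as pl

print("torch:", torch.__version__)
print("pytorch_lightning:", pl.__version__)
print("CUDA available:", torch.cuda.is_available())
```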
Images are manually labeled by drawing polygons around objects of interest, e.g. ice rafted debris (IRD) rock clasts. The polygons are drawn using the labelme GUI program, which is launched from the command line using:
labelme
Follow the single image annotation tutorial at https://github.com/wkentaro/labelme/tree/master/examples/tutorial to draw the polygons. The polygons are stored in a JSON file and can be converted to image masks using:
labelme_json_to_dataset data/CT_data/CORE_42.json -o data/train/CORE_42
This will produce a folder named CORE_42 containing 4 files:
- img.png
- label.png
- label_names.txt
- label_viz.png
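When there are many annotated cores, converting each JSON file by hand gets tedious, so a small helper loop can batch the conversion. A minimal sketch, assuming the labelme tools are on your PATH and the annotations live in data/CT_data (the file layout mirrors the example above):

```python
# batch_convert.py - hypothetical helper, not part of the repo
import subprocess
from pathlib import Path

for json_file in sorted(Path("data/CT_data").glob("*.json")):
    # e.g. data/CT_data/CORE_42.json -> data/train/CORE_42
    out_dir = Path("data/train") / json_file.stem
    out_dir.mkdir(parents=True, exist_ok=True)
    # Delegate to labelme's own converter, one file at a time
    subprocess.run(
        ["labelme_json_to_dataset", str(json_file), "-o", str(out_dir)],
        check=True,
    )
```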
The model can be trained using the following command:
python ctcorenet/ctcoreunet.py
This will load the image data stored in data/train, perform the training (minimizing the loss between img.png and label.png), and produce some outputs.
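For orientation, the training follows the usual semantic segmentation pattern: the network predicts a mask from img.png, and a pixel-wise loss against label.png is minimized. A minimal sketch of such a training step as a PyTorch Lightning module (the single-layer network here is an illustrative stand-in, not the repo's actual U-Net):

```python
import torch
import torch.nn.functional as F
import pytorch_lightning as pl

class ToySegmenter(pl.LightningModule):
    """Illustrative stand-in for the real U-Net in ctcoreunet.py."""

    def __init__(self):
        super().__init__()
        # One conv layer in place of a full encoder/decoder stack
        self.net = torch.nn.Conv2d(in_channels=1, out_channels=2,
                                   kernel_size=3, padding=1)

    def training_step(self, batch, batch_idx):
        img, mask = batch  # tensors built from img.png and label.png
        logits = self.net(img)  # per-pixel class scores
        # Pixel-wise loss between prediction and the drawn polygon mask
        loss = F.cross_entropy(logits, mask)
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)
```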
More advanced users can customize the training, e.g. making it more deterministic, running for only a few epochs, or training on a CUDA GPU with 16-bit precision, like so:
python ctcorenet/ctcoreunet.py --deterministic=True --max_epochs=3 --accelerator=gpu --devices=1 --precision=16
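Under the hood, these flags correspond to PyTorch Lightning Trainer arguments; a rough sketch of the equivalent setup in code, assuming the script forwards its command-line flags straight to the Trainer:

```python
import pytorch_lightning as pl

trainer = pl.Trainer(
    deterministic=True,  # reproducible runs
    max_epochs=3,        # stop after 3 epochs
    accelerator="gpu",   # train on a CUDA GPU
    devices=1,           # use a single device
    precision=16,        # 16-bit mixed precision
)
# trainer.fit(model, datamodule) would then launch the training
```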
More options to customize the training can be found by running:
python ctcorenet/ctcoreunet.py --help
To easily manage the whole machine learning workflow, this project uses the Data Version Control (DVC) library, which records all the commands and input/intermediate/output data assets used. This makes it easy to reproduce the entire pipeline with just two commands:
dvc pull
dvc repro
These commands will pull the input data assets and re-run all the data preparation and model training steps. For more information, see https://dvc.org/doc/start/data-pipelines.
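For reference, pipeline stages are typically registered with dvc stage add (or written directly into dvc.yaml); a hedged sketch of how the training stage might be declared, with the stage name, dependencies, and output chosen purely for illustration:

```bash
# Hypothetical stage declaration; names and paths are illustrative
dvc stage add -n train \
    -d data/train \
    -d ctcorenet/ctcoreunet.py \
    -o model.ckpt \
    python ctcorenet/ctcoreunet.py
```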