
Convection classification with machine learning using triplet trainer


leifdenby/convml-tt

 
 


Studying convective organisation with neural networks

This repository contains code to generate training data and to train and interpret the neural network used in L. Denby (2020), collected in a Python module called convml_tt. From version v0.7.0 it was rewritten to use pytorch-lightning rather than fastai v1, to adopt best practices and to make it easier to modify and carry out further research on the technique.

Getting started

To use the convml_tt codebase you will first need to install pytorch, which can most easily be done with conda (or use mamba, which is conda re-implemented in C++ and is orders of magnitude faster; once it is installed just replace conda with mamba in the commands below).

  1. Once conda is installed you can create a conda environment:
conda create -n convml-tt
conda activate convml-tt

Into this conda environment you then need to install pytorch. Depending on whether or not you have access to a GPU you will need to install different pytorch packages:

2a. For GPU-based training and inference:

conda install pytorch "torchvision>=0.4.0" pytorch-cuda -c pytorch -c nvidia

(to check that it is working you can run python -c 'import torch; print(torch.cuda.is_available())')

2b. For CPU-based training and inference:

conda install pytorch "torchvision>=0.4.0" cpuonly -c pytorch

  3. With the environment set up and pytorch installed you can now install convml-tt directly from pypi using pip. (If you are planning on modifying the convml-tt functionality you will want to download the convml-tt source code and install from a local copy instead of from pypi. See the development instructions for more details.)
python -m pip install convml-tt

You will now have convml-tt available whenever you activate the convml-tt conda environment. The base components of convml-tt are installed, which enable training the model on an existing triplet dataset and making predictions with a trained model. Functionality to create training data is contained in a separate package called convml-data.
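
To confirm that the steps above worked, a quick check from Python can help. The following is a minimal sketch rather than part of convml-tt: the function name describe_environment is made up for illustration, and it assumes only that the importable module names are torch and convml_tt, as used elsewhere in this README.

```python
import importlib.util


def describe_environment():
    """Report which packages are importable and whether CUDA can be used.

    Hypothetical helper for illustration; not part of convml-tt.
    """
    report = {
        "torch": importlib.util.find_spec("torch") is not None,
        "convml_tt": importlib.util.find_spec("convml_tt") is not None,
        "cuda": False,
    }
    if report["torch"]:
        # imported lazily so the check also runs when pytorch is missing
        import torch

        report["cuda"] = bool(torch.cuda.is_available())
    return report


report = describe_environment()
print(report)
```

If "cuda" is reported as False after a GPU install, the CPU-only pytorch build may have been picked up instead.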

Training

Below are details on how to obtain training data and how to train the model.

Training data

Example dataset

A few example training datasets can be downloaded using the following command:

python -m convml_tt.data.examples

Model training

You can use the CLI (Command Line Interface) to train the model:

python -m convml_tt.trainer data_dir

where data_dir is the path of the dataset you want to use. There are a number of optional command-line flags, for example to train with one GPU or, with --log-to-wandb, to log the training process to Weights & Biases. For a list of all the available flags use -h.
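
If you want to launch training from a script (for example to loop over several datasets), the CLI call can be wrapped with subprocess. This is a sketch under assumptions: build_train_command and run_training are hypothetical helper names, the dataset path is a placeholder, and only the module name and the --log-to-wandb flag mentioned above are used.

```python
import subprocess
import sys


def build_train_command(data_dir, log_to_wandb=False):
    """Build the argument list for `python -m convml_tt.trainer data_dir`."""
    cmd = [sys.executable, "-m", "convml_tt.trainer", str(data_dir)]
    if log_to_wandb:
        cmd.append("--log-to-wandb")
    return cmd


def run_training(data_dir, log_to_wandb=False):
    """Launch training; raises CalledProcessError if the trainer exits non-zero."""
    return subprocess.run(build_train_command(data_dir, log_to_wandb), check=True)


# "data/my_dataset" is a placeholder path, not a directory shipped with convml-tt
print(build_train_command("data/my_dataset", log_to_wandb=True))
```

Calling run_training("data/my_dataset") would then start the same process as the command shown above, using the Python interpreter of the active environment.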

Training can also be done interactively, for example in a jupyter notebook. You can see some simple examples of which commands to use by looking at the automated tests in tests/.

Finally, there are detailed notes on how to train on the ARC3 HPC cluster at University of Leeds in doc/README.ARC3.md, on the JASMIN analysis cluster and on Google Colab.

Model interpretation

There are currently two types of plot that I use for interpreting the embeddings that the model produces: a dendrogram with example tiles plotted for each class at the leaf nodes, and a scatter plot of two embedding dimensions annotated with example tiles so the actual tiles can be seen.

There is an example of how to make these plots and how to easily generate an embedding (or encoding) vector for each example tile in example_notebooks/model_interpretation. Again this notebook expects the directory layout mentioned above.
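
To illustrate what sits behind the dendrogram, the sketch below builds the merge history of a hierarchical clustering from a handful of embedding vectors. It is not the code used in the notebook: the single-linkage rule, the function names and the 2D example vectors are all assumptions chosen to keep the example dependency-free. In practice a library routine such as scipy.cluster.hierarchy.linkage would typically be used instead.

```python
import math
from itertools import combinations


def cluster_distance(embeddings, a, b):
    """Single linkage: the smallest distance between any two members."""
    return min(math.dist(embeddings[i], embeddings[j]) for i in a for j in b)


def single_linkage_merges(embeddings):
    """Return the merge history [(members_a, members_b, distance), ...]."""
    # start with every embedding in its own cluster (a tuple of indices)
    clusters = [(i,) for i in range(len(embeddings))]
    merges = []
    while len(clusters) > 1:
        # find the closest pair of clusters and merge them
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: cluster_distance(embeddings, *pair),
        )
        merges.append((a, b, cluster_distance(embeddings, a, b)))
        clusters = [c for c in clusters if c not in (a, b)]
        clusters.append(tuple(sorted(a + b)))
    return merges


# made-up 2D "embeddings": two tight pairs that are far from each other
example_embeddings = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.0, 5.2)]
merges = single_linkage_merges(example_embeddings)
for a, b, dist in merges:
    print(a, b, round(dist, 3))
```

Each entry in the merge history corresponds to one join in the dendrogram, with the distance giving the height at which the two branches meet.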
