Arno Solin (Doctoral student at Aalto University, instructor Dr. Simo Särkkä)
The goal of the competition was to automatically diagnose subjects with schizophrenia based on multimodal features derived from their magnetic resonance imaging (MRI) brain scans. The winning entry was based on a Gaussian process (GP) classifier, where the observations are considered to be drawn from a Bernoulli distribution. The class probability is related to the latent function via a sigmoid function that maps the latent value to the unit interval. A GP prior was placed over the latent function, with a covariance function formed as the sum of a constant, a linear, and a Matérn kernel. The model was trained by sampling using the GPstuff toolbox.
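In symbols, the model described above can be sketched as follows. The source does not state which sigmoid (logit or probit) or which Matérn smoothness was used, so $\sigma$ and $k_{\mathrm{Mat}}$ are left generic here. The zero prior mean and the label coding $y_i \in \{0, 1\}$ are notational assumptions.

```latex
\begin{align}
y_i \mid f(\mathbf{x}_i) &\sim \mathrm{Bernoulli}\big(\sigma(f(\mathbf{x}_i))\big),
  \qquad \sigma : \mathbb{R} \to (0, 1), \\
f(\mathbf{x}) &\sim \mathcal{GP}\big(0,\; k(\mathbf{x}, \mathbf{x}')\big), \\
k(\mathbf{x}, \mathbf{x}') &= k_{\mathrm{const}}(\mathbf{x}, \mathbf{x}')
  + k_{\mathrm{lin}}(\mathbf{x}, \mathbf{x}')
  + k_{\mathrm{Mat}}(\mathbf{x}, \mathbf{x}').
\end{align}
```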
For more details, see the model documentation report.
This solution builds heavily upon the GPstuff toolbox for Mathworks Matlab (or Octave), our in-house software package for Gaussian process modeling. All code was tested with Matlab 8.2.0.701 (R2013b) and GPstuff version 4.5 (release 2014-07-22, available online and distributed under the GNU General Public License) on Ubuntu Linux.
All files are written in Mathworks Matlab, and running the scripts requires installing the GPstuff toolbox. The following files are provided:
- settings.m (Matlab)
  - Specifies the paths to the training data (TRAIN_DATA_PATH), test data (TEST_DATA_PATH), model (MODEL_PATH), and submission output (SUBMISSION_PATH) directories. This is the only place that specifies the paths to these directories.
  - The GPstuff toolbox is added to the Matlab path with appropriate initializations.
- train.m (Matlab)
  - Read training data from TRAIN_DATA_PATH (specified in settings.m).
  - Do the normalization steps.
  - Set up and train the GP classifier (note that the random number generator seed is not specified).
  - Save the model under MODEL_PATH (specified in settings.m).
- predict.m (Matlab)
  - Read the training and test data from TRAIN_DATA_PATH and TEST_DATA_PATH, and do the normalization steps.
  - Load the model from MODEL_PATH.
  - Use the model to make predictions on new samples.
  - Save the predictions to SUBMISSION_PATH.
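The train and predict stages listed above could look roughly like the following in GPstuff. This is a minimal sketch, not the contents of train.m or predict.m. The synthetic data, the probit likelihood, the Matérn 5/2 kernel, and the sampler settings are all assumptions chosen for illustration, and the unspecified normalization steps are omitted. The GPstuff calls follow the pattern used in the toolbox's own classification demo.

```matlab
% Minimal sketch of a GP classifier in GPstuff (not the authors' train.m).
% Assumes GPstuff is already on the Matlab path. The data, likelihood,
% Matern variant, and sampler settings below are illustrative assumptions.

% Synthetic stand-in data: n-by-d features, labels coded as -1/+1
x  = randn(60, 5);
y  = sign(x(:, 1) + 0.5 * randn(60, 1));
y(y == 0) = 1;
xt = randn(10, 5);                 % new samples to classify

% Likelihood: Bernoulli observations through a sigmoid link (probit assumed)
lik = lik_probit();

% Covariance: sum of constant, linear, and Matern (nu = 5/2 assumed) terms
cf = {gpcf_constant(), gpcf_linear(), gpcf_matern52()};

% GP model with MCMC inference
gp = gp_set('lik', lik, 'cf', cf, ...
            'latent_method', 'MCMC', 'jitterSigma2', 1e-6);

% Draw posterior samples, then discard burn-in and thin the chain.
% No random seed is set, so repeated runs give different samples.
rgp = gp_mc(gp, x, y, 'nsamples', 220, 'display', 20);
rgp = thin(rgp, 21, 2);

% Predictive probability of class +1 for the new samples
[~, ~, lpyt] = gp_pred(rgp, x, y, xt, 'yt', ones(size(xt, 1), 1));
p = exp(lpyt);
```

Because the result depends on the sampled chain, saving the sampled model record (here rgp) to disk is what makes later predictions repeatable.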
The following steps replicate the model training procedure. Because the random seed is not fixed, retraining draws different samples; to use exactly the same samples, skip training, load the serialized model from disk, and run only the prediction in predict.m.
- Download and unpack the GPstuff toolbox. Additional speedup can be gained by compiling the MEX files (see the toolbox documentation, or just run matlab_install.m).
- Modify settings.m to set the paths, and run it in Matlab.
- Run train.m in Matlab to train the GP classifier (note that the random seed is not fixed). The model is saved under the path specified in settings.m.
- Run predict.m in Matlab to predict using the GP classifier. The model output is stored under the path specified in settings.m.
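As a rough illustration of the path-setting step, a settings file could look like the sketch below. The four variable names and the ./model/ and ./submission/ directories come from this document. The data directories and GPSTUFF_PATH are hypothetical placeholders to be replaced with local paths.

```matlab
% Sketch of a path settings file (placeholder values, not the shipped file).

% Data locations (hypothetical directories; adjust to the local setup)
TRAIN_DATA_PATH = './data/train/';
TEST_DATA_PATH  = './data/test/';

% Output locations (directory names as used in this repository)
MODEL_PATH      = './model/';
SUBMISSION_PATH = './submission/';

% Add the unpacked GPstuff toolbox and its subfolders to the Matlab path.
% GPSTUFF_PATH is a hypothetical variable name introduced for this sketch.
GPSTUFF_PATH = './gpstuff/';
addpath(genpath(GPSTUFF_PATH));
```

With the paths set, the training script and then the prediction script are run from the same Matlab session.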
The winning model (serialized and saved) and the submission CSV file are stored under ./model/ and ./submission/, respectively.
This software is distributed under the GNU General Public License (version 3 or later); please refer to the file LICENSE.txt, included with the software, for details.