README.Rmd

---
output: github_document
---

<!-- README.md is generated from README.Rmd. Please edit that file -->

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```

# fslmer

<!-- badges: start -->
<!-- badges: end -->

## Overview

The `fslmer` package provides univariate and mass-univariate linear mixed-effects analysis for FreeSurfer imaging data. It is a port of Freesurfer's Matlab-based LME tools to the R programming language.

Please refer to the original documentation at https://surfer.nmr.mgh.harvard.edu/fswiki/LongitudinalStatistics and https://surfer.nmr.mgh.harvard.edu/fswiki/LinearMixedEffectsModels for an overview and further information about the software. For the original code, see the original repository at https://github.com/NeuroStats/lme.

If you use these tools in your analysis please cite:

- Bernal-Rusiel J.L., Greve D.N., Reuter M., Fischl B., Sabuncu M.R., 2012. Statistical Analysis of Longitudinal Neuroimage Data with Linear Mixed Effects Models, NeuroImage 66, 249-260, 2012, https://dx.doi.org/10.1016%2Fj.neuroimage.2012.10.065

- Bernal-Rusiel J.L., Greve D.N., Reuter M., Fischl B., Sabuncu M.R., 2013. Spatiotemporal Linear Mixed Effects Modeling for the Mass-univariate Analysis of Longitudinal Neuroimage Data, NeuroImage 81, 358–370, 2013, https://doi.org/10.1016/j.neuroimage.2013.05.049

- Reuter M., Schmansky N.J., Rosas H.D., Fischl B, 2012.Within-Subject Template Estimation for Unbiased Longitudinal Image Analysis, NeuroImage 61, 1402-1418, 2012, http://dx.doi.org/10.1016/j.neuroimage.2012.02.084

## Installation

You can install fslmer from [GitHub](https://github.com/) with:

```R
# install.packages("devtools") # run if necessary
devtools::install_github("Deep-MI/fslmer", build_vignettes=TRUE)
```
## Running the program

There are two types of analyses that can be done: univariate and mass-univariate. The univariate analysis is, in principle, similar to a classical mixed-effects analysis as implemented in the `lme4` or `nlme` packages. The mass-univariate analysis is specifically tailored for the surface-based vertex data from a preceding run of FreeSurfer's longitudinal analysis pipeline.

### Univariate analyses

- Loading the data

Usually you should already have a longitudinal Qdec table, which contains the subject IDs, image/observation/session IDs, time, and demographics (see https://surfer.nmr.mgh.harvard.edu/fswiki/LongitudinalStatistics for a description of the format). Using Freesurfer's `asegstats2table` and/or `aparcstats2table` tools, you can extract the volume and/or thickness estimates from the longitudinally processed data in your subjects directory:

```
asegstats2table --qdec-long PATH_TO_QDEC_TABLE/qdec.table.dat -t PATH_TO_ANALYSIS_DIRECTORY/aseg.long.table
```

- Preparing the data

Your longitudinal Qdec table needs to contain the "fsid-base" (subject ID) columns and "fsid" (image/observation/session ID), and should contain an additional column indicating time from baseline or the sequence of observations. Additional columns such as diagnoses, demographics, or covariates may also be present. For the current example, we assume that the timing info is stored in a numerical variable called "time", indicating years from baseline, and that a categorical variable "DX" indicates a diagnosis at baseline. You can used other names and/or additional variables in your analysis, but the example code has to be adapted accordingly.

After loading the data, some further preparations are needed, including merging and sorting the data. In particular, the `fslmer` tools require the data ordered according to time for each individual (that is, your design matrix needs to have all the repeated assessments for the first subject, then all for the second and so on). 

```R
# load the package
library(fslmer)

# Load aseg/aparc and qdec tables into R:
aseg <- read.table("PATH_TO_DATA/aseg.long.table", header=True)
qdec <- read.table("PATH_TO_QDEC_TABLE/qdec.table.dat", header=True)

# Read the qdec and aseg tables; note that R replaces '-' (and other characters) 
# in variable names with '.'.
qdec <- read.csv("adni-180-long-fs53-new.qdec")
aseg <- read.csv("adni-180-long-fs53-new-aseg.csv")

# For the aseg table, split the longitudinal ID into fsid and fsid.base.
aseg$fsid <- sub("\\.long\\..*", "", aseg$Measure.volume)
aseg$fsid.base <- sub(".*\\.long\\.", "", aseg$Measure.volume)

# Merge qdec and aseg tables based on fsid and fsid.base variables; create new 
# table 'dat'.
dat <- merge(qdec, aseg, by=c("fsid.base", "fsid"))

# Sort the by 'fsid.base' and then by 'time'. 
dat <- dat[order(dat$fsid.base, dat$Time.From.Baseline), ]

# Extract the structure of interest (here: volume of left hippocampus; can be 
# changed) and store it in a new variable 'Y'. The variable needs to be column 
# vector (hence the 'matrix' command).
Y <- matrix(dat$Left.Hippocampus, ncol=1)

# As an auxiliary variable, create a vector of number of observations per each 
# subject, i.e. count the number of occurances for each fsid.base. 
ni <- matrix(unname(table(dat$fsid.base)), ncol=1)
```

- Creating the design matrix and contrasts

Once you have your ordered data, you need to build your design matrix. As an example, a simple linear model containing a group by time interaction can be obtained with the following design matrix:

```R
X <- model.matrix(~time*DX, dat)
```

Let us assume that the categorical variable `DX` has two levels, patients and controls. Then the model matrix `X` will have four columns: one for the intercept, a time variable, a binary group indicator created from 'DX', and an interaction term between 'DX' and 'time'. The contrast `[0 0 0 1]`, which is a row vector with four elements, can then be used to test the interaction between `DX` and `time`, which indicates diverging slopes of thickness changes across time for the two groups. Instead of a contrast vector, it also possible to specify contrast matrices that have more than one row. Note that the number of columns of the contrast vector or matrix always needs to correspond to the number of columns in the model matrix.

```R
C <- matrix(c(0, 0, 0, 1), nrow=1)
```

- Estimating the model and conducting statistical inference

Estimate the model by using the `lme_fit_FS` function: The arguments `X`, `Y`, and `ni` have been defined before, and `Zcols` is a vector that indicates which terms of the model / columns of the design matrix should be regarded as random effects: assuming that the Intercept is the first column and time is the second column in `X`, use `Zcols=1` for a random-intercept model, and `Zcols=c(1, 2)` for a random-intercept-and-slope model.

```R
stats <- lme_fit_FS(X, Zcols, Y, ni)
```

The function returns a list representing a model fit, with entries `Bhat`, `CovBhat`, and `bihat`, among others; `Bhat` contains the beta values, `CovBhat` is the error covariance matrix, and `bihat` contains the random-effects coefficients.

Finally, conduct an F-test using the `lme_F` function, which takes the model fit and the contrast vector / matrix as inputs:

```R
F_C <- lme_F(stats, C)
```

This will return a list with entries `F`, `pval`, `sgn`, and `df`: `F` is the F-value, `pval` is the p-value, `sgn` is the sign of the beta coefficient, and `df` are the degrees of freedom.

### Mass-univariate analyses

The mass-univariate analysis is run per hemisphere ("lh" and "rh"). Here and in the following, we only describe the analysis for the left hemisphere, which needs to done also for the right hemisphere.

Start with FreeSurfer's `mris_preproc` comamand, which uses the longitudinal qdec table to automatically find the longitudinally processed data and assembles it into a single `lh.thickness.mgh` file. Note that it is possible to use a different study template than the standard fsaverage template, and that other measures than thickness can be used as well (see the help for mris_preproc).

```
mris_preproc --qdec-long PATH_TO_QDEC_TABLE/qdec.table.dat --target fsaverage --hemi lh --meas thickness --out lh.thickness.mgh
```

The next step is to smooth the data; here we use a 10 mm FWHM kernel. The resulting file will be `lh.thickness_sm10.mgh`.

```
mri_surf2surf --hemi lh --s fsaverage --sval lh.thickness.mgh --tval lh.thickness_sm10.mgh --fwhm-trg 10 --cortex --noreshape
```

- Loading the data

The `lh.thickness_sm10.mgh` file as well as a set of associated files will next be read into R.

```R
# load the package
library(fslmer)

# read thickness file
lh.thickness <- lme_openmgh("/PATH/TO/lh.thickness_sm10.mgh")

# read template surface
lh.sphere <- lme_readsurf("/PATH/TO/FREESURFER/DIRECTORY/subjects/fsaverage/surf/lh.sphere")

# read cortical label file
lh.cortex <- lme_readlabel("/PATH/TO/FREESURFER/DIRECTORY/subjects/fsaverage/label/lh.cortex.label")[,1]
```

- Preparing the data

The data is prepared in a similar way as for the univariate analysis. We just repeat the code here with minimal comments, see above for an explanation.

```R
# read qdec table
dat <- read.csv("PATH_TO_QDEC_TABLE/qdec.table.dat")

# order qdec table based on fsid.base and time
dat <- dat[order(dat$fsid.base, dat$time), ]

# create vector of number of observations per subject
ni <- matrix(unname(table(dat$fsid.base)), ncol=1)

# create nSubjects-times-nVertices matrix for thickness data
Y <- t(drop(lh.thickness$x))

# create (1-based) vector of cortical label indices (to exclude non-cortical vertices)
maskvtx <- sort(lh.cortex)+1
```

- Creating the design matrix and contrasts

Also the design matrix and the contrasts are constructed in the same way as for the univariate analysis. Again, we just repeat the code here.

```R
# create model matrix
X <- model.matrix(~time*DX, dat)

# create contrasts
C <- matrix(c(0, 0, 0, 1), nrow=1)

# determine the type of mixed-effects model (assuming the intercept is column 1 
# in the design matrix and the time variable is column 2).
#Zcols <- 1 # random-intercept 
Zcols <- c(1, 2) # random-slope, random-intercept
```

- Estimate the model

There two ways to estimate the model in the mass-univariate analysis. The first one is a simple vertex-wise analysis, and the second one is a novel analysis with a spatiotemporal model for the inherent dependencies (covariances) in the data. We recommend this second approach, because spatiotemporal models are more powerful to detect effects in your data than traditional vertex-wise models when two or more random effects are included in the longitudinal statistical model. Fitting these models usually requires less computation time than the above vertex-wise mass-univariate tools. 

The simple vertex-wise analysis can be run using the `lme_mass_fit_vw` function, which returns a list (`stats`) of statistical estimates (beta weights, their covariances, and others) that can be submitted to statistical inference subsequently.

```R
stats <- lme_mass_fit_vw(X, Zcols, Y, ni, numcore=6)
```

As an alternative to the above, run the code for the novel spatiotemporal analysis: here you should first compute initial temporal covariance component estimates using the `lme_mass_fit_init` function. These estimates can then be used to segment the brain into homogeneous regions of vertices with similar covariance parameters by using the `lme_mass_RgGrow` function. The spatiotemporal model can then be fitted by using the `lme_mass_fit_Rgw` function together with the previous segmentation and initial covariance estimates. The last function (`lme_mass_fit_Rgw`) returns a list (`FitRgw$stats`) of statistical estimates (beta weights, their covariances, and others) that can be submitted to statistical inference subsequently.

```R
# obtain initial estimates
FitInit <- lme_mass_fit_init(X=X, Zcols=Zcols, Y=Y, ni=ni, maskvtx=maskvtx, numcore=6)

# run algorithm to identify spatially homogeneous regions
RgGrow <- lme_mass_RgGrow(lh.sphere, FitInit$Re0, FitInit$Theta0, maskvtx=maskvtx, nst=2, prc=95)

# fit model
FitRgw <- lme_mass_fit_Rgw(X, Zcols, Y, ni, FitInit$Theta0, RgGrow$Regions, lh.sphere, prs=6)
```

For both the simple vertex-wise and the novel spatiotemporal analysis, it is advisable to run the analyses on machines where multiple cores are available, since these mass-univariate analyses can easily take multiple hours. Set the number of cores using `numcore` or `prs` options in the above functions.

As an optional step, estimate the random-effects coefficients per each vertex, for example the random intercept and random slope at each location. This is not required for further analysis, but might be interesting nevertheless. Note that this can take a long time (i.e. a few hours) if using it on a single core. To run it in parallel on multiple cores on your system, set `prs` to a number higher than one.

```R
rfx <- lme_mass_rfx(FitRgw$stats, X, Zcols, Y, ni, maskvtx, prs=1)
```

The `lme_mass_rfx` function returns the subject-specific random effects estimates at each vertex. The output is a list of lists, with the following entries:

* `Rfx`: Estimated subject-especific random effects matrix `(number of cases,  number of ranom effects * number of vertices)`. The columns of this matrix are grouped by vertex: for example, if there are two random effects in the model, then the first two columns contain the subject-specific random effect coefficients for the first vertex, then the next two columns contain the subject-specific random effect coefficients for the second vertex, etc.
* `Bhat`: Population-level regression coefficients corresponding to the random effects (same as in `FitRgw$stats`), stacked in one matrix.
* `nrfx`: Number of random effects.

We illustrate the prediction of individual values from model estimates in a [separate document](Examples.md).

- Conduct statistical inference

Inference consists of two parts, the calculation of F-values and uncorrected p-values, and the correction for multiple comparisons. The `lme_mass_F` computes, for every vertex, F- and p-values, and also returns the sign of the contrast to get a directional interpretation of the F-value (i.e., positive or negative effect). All these values are contained in `F_C$F`, `F_C$pval`, and `F_C$sgn`, respectively. Repeat this analysis multiple times if you have multiple contrasts.

```R
# inference
F_C <- lme_mass_F(FitRgw$stats, C)
```

Multiple comparison correction can be done via classical FDR thresholding or via a two-stage FDR procedure, an extension of the classical FDR procedure with higher power to detect significant effects. For classical FDR thresholding, we submit the output of the `lme_mass_F` function to the `lme_mass_FDR` function. This gives an updated threshold for uncorrected p-values that keeps the FDR. For the two-stage procedure, we submit the output of the `lme_mass_F` function to the `lme_mass_FDR2` function. This gives a list (`FDR2_C`) of corrected p-values (`FDR2_C$sided_pval`), indices of vertices surviving the correction (`FDR2_C$detvtx`), and the new thereshold (`FDR2_C$pth`), among others. Also this analysis needs to be run multiple times if you have multiple contrasts.

```R
# multiple comparison correction using classical FDR
thr_pFDR  <- lme_mass_FDR(F_C$pval, 0.05)

# multiple comparison correction using two-stage approach
FDR2_C <- lme_mass_FDR2(F_C$pval, F_C$sgn)
```

- Export the results for visualization with Freeview

The last step in this analysis is to export the results for visualization with FreeSurfer's Freeview program. Three kinds of parameters are typically of interest: the statistical parameters (here: F-values) and the corrected or uncorrected p-values. We use the `lme_savemgh` function for exporting the data, which can then be overlaid onto the corresponding (lh or rh) surface of the template brain that was used for the analysis (here: `fsaverage`).

Exporting the data requires two things: first, creating an object (here: `vol`) that contains the data and some information about the image dimensions. Since we are constructing surface overlay files, the dimensions will be `(nVertices, 1, 1, 1)`. Second, the overlay data need to be extracted from the outputs of the various functions; how this is done for each type of parameters is illustrated below.

```R
# save F-values
vol <- NULL
vol$ndim1 <- length(F_C$F)
vol$ndim2 <- 1
vol$ndim3 <- 1
vol$nframes <- 1
vol$x <- array(data=F_C$F, dim=c(length(F_C$F), 1, 1, 1))
lme_savemgh(vol=vol, fname=OUTPUT_FILE_NAME)

# save uncorrected p-values
vol <- NULL
vol$ndim1 <- length(F_C$pval)
vol$ndim2 <- 1
vol$ndim3 <- 1
vol$nframes <- 1
vol$x <- array(data=F_C$pval, dim=c(length(F_C$pval), 1, 1, 1))
lme_savemgh(vol=vol, fname=OUTPUT_FILE_NAME)

# save FDR2-corrected p-values
vol <- NULL
vol$ndim1 <- length(FDR2_C$sided_pval)
vol$ndim2 <- 1
vol$ndim3 <- 1
vol$nframes <- 1
vol$x <- array(data=FDR2_C$sided_pval, dim=c(length(FDR2_C$sided_pval), 1, 1, 1))
lme_savemgh(vol=vol, fname=OUTPUT_FILE_NAME)
```
- Further analysis

For further analysis such as extracting clusters and getting cluster statistics, we recommend using FreeSurfer's `mri_surfcluster` program.

## Updates and bug-fixes

In version 0.0.0.9002, a bug was fixed in the lme_mass_fit_Rgw() function that resulted in an incorrect assignment of the model estimates to the `stats` output structure in situations where the model could not be estimated and returned a NULL value. As a consequence, the ordering of vertices was not reliable, and anatomical inferences should be drawn with caution. It is recommended to re-run the updated lme_mass_fit_Rgw() version (version 0.0.0.9002 or newer).