Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EM Efficiency vs. INLA #27

Open
smeisler opened this issue Jul 20, 2022 · 9 comments
Open

EM Efficiency vs. INLA #27

smeisler opened this issue Jul 20, 2022 · 9 comments

Comments

@smeisler
Copy link
Contributor

smeisler commented Jul 20, 2022

Hi,

I know the benchmarking in the BayesfMRI paper used the INLA implementation, but I have been testing the EM version (2.0 latest commit) and have been getting very long run times compared to the published benchmarks.

I am running on a CentOs7 HPC, using 32 CPUs / 100GB RAM. A single hemisphere took ~18 hours to process for a single HPC subject resampled to 10000 vertices. Max memory usage was around 28GB.

My run is pretty much identical to the vignette with the exception that I am running on the HCP working memory N-back task, and thus have 8 conditions of interest (4 conditions for 2-back and 0-back each).

Is this to be expected, and are there any ways I can speed up the run? For example, if I only ran two conditions (pool all the 2-back and 0-back conditions together), would I expect to see speed increases? I would prefer not to resample further, and I imagine I will get diminishing returns on adding CPUs.

The results look great, by the way! Plotted below is a single subject 2-back vs 0-back, plotted against the HCP group level analysis. Thanks again for all of your work in developing this!
image
image

Best,
Steven

@mandymejia
Copy link
Owner

mandymejia commented Jul 21, 2022 via email

@mandymejia
Copy link
Owner

mandymejia commented Jul 21, 2022 via email

@smeisler
Copy link
Contributor Author

Thanks for the suggestions! Definitely no rush to address this, and I will try pooling across conditions and testing a different subject as well.

@danieladamspencer
Copy link
Collaborator

Hey @smeisler , just checking in to see if you saw similar runtimes with other subjects. Did you try pooling across conditions? Happy to help now that I'm back!

@smeisler
Copy link
Contributor Author

smeisler commented Aug 2, 2022

Hi, I have not tried this yet, but I will update when I do.

@smeisler
Copy link
Contributor Author

smeisler commented Aug 20, 2022

Getting back to this, I retried INLA after grouping together the 0- and 2-back conditions and changing variable names (see #28), and the program crashes with the following error message (on both HCP subjects I tested):

SETTING UP DATA 
.. reading in data for session 1
.. reading in data for session 2
MAKING DESIGN MATRICES 
RUNNING MODELS 

.. LEFT CORTEX ANALYSIS 
    848 locations removed due to NA or NaN values in at least one scan.
.... prewhitening... done!

.... estimating model with INLAError in INLA::f(ZeroBack_HRF, model = spde, replicate = repl1, hyper = list(theta = list(initial = c(-2,  : 
 object 'spde' not found

*** inla.core.safe:  inla.program has crashed: rerun to get better initial values. try=1/2 
Error in INLA::f(ZeroBack_HRF, model = spde, replicate = repl1, hyper = list(theta = list(initial = c(-2,  : 
 object 'spde' not found

*** inla.core.safe:  inla.program has crashed: rerun to get better initial values. try=2/2 
Error in INLA::f(ZeroBack_HRF, model = spde, replicate = repl1, hyper = list(theta = list(initial = c(-2,  : 
 object 'spde' not found
Error in inla.core.safe(formula = formula, family = family, contrasts = contrasts,  : 
 *** Fail to get good enough initial values. Maybe it is due to something else.
Calls: BayesGLM_cifti -> BayesGLM -> <Anonymous> -> inla.core.safe
Execution halted

Restarting with EM implementation and will update when that finishes.

@smeisler
Copy link
Contributor Author

smeisler commented Aug 20, 2022

The EM implementation was notably quicker. The run time was 58 and 67 minutes for the two subjects.

@danieladamspencer
Copy link
Collaborator

Hi @smeisler , do you have a reproducible example that I can use to try to figure this out? I am unable to reproduce this error. Glad to hear that the EM is working for you, though!

@smeisler
Copy link
Contributor Author

smeisler commented Aug 22, 2022

Yes, this is using HCP1200 data, very similar to the vignette, but using the working memory task and grouping 0- and 2-back conditions together. Just make sure to change the two path_to_XX variables upfront to match your file system. And, 'data_dir' should point to your HCP 1200 subjects directory. Please let me know if you need any help in understanding the code.

subject <-'100307' # subject name
outfile='SS_test_INLA_100307'
path_to_lic <- '/PATH/TO/pardiso.lic'
path_to_workbench <- '/PATH/TO/workbench'
data_dir <- '/PATH/TO/HCP1200'

knitr::opts_chunk$set(echo = TRUE)
options(warn = -1)
library(ciftiTools)
library(dplyr)
library(INLA)
library(BayesfMRI)
library(abind)
library(ggplot2)
library(IRkernel)
library(IRdisplay)
library(rgl)
library(gifti)
rgl.printRglwidget = TRUE

inla.setOption(pardiso.license = path_to_lic)
inla.pardiso.check()
wb_cmd <- file.path(path_to_workbench,'/bin_rh_linux64/wb_command')
ciftiTools::ciftiTools.setOption("wb_path", path_to_workbench)

task='WM'
TR=0.72
tasks <- c('0bk_body','0bk_tools','0bk_places','0bk_faces',
           '2bk_body','2bk_tools','2bk_places','2bk_faces') # Task event filenames
names_tasks <- c('ZeroBack', 'TwoBack')
K <- length(names_tasks)
fmri_acqs <- c('LR','RL')

# Load surfaces
dir_s <- file.path(data_dir, subject, 'T1w', 'fsaverage_LR32k')
fname_gifti_left <- file.path(dir_s, paste0(subject,'.L.midthickness_MSMAll.32k_fs_LR.surf.gii'))
fname_gifti_right <- file.path(dir_s, paste0(subject,'.R.midthickness_MSMAll.32k_fs_LR.surf.gii'))

# Get fMRI images and event info
for(i in 1:length(fmri_acqs)){
    acq = fmri_acqs[i]
    func_dir <- file.path(data_dir,subject, 'MNINonLinear', 'Results', paste('tfMRI',task,acq,sep="_"))
    assign(paste0('fname',i,'_ts'), file.path(func_dir,paste('tfMRI',task,acq,'Atlas_MSMAll.dtseries.nii',sep="_")))
    assign(paste0('motion',i),as.matrix(read.table(file.path(func_dir,'Movement_Regressors.txt'), header=FALSE)))
    onsets <- vector('list', length=K)
    names(onsets) <- names_tasks
    for(t in tasks){
        # Put all 0-back in task 1, 2-back in task 2
        if (grepl("0", t)){
        ind_t <- 1
        } else{
        ind_t <- 2 }

        fname_t <- file.path(func_dir,'EVs',paste0(t,'.txt'))
	row_to_add <- read.table(fname_t, header=FALSE)
	onsets[[ind_t]] <- rbind(onsets[[ind_t]], row_to_add)
    }
    assign(paste0('onsets',i), onsets)
}


thetas <- NULL # No starting values for precision parameters
results <- BayesGLM_cifti(
    cifti_fname = c(fname1_ts, fname2_ts),
    surfL_fname = fname_gifti_left,
    surfR_fname = fname_gifti_right,
    onsets = list(onsets1, onsets2),
    TR = TR,
    nuisance = list(motion1, motion2),
    session_names = fmri_acqs,
    resamp_res = 10000,
    num.threads =  parallel::detectCores() - 2,
    verbose = TRUE,
    outfile = outfile,
    avg_sessions = TRUE,
    EM = TRUE)
saveRDS(results, file = paste0(outfile,'.rds'))

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants