2. Data input and output in Python
Procedures to read fMRI data in Python
Example data are at:
- Perspective-taking fMRI: /space_lin2/fhlin/perspective
- Human connectome project: /space_lin1/hcp
The path of our current available Anaconda folder is:
/space_lin2/kaihua/anaconda3
Activating our Python environment is simple.
First, "source" the relevant profile; then check that the "conda" command is available by printing its version:
source /space_lin2/kaihua/.bashrc
conda --version
Next, choose which virtual environment you want to use. There are currently two main environments, (base) and (py36); activate one of them with the "conda activate" command, as shown below:
conda activate base
conda activate py36
Different environments contain different packages and versions, suited to different tasks. List the installed packages with:
conda list
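Once an environment is active, you can also confirm from within Python which interpreter and package versions it provides. A minimal sketch (it assumes NumPy is installed in the active environment, which is the case for typical scientific setups but is not stated in this document):

```python
import sys
import numpy as np  # assumption: numpy is installed in the active environment

print(sys.executable)        # path of the active environment's interpreter
print(sys.version_info[:2])  # e.g. (3, 6) when the py36 environment is active
print(np.__version__)        # version of numpy visible to this environment
```

This is a quick way to verify that "conda activate" actually switched interpreters.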
Here is an example of reading perspective-taking fMRI data in Python. Take the NIfTI file '/space_lin2/fhlin/perspective/subj_02/epi_data/unpack/bold/007/sfmcstc.nii' as an example.
import nibabel as nb
nii_img = nb.load("/space_lin2/fhlin/perspective/subj_02/epi_data/unpack/bold/007/sfmcstc.nii")
nii_data = nii_img.get_fdata() # shape: (72, 72, 42, 101)
affine = nii_img.affine # affine matrix, shape: (4, 4)
hdr = nii_img.header # header
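The 4-D array returned by get_fdata() indexes (x, y, z, time), so single-voxel time courses and time-averaged volumes fall out of plain NumPy indexing. A sketch using a random stand-in array of the same shape, since the file above is only reachable on the lab server (the voxel coordinates are arbitrary, chosen for illustration):

```python
import numpy as np

# Stand-in for nii_img.get_fdata(): a random array with the same
# (x, y, z, time) shape as sfmcstc.nii reported above.
nii_data = np.random.rand(72, 72, 42, 101)

# Time course of a single voxel (here the volume's center, arbitrarily):
voxel_ts = nii_data[36, 36, 21, :]   # shape: (101,)

# Mean over the time axis gives one 3-D volume:
mean_vol = nii_data.mean(axis=-1)    # shape: (72, 72, 42)
```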
import mne
stc = mne.read_source_estimate("/space_lin2/fhlin/perspective/subj_02/epi_data/unpack/f1/subj_02_2_fsaverage_sfmcstc-lh.stc") # <class 'mne.source_estimate.SourceEstimate'>
stc_data = stc.data # shape: (20484, 100), <class 'numpy.ndarray'>
stc_vertices = stc.vertices # shape: (2, 10242)
stc_tmin = stc.tmin # i.e., epoch_begin_latency=0.0
stc_tstep = stc.tstep # i.e., sample_period=2.0
stc_times = stc.times # shape: (100,), matching the time dimension of stc_data
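The times attribute is fully determined by tmin and tstep: sample n sits at tmin + n * tstep. A small sketch reconstructing it from the values printed above (tmin = 0.0, tstep = 2.0 s, with the sample count taken from the second dimension of stc_data):

```python
import numpy as np

# Rebuild the time axis from the epoch start latency and sample period.
tmin, tstep, n_times = 0.0, 2.0, 100  # values reported above; n_times = stc_data.shape[1]
times = tmin + tstep * np.arange(n_times)  # 0.0, 2.0, 4.0, ... seconds
```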
The resulting STC files can be read with this Python script (not yet complete):
...
Assume we have the fMRI data and its targets as NumPy arrays with the following shapes:
print(fmri_data.shape) # shape: (36, 17, 150); (samples, scenes, rois)
print(behavior_targets.shape) # shape: (36,)
Before applying standardization, we first reshape the original fMRI data from 3 dimensions to 2:
fmri_data_shape1 = list(fmri_data.shape) # [36, 17, 150]
fmri_data = fmri_data.reshape(fmri_data_shape1[0]*fmri_data_shape1[1], fmri_data_shape1[2]) # shape: (36, 17, 150) -> (612, 150)
Then we can apply standardization using scikit-learn's StandardScaler:
from sklearn.preprocessing import StandardScaler

standardization = True
if standardization:
    ss = StandardScaler()
    fmri_data = ss.fit_transform(fmri_data)  # fit and transform in one step
Similarly, we can apply PCA using scikit-learn's PCA class:
from sklearn.decomposition import PCA

use_pca = True
if use_pca:
    pca = PCA(n_components=0.85)  # retain 85% of the explained variance
    fmri_data = pca.fit_transform(fmri_data)  # shape: (612, 150) -> (612, 8)
After applying standardization or PCA, we need to restore the data shape from 2 dimensions to 3 dimensions:
fmri_data = fmri_data.reshape(fmri_data_shape1[0], fmri_data_shape1[1], -1) # shape: (612, 8) -> (36, 17, 8)
For temporal models such as LSTMs, the shape (36, 17, 8) can serve directly as the input. For conventional machine learning regression models, however, we still need to flatten the temporal (scenes) and spatial (ROIs after PCA) dimensions:
fmri_data_shape2 = list(fmri_data.shape) # [36, 17, 8]
fmri_data = fmri_data.reshape(fmri_data_shape2[0], fmri_data_shape2[1]*fmri_data_shape2[2]) # shape: (36, 17, 8) -> (36, 136)
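The reshape/standardize/PCA/reshape steps above can be sketched end to end. This runs on a random stand-in array of the same (samples, scenes, rois) shape, because the real data lives on the lab server; note that with random data PCA at 85% variance keeps far more than the 8 components reported above, since the component count depends on the actual data:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Synthetic stand-in with the same (samples, scenes, rois) shape as the real data.
rng = np.random.default_rng(0)
fmri_data = rng.standard_normal((36, 17, 150))

n_samples, n_scenes, n_rois = fmri_data.shape
fmri_data = fmri_data.reshape(n_samples * n_scenes, n_rois)  # (612, 150)

fmri_data = StandardScaler().fit_transform(fmri_data)        # z-score each feature

pca = PCA(n_components=0.85)                                 # retain 85% variance
fmri_data = pca.fit_transform(fmri_data)                     # (612, k)

# Restore 3-D shape, then flatten scenes x components for non-temporal models.
fmri_data = fmri_data.reshape(n_samples, n_scenes, -1)       # (36, 17, k)
flat = fmri_data.reshape(n_samples, -1)                      # (36, 17*k)
```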
Splitting the dataset can be done with scikit-learn's train_test_split function:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(fmri_data, behavior_targets, test_size=0.2, random_state=42) # selecting 20% samples as the testing set
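With the split in hand, any scikit-learn regressor can be fit on the training set and evaluated on the held-out set. A sketch using synthetic arrays of the shapes above and a Ridge regressor (the choice of Ridge is illustrative, not from this document; with test_size=0.2 on 36 samples, scikit-learn rounds up to 8 test samples):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge

# Hypothetical stand-ins matching the flattened shapes described above.
rng = np.random.default_rng(42)
fmri_data = rng.standard_normal((36, 136))
behavior_targets = rng.standard_normal(36)

X_train, X_test, y_train, y_test = train_test_split(
    fmri_data, behavior_targets, test_size=0.2, random_state=42)

model = Ridge(alpha=1.0).fit(X_train, y_train)  # example regressor, an assumption
preds = model.predict(X_test)                   # one prediction per test sample
```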