
2. Data input and output in Python


Procedures to read fMRI data in Python

Example data are at:

  1. Perspective-taking fMRI: /space_lin2/fhlin/perspective

  2. Human connectome project: /space_lin1/hcp

0. Python environment

Our current Anaconda installation is located at: /space_lin2/kaihua/anaconda3

Activating our Python environment is simple.

First, source the relevant profile; you can then check that the "conda" command is available by printing its version, as shown below:

source /space_lin2/kaihua/.bashrc
conda --version

Next, choose the virtual environment you want to use. There are currently two main environments, (base) and (py36), and you can activate either one with the "conda activate" command, as shown below:

conda activate base

conda activate py36

Each environment contains a different set of packages and versions to suit different tasks. Check the installed packages by entering:

conda list
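
If you want to confirm from inside Python which interpreter and packages the active environment provides, a quick check such as the following works (numpy is only an example package; the py36 environment is presumably Python 3.6):

import sys
import numpy as np

print(sys.executable)   # path of the Python interpreter in the active environment
print(sys.version)      # Python version of that environment
print(np.__version__)   # version of an example installed package (numpy)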

1. Read in-house fMRI data

Below are examples of reading the perspective-taking fMRI data in Python.

1.1 NII format

Take the NII file '/space_lin2/fhlin/perspective/subj_02/epi_data/unpack/bold/007/sfmcstc.nii' as an example.

import nibabel as nb
nii_img = nb.load("/space_lin2/fhlin/perspective/subj_02/epi_data/unpack/bold/007/sfmcstc.nii")

nii_data = nii_img.get_fdata()  # shape: (72, 72, 42, 101)
affine = nii_img.affine  # affine matrix, shape: (4, 4)
hdr = nii_img.header  # header
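
As a minimal sketch of working with the loaded volume (the voxel index and output file name below are arbitrary and only for illustration), you can slice out a single voxel's time series or write a volume back to disk with nibabel:

import numpy as np

voxel_ts = nii_data[36, 36, 21, :]  # time series of one voxel; the last axis (101 volumes) is time

new_img = nb.Nifti1Image(nii_data.astype(np.float32), affine, hdr)  # rebuild an image, reusing the affine and header
nb.save(new_img, "sfmcstc_copy.nii")  # write it out; the file name here is just an example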

1.2 STC format

import mne
stc = mne.read_source_estimate("/space_lin2/fhlin/perspective/subj_02/epi_data/unpack/f1/subj_02_2_fsaverage_sfmcstc-lh.stc") # <class 'mne.source_estimate.SourceEstimate'>
stc_data = stc.data # shape: (20484, 100), <class 'numpy.ndarray'>
stc_vertices = stc.vertices # list of two vertex arrays (lh, rh), each of length 10242
stc_tmin = stc.tmin # i.e., epoch_begin_latency=0.0
stc_tstep = stc.tstep # i.e., sample_period=2.0
stc_times = stc.times # shape: (87,)
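
As a small example of using the loaded SourceEstimate (a sketch only, relying on the vertex lists read above), the data can be split back into left- and right-hemisphere blocks, and the time axis can be reconstructed from tmin and tstep:

import numpy as np

n_lh = len(stc_vertices[0])  # number of left-hemisphere vertices
lh_data = stc_data[:n_lh, :]  # left-hemisphere time courses
rh_data = stc_data[n_lh:, :]  # right-hemisphere time courses

mean_ts = stc_data.mean(axis=0)  # mean time course across all vertices
time_axis = stc_tmin + stc_tstep * np.arange(stc_data.shape[1])  # equivalent to stc.times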

2. Read Human Connectome Project data

The resulting STC files can be read by this Python script (not fully completed).

3. Basic data processing in Python

3.1 After reading STC files

...

Assume we have obtained the fMRI data and its behavioral targets as NumPy arrays with the following shapes:

print(fmri_data.shape) # shape: (36, 17, 150); (samples, scenes, rois)
print(behavior_targets.shape) # shape: (36,)
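
If you want to try the steps below without the real data, random placeholder arrays of the same shapes can be used instead (purely illustrative values, not actual fMRI data):

import numpy as np

rng = np.random.default_rng(0)
fmri_data = rng.standard_normal((36, 17, 150))  # (samples, scenes, rois)
behavior_targets = rng.standard_normal(36)  # one behavioral score per sample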

3.2 Standardization

Before applying standardization, we should first convert the original fMRI data from 3 dimensions to 2 dimensions:

fmri_data_shape1 = [fmri_data.shape[0], fmri_data.shape[1], fmri_data.shape[2]] # [36, 17, 150]
fmri_data = fmri_data.reshape(fmri_data_shape1[0]*fmri_data_shape1[1], fmri_data_shape1[2]) # shape: (36, 17, 150) -> (612, 150)

Then we can apply standardization using the "StandardScaler" class in scikit-learn:

from sklearn.preprocessing import StandardScaler

standardization = True
if standardization:
    ss = StandardScaler()
    fmri_data = ss.fit_transform(fmri_data)  # fit on the data and transform it (zero mean, unit variance per column)
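
As an optional sanity check, each column of the standardized data should now have approximately zero mean and unit variance:

import numpy as np

print(np.allclose(fmri_data.mean(axis=0), 0.0))  # True: each column has ~zero mean
print(np.allclose(fmri_data.std(axis=0), 1.0))   # True: each column has ~unit variance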

3.3 Principal component analysis (PCA)

Similarly, we can apply PCA using the "PCA" class in scikit-learn:

from sklearn.decomposition import PCA

use_pca = True
if use_pca:
    pca = PCA(n_components=0.85)  # keep enough components to explain 85% of the variance
    fmri_data = pca.fit_transform(fmri_data)  # shape: (612, 150) -> (612, 8)
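
To see how many components were kept and how much variance they explain (these are standard scikit-learn PCA attributes):

print(pca.n_components_)  # number of components actually kept, e.g., 8
print(pca.explained_variance_ratio_.sum())  # total explained variance ratio, >= 0.85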

3.4 Restore data shape

After applying standardization or PCA, we need to restore the data shape from 2 dimensions to 3 dimensions:

fmri_data = fmri_data.reshape(fmri_data_shape1[0], fmri_data_shape1[1], -1) # shape: (612, 8) -> (36, 17, 8)

For temporal models like LSTM, shape (36, 17, 8) can directly serve as the input. But for other conventional machine-learning regression models, we still need to flatten the temporal (i.e., scenes) and spatial (i.e., ROIs after PCA) dimensions:

fmri_data_shape2 = [fmri_data.shape[0], fmri_data.shape[1], fmri_data.shape[2]] # [36, 17, 8]
fmri_data = fmri_data.reshape(fmri_data_shape2[0], fmri_data_shape2[1]*fmri_data_shape2[2]) # shape: (36, 17, 8) -> (36, 136)

3.5 Split dataset into training and testing sets

Splitting the dataset can be done with the "train_test_split" function in scikit-learn:

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(fmri_data, behavior_targets, test_size=0.2, random_state=42) # selecting 20% samples as the testing set
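
As a follow-up sketch (not part of the original pipeline; the model and alpha value are only placeholders), a conventional regression model such as scikit-learn's Ridge can then be fit on the flattened features:

from sklearn.linear_model import Ridge

reg = Ridge(alpha=1.0)  # placeholder regularization strength; tune for real analyses
reg.fit(X_train, y_train)  # fit on the training split
print(reg.score(X_test, y_test))  # R^2 on the held-out testing split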