2. Data input and output in Python
Procedures to read fMRI data in Python
Example data are at:
- Perspective-taking fMRI: /space_lin2/fhlin/perspective
- Human Connectome Project: /space_lin1/hcp
The path of our current available Anaconda folder is:
/space_lin2/kaihua/anaconda3
Activating our Python environment is simple.
First, "source" the relevant profile; you can then check whether the "conda" command is available by printing its version, as shown below:
source /space_lin2/kaihua/.bashrc
conda --version
Next, choose which virtual environment you want to use. Currently there are two main environments, (base) and (py36); activate one of them with the "conda activate" command, as shown below:
conda activate base
conda activate py36
Different environments contain different packages (and package versions) to meet the requirements of different tasks. Check the installed packages with:
conda list
Here are examples of reading the perspective-taking fMRI data in Python. Take the NIfTI file '/space_lin2/fhlin/perspective/subj_02/epi_data/unpack/bold/007/sfmcstc.nii' as an example:
import nibabel as nb
nii_img = nb.load("/space_lin2/fhlin/perspective/subj_02/epi_data/unpack/bold/007/sfmcstc.nii")
nii_data = nii_img.get_fdata() # shape: (72, 72, 42, 101)
affine = nii_img.affine # affine matrix, shape: (4, 4)
hdr = nii_img.header # header
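As a quick sanity check, the loaded data can be handled like any NumPy array. A minimal sketch (the voxel indices below are arbitrary, chosen only for illustration):
import numpy as np
print(nii_data.shape, nii_data.dtype)  # (72, 72, 42, 101), float64
voxel_ts = nii_data[36, 36, 21, :]  # time series of one (arbitrary) voxel, shape: (101,)
print(voxel_ts.mean(), voxel_ts.std())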
import mne
stc = mne.read_source_estimate("/space_lin2/fhlin/perspective/subj_02/epi_data/unpack/f1/subj_02_2_fsaverage_sfmcstc-lh.stc") # <class 'mne.source_estimate.SourceEstimate'>
stc_data = stc.data # shape: (20484, 100), <class 'numpy.ndarray'>
stc_vertices = stc.vertices # shape: (2, 10242)
stc_tmin = stc.tmin # i.e., epoch_begin_latency=0.0
stc_tstep = stc.tstep # i.e., sample_period=2.0
stc_times = stc.times # shape: (87,)
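Here stc.vertices is a list holding the left- and right-hemisphere vertex numbers, and stc.data stacks both hemispheres along the first axis (left hemisphere first). A minimal sketch of splitting the data back into hemispheres:
lh_vertices, rh_vertices = stc.vertices  # 10242 vertices per hemisphere (fsaverage)
n_lh = len(lh_vertices)
lh_data = stc_data[:n_lh, :]  # left-hemisphere time courses
rh_data = stc_data[n_lh:, :]  # right-hemisphere time courses
print(lh_data.shape, rh_data.shape)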
The resulting STC files can also be read with the following Python script (not yet complete):
...
Assume we have the fMRI data and its behavioral targets as NumPy arrays with the following shapes:
print(fmri_data.shape) # shape: (36, 17, 150); (samples, scenes, rois)
print(behavior_targets.shape) # shape: (36,)
Before applying standardization, we should first convert the original fMRI data from 3 dimensions to 2 dimensions:
fmri_data_shape1 = list(fmri_data.shape)  # [36, 17, 150]
fmri_data = fmri_data.reshape(fmri_data_shape1[0] * fmri_data_shape1[1], fmri_data_shape1[2])  # shape: (36, 17, 150) -> (612, 150)
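Because NumPy reshapes in row-major (C) order, sample i and scene j end up in row i*17 + j, and the operation is fully reversible. A quick check (the indices are illustrative):
i, j = 5, 3  # arbitrary sample and scene
fmri_data_3d = fmri_data.reshape(fmri_data_shape1)  # undo the reshape
assert np.array_equal(fmri_data_3d[i, j], fmri_data[i * fmri_data_shape1[1] + j])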
Then we can apply standardization using the StandardScaler class from scikit-learn:
from sklearn.preprocessing import StandardScaler

standardization = True
if standardization:
    ss = StandardScaler()
    fmri_data = ss.fit_transform(fmri_data)  # fit and transform
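After fit_transform, each of the 150 columns should have approximately zero mean and unit variance; a quick check:
import numpy as np
print(np.allclose(fmri_data.mean(axis=0), 0))  # ~0 mean per column
print(np.allclose(fmri_data.std(axis=0), 1))   # ~1 std per column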
Similarly, we can apply PCA using the PCA class from scikit-learn:
from sklearn.decomposition import PCA

use_pca = True
if use_pca:
    pca = PCA(n_components=0.85)  # retain 85% of the explained variance
    fmri_data = pca.fit_transform(fmri_data)  # shape: (612, 150) -> (612, 8)
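It is worth inspecting how many components survive and how much variance they explain; the attributes below are standard scikit-learn PCA attributes:
print(pca.n_components_)  # number of retained components (8 here)
print(pca.explained_variance_ratio_)  # variance explained per component
print(pca.explained_variance_ratio_.sum())  # total, should be >= 0.85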
After applying standardization or PCA, we need to restore the data shape from 2 dimensions to 3 dimensions:
fmri_data = fmri_data.reshape(fmri_data_shape1[0], fmri_data_shape1[1], -1) # shape: (612, 8) -> (36, 17, 8)
For temporal models like an LSTM, the shape (36, 17, 8) can serve directly as input. For conventional machine learning regression models, however, we still need to flatten the temporal (i.e., scenes) and spatial (i.e., ROIs after PCA) dimensions:
fmri_data_shape2 = list(fmri_data.shape)  # [36, 17, 8]
fmri_data = fmri_data.reshape(fmri_data_shape2[0], fmri_data_shape2[1] * fmri_data_shape2[2])  # shape: (36, 17, 8) -> (36, 136)
The dataset can be split using the function train_test_split from scikit-learn:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(fmri_data, behavior_targets, test_size=0.2, random_state=42)  # hold out 20% of the samples as the testing set
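With 36 samples and test_size=0.2, scikit-learn rounds the test set up to 8 samples, leaving 28 for training; a quick check:
print(X_train.shape, y_train.shape)  # (28, 136), (28,)
print(X_test.shape, y_test.shape)  # (8, 136), (8,)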
For example, we can choose SVR, Ridge, Lasso, ElasticNet, and Random Forest as our models, all of which are easily defined with scikit-learn:
from sklearn.svm import SVR
from sklearn.linear_model import Ridge, Lasso, ElasticNet
from sklearn.ensemble import RandomForestRegressor
svr_reg = SVR(kernel='linear')
ridge_reg = Ridge(alpha=1000, solver="cholesky")
lasso_reg = Lasso(alpha=1000)
elastic_reg = ElasticNet(alpha=0.1, l1_ratio=0.5)
rf_reg = RandomForestRegressor(max_depth=3, n_estimators=10000)
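The hyperparameters above (alpha, l1_ratio, max_depth, etc.) are illustrative; one could tune them with cross-validated grid search, e.g. for Ridge (a minimal sketch with an assumed parameter grid):
from sklearn.model_selection import GridSearchCV
param_grid = {"alpha": [0.1, 1, 10, 100, 1000]}  # illustrative grid, not a recommendation
search = GridSearchCV(Ridge(solver="cholesky"), param_grid, cv=5, scoring="r2")
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)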
In "3. processing data in Python", we have got the X_train, X_test, y_train, y_test. Therefore, we can use "fit" our models by using these data:
svr = svr_reg.fit(X_train, y_train)
ridge = ridge_reg.fit(X_train, y_train)
lasso = lasso_reg.fit(X_train, y_train)
elastic = elastic_reg.fit(X_train, y_train)
rf = rf_reg.fit(X_train, y_train)
Taking ElasticNet as an example, we report our results mainly through the RMSE and the R² score:
import numpy as np
from sklearn.metrics import mean_squared_error

print("(train) elastic score: ", elastic.score(X_train, y_train))
print("(train) elastic RMSE: ", np.sqrt(mean_squared_error(y_train, elastic.predict(X_train))))
print("(test) elastic score: ", elastic.score(X_test, y_test))
print("(test) elastic RMSE: ", np.sqrt(mean_squared_error(y_test, elastic.predict(X_test))))
We use PyTorch to define the LSTM. (For the LSTM we assume the 3D array of shape (36, 17, 8) was split without the flattening step above, so X_train and X_test keep the shape (samples, scenes, features).)
import torch
import torch.nn as nn

class LSTMNet(nn.Module):
    def __init__(self, in_size, out_size):
        super(LSTMNet, self).__init__()
        self.lstm = nn.LSTM(in_size, 64, 1, batch_first=True)  # single-layer LSTM, hidden size 64
        self.drop = nn.Dropout(0.6)
        self.fc1 = nn.Linear(64, 32)
        self.fc2 = nn.Linear(32, out_size)
        self.sigmoid = nn.Sigmoid()
        self.relu = nn.ReLU()
        self.tanh = nn.Tanh()

    def forward(self, x):
        batch_size = x.size(0)
        h_0 = torch.zeros(1, batch_size, 64)  # initial hidden state
        c_0 = torch.zeros(1, batch_size, 64)  # initial cell state
        out, (final_hidden_state, final_cell_state) = self.lstm(x, (h_0, c_0))
        out = self.drop(out)
        out = self.tanh(out)
        out = self.fc1(out)
        out = self.drop(out)
        out = self.tanh(out)
        out = self.fc2(out)
        out = out[:, -1, :]    # keep only the last time step
        out = out.squeeze(-1)  # (batch, 1) -> (batch,)
        return out
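Before training, a dummy forward pass (a minimal sketch with an arbitrary batch size of 4) confirms the expected output shape:
dummy = torch.randn(4, 17, fmri_data.shape[2])  # (batch, scenes, features)
print(LSTMNet(fmri_data.shape[2], 1)(dummy).shape)  # torch.Size([4])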
from sklearn.metrics import mean_squared_error, r2_score

model = LSTMNet(fmri_data.shape[2], 1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2, weight_decay=5e-2)
criterion = nn.MSELoss()
epochs = 200
train_loss = []
test_loss = []
train_r2_score = []
test_r2_score = []
for epoch in range(epochs):
    print("--------------------------------")
    print("epoch: ", epoch)
    model.train()  # re-enable dropout after the eval pass below
    inputs, labels = torch.tensor(X_train, dtype=torch.float32), torch.tensor(y_train, dtype=torch.float32)
    outputs = model(inputs)
    optimizer.zero_grad()
    loss = criterion(outputs, labels)
    loss.backward()
    optimizer.step()
    train_loss.append(torch.sqrt(loss).item())  # RMSE on the training set
    print("train_loss: ", train_loss[-1])
    train_r2_score.append(r2_score(labels.numpy(), outputs.detach().numpy()))
    model.eval()  # disable dropout for evaluation
    with torch.no_grad():
        inputs, labels = torch.tensor(X_test, dtype=torch.float32), torch.tensor(y_test, dtype=torch.float32)
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        test_loss.append(torch.sqrt(loss).item())  # RMSE on the testing set
        print("test_loss: ", test_loss[-1])
        test_r2_score.append(r2_score(labels.numpy(), outputs.numpy()))
        if test_loss[-1] <= min(test_loss):
            torch.save(model.state_dict(), 'model/best_model.pth')  # save the parameters of the epoch with the lowest test RMSE (the 'model' directory must exist)
torch.save(model.state_dict(), 'model/last_model.pth')  # save the parameters from the last epoch
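To visualize training, the recorded RMSE curves can be plotted (assuming matplotlib is available; a minimal sketch):
import matplotlib.pyplot as plt
plt.plot(train_loss, label="train RMSE")
plt.plot(test_loss, label="test RMSE")
plt.xlabel("epoch")
plt.ylabel("RMSE")
plt.legend()
plt.savefig("loss_curves.png")  # output file name is arbitrary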
Taking 'best_model.pth' as an example:
model = LSTMNet(fmri_data.shape[2], 1)
model.load_state_dict(torch.load("model/best_model.pth"))
model.eval()  # disable dropout for inference
train_predict = model(torch.tensor(X_train, dtype=torch.float32)).detach().numpy()
test_predict = model(torch.tensor(X_test, dtype=torch.float32)).detach().numpy()
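Finally, the reloaded model can be scored with the same metrics used for the scikit-learn models above:
print("(train) LSTM R2: ", r2_score(y_train, train_predict))
print("(train) LSTM RMSE: ", np.sqrt(mean_squared_error(y_train, train_predict)))
print("(test) LSTM R2: ", r2_score(y_test, test_predict))
print("(test) LSTM RMSE: ", np.sqrt(mean_squared_error(y_test, test_predict)))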