Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP, ENH] Add masking functions and simplify IO #70

Closed
wants to merge 13 commits into from
162 changes: 162 additions & 0 deletions tedana/utils/masking.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,162 @@
"""
"""
import numpy as np
import pandas as pd
import nibabel as nib


def row_idx(arr1, arr2):
"""
Get a 1D index of rows in arr1 that exist in arr2.

Parameters
----------
arr1 : (X x 3) :obj:`numpy.ndarray`
arr2 : (Z x 3) :obj:`numpy.ndarray`

Returns
-------
idx : 1D :obj:`numpy.ndarray`
Index of rows in arr1 that exist in arr2.

Notes
-----
This works amazingly well, but is quite slow.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can make this much faster by subbing out pandas, if we stick with this approach !

"""
df1 = pd.DataFrame(arr1, columns=['x', 'y', 'z'], dtype=str)
df2 = pd.DataFrame(arr2, columns=['x', 'y', 'z'], dtype=str)

df1['unique_value'] = df1[['x', 'y', 'z']].apply(lambda x: '_'.join(x),
axis=1)
df1 = df1[['unique_value']]
df1['idx1'] = df1.index
df1 = df1.set_index('unique_value')

df2['unique_value'] = df2[['x', 'y', 'z']].apply(lambda x: '_'.join(x),
axis=1)
df2 = df2[['unique_value']]
df2['idx2'] = df2.index
df2 = df2.set_index('unique_value')

catted = pd.concat((df1, df2), axis=1, ignore_index=False)
if any(pd.isnull(catted['idx1'])):
raise Exception('We have a weird error where there is >=1 '
'voxel in echo-specific mask outside of union mask.')

catted = catted.dropna()
catted = catted.sort_values(by=['idx2'])
rel_echo_idx = catted['idx1'].values
return rel_echo_idx


def apply_mask_me(img, mask_img):
"""
Apply multi-echo mask to set of nifti images. Mask may vary by echo.

Parameters
----------
img : (X x Y x Z x E [x T]) :obj:`nibabel.nifti.Nifti1Image`
Data img.
mask_img : (X x Y x Z [x E]) :obj:`nibabel.nifti.Nifti1Image`
Mask img.

Returns
-------
masked_arr : (M x E [x T]) :obj:`numpy.ndarray`
Masked data. M refers to the number of voxels in the mask img.
"""
if not isinstance(mask_img, nib.Nifti1Image):
raise TypeError('Provided mask is not a Nifti1Image.')
elif len(mask_img.shape) not in (3, 4):
raise ValueError('Mask must be 3D (X x Y x Z) or 4D (X x Y x Z x E).')

if not np.array_equal(img.affine, mask_img.affine):
raise ValueError('Input img affine must match mask_img affine.')

data = img.get_data()
if len(img.shape) == 4:
_, _, _, n_echos = img.shape
n_vols = 1
data = data[:, :, :, :, None]
elif len(img.shape) == 5:
_, _, _, n_echos, n_vols = img.shape
else:
raise ValueError('Input img must be 4D (X x Y x Z x E) or '
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a valid error to pass back up to the user ? They'll have either passed in a zcat ( X x Y x Z x E x T) or individual echo files, so I don't think they would interact with a (X x Y x Z x E) image..?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

load_data2 would load zcat (X x Y x Z*E x T), the preferred 5D (X x Y x Z x E x T) and individual echo files all into the same X x Y x Z x E x T-ordered image type, so masking and unmasking would respectively take in and return only that image type. We could allow other image types, but I think that it's unnecessary. Multiple individual echo images could just be masked/unmasked with nilearn's functions, and we should have to deal with zcat images (even when we do have zcat files) because we'll have load_data2

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh sorry, I misinterpreted your point. I guess it would be rare to feed in an image that isn't X x Y x Z x E x T or X x Y x Z x E, but I think that just makes this error unlikely (but not invalid).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for being unclear ! To clarify: shouldn't it be a X x Y x Z x T file, not a X x Y x Z x E file ? The former would be an individual echo file, the latter is something I don't think users will interact with.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Images with X x Y x Z x T data shouldn't be used with the multi-echo masking/unmasking functions. In that situation, nilearn's functions are more appropriate. Plus, load_data would take in multiple X x Y x Z x T files and return an X x Y x Z x E x T image, so X x Y x Z x T images shouldn't occur with multi-echo data.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Completely agree, but I'm not sure how this error would make sense to a user, since they'll have only supplied the individual echo files or the zcat image (as far as I understand) ! The X x Y x Z x E file would then be created "under-the-hood" in a workflow.

'5D (X x Y x Z x E x T).')

n_x, n_y, n_z = mask_img.shape[:3]
mask_arr = np.array(mask_img.get_data()).astype(bool)
if mask_arr.ndim == 3:
# We can probably simplify/speed things up when the mask is not
# echo-dependent
mask_arr = mask_arr[:, :, :, None]
mask_arr = np.tile(mask_arr, (1, 1, 1, n_echos))

union_mask = np.any(mask_arr, axis=3)
union_idx = np.vstack(np.where(union_mask)).T
masked_arr = np.empty((np.sum(union_mask), n_echos, n_vols))
masked_arr[:] = np.nan

for i_echo in range(n_echos):
echo_mask = mask_arr[:, :, :, i_echo]
abs_echo_idx = np.vstack(np.where(echo_mask)).T
rel_echo_idx = row_idx(union_idx, abs_echo_idx)
masked_arr[rel_echo_idx, i_echo, :] = data[abs_echo_idx[:, 0],
abs_echo_idx[:, 1],
abs_echo_idx[:, 2],
i_echo, :]
return masked_arr


def unmask_me(X, mask_img):
"""
Unmask multi-echo data to nifti image. Mask may vary by echo.

Parameters
----------
X : (M x E [x T]) :obj:`numpy.ndarray`
Masked data. M refers to the number of voxels in the mask img.
mask_img : (X x Y x Z [x E]) :obj:`nibabel.nifti.Nifti1Image`
Mask img.

Returns
-------
out_img : (X x Y x Z x E [x T]) :obj:`nibabel.nifti.Nifti1Image`
Data img.
"""
if not isinstance(mask_img, nib.Nifti1Image):
raise TypeError('Provided mask is not a Nifti1Image.')
elif len(mask_img.shape) not in (3, 4):
raise ValueError('Mask must be 3D (X x Y x Z) or 4D (X x Y x Z x E).')

if X.ndim == 2:
X = X[:, :, None]
elif X.ndim != 3:
raise ValueError('X must be 2D (M x E) or 3D (M x E x T).')

_, n_echos, n_vols = X.shape

mask_arr = np.array(mask_img.get_data()).astype(bool)
if mask_arr.ndim == 3:
mask_arr = mask_arr[:, :, :, None]
mask_arr = np.tile(mask_arr, (1, 1, 1, n_echos))
n_x, n_y, n_z = mask_arr.shape[:3]

out = np.zeros((n_x, n_y, n_z, n_echos, n_vols))
for i_vol in range(n_vols):
unmasked_arr = np.zeros(mask_arr.shape)
for j_echo in range(n_echos):
echo_x = X[:, j_echo, i_vol]
# NaNs should occur where data have been masked out by echo-
# specific mask but where voxels exist in overall mask
echo_x = echo_x[~np.isnan(echo_x)]
echo_mask_idx = np.vstack(np.where(mask_arr[:, :, :, j_echo])).T
if echo_x.shape[0] != echo_mask_idx.shape[0]:
raise ValueError('Masked data do not match dimensions of '
'echo-specific mask.')
unmasked_arr[echo_mask_idx[:, 0], echo_mask_idx[:, 1],
echo_mask_idx[:, 2], j_echo] = echo_x
out[:, :, :, :, i_vol] = unmasked_arr
out_img = nib.Nifti1Image(out, header=mask_img.header,
affine=mask_img.affine)
return out_img
82 changes: 82 additions & 0 deletions tedana/utils/utils.py
Original file line number Diff line number Diff line change
@@ -1,6 +1,7 @@
"""
Utilities for tedana package
"""
import warnings
import os.path as op

import nibabel as nib
Expand Down Expand Up @@ -106,6 +107,87 @@ def load_image(data):
return fdata


def load_data2(data, n_echos=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this being used anywhere, at this point ?

Copy link
Member Author

@tsalo tsalo Jul 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would replace load_data as the input function for multi-echo data files, but it isn't used yet.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, thanks !! Can we drop the gifti support in there too, then ?

"""
Load files into image object.

Parameters
----------
data : file, list of files, img_like, or list of img_like
Input multi-echo data array, where `X` and `Y` are spatial dimensions,
`M` is the Z-spatial dimensions with all the input echos concatenated,
and `T` is time. A list of image-like objects (e.g., .nii or .gii) are
accepted, as well
n_echos : int, optional
Number of echos in provided data array. Only necessary if `data` is
a z-concatenated file. Default: None

Returns
-------
out_img : img_list
Nifti1Image or GiftiImage object containing data from input file(s).
"""
if isinstance(data, list) and len(data) > 1:
if len(data) == 2: # inviable -- need more than 2 echos
raise ValueError('Cannot run `tedana` with only two echos: '
'{}'.format(data))
else: # individual echo files were provided (surface or volumetric)
if np.all([isinstance(f, str) for f in data]):
data = [nib.load(f) for f in data]

if get_dtype(data) == 'GIFTI':
# Compile data across echoes
arrays = [np.vstack([dat.data for dat in img.darrays]) for img
in data]
arr = np.stack(arrays, axis=1)

# Split by volume
split = np.split(arr, np.arange(1, arr.shape[0]), axis=0)
split = [vol.squeeze().T for vol in split] # S x E darrays
darrs = [nib.gifti.GiftiDataArray(vol) for vol in split]
out_img = nib.gifti.GiftiImage(header=data[0].header,
darrays=darrs)
elif get_dtype(data) == 'NIFTI':
arr = np.stack([img.get_data() for img in data], axis=4)
out_img = nib.Nifti1Image(arr, data[0].affine)
else:
raise TypeError('Input file(s) must be nifti or gifti.')
return out_img
elif isinstance(data, list) and len(data) == 1:
# A single multi-echo nifti/gifti file
data = data[0]

if isinstance(data, str):
img = nib.load(data)
else:
img = data

# we have a z-cat file -- we need to know how many echos are in it!
if get_dtype(img) == 'NIFTI' and len(img.shape) == 4:
warnings.warn(('In the future, support for z-concatenated images will '
'be removed. Please store multi-echo data as a set '
'of 4D files or as a single 5D (X x Y x Z x E x T) '
'file.'),
FutureWarning)
if n_echos is None:
raise ValueError('For z-concatenated images, n_echos must be '
'provided.')
elif img.shape[2] % n_echos != 0:
raise ValueError('Number of echoes does not divide evenly into '
'data.')
else:
(nx, ny), nz = img.shape[:2], img.shape[2] // n_echos
fdata = img.get_data().reshape(nx, ny, nz, -1, n_echos, order='F')
out_img = nib.Nifti1Image(fdata, img.affine, header=img.header,
extra=img.extra)
out_img.header.extensions = []
out_img.header.set_sform(out_img.header.get_sform(), code=1)
else:
out_img = img

return out_img


def load_data(data, n_echos=None):
"""
Coerces input `data` files to required 3D array output
Expand Down