
[WIP, ENH] Add masking functions and simplify IO #70

Closed
wants to merge 13 commits

Conversation

@tsalo (Member) commented May 27, 2018

Changes proposed in this pull request:

  • Enhance tedana.utils.load_data to accept a multi-echo or z-concatenated gii/nii file, or a list of single-echo gii/nii files. It will return a single img-like object.
  • Add masking/unmasking functions for multi-echo gifti or nifti data (a rough sketch of the proposed interface follows this list).
    • Masking functions should take in a multi-echo img-like object and a mask img-like object and should return an M x E x T numpy array.
    • Unmasking functions should take in an M x E x T numpy array and a mask img-like object and should return a multi-echo img-like object.
    • Mask images can either be general (i.e., the same for all echoes) or can vary by echo. Masked data will have NaNs in voxels/vertices outside an echo-specific mask but inside the overall mask in order to maintain the M x E x T shape without splitting the data array by echo.
  • Once this is merged, all unmasked data should be in img-like objects and masked data should be in numpy arrays.
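
As a rough sketch of the proposed masking interface for the nifti case (apply_mask_me is the name used later in this thread, but the body here is illustrative only, not the PR's implementation):

```python
import numpy as np


def apply_mask_me(img, mask_img):
    """Illustrative sketch: mask an X x Y x Z x E x T image to an M x E x T array.

    If mask_img varies by echo (X x Y x Z x E), voxels outside an echo-specific
    mask but inside the union mask are set to NaN so the M x E x T shape holds.
    """
    data = img.get_fdata()                    # X x Y x Z x E x T
    mask = mask_img.get_fdata().astype(bool)  # X x Y x Z or X x Y x Z x E
    if mask.ndim == 3:
        return data[mask]                     # M x E x T
    union = mask.any(axis=-1)                 # overall (union) mask, X x Y x Z
    out = data[union]                         # M x E x T
    out[~mask[union]] = np.nan                # NaN where an echo's mask excludes the voxel
    return out
```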

@rmarkello (Member) left a comment:

Okay! Overall, a good start towards your proposed layout. I've made a few comments in places I think some clarification might be warranted: namely, there are a few spots where I think your gifti file manipulation isn't going to work. Also, and perhaps more importantly, I'm not sure about the utility of echo-specific masks! What's your thinking here? If there's no reason, I think that removing that functionality would make for some much easier code!

data = [nib.load(f) for f in data]

if isinstance(data[0], nib.gifti.gifti.GiftiImage):
    arrays = [np.vstack([dat.data for dat in img.darrays]) for img

Member:
You'd need np.hstack here if you want samples x time arrays.

Member Author (@tsalo):
I was doing this section wrong in a couple of ways, so I've had to alter it significantly in my newest commit, but thanks for catching this. It convinced me to take a closer look.

arrays = [np.vstack([dat.data for dat in img.darrays]) for img in data]
arr = np.stack(arrays, axis=1)
out_img = nib.gifti.GiftiImage(img.header, arr)

Member:
I believe img.header is called before being defined, here!

But, perhaps more importantly, I don't think this will actually generate a working GiftiImage. You'd need to use kwargs when instantiating, as in nib.gifti.GiftiImage(header=img.header, darrays=arr). However, it's worth noting that while that will technically work, it wouldn't be a valid gifti image file since the data is being stored as a standard numpy.ndarray and not a GiftiDataArray instance (trying to save it to a file would raise an error).

You could modify the tedana.utils.new_gii_like function (to account for the echos dimension) to output an appropriate gifti image?
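
For reference, one way a valid construction might look (a sketch only; new_gii_echos is a hypothetical stand-in for an updated new_gii_like):

```python
import numpy as np
import nibabel as nib


def new_gii_echos(ref_img, arrays):
    """Hypothetical sketch: wrap per-volume arrays in GiftiDataArrays so the
    result is a GiftiImage that nibabel can actually save."""
    out_img = nib.gifti.GiftiImage(header=ref_img.header)
    for arr in arrays:
        # Raw ndarrays are not serializable as gifti data; each volume needs
        # to be stored as a GiftiDataArray instance.
        darray = nib.gifti.GiftiDataArray(np.asarray(arr, dtype=np.float32))
        out_img.add_gifti_data_array(darray)
    return out_img
```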

Member Author (@tsalo):
You're totally right. I will try to shift the work to tedana.utils.new_gii_like soon, but am pushing a commit that works on some simulated data for now.

img : :obj:`nibabel.gifti.gifti.GiftiImage`
    3D (Vertex, Echo, Time) gifti image containing data.
mask_img : :obj:`nibabel.gifti.gifti.GiftiImage`
    1D (Vertex) or 2D (Vertex, Echo) gifti image with boolean or bool-like

Member:
I can't think of how we would use a mask that differed based on echo. I don't think that the current pipeline has such a use case, at least! What was your intended application?

raise TypeError('Provided data img is not a GiftiImage.')

img_arrs = [np.array(darr.data) for darr in img.darrays]
all_imgs_same = np.all(arr.shape == img_arrs[0].shape for arr in img_arrs)

Member:
You'll need a list comprehension here otherwise you get a generator: all_imgs_same = np.all([arr.shape == img_arrs[0].shape for arr in img_arrs]).
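
A quick toy illustration of why the generator version silently passes (not code from the PR):

```python
import numpy as np

shapes = [(2, 3), (4, 5)]
# A generator is a single truthy object to np.all, so this is always True.
print(np.all(s == shapes[0] for s in shapes))    # True (misleading)
# The list comprehension is evaluated element-wise, as intended.
print(np.all([s == shapes[0] for s in shapes]))  # False
```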

Member Author (@tsalo):
Thx. Nice catch!

masked_arrs = [np.empty((np.sum(union_mask), n_echos))
               for _ in range(len(img_arrs))]
for i_vol in range(n_vols):
    masked_arrs[i_vol][:] = np.nan

Member:
I'm not totally clear on why we want np.nan in the masked data. Rather, I understand why the np.nan are there, but I think this goes back to my comment above: I'm not sure that we need echo-specific masks! I'm happy to be convinced of their utility, but I can't envision a use case at the current juncture.

Member Author (@tsalo):
I guess that having echo-specific masks is the same as using make_adaptive_mask, but this way only data need to be passed around after initial masking, rather than data + a mask. I'd rather avoid having any masks or mask-like arrays passed around if the data can carry the same information implicitly. It should be easy enough to get the min_mask or the adaptive_mask from an array that has NaNs in voxels from echoes with dropout in those areas.
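
For what it's worth, a rough sketch of recovering that implicit mask information from the NaNs (masks_from_nans is a hypothetical helper, not part of this PR):

```python
import numpy as np


def masks_from_nans(masked_data):
    """Sketch: derive mask-like arrays from an M x E x T array with NaNs."""
    # True where a voxel has any valid data for a given echo (M x E)
    good_by_echo = ~np.isnan(masked_data).all(axis=2)
    # Adaptive-mask-like count of usable echoes per voxel (M,)
    adaptive_mask = good_by_echo.sum(axis=1)
    # Min-mask-like indicator of voxels with valid data in every echo (M,)
    min_mask = good_by_echo.all(axis=1)
    return adaptive_mask, min_mask
```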

               idx_val in abs_echo_idx])
for j_vol in range(n_vols):
    masked_arrs[j_vol][i_echo, rel_echo_idx] = \
        img_arrs[j_vol][i_echo, abs_echo_idx]

Member:
I'm not sure I understand what shape the darrays in the initial gifti image are in. Are we storing them as samples x echos arrays, where each GiftiDataArray is a separate volume? That's what I assumed based on load_data2(), but here it seems they're echo x samples arrays.

Member Author (@tsalo):
Honestly, I wasn't as careful about the order of dimensions as I should have been. Do you (or @emdupre) have a preference? Either way, each GiftiDataArray will be a separate volume, but I don't currently have a preference for echo x sample or sample x echo.

Member:
I think I might prefer sample x echo to be consistent with the original formatting X x Y x Z x echo x time, where X x Y x Z are being collapsed into sample. WDYT ?
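
A tiny illustration of that convention (assumed shapes only, not code from this PR):

```python
import numpy as np

x, y, z, n_echos, n_vols = 4, 4, 4, 3, 10
img_data = np.random.rand(x, y, z, n_echos, n_vols)
# Collapse X x Y x Z into a single sample axis, keeping echo before time
# to mirror the X x Y x Z x E x T ordering of the source image.
samples = img_data.reshape(-1, n_echos, n_vols)
print(samples.shape)  # (64, 3, 10)
```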

@tsalo (Member Author) commented May 29, 2018

I was under the impression that tedana used multi-echo masks. Is that not the case?

@codecov bot commented May 29, 2018

Codecov Report

Merging #70 into master will decrease coverage by 2.97%.
The diff coverage is 1.72%.

@@            Coverage Diff             @@
##           master      #70      +/-   ##
==========================================
- Coverage   45.81%   42.84%   -2.98%     
==========================================
  Files          29       30       +1     
  Lines        1602     1718     +116     
==========================================
+ Hits          734      736       +2     
- Misses        868      982     +114
Impacted Files            Coverage        Δ
tedana/utils/masking.py   0% <0%>         (ø)
tedana/utils/utils.py     69.19% <5.4%>   (-14.66%) ⬇️


@emdupre (Member) commented Jun 1, 2018

@rmarkello just issued a hotfix for file handling in #72 -- could you merge in master on this, @tsalo ? I'd like to try and have both of these PRs in a 0.0.2 release, if possible ! But I'm planning to cut one soon with at least the corrected file handling.

@tsalo (Member Author) commented Jul 25, 2018

Okay, I think this should work now. The unmask_me function is pretty quick, but the apply_mask_me function is really slow (~30-40 seconds on simulated full-size data with 20 TRs). We can probably make it much faster when an echo-varying ME mask isn't used, which is good, but I don't know how to speed it up for ME masks.

I'm holding off on writing tests until after we've reviewed and agreed on an implementation.
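
For the case where the mask does not vary by echo, one possible fast path might look like this (a sketch under assumed shapes, not the PR's apply_mask_me):

```python
import numpy as np


def apply_mask_uniform(data, mask):
    """Sketch of a fast path when one X x Y x Z mask applies to every echo.

    data : X x Y x Z x E x T array; mask : X x Y x Z boolean-like array.
    Returns an M x E x T array, where M = mask.sum().
    """
    # Boolean indexing over the three spatial axes collapses them into one
    # sample axis in a single vectorized step, with no per-echo looping.
    return data[np.asarray(mask, dtype=bool)]
```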

@tsalo (Member Author) commented Jul 27, 2018

@rmarkello @emdupre I have a question about T2* estimation that might significantly affect this PR. It seems that in the t2ss array (the one we have documented with "???"), which is shaped M x E-1, the first column is the voxel-wise estimates of T2* using the first two echoes, the next column uses three echoes, etc. Then, the full T2* map (t2saf) uses T2* estimated from the first two echoes for voxels with good data for only one echo (see here). Does that make sense?

I had assumed that the full map would use one of the following:

  1. Some notation so that make_optcom would use the single echo's value (but it doesn't seem to do that).
  2. Whatever estimated T2* that would correspond to a simple average across echoes.
  3. Or, following the logic that seems to be used in the code, fitting to all of the echoes even though only one is "good."

If data from "bad" echoes is going to be used (as with the T2* estimation method currently implemented or the second and third ideas above), then the whole img-based masking/unmasking approach probably needs to be rethought ☹️

@emdupre (Member) left a comment:

My reading is that t2ss is the full log-linear fit, t2sa is the limited map only considering those voxels which have good signal from two or more echoes, and t2saf is t2sa but adding back in information from t2ss for the single-echo signal voxels.

I might be totally misinterpreting, so I'd appreciate if you could set me straight, @tsalo ! Or from anyone else, too. It'd be great to get this sorted. It'd also be great if we gave these things better names ! fl is maybe my least favorite of the variable names in the code you linked, though none are great....

@@ -106,6 +107,87 @@ def load_image(data):
    return fdata


def load_data2(data, n_echos=None):

Member:
Is this being used anywhere, at this point ?

Member Author (@tsalo), Jul 30, 2018:
It would replace load_data as the input function for multi-echo data files, but it isn't used yet.

Member:
Got it, thanks !! Can we drop the gifti support in there too, then ?


Notes
-----
This works amazingly well, but is quite slow.

Member:
We can make this much faster by subbing out pandas, if we stick with this approach !

elif len(img.shape) == 5:
    _, _, _, n_echos, n_vols = img.shape
else:
    raise ValueError('Input img must be 4D (X x Y x Z x E) or '

Member:
Is this a valid error to pass back up to the user ? They'll have either passed in a zcat ( X x Y x Z x E x T) or individual echo files, so I don't think they would interact with a (X x Y x Z x E) image..?

Member Author (@tsalo):
load_data2 would load zcat (X x Y x Z*E x T), the preferred 5D (X x Y x Z x E x T), and individual echo files all into the same X x Y x Z x E x T-ordered image type, so masking and unmasking would respectively take in and return only that image type. We could allow other image types, but I think that's unnecessary. Multiple individual echo images could just be masked/unmasked with nilearn's functions, and we shouldn't have to deal with zcat images (even when we do have zcat files) because we'll have load_data2.

Member Author (@tsalo):
Oh sorry, I misinterpreted your point. I guess it would be rare to feed in an image that isn't X x Y x Z x E x T or X x Y x Z x E, but I think that just makes this error unlikely (but not invalid).

Member:
Sorry for being unclear ! To clarify: shouldn't it be a X x Y x Z x T file, not a X x Y x Z x E file ? The former would be an individual echo file, the latter is something I don't think users will interact with.

Member Author (@tsalo):
Images with X x Y x Z x T data shouldn't be used with the multi-echo masking/unmasking functions. In that situation, nilearn's functions are more appropriate. Plus, load_data would take in multiple X x Y x Z x T files and return an X x Y x Z x E x T image, so X x Y x Z x T images shouldn't occur with multi-echo data.
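
For context, the stacking step described above might look roughly like this (stack_echoes is a hypothetical name, not load_data itself):

```python
import numpy as np
import nibabel as nib


def stack_echoes(echo_files):
    """Sketch: stack single-echo X x Y x Z x T files into one X x Y x Z x E x T image."""
    imgs = [nib.load(f) for f in echo_files]
    # Insert the echo axis before time so the result is X x Y x Z x E x T.
    data = np.stack([img.get_fdata() for img in imgs], axis=3)
    return nib.Nifti1Image(data, imgs[0].affine, header=imgs[0].header)
```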

Member:
Completely agree, but I'm not sure how this error would make sense to a user, since they'll have only supplied the individual echo files or the zcat image (as far as I understand) ! The X x Y x Z x E file would then be created "under-the-hood" in a workflow.

@tsalo (Member Author) commented Jul 30, 2018

From what I can tell, your interpretation of t2ss isn't quite correct. It seems to be the log-linear fit for ascending numbers of echoes, starting with 2, which is why it's shaped M x E-1 (because 1-echo fit is not done/included). One problem is that it thus has values for voxels using more echoes than are good. For example, imagine a voxel j that has good data for only one echo (of 4). t2ss is going to have the beta for the model fitted on the first two echoes at t2ss[j, 0], the beta for the first three echoes at t2ss[j, 1], and the beta for all four at t2ss[j, 2]. Then, the value in t2sa[j] is going to be 0 or NaN (or whatever we set it to), but the value in t2saf[j] is going to be the fit using the first two echoes from t2ss (i.e., t2ss[j, 0]).

It doesn't make a ton of sense to me, and I think any of the three possibilities I mentioned above make more sense.

I can make a new branch with a notebook to show what I mean.
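
To make my reading concrete, here is a toy numpy version of the indexing I'm describing (illustrative only; not the actual tedana code):

```python
import numpy as np

n_voxels, n_echos = 5, 4
# t2ss[:, k] holds the fit that uses the first (k + 2) echoes, hence E - 1 columns.
t2ss = np.random.rand(n_voxels, n_echos - 1)
# Adaptive mask: number of "good" echoes per voxel.
adaptive = np.array([1, 2, 3, 4, 4])

# t2sa keeps only voxels with >= 2 good echoes, using the fit limited to those
# echoes (the "subtract two from the echo number" indexing).
t2sa = np.zeros(n_voxels)
good = adaptive >= 2
t2sa[good] = t2ss[good, adaptive[good] - 2]

# t2saf starts from t2sa and fills single-good-echo voxels with the two-echo fit.
t2saf = t2sa.copy()
single = adaptive == 1
t2saf[single] = t2ss[single, 0]
```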

@emdupre (Member) commented Jul 30, 2018

I think the critical line is here and I can remember we had a discussion about it, though I don't remember where or the resolution ! Will look for it. The main issue is that it seems that when we subtract two from the echo number it wraps back around for the first echo, filling that value for the last echo in the list. Is that also your understanding ?

@tsalo (Member Author) commented Jul 30, 2018

I think that's meant to be a feature, not a bug. I.e., the last column is filled first and then overwritten because the author didn't want to use those values at all. At least that's my interpretation based on the fact that t2ss is preallocated with n_echos - 1 columns. If the single-echo fit was going to be kept around and used, then the array would have been made with n_echos columns.

t2ss = np.zeros([n_samp, n_echos - 1])
s0vs = np.zeros([n_samp, n_echos - 1])

But yeah, we could change the preallocation, the lines you referenced, and the usage of start_echo (which may be an input but is treated like a constant throughout) in order to add the single-echo fit to t2ss and t2saf (and their S0 equivalents). However, do we want to use the single-echo fits? The values look funny (very small and low variance), which makes sense given that it's fitting to multiple values (from different volumes) of a single input (the one echo). We could use the full all-echo fit (including bad echoes) in t2saf or find some way to use the echo-averaged or single-echo value in make_optcom.

@emdupre (Member) commented Jul 30, 2018

I see. Thanks for helping me remember all of this ! 👍

Yes, I think you're right that it's a (desired) feature to overwrite the first echo, probably for the reasons you outlined, i.e., because those values aren't useful.

WDYT are the next steps, then ? I'm rather reluctant to change something so big as the T2* generation without a very good reason and a broader discussion....

@tsalo (Member Author) commented Jul 30, 2018

I agree that we should discuss this with everyone before changing the behavior at all. It could be something for the group to discuss in person at INCF or we could bring it up on Gitter or over email.

My concern for this PR is that the current approach doesn't work with my proposed changes because, if we want to use data from bad echoes for full or even two-echo fits (as is currently implemented), then explicitly masking the bad echoes with the multi-echo masks I've been working on will prevent it.

@emdupre (Member) commented Jul 30, 2018

Ok -- WDYT of closing this for now, then, and opening a new issue to continue the discussion on the T2* map generation ? This, #101, and #14 seem like things we should decide upon for our 1.0.0 release, but we have a little time between now and then, IMHO.

@tsalo (Member Author) commented Jul 30, 2018

I'm happy to close this for the foreseeable future, and do think that changing the fitting behavior on a conceptual level should wait until a major release.
