[WIP, ENH] Add masking functions and simplify IO #70
Conversation
Okay! Overall, a good start towards your proposed layout. I've made a few comments in places I think some clarification might be warranted: namely, there are a few spots where I think your gifti file manipulation isn't going to work. Also, and perhaps more importantly, I'm not sure about the utility of echo-specific masks! What's your thinking here? If there's no reason, I think that removing that functionality would make for some much easier code!
tedana/utils/utils.py (Outdated)

data = [nib.load(f) for f in data]

if isinstance(data[0], nib.gifti.gifti.GiftiImage):
    arrays = [np.vstack([dat.data for dat in img.darrays]) for img
You'd need `np.hstack` here if you want samples x time arrays.
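To make the shape bookkeeping concrete, here is a minimal sketch (not code from the PR; it assumes each `darray.data` is a 1-D array of length `n_vertices`, one `GiftiDataArray` per time point):

```python
import numpy as np

# Three hypothetical time points, five vertices each
darrays = [np.arange(5, dtype=float) for _ in range(3)]

np.vstack(darrays).shape           # (3, 5) -> time x samples
np.column_stack(darrays).shape     # (5, 3) -> samples x time

# np.hstack gives samples x time only if each darray is already a column vector
np.hstack([d[:, None] for d in darrays]).shape   # (5, 3)
```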
I was doing this section wrong in a couple of ways, so I've had to alter it significantly in my newest commit, but thanks for catching this. It convinced me to take a closer look.
tedana/utils/utils.py
Outdated
arrays = [np.vstack([dat.data for dat in img.darrays]) for img | ||
in data] | ||
arr = np.stack(arrays, axis=1) | ||
out_img = nib.gifti.GiftiImage(img.header, arr) |
I believe `img.header` is called before being defined, here! But, perhaps more importantly, I don't think this will actually generate a working `GiftiImage`. You'd need to use kwargs when instantiating, as in `nib.gifti.GiftiImage(header=img.header, darrays=arr)`. However, it's worth noting that while that will technically work, it wouldn't be a valid gifti image file since the data is being stored as a standard `numpy.ndarray` and not a `GiftiDataArray` instance (trying to save it to a file would raise an error).

You could modify the `tedana.utils.new_gii_like` function (to account for the `echos` dimension) to output an appropriate gifti image?
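For illustration, a minimal sketch of building a saveable gifti image out of `GiftiDataArray` instances rather than raw arrays (the array shapes and names here are assumptions, not code from this PR):

```python
import numpy as np
import nibabel as nib

# hypothetical masked data: one (n_vertices, n_echos) array per volume
arrays = [np.random.rand(100, 3).astype(np.float32) for _ in range(10)]

# wrap each volume in a GiftiDataArray so the image can be written to disk
darrays = [nib.gifti.GiftiDataArray(arr) for arr in arrays]
out_img = nib.gifti.GiftiImage(darrays=darrays)
nib.save(out_img, 'masked.func.gii')
```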
You're totally right. I will try to shift the work to `tedana.utils.new_gii_like` soon, but am pushing a commit that works on some simulated data for now.
tedana/utils/masking.py (Outdated)

img : :obj:`nibabel.gifti.gifti.GiftiImage`
    3D (Vertex, Echo, Time) gifti image containing data.
mask_img : :obj:`nibabel.gifti.gifti.GiftiImage`
    1D (Vertex) or 2D (Vertex, Echo) gifti image with boolean or bool-like
I can't think of how we would use a mask that differed based on echo. I don't think that the current pipeline has such a use case, at least! What was your intended application?
tedana/utils/masking.py
Outdated
raise TypeError('Provided data img is not a GiftiImage.') | ||
|
||
img_arrs = [np.array(darr.data) for darr in img.darrays] | ||
all_imgs_same = np.all(arr.shape == img_arrs[0].shape for arr in img_arrs) |
You'll need a list comprehension here, otherwise you get a generator: `all_imgs_same = np.all([arr.shape == img_arrs[0].shape for arr in img_arrs])`.
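For reference, a quick illustration of the gotcha (a sketch, not PR code): `np.all` treats a generator as a single truthy object, so the un-bracketed version always returns `True`.

```python
import numpy as np

shapes = [np.zeros(3), np.zeros(4)]

np.all(s.shape == shapes[0].shape for s in shapes)    # True -- the generator itself is "truthy"
np.all([s.shape == shapes[0].shape for s in shapes])  # False, the intended comparison
```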
Thx. Nice catch!
tedana/utils/masking.py
Outdated
masked_arrs = [np.empty((np.sum(union_mask), n_echos)) for _ in | ||
range(len(img_arrs))] | ||
for i_vol in range(n_vols): | ||
masked_arrs[i_vol][:] = np.nan |
I'm not totally clear on why we want `np.nan` in the masked data. Rather, I understand why the `np.nan` are there, but I think this goes back to my comment above: I'm not sure that we need echo-specific masks! I'm happy to be convinced of their utility, but I can't envision a use case at the current juncture.
I guess that having echo-specific masks is the same as using `make_adaptive_mask`, but this way only data need to be passed around after initial masking, rather than data + a mask. I'd rather avoid having any masks or mask-like arrays passed around if the data can carry the same information implicitly. It should be easy enough to get the `min_mask` or the `adaptive_mask` from an array that has `NaN`s in voxels from echoes with dropout in those areas.
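As a rough sketch of that idea (array names and shapes are assumptions, not from the PR), a (samples, echos, time) array with `NaN`s marking dropped-out echoes carries enough information to recover both masks:

```python
import numpy as np

# hypothetical masked data: (n_samples, n_echos, n_vols), NaN where an echo has dropout
data = np.random.rand(1000, 3, 50)
data[500:, 2, :] = np.nan   # e.g., the third echo drops out for half the samples

good_echoes = ~np.isnan(data).any(axis=2)   # (n_samples, n_echos) boolean
adaptive_mask = good_echoes.sum(axis=1)     # number of usable echoes per sample
min_mask = adaptive_mask > 0                # samples with at least one usable echo
```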
tedana/utils/masking.py
Outdated
idx_val in abs_echo_idx]) | ||
for j_vol in range(n_vols): | ||
masked_arrs[j_vol][i_echo, rel_echo_idx] = \ | ||
img_arrs[j_vol][i_echo, abs_echo_idx] |
I'm not sure I understand what shape the `darrays` in the initial gifti image are in. Are we storing them as `samples x echos` arrays, where each `GiftiDataArray` is a separate volume? That's what I assumed based on `load_data2()`, but here it seems they're `echo x samples` arrays.
Honestly, I wasn't as careful about the order of dimensions as I should have been. Do you (or @emdupre) have a preference? Either way, each `GiftiDataArray` will be a separate volume, but I don't currently have a preference for `echo x sample` or `sample x echo`.
I think I might prefer `sample x echo` to be consistent with the original formatting `X x Y x Z x echo x time`, where `X x Y x Z` are being collapsed into `sample`. WDYT ?
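A small sketch of that convention (dimensions invented for illustration): collapsing the spatial axes of an `X x Y x Z x echo x time` array yields `sample x echo x time`.

```python
import numpy as np

vol = np.random.rand(4, 5, 6, 3, 10)        # X x Y x Z x echo x time
samp = vol.reshape(-1, *vol.shape[3:])      # sample x echo x time
assert samp.shape == (4 * 5 * 6, 3, 10)
```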
I was under the impression that
Codecov Report

@@           Coverage Diff            @@
##           master      #70    +/-   ##
========================================
- Coverage   45.81%   42.84%   -2.98%
========================================
  Files          29       30       +1
  Lines        1602     1718     +116
========================================
+ Hits          734      736       +2
- Misses        868      982     +114

Continue to review full report at Codecov.
@rmarkello just issued a hotfix for file handling in #72 -- could you merge in master on this, @tsalo ? I'd like to try and have both of these PRs in a
Okay, I think this should work now. I'm holding off on writing tests until after we've reviewed and agreed on an implementation.
@rmarkello @emdupre I have a question about T2* estimation that might significantly affect this PR. It seems that in the current implementation, data from "bad" echoes is used in the full T2* map. I had assumed that the full map would use one of the following:
If data from "bad" echoes is going to be used (as with the T2* estimation method currently implemented or the second and third ideas above), then the whole img-based masking/unmasking approach probably needs to be rethought.
My reading is that `t2ss` is the full log-linear fit, `t2sa` is the limited map only considering those voxels which have good signal from two or more echoes, and `t2saf` is `t2sa` but adding back in information from `t2ss` for the single-echo signal voxels.

I might be totally misinterpreting, so I'd appreciate if you could set me straight, @tsalo ! Or from anyone else, too. It'd be great to get this sorted. It'd also be great if we gave these things better names ! `fl` is maybe my least favorite of the variable names in the code you linked, though none are great....
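If that reading is right, the combination would look roughly like the sketch below (variable names follow the comment above; shapes and values are invented, and this is not tedana's actual code):

```python
import numpy as np

n_voxels = 1000
t2ss = np.random.rand(n_voxels)                      # full log-linear fit, all voxels
good = np.random.rand(n_voxels) > 0.1                # voxels with >= 2 good echoes
t2sa = np.where(good, np.random.rand(n_voxels), 0)   # limited map, zero elsewhere

# t2saf: the limited map, with single-good-echo voxels filled back in from the full fit
t2saf = np.where(t2sa != 0, t2sa, t2ss)
```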
@@ -106,6 +107,87 @@ def load_image(data):
    return fdata


def load_data2(data, n_echos=None):
Is this being used anywhere, at this point ?
It would replace `load_data` as the input function for multi-echo data files, but it isn't used yet.
Got it, thanks !! Can we drop the gifti support in there too, then ?
Notes
-----
This works amazingly well, but is quite slow.
We can make this much faster by subbing out `pandas`, if we stick with this approach !
elif len(img.shape) == 5:
    _, _, _, n_echos, n_vols = img.shape
else:
    raise ValueError('Input img must be 4D (X x Y x Z x E) or '
Is this a valid error to pass back up to the user ? They'll have either passed in a zcat (`X x Y x Z x E x T`) or individual echo files, so I don't think they would interact with a `(X x Y x Z x E)` image..?
`load_data2` would load zcat (`X x Y x Z*E x T`), the preferred 5D (`X x Y x Z x E x T`), and individual echo files all into the same `X x Y x Z x E x T`-ordered image type, so masking and unmasking would respectively take in and return only that image type. We could allow other image types, but I think that it's unnecessary. Multiple individual echo images could just be masked/unmasked with nilearn's functions, and we shouldn't have to deal with zcat images (even when we do have zcat files) because we'll have `load_data2` for that.
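As a sketch of that conversion (dimensions invented, and assuming the zcat file stacks each echo's full slice block consecutively along z):

```python
import numpy as np

n_echos = 3
zcat = np.random.rand(4, 5, 6 * n_echos, 10)        # X x Y x (Z * E) x T

x, y, ze, t = zcat.shape
nz = ze // n_echos
img5d = np.stack([zcat[:, :, i * nz:(i + 1) * nz, :] for i in range(n_echos)], axis=3)
assert img5d.shape == (x, y, nz, n_echos, t)        # X x Y x Z x E x T
```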
Oh sorry, I misinterpreted your point. I guess it would be rare to feed in an image that isn't `X x Y x Z x E x T` or `X x Y x Z x E`, but I think that just makes this error unlikely (but not invalid).
Sorry for being unclear ! To clarify: shouldn't it be a `X x Y x Z x T` file, not a `X x Y x Z x E` file ? The former would be an individual echo file, the latter is something I don't think users will interact with.
Images with `X x Y x Z x T` data shouldn't be used with the multi-echo masking/unmasking functions. In that situation, nilearn's functions are more appropriate. Plus, `load_data` would take in multiple `X x Y x Z x T` files and return an `X x Y x Z x E x T` image, so `X x Y x Z x T` images shouldn't occur with multi-echo data.
Completely agree, but I'm not sure how this error would make sense to a user, since they'll have only supplied the individual echo files or the zcat image (as far as I understand) ! The `X x Y x Z x E` file would then be created "under-the-hood" in a workflow.
From what I can tell, your interpretation of the code is right. It doesn't make a ton of sense to me, and I think any of the three possibilities I mentioned above make more sense. I can make a new branch with a notebook to show what I mean.
I think the critical line is here and I can remember we had a discussion about it, though I don't remember where or the resolution ! Will look for it. The main issue is that it seems that when we subtract two from the echo number it wraps back around for the first echo, filling that value for the last echo in the list. Is that also your understanding ?
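To illustrate the wraparound itself (a toy example, not the actual tedana arrays): with 1-based echo numbers, `echo - 2` is `-1` for the first echo, so the assignment lands in the last column.

```python
import numpy as np

t2ss = np.zeros((5, 3))      # hypothetical (n_voxels, n_echos) preallocation
echo = 1                     # first echo (1-based)
t2ss[:, echo - 2] = 42.0     # index -1: fills the LAST column, not a "first echo" column
```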
I think that's meant to be a feature, not a bug. I.e., the last column is filled first and then overwritten because the author didn't want to use those values at all. At least that's my interpretation based on tedana/tedana/model/monoexponential.py, lines 69 to 70 (at 31504f6).
But yeah, we could change the preallocation, the lines you referenced, and usage of
I see. Thanks for helping me remember all of this ! 👍 Yes, I think you're right that it's a (desired) feature to overwrite the first echo, probably for the reasons you outlined, i.e., because those values aren't useful. WDYT are the next steps, then ? I'm rather reluctant to change something so big as the T2* generation without a very good reason and a broader discussion....
I agree that we should discuss this with everyone before changing the behavior at all. It could be something for the group to discuss in person at INCF or we could bring it up on Gitter or over email. My concern for this PR is that the current approach doesn't work with my proposed changes because, if we want to use data from bad echoes for full or even two-echo fits (as is currently implemented), then explicitly masking the bad echoes with the multi-echo masks I've been working on will prevent it.
I'm happy to close this for the foreseeable future, and do think that changing the fitting behavior on a conceptual level should wait until a major release.
Changes proposed in this pull request:
- Modify `tedana.utils.load_data` to allow a multi-echo or z-concatenated gii/nii file, or a list of single-echo gii/nii files. Will return a single img-like object.
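A hypothetical usage sketch of that change (the exact signature isn't settled in this PR, so the argument names and filenames here are assumptions):

```python
from tedana.utils import load_data

# z-concatenated multi-echo file
img = load_data('sub-01_zcat_bold.nii.gz', n_echos=3)

# or a list of single-echo files
img = load_data(['echo-1_bold.nii.gz', 'echo-2_bold.nii.gz', 'echo-3_bold.nii.gz'])
```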