[ENH] Use different masking thresholds for denoising and classification #736
Conversation
- `masksum_denoise`: Threshold of 1, used for denoising and optimal combination.
- `masksum_clf`: Threshold of 3, used for decomposition and component selection.
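For concreteness, here is a minimal numpy sketch of how the two masks follow from the adaptive mask (the voxel-wise count of "good" echoes); the values are made up, and this is an illustration rather than tedana's actual code:

```python
import numpy as np

# Hypothetical adaptive mask values: the number of "good" echoes per voxel.
masksum = np.array([0, 1, 2, 3, 4])

# Denoising / optimal combination: keep any voxel with >= 1 good echo.
mask_denoise = masksum >= 1   # [False, True, True, True, True]

# Decomposition / component selection: require >= 3 good echoes.
mask_clf = masksum >= 3       # [False, False, False, True, True]
```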
Codecov Report

```diff
@@            Coverage Diff             @@
##             main     #736      +/-   ##
==========================================
+ Coverage   93.19%   93.20%   +0.01%
==========================================
  Files          27       27
  Lines        2204     2209       +5
==========================================
+ Hits         2054     2059       +5
  Misses        150      150
```
This looks good. I still plan to run this on several data sets to visually make sure everything looks good. I hope to finish that up today.
One suggested change is to alter variable names with something like:

- `mask_denoise`, `masksum_denoise` -> `mask_1echo`, `masksum_1echo`
- `mask_clf`, `masksum_clf` -> `mask_3echo`, `masksum_3echo`

My sense is that this will help people keep track of the masks better. That is, one mask is for all voxels with 1 good echo and the other is for voxels with at least 3 good echoes. Naming them by their purpose (`denoise` and `clf`) seems like it will cause confusion, particularly when we do things like classify with `_clf` and display the classifications with `_denoise`. This is NOT a strong opinion on my part.
My other general suggestion regards documentation and comments. With these two masks floating around, it's not always clear how various functions handle masks with only one good echo. For example, to understand how a voxel with only one good echo is treated in `decay.fit_decay`, you really need to dig into the code. I'd suggest expanding some of the explanation for `fit_decay`, and maybe adding more general documentation on how and why we have these two masks bouncing around the code. This isn't critical for this PR, so if you want to merge this PR and open a new issue to improve this documentation, I'd be fine with that.
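To illustrate the ambiguity, here is a minimal, hypothetical sketch of a log-linear decay fit restricted to a voxel's good echoes. This is not tedana's actual `fit_decay` code; the `fit_voxel_decay` helper and its single-echo fallback are invented for illustration:

```python
import numpy as np

def fit_voxel_decay(signal_means, tes, n_good):
    """Log-linear fit of S(t) = S0 * exp(-t / T2*) over a voxel's good echoes.

    A two-parameter fit needs at least 2 echoes, so voxels with a single
    good echo require some special-case handling -- exactly the behavior
    that is hard to discover without digging into the code.
    """
    if n_good < 2:
        # Invented fallback; what the real code does here is what the
        # expanded documentation should spell out.
        return np.nan, np.nan
    y = np.log(np.asarray(signal_means[:n_good]))
    X = np.column_stack([np.ones(n_good), -np.asarray(tes[:n_good])])
    (log_s0, r2s), *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.exp(log_s0), 1.0 / r2s  # S0, T2*

# Simulated voxel: S0 = 1000, T2* = 30 ms, echoes at 12/28/44/60 ms.
tes = [12.0, 28.0, 44.0, 60.0]
sig = 1000 * np.exp(-np.array(tes) / 30.0)
print(fit_voxel_decay(sig, tes, n_good=4))  # ~ (1000.0, 30.0)
print(fit_voxel_decay(sig, tes, n_good=1))  # (nan, nan)
```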
@tsalo I'm seeing some issues when running this PR. The clearest issue is that the program crashes when I run it with the curvefit option.
I've found one additional error that might be related. I'll update if I localize this more, but wanted to share as is. I'm running this locally using the 5-echo testing dataset with commands like: […]
I don't think the […]

EDIT: I can replicate this on other data, though, and the reason is that there are no non-significant Kappa values, so it's trying to get an elbow out of an empty array. I think that's a general bug rather than something introduced in this PR.

EDIT 2: Indeed, this is the same general error as in #181.
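As a rough illustration of that failure mode (not tedana's actual elbow code; the threshold and values here are made up), reducing over an empty array of non-significant Kappa values fails immediately:

```python
import numpy as np

kappas = np.array([80.0, 95.0, 120.0])  # made-up component Kappa values
threshold = 50.0                        # suppose everything is "significant"

nonsig = kappas[kappas < threshold]     # empty: no non-significant values
print(nonsig.size)                      # 0

# An elbow computed from this pool then crashes on the empty array, e.g.:
# np.argmax(nonsig) -> ValueError: attempt to get argmax of an empty sequence
```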
@handwerkerd I've opened #752 about the […]. Regarding the other big issue (documentation and variable naming), I will start working on that. To start, here is something I'd like to add to the text report, right after the adaptive masks are created: […]

What do you think of it? I'd like to workshop it before merging.
Now that the full maps are used throughout the pipeline, they are the primary T2*/S0 outputs. The limited maps are not used for anything, and are only written out if the workflow is run in verbose mode.
The good/bad news is that both of the issues I list above (crashing with the curvefit option and the adaptive mask not playing well with AFNI) also occur on the main branch. I need to get a few other things done today, but will try to share more info soon.
I think I figured out the issue with the unreadable `desc-adaptiveGoodSignal_mask.nii.gz`.
It must be int64 because we read the data initially as float32, convert it to int, and numpy then picks an integer type with enough capacity to hold a 32-bit float that's been converted to an integer. We may be able to get around it by specifying a datatype when we create the mask, since that should make a new array anyway. At least, if I'm recalling the code correctly, that's what happens and why we may be seeing this behavior.
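A small sketch of the suspected dtype behavior and one possible workaround; the affine, shapes, and the choice of `int16` here are assumptions for illustration, not the actual fix:

```python
import numpy as np
import nibabel as nib

# Data comes in as float32...
data = (np.random.rand(4, 4, 4) * 4).astype(np.float32)

# ...and converting it with a bare `int` yields the platform-default
# integer (int64 on most 64-bit systems), which some tools cannot read:
adaptive_mask = data.astype(int)
print(adaptive_mask.dtype)  # int64 on most 64-bit platforms

# Possible workaround: pick an explicit, smaller dtype when making the mask,
# and pin the on-disk dtype as well.
adaptive_mask = data.astype(np.int16)
img = nib.Nifti1Image(adaptive_mask, affine=np.eye(4))
img.set_data_dtype(np.int16)
img.to_filename("desc-adaptiveGoodSignal_mask.nii.gz")
```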
I think this PR looks good. I agree with @handwerkerd that the documentation on what's going on is probably not enough to make things clear in the future. However, I don't think that this is the fault of the PR; the PR made it clear how confusing the whole process is, rather than exacerbating it by much (previously we had one mask doing two tasks, which doesn't particularly seem clear either). I lean towards approving and merging this, and then deciding how to tackle the documentation separately, since that problem is fairly pervasive throughout the package.
I think that difference comes from the change in the T2* map. I tested by running on one of the runs from Zaki's preprocessed Cambridge dataset with (1) the main branch, (2) the split-adaptive-mask branch, and (3) the split-adaptive-mask branch using the T2* map from the run on the main branch. The main branch and split-adaptive-mask branch results differ, but the main branch and the split-adaptive-mask branch + main branch's T2* map results appear almost identical.

**With main branch**

```bash
tedana -d \
  /Users/taylor/Documents/datasets/cambridge/sub-04570/func/sub-04570_task-rest_echo-1_space-scanner_desc-partialPreproc_bold.nii.gz \
  /Users/taylor/Documents/datasets/cambridge/sub-04570/func/sub-04570_task-rest_echo-2_space-scanner_desc-partialPreproc_bold.nii.gz \
  /Users/taylor/Documents/datasets/cambridge/sub-04570/func/sub-04570_task-rest_echo-3_space-scanner_desc-partialPreproc_bold.nii.gz \
  /Users/taylor/Documents/datasets/cambridge/sub-04570/func/sub-04570_task-rest_echo-4_space-scanner_desc-partialPreproc_bold.nii.gz \
  -e 12 28 44 60 --tedpca mdl \
  --debug \
  --out-dir /Users/taylor/Documents/datasets/cambridge/tedana-main/
```

**With split-adaptive-mask branch**

```bash
tedana -d \
  /Users/taylor/Documents/datasets/cambridge/sub-04570/func/sub-04570_task-rest_echo-1_space-scanner_desc-partialPreproc_bold.nii.gz \
  /Users/taylor/Documents/datasets/cambridge/sub-04570/func/sub-04570_task-rest_echo-2_space-scanner_desc-partialPreproc_bold.nii.gz \
  /Users/taylor/Documents/datasets/cambridge/sub-04570/func/sub-04570_task-rest_echo-3_space-scanner_desc-partialPreproc_bold.nii.gz \
  /Users/taylor/Documents/datasets/cambridge/sub-04570/func/sub-04570_task-rest_echo-4_space-scanner_desc-partialPreproc_bold.nii.gz \
  -e 12 28 44 60 --tedpca mdl \
  --debug \
  --out-dir /Users/taylor/Documents/datasets/cambridge/tedana-sam/
```

**With split-adaptive-mask branch, using t2smap from main branch**

```bash
# with split-adaptive-mask branch
tedana -d \
  /Users/taylor/Documents/datasets/cambridge/sub-04570/func/sub-04570_task-rest_echo-1_space-scanner_desc-partialPreproc_bold.nii.gz \
  /Users/taylor/Documents/datasets/cambridge/sub-04570/func/sub-04570_task-rest_echo-2_space-scanner_desc-partialPreproc_bold.nii.gz \
  /Users/taylor/Documents/datasets/cambridge/sub-04570/func/sub-04570_task-rest_echo-3_space-scanner_desc-partialPreproc_bold.nii.gz \
  /Users/taylor/Documents/datasets/cambridge/sub-04570/func/sub-04570_task-rest_echo-4_space-scanner_desc-partialPreproc_bold.nii.gz \
  -e 12 28 44 60 --tedpca mdl \
  --debug \
  --t2smap /Users/taylor/Documents/datasets/cambridge/tedana-main/T2starmap.nii.gz \
  --out-dir /Users/taylor/Documents/datasets/cambridge/tedana-sam-t2s/
```

EDIT: Also, looking at the correlations between the ICA components, my interpretation seems to bear out. The correlations between components from […]
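For reference, one way such cross-run component correlations could be computed; this is a hypothetical sketch with synthetic data, not the analysis actually run here:

```python
import numpy as np

# Hypothetical (n_voxels, n_components) ICA component maps from two runs.
rng = np.random.default_rng(0)
comp_a = rng.standard_normal((1000, 5))
comp_b = comp_a + 0.01 * rng.standard_normal((1000, 5))  # near-identical run

# Stack both sets as columns; the off-diagonal block is the cross-run matrix.
r = np.corrcoef(comp_a, comp_b, rowvar=False)[:5, 5:]
print(np.abs(r).max(axis=1))  # best match per component; ~1.0 here
```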
Sorry for the delay on this, but I'm still trying to wrap my head around things, and I do think there's an issue. The ICA maps shouldn't be similar; they should be identical. The big change here is that there are voxels in the optimally combined data with <3 good echoes that would otherwise have been excluded, but when we run the PCA (line 563 of `tedana.py`) we use the stricter 3-good-echo mask, so the decomposition shouldn't be affected by those added voxels. I'm a bit stuck trying to figure out why this is happening and if I'm making a silly mistake. Can others confirm that […]? [Update: […]]
That's true, but I think the differences just come from small changes in numerical precision due to loading a file instead of working directly from an array. Just to make sure, I ran six versions with (1) the most current version of […]
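A minimal sketch of that precision effect, assuming the T2* map is written to disk as float32 (the paths and shapes are made up):

```python
import os
import tempfile
import numpy as np
import nibabel as nib

# In memory the T2* map is float64; on disk it is commonly stored as float32.
t2s = np.random.default_rng(0).random((8, 8, 8)) * 100.0  # float64

path = os.path.join(tempfile.mkdtemp(), "T2starmap.nii.gz")
nib.Nifti1Image(t2s.astype(np.float32), affine=np.eye(4)).to_filename(path)

# Loading gives float64 again, but rounded through float32 on the way.
t2s_loaded = nib.load(path).get_fdata()
print(np.abs(t2s - t2s_loaded).max())  # tiny but nonzero (order of 1e-6)
```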
With @tsalo's last check, and having given this one more read-through, I think this looks good. Any issues I have with this aren't due to this PR.
It looks like there's now a CI failure, but hopefully running it again will fix that.
I restarted the failing workflow and it looks like it fixed itself.
Looks like we have two approvals, which I didn't realize. Does anyone have any issues with me merging this?
Closes #685 and closes #680.
NOTE: While the proposed changes should affect the coverage of the T2*, combined, and denoised data, they should not impact component classification or values within the original coverage of any map.
Changes proposed in this pull request:

- `[mask|masksum]_denoise`: Threshold of 1, used for denoising and optimal combination.
- `[mask|masksum]_clf`: Threshold of 3, used for decomposition and component selection.
- The full maps are now the primary T2*/S0 outputs (`T2starmap` and `S0map`), while the limited maps are now only written out on verbose runs and are named `desc-limited_[T2starmap|S0map]`.