Diagnose ds210 failure #2731
Conversation
Looks like that does it. Rerunning to confirm. Will try to reproduce locally.
@effigies were you able to replicate this? I just tried locally (with the same random seed) and it completed.
Sorry, should have updated. I did try locally and couldn't replicate. If we upload the entire working directory as artifacts, that might be the best bet for tracking it down.
Update - I reran, this time using the fast-track anatomicals provided, and was able to replicate the crash. As predicted, the BOLD mask is awful. After looking into this, I think the failure point comes down to two things:
Wow, great job catching it. Your two hypotheses are correct. The question is what is wrong in the fast track that ends up giving N4 a 4D file (?). Can you check what are the inputs to
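One cheap way to catch this class of failure early is a dimensionality guard before an image is handed to N4. A minimal sketch, using a numpy array as a stand-in for the NIfTI data (the function name and squeezing rule are illustrative, not fMRIPrep's actual code):

```python
import numpy as np

def check_n4_input(data: np.ndarray) -> np.ndarray:
    """Guard for N4 inputs: N4 expects a 3D volume.

    A degenerate trailing axis (shape (..., 1)) can be squeezed away,
    but a true 4D series should fail loudly rather than silently
    producing a bad bias-field estimate.
    """
    if data.ndim == 4 and data.shape[-1] == 1:
        data = data[..., 0]  # drop a singleton 4th axis
    if data.ndim != 3:
        raise ValueError(f"N4 input must be 3D, got shape {data.shape}")
    return data
```

Wiring a check like this into the workflow would turn the silent misbehavior described below into an immediate, attributable error.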
I think I got confused across working directories, because this didn't seem to be the case after retesting. But I think I have found a simple solution - we have not been passing the already-calculated BOLD mask into the final boldref workflow. I submitted a quick fix (3911638) for the particular pathway our ds210 test takes (ME, SDC), but the others are unaccounted for and thus failing.
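The fix described here amounts to mask reuse: if an earlier stage already produced a BOLD mask, the final boldref workflow should take it as an input instead of re-estimating one. A simplified, self-contained sketch of that logic (function and argument names are hypothetical; arrays stand in for NIfTI images):

```python
from typing import Callable, Optional

import numpy as np

def get_bold_mask(
    bold_ref: np.ndarray,
    precomputed_mask: Optional[np.ndarray] = None,
    estimator: Callable[[np.ndarray], np.ndarray] = lambda ref: ref > ref.mean(),
) -> np.ndarray:
    """Reuse a mask computed earlier in the pipeline when one exists,
    rather than re-estimating it in the final boldref workflow."""
    if precomputed_mask is not None:
        # Trust the earlier, validated mask; just normalize its dtype.
        return precomputed_mask.astype(bool)
    # Fallback: estimate a mask from the reference volume.
    return estimator(bold_ref)
```

The design point is that re-estimation only happens on the pathway where no mask exists yet, which is what the quick fix wires up for the ME/SDC case.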
@effigies I can't request you as a reviewer since it's your PR, but I would appreciate a quick look (if you find some time).
fmriprep/workflows/bold/base.py (Outdated)

```diff
@@ -993,6 +994,9 @@ def init_func_preproc_wf(bold_file, has_fieldmap=False):
         (bold_hmc_wf, bold_bold_trans_wf, [
             ("outputnode.xforms", "inputnode.hmc_xforms"),
         ]),
+        (initial_boldref_wf, final_boldref_wf, [
```
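For context, the edges in the diff above use Nipype's connection-list form, where each entry is `(source_node, dest_node, [(source_port, dest_port), ...])`. A tiny stand-alone expansion of that structure, with no Nipype dependency (node names in the example are taken from the diff; the helper itself is illustrative):

```python
def expand_connections(connections):
    """Flatten Nipype-style (src, dst, [(out, in), ...]) tuples into
    individual (src, out_port, dst, in_port) edges."""
    edges = []
    for src, dst, ports in connections:
        for out_port, in_port in ports:
            edges.append((src, out_port, dst, in_port))
    return edges
```

In the actual workflow these tuples are passed to `Workflow.connect`, which is what routes `outputnode.xforms` from the HMC workflow into `inputnode.hmc_xforms` of the transform workflow.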
I thought the idea of having a second go-round was to get a better mask.
This could IMHO be okay, but to me the problem is that we do not yet know what makes the fast-track fall into this while the regular track is fine for the given seed.
We first need to bisect the issue and determine the point where inputs and/or outputs deviate from the regular track.
We may live with this just fine, but if we do this without knowing why the error occurs, a regression is almost certain.
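The bisection suggested here can be done mechanically by checksumming the two working directories and reporting where they first diverge. A small sketch of that approach (function names are illustrative; the real working trees would be the ft/noft directories mentioned below):

```python
import hashlib
from pathlib import Path

def checksum_tree(root: Path) -> dict:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }

def divergent_files(a: Path, b: Path) -> list:
    """Relative paths present in both trees whose contents differ."""
    ha, hb = checksum_tree(a), checksum_tree(b)
    return sorted(k for k in ha.keys() & hb.keys() if ha[k] != hb[k])
```

Running this over the fast-track and regular-track working directories would narrow the comparison down to the first node whose inputs or outputs differ.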
The (silent) failure point is n4_correct, though from looking at the inputs it is not very clear why one case fails while the other is fine. I've compiled the working directories of fast-track (ft) / no fast-track (noft) into a tarball (ds210_n4_error.tar.gz) if either of you would like to take a look.
@mgxd Very unlikely today, as we'll be wrapping our sprint. I'll have some time on the plane and train, though.
The masks are different. That doesn't justify the crash (as in, it's crazy that the mask makes such a big difference for N4), but at least we know where the problem is coming from, right? Some thoughts that I'm skeptical will fully solve the issue, but that together will give more reliability to the full process:
Only when these things have been tested and the outputs of N4 in both conditions are more similar would I consider whether we also want to use a prior mask to avoid these steps altogether. However, these steps are going to happen anyway the first time around, so we want to make sure the workflow is more reliable. WDYT?
I'm digging up more on the inputs.
That all said, there is also appeal in giving SynthStrip a go and replacing all of this if it works.
Q: Is this happening with single-echo too?
I would guess the difference is because the anatomical derivatives have been run on the high res ds210, whereas we're calculating the T1w/MNI transform using the downsampled outputs (and using
Yes, it would be good to test. But we'd still want to backport a fix for
Potentially (see #2761), but I haven't personally replicated / seen it.
Are we using FSL 6? Could the reason these errors bubble up be an upgrade, with this code https://github.com/nipreps/niworkflows/blob/e1f4267eb5fd878b2a001f184e1bddbb3f4a6843/niworkflows/func/util.py#L378-L386 introducing weirdness? Please check some of the ideas in nipreps/niworkflows#707. I'm pretty positive we want to do the clipping before registration and N4, but I don't have time now to have a stab at it.
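The clipping being discussed is robust intensity clipping: trimming extreme voxel values to percentile bounds so outlier spikes don't dominate registration cost functions or N4's bias-field model. A simplified sketch of the idea, not the linked niworkflows implementation (the percentile defaults here are illustrative):

```python
import numpy as np

def clip_intensities(data: np.ndarray,
                     p_min: float = 2.0,
                     p_max: float = 99.8) -> np.ndarray:
    """Clip voxel intensities to robust percentile bounds.

    Extreme outliers (e.g. reconstruction spikes) can otherwise dominate
    both registration cost functions and N4's bias-field estimate.
    """
    lo, hi = np.percentile(data, [p_min, p_max])
    return np.clip(data, lo, hi)
```

Applying a step like this before registration and N4 is what the comment argues for; the exact percentiles and any rescaling in niworkflows may differ.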
I just ran with the following changes: nipreps/niworkflows@maint/1.4.x...mgxd:dbg/ds210-failure and was able to successfully complete, with normal-looking reports.
Yup, that's the weights instead of the mask, I'm sure of that. I have updated my PR with further changes (esp. removing the binarization and binary dilation for the no-premask pathway).
(force-pushed 96d3d05 to fae2c19)
Interestingly, we're now running into
(force-pushed 8905906 to 04dbcb5)
(force-pushed 3e5b0f2 to 916022f)
Merging since I'd like to get a release in before the weekend.
Thanks for going ahead. No objections.
Starting by setting a random seed to verify that it can be reproduced.
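Fixing a seed is what makes a stochastic failure like this one replayable at all. A generic sketch of seeding the usual suspects in a Python-based neuroimaging pipeline (fMRIPrep exposes seeding through its own CLI, which may differ from this; the `ANTS_RANDOM_SEED` environment variable is honored by ANTs tools):

```python
import os
import random

import numpy as np

def set_seed(seed: int = 20210812) -> None:
    """Fix the common sources of randomness so a failing run can be replayed."""
    random.seed(seed)            # Python's stdlib RNG
    np.random.seed(seed)         # NumPy's legacy global RNG
    # ANTs registration tools read this environment variable (assumption:
    # the pipeline shells out to ANTs in the same process environment).
    os.environ["ANTS_RANDOM_SEED"] = str(seed)
```

With the seed pinned, two runs over the same inputs should make identical random choices, which is the precondition for the replication attempts described in this thread.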