
Diagnose ds210 failure #2731

Merged: 5 commits into nipreps:maint/21.0.x, Apr 21, 2022
Conversation

@effigies (Member) commented Mar 3, 2022

Starting by setting a random seed to verify that it can be reproduced.
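
(For context, here is a minimal sketch of what pinning the seed typically involves in this Python stack. The helper name, the seed value, and the idea of exporting ANTS_RANDOM_SEED are illustrative assumptions, not necessarily how fMRIPrep wires it up.)

# Hypothetical seeding helper; not the actual fMRIPrep code.
import os
import random

import numpy as np


def set_workflow_seed(seed=20220303):
    """Pin the RNGs a run may touch so a failure can be replayed."""
    random.seed(seed)                            # Python-level randomness
    np.random.seed(seed)                         # NumPy-based steps
    os.environ["ANTS_RANDOM_SEED"] = str(seed)   # respected by recent ANTs tools


set_workflow_seed()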

@effigies (Member Author) commented Mar 4, 2022

Looks like that does it. Rerunning to confirm. Will try to reproduce locally.

@mgxd (Collaborator) commented Apr 13, 2022

@effigies were you able to replicate this? I just tried locally (with the same random seed) and it completed.

@effigies (Member Author) commented:

Sorry, should have updated. I did try locally and couldn't replicate. If we upload the entire working directory as artifacts, that might be the best bet for tracking it down.

@mgxd (Collaborator) commented Apr 14, 2022

Update: I reran, this time using the fast-track anatomicals provided, and was able to replicate the crash. As predicted, the BOLD mask is awful. After looking into this, I think the failure point is final_boldref_wf.enhance_and_skullstrip_bold_wf.n4_correct (see attachment).
[Screenshot: Screen Shot 2022-04-14 at 9.03.22 AM]

Two things:

  • N4Correct sets dimensionality to 3, but the input is a 4D file
  • Shouldn't RobustAverage be producing a single volume?

cc @effigies @oesteban
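
(A quick way to confirm the first bullet is to inspect the image handed to the n4_correct node. Below is a minimal, hypothetical check with nibabel; the path is a placeholder for whatever sits in the node's working directory.)

# Hypothetical debugging check: is the input to n4_correct 3D or still 4D?
import nibabel as nb

# Placeholder path; point it at the actual input in the node's working directory.
img = nb.load("work/final_boldref_wf/enhance_and_skullstrip_bold_wf/n4_correct/input.nii.gz")

print("shape:", img.shape)
if img.ndim == 4:
    print("4D input: RobustAverage did not collapse the series to a single volume.")
else:
    print("Single 3D volume, as N4 (dimension=3) expects.")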

@oesteban (Member) commented:

Wow, great job catching it.

Your two hypotheses are correct. The question is what, in the fast track, ends up giving N4 a 4D file.

Can you check which inputs to final_boldref_wf.enhance_and_skullstrip_bold_wf change between the fast track and the regular track?

@mgxd (Collaborator) commented Apr 14, 2022

Two things:

  • N4Correct sets dimensionality to 3, but the input is a 4D file
  • Shouldn't RobustAverage be producing a single volume?

I think I got confused across working directories, because this didn't seem to be the case after retesting. But I think I have found a simple solution: we have not been passing the already-calculated BOLD mask into the final boldref workflow. I submitted a quick fix (3911638) for the particular pathway our ds210 test takes (ME, SDC), but the others are unaccounted for and thus failing.
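
(To illustrate the idea of the fix, here is a rough nipype-style sketch of reusing the first-pass mask. The workflow and field names are approximations for illustration, not the exact ones in commit 3911638.)

# Illustrative nipype sketch, not the literal fMRIPrep code: hand the BOLD mask
# computed by the initial boldref workflow to the final one, instead of letting
# enhance_and_skullstrip_bold_wf re-derive a (potentially bad) mask.
from nipype.interfaces.utility import IdentityInterface
from nipype.pipeline import engine as pe

# Stand-ins for the real sub-workflows; only the connection pattern matters here.
initial_boldref = pe.Node(IdentityInterface(fields=["bold_mask"]), name="initial_boldref_outputnode")
final_boldref = pe.Node(IdentityInterface(fields=["bold_mask"]), name="final_boldref_inputnode")

wf = pe.Workflow(name="func_preproc_sketch")
wf.connect([
    (initial_boldref, final_boldref, [("bold_mask", "bold_mask")]),
])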

mgxd marked this pull request as ready for review on April 14, 2022 at 20:58
mgxd requested a review from oesteban on April 14, 2022 at 20:58
@mgxd (Collaborator) commented Apr 14, 2022

@effigies I can't request you as a reviewer since it's your PR, but I would appreciate a quick look (if you find some time).

@@ -993,6 +994,9 @@ def init_func_preproc_wf(bold_file, has_fieldmap=False):
(bold_hmc_wf, bold_bold_trans_wf, [
("outputnode.xforms", "inputnode.hmc_xforms"),
]),
(initial_boldref_wf, final_boldref_wf, [
@effigies (Member Author) commented on the diff:

I thought the idea of having a second go-round was to get a better mask.

@oesteban (Member) commented Apr 14, 2022

This could be okay IMHO, but to me the problem is that we do not yet know what makes the fast track fall into this while the regular track is fine for the given seed.

We first need to bisect the issue and determine the point where inputs and/or outputs deviate from the regular track.

We may live with this just fine, but if we do it without knowing why the error occurs, a regression is almost certain.

@mgxd (Collaborator) commented:

The (silent) failure point is n4_correct, though from looking at the inputs it is not very clear why one case is failing while the other is fine. I've compiled the working directories of fast-track (ft) / no fast-track (noft) into a tarball (ds210_n4_error.tar.gz) if either of you would like to take a look.

@effigies (Member Author) commented:

@mgxd Very unlikely today, as we'll be wrapping our sprint. I'll have some time on the plane and train, though.

@oesteban (Member) commented:

The masks are different. That doesn't justify the crash (as in, it's crazy the mask makes such a big difference for N4), but at least we know where the problem is coming from, right?

[Screenshot: Screen Shot 2022-04-15 at 4.57.43 PM]

Some thoughts that I'm skeptical will fully solve the issue on their own, but that together should make the full process more reliable:

Only once these things have been tested and the outputs of N4 in both conditions are more similar would I consider whether we also want to use a prior mask to avoid these steps altogether.

However, these steps are going to happen anyway the first time around, so we want to make sure the workflow is more reliable.

WDYT?

@oesteban (Member) commented:

I'm digging more into the inputs:

  • The masks are float32 (which is not great for a mask).
  • The _average input images are very close to one another, but not exactly the same. It would be good to trace that back to the fast-track divergence.
  • The _average images should be clipped for N4 to perform appropriately. Currently they have a range from 0 through ~12000, with a median of ~6.5. I'm sure reducing the dynamic range to 0-255 would basically compress all those low intensities within one bin, making the job easier for N4 (see the clipping sketch below).

That all said, there is also appeal in giving SynthStrip a go and replacing all of this if it works.
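
(To make the clipping point concrete, here is a minimal sketch assuming a NIfTI input and only numpy/nibabel; the function name and percentile cutoffs are illustrative, and this is not the niworkflows implementation.)

# Hypothetical intensity-clipping sketch, not the niworkflows implementation:
# squash extreme values and rescale so N4 sees a compact ~0-255 dynamic range.
import nibabel as nb
import numpy as np


def clip_for_n4(in_file, out_file, p_low=2.0, p_high=99.8):
    img = nb.load(in_file)
    data = img.get_fdata()

    # Clip to robust percentiles, then rescale into 0-255 so the huge tail of
    # high intensities no longer dominates N4's histogram.
    lo, hi = np.percentile(data, [p_low, p_high])
    data = np.clip(data, lo, hi)
    data = (data - lo) / (hi - lo) * 255.0

    nb.Nifti1Image(data.astype(np.float32), img.affine).to_filename(out_file)
    return out_file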

@oesteban (Member) commented:

  • The _average input images are very close to one another, but not exactly the same. It would be good to trace that back to the fast-track divergence.

Q: Is this happening with single-echo too?

@mgxd (Collaborator) commented Apr 15, 2022

I would guess the difference is because the anatomical derivatives have been run on the high-res ds210, whereas we're calculating the T1w/MNI transform using the downsampled outputs (and using --sloppy). I tried testing with the latest ANTs version (2.3.5), but no luck. I'm rerunning now using your suggestions (#2731 (comment)).

That all said, there is also appeal in giving SynthStrip a go and replacing all of this if it works.

Yes, it would be good to test. But we'd still want to backport a fix for 21.0.x.

Q: Is this happening with single-echo too?

Potentially (see #2761), but I haven't personally replicated or seen it.

@oesteban (Member) commented Apr 15, 2022

Are we using FSL 6? Could the reason these errors bubble up be an upgrade, with this code https://github.com/nipreps/niworkflows/blob/e1f4267eb5fd878b2a001f184e1bddbb3f4a6843/niworkflows/func/util.py#L378-L386 introducing weirdness?

Please check some of the ideas in nipreps/niworkflows#707.

I'm pretty positive we want to do the clipping before registration and N4, but I don't have time now to take a stab at it.

@mgxd (Collaborator) commented Apr 15, 2022

I just ran with the following changes: nipreps/niworkflows@maint/1.4.x...mgxd:dbg/ds210-failure

and the run completed successfully, with normal-looking reports.

@oesteban (Member) commented:

and the run completed successfully, with normal-looking reports.

Yup, that's the weights instead of the mask, I'm sure of that.

I have updated my PR with further changes (esp. removing the binarization and binary dilation for the no-premask pathway).

@mgxd (Collaborator) commented Apr 20, 2022

Interestingly, we're now running into:

220420-17:06:37,576 nipype.workflow ERROR:
	 Saving crash info to /out/fmriprep/sub-02/log/20220420-161238_95a4bf05-53b1-4a12-965b-3f1ff395c91f/crash-20220420-170637-UID1001-skullstrip_first_pass-4731a275-68da-44b0-8314-e8ce701ce2f9.txt
Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/site-packages/nipype/pipeline/plugins/multiproc.py", line 67, in run_node
    result["result"] = node.run(updatehash=updatehash)
  File "/opt/conda/lib/python3.8/site-packages/nipype/pipeline/engine/nodes.py", line 516, in run
    result = self._run_interface(execute=True)
  File "/opt/conda/lib/python3.8/site-packages/nipype/pipeline/engine/nodes.py", line 635, in _run_interface
    return self._run_command(execute)
  File "/opt/conda/lib/python3.8/site-packages/nipype/pipeline/engine/nodes.py", line 741, in _run_command
    result = self._interface.run(cwd=outdir)
  File "/opt/conda/lib/python3.8/site-packages/nipype/interfaces/base/core.py", line 428, in run
    runtime = self._run_interface(runtime)
  File "/opt/conda/lib/python3.8/site-packages/nipype/interfaces/fsl/preprocess.py", line 163, in _run_interface
    runtime = super(BET, self)._run_interface(runtime)
  File "/opt/conda/lib/python3.8/site-packages/nipype/interfaces/base/core.py", line 822, in _run_interface
    self.raise_exception(runtime)
  File "/opt/conda/lib/python3.8/site-packages/nipype/interfaces/base/core.py", line 749, in raise_exception
    raise RuntimeError(
RuntimeError: Command:
bet vol0000_xform-00000_clipped_merged_average_corrected.nii vol0000_xform-00000_clipped_merged_average_corrected_brain.nii.gz -f 0.20 -m
Standard output:
/opt/fsl-6.0.5.1/bin/bet failed during command:vol0000_xform-00000_clipped_merged_average_corrected.nii vol0000_xform-00000_clipped_merged_average_corrected_brain.nii.gz -f 0.20 -m
Standard error:
/opt/fsl-6.0.5.1/bin/bet: line 399:  1528 Segmentation fault      (core dumped) ${FSLDIR}/bin/bet2 $IN $OUT $bet2opts
Return code: 1

mgxd added this to the 21.0.2 milestone on Apr 21, 2022
@mgxd (Collaborator) commented Apr 21, 2022

@effigies @oesteban I think the latest niworkflows changes have fixed this. Are we fine with merging (after removing the random seed) and cutting 21.0.2?

mgxd force-pushed the diagnose-ds210-failure branch from 3e5b0f2 to 916022f on April 21, 2022 at 18:15
@mgxd (Collaborator) commented Apr 21, 2022

Merging since I'd like to get a release in before the weekend.

mgxd merged commit 1ec8149 into nipreps:maint/21.0.x on Apr 21, 2022
@effigies (Member Author) commented:

Thanks for going ahead. No objections.
