
Don't use motion-corrupted b0's in topup #158

Merged
merged 45 commits into Washington-University:master on Jun 29, 2020

Conversation

MichielCottaar
Contributor

Here is a proposed piece of code to identify the bad b0's. The new code below extracts all b0's and runs topup (with the configuration file set for quick convergence). After topup, each b0 is then scored based on its average squared residual from the average b0.
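
For illustration, here is a minimal sketch of that scoring step (not the actual pipeline code; the file name and the use of nibabel/numpy are just assumptions for the example):

```python
import nibabel as nib
import numpy as np

def score_b0s(aligned_b0s_path):
    """Score each topup-aligned b0 by its mean squared residual from the mean b0.

    Lower scores mean the volume agrees better with the average b0, i.e. it is
    less likely to be corrupted by motion.
    """
    data = nib.load(aligned_b0s_path).get_fdata()   # 4D array: (x, y, z, n_b0)
    mean_b0 = data.mean(axis=-1)
    residuals = data - mean_b0[..., np.newaxis]
    return (residuals ** 2).mean(axis=(0, 1, 2))    # one score per b0

# e.g. best_idx = int(np.argmin(score_b0s("topup_aligned_b0s.nii.gz")))
```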

The next step is to decide how to use these b0's in the later analysis. My proposal is to:

  1. Identify the acquisition block with the highest quality initial b0. Merge all the data starting with that acquisition block (rather than starting with whatever acquisition block happened to be the first positive one that the user provided).
  2. Merge the first b0 of the first block with the best b0 out of the corresponding block with opposite phase encoding. Run topup on just these two. Using two b0's should be fine as we already confirmed they are good quality. The advantage of not including b0's from the other blocks is that we get the susceptibility field appropriate for the first b0. Eddy's susceptibility-by-motion can then take care of any changes in the susceptibility field in the other acquisition blocks.
  3. Run eddy as normal
  4. Optionally, revert the reordering done during step 1. Personally, I don't feel this is necessary as the bvecs will be different across all subjects anyway (due to different head rotations).
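
Schematically, the proposal looks like this (a rough outline only; `score_b0`, `run_topup`, `run_eddy` and the block dictionaries are hypothetical stand-ins, not existing pipeline functions):

```python
def proposed_flow(blocks, score_b0, run_topup, run_eddy):
    """Outline of the four steps above.

    `blocks` is a list of dicts like {"name": "98AP", "pe": "AP", "b0s": [...]};
    lower score_b0() values are assumed to mean better (less corrupted) b0's.
    """
    # 1. Put the acquisition block with the best-scoring initial b0 first.
    best = min(blocks, key=lambda blk: score_b0(blk["b0s"][0]))
    ordered = [best] + [blk for blk in blocks if blk is not best]

    # 2. Pair that first b0 with the best b0 from the blocks with the opposite
    #    phase encoding and run topup on just those two volumes.
    opposite = [b0 for blk in blocks if blk["pe"] != best["pe"] for b0 in blk["b0s"]]
    fieldmap = run_topup([best["b0s"][0], min(opposite, key=score_b0)])

    # 3. Run eddy as normal on the reordered data.
    # 4. Optionally revert the reordering afterwards.
    return run_eddy(ordered, fieldmap)
```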

Feedback on this proposal would be very welcome, before I spend too much time trying to implement it.

@glasserm
Contributor

I am not following: are we using the first b0 or the best b0 in topup for both phase encoding directions?

@MichielCottaar
Contributor Author

MichielCottaar commented Nov 14, 2019

Let's distinguish the reference b0, which is the first volume passed on to both topup and eddy, from the other b0. We would choose the reference b0 to be the first b0 of some acquisition block (i.e., the first b0 of 98AP, 98PA, 99AP, or 99PA), so that we don't have to reorder the data within a block. My proposal is to select this reference b0 to be the highest-scoring one from these 4 options.

Once we have selected a reference b0, we can select any b0 with the opposite phase encoding to be the other b0. So for this one, I would propose to select the best b0 even if it is not the first.

@glasserm
Contributor

OK, that makes sense.

@mharms
Contributor

mharms commented Nov 15, 2019

Thanks @MichielCottaar for kicking this off.
Some comments, in no particular order:

  1. You wrote (in point 2): "The advantage of not including b0's from the other blocks is that we get the susceptibility field appropriate for the first b0."

That's only really true to the extent that the subject didn't move between the "reference b0" and the 2nd b0 of opposite polarity that gets selected for inclusion in 'topup'. And that can be a long interval under the proposal. The optimal situation for SBM would be to preferentially select the b0 from the end of one run, and the beginning of the following run with opposite polarity, assuming that both are of "good" quality. Does that impact your thinking on which b0's to preferentially select? Alternatively, given that the proposal isn't really optimal for SBM considerations, would we be better off continuing with the existing approach of supplying multiple b0's to 'topup' as a way to help ensure that the topup field estimate is robust? (i.e., supply all the same b0's to topup that we currently do, except simply remove those that are deemed "bad").

  2. We should think beyond the specific HCP-LS acquisition protocol to what would also be optimal for other acquisition protocols -- e.g., protocols that acquire a short scan with a couple b0s of only one polarity, and then just a single dMRI scan of the opposite polarity (e.g., UK-Biobank); or protocols which acquire only two runs. In either case, if we do go with just a single pair of b0's as input to topup, then limiting the selection of the reference b0 to the "best" one that happens to also be first in the run seems overly restrictive (and would likely result in cases where we still end up with a "bad" reference b0). Thus, I'd suggest that we don't necessarily limit the reference b0 to being the first frame in the run. I understand that complicates things a bit, because the reference b0 then needs to be added to the front of the input to 'eddy' (and then possibly removed again when done), but it would be manageable, right? (Alternatively, rather than 'eddy' automatically assuming that the first frame is the reference, perhaps an argument could be added that allows one to specify an arbitrary frame as the reference? Which of course, even if Jesper was willing, has its own negatives, in that we then need a new version of 'eddy' for that particular feature.)

  3. Are there any licensing issues with adding code originally released under the Apache License 2.0?

@MichielCottaar
Contributor Author

  1. For SBM it would indeed be ideal to provide topup with b0's that are well aligned with each other (even if they were not acquired close in time). The scoring algorithm implemented here actually gives us this information, so we could identify the b0's which have the least amount of rotation around the x- or y-axis (which are the problematic axes for SBM). We would have to somehow weigh the quality of the b0's against the lack of rotation between them to select the best pair of b0's to use. I'm not sure if it is worth it.

On how many b0's to pass to topup: using only two b0's might indeed be a bit too few unless we want to optimise the SBM. However, I think the main advantage of having more b0's was that if one or two of them were corrupted by motion, they would not affect the final estimate too much. When we preselect good b0's, my understanding is that there is really no reason to go beyond 2 or 3 b0's per phase encode direction.

  2. Putting another b0 in front of the eddy run would be easy to implement. My main reason to be worried about this is eddy's assumption that the reference b0 and the first diffusion-weighted image are well aligned with each other (see the discussion on the usage of the --dont_peas flag in eddy). If we put another b0 as the reference at the beginning, then this assumption will not hold. We could also shuffle the diffusion-weighted images around, but this will become very complicated very quickly.

  3. My understanding is that the Apache License is a very permissive one and that it is not a problem as long as we satisfy points 4.1-4.3 of the license (https://opensource.org/licenses/Apache-2.0). However, I'm not a lawyer...

@coalsont
Member

coalsont commented Nov 15, 2019

The Apache License appears to be a permissive license (like BSD or MIT, not copyleft like GPL), and the main reason it is so long is to prevent abuse by patenting code and then copyright-licensing it "openly", while keeping the patent private, and later suing people for infringing the patent. I think all we need to do is include the original license notice (and mention if we made changes to the code, and include any related notices if they exist on the original code) on that part of the code and we are covered (we don't need to change any of our licenses).

@mharms
Contributor

mharms commented Nov 18, 2019

  1. Good point about dont_peas, especially since our plan (at least currently) is to use that option. But that means that there is essentially an unavoidable "conflict" between using dont_peas and using an "optimized" b0 as the reference space.

I feel that we would benefit from Jesper contributing to this discussion.

@VosperFlopstop

Do we really think this is a real problem? For this to be an issue, the first b0 would have to be broken in all 4 blocks.
The issue of the second b0 for topup complicates things a little, because we now need a last-first pair that are both non-broken. But it is not that crucial for the two b0s given to topup to be very close in time. What SBM does is remove bias by estimating how the field changes when the subject moves. That change can be estimated equally well whether or not one knows the field at the "zero" position. So in the case where we don't know the field "perfectly", because there is movement between the two input images to topup, it simply means that there will be a very slight residual distortion (compared to, for example, a hifi structural), but we are still able to remove any MBS-related variance just as well.
Mike, you asked about an option to set an arbitrary scan as the reference scan (for example the third b0 in a block). There is already such a (hidden) option. I am just a little wary of using it. What I do there is set that volume as the reference after each iteration, which definitely goes a very long way. But within each iteration there is still a difference in how things are done. This is due to some very fundamental choices made early on that would not be easy to change at this stage. I might just be a little too careful, but I prefer to let the first volume be the reference. And as you say, if we want to use --dont_peas it needs to be the "actual" first volume.
In brief, I think Michiel's plan to find which of the 4 blocks has the best first b0 is good. If it turns out that the b0 with opposing PE closest in time is broken, then we go to the next closest, etc.
Finally, just to keep you up to date with what I am currently doing on this project: I have started doing some profiling of eddy for running slice-to-vol, and it has given some surprising results. It looks like I have already been able to shave 30-40% off the execution time with identical results. I will soon send a new executable to Sri for him to test whether he sees the same gain there. I get the 30-40% on our K80s; I get nothing on the GeForce GT 750M on my Mac.

@mharms
Contributor

mharms commented Nov 20, 2019

My point was that we should think beyond the standard HCP-LS protocol. What about protocols that collect only a pair of scans? Or those like UK Biobank with only a single dMRI scan? In those cases, you only have 2 or even just 1 chance to get a "good" b0 that is also at the start of a run.

Also, I don't think we should necessarily let dont_peas drive our decision making. After all, we were (are) on the fence about using it in the first place. If other considerations, such as using a temporally separated b0 as the reference b0, take on increased importance, then would it be a big loss simply to revert to using PEAS?

Great to hear about the possible substantial decreases in execution time!

@VosperFlopstop

Hi again Mike. I see your point about thinking beyond the HCP-LS project, but I also think we should be careful not to try and solve "every" problem, and in the process end up overcomplicating and delaying the HCP-LS pipelines.
For the more "austere" acquisitions, like for example Biobank, I invariably suggest that people adopt a protocol like [AP-b0x3] [PA-b0x3 dwis b0 dwis b0 ...]. It costs pretty much nothing extra, and it all but guarantees that they will be able to get a good b0 pair to run topup on. I think there will have to be a limit to how much we bend over backwards to accommodate people who chose to ignore our advice.
As for dropping dont_peas, yes, you are right that it wouldn't cost much. The b0<->dwi alignment is much improved in more recent versions of eddy and it is really a toss-up as to what is best.

@MichielCottaar
Contributor Author

So it seems like we have two options:

  1. Adopt the first b0 of one of the runs as the reference b0. This has the disadvantage that if there are only very few runs (as in some non-HCP datasets), there might not be any good ones.
  2. Adopt the best b0 overall as the reference b0. This would require dropping the --dont_peas flag, because the first dwi will no longer necessarily be aligned with the reference b0.

It sounds like either would work fine. I think both would have a similar complexity to implement. I'm happy to go with the second option, if you still have a preference for it, @mharms?

@mharms
Contributor

mharms commented Dec 4, 2019

@MichielCottaar: Yes, that is a nice concise summary of the situation. Ultimately I think you, Jesper, Saad, and Stam need to make the decision as to what is "best". That said, I think I have a decent understanding of the issues and uncertainties involved, and based on my synthesis of the issues, I've come to the point where I favor NOT using dont_peas, for the following reasons:

  1. @VosperFlopstop indicated it is a "toss up".
  2. Using PEAS (i.e., not providing dont_peas as a flag) is the "default" for eddy, so using PEAS is parsimonious with the default settings.
  3. Some of the kids are pretty squirmy, so I'm sure that there will be some for whom there is a substantial movement between the b=0 and first dwi, even within the same run. So, rather than having to check the log files for the estimated movement (between the b=0 and the dwis), identifying "large values" (a somewhat arbitrary decision), and then either flagging those subjects in the database (something likely to be missed by users) or re-running them with PEAS enabled, why not just run everyone with PEAS in the first place?
  4. Recall that we ultimately settled on using dont_peas for the HCP-YA processing because that was overall a very compliant young-adult population, so we decided that the small extra uncertainty added by PEAS wasn't worth it. However, HCP-LS is a less compliant population. Further, it sounds like the accuracy of PEAS has improved relative to when we made the decision to use dont_peas for HCP-YA.

So, first I think we need to come to agreement on whether or not we are going to use dont_peas for the HCP-LS processing. If we decide that we will NOT, then it seems like that would bias us toward your Option (2), unless the accuracy of PEAS is itself somewhat dependent on the magnitude of the movement between the "first" b=0 and the first dwi. That's a question for @VosperFlopstop as to whether such an interaction exists. i.e., if the accuracy of PEAS declines as the size of the estimated "movement" increases, then that would incline us back toward your Option (1).

I guess another consideration, that should at least be on the table, is that with Option (1), a user could still choose to use dont_peas, if they wanted to. Whereas with Option (2), the use of dont_peas would be strongly discouraged (and probably should be checked for and outright prohibited).

@glasserm
Contributor

glasserm commented Dec 4, 2019

Sorry if this has already been discussed, but will this impact the use of the susceptibility-by-movement interaction correction?

@mharms
Contributor

mharms commented Dec 4, 2019

No, MBS related variance itself is removed equally well, per #158 (comment)

That said, if the pair of b=0's used for topup have a considerable displacement between them, there will be a "slight residual distortion" (relative to the structurals). But that's the issue of how we best choose the opposite-polarity b=0 to use in topup, which is separate from how we choose the b=0 that defines the reference space.

@MichielCottaar
Contributor Author

Debugging this code will probably take a while, but in the meantime more comments are welcome.

I went with the strategy of adopting the best b0 overall out of the positive and the negative volumes (i.e., ignoring that they are not at the start of a series). The best b0 of the positive series is prepended to the full dataset, so that topup and eddy have the same reference b0. This reference b0 is removed again after eddy is finished.
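
A rough sketch of that bookkeeping (illustrative only; the actual pipeline works with FSL command-line tools and different file names, and the corresponding bval/bvec entries need the same treatment):

```python
import nibabel as nib
import numpy as np

def prepend_reference_b0(series_path, ref_b0_path, out_path):
    """Put the chosen reference b0 in front of the full dMRI series so that
    topup and eddy share the same reference volume."""
    series = nib.load(series_path)
    ref = nib.load(ref_b0_path).get_fdata()
    if ref.ndim == 3:                                # single volume: add a 4th axis
        ref = ref[..., np.newaxis]
    combined = np.concatenate([ref, series.get_fdata()], axis=-1)
    nib.save(nib.Nifti1Image(combined, series.affine, series.header), out_path)

def strip_reference_b0(eddy_out_path, out_path):
    """Remove the prepended reference b0 again once eddy has finished."""
    img = nib.load(eddy_out_path)
    trimmed = img.get_fdata()[..., 1:]
    nib.save(nib.Nifti1Image(trimmed, img.affine, img.header), out_path)
```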

@MichielCottaar marked this pull request as ready for review on February 14, 2020 15:00
@MichielCottaar
Contributor Author

MichielCottaar commented Feb 14, 2020

Based on the discussion above I decided to just get the best b0, irrespective of when it was acquired. This best b0 is determined in one of two ways:

  1. If there are many b0's with a specific phase encoding (more than 4), then the b0's are registered to each other using mcflirt (i.e., no distortion correction). The b0 most similar to the average b0 is considered the best one.
  2. If there are fewer b0's for a given phase encoding, they are combined with at most 4 b0's of the opposite phase encoding and aligned using topup. Again, the b0 most similar to the average b0 is considered the best one. This option is only used with a small number of b0's, because otherwise topup is too slow.
    If there are many more b0's with one phase encoding, then options 1 and 2 will both be used (one for each phase encoding direction).
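
In pseudo-Python, the choice between the two strategies could be sketched like this (`score_with_mcflirt` and `score_with_topup` are hypothetical callables standing in for the alignment-and-scoring steps described above, not real pipeline functions):

```python
def select_best_b0(b0s_this_pe, b0s_other_pe, score_with_mcflirt, score_with_topup,
                   max_topup_b0s=4):
    """Return the index of the least motion-corrupted b0 for one PE direction.

    Both scoring callables return one score per b0 of this PE direction,
    lower = closer to the average b0.
    """
    if len(b0s_this_pe) > max_topup_b0s:
        # Plenty of b0's with this phase encoding: align them to each other
        # with mcflirt (no distortion correction) and score against the mean.
        scores = score_with_mcflirt(b0s_this_pe)
    else:
        # Few b0's: add at most `max_topup_b0s` b0's of the opposite polarity
        # and align the combined set with topup before scoring.
        scores = score_with_topup(b0s_this_pe, b0s_other_pe[:max_topup_b0s])
    return min(range(len(scores)), key=lambda i: scores[i])
```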

After the best b0's are found for each phase encode direction, they are combined for a full run of topup. The best b0 with positive phase encoding is also prepended to the dataset sent to eddy. This can cause misalignment between the first two volumes. To get eddy to successfully register these volumes, I had to decrease the FWHM in the first few iterations of eddy (according to Jesper, these first few iterations should be very quick, so this should not slow down the eddy run).
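
One way to express such a decreasing per-iteration smoothing schedule on the eddy command line (a sketch only; all file names and the specific --fwhm/--niter values are illustrative, not the pipeline's actual settings):

```python
import subprocess

# eddy accepts a comma-separated --fwhm list, one value per iteration; starting
# with heavy smoothing and reducing it helps register the prepended reference
# b0 even when it is far from the first acquired volume.
subprocess.run([
    "eddy",                                   # or eddy_openmp / eddy_cuda
    "--imain=data_with_prepended_b0.nii.gz",
    "--mask=nodif_brain_mask.nii.gz",
    "--acqp=acqparams.txt",
    "--index=index.txt",
    "--bvecs=bvecs",
    "--bvals=bvals",
    "--topup=topup_results",
    "--niter=8",
    "--fwhm=10,6,4,2,0,0,0,0",
    "--out=eddy_corrected",
], check=True)
```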

@MichielCottaar
Contributor Author

@glasserm @mharms @VosperFlopstop: do any of you have any feedback on this update to the pipeline? Given that we prepend a new b0 to the data given to eddy, it might be prudent to test this on a few more subjects to check whether eddy reliably converges. But it would be good to get some feedback before running larger scale tests...

@glasserm
Contributor

It sounds good to me.

@mharms
Contributor

mharms commented Apr 6, 2020

We are probably a couple months out on having processing resources available to begin the dMRI processing, so now is a good time for us to (re)engage on this with @MichielCottaar and @VosperFlopstop. Toward that goal, some Q's for @MichielCottaar (noting that I haven't reviewed the code in any detail yet):

  1. In your approach (1) (#158 (comment)), if both the "pos" and "neg" polarities have > 4 b0's, then is the "best" one of each polarity determined completely independently of the other polarity? Why is that the choice, rather than say choosing one polarity, and then the "best" from the other polarity that matches that one?
  2. It appears that the new approach replaces the previous code entirely. It seems it might be preferable if it was instead implemented as an option, so as to preserve backwards compatibility with the previous way of processing the data... As part of this option, we could check if dont_peas is also set, and if so, either generate a "warning" or simply abort outright.
  3. Given that a "bad" b0 may itself badly corrupt the average, what do you think of the idea of iterating the process until the "average" is itself stable? (This would require having some threshold, itself potentially a settable parameter, of what defines a "bad" match to the average. Do we have a handle on what that threshold would be, or are we only capable of operating at the other end of selecting the "best"?)

@MichielCottaar
Contributor Author

Replies to @mharms comments below:

> 1. In your approach (1) ([#158 (comment)](https://github.com/Washington-University/HCPpipelines/pull/158#issuecomment-586326388)), if both the "pos" and "neg" polarities have > 4 b0's, then is the "best" one of each polarity determined completely independently of the other polarity? Why is that the choice, rather than say choosing one polarity, and then the "best" from the other polarity that matches that one?

I'm not sure what your proposal is here, but there is no matching of b0's between the two polarities, irrespective of the number of b0's. The goal is to identify undistorted b0's, not b0's with opposite polarities that "match" well. If some polarity has enough b0's, I can safely assume the average b0 at that polarity will be undistorted and hence use that as a reference to identify the undistorted b0's. So there is no real advantage to including the b0's with opposite polarities in that case.

> 2. It appears that the new approach _replaces_ the previous code entirely. It seems it might be preferable if it was instead implemented as an _option_, so as to preserve backwards compatibility with the previous way of processing the data... As part of this option, we could check if `dont_peas` is also set, and if so, either generate a "warning" or simply abort outright.

This makes sense to me. I've added to my TODO list to put the old code back in.

> 3. Given that a "bad" b0 may itself badly corrupt the average, what do you think of the idea of iterating the process until the "average" is itself stable? (This would require having some threshold, itself potentially a settable parameter, of what defines a "bad" match to the average. Do we have a handle on what that threshold would be, or are we only capable of operating at the other end of selecting the "best"?)

This also sounds reasonable. I'll have a look at the distribution of the similarities to the average b0 to see if we can define a reasonable (~3 sigma) threshold to get rid of distorted b0's that are so different that they would significantly bias the average b0.
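
As a first sketch of such a cut-off (the 3-sigma value is just the one floated above and would need checking against real score distributions):

```python
import numpy as np

def keep_b0s(scores, n_sigma=3.0):
    """Flag b0's whose dissimilarity to the average b0 is an outlier.

    `scores` holds one residual-from-average score per b0 (lower = better, as
    in the scoring sketch earlier in this thread).  Returns a boolean array,
    True = keep.  In practice this could be iterated: recompute the average
    from the kept b0's, re-score, and repeat until the selection is stable.
    """
    scores = np.asarray(scores, dtype=float)
    return scores <= scores.mean() + n_sigma * scores.std()
```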

@mharms
Contributor

mharms commented Apr 7, 2020

@MichielCottaar: My point in (1) was the following: Once we've identified the best "positive" b0, why not pick the best "negative" b0 against that positive b0, rather than against the average of the negative b0's? The best positive b0 should be uncorrupted, and thus the negative b0 that best matches that should also be uncorrupted, thus achieving that goal. The rationale for this proposal was that it would tend to also pick the negative b0 that was close in space (i.e., little movement) to the positive (reference) b0. Admittedly, @VosperFlopstop indicated above that this consideration shouldn't be seen as critical (#158 (comment)), but all else being equal, isn't it still preferable to have a pos/neg pair for topup that are both uncorrupted and close in spatial position?

@MichielCottaar
Contributor Author

@mharms: Hmm, I don't see a straightforward way to do that. For example, if we have N b0's with Pos polarity and have already found the best b0 with Neg polarity, would we just put all of those together and run topup? The resulting average would be completely dominated by the N b0's with Pos polarity. We could also run topup N times with just one of the Pos b0's, but in either case this might take a very long time if N is large. And I'm not even sure the result would actually be better. If there is still some minor corruption in the best Neg b0 identified, we might bias our search of the Pos b0's to have the same corruption, which might bias our estimated susceptibility field more than if we had just identified the best Pos b0 independently.

If we want to encourage a similar head position in our final pos/neg pair, I think it would be better to just make this an explicit part of the selection procedure. So keep the rating procedure we have now to identify the uncorrupted images, but when selecting the final pair also take into account that we want a similar head orientation. I'm not sure how to weight these two goals, but it should be possible to come up with some trade-off (i.e., how much more corrupted can the image be to get a better alignment in head position by N degrees). Still I think this would be quite a bit of work with little extra gain...
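
For the sake of argument, such a trade-off could be as simple as a weighted sum (purely illustrative; the weight is the hypothetical "exchange rate" mentioned above and would have to be tuned):

```python
def pair_cost(pos_score, neg_score, relative_rotation_deg, rotation_weight=0.1):
    """Toy cost for one candidate Pos/Neg b0 pair (lower = better).

    pos_score / neg_score: residual-from-average corruption scores of the two
    candidate b0's.  relative_rotation_deg: estimated head rotation between
    them.  rotation_weight: how much extra corruption we would tolerate per
    degree of better head-position alignment.
    """
    return pos_score + neg_score + rotation_weight * relative_rotation_deg
```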

@mharms
Contributor

mharms commented Apr 8, 2020

Hmm. I didn't think through the practical issues/challenges of how one would actually implement my proposal of selecting the "best" (uncorrupted by motion) Pos and Neg b0, while also selecting a pair close in space. Input from @VosperFlopstop would be nice here again. i.e., I'd feel assured about the proposed approach if he signs off that we simply shouldn't worry (and/or that it simply isn't practical) about getting a b0 pair that is close in space.

@VosperFlopstop

Hi guys,

sorry for the delay. I think it is sufficient to pick the best b0 for each PE-direction individually. Unless the subject has moved massively, or for some reason has left the scanner, the errors will be very small.

If you were to pursue the strategy of finding the "best pair", I would suggest using the current strategy to pick a set of b0's for each PE direction that are all "good", then putting all of those b0's into a 4D file and running topup on it with the test_acqp.cnf configuration file. That configuration file runs just the early, low-resolution steps of the full topup and is intended mainly for checking that one's acqparams.txt file is correct before running the full, and more time-consuming, topup.

But importantly for your case, the movement parameters are already well estimated by these steps, and typically change very little if/when running to the full resolution. So that means you have a way to determine the relative positions of all the b0's in there, and can use that to pick a "best pair" of Pos and Neg.
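
A rough sketch of that procedure (file names, the Pos/Neg bookkeeping and the column layout of the movement-parameter file are assumptions for the example; the config file name is the one mentioned above):

```python
import subprocess
import numpy as np

# 1. Quick, low-resolution topup on a 4D file holding all "good" b0's of both
#    polarities.
subprocess.run([
    "topup",
    "--imain=good_b0s.nii.gz",
    "--datain=acqparams_good_b0s.txt",
    "--config=test_acqp.cnf",
    "--out=quick_topup",
], check=True)

# 2. Read the estimated movement parameters, one row per b0 (assumed layout:
#    three translations in mm followed by three rotations in radians).
movpar = np.loadtxt("quick_topup_movpar.txt")

# 3. Pick the Pos/Neg pair with the smallest relative motion; pos_idx/neg_idx
#    record which rows belong to which polarity.
pos_idx, neg_idx = [0, 1, 2], [3, 4, 5]
best_pos, best_neg = min(
    ((p, n) for p in pos_idx for n in neg_idx),
    key=lambda pn: np.linalg.norm(movpar[pn[0]] - movpar[pn[1]]),
)
```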

I hope that addresses your questions?

@mharms
Contributor

mharms commented Jun 19, 2020

@MichielCottaar Could you please go through and "Resolve conversation" on all the items where you either accepted my suggestions or they are simply no longer relevant? Thx.

@MichielCottaar
Contributor Author

I believe these changes address all of your comments. Let me know if anything else comes up.

Contributor

@mharms left a comment

I did another relatively quick skim, and made some minor comments/suggestions. I imagine that @glasserm will want to review this next. Not sure about @coalsont

@MichielCottaar
Contributor Author

MichielCottaar commented Jun 24, 2020

This version of the code successfully ran on jalapeno (or, more technically, the version from a few days ago did, but we have only tweaked the comments & error messages since, so it should still be fine). The error message for combining the --select-best-b0 and --extra-eddy-arg=--dont_peas flags also works.

@glasserm
Contributor

Looks good.

@MichielCottaar
Contributor Author

Are we waiting for anyone else to review this?

@glasserm merged commit 6980bc6 into Washington-University:master on Jun 29, 2020
@glasserm
Contributor

I merged it
