Determine rationale for for loop in Step 2 of selcomps (v2.5) #176

tsalo · 2019-01-06T14:29:11Z

@emdupre and I noticed an odd for loop in the component selection function in emdupre#16, wherein a copy of the set of accepted components (at that stage) is reduced a total of three times and is then used later in the function. We included a note to look into this later, so I thought I would open an issue for it.

First and foremost, we want to make sure that this isn't a hardcoded reference to the number of echoes. If it isn't, then it would be nice to know what it is supposed to be.

tedana/tedana/selection/select_comps.py

Lines 198 to 204 in 1bc32e4

    
    # NOTE: We're not sure why this is done, nor why it's specifically done
    # three times. Need to look into this deeper, esp. to make sure the 3
    # isn't a hard-coded reference to the number of echoes.
    for nn in range(3):
        ncls = comptable.loc[ncls].loc[
            comptable.loc[
                ncls, 'variance explained'].diff() < varex_upper_p].index.values

handwerkerd · 2019-03-25T19:41:33Z

I'll have to look into this more systematically, but in the earlier version (https://bitbucket.org/prantikk/me-ica/src/8cc47cfed0203b3d6d187935ad3c2823b3e36a88/meica.libs/select_model.py?at=master&fileviewer=file-view-default) it's labeled as "Not(e) outlier variance".
That said, I still can't figure out exactly what it's doing without running it and checking what's in these values. Here's my current guess. It takes the list of components that have not yet been rejected and computes the difference in variance between neighboring components (I'm assuming they are ordered from highest to lowest variance). On each iteration of the loop, it removes the highest-variance component remaining, and it also removes any component where the difference in variance between that component and the next is greater than the median variance of the easily accepted kappa components.
That would fit the label of outlier detection, and it doesn't seem like the three repetitions have anything to do with the number of echoes, but consistently removing the three highest-variance remaining components seems arbitrary. Assuming that a relatively large jump in variance in a sorted list of components is bad (without considering kappa or rho values) also seems non-ideal.
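To make the guess above easier to check, here is a minimal sketch of the loop on a toy table. The values, the threshold, and the helper name `prune_three_times` are made up for illustration; only the `'variance explained'` column and the `diff() < varex_upper_p` comparison come from the snippet in the issue. It assumes the components are already ordered from highest to lowest variance.

```python
import pandas as pd


def prune_three_times(comptable, ncls, varex_upper_p, n_iter=3):
    """Hypothetical helper that mirrors the loop quoted in the issue."""
    for _ in range(n_iter):
        # diff() is NaN for the first remaining component, and NaN < x is
        # False, so the first remaining component is dropped every iteration.
        diffs = comptable.loc[ncls, "variance explained"].diff()
        ncls = diffs[diffs < varex_upper_p].index.tolist()
    return ncls


# Toy data (assumed): seven components sorted by descending variance.
comptable = pd.DataFrame(
    {"variance explained": [20.0, 12.0, 11.0, 5.0, 4.5, 4.0, 1.0]}
)
varex_upper_p = 2.0  # assumed threshold, not a value from tedana

kept = prune_three_times(comptable, comptable.index.tolist(), varex_upper_p)
print(kept)  # [3, 4, 5, 6]
```

In this toy case the differences between neighbors are all negative (each component has less variance than the one before it), so with a positive threshold the comparison never rejects anything on its own, and the only effect is that the three highest-variance components are dropped. Whether that holds for real data depends on how the table is actually ordered.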

tsalo · 2019-03-26T11:21:44Z

Ah, that makes sense. Thank you!

If that's the case, then we have a problem, because the component table is sorted within fitmodels_direct by Kappa, not by variance explained. The Kappa-sorting and the for loop both trace back to 2014, so I can't tell whether the sorting was added after the for loop (which would make sense if these two steps conflict).
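A rough sketch of why the ordering matters, using a toy table whose rows stand in for a Kappa-sorted component table. The data, the threshold, and the helper `prune_outliers` (with its `sort_by_varex` flag) are hypothetical, and sorting in descending order is an assumption rather than a description of any actual fix.

```python
import pandas as pd


def prune_outliers(comptable, ncls, varex_upper_p, sort_by_varex, n_iter=3):
    """Hypothetical helper: same loop, optionally sorting by variance first."""
    if sort_by_varex:
        # Assumption: order candidates from highest to lowest variance.
        ncls = (
            comptable.loc[ncls]
            .sort_values("variance explained", ascending=False)
            .index.tolist()
        )
    for _ in range(n_iter):
        # Difference in variance between neighbors in the current order.
        diffs = comptable.loc[ncls, "variance explained"].diff()
        ncls = diffs[diffs < varex_upper_p].index.tolist()
    return ncls


# Toy data (assumed): row order represents Kappa sorting, so the variance
# column is in no particular order.
comptable = pd.DataFrame(
    {"variance explained": [4.0, 20.0, 1.0, 12.0, 5.0, 11.0, 4.5]}
)
all_comps = comptable.index.tolist()

unsorted_result = prune_outliers(comptable, all_comps, 2.0, sort_by_varex=False)
sorted_result = prune_outliers(comptable, all_comps, 2.0, sort_by_varex=True)
print(unsorted_result)  # []
print(sorted_result)    # [4, 6, 0, 2]
```

With the rows in this arbitrary order, the neighbor differences are meaningless and the loop discards every component, while sorting by variance first reduces it to dropping the three largest. The exact numbers are specific to this toy example.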

tsalo added the "question" label (issues detailing questions about the project or its direction) Jan 6, 2019

emdupre added this to the transparent and reproducible processing milestone Jan 14, 2019

handwerkerd mentioned this issue Mar 29, 2019

Ambiguous error during "spatial clustering of components" step #181

Closed

tsalo mentioned this issue Apr 20, 2019

[FIX] Sort comptable by varex before identifying outlier components #261

Closed

tsalo mentioned this issue May 23, 2019

[FIX] Sort comptable by varex before identifying outlier components #295

Merged

tsalo closed this as completed in #295 May 23, 2019
