Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Determine rationale for for loop in Step 2 of selcomps (v2.5) #176

Closed
tsalo opened this issue Jan 6, 2019 · 2 comments · Fixed by #295
Closed

Determine rationale for for loop in Step 2 of selcomps (v2.5) #176

tsalo opened this issue Jan 6, 2019 · 2 comments · Fixed by #295
Labels
question issues detailing questions about the project or its direction

Comments

@tsalo
Copy link
Member

tsalo commented Jan 6, 2019

@emdupre and I noticed an odd for loop in the component selection function in emdupre#16, wherein a copy of the set of accepted components (at that stage) is reduced a total of three times and is then used later in the function. We included a note to look into this later, so I thought I would open an issue for it.

First and foremost, we want to make sure that this isn't a hardcoded reference to the number of echoes. If it isn't, then it would be nice to know what it is supposed to be.

# NOTE: We're not sure why this is done, nor why it's specifically done
# three times. Need to look into this deeper, esp. to make sure the 3
# isn't a hard-coded reference to the number of echoes.
for nn in range(3):
ncls = comptable.loc[ncls].loc[
comptable.loc[
ncls, 'variance explained'].diff() < varex_upper_p].index.values

@tsalo tsalo added the question issues detailing questions about the project or its direction label Jan 6, 2019
@handwerkerd
Copy link
Member

I'll have to look into this more systematically, but, in the earlier version https://bitbucket.org/prantikk/me-ica/src/8cc47cfed0203b3d6d187935ad3c2823b3e36a88/meica.libs/select_model.py?at=master&fileviewer=file-view-default
it's labeled as "Not(e) outlier variance"
That said, I still can't figure out exactly what it's doing without running & checking exactly what's in these values. Here's my current guess. It's taking a list of components that have not yet been rejected. It is finding the difference in variance between neighboring components (I'm assuming they are in order from highest to lowest variance). For each iteration of the loop, it removes the highest variance component remaining and it also removes any components where the difference in variance between a component & the next component is greater than the median variance of the easily accepted kappa components.
That would fit the label of outlier detection and it doesn't seem like the three repetitions have anything to do with the number of echoes, but the consistent removal of the three highest variance remaining components seems arbitrary. Assuming that a relatively large jump in variance from a sorted list of components is bad (without consideration of kappa or rho values) also seems non-ideal.

@tsalo
Copy link
Member Author

tsalo commented Mar 26, 2019

Ah, that makes sense. Thank you!

If that's the case, then we have a problem due to the fact that the component table is sorted within fitmodels_direct based on Kappa, not variance explained. The Kappa-sorting and the for loop both trace back to 2014, so I couldn't figure out if the sorting was added after the for loop (which would make sense if these two steps are conflicting).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question issues detailing questions about the project or its direction
Projects
None yet
3 participants