[REF, DOC] Reorganize selcomps and fitmodels_direct #247
1. Replace “acc” with “unclf” (unclassified)
2. Eliminate unnecessary intersections between arrays (e.g., candartA, candartB, ign_add0)
3. Clean up unnecessarily complicated indexing with .loc
4. Fix one use of d_table_score that should be d_table_score_scrub
5. Use mean of ranks instead of sum for decision table scores, which lets us drop the n_decision_metrics variable
6. Add kappa ratio score to component table
7. Sort component table by variance explained when identifying varex outliers
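To illustrate item 5, here is a minimal pandas sketch. The three metrics and their rank directions are hypothetical stand-ins chosen for illustration, not the actual Kundu decision table. Because the mean is just the sum divided by a constant, it orders components exactly as the sum does, but it stays between 1 and the number of components no matter how many metrics are included, so a separate n_decision_metrics count is not needed.

```python
import pandas as pd

# Hypothetical component table; the component number is the index.
comptable = pd.DataFrame(
    {
        "kappa": [80.0, 20.0, 45.0],
        "rho": [10.0, 60.0, 15.0],
        "variance explained": [5.0, 30.0, 12.0],
    },
    index=pd.Index([0, 1, 2], name="component"),
)

# Rank each metric so that rank 1 marks the "best" value.
# Directions are an assumption for illustration: high kappa, low rho, high varex.
ranks = pd.DataFrame(
    {
        "kappa": comptable["kappa"].rank(ascending=False),
        "rho": comptable["rho"].rank(ascending=True),
        "variance explained": comptable["variance explained"].rank(ascending=False),
    }
)

# Old style: the scale of a summed score depends on how many metrics there are.
score_sum = ranks.sum(axis=1)
# New style: the mean of ranks keeps the same ordering on a fixed 1..n scale.
d_table_score = ranks.mean(axis=1)
```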
- Add new `selection.manual_selection` function.
- Allow manacc to be a list instead of just a string with commas.
- Switch order of outputs from `fitmodels_direct` so `comptable` is first.
- Improve handling and logging of interactions between mixm, ctab, and manacc.
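A minimal sketch of the manacc handling, assuming a comptable indexed by component number with `classification` and `rationale` columns. Both function names are hypothetical (this is not the real `selection.manual_selection` signature). The accepted components may be given as a comma-separated string or as a list of integers, everything not listed is rejected, and the automated decision is kept in separate `original_*` columns before it is overwritten.

```python
import pandas as pd


def normalize_manacc(manacc):
    """Return manually accepted components as a list of ints.

    Accepts None, a comma-separated string such as "1,4,7", or any
    iterable of integers.
    """
    if manacc is None:
        return []
    if isinstance(manacc, str):
        # Skip empty pieces so inputs like "1,4," or "" do not raise.
        return [int(c) for c in manacc.split(",") if c.strip()]
    return [int(c) for c in manacc]


def manual_selection_sketch(comptable, acc):
    """Accept the listed components and reject the rest (hypothetical helper)."""
    comptable = comptable.copy()
    acc = normalize_manacc(acc)

    # Preserve the automated decision before overwriting it.
    comptable["original_classification"] = comptable["classification"]
    comptable["original_rationale"] = comptable["rationale"]

    is_acc = comptable.index.isin(acc)
    comptable.loc[is_acc, "classification"] = "accepted"
    comptable.loc[~is_acc, "classification"] = "rejected"
    comptable["rationale"] = "manual classification"
    return comptable


comptable = pd.DataFrame(
    {
        "classification": ["accepted", "rejected", "ignored"],
        "rationale": ["", "hypothetical reason", "hypothetical reason"],
    },
    index=pd.Index([0, 1, 2], name="component"),
)
# A comma-separated string and a list give the same result.
by_str = manual_selection_sketch(comptable, "0, 2")
by_list = manual_selection_sketch(comptable, [0, 2])
```

Accepting a list makes the function convenient to call from Python, while the string form still works for command-line input.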
- Rename selcomps to kundu_selection_v2_5
- Simplify arguments for kundu_selection_v2_5
- Simplify arguments for fitmodels_direct
- Add function with metrics from selcomps to model.fit (kundu_metrics)
Codecov Report
@@            Coverage Diff             @@
##           master     #247      +/-   ##
==========================================
+ Coverage   46.11%   49.19%   +3.08%
==========================================
  Files          33       37       +4
  Lines        2045     2120      +75
==========================================
+ Hits          943     1043     +100
+ Misses       1102     1077      -25
Also fix up docstrings and eliminate step where component number is set as a column. Always keep it as the index of the DataFrame.
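If the component number is always the DataFrame index, sorting the table (for example, by variance explained when looking for varex outliers) never loses track of which row is which component. A minimal pandas sketch follows. The outlier rule (more than three times the median varex) is a hypothetical placeholder for illustration, not the Kundu criterion.

```python
import pandas as pd

comptable = pd.DataFrame(
    {
        "kappa": [25.0, 90.0, 40.0, 12.0],
        "variance explained": [2.0, 40.0, 7.0, 1.0],
    },
    index=pd.Index(range(4), name="component"),
)

# Sorting reorders the rows, but the index still carries each component number.
by_varex = comptable.sort_values("variance explained", ascending=False)

# Hypothetical outlier rule, for illustration only.
threshold = 3 * comptable["variance explained"].median()
outliers = by_varex.index[by_varex["variance explained"] > threshold]

# Label-based lookups keep working on the sorted table.
print(by_varex.loc[1, "kappa"])  # 90.0
```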
* Speed up spatclust function. This also clusters positive and negative values separately.
* Rename spatclust to threshold_map and add binarize/sided arguments.
* Replace manually generated binary structure with function-made one.
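A minimal sketch of cluster-extent thresholding in the spirit of threshold_map, assuming a 3D NumPy array and SciPy's `ndimage.label` with a connectivity structure from `ndimage.generate_binary_structure`. The function and argument names are hypothetical stand-ins, not tedana's signature. Positive and negative voxels are labeled in separate passes, so a positive blob that touches a negative blob is not counted as one cluster.

```python
import numpy as np
from scipy import ndimage


def threshold_map_sketch(img, min_cluster_size, threshold=0.0,
                         sided="two", binarize=True):
    """Keep clusters of at least min_cluster_size voxels (hypothetical helper).

    sided : "two", "pos", or "neg"; which sign(s) of the map to cluster.
    """
    if sided not in ("two", "pos", "neg"):
        raise ValueError("sided must be 'two', 'pos', or 'neg'")

    # Face-connected neighbors in 3D, built by SciPy instead of by hand.
    structure = ndimage.generate_binary_structure(3, 1)

    masks = []
    if sided in ("two", "pos"):
        masks.append(img > threshold)
    if sided in ("two", "neg"):
        masks.append(img < -threshold)

    keep_mask = np.zeros(img.shape, dtype=bool)
    for mask in masks:
        # Each sign is labeled on its own, so +/- neighbors never merge.
        labeled, _ = ndimage.label(mask, structure=structure)
        sizes = np.bincount(labeled.ravel())
        big_enough = sizes >= min_cluster_size
        big_enough[0] = False  # label 0 is background
        keep_mask |= big_enough[labeled]

    if binarize:
        return keep_mask
    return np.where(keep_mask, img, 0)


img = np.zeros((5, 5, 5))
img[0, 0, 0:3] = 1.0   # 3-voxel positive cluster
img[0, 0, 3:5] = -1.0  # 2-voxel negative cluster touching it
img[4, 4, 4] = 1.0     # isolated positive voxel
```

With `min_cluster_size=3`, only the positive cluster survives. If both signs were labeled together, the touching positive and negative voxels would form a single 5-voxel cluster and both would be kept.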
# Conflicts:
#   tedana/model/__init__.py
#   tedana/model/fit.py
#   tedana/selection/select_comps.py
#   tedana/workflows/tedana.py
# Conflicts:
#   tedana/decomposition/eigendecomp.py
#   tedana/io.py
#   tedana/model/fit.py
#   tedana/viz.py
#   tedana/workflows/tedana.py
I think this PR is ready for review.
- ste -> source_tes
- ‘ste’ -> ‘paid’
- Drop PAID support from tedana workflow
Could you leave a comment saying what bug was squished, and when it was introduced, unless you introduced it downstream in this PR?
Per the call today, I'm willing to help split these into separate branches with @tsalo listed as the author if we think this group of changes should be split into multiple PRs. From a quick examination, it seems like it would make sense to open the following as separate PRs:
I realize that's a lot, but this is a really big PR and I think changes of this scale will be more easily digested in smaller chunks, plus it will help us understand which parts are related to each other. Any thoughts, @emdupre and @tsalo?
I've split off most of the self-contained changes from this PR into new ones. The only changes that I haven't done over in new branches are the large-scale reorganizations of selcomps, fitmodels_direct, and eigendecomp (i.e., the changes proposed in the first comment that haven't been struck through). I will open a PR with that larger reorganization once all of these smaller PRs have been dealt with, to avoid annoying merge conflicts.
tedana/decomposition/eigendecomp.py
  def tedpca(catd, OCcatd, combmode, mask, t2s, t2sG,
             ref_img, tes, method='mle', ste=-1, kdaw=10., rdaw=1.,
-            verbose=False):
+            out_dir='.', verbose=False):
Can we update the docstring to reference this new argument?
This is fixed in #265.
tedana/decomposition/eigendecomp.py
  Returns
  -------
  n_components : :obj:`int`
      Number of components retained from PCA decomposition
- dd : (S x T) :obj:`numpy.ndarray`
+ kept_data : (S x T) :obj:`numpy.ndarray`
This variable name doesn't seem to match the description immediately below it. Can we sync these?
I caught this a little after the fact. It's fixed in #265.
tedana/decomposition/eigendecomp.py
reindex=False, mmixN=vTmixN, method=None,
label='mepca_', out_dir=out_dir, verbose=verbose)
# varex_norm overrides normalized varex computed by dependence_metrics
comptable['real normalized variance explained'] = varex_norm
'real normalized variance explained' makes it sound like there's a 'fake' one 😆 Maybe we should consider updating.
Yeah, I reconsidered that wording a bit later. In #266 I change it to "original normalized variance explained." I can change that to something better, though, if you want. I just want to retain both versions for now, until we can validate fitmodels_direct.
I would argue, though, that the one estimated in fitmodels_direct is kind of fake, since it doesn't match the one that comes directly from the PCA.
Whoops, that should really go in #265. Pushing fixed name now.
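For context on the naming discussion: the "original" value is the normalized variance explained that comes straight out of the PCA. A minimal NumPy sketch follows, assuming the PCA is computed by SVD of the mean-centered (S x T) data, so that each component's share is its squared singular value divided by the sum of all squared singular values. The helper name is hypothetical and this is not tedana's tedpca code. A value re-estimated later from regression fits on different input data need not reproduce these numbers.

```python
import numpy as np


def pca_normalized_varex(data):
    """Normalized variance explained per PCA component (sketch).

    data : (S x T) array of voxels by volumes.
    """
    # Remove each voxel's mean over time before decomposing (assumption).
    centered = data - data.mean(axis=1, keepdims=True)
    # NumPy returns singular values in descending order.
    s = np.linalg.svd(centered, compute_uv=False)
    varex = s ** 2
    return varex / varex.sum()


rng = np.random.default_rng(0)
data = rng.standard_normal((100, 20))  # hypothetical S=100 voxels, T=20 volumes
varex_norm = pca_normalized_varex(data)
```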
    Whether to sort components in descending order by Kappa. Default: False
mmixN : (T x C) array_like, optional
    Z-scored mixing matrix. Default: None
method : {'kundu_v2', 'kundu_v3', None}, optional
We haven't added v3 back in, right? Should we include it here?
We calculate some metric maps that are only used in v3.2, like WTS, tsoc_B, PSC, and F_S0_maps. Those maps are currently sort of vestigial, but I didn't want to remove them completely in case we do add v3 back in.
't2s_full array {1}'.format(t2s.shape,
t2s_full.shape))
elif not (catd.shape[2] == tsoc.shape[1] == mmix.shape[0]):
raise ValueError('Third dimension (number of volumes) of catd ({0}), '
Can we clarify this to explain that all of the formatted values should be the number of volumes, since that's what sets the dimensionality of the other two?
So something like "Number of volumes of catd..., etc. do not match"?
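One possible wording, where every number in the message is a volume count. This assumes catd is (S x E x T), tsoc is (S x T), and mmix is (T x C), as the condition in the check implies. The helper name is hypothetical.

```python
import numpy as np


def check_n_volumes(catd, tsoc, mmix):
    """Return the shared number of volumes or raise a readable error."""
    n_vols = (catd.shape[2], tsoc.shape[1], mmix.shape[0])
    if len(set(n_vols)) != 1:
        raise ValueError(
            "Number of volumes in catd ({0}), tsoc ({1}), and mmix ({2}) "
            "do not match.".format(*n_vols)
        )
    return n_vols[0]


catd = np.zeros((50, 3, 10))  # S x E x T
tsoc = np.zeros((50, 10))     # S x T
mmix = np.zeros((10, 4))      # T x C
print(check_n_volumes(catd, tsoc, mmix))  # 10
```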
return comptable, seldict, betas, mmix_new


def kundu_metrics(comptable, metric_maps):
Same comment about not yet having v3 incorporated.
If there's an eventual plan to put it in there, it might not hurt to stub it and place a %TODO under it, at least in part so there's a prototype for how you would slot in a new method after this refactor.
Closes #214 and references #16, #84, #166. This is a substantial refactor of selcomps and fitmodels_direct. I'm happy to split these changes up into multiple PRs to prevent bloat, but for the most part I feel that these changes fit together pretty well.
Changes proposed in this pull request:
- Rename selcomps to kundu_selection_v2. ([REF] Reorganize selcomps and fitmodels_direct #266)
- Simplify arguments for kundu_selection_v2. ([REF] Reorganize selcomps and fitmodels_direct #266)
- Split manual selection portion of selcomps into new selection.manual_selection function. ([REF] Reorganize selcomps and fitmodels_direct #266)
- Add support for manual rejection as well, although this is not yet supported in the actual workflow or CLI. ([REF] Reorganize selcomps and fitmodels_direct #266)
- Rename ICA component selection from select_comps.py to tedica.py. ([REF] Reorganize selcomps and fitmodels_direct #266)
- Replace "acc" with "unclf" (unclassified). ([REF, DOC] Document and refactor selcomps #262)
- Eliminate unnecessary intersections between arrays (e.g., candartA, candartB, ign_add0). ([REF, DOC] Document and refactor selcomps #262)
- Clean up unnecessarily complicated indexing with .loc. ([REF, DOC] Document and refactor selcomps #262)
- Fix one use of d_table_score that should be d_table_score_scrub.
- Use mean of ranks instead of sum for decision table scores, which lets us drop the n_decision_metrics variable. ([REF, DOC] Document and refactor selcomps #262)
- Add kappa ratio score to component table. ([REF, DOC] Document and refactor selcomps #262)
- Sort component table by variance explained when identifying varex outliers.
- Eliminate unnecessary ordinal ranking within decision metric table. ([REF, DOC] Document and refactor selcomps #262)
- Allow manacc to be a list instead of just a string with commas. This makes it easier to call it in Python. ([ENH] Improve manual component selection #263)
- Retain original classification and rationale in separate columns if manacc is used. ([ENH] Improve manual component selection #263)
- Add out_dir argument to tedpca. ([REF] Split eigendecomp into ICA and PCA files #265)
- Move kundu_tedpca from decomposition/eigendecomp.py to selection/tedpca.py. ([REF] Split eigendecomp into ICA and PCA files #265)
- Split decomposition/eigendecomp.py into decomposition/pca.py and decomposition/ica.py. ([REF] Split eigendecomp into ICA and PCA files #265)
- Rename fitmodels_direct to dependence_metrics. ([REF] Reorganize selcomps and fitmodels_direct #266)
- Replace full_sel with method (can be 'kundu_v2', 'kundu_v3', or None). Now the contents of seldict will vary based on the method requested. Plus, full_sel is described as involving selection, even though no selection was done in fitmodels_direct. ([REF] Reorganize selcomps and fitmodels_direct #266)
- Switch order of outputs so comptable is first (since seldict is empty sometimes). ([REF] Reorganize selcomps and fitmodels_direct #266)
- Split Kundu metrics (currently in selcomps) into a separate function within model.fit (kundu_metrics). ([REF] Reorganize selcomps and fitmodels_direct #266)
- Simplify arguments for fitmodels_direct. If we feed in optimally combined data as an argument, there's no need to compute it within the function and we can drop t2s_full and combmode. ([REF] Reorganize selcomps and fitmodels_direct #266)
- Improve handling and logging of interactions between mixm, ctab, and manacc. ([ENH] Improve manual component selection #263)
- Allow users to use files from an existing directory. copyfile gets mad if you try to copy a file to itself, which made it difficult in the past to re-run tedana with manually selected components. ([ENH] Improve manual component selection #263)
- Argument ste -> source_tes ([REF, DOC] Document PAID combination method #264)
- Value ‘ste’ -> ‘paid’ ([REF, DOC] Document PAID combination method #264)
- Drop PAID support from tedana workflow ([REF, DOC] Document PAID combination method #264)
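On the copyfile point, here is a minimal sketch. It assumes the copy goes through Python's `shutil.copyfile`, which raises `shutil.SameFileError` when the source and destination are the same file. The helper name and file name are hypothetical.

```python
import os
import shutil
import tempfile


def copy_unless_same(src, dst):
    """Copy src to dst, treating an in-place "copy" as a no-op."""
    try:
        return shutil.copyfile(src, dst)
    except shutil.SameFileError:
        # The file is already where it needs to be (e.g., re-running in the
        # same output directory), so there is nothing to copy.
        return dst


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "mixing_matrix.txt")  # hypothetical file name
    with open(src, "w") as f:
        f.write("0.1 0.2\n")
    copy_unless_same(src, src)  # no error, even though src == dst
    dst = copy_unless_same(src, os.path.join(tmp, "copy.txt"))
    with open(dst) as f:
        copied = f.read()
```

Catching the specific exception, rather than comparing paths first, also covers cases where two different path strings point to the same file.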