[REF, DOC] Reorganize selcomps and fitmodels_direct #247

tsalo · 2019-04-03T11:56:34Z

Closes #214 and references #16, #84, #166. This is a substantial refactor of selcomps and fitmodels_direct. I'm happy to split these changes up into multiple PRs to prevent bloat, but for the most part I feel that these changes fit together pretty well.

Changes proposed in this pull request:

Within selcomps:
- ~~Rename selcomps to kundu_selection_v2~~ ([REF] Reorganize selcomps and fitmodels_direct #266)
- ~~Simplify arguments for kundu_selection_v2~~ ([REF] Reorganize selcomps and fitmodels_direct #266)
- ~~Split manual selection portion of selcomps into new selection.manual_selection function.~~ ([REF] Reorganize selcomps and fitmodels_direct #266)
- ~~Add support for manual rejection as well, although this is not yet supported in the actual workflow or CLI.~~ ([REF] Reorganize selcomps and fitmodels_direct #266)
- ~~Rename ICA component selection from select_comps.py to tedica.py~~ ([REF] Reorganize selcomps and fitmodels_direct #266)
- ~~Replace "acc" with "unclf" (unclassified)~~ ([REF, DOC] Document and refactor selcomps #262)
- ~~Eliminate unnecessary intersections between arrays (e.g., candartA, candartB, ign_add0)~~ ([REF, DOC] Document and refactor selcomps #262)
- ~~Clean up unnecessarily complicated indexing with .loc~~ ([REF, DOC] Document and refactor selcomps #262)
- ~~Fix one use of d_table_score that should be d_table_score_scrub~~
- ~~Use mean of ranks instead of sum for decision table scores, which lets us drop the n_decision_metrics variable~~ ([REF, DOC] Document and refactor selcomps #262)
- ~~Add kappa ratio score to component table~~ ([REF, DOC] Document and refactor selcomps #262)
- ~~Sort component table by variance explained when identifying varex outliers~~
- ~~Eliminate unnecessary ordinal ranking within decision metric table.~~ ([REF, DOC] Document and refactor selcomps #262)
- ~~Allow manacc to be a list instead of just a string with commas. This makes it easier to call it in Python.~~ ([ENH] Improve manual component selection #263)
- ~~Retain original classification and rationale in separate columns if manacc is used.~~ ([ENH] Improve manual component selection #263)
Within eigendecomp:
- ~~Add out_dir argument to tedpca.~~ ([REF] Split eigendecomp into ICA and PCA files #265)
- ~~Move kundu_tedpca from decomposition/eigendecomp.py to selection/tedpca.py~~ ([REF] Split eigendecomp into ICA and PCA files #265)
- ~~Split decomposition/eigendecomp.py into decomposition/pca.py and decomposition/ica.py~~ ([REF] Split eigendecomp into ICA and PCA files #265)
Within fitmodels_direct:
- ~~Rename to dependence_metrics.~~ ([REF] Reorganize selcomps and fitmodels_direct #266)
- Replace full_sel with method (can be 'kundu_v2', 'kundu_v3', or None). Now the contents of seldict will vary based on the method requested. Plus, full_sel is described as involving selection, even though no selection was done in fitmodels_direct. ([REF] Reorganize selcomps and fitmodels_direct #266)
- ~~Switch order of outputs so comptable is first (since seldict is empty sometimes).~~ ([REF] Reorganize selcomps and fitmodels_direct #266)
- ~~Split Kundu metrics (currently in selcomps) into separate function within model.fit (kundu_metrics).~~ ([REF] Reorganize selcomps and fitmodels_direct #266)
- ~~Simplify arguments for fitmodels_direct. If we feed in optimally combined data as an argument, there's no need to compute it within the function and we can drop t2s_full and combmode.~~ ([REF] Reorganize selcomps and fitmodels_direct #266)
Within tedana workflow:
- ~~Improve handling and logging of interactions between mixm, ctab, and manacc.~~ ([ENH] Improve manual component selection #263)
- ~~Allow users to use files from existing directory. copyfile gets mad if you try to copy a file to itself, which made it difficult in the past to re-run tedana with manually selected components.~~ ([ENH] Improve manual component selection #263)
- ~~Argument ste -> source_tes~~ ([REF, DOC] Document PAID combination method #264)
- ~~Value 'ste' -> 'paid'~~ ([REF, DOC] Document PAID combination method #264)
- ~~Drop PAID support from tedana workflow~~ ([REF, DOC] Document PAID combination method #264)
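
The `copyfile` problem mentioned under the workflow changes can be reproduced with the standard library alone: `shutil.copyfile` raises `shutil.SameFileError` when source and destination are the same file. The sketch below shows one way to guard against it. The helper name `copy_if_needed` and the file names are hypothetical, and this is not necessarily how the workflow handles the case.

```python
import os
import shutil
import tempfile


def copy_if_needed(src, dst):
    """Copy src to dst unless both paths already point at the same file.

    Returns True if a copy was made and False if it was skipped.
    """
    # shutil.copyfile raises shutil.SameFileError when src and dst are the
    # same file, so check for that case before copying.
    if os.path.exists(dst) and os.path.samefile(src, dst):
        return False
    shutil.copyfile(src, dst)
    return True


# Usage: re-running in the directory that already holds the input file.
with tempfile.TemporaryDirectory() as out_dir:
    mixm = os.path.join(out_dir, "meica_mix.1D")  # hypothetical file name
    with open(mixm, "w") as fobj:
        fobj.write("0.1 0.2\n")
    skipped = not copy_if_needed(mixm, os.path.join(out_dir, "meica_mix.1D"))
    copied = copy_if_needed(mixm, os.path.join(out_dir, "copy_of_mix.1D"))
```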

1. Replace "acc" with "unclf" (unclassified)
2. Eliminate unnecessary intersections between arrays (e.g., candartA, candartB, ign_add0)
3. Clean up unnecessarily complicated indexing with .loc
4. Fix one use of d_table_score that should be d_table_score_scrub
5. Use mean of ranks instead of sum for decision table scores, which lets us drop the n_decision_metrics variable
6. Add kappa ratio score to component table
7. Sort component table by variance explained when identifying varex outliers
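
To illustrate why a mean of ranks removes the need for `n_decision_metrics`: a summed score grows with the number of metrics, so any cutoff has to be rescaled by that count, whereas a mean always stays between 1 and the number of components. The sketch below is pure Python with made-up metric names and values. The ranking direction and tie handling are assumptions, not the conventions used in selcomps.

```python
from statistics import mean


def rank_descending(values):
    """Return 1-based ranks, with rank 1 assigned to the largest value.

    Ties are broken by position (earlier entries get the better rank).
    """
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


def decision_scores(metrics):
    """Average each component's rank across all metrics.

    metrics maps a metric name to a list with one value per component.
    """
    per_metric_ranks = [rank_descending(vals) for vals in metrics.values()]
    return [mean(component_ranks) for component_ranks in zip(*per_metric_ranks)]


# Hypothetical values for three components and two metrics.
example_metrics = {
    "kappa": [80.0, 20.0, 50.0],
    "dice": [0.9, 0.1, 0.5],
}
scores = decision_scores(example_metrics)
```

Adding a third metric to `example_metrics` would leave the scores on the same 1-to-3 scale, which is the property that makes the metric count unnecessary.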

- Add new `selection.manual_selection` function.
- Allow manacc to be a list instead of just a string with commas.
- Switch order of outputs from `fitmodels_direct` so `comptable` is first.
- Improve handling and logging of interactions between mixm, ctab, and manacc.
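
Accepting either form of `manacc` might look roughly like the sketch below. The helper name `parse_manacc` is hypothetical and the real handling in the workflow may differ. The point is only that a comma-separated string (convenient on the command line) and a list (convenient from Python) can be normalized to the same thing.

```python
def parse_manacc(manacc):
    """Normalize manually accepted component indices to a list of ints.

    Accepts None, a comma-separated string such as "1,2,5", or any
    iterable of values that can be converted to int.
    """
    if manacc is None:
        return None
    if isinstance(manacc, str):
        # Skip empty pieces so that a trailing comma is tolerated.
        return [int(comp) for comp in manacc.split(",") if comp.strip()]
    return [int(comp) for comp in manacc]
```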

codecov · 2019-04-03T12:19:26Z

Codecov Report

Merging #247 into master will increase coverage by 3.08%.
The diff coverage is 41.44%.

@@            Coverage Diff             @@
##           master     #247      +/-   ##
==========================================
+ Coverage   46.11%   49.19%   +3.08%     
==========================================
  Files          33       37       +4     
  Lines        2045     2120      +75     
==========================================
+ Hits          943     1043     +100     
+ Misses       1102     1077      -25

| Impacted Files | Coverage Δ | |
|---|---|---|
| `tedana/model/__init__.py` | `100% <ø> (ø)` | ⬆️ |
| `tedana/workflows/tedana.py` | `12.77% <0%> (-1.88%)` | ⬇️ |
| `tedana/tests/test_selection.py` | `100% <100%> (ø)` | |
| `tedana/tests/test_model_fit_kundu_metrics.py` | `100% <100%> (ø)` | |
| `tedana/selection/__init__.py` | `100% <100%> (ø)` | ⬆️ |
| `tedana/selection/_utils.py` | `25% <100%> (+10.71%)` | ⬆️ |
| `tedana/decomposition/__init__.py` | `100% <100%> (ø)` | ⬆️ |
| `tedana/workflows/t2smap.py` | `67.1% <100%> (ø)` | ⬆️ |
| `tedana/tests/test_model_fit_dependence_metrics.py` | `100% <100%> (ø)` | |
| `tedana/tests/test_combine.py` | `100% <100%> (ø)` | ⬆️ |

... and 17 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 82447c5...b1cd29d.
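
The headline percentages in the report can be reproduced from the hit and line counts in the diff table. The sketch below assumes the percentages are truncated to two decimals rather than rounded. That assumption is inferred from the numbers themselves (1043 / 2120 is about 49.198%, shown as 49.19%) and is not documented in this thread.

```python
import math


def coverage_percent(hits, lines):
    """Coverage as a percentage, truncated (not rounded) to two decimals."""
    return math.floor(hits / lines * 10000) / 100


master = coverage_percent(943, 2045)    # counts from the "master" column
branch = coverage_percent(1043, 2120)   # counts from the "#247" column
delta = round(branch - master, 2)       # reported change in coverage
```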

* Speed up spatclust function. This also clusters positive and negative values separately.
* Rename spatclust to threshold_map and add binarize/sided arguments.
* Replace manually generated binary structure with function-made one.
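
The difference that separate clustering of positive and negative values makes can be shown on a tiny 2D grid. The sketch below is a pure-Python illustration, not the `threshold_map` implementation: the function names, the 4-neighbour connectivity, and the argument names other than `binarize` are assumptions, and the real function works on image data rather than nested lists.

```python
def label_clusters(mask):
    """Label 4-connected clusters of True cells in a 2D list-of-lists mask.

    Returns the label grid (0 = background) and a dict of cluster sizes.
    """
    n_rows, n_cols = len(mask), len(mask[0])
    labels = [[0] * n_cols for _ in range(n_rows)]
    sizes = {}
    current = 0
    for row in range(n_rows):
        for col in range(n_cols):
            if not mask[row][col] or labels[row][col]:
                continue
            current += 1
            labels[row][col] = current
            stack = [(row, col)]
            size = 0
            while stack:  # flood fill from the seed cell
                i, j = stack.pop()
                size += 1
                for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ni, nj = i + di, j + dj
                    if (0 <= ni < n_rows and 0 <= nj < n_cols
                            and mask[ni][nj] and not labels[ni][nj]):
                        labels[ni][nj] = current
                        stack.append((ni, nj))
            sizes[current] = size
    return labels, sizes


def cluster_threshold_2d(data, min_cluster_size, threshold=0.0, binarize=True):
    """Keep cells in large-enough clusters, clustering each sign separately."""
    n_rows, n_cols = len(data), len(data[0])
    out = [[0] * n_cols for _ in range(n_rows)]
    for sign in (1, -1):  # positive pass, then negative pass
        mask = [[sign * val > threshold for val in row] for row in data]
        labels, sizes = label_clusters(mask)
        for row in range(n_rows):
            for col in range(n_cols):
                lab = labels[row][col]
                if lab and sizes[lab] >= min_cluster_size:
                    out[row][col] = 1 if binarize else data[row][col]
    return out


# One positive cell next to a two-cell negative cluster.
grid = [[1.0, -1.0, -1.0],
        [0.0, 0.0, 0.0]]
kept = cluster_threshold_2d(grid, min_cluster_size=2)
```

Here the lone positive cell is dropped because it does not form a cluster of two with other positive cells. Clustering on absolute values instead would have merged it with its negative neighbours and kept all three.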

tsalo · 2019-04-05T15:12:18Z

I think this PR is ready for review.

jbteves

Could you leave a comment saying what bug was squished, and when it was introduced, unless you introduced it downstream in this PR?

jbteves · 2019-04-19T17:04:49Z

Per the call today, I'm willing to help split these into separate branches with @tsalo listed as the author if we think this group of changes should be split into multiple PRs. From a quick examination, it seems like it would make sense to open the following as separate PRs:

Replace 'acc' with 'unclf'
Clean up unnecessary intersections
Get rid of ordinal ranking
Split Kundu v2.5 metrics into separate function
Clean up comptable handling
Split eigendecomp into ica and pca
Assorted documentation enhancements/corrections

I realize that's a lot, but this is a really big PR and I think changes of this scale will be more easily digested in smaller chunks, plus it will help us understand which parts are related to each other. Any thoughts, @emdupre and @tsalo ?

tsalo · 2019-04-20T17:23:40Z

I've split off most of the self-contained changes from this PR into new ones. The only changes that I haven't done over in new branches are the large-scale reorganizations of selcomps, fitmodels_direct, and eigendecomp (i.e., the changes proposed in the first comment that haven't been struck through). I will open a PR with that larger reorganization once all of these smaller PRs have been dealt with, to avoid annoying merge conflicts.

emdupre

I know this review is old but I'm mostly posting it for myself ! I had been trying to make my way through -- I think smaller PRs will definitely help !! Thanks a lot @jbteves and @tsalo 💖

emdupre · 2019-04-11T18:05:42Z

tedana/decomposition/eigendecomp.py

 def tedpca(catd, OCcatd, combmode, mask, t2s, t2sG,
           ref_img, tes, method='mle', ste=-1, kdaw=10., rdaw=1.,
-           verbose=False):
+           out_dir='.', verbose=False):


Can we update the docstring to reference this new argument ?

This is fixed in #265.

emdupre · 2019-04-11T18:06:19Z

tedana/decomposition/eigendecomp.py


    Returns
    -------
    n_components : :obj:`int`
        Number of components retained from PCA decomposition
-    dd : (S x T) :obj:`numpy.ndarray`
+    kept_data : (S x T) :obj:`numpy.ndarray`


This variable name doesn't seem to match with the description immediately below it. Can we sync these ?

I caught this a little after the fact. It's fixed in #265.

emdupre · 2019-04-11T18:26:32Z

tedana/decomposition/eigendecomp.py

+                    reindex=False, mmixN=vTmixN, method=None,
+                    label='mepca_', out_dir=out_dir, verbose=verbose)
+        # varex_norm overrides normalized varex computed by dependence_metrics
+        comptable['real normalized variance explained'] = varex_norm


'real normalized variance explained' makes it sound like there's a 'fake' one 😆 Maybe we should consider updating.

Yeah, I reconsidered that wording a bit later. In #266 I change it to "original normalized variance explained." I can change that to something better though, if you want. I just want to retain both versions, for now, until we can validate fitmodels_direct.

I would argue, though, that the one estimated in fitmodels_direct is kind of fake, since it doesn't match the one that comes directly from the PCA.

Whoops, that should really go in #265. Pushing fixed name now.

emdupre · 2019-04-11T18:39:09Z

tedana/model/fit.py

+        Whether to sort components in descending order by Kappa. Default: False
+    mmixN : (T x C) array_like, optional
+        Z-scored mixing matrix. Default: None
+    method : {'kundu_v2', 'kundu_v3', None}, optional


We haven't added back in v3, right ? Should we include it here ?

We calculate some metric maps that are only used in v3.2, like WTS, tsoc_B, PSC, and F_S0_maps. Those maps are currently sort of vestigial, but I didn't want to remove them completely in case we do add v3 back in.

emdupre · 2019-04-11T18:41:08Z

tedana/model/fit.py

-                         't2s_full array {1}'.format(t2s.shape,
-                                                     t2s_full.shape))
+    elif not (catd.shape[2] == tsoc.shape[1] == mmix.shape[0]):
+        raise ValueError('Third dimension (number of volumes) of catd ({0}), '


Can we clarify this to explain that all formatted values should be the number of volumes -- since that's setting the dimensionality of the other two ?

So something like "Number of volumes of catd..., etc. do not match"?
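
A sketch of how the reworded check could read, assuming the axis conventions implied by the diff above (volumes on the third axis of catd, the second axis of tsoc, and the first axis of mmix). The function name `check_n_volumes` and its shape-tuple arguments are hypothetical; the actual check operates on the arrays inside the function.

```python
def check_n_volumes(catd_shape, tsoc_shape, mmix_shape):
    """Raise ValueError unless catd, tsoc, and mmix agree on volume count."""
    n_vols = (catd_shape[2], tsoc_shape[1], mmix_shape[0])
    if len(set(n_vols)) != 1:
        raise ValueError('Number of volumes in catd ({0}), tsoc ({1}), and '
                         'mmix ({2}) do not match.'.format(*n_vols))
```

Formatting only the volume counts, rather than the full shapes, keeps the message focused on the dimension that actually has to agree.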

emdupre · 2019-04-11T18:42:35Z

tedana/model/fit.py

+    return comptable, seldict, betas, mmix_new
+
+
+def kundu_metrics(comptable, metric_maps):


Same comment about not yet having v3 incorporated.

If there's an eventual plan to put it in there, it might not hurt to stub it and place a %TODO under it, at least in part so there's a prototype for how you would slot in a new method after this refactor.
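
One possible shape for such a stub is sketched below. The dispatcher `get_metric_function` and the placeholder `kundu_metrics_v2` are hypothetical names; only the `method` values and the `(comptable, metric_maps)` signature come from the diff in this thread.

```python
def kundu_metrics_v2(comptable, metric_maps):
    """Placeholder standing in for the v2 metric calculation."""
    return comptable


def get_metric_function(method):
    """Return the metric function for the requested method, or None."""
    if method is None:
        return None
    if method == 'kundu_v2':
        return kundu_metrics_v2
    if method == 'kundu_v3':
        # TODO: add the v3 metrics once that selection method is restored.
        raise NotImplementedError("'kundu_v3' metrics are not implemented yet")
    raise ValueError('Unknown method: {0!r}'.format(method))
```

Raising `NotImplementedError` for the stubbed method keeps the option visible in the interface while making it clear at call time that it cannot be used yet.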

tsalo added 9 commits March 28, 2019 19:47

Eliminate unnecessary ordinal ranking.

1908ced

Replace "full_sel" with "method" in fitmodels_direct.

229a60e

Add out_dir argument to tedpca.

6c2a759

Split Kundu metrics into separate function.

a39a105

- Rename selcomps to kundu_selection_v2_5
- Simplify arguments for kundu_selection_v2_5
- Simplify arguments for fitmodels_direct
- Add function with metrics from selcomps to model.fit (kundu_metrics)

Fix a bunch of bugs I managed to introduce.

20b6eff

Oh yeah, we have tests.

be4c39e

Style!

b0a9c4e

tsalo added 5 commits April 3, 2019 08:37

Add some tests to be a good contributor.

a4ab382

[FIX] Make figures using un-orthogonalized mixing matrix

6fadad3

[REF] Clean up comptable handling in tedana.io

d5e6776

Also fix up docstrings and eliminate step where component number is set as a column. Always keep it as the index of the DataFrame.

Merge remote-tracking branch 'ME-ICA/master' into ref-selcomps

10d2f84

# Conflicts:
#   tedana/model/__init__.py
#   tedana/model/fit.py
#   tedana/selection/select_comps.py
#   tedana/workflows/tedana.py

emdupre force-pushed the master branch from 3c59b50 to 82447c5 Compare April 3, 2019 15:54

tsalo added 4 commits April 3, 2019 12:02

Merge remote-tracking branch 'ME-ICA/master' into ref-selcomps

78dafd2

# Conflicts:
#   tedana/decomposition/eigendecomp.py
#   tedana/io.py
#   tedana/model/fit.py
#   tedana/viz.py
#   tedana/workflows/tedana.py

Move kundu_tedpca into selection and rename select_comps to tedica.

c159335

Fix style issues.

1be0870

Squish bug in kundu_tedpca.

d91a398

tsalo changed the title ~~[WIP, REF, DOC] Reorganize selcomps and fitmodels_direct~~ [REF, DOC] Reorganize selcomps and fitmodels_direct Apr 5, 2019

tsalo requested a review from emdupre April 5, 2019 15:15

tsalo mentioned this pull request Apr 6, 2019

Operate on data as imgs and lists of imgs throughout package #251

Closed

tsalo added 6 commits April 16, 2019 21:04

Rename variances, improve documentation, and pcastate file.

10e401a

Fix.

9527df4

Fix.

a93458c

Fix.

f14bb4f

Remove pcastate file from Circle outputs.

7375de9

Split eigendecomp into ica and pca.

650b8a9

- ste -> source_tes
- 'ste' -> 'paid'
- Drop PAID support from tedana workflow

Fix test.

b1cd29d

jbteves reviewed Apr 19, 2019

This was referenced Apr 20, 2019

Finish adding math to function docstrings #47

Closed

Add component selection documentation #84

Closed

This was referenced Apr 21, 2019

[REF] Split eigendecomp into ICA and PCA files #265

Merged

[REF] Reorganize selcomps and fitmodels_direct #266

Merged

tsalo closed this Apr 21, 2019

emdupre reviewed Apr 22, 2019

tsalo mentioned this pull request May 23, 2019

[FIX] Sort comptable by varex before identifying outlier components #295

Merged
