Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[REF, DOC] Reorganize selcomps and fitmodels_direct #247

Closed
wants to merge 25 commits into from

Conversation

tsalo
Copy link
Member

@tsalo tsalo commented Apr 3, 2019

Closes #214 and references #16, #84, #166. This is a substantial refactor of selcomps and fitmodels_direct. I'm happy to split these changes up into multiple PRs to prevent bloat, but for the most part I feel that these changes fit together pretty well.

Changes proposed in this pull request:

tsalo added 9 commits March 28, 2019 19:47
1. Replace “acc” with “unclf” (unclassified)
2. Eliminate unnecessary intersections between
    - e.g., candartA, candartB, ign_add0
3. Clean up unnecessarily complicated indexing with .loc
4. Fix one use of d_table_score that should be d_table_score_scrub
5. Use mean of ranks instead of sum for decision table scores, which
lets us drop the n_decision_metrics variable
6. Add kappa ratio score to component table
7. Sort component table by variance explained when identifying varex
outliers
- Add new `selection.manual_selection` function.
- Allow manacc to be a list instead of just a string with commas.
- Switch order of outputs from `fitmodels_direct` so `comptable` is
first.
- Improve handling and logging of interactions between mixm, ctab, and
manacc.
- Rename selcomps to kundu_selection_v2_5
- Simplify arguments for kundu_selection_v2_5
- Simplify arguments for fitmodels_direct
- Add function with metrics from selcomps to model.fit (kundu_metrics)
@codecov
Copy link

codecov bot commented Apr 3, 2019

Codecov Report

Merging #247 into master will increase coverage by 3.08%.
The diff coverage is 41.44%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #247      +/-   ##
==========================================
+ Coverage   46.11%   49.19%   +3.08%     
==========================================
  Files          33       37       +4     
  Lines        2045     2120      +75     
==========================================
+ Hits          943     1043     +100     
+ Misses       1102     1077      -25
Impacted Files Coverage Δ
tedana/model/__init__.py 100% <ø> (ø) ⬆️
tedana/workflows/tedana.py 12.77% <0%> (-1.88%) ⬇️
tedana/tests/test_selection.py 100% <100%> (ø)
tedana/tests/test_model_fit_kundu_metrics.py 100% <100%> (ø)
tedana/selection/__init__.py 100% <100%> (ø) ⬆️
tedana/selection/_utils.py 25% <100%> (+10.71%) ⬆️
tedana/decomposition/__init__.py 100% <100%> (ø) ⬆️
tedana/workflows/t2smap.py 67.1% <100%> (ø) ⬆️
tedana/tests/test_model_fit_dependence_metrics.py 100% <100%> (ø)
tedana/tests/test_combine.py 100% <100%> (ø) ⬆️
... and 17 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 82447c5...b1cd29d. Read the comment docs.

tsalo added 5 commits April 3, 2019 08:37
Also fix up docstrings and eliminate step where component number is set
as a column. Always keep it as the index of the DataFrame.
* Speed up spatclust function.

This also clusters positive and negative values separately.

* Rename spatclust to threshold_map and add binarize/sided arguments.

* Replace manually generated binary structure with function-made one.
# Conflicts:
#	tedana/model/__init__.py
#	tedana/model/fit.py
#	tedana/selection/select_comps.py
#	tedana/workflows/tedana.py
tsalo added 4 commits April 3, 2019 12:02
# Conflicts:
#	tedana/decomposition/eigendecomp.py
#	tedana/io.py
#	tedana/model/fit.py
#	tedana/viz.py
#	tedana/workflows/tedana.py
@tsalo tsalo changed the title [WIP, REF, DOC] Reorganize selcomps and fitmodels_direct [REF, DOC] Reorganize selcomps and fitmodels_direct Apr 5, 2019
@tsalo
Copy link
Member Author

tsalo commented Apr 5, 2019

I think this PR is ready for review.

Copy link
Collaborator

@jbteves jbteves left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you leave a comment saying what bug was squished, and when it was introduced, unless you introduced it downstream in this PR?

@jbteves
Copy link
Collaborator

jbteves commented Apr 19, 2019

Per the call today, I'm willing to help split these into separate branches with @tsalo listed as the author if we think this group of changes should be split into multiple PRs. From a quick examination, it seems like it would make sense to open the following as separate PRs:

  1. Replace 'acc' with 'unclf'
  2. Clean up unnecessary intersections
  3. Get rid of ordinal ranking
  4. Split Kundu v2.5 metrics into separate function
  5. Clean up comptable handling
  6. Split eigendecomp into ica and pca
  7. Assorted documentation enhancements/corrections

I realize that's a lot, but this is a really big PR and I think changes of this scale will be more easily digested in smaller chunks, plus it will help us understand which parts are related to each other. Any thoughts, @emdupre and @tsalo ?

@tsalo
Copy link
Member Author

tsalo commented Apr 20, 2019

I've split off most of the self-contained changes from this PR into new ones. The only changes that I haven't done over in new branches are the large-scale reorganizations of selcomps, fitmodels_direct, and eigendecomp (i.e., the changes proposed in the first comment that haven't been struck through). I will open a PR with that larger reorganization once all of these smaller PRs have been dealt with, to avoid annoying merge conflicts.

Copy link
Member

@emdupre emdupre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this review is old but I'm mostly posting it for myself ! I had been trying to make my way through -- I think smaller PRs will definitely help !! Thanks a lot @jbteves and @tsalo 💖

def tedpca(catd, OCcatd, combmode, mask, t2s, t2sG,
ref_img, tes, method='mle', ste=-1, kdaw=10., rdaw=1.,
verbose=False):
out_dir='.', verbose=False):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we update the docstring to reference this new argument ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fixed in #265.


Returns
-------
n_components : :obj:`int`
Number of components retained from PCA decomposition
dd : (S x T) :obj:`numpy.ndarray`
kept_data : (S x T) :obj:`numpy.ndarray`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This variable name doesn't seem to match with the description immediately below it. Can we sync these ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I caught this a little after the fact. It's fixed in #265.

reindex=False, mmixN=vTmixN, method=None,
label='mepca_', out_dir=out_dir, verbose=verbose)
# varex_norm overrides normalized varex computed by dependence_metrics
comptable['real normalized variance explained'] = varex_norm
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'real normalized variance explained' makes it sound like there's a 'fake' one 😆 Maybe we should consider updating.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I reconsidered that wording a bit later. In #266 I change it to "original normalized variance explained." I can change that to something better though, if you want. I just want to retain both versions, for now, until we can validate fitmodels_direct.

I would argue, though, that the one estimated in fitmodels_direct is kind of fake, since it doesn't match the one that comes directly from the PCA.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops, that should really go in #265. Pushing fixed name now.

Whether to sort components in descending order by Kappa. Default: False
mmixN : (T x C) array_like, optional
Z-scored mixing matrix. Default: None
method : {'kundu_v2', 'kundu_v3', None}, optional
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We haven't added back in v3, right ? Should we include it here ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We calculate some metric maps that are only used in v3.2, like WTS, tsoc_B, PSC, and F_S0_maps. Those maps are currently sort of vestigial, but I didn't want to remove them completely in case we do add v3 back in.

't2s_full array {1}'.format(t2s.shape,
t2s_full.shape))
elif not (catd.shape[2] == tsoc.shape[1] == mmix.shape[0]):
raise ValueError('Third dimension (number of volumes) of catd ({0}), '
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we clarify this to explain that all formatted values should be the number of volumes -- since that's setting the dimensionality of the other two ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So something like "Number of volumes of catd..., etc. do not match"?

return comptable, seldict, betas, mmix_new


def kundu_metrics(comptable, metric_maps):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment about not yet having v3 incorporated.

Copy link
Collaborator

@jbteves jbteves Apr 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there's an eventual plan to put it in there, it might not hurt to stub it and place a %TODO under it, at least in part so there's a prototype for how you would slot in a new method after this refactor.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Drop PAID support from tedana workflow
3 participants