Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH, FIX] PCA variance enhancements and consistency improvements #877

Merged
merged 4 commits into from
May 13, 2022

Conversation

handwerkerd
Copy link
Member

@handwerkerd handwerkerd commented May 12, 2022

Closes #876.

Changes proposed in this pull request:

  • We changed AIC to the default PCA selection criterion in Made AIC the default maPCA option #849 for the API. I noticed it was still defaulting to MDL for the command line interface, so I fixed that
  • As discussed in understanding variance and normalized variance explained #876, normalized variance explained was calculated one way for mdl, kic, & aic, and a different way for the low_mem option or if a specific variance explained as selected. Everything now matches the calculation used for mdl, kic, & aic
  • The sum of the normalized variances across PCA components is the % variance of the original data that's explained by the PCA according to the PCA function in sklearn. I am now including this value in the info logger. It doesn't match the percent variance explained values that are given as later outputs. Figuring out how to make sense of the difference is a good future goal.
  • The PCA function from sklearn has an option to input a pre-specified number of components as an integer. We were unnecessarily blocking off that option (which I wanted to use for testing some stuff). I updated check_tedpca_value to allow integers >=1 and added unit tests for valid and invalid integers.
  • doc strings and logging messages were edited in response to the above changes.

Notes:

  • Just to this is cleanly documented, the current version of the code calculated varex_norm = varex / varex.sum() for the pre-defined variance-explained option and also for the low-mem option. It uses varex_norm = ppca.explained_variance_ratio_ for mdl, kic, & aic.
  • The testing coverage drops slightly because only integration testing covers decomposition/pca.py and tedana.py We can add another integration test for when a fixed integer number of components are predefined, but since that would non-trivially increase the duration of the tests, I didn't want to do that without some discussion. Adding direct tests for pca.py and modularizing tedana.py more would be a better approach, but well beyond the scope of this PR.

@codecov
Copy link

codecov bot commented May 12, 2022

Codecov Report

Merging #877 (65645fc) into main (d4406e4) will decrease coverage by 0.05%.
The diff coverage is 77.27%.

@@            Coverage Diff             @@
##             main     #877      +/-   ##
==========================================
- Coverage   93.15%   93.09%   -0.06%     
==========================================
  Files          27       27              
  Lines        2234     2245      +11     
==========================================
+ Hits         2081     2090       +9     
- Misses        153      155       +2     
Impacted Files Coverage Δ
tedana/workflows/tedana.py 90.00% <ø> (ø)
tedana/decomposition/pca.py 85.26% <61.53%> (-1.26%) ⬇️
tedana/workflows/parser_utils.py 95.83% <100.00%> (+1.09%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d4406e4...65645fc. Read the comment docs.

Copy link
Collaborator

@jbteves jbteves left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One minor typo, and one place where I'd prefer you use the word instead of the symbol because my brain keeps reading it as a bitwise and since this is code. Otherwise LGTM.

tedana/decomposition/pca.py Outdated Show resolved Hide resolved
tedana/workflows/tedana.py Outdated Show resolved Hide resolved
handwerkerd and others added 2 commits May 12, 2022 09:29
typo

Co-authored-by: Joshua Teves <jbtevespro@gmail.com>
Co-authored-by: Joshua Teves <jbtevespro@gmail.com>
@handwerkerd handwerkerd requested a review from jbteves May 12, 2022 14:59
Copy link
Collaborator

@jbteves jbteves left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Collaborator

@eurunuela eurunuela left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thank you @handwerkerd !

@handwerkerd handwerkerd merged commit e44d488 into ME-ICA:main May 13, 2022
@handwerkerd handwerkerd deleted the consistent-normalized-variance branch May 13, 2022 02:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

understanding variance and normalized variance explained
3 participants