Default settings for non-MLE PCA raise warning #255
Comments
Setting aside the complexity of the decision tree, which I have yet to fully wrap my head around, I would think that we should simply remove anything with a variance of zero. If we trust our variance metric, then zero-variance components don't matter. I could see this being a problem, though, if the number of components from PCA is somehow used in the decision tree (though, from what I've read and seen in the code, I don't think it is).
One of our three measures of variance explained (now called …). Classifications aren't directly dependent on the number of components (AFAIK), but including extra components does affect the classification because of the elbows.
I suppose this is waiting on team-decomp?
I'd say it's safe to fix the number of components to `n_vols - 1`. @tsalo, how does changing the number of components impact the decision tree?
As far as I can recall, it doesn't impact the results much, since it only changes the number of components by one. |
Then I'd fix it and close the issue. Does that sound right to you, @tsalo?
So... it looks like I already addressed this in #364 and forgot to tag this issue in the PR. Sorry about that! Closing now. |
Summary
The default settings for sklearn's PCA (used for the Kundu tedpca decision tree, but not MLE) involve deriving as many components as there are volumes in the time series. The last component explains no variance and causes the following warning to be raised for both the three-echo rest and five-echo task test datasets:
Additional Detail
From what I recall, PCA should estimate `n_vols - 1` components, not `n_vols`, which would explain why the last component explains no variance and is causing this divide-by-zero warning. I think we should override the default number of components here with `n_vols - 1`. This will prevent the warning, and the components that matter will be the same. However, I also noticed that changing the number of components impacts the decision tree, so I decided to raise an issue where we could discuss this rather than opening a small bug-fix PR.
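A minimal sketch of the behavior (the array shapes here are illustrative stand-ins, not tedana's actual data: rows are "volumes", columns are "voxels"):

```python
import numpy as np
from sklearn.decomposition import PCA

# Illustrative shapes: 10 volumes of 50 voxels, mimicking an fMRI
# time series with far more voxels than time points.
rng = np.random.default_rng(0)
n_vols, n_voxels = 10, 50
data = rng.standard_normal((n_vols, n_voxels))

# Default: sklearn fits min(n_samples, n_features) = n_vols components,
# but mean-centering reduces the rank to n_vols - 1, so the last
# component explains (numerically) zero variance.
full = PCA().fit(data)
print(full.n_components_)            # 10
print(full.explained_variance_[-1])  # ~0 (numerically)

# Proposed fix: request n_vols - 1 components explicitly.
fixed = PCA(n_components=n_vols - 1).fit(data)
print(fixed.n_components_)           # 9

# The retained components match the first n_vols - 1 from the full fit.
assert np.allclose(full.components_[:-1], fixed.components_)
```
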
Next Steps
In tedana/tedana/decomposition/eigendecomp.py (lines 301 to 306 at 82447c5), we can simply change `ppca = PCA()` to `ppca = PCA(n_components=(n_vols - 1))`.
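The `n_vols - 1` bound comes from mean-centering: after PCA subtracts each voxel's mean, the centered time points for each voxel sum to zero, so the data matrix loses one degree of freedom. A quick NumPy check (shapes are illustrative):

```python
import numpy as np

# Illustrative shapes: 10 volumes x 50 voxels.
rng = np.random.default_rng(1)
data = rng.standard_normal((10, 50))

# PCA centers each voxel (column); the centered columns each sum to
# zero, so the rows become linearly dependent (they sum to the zero
# vector) and the matrix rank drops from n_vols to n_vols - 1.
centered = data - data.mean(axis=0)
print(np.linalg.matrix_rank(data))      # 10
print(np.linalg.matrix_rank(centered))  # 9
```

This is why the `n_vols`-th component can never explain any real variance and only triggers the divide-by-zero warning.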