
Default settings for non-MLE PCA raise warning #255

Closed · tsalo opened this issue Apr 17, 2019 · 7 comments · Fixed by #358
Labels: bug · decomposition · discussion · priority: low

Comments

tsalo (Member) commented Apr 17, 2019

Summary

The default settings for sklearn's PCA (used for the Kundu tedpca decision tree, but not for MLE) derive as many components as there are volumes in the time series. The last component explains no variance, which causes the following warning to be raised for both the three-echo rest and five-echo task test datasets:

tedana/tedana/model/fit.py:169: RuntimeWarning: divide by zero encountered in true_divide
  F_S0 = (alpha - SSE_S0) * (n_echos - 1) / (SSE_S0)

Additional Detail

From what I recall, PCA should estimate n_vols - 1 components, not n_vols, which would explain why the last component explains no variance and is causing this divide-by-zero warning. I think we should override the default number of components here with n_vols - 1. This will prevent the warning, but the components that matter will be the same.
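For illustration, here is a minimal sketch of the behavior (synthetic data with made-up dimensions, not the tedana test sets; it assumes the input is z-scored across time per voxel, as the dz naming in the snippet under Next Steps suggests). Once each row has zero mean, the data matrix has rank at most n_vols - 1, so the default PCA's final component explains essentially no variance:

    import numpy as np
    from sklearn.decomposition import PCA

    n_voxels, n_vols = 1000, 50  # hypothetical dimensions
    rng = np.random.default_rng(0)
    data = rng.standard_normal((n_voxels, n_vols))
    # z-score each voxel's time series (rows end up with zero mean)
    dz = (data - data.mean(axis=1, keepdims=True)) / data.std(axis=1, keepdims=True)

    ppca = PCA()  # default n_components = min(n_samples, n_features) = n_vols
    ppca.fit(dz)
    print(ppca.explained_variance_[-1])  # numerically ~0 for the last component

    # Quantities derived from this degenerate component then hit divisions
    # by ~zero downstream, surfacing as RuntimeWarnings like the one above.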

However, I also noticed that changing the number of components also impacts the decision tree, so I decided to raise an issue where we could discuss this rather than opening a small bug-fix PR.

Next Steps

  1. If we decide to override the default number of PCA components, then here:
    ppca = PCA()
    ppca.fit(dz)
    comp_ts = ppca.components_
    varex = ppca.explained_variance_
    voxel_comp_weights = np.dot(np.dot(dz, comp_ts.T),
                                np.diag(1. / varex))

    we can simply change ppca = PCA() to ppca = PCA(n_components=(n_vols - 1)), as sketched below.
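Continuing the synthetic sketch from above (same hypothetical dz and n_vols), the override drops the degenerate component, so the 1. / varex term no longer divides by a ~zero value:

    ppca = PCA(n_components=n_vols - 1)
    ppca.fit(dz)
    varex = ppca.explained_variance_
    assert np.all(varex > 1e-10)  # no ~zero-variance component remains
    voxel_comp_weights = np.dot(np.dot(dz, ppca.components_.T),
                                np.diag(1. / varex))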
@tsalo tsalo added bug issues describing a bug or error found in the project discussion issues that still need to be discussed priority: low issues that are not urgent labels Apr 17, 2019
jbteves (Collaborator) commented Apr 19, 2019

Depending on the complexity of the decision tree, which I have yet to fully wrap my head around, I would think that we should just completely remove anything with a variance of zero. If we trust our variance metric, then zero-variance components don't matter. I could see this being a problem, though, if the number of components from PCA is somehow used in the decision tree (though from what I've read/seen in the code, I don't think it is).
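For concreteness, a hypothetical sketch of that alternative (not tedana code; the function name and tolerance are arbitrary assumptions): fit the default PCA, then drop any component whose explained variance is numerically zero, rather than fixing n_components up front:

    from sklearn.decomposition import PCA

    def fit_and_trim(dz, tol=1e-10):
        """Fit a default PCA, then discard near-zero-variance components."""
        ppca = PCA().fit(dz)
        keep = ppca.explained_variance_ > tol  # boolean mask of informative components
        return ppca.components_[keep], ppca.explained_variance_[keep]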

tsalo (Member, Author) commented Apr 27, 2019

One of our three measures of variance explained (now called original normalized variance explained) comes pretty much directly from the PCA, and that one's zero for this component as well, so I think we can trust the metric.

Classifications aren't directly dependent on the number of components (afaik), but including extra components does affect the classification because of the elbows.
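To make the elbow point concrete, here is a hedged illustration (this is not tedana's actual elbow function, just a common heuristic: take the point farthest from the chord joining the sorted curve's endpoints, applied to made-up variance values). Appending a ~zero-variance tail value can move the selected elbow:

    import numpy as np

    def elbow_index(values):
        """Elbow of a descending curve: point farthest from the end-to-end chord."""
        vals = np.sort(np.asarray(values, dtype=float))[::-1]
        x = np.arange(len(vals))
        dx, dy = x[-1] - x[0], vals[-1] - vals[0]
        dist = np.abs(dy * (x - x[0]) - dx * (vals - vals[0])) / np.hypot(dx, dy)
        return int(np.argmax(dist))

    varex = np.array([10., 6., 3.5, 2., 1.2, 0.8, 0.5])  # made-up variances
    print(elbow_index(varex))                  # 2 with this toy curve
    print(elbow_index(np.append(varex, 0.)))   # 3: the ~0 tail shifts the elbow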

tsalo added the decomposition label Oct 4, 2019
emdupre (Member) commented Nov 8, 2019

I suppose this is waiting on team-decomp?

eurunuela (Collaborator) commented

I’d say it’s safe to fix the number of components to be n_vols - 1.

@tsalo , how does changing the number of components impact the decision tree?

tsalo (Member, Author) commented Jan 17, 2020

As far as I can recall, it doesn't impact the results much, since it only changes the number of components by one.

eurunuela (Collaborator) commented

Then I’d fix it and close the issue. Does that sound right to you, @tsalo?

tsalo (Member, Author) commented Feb 20, 2020

So... it looks like I already addressed this in #364 and forgot to tag this issue in the PR. Sorry about that! Closing now.

tsalo closed this as completed Feb 20, 2020