
Default settings for non-MLE PCA raise warning #255

Closed · tsalo opened this issue Apr 17, 2019 · 7 comments · Fixed by #358
Labels: bug · decomposition · discussion · priority: low

Comments

tsalo (Member) commented Apr 17, 2019

Summary

The default settings for sklearn's PCA (used for the Kundu tedpca decision tree, but not for MLE) derive as many components as there are volumes in the time series. The last component explains no variance, which causes the following warning to be raised for both the three-echo rest and five-echo task test datasets:

tedana/tedana/model/fit.py:169: RuntimeWarning: divide by zero encountered in true_divide
  F_S0 = (alpha - SSE_S0) * (n_echos - 1) / (SSE_S0)

Additional Detail

From what I recall, PCA should estimate n_vols - 1 components, not n_vols, which would explain why the last component explains no variance and is causing this divide-by-zero warning. I think we should override the default number of components here with n_vols - 1. This will prevent the warning, but the components that matter will be the same.
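For illustration, here is a minimal sketch of the behavior (synthetic data with made-up dimensions, not the tedana test sets; it assumes the input is z-scored across time per voxel, as the dz naming in the snippet under Next Steps suggests). Once each row has zero mean, the data matrix has rank at most n_vols - 1, so the default PCA's final component explains essentially no variance:

    import numpy as np
    from sklearn.decomposition import PCA

    n_voxels, n_vols = 1000, 50  # hypothetical dimensions
    rng = np.random.default_rng(0)
    data = rng.standard_normal((n_voxels, n_vols))
    # z-score each voxel's time series (rows end up with zero mean)
    dz = (data - data.mean(axis=1, keepdims=True)) / data.std(axis=1, keepdims=True)

    ppca = PCA()  # default n_components = min(n_samples, n_features) = n_vols
    ppca.fit(dz)
    print(ppca.explained_variance_[-1])  # numerically ~0 for the last component

    # Quantities derived from this degenerate component then hit divisions
    # by ~zero downstream, surfacing as RuntimeWarnings like the one above.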

However, I also noticed that changing the number of components also impacts the decision tree, so I decided to raise an issue where we could discuss this rather than opening a small bug-fix PR.

Next Steps

  1. If we decide to override the default number of PCA components, then here:
    ppca = PCA()
    ppca.fit(dz)
    comp_ts = ppca.components_
    varex = ppca.explained_variance_
    voxel_comp_weights = np.dot(np.dot(dz, comp_ts.T),
                                np.diag(1. / varex))

    we can simply change ppca = PCA() to ppca = PCA(n_components=(n_vols - 1)), as sketched below.
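Continuing the synthetic sketch from above (same hypothetical dz and n_vols), the override drops the degenerate component, so the 1. / varex term no longer divides by a ~zero value:

    ppca = PCA(n_components=n_vols - 1)
    ppca.fit(dz)
    varex = ppca.explained_variance_
    assert np.all(varex > 1e-10)  # no ~zero-variance component remains
    voxel_comp_weights = np.dot(np.dot(dz, ppca.components_.T),
                                np.diag(1. / varex))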
@tsalo tsalo added bug issues describing a bug or error found in the project discussion issues that still need to be discussed priority: low issues that are not urgent labels Apr 17, 2019
jbteves (Collaborator) commented Apr 19, 2019

Depending on the complexity of the decision tree, which I have yet to fully wrap my head around, I would think that we should just completely remove anything with a variance of zero. If we trust our variance metric, then zero-variance components don't matter. I could see this being a problem, though, if the number of components from PCA is somehow used in the decision tree (though from what I've read/seen in the code, I don't think it is).
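For concreteness, a hypothetical sketch of that alternative (not tedana code; the function name and tolerance are arbitrary assumptions): fit the default PCA, then drop any component whose explained variance is numerically zero, rather than fixing n_components up front:

    from sklearn.decomposition import PCA

    def fit_and_trim(dz, tol=1e-10):
        """Fit a default PCA, then discard near-zero-variance components."""
        ppca = PCA().fit(dz)
        keep = ppca.explained_variance_ > tol  # boolean mask of informative components
        return ppca.components_[keep], ppca.explained_variance_[keep]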

tsalo (Member, Author) commented Apr 27, 2019

One of our three measures of variance explained (now called original normalized variance explained) comes pretty much directly from the PCA, and that one's zero for this component as well, so I think we can trust the metric.

Classifications aren't directly dependent on the number of components (afaik), but including extra components does affect the classification because of the elbows.
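To make the elbow point concrete, here is a hedged illustration (this is not tedana's actual elbow function, just a common heuristic: take the point farthest from the chord joining the sorted curve's endpoints, applied to made-up variance values). Appending a ~zero-variance tail value can move the selected elbow:

    import numpy as np

    def elbow_index(values):
        """Elbow of a descending curve: point farthest from the end-to-end chord."""
        vals = np.sort(np.asarray(values, dtype=float))[::-1]
        x = np.arange(len(vals))
        dx, dy = x[-1] - x[0], vals[-1] - vals[0]
        dist = np.abs(dy * (x - x[0]) - dx * (vals - vals[0])) / np.hypot(dx, dy)
        return int(np.argmax(dist))

    varex = np.array([10., 6., 3.5, 2., 1.2, 0.8, 0.5])  # made-up variances
    print(elbow_index(varex))                  # 2 with this toy curve
    print(elbow_index(np.append(varex, 0.)))   # 3: the ~0 tail shifts the elbow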

tsalo added the decomposition label Oct 4, 2019
emdupre (Member) commented Nov 8, 2019

I suppose this is waiting on team-decomp?

eurunuela (Collaborator) commented

I’d say it’s safe to fix the number of components to be n_vols - 1.

@tsalo , how does changing the number of components impact the decision tree?

tsalo (Member, Author) commented Jan 17, 2020

As far as I can recall, it doesn't impact the results much, since it only changes the number of components by one.

eurunuela (Collaborator) commented

Then I’d fix it and close the issue. Does that sound right to you, @tsalo?

tsalo (Member, Author) commented Feb 20, 2020

So... it looks like I already addressed this in #364 and forgot to tag this issue in the PR. Sorry about that! Closing now.

tsalo closed this as completed Feb 20, 2020