PCA is applying an inappropriate transformation #654
Comments
Would transposing impact the decomposition? I know that the number of components is limited by both T and N, but I assume one orientation would give a spatial decomposition and the other a temporal one...
In theory, transposing shouldn't change the PCA results; the most significant difference would be a swap between the U and Vt matrices. However, since sklearn internally normalizes the data along the first dimension, transposing also changes which dimension gets normalized, so the output could differ. For ICA, I'm not sure. We could just try it on dummy data and check whether there is a difference. The current ICA in tedana also looks a bit strange to me because it doesn't use [...]. How did the original meica code store the data? Was it NxT or TxN? (N = number of voxels, T = number of TRs)
I'm not sure I'm following you @notZaki. What PCA are we talking about? maPCA? Also, as you mention, transposing the data before the PCA would only swap U and Vt. I'm not familiar with sklearn's internal normalization.
@eurunuela I am focusing on sklearn's implementation of PCA and ICA. This is partially relevant to maPCA because it uses PCA at the very end, once the number of components is estimated.
Gotcha! We could simply transpose the input matrix and the resulting eigenvalues, then swap U and Vt. |
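For illustration, here is a minimal sketch (not tedana code; the array shape is made up) showing that transposing the input to a plain SVD only swaps the roles of U and Vt, which is why the concern above is specifically about sklearn's internal centering rather than the decomposition itself:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.standard_normal((100, 20))   # hypothetical N voxels x T time points

# SVD of the matrix and of its transpose
u1, s1, vt1 = np.linalg.svd(data, full_matrices=False)
u2, s2, vt2 = np.linalg.svd(data.T, full_matrices=False)

# Singular values are identical, and U of one call matches Vt.T of the other
# (individual components may differ by a sign flip, hence the abs comparison).
assert np.allclose(s1, s2)
assert np.allclose(np.abs(u1), np.abs(vt2.T))
```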
Summary
The PCA implementation in sklearn normalizes along the voxel dimension, which isn't an appropriate strategy for fMRI data. We should switch to a different PCA implementation or transpose the data before PCA.
Additional Detail
The PCA implementation in sklearn centers the input data along the first dimension before decomposition. For tedana, the data is an NxT matrix and the appropriate normalization should be along the second dimension (#636).
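As a rough sketch of that mismatch (illustrative arrays, not tedana's actual variables): sklearn's PCA subtracts each column mean before its SVD, so for an NxT voxels-by-time matrix it centers across voxels at each time point rather than along each voxel's time series:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.standard_normal((1000, 50))  # hypothetical N voxels x T time points

# What sklearn's PCA effectively does before its SVD: center each column,
# i.e. subtract the mean across voxels at every time point (axis=0).
centered_across_voxels = data - data.mean(axis=0, keepdims=True)

# The normalization argued for here: center each voxel's time series (axis=1).
centered_across_time = data - data.mean(axis=1, keepdims=True)
```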
The simplest change would be to define our own PCA function. This actually already exists in decomposition.ma_pca._icatb_svd.
Alternatively, we could transpose the data so that its dimensions are TxN. Then we could continue using sklearn; however, this would require heavier refactoring because the left/right sides of the decomposition would be swapped.
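A hypothetical sketch of the "own PCA function" option is below. This is not the actual _icatb_svd implementation, just an illustration of a PCA that centers each voxel's time series (axis=1) before the SVD:

```python
import numpy as np

def pca_over_time(data, n_components):
    """PCA of an N x T (voxels x time) array, centering along time (axis=1)."""
    demeaned = data - data.mean(axis=1, keepdims=True)
    u, s, vt = np.linalg.svd(demeaned, full_matrices=False)
    # u columns: spatial maps; s: singular values; vt rows: component time series
    return u[:, :n_components], s[:n_components], vt[:n_components]

rng = np.random.default_rng(0)
data = rng.standard_normal((1000, 50))  # hypothetical voxels x time array
maps, svals, timeseries = pca_over_time(data, n_components=5)
```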
This normalization could be one of the reasons the ICA decomposition is inconsistent (#629). A possible explanation is that the PCA step normalizes across voxels, which distorts the time series and makes ICA's job harder.
I did a quick test on the three-echo test data, and the variability across Python versions, CPUs, and single/multi-threading vanished after swapping sklearn's PCA with my own definition. [log]