The current split-half resampling implementation, which is identical to the implementation in the MATLAB PLS toolbox, could use some tweaking. For background on the math and rationale behind the current code, see the original paper by Kovacevic et al. (2013).
The intended goal of split-half resampling is to provide a metric of reliability. That is, it aims to offer an assessment of how much the observed effects (i.e., latent variables) are supported by the data, regardless of the samples (i.e., subjects) that are driving those decompositions. In a way, this aim could perhaps be better achieved by cross-validation, as described in #21.
If cross-validation is implemented, then we could eliminate split-half resampling altogether. However, another option would be to drop split-half resampling from the permutation testing only, and instead assess split-half reliability on the original (non-permuted) data.
Doing split-half resampling on the original data would result in a distribution of correlations for each left/right singular vector (U and V). Rather than returning a non-parametric p-value, as is done with the current split-half resampling + permutation paradigm, we could generate some basic metrics for interpreting the distribution (e.g., confidence intervals, central tendency, skewness). These metrics could be reported with the standard PLSResults. Notably, the proposed regime would be significantly less computationally expensive: with n_split = 100 and n_perm = 1000, the current paradigm runs n_split x (n_perm + 1) = 100,100 split-half evaluations and n_perm + 1 = 1,001 SVDs, whereas the proposed one runs only n_split = 100 split-half evaluations and a single SVD (see below for step-by-step).
The proposal (with math!)
In what follows, n_split = 100 and n_perm = 1000, and X and Y are input data matrices of shape (N x M1) and (N x M2), respectively.
Current split-half resampling paradigm
1. Generate the cross-covariance matrix, D = Y.T @ X (shape M2 x M1), and perform SVD on its transpose, D.T = U @ S @ V.T, so that U has M1 rows and V has M2 rows;
2. Randomly split the samples (rows of X and Y) into two halves, compute the cross-covariance matrix of each half, D1 and D2, and project them onto the original left/right singular vectors: U1 = D1.T @ V, U2 = D2.T @ V, V1 = D1 @ U, V2 = D2 @ U;
3. Compute the Pearson correlation of the projected singular vectors: U_corr = corr(U1, U2) and V_corr = corr(V1, V2), where U_corr and V_corr are vectors of correlations for each singular vector separately;
4. Repeat steps 2-3 n_split times and take the average of U_corr and V_corr across all splits: U_corr_mean = mean(U_corr) and V_corr_mean = mean(V_corr);
5. Permute the rows of Y randomly and repeat steps 1-4 n_perm times;
6. Count how many times U_corr_mean and V_corr_mean from the permuted decompositions (step 5) are higher than the original values, and divide by the number of permutations (n_perm = 1000) to generate a p-value to report.
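For reference, the steps of the current paradigm can be sketched roughly as follows. This is a minimal NumPy illustration, not the actual toolbox code: all function names (xcov, colwise_corr, split_half_corrs, splithalf_permutation_pvals) are hypothetical, and it assumes that the data are column-centered before computing D, that the split is over samples (rows of X and Y), and that the SVD is taken of D.T so that the projections have conforming shapes.

```python
import numpy as np


def xcov(X, Y):
    """Cross-covariance D = Y.T @ X, shape (M2, M1). Centering is an assumption."""
    return (Y - Y.mean(axis=0)).T @ (X - X.mean(axis=0))


def decompose(X, Y):
    """SVD of D.T (assumption), giving U with M1 rows and V with M2 rows."""
    U, _, Vt = np.linalg.svd(xcov(X, Y).T, full_matrices=False)
    return U, Vt.T


def colwise_corr(A, B):
    """Pearson correlation between matching columns of A and B."""
    A = A - A.mean(axis=0)
    B = B - B.mean(axis=0)
    norms = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=0)
    return (A * B).sum(axis=0) / norms


def split_half_corrs(X, Y, U, V, n_split, rng):
    """Steps 2-3, repeated n_split times; returns two (n_split, k) arrays."""
    n = len(X)
    u_corr = np.empty((n_split, U.shape[1]))
    v_corr = np.empty((n_split, V.shape[1]))
    for i in range(n_split):
        idx = rng.permutation(n)
        h1, h2 = idx[: n // 2], idx[n // 2:]
        D1, D2 = xcov(X[h1], Y[h1]), xcov(X[h2], Y[h2])
        u_corr[i] = colwise_corr(D1.T @ V, D2.T @ V)  # U1 vs. U2
        v_corr[i] = colwise_corr(D1 @ U, D2 @ U)      # V1 vs. V2
    return u_corr, v_corr


def splithalf_permutation_pvals(X, Y, n_split=100, n_perm=1000, seed=None):
    """Steps 1-6: p-values for the mean split-half correlations of U and V."""
    rng = np.random.default_rng(seed)
    U, V = decompose(X, Y)                                  # step 1
    u, v = split_half_corrs(X, Y, U, V, n_split, rng)       # steps 2-3
    u_mean, v_mean = u.mean(axis=0), v.mean(axis=0)         # step 4
    u_count = np.zeros_like(u_mean)
    v_count = np.zeros_like(v_mean)
    for _ in range(n_perm):                                 # step 5
        Yp = Y[rng.permutation(len(Y))]
        Up, Vp = decompose(X, Yp)
        up, vp = split_half_corrs(X, Yp, Up, Vp, n_split, rng)
        u_count += up.mean(axis=0) > u_mean
        v_count += vp.mean(axis=0) > v_mean
    return u_count / n_perm, v_count / n_perm               # step 6
```

The nested loop is where the cost comes from: every permutation repeats the full set of n_split splits, so it is worth trying small values of n_split and n_perm first.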
Proposed split-half resampling paradigm
1. Generate the cross-covariance matrix, D = Y.T @ X (shape M2 x M1), and perform SVD on its transpose, D.T = U @ S @ V.T, so that U has M1 rows and V has M2 rows;
2. Randomly split the samples (rows of X and Y) into two halves, compute the cross-covariance matrix of each half, D1 and D2, and project them onto the original left/right singular vectors: U1 = D1.T @ V, U2 = D2.T @ V, V1 = D1 @ U, V2 = D2 @ U;
3. Compute the Pearson correlation of the projected singular vectors: U_corr = corr(U1, U2) and V_corr = corr(V1, V2), where U_corr and V_corr are vectors of correlations for each singular vector separately;
4. Repeat steps 2-3 n_split times to generate a distribution of correlations for each singular vector;
5. Compute various metrics on the distributions (e.g., 95th-percentile values, central tendency, skewness) and report them.
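A rough sketch of what the proposed paradigm might look like in NumPy is given below. The function names (splithalf_distribution, summarize) and the returned dictionary keys are hypothetical and not part of any existing API. As assumptions, the data are column-centered before computing D, the split is over samples (rows of X and Y), and the SVD is taken of D.T so that the projections have conforming shapes. Skewness is computed as the third standardized moment; other definitions would work just as well.

```python
import numpy as np


def _xcov(X, Y):
    """Cross-covariance D = Y.T @ X, shape (M2, M1). Centering is an assumption."""
    return (Y - Y.mean(axis=0)).T @ (X - X.mean(axis=0))


def _colwise_corr(A, B):
    """Pearson correlation between matching columns of A and B."""
    A = A - A.mean(axis=0)
    B = B - B.mean(axis=0)
    norms = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=0)
    return (A * B).sum(axis=0) / norms


def splithalf_distribution(X, Y, n_split=100, seed=None):
    """Steps 1-4: (n_split, k) arrays of split-half correlations for U and V."""
    rng = np.random.default_rng(seed)
    # Step 1: SVD of D.T (assumption), so U has M1 rows and V has M2 rows
    U, _, Vt = np.linalg.svd(_xcov(X, Y).T, full_matrices=False)
    V = Vt.T
    n = len(X)
    u_corr = np.empty((n_split, U.shape[1]))
    v_corr = np.empty((n_split, V.shape[1]))
    for i in range(n_split):
        # Step 2: split the samples and project each half's D onto V and U
        idx = rng.permutation(n)
        h1, h2 = idx[: n // 2], idx[n // 2:]
        D1, D2 = _xcov(X[h1], Y[h1]), _xcov(X[h2], Y[h2])
        # Step 3: correlate the projections, one value per singular vector
        u_corr[i] = _colwise_corr(D1.T @ V, D2.T @ V)
        v_corr[i] = _colwise_corr(D1 @ U, D2 @ U)
    return u_corr, v_corr


def summarize(corrs, ci=95):
    """Step 5: per-singular-vector summary of a (n_split, k) distribution."""
    tail = (100 - ci) / 2
    lower, upper = np.percentile(corrs, [tail, 100 - tail], axis=0)
    mean = corrs.mean(axis=0)
    # Third standardized moment, using the population standard deviation
    skew = ((corrs - mean) ** 3).mean(axis=0) / corrs.std(axis=0) ** 3
    return {
        "mean": mean,
        "median": np.median(corrs, axis=0),
        "ci_lower": lower,
        "ci_upper": upper,
        "skewness": skew,
    }
```

Calling summarize on each of the two arrays returned by splithalf_distribution(X, Y) gives one set of metrics per singular vector, which is the kind of output that could be attached to the results object.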