Change split-half resampling mechanism #24

rmarkello · 2018-04-04T15:11:17Z

The current split-half resampling implementation, which is identical to the one in the Matlab PLS toolbox, could use some tweaking. For background on the math and rationale behind the current code, see the original paper by Kovacevic et al. (2013).

The intended goal of split-half resampling is to provide a metric of reliability. That is, it aims to offer an assessment of how much the observed effects (i.e., latent variables) are supported by the data, regardless of the samples (i.e., subjects) that are driving those decompositions. In a way, this aim could perhaps be better achieved by cross-validation, as described in #21.

If cross-validation is implemented, then we could eliminate split-half resampling altogether. However, another option would be to drop split-half resampling only from the permutation testing, and instead assess split-half reliability on the original (non-permuted) data.

Doing split-half resampling on the original data would result in a distribution of correlations for each left/right singular vector (U and V). Rather than returning a non-parametric p-value, as is done with the current split-half resampling + permutation paradigm, we could generate some basic metrics for interpreting the distribution (e.g., confidence intervals, central tendency, skewness). These metrics could be reported with the standard PLSResults. Notably, the proposed regime would be significantly less computationally expensive (see the step-by-step comparison below).

The proposal (with math!)

Assume n_split = 100 and n_perm = 1000, and that X and Y are input data matrices of shape (N x M1) and (N x M2), respectively.

Current split-half resampling paradigm

1. Generate the cross-covariance matrix, D = Y.T @ X, and perform SVD on it: D = U @ S @ V.T;
2. Randomly split D into two halves (row-wise), D1 and D2, and project them onto the original left/right singular vectors: U1 = D1.T @ V, U2 = D2.T @ V, V1 = D1 @ U, V2 = D2 @ U;
3. Compute the Pearson correlation of the projected singular vectors: U_corr = corr(U1, U2) and V_corr = corr(V1, V2), where U_corr and V_corr are vectors of correlations for each singular vector separately;
4. Repeat steps 2-3 n_split times and take the average of U_corr and V_corr across all splits: U_corr_mean = mean(U_corr) and V_corr_mean = mean(V_corr);
5. Permute Y randomly and repeat steps 1-4 n_perm times;
6. Count how many times U_corr_mean and V_corr_mean from the permuted decompositions (step 5) are higher than the original values, and divide by the number of permutations (n_perm = 1000) to generate a p-value to report.
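The steps above can be sketched in NumPy roughly as follows. This is a minimal illustration, not the toolbox or package code. It makes two assumptions that go beyond the description: the halves are formed by splitting subjects (rows of X and Y) and recomputing the cross-covariance within each half, and the transposes in the projections are arranged so that the shapes conform to D = Y.T @ X, which means they differ from the formulas as written in step 2. All function names are hypothetical.

```python
import numpy as np


def column_corr(A, B):
    """Pearson correlation between matching columns of A and B."""
    return np.array([np.corrcoef(a, b)[0, 1] for a, b in zip(A.T, B.T)])


def mean_split_half_corr(X, Y, n_split, rng):
    """Steps 1-4: mean split-half correlation for each singular vector."""
    n = X.shape[0]
    D = Y.T @ X  # cross-covariance, shape (M2 x M1)
    U, _, Vt = np.linalg.svd(D, full_matrices=False)
    V = Vt.T
    u_corr, v_corr = [], []
    for _ in range(n_split):
        # ASSUMPTION: split subjects into two random halves
        order = rng.permutation(n)
        h1, h2 = order[: n // 2], order[n // 2:]
        D1, D2 = Y[h1].T @ X[h1], Y[h2].T @ X[h2]
        # ASSUMPTION: transposes chosen so shapes match D = Y.T @ X
        u_corr.append(column_corr(D1 @ V, D2 @ V))
        v_corr.append(column_corr(D1.T @ U, D2.T @ U))
    return np.mean(u_corr, axis=0), np.mean(v_corr, axis=0)


def split_half_pvalues(X, Y, n_split=100, n_perm=1000, seed=None):
    """Steps 5-6: permute Y and compare against the original means."""
    rng = np.random.default_rng(seed)
    u_orig, v_orig = mean_split_half_corr(X, Y, n_split, rng)
    u_count = np.zeros_like(u_orig)
    v_count = np.zeros_like(v_orig)
    for _ in range(n_perm):
        Y_perm = Y[rng.permutation(len(Y))]
        u_perm, v_perm = mean_split_half_corr(X, Y_perm, n_split, rng)
        u_count += u_perm > u_orig
        v_count += v_perm > v_orig
    return u_count / n_perm, v_count / n_perm


# Small synthetic example (tiny n_split / n_perm to keep it fast)
demo_rng = np.random.default_rng(0)
X = demo_rng.standard_normal((20, 6))
Y = X[:, :4] + demo_rng.standard_normal((20, 4))
X, Y = X - X.mean(axis=0), Y - Y.mean(axis=0)  # center columns
u_p, v_p = split_half_pvalues(X, Y, n_split=5, n_perm=20, seed=1)
```

Note that every permutation repeats the full SVD and all n_split splits, which is where most of the cost of this paradigm comes from.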

Proposed split-half resampling paradigm

1. Generate the cross-covariance matrix, D = Y.T @ X, and perform SVD on it: D = U @ S @ V.T;
2. Randomly split D into two halves (row-wise), D1 and D2, and project them onto the original left/right singular vectors: U1 = D1.T @ V, U2 = D2.T @ V, V1 = D1 @ U, V2 = D2 @ U;
3. Compute the Pearson correlation of the projected singular vectors: U_corr = corr(U1, U2) and V_corr = corr(V1, V2), where U_corr and V_corr are vectors of correlations for each singular vector separately;
4. Repeat steps 2-3 n_split times to generate a distribution of correlations for each singular vector;
5. Compute various metrics on the distributions (e.g., 95%ile values, central tendency, skewness) and report.
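A rough NumPy sketch of the proposed paradigm is shown here to make the output concrete. It is an illustration under stated assumptions, not a proposed implementation: the halves are formed by splitting subjects (rows of X and Y), the projection transposes are arranged so shapes conform to D = Y.T @ X (so they differ from the formulas as written in step 2), "95%ile values" is read as the 2.5th and 97.5th percentiles, and the function names and the keys of the summary dictionary are hypothetical.

```python
import numpy as np


def split_half_distribution(X, Y, n_split=100, seed=None):
    """Steps 1-4: split-half correlations, one row per split."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    D = Y.T @ X  # cross-covariance, shape (M2 x M1)
    U, _, Vt = np.linalg.svd(D, full_matrices=False)
    V = Vt.T
    k = U.shape[1]
    u_corr = np.empty((n_split, k))
    v_corr = np.empty((n_split, k))
    for i in range(n_split):
        # ASSUMPTION: split subjects into two random halves
        order = rng.permutation(n)
        h1, h2 = order[: n // 2], order[n // 2:]
        D1, D2 = Y[h1].T @ X[h1], Y[h2].T @ X[h2]
        # ASSUMPTION: transposes chosen so shapes match D = Y.T @ X
        U1, U2 = D1 @ V, D2 @ V
        V1, V2 = D1.T @ U, D2.T @ U
        for j in range(k):
            u_corr[i, j] = np.corrcoef(U1[:, j], U2[:, j])[0, 1]
            v_corr[i, j] = np.corrcoef(V1[:, j], V2[:, j])[0, 1]
    return u_corr, v_corr


def summarize(dist):
    """Step 5: per-singular-vector summary of the split distribution."""
    mean = dist.mean(axis=0)
    std = dist.std(axis=0)
    ci_low, ci_high = np.percentile(dist, [2.5, 97.5], axis=0)
    skew = (((dist - mean) / std) ** 3).mean(axis=0)  # sample skewness
    return {
        "mean": mean,
        "median": np.median(dist, axis=0),
        "ci_low": ci_low,
        "ci_high": ci_high,
        "skew": skew,
    }


# Small synthetic example
demo_rng = np.random.default_rng(0)
X = demo_rng.standard_normal((30, 8))
Y = X[:, :5] + demo_rng.standard_normal((30, 5))
X, Y = X - X.mean(axis=0), Y - Y.mean(axis=0)  # center columns
u_dist, v_dist = split_half_distribution(X, Y, n_split=100, seed=1)
u_summary = summarize(u_dist)
v_summary = summarize(v_dist)
```

With n_split = 100 and n_perm = 1000, the current paradigm runs steps 1-4 once on the original data and once per permutation, i.e. 1,001 SVDs and 100,100 split evaluations, whereas the proposed paradigm needs 1 SVD and 100 split evaluations.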


rmarkello added the enhancement label Apr 4, 2018

rmarkello added the refactor label and removed the enhancement label Sep 26, 2019

JohannesWiesner mentioned this issue Apr 7, 2022

What exactly is the cv argument doing? #60

Open
