Change split-half resampling mechanism #24

Open
rmarkello opened this issue Apr 4, 2018 · 0 comments
Labels
refactor Not an enhancement, but not a bug

Comments

@rmarkello
Member

rmarkello commented Apr 4, 2018

The current split-half resampling implementation, which is identical to the implementation in the Matlab PLS toolbox, could use some tweaking. For background on the math and rationale behind the current code, check the original paper by Kovacevic et al. (2013).

The intended goal of split-half resampling is to provide a metric of reliability. That is, it aims to offer an assessment of how much the observed effects (i.e., latent variables) are supported by the data, regardless of the samples (i.e., subjects) that are driving those decompositions. In a way, this aim could perhaps be better achieved by cross-validation, as described in #21.

If cross-validation is implemented, then we could eliminate split-half resampling altogether. However, another option would be to drop split-half resampling only from the permutation testing, and instead use it to assess reliability on the original (non-permuted) data.

Doing split-half resampling on the original data would result in a distribution of correlations for each left and right singular vector (the columns of U and V). Rather than returning a non-parametric p-value, as is done with the current split-half resampling + permutation paradigm, we could generate some basic metrics for interpreting each distribution (e.g., confidence intervals, central tendency, skewness). These metrics could be reported with the standard PLSResults. Notably, the proposed regime would be significantly less computationally expensive, since it skips the permutation loop entirely (see the step-by-step comparison below).
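To put a rough number on the cost difference, here is a back-of-the-envelope count of how many split-half evaluations and SVDs each paradigm needs. It assumes the defaults used in this issue (n_split = 100, n_perm = 1000) and that the current paradigm runs the splits once for the original data plus once per permutation; it ignores the cost of each individual split.

```python
n_split, n_perm = 100, 1000

# Current paradigm: n_split splits for the original data, plus n_split
# more for every one of the n_perm permuted decompositions.
current_splits = (n_perm + 1) * n_split
current_svds = n_perm + 1

# Proposed paradigm: n_split splits of the original data only.
proposed_splits = n_split
proposed_svds = 1

print(current_splits, proposed_splits, current_splits // proposed_splits)
# prints: 100100 100 1001
```

Under these assumptions the proposed regime does roughly n_perm + 1 times less work.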

The proposal (with math!)

Throughout, n_split = 100 and n_perm = 1000, and X and Y are input data matrices of shape (N x M1) and (N x M2), respectively.

Current split-half resampling paradigm

  1. Generate the cross-covariance matrix, D = Y.T @ X, and perform SVD on it: D = U @ S @ V.T;
  2. Randomly split the samples (rows of X and Y) into two halves, compute the cross-covariance matrix of each half, D1 and D2, and project them onto the original singular vectors: U1 = D1 @ V, U2 = D2 @ V, V1 = D1.T @ U, V2 = D2.T @ U;
  3. Compute the Pearson correlation of the projected singular vectors: U_corr = corr(U1, U2) and V_corr = corr(V1, V2), where U_corr and V_corr are vectors holding one correlation per singular vector;
  4. Repeat steps 2-3 n_split times and average U_corr and V_corr across all splits: U_corr_mean = mean(U_corr) and V_corr_mean = mean(V_corr);
  5. Permute Y randomly and repeat steps 1-4, n_perm times in total;
  6. Count how many times U_corr_mean and V_corr_mean from the permuted decompositions (step 5) are higher than the original values, and divide by the number of permutations (n_perm = 1000) to generate a p-value to report.
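The steps above can be sketched in NumPy as follows. This is an illustration, not the package's actual code: the function names (`column_corr`, `split_half_corrs`, `split_half_pvalues`) are hypothetical, X and Y are assumed to be already centered so that Y.T @ X is a cross-covariance, the split is assumed to be taken over the N samples, and the projections are written as D1 @ V and D1.T @ U so that the shapes agree with D = U @ S @ V.T.

```python
import numpy as np


def column_corr(a, b):
    """Pearson correlation between matching columns of `a` and `b`."""
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    return (a * b).sum(axis=0) / np.sqrt((a ** 2).sum(axis=0) * (b ** 2).sum(axis=0))


def split_half_corrs(X, Y, n_split, rng):
    """Steps 1-3, repeated n_split times; returns two (n_split, k) arrays."""
    D = Y.T @ X                                       # step 1: (M2, M1)
    U, s, Vt = np.linalg.svd(D, full_matrices=False)  # D = U @ diag(s) @ Vt
    V = Vt.T
    n = len(X)
    u_corr = np.empty((n_split, len(s)))
    v_corr = np.empty((n_split, len(s)))
    for i in range(n_split):
        idx = rng.permutation(n)                      # step 2: random halves
        h1, h2 = idx[: n // 2], idx[n // 2:]
        D1, D2 = Y[h1].T @ X[h1], Y[h2].T @ X[h2]
        u_corr[i] = column_corr(D1 @ V, D2 @ V)       # step 3
        v_corr[i] = column_corr(D1.T @ U, D2.T @ U)
    return u_corr, v_corr


def split_half_pvalues(X, Y, n_split=100, n_perm=1000, seed=None):
    """Steps 4-6: compare mean correlations against permuted decompositions."""
    rng = np.random.default_rng(seed)
    u_orig, v_orig = (c.mean(axis=0) for c in split_half_corrs(X, Y, n_split, rng))
    u_count = np.zeros_like(u_orig)
    v_count = np.zeros_like(v_orig)
    for _ in range(n_perm):
        Y_perm = Y[rng.permutation(len(Y))]           # step 5: shuffle rows of Y
        u_perm, v_perm = (
            c.mean(axis=0) for c in split_half_corrs(X, Y_perm, n_split, rng)
        )
        u_count += u_perm > u_orig                    # step 6: count exceedances
        v_count += v_perm > v_orig
    return u_count / n_perm, v_count / n_perm
```

Note that the inner loop runs n_split times for every permutation, which is where nearly all of the runtime goes.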

Proposed split-half resampling paradigm

  1. Generate the cross-covariance matrix, D = Y.T @ X, and perform SVD on it: D = U @ S @ V.T;
  2. Randomly split the samples (rows of X and Y) into two halves, compute the cross-covariance matrix of each half, D1 and D2, and project them onto the original singular vectors: U1 = D1 @ V, U2 = D2 @ V, V1 = D1.T @ U, V2 = D2.T @ U;
  3. Compute the Pearson correlation of the projected singular vectors: U_corr = corr(U1, U2) and V_corr = corr(V1, V2), where U_corr and V_corr are vectors holding one correlation per singular vector;
  4. Repeat steps 2-3 n_split times to generate a distribution of correlations for each singular vector;
  5. Compute various metrics on the distributions (e.g., 95th-percentile values, central tendency, skewness) and report them.
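A minimal sketch of the proposed paradigm, under the same assumptions as the steps above read most naturally: X and Y are already centered, the split is taken over the N samples, and the projections are written as D1 @ V and D1.T @ U so the shapes agree with D = U @ S @ V.T. The names `split_half_distribution` and `summarize` are hypothetical, and the particular summary (mean, median, a 2.5-97.5 percentile interval, and population skewness) is just one possible reading of "various metrics".

```python
import numpy as np


def column_corr(a, b):
    """Pearson correlation between matching columns of `a` and `b`."""
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    return (a * b).sum(axis=0) / np.sqrt((a ** 2).sum(axis=0) * (b ** 2).sum(axis=0))


def split_half_distribution(X, Y, n_split=100, seed=None):
    """Steps 1-4: (n_split, k) arrays of correlations for U and V."""
    rng = np.random.default_rng(seed)
    D = Y.T @ X                                       # step 1: (M2, M1)
    U, s, Vt = np.linalg.svd(D, full_matrices=False)  # D = U @ diag(s) @ Vt
    V = Vt.T
    n = len(X)
    u_corr = np.empty((n_split, len(s)))
    v_corr = np.empty((n_split, len(s)))
    for i in range(n_split):
        idx = rng.permutation(n)                      # step 2: random halves
        h1, h2 = idx[: n // 2], idx[n // 2:]
        D1, D2 = Y[h1].T @ X[h1], Y[h2].T @ X[h2]
        u_corr[i] = column_corr(D1 @ V, D2 @ V)       # step 3
        v_corr[i] = column_corr(D1.T @ U, D2.T @ U)
    return u_corr, v_corr


def summarize(corrs):
    """Step 5: per-singular-vector summary of an (n_split, k) array."""
    centered = corrs - corrs.mean(axis=0)
    std = corrs.std(axis=0)
    return {
        "mean": corrs.mean(axis=0),
        "median": np.median(corrs, axis=0),
        "ci95": np.percentile(corrs, [2.5, 97.5], axis=0),   # shape (2, k)
        "skewness": (centered ** 3).mean(axis=0) / std ** 3,
    }
```

Calling `summarize` on each array returned by `split_half_distribution(X, Y)` gives one set of summary values per singular vector, with no permutation loop involved.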
@rmarkello rmarkello added the enhancement New feature or request label Apr 4, 2018
@rmarkello rmarkello added refactor Not an enhancement, but not a bug and removed enhancement New feature or request labels Sep 26, 2019