
Support veRSA #92

Open · wants to merge 9 commits into base: development
Conversation

SergeantChris
Collaborator

@SergeantChris SergeantChris commented Jan 28, 2025

The goal of this PR is to support computing RSA on top of voxelwise encoding, implementing the method sometimes referred to as veRSA. While implementing it, I ended up making some other changes as well, to support use cases I considered important and to avoid recomputing transforms.

Altogether, the changes are outlined as follows:

  1. veRSA support: A parameter is added to the encoding functions to optionally create RDMs and compute RSA between predicted and real voxels. For now only the most basic RSA is supported, with Pearson correlation for the RDMs and Spearman correlation for the RSA comparison; this can be extended in the future. When veRSA is enabled, the regression returns a single value instead of a list of coefficients, as the result is no longer in voxel space. Consequently, the return_correlations option is not supported with veRSA.
  2. The existing code recomputed the PCA transform on the model layer features for every ROI, even though the ROIs do not influence this computation (the same random seeds are used for the folds of each ROI, e.g. 42, 43, and 44 by default for 3 folds). Since the PCA is probably the biggest overhead in this evaluation, and the number of ROIs can easily be in the range of 20-40, this was a major bottleneck. In this PR, the ROI loop is moved inside the encoding_metric function, as the innermost loop. Since the fold splitting is likewise not influenced by the subject, the way I see multi-subject ROIs being handled is to give them unique file names, e.g. V1_subj1.npy, and process them as separate ROIs.
  3. In the encode_layer function, activations for all samples were stacked in memory, and for some models with very large feature vectors this no longer fits (even with 64G of RAM). A mem_mode option is therefore added so the user can choose to either stack the features or transform them one by one (they are still stacked up to batch_size to fit the PCA). Also, the last batch was not processed correctly when the number of samples was not exactly divisible by batch_size; this is fixed here.
  4. There is a use case where the user might want to train the cross-validated regressions and then use them to predict voxels in an unseen test set. In this case the regression models of all folds must be saved, as well as the PCA transforms.
  5. (Minor) A shuffle argument is added and integer input is supported in train_test_split; these support training and evaluating on an exact (known) train-test split.
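For readers unfamiliar with veRSA (item 1), the core computation can be sketched as follows. This is a minimal illustration of "Pearson for RDMs, Spearman for RSA", not the PR's actual function names or signatures:

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def versa_score(pred_voxels, real_voxels):
    """Compare predicted and measured voxel patterns via RSA.

    RDMs are built as 1 - Pearson correlation between condition
    patterns; the two RDMs are then compared with Spearman correlation.
    Hypothetical helper for illustration only.
    """
    # condition-by-condition dissimilarities (condensed upper triangles)
    rdm_pred = pdist(pred_voxels, metric="correlation")
    rdm_real = pdist(real_voxels, metric="correlation")
    rho, _ = spearmanr(rdm_pred, rdm_real)
    return rho

rng = np.random.default_rng(42)
real = rng.normal(size=(20, 100))          # 20 conditions x 100 voxels
pred = real + 0.1 * rng.normal(size=real.shape)
print(versa_score(pred, real))             # high for good predictions
```

Note that the result is a single scalar per layer/fold, which matches why return_correlations (a per-voxel quantity) cannot be supported in this mode.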
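The last-batch bug and the mem_mode idea from item 3 can be sketched together. The helper name and the mode names are assumptions for illustration; the point is that the final partial batch must be included, and that transforming batch by batch bounds peak memory:

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA

def iter_batches(n_samples, batch_size):
    """Yield (start, end) pairs covering every sample, including the
    final partial batch when batch_size does not divide n_samples
    evenly (the case that was previously dropped)."""
    for start in range(0, n_samples, batch_size):
        yield start, min(start + batch_size, n_samples)

rng = np.random.default_rng(0)
X = rng.normal(size=(70, 50))              # 70 samples x 50 features
pca = IncrementalPCA(n_components=8).fit(X)

# "saver"-style mode: transform one batch at a time instead of
# stacking the full transformed matrix in memory at once.
parts = [pca.transform(X[s:e]) for s, e in iter_batches(len(X), 32)]
X_low = np.vstack(parts)
print(list(iter_batches(10, 4)))           # [(0, 4), (4, 8), (8, 10)]
print(X_low.shape)                         # (70, 8)
```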

Finally, although I implemented these for both the Linear and Ridge regression functions, I haven't tested the Ridge one, and I am a bit confused about what takes place in encode_layer_ridge. Is this code complete?

Also, TODO: Update user-guide notebooks with new parameters.
and TODO: Re-target this PR on development after the small_fixes PR is merged.

@ToastyDom
Collaborator

Hey, thank you for the PR!

I love the veRSA feature addition - it's a really valuable contribution! The improved computational speed of the encoding functionality is also fantastic. I've tested the code and it works well!

Regarding the encode_layer_ridge function - good point. This is actually a leftover from the linear encoding function, which uses PCA for dimensionality reduction. The ridge regression version simply splits and flattens the activation without PCA. The docstring incorrectly states it's using Incremental PCA. I'll fix this in another push to dev and add averaging across features to make it equivalent to linear encoding.
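To make the flatten-versus-average distinction concrete, here is one plausible reading of the two operations; the activation shape and the averaging axis are my assumptions, not necessarily Net2Brain's exact semantics:

```python
import numpy as np

# Suppose one sample's layer activation is (channels, h, w).
act = np.random.rand(64, 7, 7)

# Ridge path today: flatten to one long vector per sample (no PCA).
flat = act.reshape(-1)                       # shape (3136,)

# Averaging across features: collapse the spatial dims to get a
# much smaller summary vector per sample.
avg = act.reshape(64, -1).mean(axis=1)       # shape (64,)

print(flat.shape, avg.shape)
```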

While replicating some notebooks to test the function, I noticed it only runs when save_model and save_pca are set to true. Otherwise, I get this error:

```
TypeError: stat: path should be string, bytes, os.PathLike or integer, not NoneType
```

which probably comes from

```python
pca_trn, pca_tst = encode_layer(trn_Idx, tst_Idx, feat_path, layer_id, avg_across_feat,
                                batch_size, n_components, mem_mode=mem_mode, save_pca=save_pca,
                                save_path=f'{prediction_save_path}/pca.pkl' if save_pca else None)
```

This is probably not planned, right?

As always great contribution, thank you so much!!


Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@SergeantChris
Collaborator Author

SergeantChris commented Feb 4, 2025

Again thanks for the review!
Good catch with the save_model / save_pca False cases, I guess it escaped my attention - I fixed it now.
Also, I updated the notebook documentation (you can check it on the ReviewNB link above).

@SergeantChris SergeantChris changed the base branch from small_fixes to development February 4, 2025 15:17
@ToastyDom
Collaborator

It looks great, thank you so much Christina!

I honestly think it's my inability, but I still struggle with the save_model / save_pca False cases. I am executing the Cognition Academy Dresden Notebook 2.ipynb, which I usually use to test Linear Encoding (because it's quick and has some data), and here the function still only runs when the parameters are set to true. See the error below:

```
TypeError: stat: path should be string, bytes, os.PathLike or integer, not NoneType
```

```python
# Relevant traceback:
File ~/Documents/Repositories/Net2Brain/net2brain/evaluations/encoding.py:104, in encode_layer
    if mem_mode == 'saver' or os.path.exists(save_path):
```
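For context, the crash happens because os.path.exists raises TypeError on None. A None-safe check would short-circuit first; this is a sketch of the likely fix, not necessarily the code that lands in the PR:

```python
import os

save_path = None              # what the caller passes when save_pca=False
mem_mode = "performance"

# Original check raises TypeError when save_path is None:
#   if mem_mode == 'saver' or os.path.exists(save_path):
# Checking for None before touching the filesystem avoids it:
if mem_mode == "saver" or (save_path is not None and os.path.exists(save_path)):
    print("stream batches / reuse saved PCA")
else:
    print("fit and apply PCA in memory")
```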

The code is from this section:

```python
# Start Linear Encoding
from net2brain.evaluations.encoding import Linear_Encoding, Ridge_Encoding

# Function to run linear encoding
def run_linear_encoding(models, subjects, hemispheres, rois, n_folds, n_components, batch_size):
    if n_components >= batch_size:
        print("n_components must be smaller than batch_size")
        return

    for subject in subjects:
        for hemisphere in hemispheres:
            for roi in rois:
                roi_file = f'{roi}_{hemisphere}_subj{subject}.npy'
                roi_data_path = os.path.join(roi_data, roi_file)

                config = f"{n_folds}f_{n_components}c_{batch_size}b"

                for model in models:
                    model_name = model + '_feats'

                    # Start Net2Brains Linear Encoding
                    print(os.path.join(current_directory, model_name))
                    print(roi_data_path)

                    Linear_Encoding(
                        feat_path=os.path.join(current_directory, model_name),
                        roi_path=roi_data_path,
                        model_name=f"{model_name}_{config}",
                        trn_tst_split=0.8,
                        n_folds=n_folds,
                        n_components=n_components,
                        batch_size=batch_size,
                        random_state=42,
                        return_correlations=True,
                        save_path=f"Tutorial_LE_Results_Harry/subj{subject}",
                        file_name=f"{model_name}_{roi}_{hemisphere}_{config}",
                        avg_across_feat=True
                    )

                    print("")
                    print(f"Finished running Linear Encoding for subject={subject}, hemisphere={hemisphere}, roi={roi}, model={model_name}")


# Create and display widgets for linear encoding
print(current_directory)
create_linear_encoding_widgets()
```

Am I missing something?

Thanks again for everything!!

@SergeantChris
Collaborator Author

You're right, I missed one case. Sorry about that! Can you check if the notebook runs now?
