
[ENH] Adjusted SSD algorithm to support non-full rank data #11458

Merged
merged 11 commits into mne-tools:main on Feb 13, 2023

Conversation

tsbinns
Contributor

@tsbinns tsbinns commented Feb 7, 2023

Reference issue

Spatio-spectral decomposition (SSD) implemented in mne.decoding.SSD can fail to fit data that is not full rank, even when the rank argument is used to specify an appropriate number of components for reducing the dimensionality of the data. This is because the dimensionality reduction performed when computing covariance produces matrices which are not positive definite, causing the subsequent eigenvalue decomposition to fail with a LinAlgError:
[screenshot: LinAlgError traceback raised when fitting SSD to non-full rank data]
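The failure can be reproduced outside of MNE in a few lines (a hypothetical sketch, not MNE's code: a flat channel makes the covariance singular, and the generalized eigendecomposition then rejects it as not positive definite):

```python
import numpy as np
from scipy.linalg import eigh

# Hypothetical sketch: 5-channel data with one flat (all-zero) channel,
# so the covariance matrices have rank 4 rather than 5
rng = np.random.default_rng(0)
data = np.vstack([rng.standard_normal((4, 1000)), np.zeros((1, 1000))])
cov_signal = data @ data.T / data.shape[1]
cov_noise = cov_signal.copy()

try:
    # the generalized eigendecomposition needs cov_noise positive definite
    eigh(cov_signal, cov_noise)
    raised = False
except np.linalg.LinAlgError:
    raised = True
print(raised)  # True: the singular covariance raises LinAlgError
```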

What does this implement/fix?

This change adds a new private function, mne.decoding.ssd._dimensionality_reduction() (called by the fit() method of mne.decoding.SSD), which performs the dimensionality reduction after the covariance matrices have been computed, allowing SSD to fit both full and non-full rank data. The rank argument of SSD keeps the same behaviour as before; however, it is now used in _dimensionality_reduction() rather than being passed when computing covariance.

The updated algorithm is based on the one implemented and used by the original authors of the method (see e.g. https://github.com/svendaehne/matlab_SSD/blob/master/ssd.m). Compared to the current algorithm, the updated approach is also closer to that employed in the literature for related dimensionality reduction and reconstruction problems (see e.g. Ewald et al., 2012, NeuroImage).
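For illustration, the kind of rank-subspace projection described above can be sketched as follows (my own minimal sketch, not the PR's code; the name reduce_to_rank_subspace is hypothetical, and I assume here that the projection is built from the eigendecomposition of the summed covariances):

```python
import numpy as np
from scipy.linalg import eigh

def reduce_to_rank_subspace(cov_signal, cov_noise, rank):
    """Project both covariances onto a rank-dimensional whitened subspace."""
    # eigendecompose the summed covariance (an assumption of this sketch)
    eigvals, eigvects = eigh(cov_signal + cov_noise)
    # eigh returns eigenvalues in ascending order; keep the `rank` largest
    eigvals, eigvects = eigvals[::-1], eigvects[:, ::-1]
    # scale the retained eigenvectors to whiten the projected space
    proj = eigvects[:, :rank] * eigvals[:rank] ** -0.5
    cov_signal_red = proj.T @ cov_signal @ proj
    cov_noise_red = proj.T @ cov_noise @ proj
    return cov_signal_red, cov_noise_red, proj
```

The reduced matrices are positive definite on the retained subspace, so the generalized eigendecomposition no longer fails, and the resulting filters can be mapped back to channel space through the projection matrix.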

Alongside the updated algorithm, a new test (test_non_full_rank_data()) has been added to the test suite (mne.decoding.tests.test_ssd) which checks that the new approach does in fact work for non-full rank data (the existing tests already show the approach continues to work for full rank data).

Additional information

No changes to the API or documentation have been made, and the ultimate function of SSD, enhancing the contrast between a set of 'signal' and 'noise' frequencies, has been preserved; the only change is that the updated algorithm generalises to a wider range of data. In line with the unchanged API, when using SSD with full rank data or rank="full", the results will be identical to those of the current algorithm.

The argument could be made that this is an unnecessary change, as end users could instead first perform their own dimensionality reduction on non-full rank data and then fit SSD to it. However: 1) this places an additional burden on the end user; and 2) end users interested in the spatial topographies of the SSD components (SSD.patterns_) would have to transform the patterns themselves from the dimensionality-reduced space back to their original channel space. Ultimately, users less familiar with such methods may be dissuaded from using this SSD implementation, and may feel forced to rely on e.g. the MATLAB implementation.

This issue is not unique to non-full rank data (e.g. the same problem can arise when full rank data is used but a rank less than the true rank of the data is specified, such as EEG data with rank 30 and rank={'eeg':25}, raising the same LinAlgError), however this is less of an issue, as SSD can be computed on the full rank data and then the desired number of components taken. On the other hand, for non-full rank data, there is no possibility of using SSD without prior manipulations of the data.

I am of course happy to discuss the suggested changes further!
Thomas

@welcome

welcome bot commented Feb 7, 2023

Hello! 👋 Thanks for opening your first pull request here! ❤️ We will try to get back to you soon. 🚴🏽‍♂️

# License: BSD-3-Clause

import numpy as np
import scipy as sp
Member

Suggested change
import scipy as sp
from scipy import linalg

and then use linalg.XXX below

Contributor Author

Apologies, fixed now!

Contributor Author

Hi @agramfort, it seems from scipy import linalg causes test_import_nesting.py to fail. How should I handle this?

Member

nest the import inside the function that needs linalg like

def whatever(...):
    from scipy import linalg
    ...
    u, s, v = linalg.svd(...)

@agramfort
Member

@vpeterson do you have time to look? 🙏

thanks @tsbinns

@agramfort
Member

@tsbinns can you touch the file ssd_spatial_filters.py in the doc so we can see if this changes any result visually?

@vpeterson
Contributor

@vpeterson do you have time to look? 🙏

thanks @tsbinns
I will give it a look as soon as I can. Thank you @tsbinns for raising this. I will check the changes and let you know.

@tsbinns
Contributor Author

tsbinns commented Feb 9, 2023

@tsbinns can you touch the file ssd_spatial_filters.py in the doc so we can see if this changes any result visually?

@agramfort @vpeterson here is a look at the differences between the old and new implementations, using the example data of ssd_spatial_filters.py:

PSD
[figure: power spectra from the old vs. new implementations]
No differences are seen for the power spectra. This is expected, as the data used here is full rank, so no dimensionality reduction is performed (i.e. it is the same process as the existing implementation, just with some extra multiplications of the covariance and spatial filters by the identity matrix in place of any dimensionality reduction transformations).

Topographies
[figure: spatial pattern topographies from the old vs. new implementations]
Likewise, the spatial patterns of the filters are also identical. I have seen three ways of computing the spatial patterns: inverting the spatial filters, W (the method currently used for SSD), multiplying the noise covariance, C_n, with W, or the matrix division/solving of the covariance and spatial filters (given in the MATLAB code, and which I have put in the new implementation). These approaches give identical results for both full rank data and non-full rank data where W is no longer a square matrix (non-full rank data topographies shown below).
[figure: spatial pattern topographies for the non-full rank data]
As you may notice, the patterns from the second component onward look like an inverted form of those obtained when no dimensionality reduction is performed. Although the full and non-full rank data are different, I simulated the non-full rank data by changing the signal of only a single channel of the full rank data, which I would not expect to cause such a shift in the topographies. Fig. 5 of Haufe et al. (2014) suggests that large negative pattern values also indicate strong source activity, so perhaps the absolute value is considered more informative for source strength than the sign. In any case, the spatial filters for these components still produce high SNRs in the power spectra, which are also broadly similar to before:
[figure: power spectra and SNRs for the non-full rank data]
Thinking on it, I would anticipate the matrix multiplication approach to be the most computationally efficient option for recovering spatial patterns; however, if you would prefer to stick with the filter inversion approach as currently employed, you can see the results are equivalent.
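The equivalence of the filter inversion and covariance multiplication approaches can be checked in a few lines (a self-contained sketch with random full rank data, not the MNE code; it relies on scipy's eigh normalizing the eigenvectors such that W.T @ cov_n @ W = I):

```python
import numpy as np
from scipy.linalg import eigh, pinv

rng = np.random.default_rng(42)
X = rng.standard_normal((6, 2000))          # hypothetical 'signal' data
Y = X + 0.5 * rng.standard_normal(X.shape)  # hypothetical 'noise' data
cov_s = X @ X.T / X.shape[1]
cov_n = Y @ Y.T / Y.shape[1]

# columns of W are the spatial filters; scipy normalizes them such that
# W.T @ cov_n @ W = I, which makes the two pattern formulas agree exactly
_, W = eigh(cov_s, cov_n)

patterns_inv = pinv(W).T  # 1) inverting the spatial filters
patterns_cov = cov_n @ W  # 2) multiplying the noise covariance with W
print(np.allclose(patterns_inv, patterns_cov))  # True
```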

Spectral ratios
[figure: spectral ratios from the old vs. new implementations]
Finally, you can see that the spectral ratios are identical across implementations.

@agramfort
Member

thx @tsbinns for the detailed report.

+1 for MRG when green.

@tsbinns
Contributor Author

tsbinns commented Feb 13, 2023

Thanks for the approval @agramfort!

I just reverted to the original method of recovering patterns from filters by taking inv(filters), since this is the method stated in your example file and, as I show above, the results are equivalent (it should also be faster than solving for the patterns).

Unfortunately, I am really struggling to understand why make html_dev-noplot is failing. The error in the output, make: *** [Makefile:70: html_dev-noplot] Error 1, is really obscure to me, and I am not sure how to debug it. Is it something anyone is familiar with? Apologies for bothering you with such a trivial matter.

@agramfort
Member

should be ok now @tsbinns

@agramfort agramfort merged commit ae7ecf8 into mne-tools:main Feb 13, 2023
@welcome

welcome bot commented Feb 13, 2023

🎉 Congrats on merging your first pull request! 🥳 Looking forward to seeing more from you in the future! 💪

@agramfort
Member

thx @tsbinns

Contributor

@vpeterson vpeterson left a comment

I leave some comments on the proposed implementation.

rank="full", info=self.info)

cov_signal_red, cov_noise_red, dim_red = (_dimensionality_reduction(
    cov_signal, cov_noise, self.info, self.rank))
Contributor

is there a need to change the name of the cov matrices?

Contributor Author

Good point. Before when I was solving to find the spatial patterns this required the original covariance matrices, however this is no longer the case since I reverted to taking the inverse of the spatial filters.
This could be worth changing.

dim_red = np.matmul(
    eigvects[:, :rank], np.eye(rank) * (eigvals[:rank]**-0.5))
logger.info(
    "Reducing covariance rank from %i -> %i" % (n_channels, rank))
Contributor

The rank of the covariance is not reduced, that is given by your matrix itself. What it is being done here is more a subspace projection. I would recommend changing the info msg.

Contributor Author

I am not sure I understand entirely. Applying dim_red to the covariance matrices later in the function does reduce their rank, right?

cov_signal_red = np.matmul(dim_red.T, np.matmul(cov_signal, dim_red))
cov_noise_red = np.matmul(dim_red.T, np.matmul(cov_noise, dim_red))

Contributor

Well, you are estimating the rank of the two matrices and taking the smallest value (BTW, is there any practical case in which rank(cov_signal) != rank(cov_noise)?).

So, personally, I don't see this step as a dimensionality reduction of "the rank" (as you have stated in the log message) but as a projection to the rank subspace of the data. In these lines, you are in fact making a projection, so dim_red is a projection matrix to the rank subspace.

Contributor Author

BTW, is there any practical case in which rank(cov_signal) != rank(cov_noise)?

I could not think of anything, but maybe there is some super strange edge case where the ranks are not equal. Since np.min of two values is so computationally cheap, I thought it could be better safe than sorry to check.

So, personally, I don't see this step as a dimensionality reduction of "the rank" (as you have stated in the log message) but as a projection to the rank subspace of the data. In these lines, you are in fact making a projection, so dim_red is a projection matrix to the rank subspace.

Okay, I think I understand. So something like "Projecting covariance of X channels to Y rank subspace" would be more appropriate?


cov_signal_red = np.matmul(dim_red.T, np.matmul(cov_signal, dim_red))
cov_noise_red = np.matmul(dim_red.T, np.matmul(cov_noise, dim_red))
return cov_signal_red, cov_noise_red, dim_red
Contributor

consider changing the name of the projection matrix from dim_red to proj_rank ? or something more accurate.

Contributor Author

@tsbinns tsbinns Feb 13, 2023

Sure, it is a bit obtuse; perhaps I could change this as well.

@@ -172,17 +177,22 @@ def fit(self, X, y=None):

 cov_signal = _regularized_covariance(
     X_signal, reg=self.reg, method_params=self.cov_method_params,
-    rank=self.rank, info=self.info)
+    rank="full", info=self.info)
Contributor

if rank will always be "full" then it should be removed from the Parameters of the SSD class.

Contributor Author

Here rank is specified as "full" to ensure the transformations of the covariance matrices are not performed when they are computed, but rather afterwards in _dimensionality_reduction, to which rank is passed; so it is still in use.

@vpeterson
Contributor

It seems I was too slow. I have added some comments on the changes to the ssd.py class; maybe they are still helpful @tsbinns

As a general comment, although I know that the projection to the rank subspace is used for estimating the covariance matrices in the literature, I am worried about how this implementation fits with other existing spatial filter methods based on eigenvalue decomposition of covariance matrices, such as CSP or SPoC. We made the estimation of the covariance matrices follow other decoding implementations, which consider low-rank data. Should the _regularized_covariance function not also work for non-full rank data? @agramfort Maybe I am wrong and misunderstanding how the implementation of _compute_covariance_auto works.

@tsbinns
Contributor Author

tsbinns commented Feb 13, 2023

Hi @vpeterson, yes, they are still helpful! I will have a look through them.

As for the compatibility with non-full rank data, I kept encountering the issue that _compute_covariance_auto would reduce the rank of the covariance matrix to an appropriate value, but the matrix would not be positive definite, so the generalised eigendecomposition of the covariance matrices would fail. The method Vadim Nikulin uses (which I added here) ensures that the covariance matrices have an appropriate rank whilst remaining positive definite.

Just to reiterate, as I mention in the first post, _compute_covariance_auto producing non-positive definite matrices is not an issue specific to non-full rank data, but more a product of the matrix transformation method used; e.g. you can see the same issue if you specify an inappropriate rank for full rank data:

This issue is not unique to non-full rank data (e.g. the same problem can arise when full rank data is used but a rank less than the true rank of the data is specified, such as EEG data with rank 30 and rank={'eeg':25}, raising the same LinAlgError), however this is less of an issue, as SSD can be computed on the full rank data and then the desired number of components taken. On the other hand, for non-full rank data, there is no possibility of using SSD without prior manipulations of the data.

@tsbinns
Contributor Author

tsbinns commented Feb 13, 2023

Hi @agramfort,
@vpeterson raised good points for some internal variable renaming. I had also planned to make a separate PR for some minor changes to the documentation, e.g.:

  • setting sort_by_spectral_ratio : bool (default False) to sort_by_spectral_ratio : bool (default True).
    def __init__(self, info, filt_params_signal, filt_params_noise,
                 reg=None, n_components=None, picks=None,
                 sort_by_spectral_ratio=True, return_filtered=False,
                 n_fft=None, cov_method_params=None, rank=None):
        """Initialize instance."""

  • rewording this to read "If n_components is None, a number of components equal to the rank of the data is returned.", since I felt using the term dimensionality reduction here is now somewhat confusing, given that the rank argument is now tied to the available number of components.
    n_components : int | None (default None)
        The number of components to extract from the signal.
        If n_components is None, no dimensionality reduction is applied.

  • similarly, making it more explicit that the rank argument determines the degree of dimensionality reduction performed.
    rank : None | dict | 'info' | 'full'
        As in :class:`mne.decoding.SPoC`
        This controls the rank computation that can be read from the
        measurement info or estimated from the data.
        See Notes of :func:`mne.compute_rank` for details.
        We recommend to use 'full' when working with epoched data.

@agramfort if you agree with these or Victoria's suggested changes, I could open a new PR for the minor changes.

@agramfort
Member

agramfort commented Feb 13, 2023 via email

larsoner added a commit to drammock/mne-python that referenced this pull request Feb 22, 2023
* upstream/main: (46 commits)
  Fix docstrings by replacing str with path-like and fix double backticks for formatting (mne-tools#11499)
  Use pathlib.Path instead of os.path to handle files and folders [circle deploy] (mne-tools#11473)
  MAINT: Fix Circle [circle deploy] (mne-tools#11497)
  MAINT: Use mamba in CIs (mne-tools#11471)
  Updating documentation to clarify full vs half-bandwidth and defaults in time_frequency.multitaper.py (mne-tools#11479)
  Fix typo in tutorial (mne-tools#11498)
  Typo fix and added colons before code (mne-tools#11496)
  [MRG] ENH/DOC: demo custom spectrum creation (mne-tools#11493)
  Accept only left-clicks for adding annotations (mne-tools#11491)
  [BUG, MRG] Fix pial surface loading, logging in locate_ieeg (mne-tools#11489)
  [ENH] Added unit_role to add non-breaking space between magnitude  and units (mne-tools#11469)
  MAINT: Fix CircleCI build (mne-tools#11488)
  [DOC] Updated decoding.SSD documentation and internal variable naming (mne-tools#11475)
  Typo fix (mne-tools#11485)
  [MRG] Forward argument axes from plot_sensors to DigMontage.plot (mne-tools#11470)
  [MRG] Improve error message raised on channels missing from DigMontage (mne-tools#11472)
  MAINT: Deal with pkg_resources usage bugs (mne-tools#11478)
  Add object array support and docstring (mne-tools#11465)
  [ENH] Adjusted SSD algorithm to support non-full rank data (mne-tools#11458)
  [BUG] fix nibabel reference (mne-tools#11467)
  ...