
Add psd and mvdr methods to functional & Refactor PSD and MVDR module in transforms #2181

Closed
wants to merge 1 commit

Conversation

nateanl (Member) commented Jan 27, 2022

  • Provide the following methods in torchaudio.functional (see the usage sketch after this list):
    compute_power_spectral_density_matrix
    compute_mvdr_weights_souden
    compute_mvdr_weights_rtf
    compute_rtf_evd
    compute_rtf_power
    apply_beamforming
  • Add docstrings for the methods above.
  • Refactor the current torchaudio.transforms.PSD and torchaudio.transforms.MVDR by using the added methods.
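A minimal usage sketch of the new functional API. The argument names, mask handling, and tensor shapes below are assumptions for illustration, not taken from the PR diff:

# Hedged sketch (assumed signatures, not from the PR diff): estimate PSD matrices
# from a multi-channel complex spectrogram, compute Souden MVDR weights, and
# apply beamforming to obtain a single-channel enhanced spectrogram.
import torch
import torchaudio.functional as F

channel, freq, time = 4, 257, 100
specgram = torch.randn(channel, freq, time, dtype=torch.cfloat)  # (..., channel, freq, time)
mask_speech = torch.rand(freq, time)   # time-frequency mask for speech
mask_noise = 1.0 - mask_speech         # complementary mask for noise

psd_speech = F.compute_power_spectral_density_matrix(specgram, mask_speech)
psd_noise = F.compute_power_spectral_density_matrix(specgram, mask_noise)

# Souden MVDR weights for reference channel 0: (..., freq, channel)
ws = F.compute_mvdr_weights_souden(psd_speech, psd_noise, reference_channel=0)

# Apply the beamforming weights: (..., freq, time)
enhanced = F.apply_beamforming(ws, specgram)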

@nateanl nateanl force-pushed the refactor_mvdr branch 2 times, most recently from 81c0870 to 4a65e2e Compare February 9, 2022 16:52
@nateanl nateanl changed the title [WIP] Add psd and mvdr methods to functional & Refactor PSD and MVDR module in transforms Add psd and mvdr methods to functional & Refactor PSD and MVDR module in transforms Feb 9, 2022
@nateanl nateanl marked this pull request as ready for review February 9, 2022 16:53
nateanl (Member, Author) commented Feb 9, 2022

@Emrys365 @popcornell @boeddeker Would you like to review the algorithms in the methods? Thanks!

boeddeker left a comment


I added some comments to the core code. The code feels somewhat familiar; it is not too far away from ours.

(Inline review comments on torchaudio/functional/functional.py and torchaudio/transforms.py — outdated, resolved.)
danpovey left a comment


Some small comments

(Inline review comments on torchaudio/transforms.py — outdated, resolved.)
torch.random.manual_seed(2434)
channel = 4
psd_speech = torch.rand(64, channel, channel, dtype=torch.cfloat)
psd_noise = torch.rand(64, channel, channel, dtype=torch.cfloat)
Collaborator
Autograd tests are among the most time-consuming ones. I think a batch size of 3 suffices.

Collaborator
Or is this the frame length?

nateanl (Member, Author)
64 is the frequency dimension. We can reduce it to 5 or 10.

self.assert_grad(F.compute_mvdr_weights_rtf, (rtf, psd_noise, 0))

# The eigenvector can be different in different runs, expected to fail
@expectedFailure
Collaborator
Reading https://pytorch.org/docs/stable/generated/torch.linalg.eigh.html, there seems to be no way to control the randomness. Since this is expected to fail, what value does this test provide?

nateanl (Member, Author)
This works as a flag showing that the eigenvalue decomposition itself can return different results for the same matrix.
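
A tiny illustration of that ambiguity, using plain PyTorch rather than the PR code: if v is an eigenvector of A, then -v (or exp(i·φ)·v in the complex case) is an equally valid eigenvector, so different runs or backends can legitimately return different vectors.

import torch

A = torch.tensor([[2.0, 1.0], [1.0, 2.0]])       # symmetric matrix
vals, vecs = torch.linalg.eigh(A)
v = vecs[:, -1]                                   # eigenvector of the largest eigenvalue
assert torch.allclose(A @ v, vals[-1] * v)        # v is an eigenvector
assert torch.allclose(A @ (-v), vals[-1] * (-v))  # -v is an equally valid eigenvector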

boeddeker

You can normalize the RTF vector (rtf / rtf[0].conj()); then it should work.
If it still does not work, the eigenvalue gap should be made larger, i.e. the difference between the largest and second-largest eigenvalue, e.g.:

import numpy as np
channel = 4
samples = 100  # samples >> channel
data = np.random.randn(channel, samples)
cov = data @ data.T.conj()  # Hermitian matrix
eigenvalues, eigenvectors = np.linalg.eigh(cov)
eigenvalues[...] = np.random.uniform(0, 1, size=eigenvalues.shape)  # Change the eigenvalues to be between 0 and 1
eigenvalues[0] = 2  # Set first eigenvalue to be the largest
cov = np.einsum('ij,j,kj->ik', eigenvectors, eigenvalues, eigenvectors.conj())  # Revert the eigh decomposition
print(np.linalg.eigh(cov))

boeddeker

Btw., the gradient for eig and eigh was probably derived with the constraint that there is no gradient in the direction of a length or phase change, i.e. the objective does not change when you replace an eigenvector with itself multiplied by any nonzero complex number.

nateanl (Member, Author)
IMO the returned eigenvector can be different even when the eigenvalues are unique. One simple example: negating an eigenvector gives another valid solution.
On the PyTorch side, they seem to hard-code a check for this and throw a runtime error in
https://github.com/pytorch/pytorch/blob/master/torch/csrc/autograd/FunctionsManual.cpp#L2907-L2910

nateanl (Member, Author)
@boeddeker I verified that using eigenvector / eigenvector[..., -1:].conj() passed the gradcheck. Thanks! In my case the eigenvalues are in ascending order, so should the normalization rely on the last index instead of the 0th?

nateanl (Member, Author)
Sorry, my mistake. I had converted the PSD matrix to a real Tensor by accident. It still fails the gradcheck now...

boeddeker

> I verified that using eigenvector / eigenvector[..., -1:].conj() passed the gradcheck.

Sorry, I copied my code example from above and renamed the variable.
The correct code is

...
assert rtf.ndim == 1, rtf.shape
reference_channel = 0
rtf = rtf / rtf[reference_channel].conj()
...

So I assumed a single eigenvector and normalized it by a reference channel. In this way, the eigenvector is unique.

When I implemented the gradient check, I had issues with inputs like psd_speech.
It was necessary to include something like psd_speech = 0.5 * (psd_speech + hermitian(psd_speech)) in the checked function; otherwise, changes in the input were not correctly reflected in the output, because eigh ignores part of the matrix (e.g. all values above the diagonal aren't used). In user code this is not necessary, because F.compute_power_spectral_density_matrix takes care of it.
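
A minimal sketch of such a gradcheck wrapper, assuming plain PyTorch; the helper name hermitian and the normalization by channel 0 follow the discussion above and are not taken from the PR diff.

import torch

def hermitian(mat):
    # conjugate transpose over the last two dimensions (assumed helper, not PR code)
    return mat.transpose(-2, -1).conj()

def rtf_evd_for_gradcheck(psd_speech):
    # symmetrize so that perturbations anywhere in the input reach the output
    # (eigh only reads one triangle of the matrix)
    psd_speech = 0.5 * (psd_speech + hermitian(psd_speech))
    _, eigenvectors = torch.linalg.eigh(psd_speech)
    rtf = eigenvectors[..., -1]        # eigenvector of the largest eigenvalue
    return rtf / rtf[..., :1].conj()   # normalize by reference channel 0

psd = torch.rand(4, 4, dtype=torch.cdouble, requires_grad=True)
torch.autograd.gradcheck(rtf_evd_for_gradcheck, (psd,))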

Contributor
Wouldn't it be better to handle RTF normalization in F.compute_mvdr_weights_rtf with an additional argument? Aside from testing, it is handy to be able to phase-align the beamforming solution to a particular reference channel (e.g. to compute losses which require sample-level alignment).

Contributor
Never mind, the RTF normalization is already there and is True by default.

@@ -644,6 +644,59 @@ def func(tensor):
        tensor = torch.view_as_complex(torch.randn(2, 1025, 400, 2))
        self._assert_consistency_complex(func, tensor)

    def test_compute_power_spectral_density_matrix(self):
        def func(tensor):
            return F.compute_power_spectral_density_matrix(tensor)
Collaborator
There seems to be no benefit in defining this func.

numerator = torch.linalg.solve(psd_n, psd_s) # psd_n.inv() @ psd_s
# ws: (..., C, C) / (...,) -> (..., C, C)
ws = numerator / (_compute_mat_trace(numerator)[..., None, None] + eps)
if isinstance(reference_channel, int):
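
For context, the quoted lines appear to implement the Souden-style MVDR solution; in standard notation,

$$\mathbf{w} = \frac{\mathbf{\Phi}_{NN}^{-1}\,\mathbf{\Phi}_{SS}}{\operatorname{Tr}\!\left(\mathbf{\Phi}_{NN}^{-1}\,\mathbf{\Phi}_{SS}\right)}\,\mathbf{u}$$

where Φ_SS and Φ_NN are the speech and noise PSD matrices and u is the one-hot vector selecting the reference channel, which is what the reference_channel handling below corresponds to.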
Collaborator
nit: Would swapping isinstance with torch.jit.isinstance help support TorchScript?

@mthrok mthrok added this to the v0.11 milestone Feb 15, 2022
mthrok (Collaborator) left a comment

@nateanl Once the technical discussion with the experts is settled, can you split this PR into smaller PRs, each containing the addition of one new function, plus another PR that changes the transforms?
As it stands, it's hard to see which function supports which features.
