Add psd and mvdr methods to functional & Refactor PSD and MVDR module in transforms #2181
Conversation
@Emrys365 @popcornell @boeddeker Would you like to review the algorithms in the methods? Thanks!
I added some comments to the core code. The code feels somewhat familiar; it is not too far away from ours.
Some small comments
torch.random.manual_seed(2434)
channel = 4
psd_speech = torch.rand(64, channel, channel, dtype=torch.cfloat)
psd_noise = torch.rand(64, channel, channel, dtype=torch.cfloat)
Autograd tests are among the most time-consuming ones. I think a batch size of 3 suffices.
Or is this the frame length?
64 is the frequency dimension. We can reduce it to 5 or 10.
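For reference, a reduced setup along these lines could look as follows (a sketch; the smaller frequency dimension follows the suggestion above, and the exact numbers are illustrative):

torch.random.manual_seed(2434)
channel = 4
n_freq = 10  # reduced from 64 to keep the autograd test fast
psd_speech = torch.rand(n_freq, channel, channel, dtype=torch.cfloat)
psd_noise = torch.rand(n_freq, channel, channel, dtype=torch.cfloat)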
self.assert_grad(F.compute_mvdr_weights_rtf, (rtf, psd_noise, 0))

# The eigenvector can be different in different runs, expected to fail
@expectedFailure
Reading https://pytorch.org/docs/stable/generated/torch.linalg.eigh.html, there seems to be no way to control the randomness. Since this is expected to fail, what value does this test provide?
This serves as a flag showing that the eigenvalue decomposition itself can produce different results on the same matrix.
You can normalize the rtf vector (rtf / rtf[0].conj()); then it should work.
If it still does not work, the eigenvalue gap should be larger, i.e. the difference between the largest and second-largest eigenvalues, e.g.:
import numpy as np
channel = 4
samples = 100 # samples >> channel
data = np.random.randn(channel, samples)
cov = data @ data.T.conj()  # Hermitian matrix
eigenvalues, eigenvectors = np.linalg.eigh(cov)
eigenvalues[...] = np.random.uniform(0, 1, size=eigenvalues.shape) # Change the eigenvalues to be between 0 and 1
eigenvalues[0] = 2 # Set first eigenvalue to be the largest
cov = np.einsum('ij,j,kj->ik', eigenvectors, eigenvalues, eigenvectors.conj()) # Revert the eigh decomposition
print(np.linalg.eigh(cov))
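A possible torch adaptation of this construction for the autograd test could look like the following (a sketch with illustrative names; the point is simply to enforce a clear gap between the largest and second-largest eigenvalue so that the leading eigenvector is well determined):

import torch

channel = 4
samples = 100  # samples >> channel
data = torch.randn(channel, samples, dtype=torch.cdouble)
cov = data @ data.conj().T  # Hermitian matrix
eigenvalues, eigenvectors = torch.linalg.eigh(cov)
eigenvalues = torch.rand(channel, dtype=torch.double)  # eigenvalues between 0 and 1
eigenvalues[-1] = 2.0  # eigh sorts ascending, so make the last eigenvalue clearly the largest
psd_speech = torch.einsum('ij,j,kj->ik', eigenvectors, eigenvalues.to(eigenvectors.dtype), eigenvectors.conj())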
Btw., the gradient for eig and eigh was probably derived with the constraint that there is no gradient in the direction of a length or phase change, i.e. the objective does not change when you replace an eigenvector with itself multiplied by any nonzero complex number.
IMO the returned eigenvector can be different even when the eigenvalues are unique. One simple example: -eigenvector is another valid solution for the same eigenvalue.
On the PyTorch side, they seem to have a hard-coded check for this and throw a runtime error in
https://github.com/pytorch/pytorch/blob/master/torch/csrc/autograd/FunctionsManual.cpp#L2907-L2910
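For illustration (a minimal sketch, not code from this PR): if v is an eigenvector of a Hermitian matrix, then -v (or e^{i*phi} * v for any phase) is an eigenvector for the same eigenvalue, so two runs of eigh may legitimately return different eigenvectors:

import torch

torch.manual_seed(0)
a = torch.rand(4, 4, dtype=torch.cfloat)
a = 0.5 * (a + a.conj().transpose(-1, -2))  # make the matrix Hermitian
w, v = torch.linalg.eigh(a)
v0 = v[:, 0]
# -v0 satisfies the same eigenvalue equation as v0
print(torch.allclose(a @ (-v0), w[0] * (-v0), atol=1e-5))  # True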
@boeddeker I verified that using eigenvector / eigenvector[..., -1:].conj() passes the gradcheck. Thanks! In my case the eigenvalues are in ascending order, so should the normalization rely on the last channel instead of the 0th?
Sorry, my mistake: I had converted the psd matrix to a real Tensor. It still fails the gradcheck now...
I verified that using eigenvector / eigenvector[..., -1:].conj() passed the gradcheck.
Sorry, I copied my code example from above and renamed the variable. The correct code is:
...
assert rtf.ndim == 1, rtf.shape
reference_channel = 0
rtf = rtf / rtf[reference_channel].conj()
...
So I assumed a single eigenvector and normalized it by a reference channel. In this way, the eigenvector is unique.
When I implemented the gradient check, I had issues with inputs like psd_speech. It was necessary to include something like psd_speech = 0.5 * (psd_speech + hermitian(psd_speech)) in the checked function, otherwise changes in the input were not correctly reflected in the output, because eigh ignores part of the matrix (e.g. all values above the diagonal aren't used). In user code this is not necessary, because F.compute_power_spectral_density_matrix takes care of this.
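Putting both points together, a gradient-check wrapper of the kind described above might look roughly like this (a sketch with assumed names, not the PR's test code):

import torch

def hermitian(mat):
    return mat.conj().transpose(-1, -2)

def rtf_evd_for_gradcheck(psd_speech, reference_channel: int = 0):
    # symmetrize so that perturbations above the diagonal reach the output
    # (eigh only reads one triangle of the matrix)
    psd_speech = 0.5 * (psd_speech + hermitian(psd_speech))
    _, v = torch.linalg.eigh(psd_speech)
    rtf = v[..., -1]  # eigenvector of the largest eigenvalue (ascending order)
    # normalize by a reference channel so that the returned eigenvector is unique
    return rtf / rtf[..., reference_channel : reference_channel + 1].conj()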
Wouldn't it be better to handle RTF normalization in F.compute_mvdr_weights_rtf with an additional argument? Aside from testing, it is handy to be able to phase-align the beamforming solution to a particular reference channel (e.g. to compute losses which require sample-level alignment).
Never mind, the rtf normalization seems to be there already and is True by default.
@@ -644,6 +644,59 @@ def func(tensor):
tensor = torch.view_as_complex(torch.randn(2, 1025, 400, 2))
self._assert_consistency_complex(func, tensor)

def test_compute_power_spectral_density_matrix(self):
    def func(tensor):
        return F.compute_power_spectral_density_matrix(tensor)
There seems to be no benefit in defining this func.
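That is, assuming the consistency helper accepts the functional directly, something like the following could work (a sketch, not verified against the test harness):

def test_compute_power_spectral_density_matrix(self):
    tensor = torch.view_as_complex(torch.randn(2, 1025, 400, 2))
    self._assert_consistency_complex(F.compute_power_spectral_density_matrix, tensor)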
torchaudio/functional/functional.py
numerator = torch.linalg.solve(psd_n, psd_s)  # psd_n.inv() @ psd_s
# ws: (..., C, C) / (...,) -> (..., C, C)
ws = numerator / (_compute_mat_trace(numerator)[..., None, None] + eps)
if isinstance(reference_channel, int):
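For context, the surrounding computation (the Souden formulation) could be sketched as follows; the trace helper and the reference-channel selection here are assumptions based on the snippet above, not the exact torchaudio implementation:

import torch

def _compute_mat_trace(mat):
    # sum over the diagonal of the last two dimensions: (..., C, C) -> (...,)
    return mat.diagonal(dim1=-2, dim2=-1).sum(-1)

def mvdr_weights_souden(psd_s, psd_n, reference_channel: int, eps: float = 1e-8):
    numerator = torch.linalg.solve(psd_n, psd_s)  # psd_n.inv() @ psd_s
    # ws: (..., C, C) / (...,) -> (..., C, C)
    ws = numerator / (_compute_mat_trace(numerator)[..., None, None] + eps)
    # picking an integer reference channel amounts to selecting the corresponding column
    return ws[..., reference_channel]  # (..., C, C) -> (..., C)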
nit: Would swapping isinstance with torch.jit.isinstance help support TorchScript?
@nateanl Once the technical discussion with the experts is settled, can you split this PR into smaller PRs, each containing the addition of one new function, plus another PR that changes the transforms? It's hard to see which function supports which features.
torchaudio.functional:
compute_power_spectral_density_matrix
compute_mvdr_weights_souden
compute_mvdr_weights_rtf
compute_rtf_evd
compute_rtf_power
apply_beamforming
torchaudio.transforms.PSD and torchaudio.transforms.MVDR, refactored by using the added methods.
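To make the feature split easier to see, here is a rough end-to-end sketch of how these functions might compose; apart from the calls quoted earlier in this thread, the signatures and shapes are assumptions:

import torch
import torchaudio.functional as F

# multi-channel complex spectrogram: (channel, freq, time)
specgram = torch.rand(4, 201, 100, dtype=torch.cfloat)

# PSD matrices per frequency bin: (freq, channel, channel)
psd_speech = F.compute_power_spectral_density_matrix(specgram)
psd_noise = F.compute_power_spectral_density_matrix(specgram)  # in practice, computed with a noise mask

# relative transfer function via eigenvalue decomposition (assumed signature)
rtf = F.compute_rtf_evd(psd_speech)

# MVDR weights with reference channel 0, as used in the tests above
ws = F.compute_mvdr_weights_rtf(rtf, psd_noise, 0)

# apply the beamforming weights to the spectrogram (assumed signature)
enhanced = F.apply_beamforming(ws, specgram)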