Add SoudenMVDR module #2367

nateanl · 2022-05-06T10:31:41Z

Add a new design of the MVDR module.
The SoudenMVDR module supports the method proposed by Souden et al.
The input arguments are:

The multi-channel complex-valued spectrum.
The PSD matrix of the target speech.
The PSD matrix of the noise.
The reference channel of the microphone array.
diagonal_loading: whether to apply diagonal loading when computing the matrix inverse.
diag_eps: the value used for diagonal loading when computing the matrix inverse.
eps: a small value for numerical stability when computing the beamforming weights.

The output of the module is the single-channel complex-valued spectrum of the enhanced speech.
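As a rough illustration of how a multi-channel input becomes that single-channel output: given per-frequency beamforming weights w(f), the enhanced spectrum is Ŝ(f, t) = w(f)^H Y(f, t). The NumPy sketch below assumes weights of shape `(..., freq, channel)` and a spectrum of shape `(..., channel, freq, time)`; `apply_beamforming_sketch` is a hypothetical name, not the torchaudio implementation.

```python
import numpy as np


def apply_beamforming_sketch(weights: np.ndarray, specgram: np.ndarray) -> np.ndarray:
    """Hypothetical helper: apply per-frequency beamforming weights.

    weights:  complex array of shape (..., freq, channel)
    specgram: complex array of shape (..., channel, freq, time)
    returns:  complex array of shape (..., freq, time)
    """
    # S_hat[f, t] = sum_c conj(w[f, c]) * Y[c, f, t], i.e. w(f)^H Y(f, t)
    return np.einsum("...fc,...cft->...ft", weights.conj(), specgram)


# Usage: a one-hot weight on channel 0 simply returns that channel's spectrum.
rng = np.random.default_rng(0)
channel, freq, time = 4, 5, 6
specgram = rng.standard_normal((channel, freq, time)) + 1j * rng.standard_normal((channel, freq, time))
weights = np.zeros((freq, channel), dtype=complex)
weights[:, 0] = 1.0
enhanced = apply_beamforming_sketch(weights, specgram)
print(enhanced.shape)  # (5, 6)
```

The channel dimension is summed out, which is why the output has one channel fewer dimension than the input.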

facebook-github-bot · 2022-05-06T10:34:45Z

@nateanl has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

nateanl · 2022-05-06T11:08:21Z

Docs: https://output.circle-artifacts.com/output/job/ea1e9c6d-ddf0-4a92-8070-f03a67187951/artifacts/0/docs/transforms.html#soudenmvdr


hwangjeff · 2022-05-06T18:34:27Z

torchaudio/transforms/_transforms.py

+            specgram (Tensor): Multi-channel complex-valued spectrum.
+                Tensor of dimension `(..., channel, freq, time)`
+            psd_s (Tensor): The complex-valued power spectral density (PSD) matrix of target speech.
+                Tensor of dimension `(..., freq, channel, channel)`


is there a check somewhere that enforces equality between the last two dimensions?

Good point. Adding such a check would help users better understand how to use the module.

This check also applies to F.rtf_power, F.rtf_evd, F.mvdr_weights_rtf, and F.mvdr_weights_souden, so it's better to add it in a separate PR.

Sure, sounds good.
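The shape check discussed in this thread could look roughly like the sketch below. `check_square_psd` is a hypothetical helper, not a torchaudio function; it only relies on `.ndim` and `.shape`, so it works on a NumPy array or a `torch.Tensor` alike.

```python
import numpy as np


def check_square_psd(psd, name: str = "psd") -> None:
    """Hypothetical validation helper: a PSD matrix must be (..., freq, channel, channel)."""
    if psd.ndim < 3:
        raise ValueError(
            f"Expected {name} to have at least 3 dimensions (..., freq, channel, channel); got {psd.ndim}."
        )
    if psd.shape[-1] != psd.shape[-2]:
        raise ValueError(
            f"The last two dimensions of {name} must be equal (channel, channel); got {tuple(psd.shape[-2:])}."
        )


check_square_psd(np.zeros((257, 4, 4), dtype=complex), "psd_s")  # passes silently
try:
    check_square_psd(np.zeros((257, 4, 3), dtype=complex), "psd_n")
except ValueError as err:
    print(err)
```

Raising early with the offending shape in the message is what makes the check useful to users; the same helper could be called from each of the functional APIs listed above.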


facebook-github-bot · 2022-05-06T21:32:08Z

@nateanl has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.


hwangjeff

lg — just some small things

hwangjeff · 2022-05-06T22:24:29Z

torchaudio/transforms/_transforms.py

+    Given the multi-channel complex-valued spectrum :math:`\textbf{Y}`, the power spectral density (PSD) matrix
+    of target speech :math:`\bf{\Phi}_{\textbf{SS}}`, the PSD matrix of noise :math:`\bf{\Phi}_{\textbf{NN}}`, and
+    a one-hot vector that represents the reference channel :math:`\bf{u}`, the module computes the single-channel
+    complex-valued spectrum of the enhaned speech :math:`\hat{\textbf{S}}`. The formula is defined as:


Suggested change

complex-valued spectrum of the enhaned speech :math:`\hat{\textbf{S}}`. The formula is defined as:

complex-valued spectrum of the enhanced speech :math:`\hat{\textbf{S}}`. The formula is defined as:

hwangjeff · 2022-05-06T22:27:50Z

torchaudio/transforms/_transforms.py

+    .. math::
+        \textbf{w}_{\text{MVDR}}(f) =
+        \frac{{{\bf{\Phi}_{\textbf{NN}}^{-1}}(f){\bf{\Phi}_{\textbf{SS}}}}(f)}
+        {\text{Trace}({{{\bf{\Phi}_{\textbf{NN}}^{-1}}(f) \bf{\Phi}_{\textbf{SS}}}(f))}}\bm{u}


is "tr" more standard?

Suggested change

{\text{Trace}({{{\bf{\Phi}_{\textbf{NN}}^{-1}}(f) \bf{\Phi}_{\textbf{SS}}}(f))}}\bm{u}

{\text{tr}({{{\bf{\Phi}_{\textbf{NN}}^{-1}}(f) \bf{\Phi}_{\textbf{SS}}}(f))}}\bm{u}

I've seen different notations in different publications: trace in https://www.merl.com/publications/docs/TR2016-072.pdf, Trace in https://arxiv.org/pdf/2005.10479.pdf, and Tr in https://ieeexplore.ieee.org/abstract/document/7952756.
For clarity, we can use Trace here.
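For reference, the weight formula in the docstring can be sketched in NumPy as follows. This is an illustrative reimplementation, not torchaudio's code: the helper name is hypothetical, the diagonal loading here simply adds `diag_eps` times the identity to Φ_NN (torchaudio may scale it differently), the reference channel is assumed to be an integer index, and multiplying by the one-hot vector u is implemented as selecting the reference column. The trace (whether written Trace, tr, or Tr) is computed with `np.trace` over the channel axes.

```python
import numpy as np


def souden_mvdr_weights_sketch(psd_s, psd_n, reference_channel, diagonal_loading=True, diag_eps=1e-7, eps=1e-8):
    """Hypothetical sketch of w(f) = Phi_NN^{-1} Phi_SS u / Trace(Phi_NN^{-1} Phi_SS).

    psd_s, psd_n: complex arrays of shape (..., freq, channel, channel)
    reference_channel: int index of the reference microphone (assumption)
    returns: complex array of shape (..., freq, channel)
    """
    if diagonal_loading:
        # Assumed regularization: Phi_NN + diag_eps * I, to keep the solve well conditioned.
        psd_n = psd_n + diag_eps * np.eye(psd_n.shape[-1])
    # Phi_NN^{-1} Phi_SS for every frequency bin, without forming the inverse explicitly.
    numerator = np.linalg.solve(psd_n, psd_s)
    # Trace over the channel axes; eps guards against division by zero.
    trace = np.trace(numerator, axis1=-2, axis2=-1)[..., None, None]
    ws = numerator / (trace + eps)
    # Multiplying by the one-hot vector u selects the reference-channel column.
    return ws[..., :, reference_channel]


# Usage: with a rank-one speech PSD h h^H, the weights should be distortionless,
# i.e. w(f)^H h(f) equals h(f) at the reference channel.
rng = np.random.default_rng(0)
freq, channel = 3, 4
h = rng.standard_normal((freq, channel)) + 1j * rng.standard_normal((freq, channel))
psd_s = np.einsum("fc,fd->fcd", h, h.conj())  # rank-one PSD: h h^H per frequency
a = rng.standard_normal((freq, channel, channel)) + 1j * rng.standard_normal((freq, channel, channel))
psd_n = a @ a.conj().transpose(0, 2, 1) + np.eye(channel)  # Hermitian positive definite
w = souden_mvdr_weights_sketch(psd_s, psd_n, reference_channel=0)
print(np.allclose(np.einsum("fc,fc->f", w.conj(), h), h[:, 0]))  # True
```

Using `np.linalg.solve` instead of an explicit inverse is the usual numerically safer choice for computing Φ_NN^{-1}Φ_SS.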

facebook-github-bot · 2022-05-07T08:40:03Z

@nateanl has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Add SoudenMVDR module

8f304ef

nateanl added the new feature and module: ops labels May 6, 2022

facebook-github-bot added the CLA Signed label May 6, 2022

nateanl added 2 commits May 6, 2022 15:14

fix test name

61e2a51

fix

ad231cd

nateanl mentioned this pull request May 6, 2022

[Migration] TorchAudio Beamforming Module Migration #2280 (closed, 11 tasks)

hwangjeff reviewed May 6, 2022


fix docstring

a29ba8e

nateanl force-pushed the mvdr_souden branch from 2fd2f37 to a29ba8e May 6, 2022 21:11

carolineechen reviewed May 6, 2022


torchaudio/functional/functional.py (resolved)

hwangjeff approved these changes May 6, 2022


fix typo

82677e5

facebook-github-bot closed this in aed5eb8 May 10, 2022
