Add SoudenMVDR module #2367

Closed
wants to merge 5 commits into from

Member

@nateanl nateanl commented May 6, 2022

Add a new design of the MVDR module.
The SoudenMVDR module supports the method proposed by Souden et al.
The input arguments are:

  • multi-channel spectrum.
  • PSD matrix of target speech.
  • PSD matrix of noise.
  • reference channel in the microphone array.
  • diagonal_loading option to enable or disable diagonal loading in matrix inverse computation.
  • diag_eps for computing the inverse of the matrix.
  • eps for computing the beamforming weight.

The output of the module is the single-channel complex-valued spectrum for the enhanced speech.
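
To make the inputs and output listed above concrete, here is a minimal NumPy sketch of the computation, using the weight formula from the module docstring (the noise PSD inverse times the speech PSD, divided by its trace, applied to the one-hot reference vector). It is an illustration only, not the torchaudio implementation: the function name `souden_mvdr_sketch` is hypothetical, the sketch drops the leading batch dimensions, and the exact form of diagonal loading used here (a trace-scaled identity) is an assumption.

```python
import numpy as np


def souden_mvdr_sketch(specgram, psd_s, psd_n, reference_channel,
                       diagonal_loading=True, diag_eps=1e-7, eps=1e-8):
    """Illustrative Souden MVDR beamformer (hypothetical helper).

    specgram: complex array of shape (channel, freq, time)
    psd_s, psd_n: complex arrays of shape (freq, channel, channel)
    reference_channel: integer index of the reference microphone
    Returns (weights, enhanced) with shapes (freq, channel) and (freq, time).
    """
    n_channels = psd_n.shape[-1]
    if diagonal_loading:
        # Assumed form of diagonal loading: add a trace-scaled identity to psd_n.
        trace_n = np.trace(psd_n, axis1=-2, axis2=-1).real
        loading = trace_n * diag_eps + eps  # one value per frequency bin
        psd_n = psd_n + loading[:, None, None] * np.eye(n_channels)
    # Phi_NN^{-1} Phi_SS, computed with a linear solve instead of an explicit inverse.
    numerator = np.linalg.solve(psd_n, psd_s)
    trace = np.trace(numerator, axis1=-2, axis2=-1)
    ws = numerator / (trace[:, None, None] + eps)
    # Multiplying by the one-hot vector u selects the reference column.
    weights = ws[:, :, reference_channel]
    # S_hat(f, t) = sum over channels of conj(w[f, c]) * Y[c, f, t]
    enhanced = np.einsum("fc,cft->ft", weights.conj(), specgram)
    return weights, enhanced


# Synthetic inputs: rank-1 speech PSD and a Hermitian positive-definite noise PSD.
rng = np.random.default_rng(0)
n_ch, n_freq, n_time = 4, 3, 5
h = rng.standard_normal((n_freq, n_ch)) + 1j * rng.standard_normal((n_freq, n_ch))
psd_s = h[:, :, None] * h[:, None, :].conj()
a = rng.standard_normal((n_freq, n_ch, n_ch)) + 1j * rng.standard_normal((n_freq, n_ch, n_ch))
psd_n = a @ a.conj().transpose(0, 2, 1) + np.eye(n_ch)
specgram = rng.standard_normal((n_ch, n_freq, n_time)) + 1j * rng.standard_normal((n_ch, n_freq, n_time))

weights, enhanced = souden_mvdr_sketch(specgram, psd_s, psd_n, reference_channel=0)
print(enhanced.shape)  # (3, 5)
```

With a rank-1 speech PSD built from a steering vector `h`, and with diagonal loading and `eps` switched off, the weights satisfy `w^H h = h[reference_channel]`, which is the distortionless property with respect to the reference microphone.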

@facebook-github-bot
Contributor

@nateanl has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

torchaudio/transforms/_transforms.py (outdated)
specgram (Tensor): Multi-channel complex-valued spectrum.
    Tensor of dimension `(..., channel, freq, time)`
psd_s (Tensor): The complex-valued power spectral density (PSD) matrix of target speech.
    Tensor of dimension `(..., freq, channel, channel)`
Contributor

is there a check somewhere that enforces equality between the last two dimensions?

Member Author

Good point. Adding such check helps users better understand the module usage.

Member Author

This check is applicable to F.rtf_power, F.rtf_evd, F.mvdr_weights_rtf, F.mvdr_weights_souden. It's better to add it in a separate PR.

Contributor

sure sounds good
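
For reference, the kind of check discussed in this thread could look like the sketch below. It is a hedged illustration, not the check that was later added to torchaudio: `check_square_psd` is a hypothetical helper, and it works on a plain shape sequence (such as `tensor.shape`) so that it does not depend on any tensor library.

```python
def check_square_psd(shape, name="psd"):
    """Raise ValueError unless the last two dimensions of `shape` are equal.

    `shape` is any sequence of ints, for example `tensor.shape`.
    `name` is only used to make the error message readable.
    """
    if len(shape) < 2:
        raise ValueError(
            f"{name} must have at least 2 dimensions, got shape {tuple(shape)}"
        )
    if shape[-1] != shape[-2]:
        raise ValueError(
            f"The last two dimensions of {name} must be equal "
            f"(channel, channel), got shape {tuple(shape)}"
        )


check_square_psd((257, 4, 4), "psd_s")  # passes silently

try:
    check_square_psd((257, 4, 3), "psd_n")
except ValueError as err:
    print(err)  # reports the mismatched trailing dimensions
```

The same helper could be called at the top of each function that takes a PSD matrix, which matches the point above that the check applies to several functions and not only to this module.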

Contributor

@hwangjeff hwangjeff left a comment

lg — just some small things

Given the multi-channel complex-valued spectrum :math:`\textbf{Y}`, the power spectral density (PSD) matrix
of target speech :math:`\bf{\Phi}_{\textbf{SS}}`, the PSD matrix of noise :math:`\bf{\Phi}_{\textbf{NN}}`, and
a one-hot vector that represents the reference channel :math:`\bf{u}`, the module computes the single-channel
complex-valued spectrum of the enhaned speech :math:`\hat{\textbf{S}}`. The formula is defined as:
Contributor

Suggested change
- complex-valued spectrum of the enhaned speech :math:`\hat{\textbf{S}}`. The formula is defined as:
+ complex-valued spectrum of the enhanced speech :math:`\hat{\textbf{S}}`. The formula is defined as:

.. math::
    \textbf{w}_{\text{MVDR}}(f) =
    \frac{{{\bf{\Phi}_{\textbf{NN}}^{-1}}(f){\bf{\Phi}_{\textbf{SS}}}}(f)}
    {\text{Trace}({{{\bf{\Phi}_{\textbf{NN}}^{-1}}(f) \bf{\Phi}_{\textbf{SS}}}(f))}}\bm{u}
Contributor

is "tr" more standard?

Suggested change
- {\text{Trace}({{{\bf{\Phi}_{\textbf{NN}}^{-1}}(f) \bf{\Phi}_{\textbf{SS}}}(f))}}\bm{u}
+ {\text{tr}({{{\bf{\Phi}_{\textbf{NN}}^{-1}}(f) \bf{\Phi}_{\textbf{SS}}}(f))}}\bm{u}

Member Author

I have seen different notations in different publications: "trace" in https://www.merl.com/publications/docs/TR2016-072.pdf, "Trace" in https://arxiv.org/pdf/2005.10479.pdf, and "Tr" in https://ieeexplore.ieee.org/abstract/document/7952756.
Since there is no single standard, we can keep "Trace" here because it is the easiest to understand.

4 participants