add audio doc #5299

Merged
merged 12 commits into from
Oct 20, 2022
Changes from 2 commits
42 changes: 42 additions & 0 deletions docs/api/paddle/audio/Overview_cn.rst
@@ -0,0 +1,42 @@
.. _cn_overview_callbacks:

paddle.audio
---------------------

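The paddle.audio directory contains PaddlePaddle's high-level APIs for the audio domain, as listed below:

- :ref:`Audio feature APIs <about_features>`
- :ref:`Basic audio processing function APIs <about_functional>`

.. _about_features:

Audio feature APIs
::::::::::::::::::::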

.. csv-table::
    :header: "API name", "Description"
    :widths: 10, 30

    " :ref:`LogMelSpectrogram <cn_api_audio_features_LogMelSpectrogram>` ", "Computes the LogMelSpectrogram audio feature"
Review comment (Collaborator): Several of the hyperlinks here are broken.
[image]

    " :ref:`MelSpectrogram <cn_api_audio_features_MelSpectrogram>` ", "Computes the MelSpectrogram audio feature"
    " :ref:`MFCC <cn_api_audio_features_MFCC>` ", "Computes the MFCC audio feature"
    " :ref:`Spectrogram <cn_api_audio_features_Spectrogram>` ", "Computes the Spectrogram audio feature"

.. _about_functional:

Basic audio processing function APIs
::::::::::::::::::::::::::::::::::::::

.. csv-table::
    :header: "API name", "Description"
    :widths: 10, 30

    " :ref:`compute_fbank_matrix <cn_api_audio_functional_compute_fbank_matrix>` ", "Computes the fbank matrix"
    " :ref:`create_dct <cn_api_audio_functional_create_dct>` ", "Computes the discrete cosine transform (DCT) matrix"
    " :ref:`fft_frequencies <cn_api_audio_functional_fft_frequencies>` ", "Computes the sample frequencies of the discrete Fourier transform"
    " :ref:`hz_to_mel <cn_api_audio_functional_hz_to_mel>` ", "Converts a frequency in Hz to mels"
    " :ref:`mel_to_hz <cn_api_audio_functional_mel_to_hz>` ", "Converts a frequency in mels to Hz"
    " :ref:`mel_frequencies <cn_api_audio_functional_mel_frequencies>` ", "Computes mel frequencies"
    " :ref:`power_to_db <cn_api_audio_functional_power_to_db>` ", "Converts a power spectrogram to decibels"
    " :ref:`get_window <cn_api_audio_window_get_window>` ", "Returns one of several window functions"

40 changes: 40 additions & 0 deletions docs/api/paddle/audio/features/LogMelSpectrogram_cn.rst
@@ -0,0 +1,40 @@
.. _cn_api_audio_features_LogMelSpectrogram:

LogMelSpectrogram
-------------------------------

Review comment (Collaborator): LogMelSpectrogram has this many parameters, so each of them needs to be documented, along with the source code link. For a class, the page should follow this layout:
[image]

In short, the page needs to be complete.

Reply (Contributor Author): Some parameters have no default value; for those that do, the defaults have been added. I am not sure what is going on with the source code link.

.. py:class:: paddle.audio.features.LogMelSpectrogram(sr=22050, n_fft=2048, hop_length=512, win_length=None, window='hann', power=2.0, center=True, pad_mode='reflect', n_mels=64, f_min=50.0, f_max=None, htk=False, norm='slaney', ref_value=1.0, amin=1e-10, top_db=None, dtype='float32')

Computes the log-mel spectrogram of a given signal.

Review comment (Collaborator): Could you add a formula together with an explanation of its parameters? Otherwise users may not be able to tell how this is computed.
[image]

Reply (Contributor Author): I would suggest not adding one. This is a standard signal-processing feature, and reading the source code is more direct than a formula.

Parameters
::::::::::::

- **sr** (int, optional) - Sampling rate. Defaults to 22050.
- **n_fft** (int) - Size of the frequency window in the discrete Fourier transform. Defaults to 2048.
- **hop_length** (Optional[int]) - Hop length (frame shift). Defaults to 512.
- **win_length** (Optional[int]) - Window length of the short-time FFT. Defaults to None.
- **window** (str) - Name of the window function. Defaults to 'hann'.
- **power** (float) - Exponent applied to the magnitude spectrum. Defaults to 2.0.
- **center** (bool) - Whether to pad the input signal. If True, frame t is centered at t*hop_length; if False, frame t starts at t*hop_length. Defaults to True.
- **pad_mode** (str) - Padding mode used when center is True. Defaults to 'reflect'.
- **n_mels** (int) - Number of mel bins. Defaults to 64.
- **f_min** (float, optional) - Minimum frequency in Hz. Defaults to 50.0.
- **f_max** (float, optional) - Maximum frequency in Hz. Defaults to None.
- **htk** (bool, optional) - Whether to use the HTK formula for scaling when computing the fbank matrix. Defaults to False.
- **norm** (Union[str, float], optional) - Type of normalization used when computing the fbank matrix. Defaults to 'slaney'. A float such as norm=0.5 can also be passed to use p-norm normalization.
- **ref_value** (float) - Reference value. If it is smaller than 1.0, the dB level of the signal is raised; otherwise the dB level is lowered. Defaults to 1.0.
- **amin** (float) - Minimum value allowed for the input magnitude. Defaults to 1e-10.
- **top_db** (Optional[float]) - Maximum value of the log-mel spectrogram, in dB. Defaults to None.
- **dtype** (str) - Data type of the input and the window. Defaults to 'float32'.


Returns
:::::::::

A callable object that computes ``LogMelSpectrogram``.

Code example
::::::::::::

COPY-FROM: paddle.audio.features.LogMelSpectrogram
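
The effect of ``center`` on the number of output frames can be made concrete with a little arithmetic. The following is a minimal sketch, assuming the layer frames the signal the way a conventional STFT does; ``num_frames_sketch`` is a hypothetical helper for illustration and is not part of paddle.audio.

```python
def num_frames_sketch(num_samples, n_fft=2048, hop_length=512, center=True):
    """Frame count of a conventional STFT (illustrative assumption only)."""
    if center:
        # Padding adds n_fft // 2 samples on each side, so frame t is
        # centered at sample t * hop_length of the original signal.
        padded = num_samples + 2 * (n_fft // 2)
    else:
        # No padding: frame t starts at sample t * hop_length.
        padded = num_samples
    return 1 + (padded - n_fft) // hop_length


one_second = 22050  # number of samples in one second at the default sr
frames_centered = num_frames_sketch(one_second, center=True)
frames_uncentered = num_frames_sketch(one_second, center=False)
print(frames_centered, frames_uncentered)
```

Under these assumptions, one second of audio with the default ``n_mels=64`` would be expected to produce a log-mel matrix with 64 rows and ``frames_centered`` columns per input signal.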
40 changes: 40 additions & 0 deletions docs/api/paddle/audio/features/MFCC_cn.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,40 @@
.. _cn_api_audio_features_MFCC:

MFCC
-------------------------------

.. py:class:: paddle.audio.features.MFCC(sr=22050, n_mfcc=40, n_fft=2048, hop_length=512, win_length=None, window='hann', power=2.0, center=True, pad_mode='reflect', n_mels=64, f_min=50.0, f_max=None, htk=False, norm='slaney', ref_value=1.0, amin=1e-10, top_db=None, dtype='float32')

Review comment (Collaborator): Same issue here; please add:
[image]

Review comment (Collaborator): If there is a formula, please add it so that users can understand this method.

Computes the MFCCs of a given signal.

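Parameters
::::::::::::

- **sr** (int, optional) - Sampling rate. Defaults to 22050.
- **n_mfcc** (int, optional) - Number of MFCC coefficients (feature dimension). Defaults to 40.
- **n_fft** (int) - Size of the frequency window in the discrete Fourier transform. Defaults to 2048.
- **hop_length** (Optional[int]) - Hop length (frame shift). Defaults to 512.
- **win_length** (Optional[int]) - Window length of the short-time FFT. Defaults to None.
- **window** (str) - Name of the window function. Defaults to 'hann'.
- **power** (float) - Exponent applied to the magnitude spectrum. Defaults to 2.0.
- **center** (bool) - Whether to pad the input signal. If True, frame t is centered at t*hop_length; if False, frame t starts at t*hop_length. Defaults to True.
- **pad_mode** (str) - Padding mode used when center is True. Defaults to 'reflect'.
- **n_mels** (int) - Number of mel bins. Defaults to 64.
- **f_min** (float, optional) - Minimum frequency in Hz. Defaults to 50.0.
- **f_max** (float, optional) - Maximum frequency in Hz. Defaults to None.
- **htk** (bool, optional) - Whether to use the HTK formula for scaling when computing the fbank matrix. Defaults to False.
- **norm** (Union[str, float], optional) - Type of normalization used when computing the fbank matrix. Defaults to 'slaney'. A float such as norm=0.5 can also be passed to use p-norm normalization.
- **ref_value** (float) - Reference value. If it is smaller than 1.0, the dB level of the signal is raised; otherwise the dB level is lowered. Defaults to 1.0.
- **amin** (float) - Minimum value allowed for the input magnitude. Defaults to 1e-10.
- **top_db** (Optional[float]) - Maximum value of the log-mel spectrogram, in dB. Defaults to None.
- **dtype** (str) - Data type of the input and the window. Defaults to 'float32'.

Returns
:::::::::

A callable object that computes ``MFCC``.

Code example
::::::::::::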

COPY-FROM: paddle.audio.features.MFCC
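
Conceptually, MFCCs are obtained by applying a discrete cosine transform to each log-mel frame and keeping the first ``n_mfcc`` coefficients. The sketch below applies an orthonormal DCT-II to a single frame in plain Python. ``dct_ortho_sketch`` and the frame values are hypothetical, and the choice of DCT-II with orthonormal scaling is an assumption based on common practice, not something confirmed by this page.

```python
import math


def dct_ortho_sketch(log_mel_frame, n_mfcc):
    """Orthonormal DCT-II of one log-mel frame, truncated to n_mfcc values."""
    n_mels = len(log_mel_frame)
    coeffs = []
    for k in range(n_mfcc):
        total = sum(
            x * math.cos(math.pi / n_mels * (n + 0.5) * k)
            for n, x in enumerate(log_mel_frame)
        )
        # The k == 0 basis vector gets a smaller scale so that all basis
        # vectors have unit length.
        scale = math.sqrt(1.0 / n_mels) if k == 0 else math.sqrt(2.0 / n_mels)
        coeffs.append(scale * total)
    return coeffs


flat_frame = [-20.0] * 8  # a spectrally flat log-mel frame (made-up values)
mfcc = dct_ortho_sketch(flat_frame, n_mfcc=4)
print(mfcc)
```

For a flat frame only the zeroth coefficient is non-zero, which illustrates why the higher coefficients are read as a description of spectral shape rather than overall level.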
37 changes: 37 additions & 0 deletions docs/api/paddle/audio/features/MelSpectrogram_cn.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,37 @@
.. _cn_api_audio_features_MelSpectrogram:

MelSpectrogram
-------------------------------

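.. py:class:: paddle.audio.features.MelSpectrogram(sr=22050, n_fft=2048, hop_length=512, win_length=None, window='hann', power=2.0, center=True, pad_mode='reflect', n_mels=64, f_min=50.0, f_max=None, htk=False, norm='slaney', dtype='float32')

Computes the mel spectrogram of a given signal.

Parameters
::::::::::::

- **sr** (int, optional) - Sampling rate. Defaults to 22050.
- **n_fft** (int) - Size of the frequency window in the discrete Fourier transform. Defaults to 2048.
- **hop_length** (Optional[int]) - Hop length (frame shift). Defaults to 512.
- **win_length** (Optional[int]) - Window length of the short-time FFT. Defaults to None.
- **window** (str) - Name of the window function. Defaults to 'hann'.
- **power** (float) - Exponent applied to the magnitude spectrum. Defaults to 2.0.
- **center** (bool) - Whether to pad the input signal. If True, frame t is centered at t*hop_length; if False, frame t starts at t*hop_length. Defaults to True.
- **pad_mode** (str) - Padding mode used when center is True. Defaults to 'reflect'.
- **n_mels** (int) - Number of mel bins. Defaults to 64.
- **f_min** (float, optional) - Minimum frequency in Hz. Defaults to 50.0.
- **f_max** (float, optional) - Maximum frequency in Hz. Defaults to None.
- **htk** (bool, optional) - Whether to use the HTK formula for scaling when computing the fbank matrix. Defaults to False.
- **norm** (Union[str, float], optional) - Type of normalization used when computing the fbank matrix. Defaults to 'slaney'. A float such as norm=0.5 can also be passed to use p-norm normalization.
- **dtype** (str) - Data type of the input and the window. Defaults to 'float32'.


Returns
:::::::::

A callable object that computes ``MelSpectrogram``.

Code example
::::::::::::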

COPY-FROM: paddle.audio.features.MelSpectrogram
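
A mel spectrogram is commonly described as the product of a filter bank matrix and a spectrogram. The sketch below shows that single step with nested lists. ``mel_spectrogram_sketch`` and the toy matrices are hypothetical and only illustrate the shapes involved; they say nothing about how the layer is implemented internally.

```python
def mel_spectrogram_sketch(fbank, spec):
    """Multiply fbank [n_mels][n_bins] by spec [n_bins][num_frames]."""
    num_frames = len(spec[0])
    return [
        [sum(w * spec[k][t] for k, w in enumerate(row)) for t in range(num_frames)]
        for row in fbank
    ]


# Two overlapping filters over three frequency bins (made-up weights).
toy_fbank = [
    [1.0, 0.5, 0.0],
    [0.0, 0.5, 1.0],
]
# Three frequency bins by two frames (made-up power values).
toy_spec = [
    [1.0, 2.0],
    [2.0, 4.0],
    [3.0, 6.0],
]
mel = mel_spectrogram_sketch(toy_fbank, toy_spec)
print(mel)
```

The result has one row per mel filter and one column per frame, so the frequency axis shrinks from ``n_fft//2 + 1`` bins to ``n_mels`` bins while the time axis is unchanged.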
31 changes: 31 additions & 0 deletions docs/api/paddle/audio/features/Spectrogram_cn.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,31 @@
.. _cn_api_audio_features_Spectrogram:

Spectrogram
-------------------------------

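.. py:class:: paddle.audio.features.Spectrogram(n_fft=512, hop_length=512, win_length=None, window='hann', power=1.0, center=True, pad_mode='reflect', dtype='float32')

Computes the spectrogram of a given signal via the short-time Fourier transform.

Parameters
::::::::::::

- **n_fft** (int) - Size of the frequency window in the discrete Fourier transform. Defaults to 512.
- **hop_length** (Optional[int]) - Hop length (frame shift). Defaults to 512.
- **win_length** (Optional[int]) - Window length of the short-time FFT. Defaults to None.
- **window** (str) - Name of the window function. Defaults to 'hann'.
- **power** (float) - Exponent applied to the magnitude spectrum. Defaults to 1.0.
- **center** (bool) - Whether to pad the input signal. If True, frame t is centered at t*hop_length; if False, frame t starts at t*hop_length. Defaults to True.
- **pad_mode** (str) - Padding mode used when center is True. Defaults to 'reflect'.
- **dtype** (str) - Data type of the input and the window. Defaults to 'float32'.


Returns
:::::::::

A callable object that computes ``Spectrogram``.

Code example
::::::::::::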

COPY-FROM: paddle.audio.features.Spectrogram
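
To show what a short-time Fourier transform spectrogram computes, here is a deliberately naive sketch in plain Python for the ``center=False`` case. ``spectrogram_sketch`` is a hypothetical helper; the periodic Hann window and the ``[n_fft//2 + 1][num_frames]`` output layout are assumptions chosen to mirror common STFT conventions, and the direct DFT is far slower than a real FFT.

```python
import cmath
import math


def spectrogram_sketch(signal, n_fft=8, hop_length=4, power=1.0):
    """Naive STFT magnitude; frame t starts at sample t * hop_length."""
    # Periodic Hann window of length n_fft.
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * n / n_fft) for n in range(n_fft)]
    num_frames = 1 + (len(signal) - n_fft) // hop_length
    n_bins = n_fft // 2 + 1
    spec = [[0.0] * num_frames for _ in range(n_bins)]
    for t in range(num_frames):
        frame = signal[t * hop_length : t * hop_length + n_fft]
        for k in range(n_bins):
            # Direct DFT of the windowed frame at frequency bin k.
            acc = sum(
                w * x * cmath.exp(-2j * math.pi * k * n / n_fft)
                for n, (w, x) in enumerate(zip(window, frame))
            )
            spec[k][t] = abs(acc) ** power
    return spec


# A cosine that sits exactly on frequency bin 2 of an 8-point DFT.
tone = [math.cos(2.0 * math.pi * 2 * n / 8) for n in range(16)]
spec = spectrogram_sketch(tone, n_fft=8, hop_length=4)
peak_bins = [max(range(len(spec)), key=lambda k: spec[k][t]) for t in range(len(spec[0]))]
print(peak_bins)
```

Every frame peaks at bin 2, the bin the test tone was placed on, and the neighbouring bins carry the leakage introduced by the window.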
30 changes: 30 additions & 0 deletions docs/api/paddle/audio/functional/compute_fbank_matrix_cn.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,30 @@
.. _cn_api_audio_functional_compute_fbank_matrix:

compute_fbank_matrix
-------------------------------

.. py:function:: paddle.audio.functional.compute_fbank_matrix(sr, n_fft, n_mels=64, f_min=0.0, f_max=None, htk=False, norm='slaney', dtype='float32')

Review comment (Collaborator): For how functional APIs should be documented, refer to:

[image]

Computes the mel transform matrix.

Review comment (Collaborator): Please add a formula.


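Parameters
::::::::::::

- **sr** (int) - Sampling rate.
- **n_fft** (int) - Number of FFT bins.
- **n_mels** (int) - Number of mel bins. Defaults to 64.
- **f_min** (float) - Minimum frequency in Hz. Defaults to 0.0.
- **f_max** (Optional[float]) - Maximum frequency in Hz. Defaults to None.
- **htk** (bool) - Whether to use HTK scaling. Defaults to False.
- **norm** (Union[str, float]) - Type of normalization. Defaults to 'slaney'.
- **dtype** (str) - Data type of the returned matrix. Defaults to 'float32'.

Returns
:::::::::

``paddle.Tensor``, with shape (n_mels, n_fft//2 + 1).

Code example
::::::::::::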

COPY-FROM: paddle.audio.functional.compute_fbank_matrix
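
As a rough picture of what a mel filter bank matrix contains, the sketch below builds triangular filters whose edges are evenly spaced on the mel scale. It is a plain-Python illustration under stated assumptions: ``fbank_matrix_sketch`` and its helpers are hypothetical names, only the HTK mel formula is used, and the ``'slaney'`` branch applies the commonly used area normalization ``2 / (right - left)``. The library function may differ in details.

```python
import math


def _hz_to_mel_htk(freq):
    return 2595.0 * math.log10(1.0 + freq / 700.0)


def _mel_to_hz_htk(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def fbank_matrix_sketch(sr, n_fft, n_mels=4, f_min=0.0, f_max=None, norm=None):
    """Triangular mel filters as nested lists, shape [n_mels][n_fft // 2 + 1]."""
    f_max = sr / 2.0 if f_max is None else f_max
    n_bins = n_fft // 2 + 1
    fft_freqs = [k * sr / n_fft for k in range(n_bins)]
    # n_mels + 2 edge frequencies, evenly spaced on the mel scale.
    lo, hi = _hz_to_mel_htk(f_min), _hz_to_mel_htk(f_max)
    edges = [_mel_to_hz_htk(lo + i * (hi - lo) / (n_mels + 1)) for i in range(n_mels + 2)]
    weights = []
    for m in range(n_mels):
        left, center, right = edges[m], edges[m + 1], edges[m + 2]
        scale = 2.0 / (right - left) if norm == "slaney" else 1.0
        row = []
        for f in fft_freqs:
            rising = (f - left) / (center - left)
            falling = (right - f) / (right - center)
            # Triangle: the lower of the two slopes, clipped at zero.
            row.append(scale * max(0.0, min(rising, falling)))
        weights.append(row)
    return weights


fbank = fbank_matrix_sketch(sr=16000, n_fft=64, n_mels=4)
print(len(fbank), len(fbank[0]))
```

Each row is one filter, so multiplying this matrix by a spectrogram with ``n_fft//2 + 1`` frequency bins yields ``n_mels`` mel bands.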
Review comment (Collaborator): Same here: the example code block could not be found.
[image]

26 changes: 26 additions & 0 deletions docs/api/paddle/audio/functional/create_dct_cn.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,26 @@
.. _cn_api_audio_functional_create_dct:

create_dct
-------------------------------

.. py:function:: paddle.audio.functional.create_dct(n_mfcc, n_mels, norm='ortho', dtype='float32')

Computes the discrete cosine transform (DCT) matrix.

Parameters
::::::::::::

- **n_mfcc** (int) - Number of mel-frequency cepstral coefficients.
- **n_mels** (int) - Number of mel filter banks.
- **norm** (str, optional) - Normalization type. Defaults to 'ortho'.
- **dtype** (str) - Data type. Defaults to 'float32'.

Returns
:::::::::

``paddle.Tensor``, with shape (n_mels, n_mfcc).

Code example
::::::::::::

COPY-FROM: paddle.audio.functional.create_dct
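
For readers who want to see what such a matrix looks like, the following sketch builds a DCT-II matrix with the ``(n_mels, n_mfcc)`` layout documented above. ``create_dct_sketch`` is a hypothetical plain-Python helper, and the DCT-II definition with ``'ortho'`` scaling is an assumption based on the usual convention for MFCC computation.

```python
import math


def create_dct_sketch(n_mfcc, n_mels, norm="ortho"):
    """DCT-II matrix as nested lists with shape [n_mels][n_mfcc]."""
    dct = []
    for n in range(n_mels):
        row = []
        for k in range(n_mfcc):
            value = math.cos(math.pi / n_mels * (n + 0.5) * k)
            if norm == "ortho":
                # Scale so that the columns form an orthonormal set.
                value *= math.sqrt(1.0 / n_mels) if k == 0 else math.sqrt(2.0 / n_mels)
            else:
                value *= 2.0
            row.append(value)
        dct.append(row)
    return dct


dct = create_dct_sketch(n_mfcc=4, n_mels=4)


def column_dot(matrix, i, j):
    return sum(row[i] * row[j] for row in matrix)


gram = [[column_dot(dct, i, j) for j in range(4)] for i in range(4)]
print(gram)
```

With ``'ortho'`` scaling the columns are orthonormal, so ``gram`` is the identity matrix up to rounding error.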
25 changes: 25 additions & 0 deletions docs/api/paddle/audio/functional/fft_frequencies_cn.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,25 @@
.. _cn_api_audio_functional_fft_frequencies:

fft_frequencies
-------------------------------

.. py:function:: paddle.audio.functional.fft_frequencies(sr, n_fft, dtype='float32')

Computes the FFT frequencies.

Parameters
::::::::::::

- **sr** (int) - Sampling rate.
- **n_fft** (int) - Number of FFT bins.
- **dtype** (str) - Data type. Defaults to 'float32'.

Returns
:::::::::

``paddle.Tensor``, with shape (n_fft//2 + 1,).

Code example
::::::::::::

COPY-FROM: paddle.audio.functional.fft_frequencies
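
The returned values are the center frequencies of the non-negative FFT bins, evenly spaced from 0 Hz up to the Nyquist frequency ``sr / 2``. A minimal plain-Python sketch of that relationship, where ``fft_frequencies_sketch`` is a hypothetical name used only for illustration:

```python
def fft_frequencies_sketch(sr, n_fft):
    """Center frequency in Hz of each of the n_fft // 2 + 1 non-negative bins."""
    n_bins = n_fft // 2 + 1
    return [k * sr / n_fft for k in range(n_bins)]


freqs = fft_frequencies_sketch(sr=16000, n_fft=512)
print(len(freqs), freqs[1], freqs[-1])
```

The spacing between neighbouring bins is ``sr / n_fft``, so a larger ``n_fft`` gives finer frequency resolution at the same sampling rate.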
24 changes: 24 additions & 0 deletions docs/api/paddle/audio/functional/hz_to_mel_cn.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,24 @@
.. _cn_api_audio_functional_hz_to_mel:

hz_to_mel
-------------------------------

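.. py:function:: paddle.audio.functional.hz_to_mel(freq, htk=False)

Converts Hz to mels.

Parameters
::::::::::::

- **freq** (Tensor, float) - Input frequency in Hz, given as a tensor or a float.
- **htk** (bool) - Whether to use HTK scaling. Defaults to False.

Returns
:::::::::

``paddle.Tensor`` or float: the value in mels.

Code example
::::::::::::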

COPY-FROM: paddle.audio.functional.hz_to_mel
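
Two mel-scale conventions are in common use, and the ``htk`` flag selects between them. The scalar-only sketch below writes both out. ``hz_to_mel_sketch`` is a hypothetical helper, and treating ``htk=False`` as the Slaney-style scale (linear below 1 kHz, logarithmic above) is an assumption based on the convention popularized by other audio libraries.

```python
import math


def hz_to_mel_sketch(freq, htk=False):
    """Convert one frequency in Hz to mels (scalar-only illustration)."""
    if htk:
        # HTK formula.
        return 2595.0 * math.log10(1.0 + freq / 700.0)
    # Slaney-style scale: linear below 1 kHz, logarithmic above.
    f_sp = 200.0 / 3.0
    min_log_hz = 1000.0
    min_log_mel = min_log_hz / f_sp
    logstep = math.log(6.4) / 27.0
    if freq >= min_log_hz:
        return min_log_mel + math.log(freq / min_log_hz) / logstep
    return freq / f_sp


slaney_1k = hz_to_mel_sketch(1000.0)
htk_1k = hz_to_mel_sketch(1000.0, htk=True)
print(slaney_1k, htk_1k)
```

The two conventions give very different numbers for the same frequency, so the same ``htk`` setting has to be used wherever mel values are produced and consumed.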
27 changes: 27 additions & 0 deletions docs/api/paddle/audio/functional/mel_frequencies_cn.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,27 @@
.. _cn_api_audio_functional_mel_frequencies:

mel_frequencies
-------------------------------

.. py:function:: paddle.audio.functional.mel_frequencies(n_mels=64, f_min=0.0, f_max=11025.0, htk=False, dtype='float32')

Computes mel frequencies.

Parameters
::::::::::::

- **n_mels** (int) - Number of mel bins. Defaults to 64.
- **f_min** (float) - Minimum frequency in Hz. Defaults to 0.0.
- **f_max** (float) - Maximum frequency in Hz. Defaults to 11025.0.
- **htk** (bool) - Whether to use HTK scaling. Defaults to False.
- **dtype** (str) - Data type. Defaults to 'float32'.

Returns
:::::::::

``paddle.Tensor``, with shape (n_mels,).

Code example
::::::::::::

COPY-FROM: paddle.audio.functional.mel_frequencies
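
Mel frequencies are typically obtained by spacing points evenly on the mel scale between ``f_min`` and ``f_max`` and converting them back to Hz. The sketch below does this with the HTK formulas only. ``mel_frequencies_sketch`` is a hypothetical helper that expects ``n_mels >= 2``, and the default ``htk=False`` behaviour of the library function is not reproduced here.

```python
import math


def mel_frequencies_sketch(n_mels=64, f_min=0.0, f_max=11025.0):
    """n_mels frequencies in Hz, evenly spaced on the HTK mel scale."""

    def to_mel(freq):
        return 2595.0 * math.log10(1.0 + freq / 700.0)

    def to_hz(mel):
        return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

    lo, hi = to_mel(f_min), to_mel(f_max)
    step = (hi - lo) / (n_mels - 1)
    return [to_hz(lo + i * step) for i in range(n_mels)]


mel_freqs = mel_frequencies_sketch(n_mels=5, f_min=0.0, f_max=8000.0)
print(mel_freqs)
```

The points are equally spaced in mels but not in Hz: the gaps widen toward high frequencies, which is the property the mel scale is designed to have.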
24 changes: 24 additions & 0 deletions docs/api/paddle/audio/functional/mel_to_hz_cn.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,24 @@
.. _cn_api_audio_functional_mel_to_hz:

mel_to_hz
-------------------------------

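.. py:function:: paddle.audio.functional.mel_to_hz(mel, htk=False)

Converts mels to Hz.

Parameters
::::::::::::

- **mel** (Tensor, float) - Input value in mels, given as a tensor or a float.
- **htk** (bool) - Whether to use HTK scaling. Defaults to False.

Returns
:::::::::

``paddle.Tensor`` or float: the frequency in Hz.

Code example
::::::::::::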

COPY-FROM: paddle.audio.functional.mel_to_hz
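
This conversion is the inverse of the Hz-to-mel mapping. The scalar-only sketch below inverts both common conventions. ``mel_to_hz_sketch`` is a hypothetical helper, and treating ``htk=False`` as the Slaney-style scale is an assumption rather than something stated on this page.

```python
import math


def mel_to_hz_sketch(mel, htk=False):
    """Convert one value in mels to Hz (scalar-only illustration)."""
    if htk:
        # Inverse of the HTK formula.
        return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)
    # Inverse of the Slaney-style scale: linear part, then exponential part.
    f_sp = 200.0 / 3.0
    min_log_hz = 1000.0
    min_log_mel = min_log_hz / f_sp
    logstep = math.log(6.4) / 27.0
    if mel >= min_log_mel:
        return min_log_hz * math.exp(logstep * (mel - min_log_mel))
    return f_sp * mel


slaney_hz = mel_to_hz_sketch(42.0)
htk_hz = mel_to_hz_sketch(2595.0, htk=True)
print(slaney_hz, htk_hz)
```

On the Slaney-style scale, 27 mels above the 1 kHz break point correspond to a factor of 6.4 in frequency, which is where the ``logstep`` constant comes from.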
26 changes: 26 additions & 0 deletions docs/api/paddle/audio/functional/power_to_db_cn.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,26 @@
.. _cn_api_audio_functional_power_to_db:

power_to_db
-------------------------------

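.. py:function:: paddle.audio.functional.power_to_db(spect, ref_value=1.0, amin=1e-10, top_db=80.0)

Converts a power spectrogram to decibel units.

Parameters
::::::::::::

- **spect** (Tensor) - Input tensor holding the STFT power spectrogram.
- **ref_value** (float) - Reference value; the amplitude is scaled relative to ref_value. Defaults to 1.0.
- **amin** (float) - Minimum threshold. Defaults to 1e-10.
- **top_db** (Optional[float]) - Threshold. Defaults to 80.0.

Returns
:::::::::

``paddle.Tensor`` or float: the power spectrogram in dB.

Code example
::::::::::::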

COPY-FROM: paddle.audio.functional.power_to_db
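
The roles of ``ref_value``, ``amin`` and ``top_db`` are easiest to see in a small worked computation. The following is a minimal sketch on a flat list of power values, assuming the usual ``10 * log10(power / ref)`` definition with ``amin`` as a floor and ``top_db`` as a limit on the dynamic range. ``power_to_db_sketch`` is a hypothetical helper, not the library function.

```python
import math


def power_to_db_sketch(spect, ref_value=1.0, amin=1e-10, top_db=80.0):
    """Convert a flat list of non-negative power values to decibels."""
    offset = 10.0 * math.log10(max(ref_value, amin))
    # amin keeps log10 away from zero for silent bins.
    log_spec = [10.0 * math.log10(max(amin, p)) - offset for p in spect]
    if top_db is not None:
        # Nothing may fall more than top_db below the loudest value.
        floor = max(log_spec) - top_db
        log_spec = [max(v, floor) for v in log_spec]
    return log_spec


db = power_to_db_sketch([1.0, 10.0, 100.0])
db_clipped = power_to_db_sketch([1.0, 0.0])
db_unclipped = power_to_db_sketch([1.0, 0.0], top_db=None)
print(db, db_clipped, db_unclipped)
```

A silent bin is first floored at ``amin`` (here -100 dB) and then, when ``top_db`` is set, raised to ``top_db`` below the loudest value (here -80 dB).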