add audio doc #5299
.. _cn_overview_callbacks:

paddle.audio
---------------------

The paddle.audio directory contains PaddlePaddle's high-level APIs for the speech and audio domain, organized as follows:

- :ref:`Audio feature APIs <about_features>`
- :ref:`Basic audio processing function APIs <about_functional>`

.. _about_features:

Audio feature APIs
::::::::::::::::::::

.. csv-table::
    :header: "API name", "Description"
    :widths: 10, 30

    " :ref:`LogMelSpectrogram <cn_api_audio_features_LogMelSpectrogram>` ", "Computes the log-mel spectrogram feature"
    " :ref:`MelSpectrogram <cn_api_audio_features_MelSpectrogram>` ", "Computes the mel spectrogram feature"
    " :ref:`MFCC <cn_api_audio_features_MFCC>` ", "Computes the MFCC feature"
    " :ref:`Spectrogram <cn_api_audio_features_Spectrogram>` ", "Computes the spectrogram feature"

.. _about_functional:

Basic audio processing function APIs
::::::::::::::::::::::::::::::::::::::::

.. csv-table::
    :header: "API name", "Description"
    :widths: 10, 30

    " :ref:`compute_fbank_matrix <cn_api_audio_functional_compute_fbank_matrix>` ", "Computes the mel filterbank (fbank) matrix"
    " :ref:`create_dct <cn_api_audio_functional_create_dct>` ", "Computes the discrete cosine transform (DCT) matrix"
    " :ref:`fft_frequencies <cn_api_audio_functional_fft_frequencies>` ", "Computes the frequencies of the discrete Fourier transform bins"
    " :ref:`hz_to_mel <cn_api_audio_functional_hz_to_mel>` ", "Converts frequencies in Hz to mels"
    " :ref:`mel_to_hz <cn_api_audio_functional_mel_to_hz>` ", "Converts mels to frequencies in Hz"
    " :ref:`mel_frequencies <cn_api_audio_functional_mel_frequencies>` ", "Computes frequencies evenly spaced on the mel scale"
    " :ref:`power_to_db <cn_api_audio_functional_power_to_db>` ", "Converts a power spectrum to decibels"
    " :ref:`get_window <cn_api_audio_window_get_window>` ", "Returns a window function of the given type"

SmileGoat marked this conversation as resolved.

.. _cn_api_audio_features_LogMelSpectrogram:

LogMelSpectrogram
-------------------------------

Review comment: Some parameters have no default value; defaults have been added wherever they exist. It is unclear what is wrong with the source-code link.

.. py:class:: paddle.audio.features.LogMelSpectrogram(sr=22050, n_fft=2048, hop_length=512, win_length=None, window='hann', power=2.0, center=True, pad_mode='reflect', n_mels=64, f_min=50.0, f_max=None, htk=False, norm='slaney', ref_value=1.0, amin=1e-10, top_db=None, dtype='float32')

Computes the log-mel spectrogram of a given signal.

Review comment: I suggest not adding a formula here. This is a common signal-processing feature, and the source code explains it more directly than a formula.

Parameters
::::::::::::

- **sr** (int, optional) - Sampling rate. Default: 22050.
- **n_fft** (int, optional) - Number of FFT points, i.e. the frame size of the discrete Fourier transform. Default: 2048.
- **hop_length** (Optional[int], optional) - Frame shift (hop length) in samples. Default: 512.
- **win_length** (Optional[int], optional) - Window length of the short-time FFT. Default: None, which uses n_fft.
- **window** (str, optional) - Name of the window function. Default: 'hann'.
- **power** (float, optional) - Exponent applied to the magnitude spectrum. Default: 2.0.
- **center** (bool, optional) - Whether to pad the input signal. If True, the t-th frame is centered at t * hop_length; if False, the t-th frame starts at t * hop_length. Default: True.
- **pad_mode** (str, optional) - Padding mode used when center is True. Default: 'reflect'.
- **n_mels** (int, optional) - Number of mel bins. Default: 64.
- **f_min** (float, optional) - Minimum frequency in Hz. Default: 50.0.
- **f_max** (Optional[float], optional) - Maximum frequency in Hz. Default: None, which uses sr / 2.
- **htk** (bool, optional) - Whether to use the HTK formula for mel scaling when computing the fbank matrix. Default: False.
- **norm** (Union[str, float], optional) - Normalization used when computing the fbank matrix. Default: 'slaney'; a number such as norm=0.5 selects p-norm normalization.
- **ref_value** (float, optional) - Reference value for the dB conversion. Values below 1.0 raise the dB level of the signal and values above 1.0 lower it. Default: 1.0.
- **amin** (float, optional) - Lower bound applied to the input magnitude before taking the logarithm. Default: 1e-10.
- **top_db** (Optional[float], optional) - Dynamic range in dB of the log-mel spectrogram: values more than top_db below the peak are clipped. Default: None (no clipping).
- **dtype** (str, optional) - Data type of the input and the window. Default: 'float32'.

Returns
:::::::::

A callable object that computes ``LogMelSpectrogram``.

Code example
::::::::::::

COPY-FROM: paddle.audio.features.LogMelSpectrogram
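A minimal sketch of the final stage of this feature, assuming that ``LogMelSpectrogram`` applies the ``power_to_db`` conversion (with ``ref_value``, ``amin`` and ``top_db``) to the output of ``MelSpectrogram``, clipping against the global peak. It uses nested Python lists of shape (n_mels, n_frames) instead of ``paddle.Tensor``; ``log_mel_sketch`` is a hypothetical name, not part of paddle.audio.

```python
import math


def log_mel_sketch(mel_spec, ref_value=1.0, amin=1e-10, top_db=None):
    """Convert a (n_mels, n_frames) mel power spectrogram to dB.

    Hypothetical helper that mirrors the assumed power_to_db step of
    LogMelSpectrogram on nested lists rather than paddle.Tensor.
    """
    # Clamp to amin before the logarithm to avoid log10(0).
    offset = 10.0 * math.log10(max(amin, ref_value))
    log_spec = [
        [10.0 * math.log10(max(amin, v)) - offset for v in row] for row in mel_spec
    ]
    if top_db is not None:
        # Keep at most top_db of dynamic range below the global peak.
        floor = max(max(row) for row in log_spec) - top_db
        log_spec = [[max(v, floor) for v in row] for row in log_spec]
    return log_spec


mel = [[1.0, 100.0], [0.01, 0.0]]
print(log_mel_sketch(mel))               # top_db=None: no clipping
print(log_mel_sketch(mel, top_db=80.0))  # floor at peak - 80 dB
```

With the default ``top_db=None`` the zero entry maps to ``10 * log10(amin)``; setting ``top_db`` bounds how far below the loudest bin any value can fall.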
.. _cn_api_audio_features_MFCC:

MFCC
-------------------------------

.. py:class:: paddle.audio.features.MFCC(sr=22050, n_mfcc=40, n_fft=2048, hop_length=512, win_length=None, window='hann', power=2.0, center=True, pad_mode='reflect', n_mels=64, f_min=50.0, f_max=None, htk=False, norm='slaney', ref_value=1.0, amin=1e-10, top_db=None, dtype='float32')

Review comment: There is a formula for this; please add it to help users understand the method.

Computes the MFCC (mel-frequency cepstral coefficients) of a given signal.

Parameters
::::::::::::

- **sr** (int, optional) - Sampling rate. Default: 22050.
- **n_mfcc** (int, optional) - Number of MFCC coefficients (the MFCC dimension). Default: 40.
- **n_fft** (int, optional) - Number of FFT points, i.e. the frame size of the discrete Fourier transform. Default: 2048.
- **hop_length** (Optional[int], optional) - Frame shift (hop length) in samples. Default: 512.
- **win_length** (Optional[int], optional) - Window length of the short-time FFT. Default: None, which uses n_fft.
- **window** (str, optional) - Name of the window function. Default: 'hann'.
- **power** (float, optional) - Exponent applied to the magnitude spectrum. Default: 2.0.
- **center** (bool, optional) - Whether to pad the input signal. If True, the t-th frame is centered at t * hop_length; if False, the t-th frame starts at t * hop_length. Default: True.
- **pad_mode** (str, optional) - Padding mode used when center is True. Default: 'reflect'.
- **n_mels** (int, optional) - Number of mel bins. Default: 64.
- **f_min** (float, optional) - Minimum frequency in Hz. Default: 50.0.
- **f_max** (Optional[float], optional) - Maximum frequency in Hz. Default: None, which uses sr / 2.
- **htk** (bool, optional) - Whether to use the HTK formula for mel scaling when computing the fbank matrix. Default: False.
- **norm** (Union[str, float], optional) - Normalization used when computing the fbank matrix. Default: 'slaney'; a number such as norm=0.5 selects p-norm normalization.
- **ref_value** (float, optional) - Reference value for the dB conversion. Values below 1.0 raise the dB level of the signal and values above 1.0 lower it. Default: 1.0.
- **amin** (float, optional) - Lower bound applied to the input magnitude before taking the logarithm. Default: 1e-10.
- **top_db** (Optional[float], optional) - Dynamic range in dB of the log-mel spectrogram: values more than top_db below the peak are clipped. Default: None (no clipping).
- **dtype** (str, optional) - Data type of the input and the window. Default: 'float32'.

Returns
:::::::::

A callable object that computes ``MFCC``.

Code example
::::::::::::

COPY-FROM: paddle.audio.features.MFCC
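A minimal sketch of the last step, assuming MFCC projects each log-mel frame onto an orthonormal DCT-II basis (the ``norm='ortho'`` case of ``create_dct``): $c_k = s_k \sum_{n=0}^{N-1} x_n \cos\big(\frac{\pi}{N}(n + \frac{1}{2})k\big)$, where $N$ is ``n_mels``, $x_n$ is the log-mel value of bin $n$, $s_0 = \sqrt{1/N}$ and $s_k = \sqrt{2/N}$ for $k > 0$. The helpers ``dct_basis`` and ``mfcc_from_log_mel`` are hypothetical and operate on nested lists rather than ``paddle.Tensor``.

```python
import math


def dct_basis(n_mfcc, n_mels):
    """Orthonormal DCT-II basis of shape (n_mels, n_mfcc) as nested lists."""
    basis = [[0.0] * n_mfcc for _ in range(n_mels)]
    for k in range(n_mfcc):
        # sqrt(2 / N) scaling, with an extra 1/sqrt(2) for the DC coefficient.
        scale = math.sqrt(2.0 / n_mels) * (1.0 / math.sqrt(2.0) if k == 0 else 1.0)
        for n in range(n_mels):
            basis[n][k] = scale * math.cos(math.pi / n_mels * (n + 0.5) * k)
    return basis


def mfcc_from_log_mel(log_mel, n_mfcc=40):
    """Project a (n_mels, n_frames) log-mel spectrogram onto the DCT basis.

    Returns nested lists of shape (n_mfcc, n_frames).
    """
    n_mels, n_frames = len(log_mel), len(log_mel[0])
    basis = dct_basis(n_mfcc, n_mels)
    return [
        [sum(basis[n][k] * log_mel[n][t] for n in range(n_mels)) for t in range(n_frames)]
        for k in range(n_mfcc)
    ]


# A flat log-mel frame puts all of its energy into coefficient 0.
flat = [[2.0] for _ in range(16)]
coeffs = mfcc_from_log_mel(flat, n_mfcc=4)
print([round(row[0], 6) for row in coeffs])
```

Because the basis is orthonormal, coefficient 0 captures the overall level of the frame and higher coefficients describe the shape of the mel envelope.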
.. _cn_api_audio_features_MelSpectrogram:

MelSpectrogram
-------------------------------

.. py:class:: paddle.audio.features.MelSpectrogram(sr=22050, n_fft=2048, hop_length=512, win_length=None, window='hann', power=2.0, center=True, pad_mode='reflect', n_mels=64, f_min=50.0, f_max=None, htk=False, norm='slaney', dtype='float32')

Computes the mel spectrogram of a given signal.

Parameters
::::::::::::

- **sr** (int, optional) - Sampling rate. Default: 22050.
- **n_fft** (int, optional) - Number of FFT points, i.e. the frame size of the discrete Fourier transform. Default: 2048.
- **hop_length** (Optional[int], optional) - Frame shift (hop length) in samples. Default: 512.
- **win_length** (Optional[int], optional) - Window length of the short-time FFT. Default: None, which uses n_fft.
- **window** (str, optional) - Name of the window function. Default: 'hann'.
- **power** (float, optional) - Exponent applied to the magnitude spectrum. Default: 2.0.
- **center** (bool, optional) - Whether to pad the input signal. If True, the t-th frame is centered at t * hop_length; if False, the t-th frame starts at t * hop_length. Default: True.
- **pad_mode** (str, optional) - Padding mode used when center is True. Default: 'reflect'.
- **n_mels** (int, optional) - Number of mel bins. Default: 64.
- **f_min** (float, optional) - Minimum frequency in Hz. Default: 50.0.
- **f_max** (Optional[float], optional) - Maximum frequency in Hz. Default: None, which uses sr / 2.
- **htk** (bool, optional) - Whether to use the HTK formula for mel scaling when computing the fbank matrix. Default: False.
- **norm** (Union[str, float], optional) - Normalization used when computing the fbank matrix. Default: 'slaney'; a number such as norm=0.5 selects p-norm normalization.
- **dtype** (str, optional) - Data type of the input and the window. Default: 'float32'.

Returns
:::::::::

A callable object that computes ``MelSpectrogram``.

Code example
::::::::::::

COPY-FROM: paddle.audio.features.MelSpectrogram
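A minimal sketch of how the mel spectrogram is formed, assuming ``MelSpectrogram`` multiplies the filterbank from ``compute_fbank_matrix`` (shape ``(n_mels, n_fft // 2 + 1)``) with a power spectrogram (shape ``(n_fft // 2 + 1, n_frames)``, i.e. ``Spectrogram`` with ``power=2.0``). The toy matrices and the helper names ``matmul`` and ``mel_spectrogram_sketch`` are hypothetical; nested lists stand in for ``paddle.Tensor``.

```python
def matmul(a, b):
    """Nested-list matrix product: (m, p) @ (p, n) -> (m, n)."""
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions do not match")
    return [
        [sum(a[i][p] * b[p][j] for p in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def mel_spectrogram_sketch(power_spec, fbank):
    # power_spec: (n_fft // 2 + 1, n_frames); fbank: (n_mels, n_fft // 2 + 1).
    # Each mel bin is a weighted sum of the linear-frequency bins.
    return matmul(fbank, power_spec)


# Toy example: 3 frequency bins, 2 frames, 2 mel bins.
fbank = [[1.0, 0.5, 0.0],
         [0.0, 0.5, 1.0]]
power_spec = [[1.0, 0.0],
              [2.0, 2.0],
              [0.0, 4.0]]
mel = mel_spectrogram_sketch(power_spec, fbank)  # shape (2, 2)
print(mel)
```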
.. _cn_api_audio_features_Spectrogram:

Spectrogram
-------------------------------

.. py:class:: paddle.audio.features.Spectrogram(n_fft=512, hop_length=512, win_length=None, window='hann', power=1.0, center=True, pad_mode='reflect', dtype='float32')

Computes the spectrogram of a given signal via the short-time Fourier transform.

Parameters
::::::::::::

- **n_fft** (int, optional) - Number of FFT points, i.e. the frame size of the discrete Fourier transform. Default: 512.
- **hop_length** (Optional[int], optional) - Frame shift (hop length) in samples. Default: 512.
- **win_length** (Optional[int], optional) - Window length of the short-time FFT. Default: None, which uses n_fft.
- **window** (str, optional) - Name of the window function. Default: 'hann'.
- **power** (float, optional) - Exponent applied to the magnitude spectrum. Default: 1.0.
- **center** (bool, optional) - Whether to pad the input signal. If True, the t-th frame is centered at t * hop_length; if False, the t-th frame starts at t * hop_length. Default: True.
- **pad_mode** (str, optional) - Padding mode used when center is True. Default: 'reflect'.
- **dtype** (str, optional) - Data type of the input and the window. Default: 'float32'.

Returns
:::::::::

A callable object that computes ``Spectrogram``.

Code example
::::::::::::

COPY-FROM: paddle.audio.features.Spectrogram
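A minimal pure-Python sketch of what this layer computes for a single 1-D signal, assuming ``win_length == n_fft``, a periodic Hann window and numpy-style ``'reflect'`` padding when ``center=True``. It uses a naive DFT (one complex sum per bin) for readability, so it is far slower than an FFT-based implementation; ``spectrogram_sketch``, ``hann_window`` and ``reflect_pad`` are hypothetical names.

```python
import cmath
import math


def hann_window(n):
    # Periodic Hann window, the usual choice for spectral analysis.
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def reflect_pad(x, pad):
    # numpy-style 'reflect' padding: mirror around the edge sample without
    # repeating it. Requires pad < len(x).
    left = [x[i] for i in range(pad, 0, -1)]
    right = [x[-2 - i] for i in range(pad)]
    return left + list(x) + right


def spectrogram_sketch(x, n_fft=512, hop_length=512, power=1.0, center=True):
    """Return |STFT(x)| ** power as nested lists of shape (n_fft // 2 + 1, n_frames)."""
    window = hann_window(n_fft)
    if center:
        # Pad so that frame t is centered at sample t * hop_length.
        x = reflect_pad(x, n_fft // 2)
    n_frames = 1 + (len(x) - n_fft) // hop_length
    n_bins = n_fft // 2 + 1
    spec = [[0.0] * n_frames for _ in range(n_bins)]
    for t in range(n_frames):
        start = t * hop_length
        frame = [x[start + i] * window[i] for i in range(n_fft)]
        for k in range(n_bins):
            # Naive DFT coefficient for bin k.
            coef = sum(
                frame[i] * cmath.exp(-2j * math.pi * k * i / n_fft) for i in range(n_fft)
            )
            spec[k][t] = abs(coef) ** power
    return spec


# A sine that falls exactly on bin 8 of a 64-point frame.
signal = [math.sin(2.0 * math.pi * 8 * i / 64) for i in range(256)]
spec = spectrogram_sketch(signal, n_fft=64, hop_length=32, center=False)
peak_bins = [max(range(len(spec)), key=lambda k: spec[k][t]) for t in range(len(spec[0]))]
print(peak_bins)
```

With ``center=True`` the signal gains ``n_fft // 2`` samples on each side, which yields ``1 + len(x) // hop_length`` frames instead of ``1 + (len(x) - n_fft) // hop_length``.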
.. _cn_api_audio_functional_compute_fbank_matrix:

compute_fbank_matrix
-------------------------------

.. py:function:: paddle.audio.functional.compute_fbank_matrix(sr, n_fft, n_mels=64, f_min=0.0, f_max=None, htk=False, norm='slaney', dtype='float32')

Computes the mel filterbank (mel transform) matrix.

Review comment: Please add a formula.

Parameters
::::::::::::

- **sr** (int) - Sampling rate.
- **n_fft** (int) - Number of FFT points; the matrix covers n_fft // 2 + 1 frequency bins.
- **n_mels** (int, optional) - Number of mel bins. Default: 64.
- **f_min** (float, optional) - Minimum frequency in Hz. Default: 0.0.
- **f_max** (Optional[float], optional) - Maximum frequency in Hz. Default: None, which uses sr / 2.
- **htk** (bool, optional) - Whether to use HTK mel scaling. Default: False.
- **norm** (Union[str, float], optional) - Type of normalization. Default: 'slaney'.
- **dtype** (str, optional) - Data type of the returned matrix. Default: 'float32'.

Returns
:::::::::

``paddle.Tensor`` of shape (n_mels, n_fft // 2 + 1).

Code example
::::::::::::

COPY-FROM: paddle.audio.functional.compute_fbank_matrix
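A sketch of the construction, assuming the librosa-style triangular filterbank: ``n_mels + 2`` edge frequencies $f_0 < \dots < f_{M+1}$ are spaced evenly on the Slaney mel scale between ``f_min`` and ``f_max``, and filter $m$ ($m = 0, \dots, M-1$, $M$ = ``n_mels``) is $W_m(f) = \max\big(0, \min\big(\frac{f - f_m}{f_{m+1} - f_m}, \frac{f_{m+2} - f}{f_{m+2} - f_{m+1}}\big)\big)$, scaled by $2 / (f_{m+2} - f_m)$ when ``norm='slaney'``. The pure-Python code covers only the Slaney mel scale (``htk=False``) and ``norm`` set to ``'slaney'`` or ``None``; ``compute_fbank_matrix_sketch`` is a hypothetical name.

```python
import math

# Slaney mel-scale constants (librosa convention, assumed here).
F_SP = 200.0 / 3
MIN_LOG_HZ = 1000.0
MIN_LOG_MEL = MIN_LOG_HZ / F_SP
LOGSTEP = math.log(6.4) / 27.0


def hz_to_mel(f):
    return f / F_SP if f < MIN_LOG_HZ else MIN_LOG_MEL + math.log(f / MIN_LOG_HZ) / LOGSTEP


def mel_to_hz(m):
    return F_SP * m if m < MIN_LOG_MEL else MIN_LOG_HZ * math.exp(LOGSTEP * (m - MIN_LOG_MEL))


def compute_fbank_matrix_sketch(sr, n_fft, n_mels=64, f_min=0.0, f_max=None, norm="slaney"):
    """Return an (n_mels, n_fft // 2 + 1) triangular mel filterbank as nested lists."""
    if f_max is None:
        f_max = sr / 2.0
    n_bins = n_fft // 2 + 1
    # Center frequency of each FFT bin, evenly spaced from 0 to sr / 2.
    fft_freqs = [i * (sr / 2.0) / (n_bins - 1) for i in range(n_bins)]
    # n_mels + 2 band edges evenly spaced in mels, converted back to Hz.
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    edges = [mel_to_hz(lo + i * (hi - lo) / (n_mels + 1)) for i in range(n_mels + 2)]
    weights = []
    for m in range(n_mels):
        left, center, right = edges[m], edges[m + 1], edges[m + 2]
        row = []
        for f in fft_freqs:
            rising = (f - left) / (center - left)
            falling = (right - f) / (right - center)
            row.append(max(0.0, min(rising, falling)))
        if norm == "slaney":
            # Scale each triangle so that filters have roughly equal area.
            scale = 2.0 / (right - left)
            row = [w * scale for w in row]
        weights.append(row)
    return weights


fb = compute_fbank_matrix_sketch(sr=16000, n_fft=512, n_mels=40)
print(len(fb), len(fb[0]))
```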
.. _cn_api_audio_functional_create_dct:

create_dct
-------------------------------

.. py:function:: paddle.audio.functional.create_dct(n_mfcc, n_mels, norm='ortho', dtype='float32')

Computes the discrete cosine transform (DCT) matrix.

Parameters
::::::::::::

- **n_mfcc** (int) - Number of mel-frequency cepstral coefficients.
- **n_mels** (int) - Number of mel filterbanks.
- **norm** (Optional[str], optional) - Type of normalization. Default: 'ortho'.
- **dtype** (str, optional) - Data type of the returned matrix. Default: 'float32'.

Returns
:::::::::

``paddle.Tensor`` of shape (n_mels, n_mfcc).

Code example
::::::::::::

COPY-FROM: paddle.audio.functional.create_dct
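A pure-Python sketch of the matrix, assuming the orthonormal DCT-II with entries $\cos\big(\frac{\pi}{N}(n + \frac{1}{2})k\big)$ for mel bin $n$ and coefficient $k$, where $N$ is ``n_mels``, scaled by $\sqrt{2/N}$ and with the $k = 0$ column further divided by $\sqrt{2}$ for ``norm='ortho'``. Other ``norm`` values are deliberately not covered; ``create_dct_sketch`` is a hypothetical name that returns nested lists instead of ``paddle.Tensor``.

```python
import math


def create_dct_sketch(n_mfcc, n_mels, norm="ortho"):
    """Return an (n_mels, n_mfcc) DCT-II matrix as nested lists (ortho only)."""
    if norm != "ortho":
        raise NotImplementedError("this sketch only covers norm='ortho'")
    dct = [[0.0] * n_mfcc for _ in range(n_mels)]
    for k in range(n_mfcc):
        # sqrt(2 / N) scaling, with an extra 1/sqrt(2) for the DC column.
        scale = math.sqrt(2.0 / n_mels)
        if k == 0:
            scale /= math.sqrt(2.0)
        for n in range(n_mels):
            dct[n][k] = scale * math.cos(math.pi / n_mels * (n + 0.5) * k)
    return dct


dct = create_dct_sketch(n_mfcc=13, n_mels=40)
print(len(dct), len(dct[0]))  # 40 13: rows = n_mels, columns = n_mfcc
```

Multiplying a log-mel frame (length ``n_mels``) by this matrix gives ``n_mfcc`` cepstral coefficients; with ``n_mfcc == n_mels`` the columns form an orthonormal basis.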
.. _cn_api_audio_functional_fft_frequencies:

fft_frequencies
-------------------------------

.. py:function:: paddle.audio.functional.fft_frequencies(sr, n_fft, dtype='float32')

Computes the frequencies of the FFT bins.

Parameters
::::::::::::

- **sr** (int) - Sampling rate.
- **n_fft** (int) - Number of FFT points.
- **dtype** (str, optional) - Data type of the returned tensor. Default: 'float32'.

Returns
:::::::::

``paddle.Tensor`` of shape (n_fft // 2 + 1,).

Code example
::::::::::::

COPY-FROM: paddle.audio.functional.fft_frequencies
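A minimal sketch, assuming the ``n_fft // 2 + 1`` bin frequencies are spaced evenly from 0 Hz to the Nyquist frequency ``sr / 2`` (for even ``n_fft`` this matches ``numpy.fft.rfftfreq(n_fft, d=1 / sr)``). ``fft_frequencies_sketch`` is a hypothetical name returning a plain list, and it assumes ``n_fft >= 2``.

```python
def fft_frequencies_sketch(sr, n_fft):
    """Center frequency (Hz) of each of the n_fft // 2 + 1 non-negative FFT bins."""
    n_bins = 1 + n_fft // 2
    # Evenly spaced from 0 Hz up to the Nyquist frequency sr / 2.
    step = (sr / 2.0) / (n_bins - 1)
    return [i * step for i in range(n_bins)]


freqs = fft_frequencies_sketch(sr=16000, n_fft=512)
print(len(freqs), freqs[1], freqs[-1])  # 257 31.25 8000.0
```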
.. _cn_api_audio_functional_hz_to_mel:

hz_to_mel
-------------------------------

.. py:function:: paddle.audio.functional.hz_to_mel(freq, htk=False)

Converts frequencies in Hz to mels.

Parameters
::::::::::::

- **freq** (Tensor|float) - Input frequency in Hz.
- **htk** (bool, optional) - Whether to use HTK mel scaling. Default: False.

Returns
:::::::::

``paddle.Tensor`` or float, the frequency in mels.

Code example
::::::::::::

COPY-FROM: paddle.audio.functional.hz_to_mel
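A pure-Python sketch of the two mel scales, assuming this function follows the librosa conventions: with ``htk=True`` it uses $m = 2595 \log_{10}(1 + f / 700)$; otherwise it uses the Slaney scale, which is linear below 1 kHz ($m = 3f / 200$) and logarithmic above it. ``hz_to_mel_sketch`` is a hypothetical name that works on plain floats rather than ``paddle.Tensor``.

```python
import math

# Slaney mel-scale constants (librosa convention, assumed here).
F_SP = 200.0 / 3                 # Hz per mel below 1 kHz
MIN_LOG_HZ = 1000.0              # where the logarithmic region starts
MIN_LOG_MEL = MIN_LOG_HZ / F_SP  # 15 mels
LOGSTEP = math.log(6.4) / 27.0   # log step per mel above 1 kHz


def hz_to_mel_sketch(freq, htk=False):
    """Convert a frequency in Hz (float) to mels."""
    if htk:
        return 2595.0 * math.log10(1.0 + freq / 700.0)
    if freq < MIN_LOG_HZ:
        return freq / F_SP  # linear part
    return MIN_LOG_MEL + math.log(freq / MIN_LOG_HZ) / LOGSTEP  # logarithmic part


for f in (440.0, 1000.0, 4000.0):
    print(f, hz_to_mel_sketch(f), hz_to_mel_sketch(f, htk=True))
```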
.. _cn_api_audio_functional_mel_frequencies:

mel_frequencies
-------------------------------

.. py:function:: paddle.audio.functional.mel_frequencies(n_mels=64, f_min=0.0, f_max=11025, htk=False, dtype='float32')

Computes frequencies (in Hz) that are evenly spaced on the mel scale.

Parameters
::::::::::::

- **n_mels** (int, optional) - Number of mel bins. Default: 64.
- **f_min** (float, optional) - Minimum frequency in Hz. Default: 0.0.
- **f_max** (float, optional) - Maximum frequency in Hz. Default: 11025.0.
- **htk** (bool, optional) - Whether to use HTK mel scaling. Default: False.
- **dtype** (str, optional) - Data type of the returned tensor. Default: 'float32'.

Returns
:::::::::

``paddle.Tensor`` of shape (n_mels,).

Code example
::::::::::::

COPY-FROM: paddle.audio.functional.mel_frequencies
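A minimal sketch, assuming the result is obtained by converting ``f_min`` and ``f_max`` to mels, taking ``n_mels`` evenly spaced points between them (endpoints included), and converting each point back to Hz. The mel conversions are assumed to follow the librosa Slaney/HTK conventions; all helper names are hypothetical, and ``n_mels >= 2`` is assumed.

```python
import math

F_SP = 200.0 / 3
MIN_LOG_HZ = 1000.0
MIN_LOG_MEL = MIN_LOG_HZ / F_SP
LOGSTEP = math.log(6.4) / 27.0


def hz_to_mel(f, htk=False):
    if htk:
        return 2595.0 * math.log10(1.0 + f / 700.0)
    return f / F_SP if f < MIN_LOG_HZ else MIN_LOG_MEL + math.log(f / MIN_LOG_HZ) / LOGSTEP


def mel_to_hz(m, htk=False):
    if htk:
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    return F_SP * m if m < MIN_LOG_MEL else MIN_LOG_HZ * math.exp(LOGSTEP * (m - MIN_LOG_MEL))


def mel_frequencies_sketch(n_mels=64, f_min=0.0, f_max=11025.0, htk=False):
    """n_mels frequencies (Hz) evenly spaced in mels, from f_min to f_max inclusive."""
    lo, hi = hz_to_mel(f_min, htk), hz_to_mel(f_max, htk)
    step = (hi - lo) / (n_mels - 1)
    return [mel_to_hz(lo + i * step, htk) for i in range(n_mels)]


freqs = mel_frequencies_sketch(n_mels=10, f_min=0.0, f_max=8000.0)
print([round(f, 1) for f in freqs])
```

The gaps between consecutive frequencies stay constant below 1 kHz and grow above it, which is what gives the mel filterbank its finer resolution at low frequencies.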
.. _cn_api_audio_functional_mel_to_hz:

mel_to_hz
-------------------------------

.. py:function:: paddle.audio.functional.mel_to_hz(mel, htk=False)

Converts mels to frequencies in Hz.

Parameters
::::::::::::

- **mel** (Tensor|float) - Input frequency in mels.
- **htk** (bool, optional) - Whether to use HTK mel scaling. Default: False.

Returns
:::::::::

``paddle.Tensor`` or float, the frequency in Hz.

Code example
::::::::::::

COPY-FROM: paddle.audio.functional.mel_to_hz
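A pure-Python sketch of the inverse mapping, assuming the same librosa conventions as ``hz_to_mel``: with ``htk=True`` it uses $f = 700 (10^{m / 2595} - 1)$; otherwise it inverts the Slaney scale, linear below 15 mels (1 kHz) and exponential above. ``mel_to_hz_sketch`` is a hypothetical name operating on plain floats.

```python
import math

# Slaney mel-scale constants (librosa convention, assumed here).
F_SP = 200.0 / 3
MIN_LOG_HZ = 1000.0
MIN_LOG_MEL = MIN_LOG_HZ / F_SP
LOGSTEP = math.log(6.4) / 27.0


def mel_to_hz_sketch(mel, htk=False):
    """Convert a value in mels (float) to a frequency in Hz."""
    if htk:
        return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)
    if mel < MIN_LOG_MEL:
        return F_SP * mel  # linear part
    return MIN_LOG_HZ * math.exp(LOGSTEP * (mel - MIN_LOG_MEL))  # exponential part


for m in (5.0, 15.0, 30.0):
    print(m, mel_to_hz_sketch(m))
```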
.. _cn_api_audio_functional_power_to_db:

power_to_db
-------------------------------

.. py:function:: paddle.audio.functional.power_to_db(spect, ref_value=1.0, amin=1e-10, top_db=80.0)

Converts a power spectrum to decibels (dB).

Parameters
::::::::::::

- **spect** (Tensor) - Input STFT power spectrum.
- **ref_value** (float, optional) - Reference value; the amplitude is scaled relative to ref_value. Default: 1.0.
- **amin** (float, optional) - Minimum threshold applied before taking the logarithm. Default: 1e-10.
- **top_db** (Optional[float], optional) - Dynamic-range threshold in dB: values more than top_db below the peak are clipped. Default: 80.0.

Returns
:::::::::

``paddle.Tensor`` or float, the power spectrum in dB.

Code example
::::::::::::

COPY-FROM: paddle.audio.functional.power_to_db
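A pure-Python sketch of the conversion, assuming the librosa-style formula $10 \log_{10}(\max(\text{amin}, S)) - 10 \log_{10}(\max(\text{amin}, \text{ref\_value}))$, followed, when ``top_db`` is set, by clipping every value to at least the peak minus ``top_db``. ``power_to_db_sketch`` is a hypothetical name that works on a flat list of powers instead of ``paddle.Tensor``.

```python
import math


def power_to_db_sketch(spect, ref_value=1.0, amin=1e-10, top_db=80.0):
    """Convert a list of power values to dB relative to ref_value."""
    if amin <= 0:
        raise ValueError("amin must be strictly positive")
    # Clamp to amin so that zero power does not produce -inf.
    offset = 10.0 * math.log10(max(amin, ref_value))
    log_spec = [10.0 * math.log10(max(amin, p)) - offset for p in spect]
    if top_db is not None:
        if top_db < 0:
            raise ValueError("top_db must be non-negative")
        floor = max(log_spec) - top_db
        log_spec = [max(v, floor) for v in log_spec]
    return log_spec


print(power_to_db_sketch([1.0, 0.1, 0.0]))  # the 0.0 entry is clipped to peak - 80 dB
print(power_to_db_sketch([1.0], ref_value=0.1, top_db=None))
```

A ``ref_value`` below 1.0 shifts every value up by ``-10 * log10(ref_value)`` dB, which is why the second call reports a positive level.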
Review comment: Several of the hyperlinks here are broken.