Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
update audio datasets && backend (#5363)
* update audio datasets && backend
* add overview
* format
* fix function info
* rm seed in TESS
* rename some api
* fix load
* fix return
* fix codestyle
Showing 9 changed files with 224 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,21 @@ | ||
.. _cn_api_audio_backends_get_current_backend: | ||
|
||
get_current_backend | ||
------------------------------- | ||
|
||
.. py:function:: paddle.audio.backends.get_current_backend() | ||
Get the name of the backend currently used for audio I/O. | ||
|
||
Parameters | ||
:::::::::::: | ||
|
||
Returns | ||
::::::::: | ||
|
||
``str``, the name of the audio I/O backend. | ||
|
||
Examples | ||
::::::::: | ||
|
||
COPY-FROM: paddle.audio.backends.get_current_backend |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,22 @@ | ||
.. _cn_api_audio_info: | ||
|
||
info | ||
------------------------------- | ||
|
||
.. py:function:: paddle.audio.info(filepath:str) | ||
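Get information about an audio file, such as its sample rate and number of channels. | ||
|
||
Parameters | ||
:::::::::::: | ||
|
||
- **filepath** (str) - Path to the input audio file. | ||
Returns | ||
::::::::: | ||
|
||
``AudioInfo``, metadata describing the audio file. | ||
|
||
Examples | ||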
::::::::: | ||
|
||
COPY-FROM: paddle.audio.info |
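The kind of metadata described above (sample rate, channel count) can be illustrated without PaddlePaddle. The following is a minimal sketch that uses Python's standard `wave` module rather than `paddle.audio.info`; the helper `describe_wav` and its dictionary keys are hypothetical and are not the fields of `AudioInfo`.

```python
import os
import tempfile
import wave


def describe_wav(path):
    """Return basic metadata for a PCM WAV file (hypothetical helper)."""
    with wave.open(path, "rb") as wav:
        return {
            "sample_rate": wav.getframerate(),
            "num_channels": wav.getnchannels(),
            "num_frames": wav.getnframes(),
            "bits_per_sample": wav.getsampwidth() * 8,
        }


# Write one second of 16-bit mono silence at 16 kHz, then inspect it.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.wav")
with wave.open(demo_path, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)  # 2 bytes per sample = 16 bits
    wav.setframerate(16000)
    wav.writeframes(b"\x00\x00" * 16000)

metadata = describe_wav(demo_path)
print(metadata)
```

The standard library is used here only so the sketch runs anywhere; it reads uncompressed PCM WAV files only.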
21 changes: 21 additions & 0 deletions
docs/api/paddle/audio/backends/list_available_backends_cn.rst
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,21 @@ | ||
.. _cn_api_audio_backends_list_available_backends: | ||
|
||
list_available_backends | ||
------------------------------- | ||
|
||
.. py:function:: paddle.audio.backends.list_available_backends() | ||
List the available audio I/O backends. | ||
|
||
Parameters | ||
:::::::::::: | ||
|
||
Returns | ||
::::::::: | ||
|
||
``List[str]``, the names of the available audio I/O backends. | ||
|
||
Examples | ||
::::::::: | ||
|
||
COPY-FROM: paddle.audio.backends.list_available_backends |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,26 @@ | ||
.. _cn_api_audio_load: | ||
|
||
load | ||
------------------------------- | ||
|
||
.. py:function:: paddle.audio.load(filepath: Union[str, Path], frame_offset: int = 0, num_frames: int = -1, normalize: bool = True, channels_first: bool = True) | ||
Load audio data from a file. | ||
|
||
Parameters | ||
:::::::::::: | ||
|
||
- **filepath** (str or Path) - Path to the input audio file. | ||
- **frame_offset** (int) - Index of the first frame to read. Default: 0. | ||
- **num_frames** (int) - Number of frames to read; -1 reads all frames. Default: -1. | ||
- **normalize** (bool) - If True, the sample values are normalized to [-1.0, 1.0]; if False, the raw values are returned. Default: True. | ||
- **channels_first** (bool) - If True, the returned Tensor has shape [channel, time]; if False, [time, channel]. Default: True. | ||
Returns | ||
::::::::: | ||
|
||
``Tuple[paddle.Tensor, int]``, the audio data and its sample rate. | ||
|
||
Examples | ||
::::::::: | ||
|
||
COPY-FROM: paddle.audio.load |
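The parameters above interact in a way that is easier to see on a tiny input. The following is a minimal pure-Python sketch, not the code of `paddle.audio.load`: the helper `select_frames` is hypothetical, it assumes 16-bit PCM samples (so normalization divides by 32768), and it assumes `frame_offset` is applied even when `num_frames` is -1. A real backend may treat these edge cases differently.

```python
def select_frames(frames, frame_offset=0, num_frames=-1,
                  normalize=True, channels_first=True):
    """Slice, scale and lay out samples given as [time][channel] int16 values.

    Hypothetical helper that mirrors the documented parameters only.
    """
    # Pick the requested window of frames; -1 means "through the end".
    if num_frames == -1:
        window = frames[frame_offset:]
    else:
        window = frames[frame_offset:frame_offset + num_frames]

    # Assumption: 16-bit PCM, so dividing by 2**15 maps values into [-1.0, 1.0).
    if normalize:
        window = [[sample / 32768.0 for sample in frame] for frame in window]

    # channels_first=True transposes [time][channel] into [channel][time].
    if channels_first:
        return [list(channel) for channel in zip(*window)]
    return [list(frame) for frame in window]


# Four stereo frames of raw int16 samples.
frames = [[0, 16384], [16384, -32768], [-16384, 0], [32767, 8192]]

normalized = select_frames(frames, frame_offset=1, num_frames=2)
raw = select_frames(frames, frame_offset=1, num_frames=2,
                    normalize=False, channels_first=False)
print(normalized)
print(raw)
```

With `channels_first=True` each inner list holds one channel over time, which matches the [channel, time] shape described above.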
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,26 @@ | ||
.. _cn_api_audio_save: | ||
|
||
save | ||
------------------------------- | ||
|
||
.. py:function:: paddle.audio.save(filepath: str, src: paddle.Tensor, sample_rate: int, channels_first: bool = True, encoding: Optional[str] = None, bits_per_sample: Optional[int] = 16) | ||
Save audio data to a file. | ||
|
||
Parameters | ||
:::::::::::: | ||
|
||
- **filepath** (str or Path) - Path where the audio is saved. | ||
- **src** (paddle.Tensor) - The audio data. | ||
- **sample_rate** (int) - The sample rate. | ||
- **channels_first** (bool) - If True, ``src`` has shape [channel, time]; if False, [time, channel]. Default: True. | ||
- **encoding** (Optional[str]) - Encoding information. Default: None. | ||
- **bits_per_sample** (Optional[int]) - Bit depth of the encoding. Default: 16. | ||
Returns | ||
::::::::: | ||
None
|
||
Examples | ||
::::::::: | ||
|
||
COPY-FROM: paddle.audio.save |
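To make the [channel, time] layout and the 16-bit depth concrete, here is a minimal sketch that writes channels-first float samples as interleaved 16-bit PCM with Python's standard `wave` module. It is not the code of `paddle.audio.save`: the helper `write_pcm16` is hypothetical, and scaling by 32767 with clipping is an assumption made for this illustration.

```python
import os
import struct
import tempfile
import wave


def write_pcm16(path, channels, sample_rate):
    """Write [channel][time] floats in [-1.0, 1.0] as a 16-bit PCM WAV file."""
    interleaved = []
    # zip(*channels) yields one tuple per time step: (ch0, ch1, ...).
    for frame in zip(*channels):
        for sample in frame:
            clipped = max(-1.0, min(1.0, sample))
            interleaved.append(int(clipped * 32767))  # assumed scaling
    with wave.open(path, "wb") as wav:
        wav.setnchannels(len(channels))
        wav.setsampwidth(2)  # 16 bits per sample
        wav.setframerate(sample_rate)
        wav.writeframes(struct.pack("<%dh" % len(interleaved), *interleaved))


# Two channels, three frames each.
channels = [[0.0, 0.5, -0.5], [1.0, -1.0, 0.25]]
out_path = os.path.join(tempfile.mkdtemp(), "out.wav")
write_pcm16(out_path, channels, sample_rate=8000)

# Read the file back to check the header and the interleaved samples.
with wave.open(out_path, "rb") as wav:
    header = (wav.getnchannels(), wav.getframerate(), wav.getnframes())
    samples = struct.unpack("<6h", wav.readframes(3))
print(header, samples)
```

WAV stores samples frame by frame, so channels-first data has to be interleaved before it is written.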
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,22 @@ | ||
.. _cn_api_audio_backends_set_backend: | ||
|
||
set_backend | ||
------------------------------- | ||
|
||
.. py:function:: paddle.audio.backends.set_backend(backend_name: str) | ||
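Set the backend used for audio I/O. | ||
|
||
Parameters | ||
:::::::::::: | ||
|
||
- **backend_name** (str) - Name of the audio I/O backend. ``'wave_backend'`` is currently supported; if paddleaudio >= 1.0.2 is installed, ``'soundfile'`` is supported as well. | ||
|
||
Returns | ||
::::::::: | ||
None
|
||
Examples | ||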
::::::::: | ||
|
||
COPY-FROM: paddle.audio.backends.set_backend |
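The three backend functions documented in this commit (`list_available_backends`, `set_backend`, `get_current_backend`) are typically used together. The following is a minimal sketch of that flow. The wrapper `switch_to_wave_backend` is hypothetical, and the import is guarded so the snippet still runs where PaddlePaddle or its `paddle.audio.backends` module is not installed.

```python
try:
    import paddle.audio.backends as audio_backends
except ImportError:
    audio_backends = None  # PaddlePaddle or its audio backends are unavailable


def switch_to_wave_backend():
    """Select 'wave_backend' if listed; return (available, current) or None."""
    if audio_backends is None:
        return None
    available = audio_backends.list_available_backends()
    if "wave_backend" in available:
        audio_backends.set_backend("wave_backend")
    return available, audio_backends.get_current_backend()


result = switch_to_wave_backend()
print(result)
```

Checking the list before calling `set_backend` avoids requesting a backend that is not installed, such as `'soundfile'` without paddleaudio.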
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,27 @@ | ||
.. _cn_api_audio_datasets_ESC50: | ||
|
||
ESC50 | ||
------------------------------- | ||
|
||
.. py:class:: paddle.audio.datasets.ESC50(mode: str = 'train', split: int = 1, feat_type: str = 'raw', archive=None, **kwargs) | ||
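Implementation of the `ESC50 <http://dx.doi.org/10.1145/2733373.2806390>`_ dataset. | ||
|
||
Parameters | ||
:::::::::::: | ||
|
||
- **mode** (str, optional) - Either ``'train'`` or ``'dev'``. Default: ``'train'``. | ||
- **split** (int) - The fold used as the dev set. Default: 1. | ||
- **feat_type** (str) - The feature to extract from the audio. ``'raw'`` returns the raw waveform; ``'mfcc'``, ``'spectrogram'``, ``'melspectrogram'`` and ``'logmelspectrogram'`` are also supported. Default: ``'raw'``. | ||
- **archive** (dict) - The download URL and md5 checksum of the dataset. Default: None, in which case the default archive defined in the class is used. | ||
|
||
Returns | ||
::::::::: | ||
|
||
:ref:`cn_api_io_cn_Dataset`, an instance of the ESC50 dataset. | ||
|
||
Examples | ||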
::::::::: | ||
|
||
COPY-FROM: paddle.audio.datasets.ESC50 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,28 @@ | ||
.. _cn_api_audio_datasets_TESS: | ||
|
||
TESS | ||
------------------------------- | ||
|
||
.. py:class:: paddle.audio.datasets.TESS(mode: str = 'train', n_folds = 5, split = 1, feat_type = 'raw', archive=None, **kwargs) | ||
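Implementation of the `TESS <https://tspace.library.utoronto.ca/handle/1807/24487>`_ dataset. | ||
|
||
Parameters | ||
:::::::::::: | ||
|
||
- **mode** (str, optional) - Either ``'train'`` or ``'dev'``. Default: ``'train'``. | ||
- **n_folds** (int) - Number of folds the dataset is divided into; one fold is used as dev and the rest as train. Default: 5. | ||
- **split** (int) - The fold used as the dev set. Default: 1. | ||
- **feat_type** (str) - The feature to extract from the audio. ``'raw'`` returns the raw waveform; ``'mfcc'``, ``'spectrogram'``, ``'melspectrogram'`` and ``'logmelspectrogram'`` are also supported. Default: ``'raw'``. | ||
- **archive** (dict) - The download URL and md5 checksum of the dataset. Default: None, in which case the default archive defined in the class is used. | ||
|
||
Returns | ||
::::::::: | ||
|
||
:ref:`cn_api_io_cn_Dataset`, an instance of the TESS dataset. | ||
|
||
Examples | ||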
::::::::: | ||
|
||
COPY-FROM: paddle.audio.datasets.TESS |
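The relationship between `mode`, `n_folds` and `split` can be sketched in plain Python. This is an illustration under stated assumptions, not the dataset's actual code: the helper `split_by_fold` is hypothetical, and it assumes samples are assigned to folds round-robin with folds numbered from 1.

```python
def split_by_fold(items, mode="train", n_folds=5, split=1):
    """Return the items that belong to the requested mode (hypothetical helper)."""
    selected = []
    for index, item in enumerate(items):
        # Assumption: round-robin assignment, folds numbered 1..n_folds.
        fold = index % n_folds + 1
        if mode == "train":
            keep = fold != split  # every fold except the dev fold
        else:
            keep = fold == split  # only the dev fold
        if keep:
            selected.append(item)
    return selected


clips = ["clip_%02d.wav" % i for i in range(10)]
train_clips = split_by_fold(clips, mode="train", n_folds=5, split=1)
dev_clips = split_by_fold(clips, mode="dev", n_folds=5, split=1)
print(len(train_clips), dev_clips)
```

With `n_folds=5`, one fifth of the samples end up in dev and the remaining four fifths in train, and the two sets never overlap.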