Add deep clustering end-to-end training method
earthmanylf committed Apr 6, 2022
2 parents 3e6167c + c4aba12 commit feb28ba
Showing 297 changed files with 10,434 additions and 286 deletions.
3 changes: 0 additions & 3 deletions .gitmodules
@@ -1,3 +0,0 @@
[submodule "doc/notebook"]
path = doc/notebook
url = https://github.com/espnet/notebook
6 changes: 5 additions & 1 deletion README.md
@@ -78,6 +78,10 @@ ESPnet uses [pytorch](http://pytorch.org/) as a deep learning engine and also fo
- Set `frontend` to be `s3prl`
- Select any upstream model by setting the `frontend_conf` to the corresponding name.
- Streaming Transformer/Conformer ASR with blockwise synchronous beam search.
- Restricted Self-Attention based on [Longformer](https://arxiv.org/abs/2004.05150) as an encoder for long sequences

### SUM: Speech Summarization
- End to End Speech Summarization Recipe for Instructional Videos using Restricted Self-Attention [[Sharma et al., 2022]](https://arxiv.org/abs/2110.06263)

Demonstration
- Real-time ASR demo with ESPnet2 [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/espnet/notebook/blob/master/espnet2_asr_realtime_demo.ipynb)
@@ -129,7 +133,7 @@ To train the neural vocoder, please check the following repositories:
- Multi-speaker speech separation
- Unified encoder-separator-decoder structure for time-domain and frequency-domain models
- Encoder/Decoder: STFT/iSTFT, Convolution/Transposed-Convolution
- Separators: BLSTM, Transformer, Conformer, DPRNN, [DCCRN](https://arxiv.org/abs/2008.00264), [Deep Clustering](https://ieeexplore.ieee.org/document/7471631), [Deep Attractor Network](https://pubmed.ncbi.nlm.nih.gov/29430212/), Neural Beamformers, etc.
- Separators: BLSTM, Transformer, Conformer, [TasNet](https://arxiv.org/abs/1809.07454), [DPRNN](https://arxiv.org/abs/1910.06379), [DC-CRN](https://web.cse.ohio-state.edu/~wang.77/papers/TZW.taslp21.pdf), [DCCRN](https://arxiv.org/abs/2008.00264), [Deep Clustering](https://ieeexplore.ieee.org/document/7471631), [Deep Attractor Network](https://pubmed.ncbi.nlm.nih.gov/29430212/), Neural Beamformers, etc.
- Flexible ASR integration: working as an individual task or as the ASR frontend
- Easy to import pretrained models from [Asteroid](https://github.com/asteroid-team/asteroid)
- Both the pre-trained models from Asteroid and the specific configuration are supported.
2 changes: 2 additions & 0 deletions ci/doc.sh
@@ -26,6 +26,8 @@ set -euo pipefail
find ./utils/{*.sh,spm_*} -exec ./doc/usage2rst.sh {} \; | tee ./doc/_gen/utils_sh.rst
find ./espnet2/bin/*.py -exec ./doc/usage2rst.sh {} \; | tee ./doc/_gen/espnet2_bin.rst

./doc/notebook2rst.sh > ./doc/_gen/notebooks.rst

# generate package doc
./doc/module2rst.py --root espnet espnet2 --dst ./doc --exclude espnet.bin

2 changes: 1 addition & 1 deletion ci/install.sh
@@ -21,7 +21,7 @@ ${CXX:-g++} -v
. ./activate_python.sh
make TH_VERSION="${TH_VERSION}"

make warp-ctc.done warp-transducer.done chainer_ctc.done nkf.done moses.done mwerSegmenter.done pesq pyopenjtalk.done py3mmseg.done s3prl.done transformers.done phonemizer.done fairseq.done k2.done gtn.done
make warp-ctc.done warp-transducer.done chainer_ctc.done nkf.done moses.done mwerSegmenter.done pesq pyopenjtalk.done py3mmseg.done s3prl.done transformers.done phonemizer.done fairseq.done k2.done gtn.done longformer.done
rm -rf kaldi
)
. tools/activate_python.sh
2 changes: 1 addition & 1 deletion doc/.gitignore
@@ -1,4 +1,4 @@
_gen/
_build/
build/

notebook/
11 changes: 1 addition & 10 deletions doc/index.rst
@@ -28,16 +28,7 @@ ESPnet is an end-to-end speech processing toolkit, mainly focuses on end-to-end
   ./espnet2_task.md
   ./espnet2_distributed.md

.. toctree::
   :maxdepth: 1
   :caption: Notebook:

   ./notebook/asr_cli.ipynb
   ./notebook/asr_library.ipynb
   ./notebook/tts_cli.ipynb
   ./notebook/pretrained.ipynb
   ./notebook/tts_realtime_demo.ipynb
   ./notebook/st_demo.ipynb
.. include:: ./_gen/notebooks.rst

.. include:: ./_gen/modules.rst

4 changes: 2 additions & 2 deletions doc/installation.md
@@ -32,14 +32,14 @@ the following packages are installed using Anaconda, so you can skip them.)
# For CentOS
$ sudo yum install libsndfile
```
- ffmpeg (This is not required when installataion, but used in some recipes)
- ffmpeg (This is not required when installing, but used in some recipes)
```sh
# For Ubuntu
$ sudo apt-get install ffmpeg
# For CentOS
$ sudo yum install ffmpeg
```
- flac (This is not required when installataion, but used in some recipes)
- flac (This is not required when installing, but used in some recipes)
```sh
# For Ubuntu
$ sudo apt-get install flac
1 change: 0 additions & 1 deletion doc/notebook
Submodule notebook deleted from ef3cbf
17 changes: 17 additions & 0 deletions doc/notebook2rst.sh
@@ -0,0 +1,17 @@
#!/usr/bin/env bash

set -euo pipefail

cd "$(dirname "$0")"

if [ ! -d notebook ]; then
    git clone https://github.com/espnet/notebook --depth 1
fi

echo "\
.. toctree::
   :maxdepth: 1
   :caption: Notebook:
"

find ./notebook/*.ipynb -exec echo "   {}" \;
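
Reviewer note, not part of the commit: the toctree generation in `doc/notebook2rst.sh` can be tried locally without cloning the notebook repository. The following is a minimal sketch under stated assumptions: the function name `gen_notebook_toctree` and the placeholder notebook files are hypothetical, and the three-space indentation of the directive options is assumed from the usual reStructuredText convention rather than taken from the commit.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not in the commit): print a Sphinx toctree that lists
# every *.ipynb file found directly under the directory passed as $1.
gen_notebook_toctree() {
    local dir="$1"
    # Directive header; the options must be indented to belong to the directive.
    printf '.. toctree::\n   :maxdepth: 1\n   :caption: Notebook:\n\n'
    # One indented entry per notebook, sorted so the order is stable.
    find "${dir}" -maxdepth 1 -name '*.ipynb' | sort | sed 's/^/   /'
}

# Demo with placeholder notebooks in a temporary directory (assumed names).
tmp="$(mktemp -d)"
touch "${tmp}/asr_cli.ipynb" "${tmp}/tts_cli.ipynb"
gen_notebook_toctree "${tmp}"
rm -rf "${tmp}"
```

Unlike the committed script, this sketch takes the directory as an argument and sorts the entries. The committed version changes into `doc/` first and emits paths relative to it, which is what the `.. include:: ./_gen/notebooks.rst` line in `doc/index.rst` relies on.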