From c7c7b008d92a7d22cf7d994c1ca0bb39fa696826 Mon Sep 17 00:00:00 2001
From: Chaitanya Narisetty
Date: Tue, 31 May 2022 14:30:54 +0530
Subject: [PATCH] Squashed commit of the following:
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

commit 047d0c474c18a87c205e566948410be16787e477 Merge: 9396ed37d bfe7bca3a Author: Shinji Watanabe Date: Thu May 19 09:50:02 2022 -0400  Merge pull request #4378 from akreal/fix-check_short_utt  Fix minimum input length for Conv2dSubsampling2 in check_short_utt
commit bfe7bca3a98da52714e1c45906cf826704464b7c Author: Pavel Denisov Date: Thu May 19 13:41:59 2022 +0200  Fix minimum input length for Conv2dSubsampling2 in check_short_utt
commit 9396ed37deb8b101fd064d46c85975ad9047bf87 Merge: c54b585c1 e047156ec Author: Naoyuki Kamo Date: Thu May 19 14:50:56 2022 +0900  Merge pull request #4376 from kamo-naoyuki/libsndfile  Remove the restriction for libsndfile version
commit c54b585c1ca6693ae7ba7e299a48af762eda6adf Merge: 9ca49caed 88465607c Author: Tomoki Hayashi Date: Thu May 19 12:29:02 2022 +0900  Merge pull request #4374 from YosukeHiguchi/master  Minor fixes for the intermediate loss usage and Mask-CTC decoding
commit e047156ec8df3266259aed03742ac798e365f648 Author: kamo-naoyuki Date: Thu May 19 10:11:08 2022 +0900  remove version restriction for libsndfile
commit 9ca49caed98410cd7d2c71e4781819a1e92b35d9 Merge: b008ac7d5 2952c3bca Author: Naoyuki Kamo Date: Thu May 19 09:38:33 2022 +0900  Merge pull request #4375 from espnet/kamo-naoyuki-patch-1  Update .mergify.yml
commit 88465607cf5e899b8ce1b93c5c9fe09b69a2ab83 Author: Yosuke Higuchi Date: Thu May 19 07:05:29 2022 +0900  fix for test
commit 2952c3bca26a70723094d5a160387b7936f71769 Author: Naoyuki Kamo Date: Thu May 19 06:59:02 2022 +0900  Update .mergify.yml
commit b008ac7d58e9ced1a9f8c89cc85ee69d9e9461ab Merge: 3c96908ed 4203c9c9c Author: Naoyuki Kamo Date: Thu May 19 06:32:44 2022 +0900  Merge pull request #4372 from kamo-naoyuki/isort  Add isort checking to the CI tests
commit 4de7aa562f74c596e5b616fd8278a50a707d0198 Author: Yosuke Higuchi Date: Thu May 19 06:19:20 2022 +0900  fix for test
commit 9c83ddb46404334914764a8e4356ea8a4c3c806c Author: Yosuke Higuchi Date: Thu May 19 05:05:01 2022 +0900  support gpu decoding for mask-ctc
commit 49100e4f1b3fc389c5672dc2ca17973525c4bf02 Author: Yosuke Higuchi Date: Thu May 19 05:03:29 2022 +0900  fix bug for returning intermediate states
commit 4203c9c9c9d5a68cd13d464290cead3738ed003d Author: kamo-naoyuki Date: Wed May 18 17:47:22 2022 +0900  apply isort
commit d0f2eac70a5521adf59618ba3ce6603e2863f0c5 Author: kamo-naoyuki Date: Wed May 18 17:46:47 2022 +0900  modified for isort options
commit 8f73b73d23d34bf5f3e8ed2f625dca1916ea8683 Author: kamo-naoyuki Date: Wed May 18 16:38:34 2022 +0900  apply black
commit 6974dd4efc11e465d4a3d1a34190c7ed782dacee Author: kamo-naoyuki Date: Wed May 18 16:35:15 2022 +0900  Add descriptions for isort
commit 24c3676a8d4c2e60d2726e9bcd9bdbed740610e0 Author: kamo-naoyuki Date: Wed May 18 16:16:53 2022 +0900  Apply isort
commit 3c96908edc5c592c9c99bba0640428613dc7c3cb Merge: c173c3093 aa5d6ffff Author: Jiatong <728307998@qq.com> Date: Tue May 17 18:00:40 2022 -0700  Merge pull request #4341 from chintu619/st_bugfix  bug fixes in ST recipes
commit c173c30930631731e6836c274a591ad571749741 Merge: e0e0620ac d38188cc3 Author: Naoyuki Kamo Date: Tue May 17 15:20:31 2022 +0900  Merge pull request #4371 from espnet/kamo-naoyuki-patch-1  Update .mergify.yml
commit d38188cc30af6cffc4ad0233e7e705e93511c11d Author: Naoyuki Kamo Date: Tue May 17 13:43:40 2022 +0900  Update .mergify.yml
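The `check_short_utt` fix merged above (#4378) comes down to the shortest utterance that survives Conv2dSubsampling2. A minimal sketch of that arithmetic, assuming the usual pair of no-padding kernel-3 convolutions with strides 2 and 1 (the layer shapes are an assumption here, not a quote of ESPnet internals):

```python
# Illustrative only: minimum frame count for a subsample-by-2 frontend
# built from two kernel-3 convolutions with strides 2 and 1.
def conv_out_len(length: int, kernel: int = 3, stride: int = 1) -> int:
    # Standard no-padding convolution: floor((L - k) / s) + 1
    return (length - kernel) // stride + 1

def min_input_len() -> int:
    # Smallest L that still yields at least one output frame.
    length = 1
    while conv_out_len(conv_out_len(length, stride=2), stride=1) < 1:
        length += 1
    return length

print(min_input_len())  # -> 7, so shorter inputs must be rejected up front
```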
commit e0e0620acca0df345cf317a13c839d7d4d5c773f Merge: df053b8c1 2cfbbd337 Author: Tomoki Hayashi Date: Tue May 17 13:01:02 2022 +0900  Merge pull request #4369 from kan-bayashi/minor_fix_jets
commit 2cfbbd337d64f68e1f937e37feeb544d972c4e0b Author: kan-bayashi Date: Tue May 17 11:06:00 2022 +0900  updated jets test
commit 17ab7747fe7e0d4d6885847f2c738253a859dedf Author: kan-bayashi Date: Tue May 17 11:05:52 2022 +0900  updated README
commit 6ec8c27815c6fded4c13b01b8d2707016e9e8e95 Author: kan-bayashi Date: Tue May 17 09:25:41 2022 +0900  updated README
commit b1e6c752b0d94f3209593e0cdbd5b43d79e8076d Author: kan-bayashi Date: Tue May 17 09:19:54 2022 +0900  shorten jets test
commit df053b8c13c26fe289fc882751801fd781e9d43e Merge: afa8f8ec5 5aa543a9f Author: Tomoki Hayashi Date: Tue May 17 08:13:36 2022 +0900  Merge pull request #4364 from imdanboy/master  add e2e tts model: JETS
commit 5aa543a9ff6c329f5fc601f3aa053ffd4afb19ba Author: Tomoki Hayashi Date: Mon May 16 21:13:30 2022 +0900  minor fix of docstrings and comments
commit a82e78d18aca9c00bcf8f378c42e78a0de24940e Author: imdanboy Date: Fri May 13 22:28:31 2022 +0900  JETS; e2e tts model
commit afa8f8ec5b8ec77deb1a3c1531915ebbee7b80e6 Merge: fffb3444f cd77501a8 Author: Shinji Watanabe Date: Fri May 13 17:36:30 2022 -0400  Merge pull request #4349 from pyf98/quantization  Add quantization in ESPnet2 for asr inference
commit fffb3444fe4d8ef2630a22dd145d6f1fb0caab46 Merge: f840b8114 5331890e6 Author: Naoyuki Kamo Date: Fri May 13 20:36:39 2022 +0900  Merge pull request #4361 from espnet/kamo-naoyuki-patch-1  Update README.md
commit aa5d6ffff67079f2cbe6a7e1eba852e459f0f6a4 Author: Chaitanya Narisetty Date: Fri May 13 05:15:32 2022 -0400  fix lm tag names
commit 3cac7bb7f732a694f4b87007271d394a9ee3838e Author: Chaitanya Narisetty Date: Fri May 13 05:07:55 2022 -0400  resolve conflicts and fix lm_train filenames
commit ea44663e8a24ebfcaa03f3bba149e561e970fdf3 Author: Chaitanya Narisetty Date: Fri May 13 04:43:18 2022 -0400  review suggested changes
commit 650c733437da32627f88fe369555ce1955536087 Merge: 6d1bd3a8e f840b8114 Author: Chaitanya Narisetty Date: Fri May 13 03:18:08 2022 -0400  Merge branch 'espnet_master' into st_bugfix
commit 5331890e6a6a61a3006e5e2c13d47172f5587a29 Author: Naoyuki Kamo Date: Fri May 13 13:15:40 2022 +0900  Update README.md
commit f840b8114452b4803b8fb25c1f22a93da146e9ba Merge: 1b1241040 9cfd6af64 Author: Naoyuki Kamo Date: Fri May 13 13:13:34 2022 +0900  Merge pull request #4348 from kamo-naoyuki/1.11.0  Add pytorch=1.10.2 and 1.11.0 to ci configurations
commit 9cfd6af64a28237019196cd495fbd2943790ce21 Author: kamo-naoyuki Date: Fri May 13 09:58:04 2022 +0900  fix
commit 2625be71a722e7eb030dff4f71d8dc9599a33844 Author: kamo-naoyuki Date: Fri May 13 03:46:24 2022 +0900  remove warning
commit 9a2001fac56dddf5ba1c2eaec092cb420f83f7c9 Author: kamo-naoyuki Date: Fri May 13 03:44:11 2022 +0900  fix for pytorch1.11 (+= became inplace op)
commit 5518b6ba0af0bba9e9d59d6c47607656f49c9988 Author: kamo-naoyuki Date: Thu May 12 22:04:42 2022 +0900  fix import order
commit 98689a5f0bfd88efffdbbcdd5d924e186d563a91 Author: kamo-naoyuki Date: Thu May 12 21:17:35 2022 +0900  change to show the error logs when jobs are failed
commit bb0d0aaa9e9f9076ac88aad425ad2f2caef369a7 Author: kamo-naoyuki Date: Thu May 12 20:40:39 2022 +0900  fix code style
commit 934b161f1f714637c3d7d47c14f8c810a9df6fe2 Author: kamo-naoyuki Date: Thu May 12 20:33:58 2022 +0900  change to show the error logs when jobs are failed
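The `fix for pytorch1.11 (+= became inplace op)` commit above points at a real hazard: code that assumed `x += y` allocates a fresh tensor breaks once the operator mutates `x` in place. A hypothetical illustration of the failure mode and the out-of-place fix (my example, not the actual ESPnet diff):

```python
import torch

x = torch.ones(3, requires_grad=True)

# In-place accumulation mutates the leaf tensor, and autograd rejects it:
#   x += 1  ->  RuntimeError: a leaf Variable that requires grad is being
#               used in an in-place operation.
# The out-of-place form allocates a new tensor and keeps the graph valid:
x = x + 1
x.sum().backward()
```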
commit 5c474b96c543c3d26e95b432355bcfd2bf8dc116 Author: kamo-naoyuki Date: Thu May 12 20:20:18 2022 +0900  remove verbosity options
commit 005aad11b37acf388c6b70143ab40a5231bc7a39 Author: kamo-naoyuki Date: Thu May 12 20:04:57 2022 +0900  fix
commit 5c4b966a957062e4de298bcb69fe8cf6f1365fd1 Author: kamo-naoyuki Date: Thu May 12 19:36:11 2022 +0900  remove tests for python=3.10.0 temporarily
commit 809ac3741814b7d9ebdd351b9e0e9343e236977c Author: kamo-naoyuki Date: Thu May 12 19:27:20 2022 +0900  fix
commit 86186b744fb2bfc259909c49cc906fb0856d15bf Author: kamo-naoyuki Date: Thu May 12 19:10:18 2022 +0900  add installation for packaging
commit 8fbac77268906075043cbecfb3e1c5625b145fce Author: kamo-naoyuki Date: Thu May 12 18:59:17 2022 +0900  fix
commit b0050d97da3d0545b62a5d21b029ddd016ce6ca1 Author: kamo-naoyuki Date: Thu May 12 18:56:52 2022 +0900  fix
commit 6e9035d42eea31cad87a7c8b87fc79635a6df7c2 Author: kamo-naoyuki Date: Thu May 12 18:32:33 2022 +0900  fix
commit 1c344a95ceb83b4b44675aee5326afeb9284d8e8 Author: kamo-naoyuki Date: Thu May 12 18:25:35 2022 +0900  change LooseVersion to parse
commit f899a05768436cc38fb432d6f002ab667983abbd Author: kamo-naoyuki Date: Thu May 12 18:09:33 2022 +0900  fix
commit 7d5242212403e740c4d5b8ebd9a346a991ea50a9 Author: kamo-naoyuki Date: Thu May 12 18:09:15 2022 +0900  fix
commit b7cfdd9a70559271e45de103e242228f94e837ff Author: kamo-naoyuki Date: Thu May 12 18:05:41 2022 +0900  Change LooseVersion to parse
commit d234b9ab30bbc2bb6fd42d6335421a6f8a9ed637 Author: kamo-naoyuki Date: Thu May 12 17:10:40 2022 +0900  fix
commit 1b1241040e1e30e575a182b6be8b8e4602badeb8 Merge: 39bae01e4 52c238d02 Author: Shinji Watanabe Date: Wed May 11 13:00:13 2022 -0400  Merge pull request #4352 from espnetUser/master  Add unit test to streaming ASR inference
commit 52c238d02d50fcfb2c4e2a5058c743c7db913eec Author: espnetUser <81252087+espnetUser@users.noreply.github.com> Date: Wed May 11 16:10:04 2022 +0200  Applied black formatting to test_asr_inference.py for PR
commit 87c7573874aeec096dd1e902478d3dd6e2c83ad2 Author: espnetUser <81252087+espnetUser@users.noreply.github.com> Date: Wed May 11 15:43:01 2022 +0200  Update asr_inference_streaming.py  Fix CI error on mismatch in Tensor dtypes
commit 39bae01e4a132da69b9b0d025da8c579a5f38b77 Merge: dd24d7d41 71f3c8813 Author: Tomoki Hayashi Date: Wed May 11 17:53:04 2022 +0900  Merge pull request #4355 from kan-bayashi/fix_lid_in_gan_tts
commit dd24d7d41517202b308afb186f466c8006ae4c14 Merge: 2dde7734b f7b390582 Author: Tomoki Hayashi Date: Wed May 11 17:52:09 2022 +0900  Merge pull request #4206 from WeiGodHorse/master
commit 2dde7734bade874d4f8cfe7df4be069e64259fd5 Merge: beb336027 ec7e2b07b Author: Tomoki Hayashi Date: Wed May 11 16:27:55 2022 +0900  Merge pull request #4356 from kan-bayashi/fix_mixed_precision_vits  fix loss = NaN in VITS with mixed precision
commit 7a590ccd0da4897ef283486776f134eabe865ce0 Author: espnetUser <81252087+espnetUser@users.noreply.github.com> Date: Wed May 11 09:25:03 2022 +0200  Applied black formatting to test_asr_inference.py for PR
commit ec7e2b07bfa85c8a2292de7a2edbf1c2cd956d99 Author: kan-bayashi Date: Wed May 11 14:48:36 2022 +0900  fixed black
commit 2be9ddc5a2c0a7c4aad2b155fa1450222ca0c7a3 Author: kan-bayashi Date: Wed May 11 14:28:05 2022 +0900  fixed mixed_precision NaN (#4236)
commit 71f3c88133c7a29db54baa7eaa3b4fdf329cbdf5 Author: kan-bayashi Date: Wed May 11 13:39:59 2022 +0900  fixed optional data names for TTS
commit ee57ff94dfa2c3ced30c1b103076b4ae18fa9199 Author: espnetUser <81252087+espnetUser@users.noreply.github.com> Date: Tue May 10 22:37:18 2022 +0200  Update asr_inference_streaming.py  Fix dtype CI error
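The two `change LooseVersion to parse` commits above (plus `add installation for packaging`) describe the standard migration off the deprecated `distutils.version` API. A minimal before/after sketch, assuming a typical torch version gate:

```python
# Before (deprecated):
#   from distutils.version import LooseVersion
#   if LooseVersion(torch.__version__) >= LooseVersion("1.11.0"): ...

# After, using the `packaging` dependency added in the same PR:
import torch
from packaging.version import parse as V

if V(torch.__version__) >= V("1.11.0"):
    print("PyTorch >= 1.11 code path")
```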
commit 272d5d015f89f1520c82c31bd309fdce89d88f50 Author: espnetUser <81252087+espnetUser@users.noreply.github.com> Date: Tue May 10 21:52:21 2022 +0200  Update test_asr_inference.py  Remove streaming=true parameter
commit c96e0d7f79e6e94e568b22156eb61004d5d8cf8c Author: espnetUser <81252087+espnetUser@users.noreply.github.com> Date: Tue May 10 21:25:57 2022 +0200  Applied black formatting to test_asr_inference.py for PR
commit cd77501a8f09b5b11bf5422b0e24b8316820af77 Author: Yifan Peng Date: Tue May 10 12:02:07 2022 -0400  fix error for rnn encoders flatten_parameters
commit 3aafdb9d92c8c61d62be72f0907da957d177aa8c Author: espnetUser <81252087+espnetUser@users.noreply.github.com> Date: Tue May 10 17:05:48 2022 +0200  Update asr_inference_streaming.py  Bugfix in streaming inference #4216
commit 61b50138b7e8828506a18067cc2f482e745e83d7 Author: espnetUser <81252087+espnetUser@users.noreply.github.com> Date: Tue May 10 16:58:14 2022 +0200  Update test_asr_inference.py  Added edge test case for streaming asr unit test and increased execution timeout
commit 052dd603900362048675f65058b7a6f4bd94bc7d Author: Yifan Peng Date: Mon May 9 23:27:41 2022 -0400  fix ci
commit 06e2a7a16a06cda326035d03c84734d18c852cd3 Author: Yifan Peng Date: Mon May 9 23:10:14 2022 -0400  apply black
commit a48423fda5ab75d1205396ca5f744dc8ca98df00 Author: Yifan Peng Date: Mon May 9 22:59:57 2022 -0400  add test for espnet2 quantization
commit acb24c886f47fec7a00063cb66423e7bd52ea0bc Author: Yifan Peng Date: Mon May 9 22:59:39 2022 -0400  add quantization to asr_inference
commit b98fc861939310b73b50f959bc45176da10ef493 Author: kamo-naoyuki Date: Tue May 10 11:52:27 2022 +0900  fix
commit 3428f032d58c73902b5e6fe80307eb08cfc64ff6 Merge: 4ff2ce124 beb336027 Author: Naoyuki Kamo Date: Tue May 10 11:42:23 2022 +0900  Merge branch 'master' into 1.11.0
commit 4ff2ce1244e0af72439deaa59226eba434a70618 Author: kamo-naoyuki Date: Tue May 10 11:34:31 2022 +0900  add pytorch=1.10.1, 1.11.0 to ci configurations
commit beb3360276aa9ff65fe84f4c5e99c0c063c2a6be Merge: 537f9b6c1 79cda74ba Author: Shinji Watanabe Date: Mon May 9 16:27:37 2022 -0400  Merge pull request #4347 from YosukeHiguchi/espnet2_maskctc2  Minor fix for Mask-CTC forward function
commit 79cda74ba20f0b795251e23a9cb9fd624e2be02d Author: Yosuke Higuchi Date: Mon May 9 22:43:29 2022 +0900  add kwargs in forward argument
commit 537f9b6c14ab195cdcd21c404656c8534295f15d Merge: 793b999a5 9e8e75315 Author: Shinji Watanabe Date: Sun May 8 17:34:55 2022 -0400  Merge pull request #4343 from Emrys365/complex_support  Fix a bug in stats aggregation when PITSolver is used
commit 9e8e753154f5f71c9cb26217483427adb278759c Author: Wangyou Zhang Date: Sat May 7 13:16:35 2022 +0800  Apply black
commit 5ea4e087a311ab7c798950e68ae92e10b1bb41d8 Author: Wangyou Zhang Date: Sat May 7 12:05:49 2022 +0800  Fix a bug in stats aggregation when PITSolver is used
commit 6d1bd3a8ef695a75358d019cc1b33100817c0dad Merge: eb6dc2d55 793b999a5 Author: Chaitanya Narisetty Date: Fri May 6 10:51:14 2022 -0400  Merge branch 'espnet:master' into st_bugfix
commit eb6dc2d55faac7e62742d0b7791d8f3a991e91d1 Author: Chaitanya Narisetty Date: Fri May 6 10:08:19 2022 -0400  typo fix
commit 8c56ee817867358f2a8130372fd914c136bd7a5b Author: Chaitanya Narisetty Date: Fri May 6 08:59:26 2022 -0400  bug fixes in ST recipes
    * Change sampling frequency in `fbank.conf` and `pitch.conf` in Covost2 recipe
    * In `run.sh`, if the language is low-resource, then apply more speed perturbations. Fix typos for test sets
    * In `st.sh`:
      * fix directory naming issues to avoid replacement for different language pairs
      * Replace `>>` with `>` to replace previous inference results
      * Fix removing of empty text in stage 4
      * When removing the utterance-ID in `ref.trn.org` or `hyp.trn.org`, the current implementation removes all words in parentheses instead of removing just the utterance-ID from the end of each line. Fixed this by changing `perl -pe 's/\([^\)]+\)//g;'` to `perl -pe 's/\([^\)]+\)$//g;'` (see the sketch after this list)
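The utterance-ID bullet above is easiest to see with a concrete line; here is the same fix expressed with Python's `re` for brevity (the recipe itself uses perl, and the sample line is made up):

```python
import re

line = "thank you (applause) so much (covost2-utt-0042)"

# Old pattern: deletes every "(...)" group, so "(applause)" is lost too.
print(re.sub(r"\([^)]+\)", "", line))
# Fixed pattern: the "$" anchor strips only the trailing utterance-ID.
print(re.sub(r"\([^)]+\)$", "", line))
```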
commit f7b390582d2d77b113a92a5e52f907d5832d6f04 Author: 魏宪豪 Date: Fri May 6 20:18:05 2022 +0800  change a test file to conform to the new pypinyin package
commit b83128fafc913e775a49d37a5cad24a893718020 Author: 魏宪豪 Date: Fri May 6 17:54:20 2022 +0800  Fix missing punctuation
commit 931fd226babe69b35c6e3a6a288e5e0c901736a1 Author: 魏宪豪 Date: Fri May 6 16:54:31 2022 +0800  reformat
commit 793b999a50af484a5eaf6227ef7556b48514ef15 Merge: 4f41a1a06 6d0672882 Author: Shinji Watanabe Date: Thu May 5 21:54:27 2022 -0400  Merge pull request #4330 from pyf98/show_translation_result  Update show_translation_result.sh to show all decoding results under the given exp directory
commit 4f41a1a06ecd96af567bc73d1d6734531dd3cb44 Merge: a49cc60cd f0d7cc2bf Author: Shinji Watanabe Date: Thu May 5 21:53:10 2022 -0400  Merge pull request #4329 from roshansh-cmu/wandb  Wandb Minor Fix for Model Resume
commit a49cc60cda690e448d925c3e2bfdc5a85b3f5cd3 Merge: de624ed58 21fba33c6 Author: Shinji Watanabe Date: Thu May 5 21:51:43 2022 -0400  Merge pull request #4338 from espnet/ftshijt-patch-1  Fix typo
commit 21fba33c69d9199c6897ffc6da8433ab94b7051d Author: Jiatong <728307998@qq.com> Date: Thu May 5 21:25:10 2022 -0400  Fix typo
commit de624ed58953d17907fb241c5cb6514f27510162 Merge: b757b89d4 fe288000d Author: Shinji Watanabe Date: Thu May 5 16:10:44 2022 -0400  Merge pull request #4332 from simpleoier/chime6  add chime6 recipe
commit c504336661fa3cefa60b2214da39fbf0118fce49 Merge: 50269e8b4 b757b89d4 Author: 魏宪豪 Date: Wed May 4 21:58:43 2022 +0800  Merge remote-tracking branch 'upstream/master'
commit fe288000dbde339b4c386408af488af4bac423b6 Author: simpleoier Date: Tue May 3 17:51:36 2022 -0400  add egs2/chime6/asr1 recipe
commit 6d06728820576ed96a729b3477a29ccab12542f1 Author: Yifan Peng Date: Sat Apr 30 20:53:52 2022 -0400  fix ci
commit 72333a892d16ef913633111120f159008812795e Author: Yifan Peng Date: Sat Apr 30 20:34:06 2022 -0400  fix ci
commit f15e6adaafaca380ea152cf2b38d604eea3603d3 Author: Yifan Peng Date: Sat Apr 30 18:54:37 2022 -0400  quote expansion
commit f6731cd97565bf4108f1064a83f1fffea4ca351b Author: Yifan Peng Date: Sat Apr 30 18:43:49 2022 -0400  update mt.sh
commit 552060a1d5670d0fd838bd8e10fc9e47a1122346 Author: Yifan Peng Date: Sat Apr 30 18:41:41 2022 -0400  update show translation result
commit f0d7cc2bfbc8f68c42820262a8ca6e4906f3818b Author: Roshan S Sharma <36464960+roshansh-cmu@users.noreply.github.com> Date: Fri Apr 29 20:57:18 2022 -0400  Delete resnet.py
commit 79c071e9ecd268a1963e8ca3863a2f5eaf34a525 Author: roshansh-cmu Date: Fri Apr 29 20:54:37 2022 -0400  Wandb minor fix for model resume
commit ffe7c58ac8a255769f6952b8c7225a5158a00068 Merge: 835033c70 b757b89d4 Author: Roshan S Sharma <36464960+roshansh-cmu@users.noreply.github.com> Date: Fri Apr 29 20:45:47 2022 -0400  Merge branch 'espnet:master' into master
commit b757b89d45d5574cebf44e225cbe32e3e9e4f522 Merge: 930b380de 664414c8f Author: Tomoki Hayashi Date: Fri Apr 29 16:11:56 2022 +0900  Merge pull request #4320 from cadia-lvl/add-progress-bar
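PR #4349 above ("Add quantization in ESPnet2 for asr inference") maps naturally onto PyTorch's dynamic quantization; a minimal sketch under the assumption that the inference wrapper does something like the following to the loaded model (the model and layer choice here are stand-ins):

```python
import torch

# Stand-in for a trained ASR model; ESPnet builds the real one from a config.
model = torch.nn.Sequential(
    torch.nn.Linear(80, 256), torch.nn.ReLU(), torch.nn.Linear(256, 500)
).eval()

# Dynamic quantization: weights of the listed layer types become int8 once,
# activations are quantized on the fly at inference time (CPU only).
qmodel = torch.quantization.quantize_dynamic(
    model, qconfig_spec={torch.nn.Linear}, dtype=torch.qint8
)
with torch.no_grad():
    logits = qmodel(torch.randn(1, 10, 80))  # (batch, frames, feat_dim)
```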
commit 930b380de02b31f8d2da4144d471e60ed41d70fc Merge: 2a48371b8 de81cf979 Author: Shinji Watanabe Date: Thu Apr 28 16:30:34 2022 -0400  Merge pull request #4316 from simpleoier/enh_s2t  add egs2/chime4/enh_asr1 recipe and results
commit de81cf979fd61ab13e0ab0fe0432fbbaa4776be3 Author: simpleoier Date: Thu Apr 28 11:54:10 2022 -0400  update egs2/chime4/enh_asr1/README.md and related enh1, asr1 configs.
commit 664414c8f27d5148377ffa733c7f8369eaf7ebd4 Author: kan-bayashi Date: Thu Apr 28 21:31:45 2022 +0900  fixed flake8
commit 2a48371b8ceffd4899dc08f2fc5df092ed1d8a93 Merge: 72c1d8f2b 5a9178236 Author: Shinji Watanabe Date: Thu Apr 28 07:40:31 2022 -0400  Merge pull request #4243 from D-Keqi/master  Add streaming ST/SLU
commit 72c1d8f2bde996febde895c603722dba1634cf20 Merge: b7f0a5a6f 406656cdc Author: Shinji Watanabe Date: Thu Apr 28 07:37:23 2022 -0400  Merge pull request #4110 from earthmanylf/dpclanddan  Merge Deep Clustering and Deep Attractor Network to enh separator
commit b7f0a5a6fc227049c1b8735d8ac4362c27333022 Merge: 44971ff96 2d950f962 Author: Shinji Watanabe Date: Thu Apr 28 07:33:11 2022 -0400  Merge pull request #4328 from Emrys365/egs2_aishell4  Rename egs2/clarity21/enh_2021 to egs2/clarity21/enh1
commit 2d950f96223fd4823203b6a4e9afdc86b2357e7e Author: Wangyou Zhang Date: Thu Apr 28 16:58:26 2022 +0800  Rename egs2/clarity21/enh_2021/
commit 2b663318cd1773fb8685b1e03295b6bc6889c283 Author: simpleoier Date: Thu Apr 28 00:59:22 2022 -0400  fix small bugs and add CHiME4 enh_asr1 recipe & results
commit 406656cdcb668a77910074b4382b557b6f845c54 Author: earthmanylf <411214987@qq.com> Date: Thu Apr 28 11:10:11 2022 +0800  Add custom name in __init__ in tf_domain.py; Merge test_dpcl_loss.py to test_tf_domain.py
commit 5a9178236bc1a7a4a5db82ad84773d9c43199c81 Author: D-Keqi <61508571+D-Keqi@users.noreply.github.com> Date: Thu Apr 28 10:31:29 2022 +0800  use the other st_inference
commit 9e4bb7fa88e8c63e69712e77c5b783c64181fbc2 Author: D-Keqi <61508571+D-Keqi@users.noreply.github.com> Date: Thu Apr 28 10:13:59 2022 +0800  fix conflict
commit 21d2ac6331ec0779b8ec2d3265ccdfabfaacbd61 Merge: b801ddc96 44971ff96 Author: D-Keqi <61508571+D-Keqi@users.noreply.github.com> Date: Thu Apr 28 10:12:15 2022 +0800  Merge pull request #17 from espnet/master  merge the latest espnet
commit b801ddc96aedd2a9b4e63d2e3612c3cf7417799a Author: D-Keqi <61508571+D-Keqi@users.noreply.github.com> Date: Thu Apr 28 10:11:11 2022 +0800  Add files via upload
commit 316cf02340a627548b71317ba04afac457f68101 Author: D-Keqi <61508571+D-Keqi@users.noreply.github.com> Date: Thu Apr 28 10:04:29 2022 +0800  fix conflict
commit 9b33b791d7c7b509f514b7540a8ec5dd7fff9d0b Author: earthmanylf <411214987@qq.com> Date: Wed Apr 27 23:22:22 2022 +0800  Fix format
commit 346a42467881e5bbd9414200dd3c915935eb56dd Author: earthmanylf <411214987@qq.com> Date: Wed Apr 27 22:37:22 2022 +0800  Fix format
commit 44971ff962aae30c962226f1ba3d87de057ac00e Merge: 0ae377389 c4b93e8fd Author: Jiatong <728307998@qq.com> Date: Wed Apr 27 10:13:03 2022 -0400  Merge pull request #4324 from ftshijt/master  Add Test Functions for ST Train and Inference
commit 0d3be31602306650fee44c367cbc788e0b0462db Author: earthmanylf <411214987@qq.com> Date: Wed Apr 27 22:09:12 2022 +0800  Fix format
commit b24d108b0d7d501b2faa1971feca5a281198d351 Merge: 4c679c061 f1312a8b2 Author: earthmanylf <411214987@qq.com> Date: Wed Apr 27 21:29:33 2022 +0800  Fix conflict
commit 4c679c061c1a0be411f613bdbdeb7849af19edf4 Merge: a90e2ecef 0ae377389 Author: earthmanylf <411214987@qq.com> Date: Wed Apr 27 21:15:33 2022 +0800  Fix conflict
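PR #4110 above folds Deep Clustering (DPCL) and Deep Attractor Network separators into the enhancement stack (the diffstat below adds `dpcl_solver.py`, `dpcl_separator.py`, and `dan_separator.py`). For orientation, the textbook deep-clustering objective those modules build on is `||V V^T - Y Y^T||_F^2`; a sketch in its memory-friendly expanded form, not necessarily ESPnet's exact implementation:

```python
import torch
import torch.nn.functional as F

def dpcl_loss(V: torch.Tensor, Y: torch.Tensor) -> torch.Tensor:
    """||V V^T - Y Y^T||_F^2 with V: (B, N, D) embeddings over N = T*F
    time-frequency bins and Y: (B, N, S) one-hot source assignments.
    Expanding the norm avoids forming the huge N x N affinity matrices."""
    VtV = V.transpose(1, 2) @ V  # (B, D, D)
    VtY = V.transpose(1, 2) @ Y  # (B, D, S)
    YtY = Y.transpose(1, 2) @ Y  # (B, S, S)
    return (VtV.pow(2).sum((1, 2)) - 2 * VtY.pow(2).sum((1, 2))
            + YtY.pow(2).sum((1, 2))).mean()

# Toy usage: 2 sources, 100 bins, 20-dim unit-norm embeddings.
V = F.normalize(torch.randn(4, 100, 20), dim=-1)
Y = F.one_hot(torch.randint(2, (4, 100)), 2).float()
print(dpcl_loss(V, Y))
```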
commit 10e6c7ea2e5783442631213dfc20dd7b9543839d Author: Gunnar Thor Date: Wed Apr 27 09:30:47 2022 +0000  split docstring to conform with linter
commit c4b93e8fd870954ec2649abc3fc6172d78d92166 Author: ftshijt <728307998@qq.com> Date: Wed Apr 27 01:49:00 2022 -0400  apply black
commit 04d0cd84878701a0ff5e09933581c98ef7e0adac Merge: 72b6b21d5 4a12ab320 Author: ftshijt <728307998@qq.com> Date: Wed Apr 27 01:27:36 2022 -0400  Merge branch 'master' of https://github.com/ftshijt/espnet
commit 72b6b21d509a26d30a454525811c3530ee6b297b Author: ftshijt <728307998@qq.com> Date: Wed Apr 27 01:27:09 2022 -0400  add st unit test
commit d1e8ac3d8717f8717fb645592c25ee8cafc4060c Author: ftshijt <728307998@qq.com> Date: Wed Apr 27 01:15:18 2022 -0400  update test
commit 5fb7dd619293dcd1cc02c6371c4079c22a40a23b Author: ftshijt <728307998@qq.com> Date: Wed Apr 27 00:53:46 2022 -0400  remove requirement for src_token_list
commit 4118b1b21f25fc7d8aa56658cd7ff691684884be Author: D-Keqi <61508571+D-Keqi@users.noreply.github.com> Date: Wed Apr 27 10:31:42 2022 +0800  fix conflict
commit 5436784241eaa4f60e0990627758a841e7927651 Author: D-Keqi <61508571+D-Keqi@users.noreply.github.com> Date: Wed Apr 27 10:06:19 2022 +0800  Update test_integration_espnet2.sh
commit 469168b4451b4922306b3393598d199a514acd50 Author: D-Keqi <61508571+D-Keqi@users.noreply.github.com> Date: Wed Apr 27 10:04:56 2022 +0800  fix issue
commit 06ddfe19a346f1ea8b620e4eb5bf61bfdcfc3309 Author: D-Keqi <61508571+D-Keqi@users.noreply.github.com> Date: Wed Apr 27 10:01:38 2022 +0800  fix conflict
commit 5a81f91ce6734745272e6d960261797cfcb3dd41 Author: D-Keqi <61508571+D-Keqi@users.noreply.github.com> Date: Wed Apr 27 09:57:18 2022 +0800  fix conflict
commit 91d48d920c229af3902fc05c361ba1b5f1636c67 Author: Gunnar Thor Date: Tue Apr 26 22:21:13 2022 +0000  applied black
commit ec518ccc74b85e3b50304ab70ae5a1f069df0038 Author: Gunnar Thor Date: Wed Feb 23 11:31:56 2022 +0000  Add progress bar to phonemization
commit f1312a8b2eeecf57f740b963b832dc4a806ac5f8 Author: earthmanylf <43513215+earthmanylf@users.noreply.github.com> Date: Mon Apr 25 10:37:19 2022 +0800  Update README.md  Co-authored-by: Wangyou Zhang
commit a90e2ecef4854884dc525345a466f33fce79bd0a Author: earthmanylf <411214987@qq.com> Date: Sun Apr 24 22:55:54 2022 +0800  Fix format problems
commit be0112bf99c7caf787feba50c7dbc47a1879dbfb Author: earthmanylf <411214987@qq.com> Date: Sun Apr 24 22:06:45 2022 +0800  Fix format problems
commit 16acdadb6dba56d0f91a3132b540a01c9bd25c89 Merge: feb28baf9 f6a2522ad Author: earthmanylf <411214987@qq.com> Date: Sun Apr 24 21:14:02 2022 +0800  Fix conflict
commit 95be28ab0e48415922677a92639833d648f3844c Author: D-Keqi <61508571+D-Keqi@users.noreply.github.com> Date: Sat Apr 23 14:47:11 2022 +0800  Fix CI
commit a0966f61701041228c96924359b8e6678960a31a Author: D-Keqi <61508571+D-Keqi@users.noreply.github.com> Date: Sat Apr 23 14:46:10 2022 +0800  Fix CI
commit 1daecd4570f477da905e4365ff30e4c0be53ca44 Author: D-Keqi <61508571+D-Keqi@users.noreply.github.com> Date: Sat Apr 23 14:44:21 2022 +0800  fix CI
commit 7261735b82173ae5ac377844fad2f3b9289e08ec Merge: 809106e2a f6a2522ad Author: D-Keqi <61508571+D-Keqi@users.noreply.github.com> Date: Sat Apr 23 14:21:06 2022 +0800  Merge pull request #15 from espnet/master  Merging the latest ESPnet
commit 809106e2a512990b30fd1afcf2c7bf897d185d58 Author: D-Keqi <61508571+D-Keqi@users.noreply.github.com> Date: Sat Apr 23 12:33:18 2022 +0800  show the log result
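PR #4320 above ("Add progress bar to phonemization"; the diffstat touches `convert_text_to_phn.py`) is, in essence, the usual tqdm pattern around the conversion loop. A minimal sketch with hypothetical `lines`/`g2p` names, since the script's actual internals aren't shown here:

```python
from tqdm import tqdm

def phonemize_all(lines, g2p):
    # tqdm wraps any iterable and prints progress; `total` gives it a
    # denominator so it can show a percentage and an ETA.
    return [g2p(line) for line in tqdm(lines, total=len(lines), desc="g2p")]

# Toy g2p standing in for the real phonemizer.
print(phonemize_all(["hello", "world"], g2p=lambda s: list(s)))
```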
commit 65b53563cac0fdc09d653112f85dd735313cb650 Author: D-Keqi <61508571+D-Keqi@users.noreply.github.com> Date: Sat Apr 23 11:10:41 2022 +0800  show the error report in the log
commit 36bdfcbfd0731e543db130b6fb756e140f9f2cb2 Author: D-Keqi <61508571+D-Keqi@users.noreply.github.com> Date: Thu Apr 21 15:21:07 2022 +0800  fix ci
commit c8e05efd90ea4c9f775b149916d05f0f74092157 Author: D-Keqi <61508571+D-Keqi@users.noreply.github.com> Date: Thu Apr 21 11:30:54 2022 +0800  fix ci
commit 4831a6671728e52f0b2a0766a7c4cb60dd3d470f Author: D-Keqi <61508571+D-Keqi@users.noreply.github.com> Date: Wed Apr 20 20:34:26 2022 +0800  fix CI
commit 26fc7e1b41c57dc5c6a6882fe20a8847ee5a055c Author: D-Keqi <61508571+D-Keqi@users.noreply.github.com> Date: Wed Apr 20 16:37:29 2022 +0800  Add files via upload
commit b7c7bf13f9df6d9c09888c21c5c071c15f1023bc Author: D-Keqi <61508571+D-Keqi@users.noreply.github.com> Date: Wed Apr 20 15:19:37 2022 +0800  fix ci
commit 2b1b6bbef15553a11862a9c74352bed95412337d Author: D-Keqi <61508571+D-Keqi@users.noreply.github.com> Date: Wed Apr 20 11:33:40 2022 +0800  fix fbank_pitch issue
commit 0d5736fc393332465ae49a620392735a22312c97 Author: D-Keqi <61508571+D-Keqi@users.noreply.github.com> Date: Wed Apr 20 11:33:21 2022 +0800  fix fbank_pitch issue
commit 835033c70cb2821340481b6e3f695d3afe6cbcd0 Merge: fcf13c412 42eb3108a Author: Roshan S Sharma <36464960+roshansh-cmu@users.noreply.github.com> Date: Tue Apr 19 07:36:09 2022 -0400  Merge branch 'espnet:master' into master
commit 70c1980b7c8d396bd5d05d8eba50bf90a84bff55 Author: D-Keqi <462975470@qq.com> Date: Tue Apr 19 19:01:41 2022 +0800  fix CI
commit fabb3a1fd17b10cbcf252240e0c40243a8c2f971 Author: D-Keqi <61508571+D-Keqi@users.noreply.github.com> Date: Tue Apr 19 16:39:39 2022 +0800  update the test_integration_espnet2
commit c08e023e429ad90399f3722d825ccaa33c84b291 Author: D-Keqi <61508571+D-Keqi@users.noreply.github.com> Date: Tue Apr 19 16:36:09 2022 +0800  Update and rename tmp to path.sh
commit 838d2ecfa767585a3df0161388f5dd5de426695a Author: D-Keqi <61508571+D-Keqi@users.noreply.github.com> Date: Tue Apr 19 16:35:08 2022 +0800  Add files via upload
commit 62162ae8938d71f0f9040ee1e27eb40c83882808 Author: D-Keqi <61508571+D-Keqi@users.noreply.github.com> Date: Tue Apr 19 16:33:31 2022 +0800  Create tmp
commit 9a5585e282b68d44921879385f5a3796bacd1fdb Author: D-Keqi <61508571+D-Keqi@users.noreply.github.com> Date: Tue Apr 19 16:33:00 2022 +0800  Delete t
commit 349f4ab3498bc296d46ad4b42a77fda25d5e2286 Author: D-Keqi <61508571+D-Keqi@users.noreply.github.com> Date: Tue Apr 19 16:31:43 2022 +0800  add conf
commit e3486d24210cb53491518d913df2268a2f03eded Author: D-Keqi <61508571+D-Keqi@users.noreply.github.com> Date: Tue Apr 19 16:28:12 2022 +0800  Create t
commit 652cf1774dd442d55082652713bbadbc4b6946a6 Author: D-Keqi <61508571+D-Keqi@users.noreply.github.com> Date: Tue Apr 19 16:27:47 2022 +0800  Delete tmp
commit 48fcab7a8d8b0ad1a97798fa823d315aa7708d3d Author: D-Keqi <61508571+D-Keqi@users.noreply.github.com> Date: Tue Apr 19 16:27:12 2022 +0800  add st1 of mini_an4
commit 1800b0be298111842ab2a3cf5f39a9ac79c3a86f Author: D-Keqi <61508571+D-Keqi@users.noreply.github.com> Date: Tue Apr 19 16:25:21 2022 +0800  Create tmp
commit 0a1d05b61d611ca8a7b7ca1815ae089781cbdfde Merge: 73ca6e4e4 952a70a70 Author: D-Keqi <61508571+D-Keqi@users.noreply.github.com> Date: Wed Apr 13 10:20:46 2022 +0800  Merge pull request #14 from espnet/master  Merge the latest ESPnet
commit 73ca6e4e4baddd5f3fb6075788ed3e902021b9c8 Author: D-Keqi <61508571+D-Keqi@users.noreply.github.com> Date: Thu Apr 7 17:59:52 2022 +0800  fix ci
commit acd3e0acdc4d4c6eadfa531711906aa29ffb01a0 Author: D-Keqi <61508571+D-Keqi@users.noreply.github.com> Date: Thu Apr 7 17:58:34 2022 +0800  fix CI
commit e6da9baea12c6383282bdb716745060be5011a08 Author: D-Keqi <61508571+D-Keqi@users.noreply.github.com> Date: Thu Apr 7 17:16:45 2022 +0800  Add files via upload
commit fc45fa368bc55b92f94e9ae6f9a6953728f3c894 Author: D-Keqi <61508571+D-Keqi@users.noreply.github.com> Date: Thu Apr 7 17:13:53 2022 +0800  Delete README.md
commit 5b8c0b567f6b172e2112c5460c45e44b934478a6 Author: D-Keqi <61508571+D-Keqi@users.noreply.github.com> Date: Thu Apr 7 17:13:11 2022 +0800  Delete egs2/chime4/asr1/exp/asr_train_asr_streaming_transformer_raw_en_char/decode_asr_streamindt05_real_beamformit_2micsg_lm_lm_train_lm_en_char_valid.loss.ave_asr_model_valid.acc.ave directory
commit 87ac110aaf70e2c339bac6ed7c5b60a856acc535 Author: D-Keqi <61508571+D-Keqi@users.noreply.github.com> Date: Thu Apr 7 17:10:14 2022 +0800  streaming slu
commit 7b7fde9752cd9cd4905d642996215a158bf8d026 Author: D-Keqi <61508571+D-Keqi@users.noreply.github.com> Date: Thu Apr 7 17:09:27 2022 +0800  streaming slu
commit fcd129620bbbc063dd918b83961d568ad694e45a Author: D-Keqi <61508571+D-Keqi@users.noreply.github.com> Date: Thu Apr 7 17:08:55 2022 +0800  streaming st
commit 17fe79ca89b496e4f9b6b4caaa2497816d4855b3 Author: D-Keqi <61508571+D-Keqi@users.noreply.github.com> Date: Thu Apr 7 17:07:28 2022 +0800  streaming st
commit 812a527bb836a2fbd12ceb6d3bcabcc728d88427 Author: D-Keqi <61508571+D-Keqi@users.noreply.github.com> Date: Thu Apr 7 17:06:31 2022 +0800  streaming st
commit e69a6d8efcd1ae57aca6315d70a20e484d360f7f Author: D-Keqi <61508571+D-Keqi@users.noreply.github.com> Date: Thu Apr 7 17:05:25 2022 +0800  streaming st
commit e488037b8d9b3e46476874f62b095ae5b7323e19 Merge: 9fb445053 189e1593d Author: D-Keqi <61508571+D-Keqi@users.noreply.github.com> Date: Thu Apr 7 15:32:57 2022 +0800  Merge pull request #13 from espnet/master  Update latest espnet
commit fcf13c412842d57cf48580dd89ff0d1fc5e6c3e0 Merge: 39700a054 c4aba12f9 Author: Roshan S Sharma <36464960+roshansh-cmu@users.noreply.github.com> Date: Wed Apr 6 13:35:13 2022 -0400  Merge branch 'espnet:master' into master
commit feb28baf9dd6af564fe30920c1c6e70c2258e0de Merge: 3e6167c51 c4aba12f9 Author: earthmanylf <411214987@qq.com> Date: Wed Apr 6 19:24:06 2022 +0800  Add deep clustering end-to-end training method
commit 50269e8b4dd0696d02e5da9f70c2d7952a26f392 Author: WeiGodHorse Date: Fri Mar 25 22:58:41 2022 +0800  fix a bug in Mandarin pypinyin_g2p_phone
commit 39700a054ac5ed718a1eb74cef9b64b2144b727c Merge: aa706c512 14c635069 Author: Roshan S Sharma <36464960+roshansh-cmu@users.noreply.github.com> Date: Thu Mar 24 17:42:11 2022 -0400  Merge branch 'espnet:master' into master
commit aa706c5122391feee57d4db121a403dfd8ea0ab0 Merge: ab2fa25af 350af365f Author: Roshan S Sharma <36464960+roshansh-cmu@users.noreply.github.com> Date: Wed Mar 23 23:34:17 2022 -0400  Merge branch 'espnet:master' into master
commit ab2fa25af6dffce3ecdf3e92adaa171d3d156d50 Merge: de5e7139b cb8181a99 Author: Roshan S Sharma <36464960+roshansh-cmu@users.noreply.github.com> Date: Tue Mar 8 16:03:38 2022 -0500  Merge branch 'espnet:master' into master
commit de5e7139b65549adfcac58cb0ee23c32c50634ea Merge: 5ef36bcae 1bac0f080 Author: Roshan S Sharma <36464960+roshansh-cmu@users.noreply.github.com> Date: Tue Mar 8 15:09:20 2022 -0500  Merge branch 'espnet:master' into master
commit 5ef36bcae3fac1792ccc2aae6b7dbab715f094fe Merge: 597cd7bd8 0c246e23c Author: Roshan S Sharma <36464960+roshansh-cmu@users.noreply.github.com> Date: Tue Mar 8 13:35:27 2022 -0500  Merge branch 'espnet:master' into master
commit 597cd7bd8a0efbe82733d19774297ab90f5c659f Merge: 6625f9056 f16e579e2 Author: Roshan S Sharma <36464960+roshansh-cmu@users.noreply.github.com> Date: Mon Mar 7 21:54:06 2022 -0500  Merge branch 'espnet:master' into master
commit 6625f9056b5087aeb13a2214c770d586c067f5e3 Merge: 5f237866b 5e070668e Author: Roshan S Sharma <36464960+roshansh-cmu@users.noreply.github.com> Date: Mon Mar 7 13:35:03 2022 -0500  Merge branch 'espnet:master' into master
commit 3e6167c51df23b7629d7830e81e8cf4ea52032fc Author: earthmanylf <411214987@qq.com> Date: Mon Mar 7 20:03:31 2022 +0800  Fixed format in some files
commit 294373a121cf0766efe623dc56b12d0990a77c93 Author: earthmanylf <411214987@qq.com> Date: Mon Mar 7 18:26:49 2022 +0800  Update code and add comments in separator
commit 5f86c1104cbce4275043e11050b69191834ddbc0 Merge: 7aa90b584 6f429608b Author: earthmanylf <411214987@qq.com> Date: Mon Mar 7 18:06:10 2022 +0800  Add experiment result in egs2/wsj0_2mix/enh1/README.md; Update code in some files
commit 5f237866b360028676c7b9e903d15839cdaa0113 Merge: 66c1a798d 6f429608b Author: Roshan S Sharma <36464960+roshansh-cmu@users.noreply.github.com> Date: Sun Mar 6 19:26:35 2022 -0500  Merge branch 'espnet:master' into master
commit 66c1a798d15f531b4c4b4c1e02cfd1eda6813f92 Merge: 5c5eb0292 a04a98c98 Author: Roshan S Sharma <36464960+roshansh-cmu@users.noreply.github.com> Date: Thu Mar 3 18:14:47 2022 -0500  Merge branch 'espnet:master' into master
commit 7aa90b5844ba1d0050cfd737b2a2fabe9abd5d62 Merge: 5f7e2e714 b274c4ea6 Author: earthmanylf <411214987@qq.com> Date: Thu Mar 3 16:20:25 2022 +0800  Merge branch 'master' of github.com:espnet/espnet into dpclanddan
commit 5c5eb0292e28c19345fc71d456348f6353f2e2a4 Merge: bd8e400fa 9863980d2 Author: Roshan S Sharma <36464960+roshansh-cmu@users.noreply.github.com> Date: Wed Mar 2 12:13:35 2022 -0500  Merge branch 'espnet:master' into master
commit bd8e400fa37ebc1b77f7a938ae9275bb18de6fe5 Merge: 58aec432d 7999009d5 Author: Roshan S Sharma <36464960+roshansh-cmu@users.noreply.github.com> Date: Mon Feb 28 20:37:32 2022 -0500  Merge branch 'espnet:master' into master
commit 5f7e2e7140cc7204acecda90a6ff1d5379967da6 Merge: d3acdcc3b 637d8c333 Author: earthmanylf <411214987@qq.com> Date: Sun Feb 27 13:19:45 2022 +0800  Merge branch 'master' of github.com:espnet/espnet into dpclanddan
commit d3acdcc3bd537cf3f50c8d5c4642dfc488daa656 Author: earthmanylf <411214987@qq.com> Date: Fri Feb 25 18:32:30 2022 +0800  fix bugs of test_dan_separator.py
commit c54d9a4087106b56ab5ce4ec9758aeb74bca0b4c Author: earthmanylf <411214987@qq.com> Date: Fri Feb 25 16:00:30 2022 +0800  add subs to the abs_separator.py
commit c1d9be5f4f9eb32bc75fb7a8b2fe406aa997946c Author: earthmanylf <411214987@qq.com> Date: Fri Feb 25 15:30:46 2022 +0800  update for dpcl and dan
commit 58aec432d97300ec12494676a19900a08a950827 Merge: 23a537e2a 9c24b3add Author: Roshan S Sharma <36464960+roshansh-cmu@users.noreply.github.com> Date: Wed Feb 23 16:17:09 2022 -0500  Merge branch 'espnet:master' into master
commit 23a537e2ad1ee9af7e8016054208d5ce1cc572fd Author: roshansh-cmu Date: Tue Feb 22 06:50:03 2022 -0500  black fix
commit 8572a57af47ef72e9f010601483b31eb96baf03f Merge: 969b333d9 650472b45 Author: roshansh-cmu Date: Mon Feb 21 22:35:49 2022 -0500  Mergefix
commit ee20e18a5f0eef55c8b0709e1e6b9bcddf10e4e6 Merge: 63f88c02b a3e1543e9 Author: earthmanylf <43513215+earthmanylf@users.noreply.github.com> Date: Wed Feb 16 14:29:36 2022 +0800  Merge pull request #1 from espnet/master  Merge from upstream
commit 9fb445053f999b64350e5e7a56a1699a727ed125 Author: D-Keqi <61508571+D-Keqi@users.noreply.github.com> Date: Wed Sep 15 00:30:05 2021 +0800  Update README.md
commit 8c6d3e1614a247b78f1b17ff2c6ef3b3725b166a Author: D-Keqi <61508571+D-Keqi@users.noreply.github.com> Date: Wed Sep 15 00:29:31 2021 +0800  Update README.md
commit 2411dbb82b08aee182df0738a47d7f6f44bdcea8 Author: D-Keqi <61508571+D-Keqi@users.noreply.github.com> Date: Mon Sep 13 13:08:52 2021 +0800  Update README.md
commit 3edc1a6d816428b3e4e099271dc51c117b9c8d3b Author: D-Keqi <61508571+D-Keqi@users.noreply.github.com> Date: Mon Sep 13 13:08:25 2021 +0800  Update README.md
commit d4d4b7e450992867bc0ee91ffb467ec38ad6981c Author: D-Keqi <61508571+D-Keqi@users.noreply.github.com> Date: Sat Sep 11 23:11:39 2021 +0800  Update README.md
commit 885ab0552dc26076b0b581eb88813f426179fdcb Author: D-Keqi <61508571+D-Keqi@users.noreply.github.com> Date: Sat Sep 4 10:48:05 2021 +0800  add results
commit dfba960da5e60cd9d78c439b7fa0e400332fbe46 Author: D-Keqi <61508571+D-Keqi@users.noreply.github.com> Date: Sat Sep 4 10:43:36 2021 +0800  create exp
commit 391d7c78f310313ca78abc1b3341183a15336579 Author: D-Keqi <61508571+D-Keqi@users.noreply.github.com> Date: Sat Sep 4 10:40:23 2021 +0800  streaming results
---
 .github/workflows/centos7.yml | 2 +-
 .github/workflows/ci.yaml | 6 +-
 .github/workflows/debian9.yml | 2 +-
 .github/workflows/test_import.yaml | 2 +-
 .mergify.yml | 10 +-
 CONTRIBUTING.md | 17 +-
 README.md | 41 +-
 ci/install.sh | 2 +-
 ci/test_integration_espnet2.sh | 49 +-
 ci/test_python.sh | 9 +-
 egs/arctic/tts1/local/clean_text.py | 1 -
 egs/chime6/asr1/local/extract_noises.py | 5 +-
 egs/chime6/asr1/local/make_noise_list.py | 1 -
 egs/cmu_indic/tts1/local/clean_text.py | 1 -
 egs/covost2/st1/local/process_tsv.py | 2 +-
 egs/csj/asr1/local/csj_rm_tag.py | 3 +-
 egs/iwslt16/mt1/local/extract_recog_text.py | 2 +-
 egs/iwslt16/mt1/local/generate_json.py | 6 +-
 egs/iwslt16/mt1/local/generate_vocab.py | 2 +-
 egs/iwslt18/st1/local/parse_xml.py | 2 +-
 egs/iwslt21/asr1/local/filter_parentheses.py | 1 +
 .../st1/local/data_prep.py | 1 -
 egs/jnas/asr1/local/filter_text.py | 3 +-
 .../asr1/local/get_space_normalized_hyps.py | 2 +-
 .../asr1/local/get_transcriptions.py | 3 +-
 egs/libri_css/asr1/local/best_wer_matching.py | 3 +-
 .../asr1/local/get_perspeaker_output.py | 2 +-
 egs/libri_css/asr1/local/prepare_data.py | 1 +
 .../local/segmentation/apply_webrtcvad.py | 1 +
 egs/ljspeech/tts1/local/clean_text.py | 2 +-
 egs/lrs/avsr1/local/se_batch.py | 5 +-
 egs/mgb2/asr1/local/process_xml.py | 3 +-
 egs/mgb2/asr1/local/text_segmenting.py | 1 +
 .../asr1/local/data_prep.py | 5 +-
 .../asr1/local/construct_dataset.py | 1 -
 egs/puebla_nahuatl/asr1/local/data_prep.py | 2 -
 egs/puebla_nahuatl/st1/local/data_prep.py | 2 +-
 egs/reverb/asr1/local/filterjson.py | 3 +-
 egs/reverb/asr1/local/run_wpe.py | 7 +-
 egs/reverb/asr1_multich/local/filterjson.py | 2 +-
 egs/tweb/tts1/local/clean_text.py | 1 -
 egs/vais1000/tts1/local/clean_text.py | 1 -
 .../tts1_en_fi/local/clean_text_css10.py | 16 +-
 .../vc1_task1/local/clean_text_asr_result.py | 2 +-
 .../vc1_task2/local/clean_text_finnish.py | 20 +-
 .../vc1_task2/local/clean_text_german.py | 3 +-
 .../vc1_task2/local/clean_text_mandarin.py | 7 +-
 egs/vcc20/voc1/local/subset_data_dir.py | 2 +-
 egs/voxforge/asr1/local/filter_text.py | 3 +-
 egs/wsj/asr1/local/filtering_samples.py | 5 +-
 egs/wsj_mix/asr1/local/merge_scp2json.py | 2 +-
 egs/wsj_mix/asr1/local/mergejson.py | 1 -
 .../asr1/local/data_prep.py | 5 +-
 egs2/README.md | 1 +
 egs2/TEMPLATE/asr1/asr.sh | 12 +-
 egs2/TEMPLATE/asr1/db.sh | 1 +
 .../asr1/pyscripts/audio/format_wav_scp.py | 6 +-
 .../pyscripts/utils/convert_text_to_phn.py | 40 +-
 .../asr1/pyscripts/utils/evaluate_f0.py | 6 +-
 .../asr1/pyscripts/utils/evaluate_mcd.py | 6 +-
 .../asr1/pyscripts/utils/extract_xvectors.py | 10 +-
 .../asr1/pyscripts/utils/plot_sinc_filters.py | 5 +-
 .../asr1/pyscripts/utils/rotate_logfile.py | 59 ++
 .../asr1/pyscripts/utils/score_intent.py | 3 +-
 .../pyscripts/utils/score_summarization.py | 9 +-
 .../asr1/scripts/utils/evaluate_asr.sh | 4 +-
 .../scripts/utils/show_translation_result.sh | 23 +-
 egs2/TEMPLATE/diar1/diar.sh | 8 +-
 .../diar1/pyscripts/utils/convert_rttm.py | 13 +-
 .../diar1/pyscripts/utils/make_rttm.py | 6 +-
 egs2/TEMPLATE/enh1/enh.sh | 8 +-
 egs2/TEMPLATE/enh_asr1/enh_asr.sh | 12 +-
 .../enh_asr1/scripts/utils/show_enh_score.sh | 85 +-
 egs2/TEMPLATE/enh_st1/enh_st.sh | 32 +-
 egs2/TEMPLATE/mt1/mt.sh | 38 +-
 egs2/TEMPLATE/ssl1/hubert.sh | 100 +-
 egs2/TEMPLATE/ssl1/pyscripts/dump_km_label.py | 8 +-
 .../TEMPLATE/ssl1/pyscripts/feature_loader.py | 3 +-
 egs2/TEMPLATE/ssl1/pyscripts/sklearn_km.py | 12 +-
 egs2/TEMPLATE/st1/st.sh | 140 ++-
 egs2/TEMPLATE/tts1/README.md | 6 +
 egs2/TEMPLATE/tts1/tts.sh | 8 +-
 .../asr1/local/remove_missing.py | 1 -
 egs2/aishell3/tts1/local/data_prep.py | 1 +
 .../local/generate_fe_trainingdata.py.patch | 16 +-
 .../local/prepare_audioset_category_list.py | 2 +-
 egs2/aishell4/enh1/local/split_train_dev.py | 10 +-
 .../enh1/local/split_train_dev_by_column.py | 6 +-
 .../enh1/local/split_train_dev_by_prefix.py | 6 +-
 egs2/bn_openslr53/asr1/local/data_prep.py | 1 -
 egs2/bur_openslr80/asr1/local/data_prep.py | 1 -
 egs2/catslu/asr1/local/data_prep.py | 4 +-
 ...-3_specaug_accum1_preenc128_warmup20k.yaml | 90 ++
 egs2/chime4/asr1/local/sym_channel.py | 2 +-
 .../tuning/train_enh_convtasnet_small.yaml | 64 ++
 egs2/chime4/enh_asr1/README.md | 97 ++
 .../enh_2021 => chime4/enh_asr1}/cmd.sh | 0
 egs2/chime4/enh_asr1/conf/chime4.cfg | 1 +
 .../enh_asr1/conf/decode_asr_transformer.yaml | 7 +
 egs2/chime4/enh_asr1/conf/fbank.conf | 2 +
 .../enh_asr1}/conf/pbs.conf | 0
 egs2/chime4/enh_asr1/conf/pitch.conf | 1 +
 .../enh_asr1}/conf/queue.conf | 0
 .../enh_asr1}/conf/slurm.conf | 0
 ..._enh_asr_convtasnet_fbank_transformer.yaml | 1 +
 .../enh_asr1/conf/train_lm_transformer.yaml | 48 +
 ...it_lr1e-4_accum1_adam_specaug_bypass0.yaml | 124 +++
 ...ormer_lr2e-3_accum2_warmup20k_specaug.yaml | 119 +++
 egs2/chime4/enh_asr1/db.sh | 1 +
 egs2/chime4/enh_asr1/enh_asr.sh | 1 +
 .../CHiME3_simulate_data_patched_parallel.m | 1 +
 .../enh_asr1/local/bth_chime4_data_prep.sh | 1 +
 egs2/chime4/enh_asr1/local/chime4_asr_data.sh | 1 +
 egs2/chime4/enh_asr1/local/chime4_enh_data.sh | 1 +
 .../local/clean_chime4_format_data.sh | 1 +
 .../enh_asr1/local/clean_wsj0_data_prep.sh | 1 +
 egs2/chime4/enh_asr1/local/cstr_ndx2flist.pl | 1 +
 egs2/chime4/enh_asr1/local/data.sh | 89 ++
 .../enh_asr1/local/find_noisy_transcripts.pl | 1 +
 .../chime4/enh_asr1/local/find_transcripts.pl | 1 +
 egs2/chime4/enh_asr1/local/flist2scp.pl | 1 +
 egs2/chime4/enh_asr1/local/localize.m | 1 +
 egs2/chime4/enh_asr1/local/make_stft.sh | 1 +
 egs2/chime4/enh_asr1/local/ndx2flist.pl | 1 +
 .../enh_asr1/local/normalize_transcript.pl | 1 +
 .../enh_asr1}/local/path.sh | 0
 .../local/real_enhan_chime4_data_prep.sh | 1 +
 .../local/real_ext_chime4_data_prep.sh | 1 +
 .../local/real_noisy_chime4_data_prep.sh | 1 +
 .../enh_asr1/local/run_beamform_2ch_track.sh | 1 +
 .../enh_asr1/local/run_beamform_6ch_track.sh | 1 +
 .../enh_asr1/local/show_enhance_results.sh | 1 +
 .../local/simu_enhan_chime4_data_prep.sh | 1 +
 .../local/simu_ext_chime4_data_prep.sh | 1 +
 .../local/simu_noisy_chime4_data_prep.sh | 1 +
 egs2/chime4/enh_asr1/local/sym_channel.py | 1 +
 egs2/chime4/enh_asr1/local/wsj_data_prep.sh | 1 +
 egs2/chime4/enh_asr1/local/wsj_format_data.sh | 1 +
 egs2/chime4/enh_asr1/path.sh | 1 +
 egs2/chime4/enh_asr1/pyscripts | 1 +
 egs2/chime4/enh_asr1/run.sh | 45 +
 egs2/chime4/enh_asr1/scripts | 1 +
 egs2/chime4/enh_asr1/steps | 1 +
 egs2/chime4/enh_asr1/utils | 1 +
 egs2/chime6/asr1/README.md | 30 +
 egs2/chime6/asr1/asr.sh | 1 +
 egs2/chime6/asr1/cmd.sh | 110 ++
 .../asr1/conf/decode_asr_transformer.yaml | 7 +
 egs2/chime6/asr1/conf/fbank.conf | 2 +
 egs2/chime6/asr1/conf/pbs.conf | 11 +
 egs2/chime6/asr1/conf/pitch.conf | 1 +
 egs2/chime6/asr1/conf/queue.conf | 12 +
 egs2/chime6/asr1/conf/slurm.conf | 14 +
 egs2/chime6/asr1/conf/train_lm.yaml | 16 +
 ...-3_specaug_accum1_preenc128_warmup20k.yaml | 87 ++
 .../{clarity21/enh_2021 => chime6/asr1}/db.sh | 0
 egs2/chime6/asr1/local/check_tools.sh | 1 +
 egs2/chime6/asr1/local/data.sh | 53 +
 egs2/chime6/asr1/local/distant_audio_list | 1 +
 egs2/chime6/asr1/local/extract_noises.py | 1 +
 .../chime6/asr1/local/generate_chime6_data.sh | 1 +
 egs2/chime6/asr1/local/install_pb_chime5.sh | 1 +
 egs2/chime6/asr1/local/json2text.py | 1 +
 egs2/chime6/asr1/local/make_noise_list.py | 1 +
 egs2/chime6/asr1/local/path.sh | 0
 .../local/prepare_baseline_chime6_data.sh | 1 +
 egs2/chime6/asr1/local/prepare_data.sh | 1 +
 egs2/chime6/asr1/local/prepare_dict.sh | 1 +
 egs2/chime6/asr1/local/run_gss.sh | 1 +
 egs2/chime6/asr1/local/train_lms_srilm.sh | 1 +
 egs2/chime6/asr1/local/wer_output_filter | 1 +
 egs2/chime6/asr1/path.sh | 1 +
 .../enh_2021 => chime6/asr1}/pyscripts | 0
 egs2/chime6/asr1/run.sh | 44 +
 .../enh_2021 => chime6/asr1}/scripts | 0
 egs2/chime6/asr1/steps | 1 +
 egs2/chime6/asr1/utils | 1 +
 egs2/clarity21/{enh_2021 => enh1}/README.md | 0
 egs2/clarity21/enh1/cmd.sh | 110 ++
 egs2/clarity21/enh1/conf/pbs.conf | 11 +
 egs2/clarity21/enh1/conf/queue.conf | 12 +
 egs2/clarity21/enh1/conf/slurm.conf | 14 +
 .../tuning/train_enh_beamformer_mvdr.yaml | 0
 egs2/clarity21/enh1/db.sh | 1 +
 egs2/clarity21/{enh_2021 => enh1}/enh.sh | 0
 .../{enh_2021 => enh1}/local/data.sh | 0
 egs2/clarity21/enh1/local/path.sh | 0
 .../{enh_2021 => enh1}/local/prep_data.py | 1 -
 egs2/clarity21/{enh_2021 => enh1}/path.sh | 0
 egs2/clarity21/enh1/pyscripts | 1 +
 egs2/clarity21/{enh_2021 => enh1}/run.sh | 0
 egs2/clarity21/enh1/scripts | 1 +
 egs2/clarity21/{enh_2021 => enh1}/steps | 0
 egs2/clarity21/{enh_2021 => enh1}/utils | 0
 .../enh1/local/prepare_dev_data.py | 2 +-
 egs2/covost2/st1/conf/fbank.conf | 2 +-
 egs2/covost2/st1/conf/pitch.conf | 2 +-
 egs2/covost2/st1/run.sh | 8 +-
 .../dirha_wsj/asr1/local/prepare_dirha_wsj.py | 4 +-
 egs2/dsing/asr1/local/data_prep.py | 8 +-
 .../st1/conf/decode_streaming_st.yaml | 5 +
 .../st1/conf/train_st_streaming.yaml | 95 ++
 egs2/fisher_callhome_spanish/st1/run.sh | 1 +
 .../conf/train_asr_streaming_transformer.yaml | 58 ++
 egs2/fsc/asr1/local/data_prep.py | 1 +
 egs2/fsc/asr1/run.sh | 2 +-
 egs2/fsc_challenge/asr1/local/data_prep.py | 3 +-
 egs2/fsc_challenge/asr1/run.sh | 2 +-
 egs2/fsc_unseen/asr1/local/data_prep.py | 3 +-
 egs2/fsc_unseen/asr1/run.sh | 2 +-
 egs2/grabo/asr1/local/data_prep.py | 5 +-
 egs2/grabo/asr1/local/score.py | 2 +-
 egs2/indic_speech/tts1/local/data_prep.py | 1 -
 egs2/iwslt14/mt1/run.sh | 4 +-
 .../asr1/local/prepare_alffa_data.py | 2 +-
 .../asr1/local/prepare_iwslt_data.py | 1 +
 egs2/iwslt22_dialect/asr1/local/preprocess.py | 6 +-
 egs2/iwslt22_dialect/st1/local/preprocess.py | 6 +-
 egs2/jdcinal/asr1/local/score.py | 1 +
 egs2/jkac/tts1/local/prep_segments.py | 3 +-
 egs2/jmd/tts1/local/clean_text.py | 1 -
 egs2/jtubespeech/tts1/local/prune.py | 7 +-
 egs2/jtubespeech/tts1/local/split.py | 7 +-
 egs2/jv_openslr35/asr1/local/data_prep.py | 1 -
 .../asr1/local/get_space_normalized_hyps.py | 2 +-
 .../asr1/local/get_transcriptions.py | 3 +-
 egs2/kss/tts1/conf/tuning/train_jets.yaml | 218 ++++
 .../diar1/local/prepare_diarization.py | 2 +-
 .../ljspeech/tts1/conf/tuning/train_jets.yaml | 218 ++++
 .../local/feature_extract/cvtransforms.py | 1 +
 .../feature_extract/extract_visual_feature.py | 4 +-
 .../feature_extract/models/pretrained.py | 3 +-
 .../local/feature_extract/video_processing.py | 6 +-
 egs2/lrs3/asr1/local/data_prep.py | 7 +-
 egs2/mediaspeech/asr1/local/data_prep.py | 9 +-
 egs2/microsoft_speech/asr1/local/process.py | 6 +-
 egs2/mini_an4/st1/cmd.sh | 110 ++
 egs2/mini_an4/st1/conf/fbank.conf | 2 +
 egs2/mini_an4/st1/conf/pbs.conf | 11 +
 egs2/mini_an4/st1/conf/pitch.conf | 1 +
 egs2/mini_an4/st1/conf/queue.conf | 12 +
 egs2/mini_an4/st1/conf/slurm.conf | 14 +
 egs2/mini_an4/st1/conf/train_st.yaml | 6 +
 .../mini_an4/st1/conf/train_st_streaming.yaml | 9 +
 egs2/mini_an4/st1/db.sh | 1 +
 egs2/mini_an4/st1/downloads.tar.gz | 1 +
 egs2/mini_an4/st1/local/data.sh | 85 ++
 egs2/mini_an4/st1/local/data_prep.py | 1 +
 egs2/mini_an4/st1/local/download_and_untar.sh | 1 +
 egs2/mini_an4/st1/local/path.sh | 0
 egs2/mini_an4/st1/path.sh | 1 +
 egs2/mini_an4/st1/pyscripts | 1 +
 egs2/mini_an4/st1/run.sh | 29 +
 egs2/mini_an4/st1/scripts | 1 +
 egs2/mini_an4/st1/st.sh | 1 +
 egs2/mini_an4/st1/steps | 1 +
 egs2/mini_an4/st1/utils | 1 +
 .../diar1/local/simulation/make_mixture.py | 5 +-
 .../simulation/make_mixture_nooverlap.py | 5 +-
 .../diar1/local/simulation/random_mixture.py | 7 +-
 .../simulation/random_mixture_nooverlap.py | 7 +-
 egs2/misp2021/asr1/local/find_wav.py | 6 +-
 egs2/misp2021/asr1/local/prepare_far_data.py | 8 +-
 egs2/misp2021/asr1/local/run_beamformit.py | 2 +-
 egs2/misp2021/asr1/local/run_wpe.py | 9 +-
 .../avsr1/local/concatenate_feature.py | 5 +-
 egs2/misp2021/avsr1/local/find_wav.py | 6 +-
 egs2/misp2021/avsr1/local/prepare_far_data.py | 8 +-
 .../avsr1/local/prepare_far_video_roi.py | 11 +-
 .../prepare_visual_embedding_extractor.py | 5 +-
 egs2/misp2021/avsr1/local/run_beamformit.py | 2 +-
 egs2/misp2021/avsr1/local/run_wpe.py | 9 +-
 egs2/ml_openslr63/asr1/local/data_prep.py | 1 -
 egs2/mr_openslr64/asr1/local/data_prep.py | 1 -
 egs2/ms_indic_18/asr1/local/prepare_data.py | 2 +-
 egs2/open_li52/asr1/local/filter_text.py | 3 +-
 .../asr1/local/data_prep.py | 9 +-
 egs2/seame/asr1/local/preprocess.py | 8 +-
 egs2/seame/asr1/local/split_lang_trn.py | 5 +-
 egs2/sinhala/asr1/local/data_prep.py | 8 +-
 .../asr1/local/data_prep_slue.py | 1 +
 egs2/slue-voxceleb/asr1/local/f1_score.py | 6 +-
 .../asr1/local/generate_asr_files.py | 3 +-
 .../local/data_prep_original_slue_format.py | 3 +-
 ...ta_prep_original_slue_format_transcript.py | 3 +-
 egs2/slue-voxpopuli/asr1/local/eval_utils.py | 5 +-
 egs2/slue-voxpopuli/asr1/local/score.py | 4 +-
 .../conf/train_asr_streaming_transformer.yaml | 69 ++
 egs2/slurp/asr1/local/prepare_slurp_data.py | 4 +-
 .../asr1/local/convert_to_entity_file.py | 4 +-
 .../asr1/local/evaluation/evaluate.py | 5 +-
 .../asr1/local/evaluation/metrics/__init__.py | 3 +-
 .../asr1/local/evaluation/metrics/distance.py | 3 +-
 .../asr1/local/evaluation/util.py | 5 +-
 .../asr1/local/prepare_slurp_data.py | 4 +-
 .../asr1/local/prepare_slurp_entity_data.py | 4 +-
 egs2/snips/asr1/local/data_prep.py | 2 +-
 .../speechcommands/asr1/local/data_prep_12.py | 8 +-
 .../speechcommands/asr1/local/data_prep_35.py | 4 +-
 egs2/speechcommands/asr1/local/score.py | 2 +-
 .../asr1/local/sunda_data_prep.py | 1 -
 .../asr1/local/prepare_sentiment.py | 4 +-
 egs2/swbd_sentiment/asr1/local/score_f1.py | 3 +-
 egs2/totonac/asr1/local/data_prep.py | 5 +-
 egs2/wenetspeech/asr1/local/extract_meta.py | 4 +-
 egs2/wenetspeech/asr1/local/process_opus.py | 5 +-
 egs2/wsj0_2mix/enh1/README.md | 59 ++
 .../enh1/conf/tuning/train_enh_dan_tf.yaml | 65 ++
 .../enh1/conf/tuning/train_enh_dpcl.yaml | 62 ++
 .../enh1/conf/tuning/train_enh_dpcl_e2e.yaml | 66 ++
 .../enh1/conf/tuning/train_enh_mdc.yaml | 62 ++
 .../asr1/local/filter_text.py | 3 +-
 egs2/zh_openslr38/asr1/local/data_split.py | 4 +-
 espnet/asr/asr_utils.py | 1 -
 espnet/asr/chainer_backend/asr.py | 38 +-
 espnet/asr/pytorch_backend/asr.py | 52 +-
 espnet/asr/pytorch_backend/asr_init.py | 6 +-
 espnet/asr/pytorch_backend/asr_mix.py | 44 +-
 espnet/asr/pytorch_backend/recog.py | 15 +-
 espnet/bin/asr_align.py | 21 +-
 espnet/bin/asr_enhance.py | 4 +-
 espnet/bin/asr_recog.py | 2 +-
 espnet/bin/mt_trans.py | 2 +-
 espnet/bin/tts_decode.py | 3 +-
 espnet/bin/vc_decode.py | 3 +-
 espnet/lm/chainer_backend/extlm.py | 1 +
 espnet/lm/chainer_backend/lm.py | 30 +-
 espnet/lm/lm_utils.py | 10 +-
 espnet/lm/pytorch_backend/lm.py | 42 +-
 espnet/mt/pytorch_backend/mt.py | 39 +-
 espnet/nets/batch_beam_search.py | 9 +-
 espnet/nets/batch_beam_search_online.py | 22 +-
 espnet/nets/batch_beam_search_online_sim.py | 3 +-
 espnet/nets/beam_search.py | 12 +-
 espnet/nets/beam_search_transducer.py | 22 +-
 espnet/nets/chainer_backend/ctc.py | 2 +-
 .../chainer_backend/deterministic_embed_id.py | 10 +-
 espnet/nets/chainer_backend/e2e_asr.py | 2 +-
 .../chainer_backend/e2e_asr_transformer.py | 24 +-
 espnet/nets/chainer_backend/rnn/attentions.py | 1 -
 espnet/nets/chainer_backend/rnn/decoders.py | 6 +-
 espnet/nets/chainer_backend/rnn/encoders.py | 3 +-
 espnet/nets/chainer_backend/rnn/training.py | 17 +-
 .../chainer_backend/transformer/attention.py | 2 -
 .../chainer_backend/transformer/decoder.py | 4 +-
 .../transformer/decoder_layer.py | 3 +-
 .../chainer_backend/transformer/embedding.py | 1 -
 .../chainer_backend/transformer/encoder.py | 13 +-
 .../transformer/encoder_layer.py | 3 +-
 .../transformer/label_smoothing_loss.py | 1 -
 .../transformer/positionwise_feed_forward.py | 2 -
 .../transformer/subsampling.py | 7 +-
 .../chainer_backend/transformer/training.py | 12 +-
 espnet/nets/ctc_prefix_score.py | 3 +-
 espnet/nets/e2e_asr_common.py | 2 +-
 .../pytorch_backend/conformer/argument.py | 2 +-
 .../contextual_block_encoder_layer.py | 3 +-
 .../nets/pytorch_backend/conformer/encoder.py | 23 +-
 .../conformer/encoder_layer.py | 1 -
 espnet/nets/pytorch_backend/ctc.py | 8 +-
 espnet/nets/pytorch_backend/e2e_asr.py | 32 +-
 .../nets/pytorch_backend/e2e_asr_conformer.py | 8 +-
 .../nets/pytorch_backend/e2e_asr_maskctc.py | 20 +-
 espnet/nets/pytorch_backend/e2e_asr_mix.py | 30 +-
 .../e2e_asr_mix_transformer.py | 8 +-
 espnet/nets/pytorch_backend/e2e_asr_mulenc.py | 17 +-
 .../pytorch_backend/e2e_asr_transducer.py | 42 +-
 .../pytorch_backend/e2e_asr_transformer.py | 42 +-
 espnet/nets/pytorch_backend/e2e_mt.py | 14 +-
 .../pytorch_backend/e2e_mt_transformer.py | 23 +-
 espnet/nets/pytorch_backend/e2e_st.py | 31 +-
 .../nets/pytorch_backend/e2e_st_conformer.py | 8 +-
 .../pytorch_backend/e2e_st_transformer.py | 27 +-
 .../pytorch_backend/e2e_tts_fastspeech.py | 28 +-
 .../nets/pytorch_backend/e2e_tts_tacotron2.py | 7 +-
 .../pytorch_backend/e2e_tts_transformer.py | 25 +-
 .../nets/pytorch_backend/e2e_vc_tacotron2.py | 16 +-
 .../pytorch_backend/e2e_vc_transformer.py | 28 +-
 .../frontends/dnn_beamformer.py | 10 +-
 .../nets/pytorch_backend/frontends/dnn_wpe.py | 2 +-
 .../frontends/feature_transform.py | 4 +-
 .../pytorch_backend/frontends/frontend.py | 5 +-
 .../frontends/mask_estimator.py | 3 +-
 espnet/nets/pytorch_backend/lm/default.py | 6 +-
 espnet/nets/pytorch_backend/lm/transformer.py | 6 +-
 espnet/nets/pytorch_backend/nets_utils.py | 3 +-
 espnet/nets/pytorch_backend/rnn/attentions.py | 5 +-
 espnet/nets/pytorch_backend/rnn/decoders.py | 20 +-
 espnet/nets/pytorch_backend/rnn/encoders.py | 14 +-
 espnet/nets/pytorch_backend/tacotron2/cbhg.py | 4 +-
 .../nets/pytorch_backend/tacotron2/decoder.py | 1 -
 .../nets/pytorch_backend/tacotron2/encoder.py | 7 +-
 .../pytorch_backend/transducer/arguments.py | 2 +-
 .../nets/pytorch_backend/transducer/blocks.py | 30 +-
 .../pytorch_backend/transducer/conv1d_nets.py | 4 +-
 .../transducer/custom_decoder.py | 23 +-
 .../transducer/custom_encoder.py | 5 +-
 .../transducer/error_calculator.py | 4 +-
 .../pytorch_backend/transducer/initializer.py | 2 +-
 .../pytorch_backend/transducer/rnn_decoder.py | 15 +-
 .../pytorch_backend/transducer/rnn_encoder.py | 11 +-
 .../transducer/transducer_tasks.py | 11 +-
 .../transducer/transformer_decoder_layer.py | 2 +-
 .../nets/pytorch_backend/transducer/utils.py | 9 +-
 .../nets/pytorch_backend/transducer/vgg2l.py | 3 +-
 .../contextual_block_encoder_layer.py | 1 -
 .../pytorch_backend/transformer/decoder.py | 7 +-
 .../transformer/dynamic_conv.py | 3 +-
 .../transformer/dynamic_conv2d.py | 3 +-
 .../pytorch_backend/transformer/embedding.py | 1 +
 .../pytorch_backend/transformer/encoder.py | 17 +-
 .../transformer/encoder_layer.py | 1 -
 .../pytorch_backend/transformer/lightconv.py | 3 +-
 .../transformer/lightconv2d.py | 3 +-
 .../transformer/longformer_attention.py | 3 +-
 .../nets/pytorch_backend/transformer/plot.py | 2 +-
 .../transformer/subsampling.py | 4 +-
 .../transformer/subsampling_without_posenc.py | 1 +
 espnet/nets/pytorch_backend/wavenet.py | 1 -
 espnet/nets/scorer_interface.py | 6 +-
 espnet/nets/scorers/ctc.py | 3 +-
 espnet/nets/scorers/length_bonus.py | 4 +-
 espnet/nets/scorers/ngram.py | 3 +-
 espnet/nets/transducer_decoder_interface.py | 7 +-
 espnet/nets/tts_interface.py | 1 -
 espnet/optimizer/chainer.py | 4 +-
 espnet/optimizer/pytorch.py | 4 +-
 espnet/st/pytorch_backend/st.py | 40 +-
 espnet/transform/transformation.py | 7 +-
 espnet/tts/pytorch_backend/tts.py | 24 +-
 espnet/utils/cli_utils.py | 2 +-
 espnet/utils/io_utils.py | 2 +-
 espnet/utils/training/iterators.py | 7 +-
 espnet/utils/training/train_utils.py | 3 +-
 espnet/vc/pytorch_backend/vc.py | 24 +-
 espnet2/asr/decoder/abs_decoder.py | 3 +-
 espnet2/asr/decoder/mlm_decoder.py | 4 +-
 espnet2/asr/decoder/rnn_decoder.py | 5 +-
 espnet2/asr/decoder/transformer_decoder.py | 9 +-
 espnet2/asr/encoder/abs_encoder.py | 6 +-
 espnet2/asr/encoder/conformer_encoder.py | 51 +-
 .../contextual_block_conformer_encoder.py | 34 +-
 .../contextual_block_transformer_encoder.py | 25 +-
 espnet2/asr/encoder/hubert_encoder.py | 17 +-
 espnet2/asr/encoder/longformer_encoder.py | 43 +-
 espnet2/asr/encoder/rnn_encoder.py | 9 +-
 espnet2/asr/encoder/transformer_encoder.py | 30 +-
 espnet2/asr/encoder/vgg_rnn_encoder.py | 6 +-
 espnet2/asr/encoder/wav2vec2_encoder.py | 7 +-
 espnet2/asr/espnet_model.py | 24 +-
 espnet2/asr/frontend/abs_frontend.py | 3 +-
 espnet2/asr/frontend/default.py | 6 +-
 espnet2/asr/frontend/fused.py | 10 +-
 espnet2/asr/frontend/s3prl.py | 10 +-
 espnet2/asr/frontend/windowing.py | 6 +-
 espnet2/asr/maskctc_model.py | 32 +-
 espnet2/asr/postencoder/abs_postencoder.py | 3 +-
 .../hugging_face_transformers_postencoder.py | 11 +-
 espnet2/asr/preencoder/abs_preencoder.py | 3 +-
 espnet2/asr/preencoder/linear.py | 5 +-
 espnet2/asr/preencoder/sinc.py | 11 +-
 espnet2/asr/specaug/abs_specaug.py | 3 +-
 espnet2/asr/specaug/specaug.py | 7 +-
 .../asr/transducer/beam_search_transducer.py | 18 +-
 espnet2/asr/transducer/error_calculator.py | 3 +-
 espnet2/asr/transducer/transducer_decoder.py | 10 +-
 espnet2/bin/aggregate_stats_dirs.py | 5 +-
 espnet2/bin/asr_align.py | 32 +-
 espnet2/bin/asr_inference.py | 107 +-
 espnet2/bin/asr_inference_k2.py | 19 +-
 espnet2/bin/asr_inference_maskctc.py | 22 +-
 espnet2/bin/asr_inference_streaming.py | 143 ++-
 espnet2/bin/diar_inference.py | 23 +-
 espnet2/bin/enh_inference.py | 20 +-
 espnet2/bin/enh_scoring.py | 10 +-
 espnet2/bin/launch.py | 5 +-
 espnet2/bin/lm_calc_perplexity.py | 14 +-
 espnet2/bin/mt_inference.py | 29 +-
 espnet2/bin/split_scps.py | 7 +-
 espnet2/bin/st_inference.py | 29 +-
 espnet2/bin/st_inference_streaming.py | 611 ++++++++++
 espnet2/bin/tokenize_text.py | 12 +-
 espnet2/bin/tts_inference.py | 19 +-
 espnet2/diar/abs_diar.py | 3 +-
 espnet2/diar/attractor/abs_attractor.py | 3 +-
 espnet2/diar/decoder/abs_decoder.py | 3 +-
 espnet2/diar/espnet_model.py | 10 +-
 espnet2/enh/abs_enh.py | 3 +-
 espnet2/enh/decoder/abs_decoder.py | 3 +-
 espnet2/enh/decoder/stft_decoder.py | 4 +-
 espnet2/enh/encoder/abs_encoder.py | 3 +-
 espnet2/enh/encoder/stft_encoder.py | 4 +-
 espnet2/enh/espnet_enh_s2t_model.py | 13 +-
 espnet2/enh/espnet_model.py | 27 +-
 espnet2/enh/layers/beamformer.py | 33 +-
 espnet2/enh/layers/complex_utils.py | 11 +-
 espnet2/enh/layers/dc_crn.py | 3 +-
 espnet2/enh/layers/dnn_beamformer.py | 46 +-
 espnet2/enh/layers/dnn_wpe.py | 8 +-
 espnet2/enh/layers/dprnn.py | 3 +-
 espnet2/enh/layers/ifasnet.py | 3 +-
 espnet2/enh/layers/mask_estimator.py | 13 +-
 espnet2/enh/layers/skim.py | 4 +-
 espnet2/enh/layers/wpe.py | 12 +-
 espnet2/enh/loss/criterions/abs_loss.py | 4 +-
 espnet2/enh/loss/criterions/tf_domain.py | 112 ++-
 espnet2/enh/loss/criterions/time_domain.py | 3 +-
 espnet2/enh/loss/wrappers/abs_wrapper.py | 7 +-
 espnet2/enh/loss/wrappers/dpcl_solver.py | 32 +
 espnet2/enh/loss/wrappers/pit_solver.py | 11 +-
 espnet2/enh/separator/abs_separator.py | 6 +-
 espnet2/enh/separator/asteroid_models.py | 12 +-
 espnet2/enh/separator/conformer_separator.py | 22 +-
 espnet2/enh/separator/dan_separator.py | 165 +++
 espnet2/enh/separator/dc_crn_separator.py | 17 +-
 espnet2/enh/separator/dccrn_separator.py | 27 +-
 espnet2/enh/separator/dpcl_e2e_separator.py | 182 ++++
 espnet2/enh/separator/dpcl_separator.py | 138 +++
 espnet2/enh/separator/dprnn_separator.py | 20 +-
 espnet2/enh/separator/fasnet_separator.py | 15 +-
 espnet2/enh/separator/neural_beamformer.py | 11 +-
 espnet2/enh/separator/rnn_separator.py | 18 +-
 espnet2/enh/separator/skim_separator.py | 11 +-
 espnet2/enh/separator/svoice_separator.py | 15 +-
 espnet2/enh/separator/tcn_separator.py | 16 +-
 .../enh/separator/transformer_separator.py | 29 +-
 espnet2/fileio/datadir_writer.py | 5 +-
 espnet2/fileio/read_text.py | 4 +-
espnet2/fileio/rttm.py | 7 +- espnet2/fst/lm_rescore.py | 5 +- espnet2/gan_tts/abs_gan_tts.py | 7 +- espnet2/gan_tts/espnet_model.py | 9 +- espnet2/gan_tts/hifigan/__init__.py | 23 +- espnet2/gan_tts/hifigan/hifigan.py | 6 +- espnet2/gan_tts/hifigan/loss.py | 5 +- espnet2/gan_tts/hifigan/residual_block.py | 4 +- espnet2/gan_tts/jets/__init__.py | 1 + espnet2/gan_tts/jets/alignments.py | 165 +++ espnet2/gan_tts/jets/generator.py | 788 +++++++++++++++ espnet2/gan_tts/jets/jets.py | 651 ++++++++++++ espnet2/gan_tts/jets/length_regulator.py | 63 ++ espnet2/gan_tts/jets/loss.py | 212 ++++ espnet2/gan_tts/joint/joint_text2wav.py | 43 +- espnet2/gan_tts/melgan/melgan.py | 5 +- espnet2/gan_tts/melgan/pqmf.py | 1 - espnet2/gan_tts/melgan/residual_stack.py | 3 +- espnet2/gan_tts/parallel_wavegan/__init__.py | 8 +- .../parallel_wavegan/parallel_wavegan.py | 9 +- espnet2/gan_tts/parallel_wavegan/upsample.py | 5 +- espnet2/gan_tts/style_melgan/style_melgan.py | 6 +- espnet2/gan_tts/vits/duration_predictor.py | 13 +- espnet2/gan_tts/vits/flow.py | 5 +- espnet2/gan_tts/vits/generator.py | 7 +- .../gan_tts/vits/monotonic_align/__init__.py | 4 +- espnet2/gan_tts/vits/monotonic_align/setup.py | 7 +- espnet2/gan_tts/vits/posterior_encoder.py | 7 +- espnet2/gan_tts/vits/residual_coupling.py | 4 +- espnet2/gan_tts/vits/text_encoder.py | 1 - espnet2/gan_tts/vits/transform.py | 4 +- espnet2/gan_tts/vits/vits.py | 67 +- espnet2/gan_tts/wavenet/residual_block.py | 4 +- espnet2/gan_tts/wavenet/wavenet.py | 4 +- espnet2/hubert/espnet_model.py | 13 +- espnet2/hubert/hubert_loss.py | 2 +- espnet2/iterators/abs_iter_factory.py | 3 +- espnet2/iterators/chunk_iter_factory.py | 8 +- espnet2/iterators/multiple_iter_factory.py | 4 +- espnet2/iterators/sequence_iter_factory.py | 4 +- espnet2/layers/abs_normalize.py | 3 +- espnet2/layers/global_mvn.py | 5 +- espnet2/layers/inversible_interface.py | 3 +- espnet2/layers/label_aggregation.py | 4 +- espnet2/layers/log_mel.py | 3 +- espnet2/layers/mask_along_axis.py | 4 +- espnet2/layers/sinc_conv.py | 3 +- espnet2/layers/stft.py | 18 +- espnet2/layers/utterance_mvn.py | 2 +- espnet2/lm/abs_model.py | 3 +- espnet2/lm/espnet_model.py | 6 +- espnet2/lm/seq_rnn_lm.py | 3 +- espnet2/lm/transformer_lm.py | 6 +- espnet2/main_funcs/average_nbest_models.py | 7 +- .../main_funcs/calculate_all_attentions.py | 37 +- espnet2/main_funcs/collect_stats.py | 8 +- espnet2/main_funcs/pack_funcs.py | 12 +- espnet2/mt/espnet_model.py | 24 +- espnet2/mt/frontend/embedding.py | 8 +- espnet2/samplers/abs_sampler.py | 6 +- espnet2/samplers/build_batch_sampler.py | 9 +- espnet2/samplers/folded_batch_sampler.py | 9 +- espnet2/samplers/length_batch_sampler.py | 5 +- .../samplers/num_elements_batch_sampler.py | 5 +- espnet2/samplers/sorted_batch_sampler.py | 3 +- espnet2/samplers/unsorted_batch_sampler.py | 3 +- espnet2/schedulers/abs_scheduler.py | 3 +- espnet2/schedulers/noam_lr.py | 2 +- espnet2/st/espnet_model.py | 32 +- espnet2/tasks/abs_task.py | 70 +- espnet2/tasks/asr.py | 50 +- espnet2/tasks/diar.py | 14 +- espnet2/tasks/enh.py | 49 +- espnet2/tasks/enh_s2t.py | 28 +- espnet2/tasks/gan_tts.py | 44 +- espnet2/tasks/hubert.py | 19 +- espnet2/tasks/lm.py | 14 +- espnet2/tasks/mt.py | 36 +- espnet2/tasks/st.py | 45 +- espnet2/tasks/tts.py | 40 +- espnet2/text/abs_tokenizer.py | 6 +- espnet2/text/build_tokenizer.py | 3 +- espnet2/text/char_tokenizer.py | 6 +- espnet2/text/cleaner.py | 2 +- espnet2/text/phoneme_tokenizer.py | 32 +- espnet2/text/sentencepiece_tokenizer.py | 4 +- 
espnet2/text/token_id_converter.py | 5 +- espnet2/text/word_tokenizer.py | 6 +- espnet2/torch_utils/initialize.py | 1 + espnet2/torch_utils/load_pretrained_model.py | 6 +- espnet2/train/abs_espnet_model.py | 6 +- espnet2/train/abs_gan_espnet_model.py | 6 +- espnet2/train/class_choices.py | 7 +- espnet2/train/collate_fn.py | 9 +- espnet2/train/dataset.py | 23 +- espnet2/train/gan_trainer.py | 22 +- espnet2/train/iterable_dataset.py | 7 +- espnet2/train/preprocessor.py | 12 +- espnet2/train/reporter.py | 22 +- espnet2/train/trainer.py | 36 +- espnet2/tts/abs_tts.py | 6 +- espnet2/tts/espnet_model.py | 9 +- espnet2/tts/fastspeech/fastspeech.py | 33 +- espnet2/tts/fastspeech2/fastspeech2.py | 33 +- espnet2/tts/fastspeech2/loss.py | 5 +- espnet2/tts/fastspeech2/variance_predictor.py | 1 - .../tts/feats_extract/abs_feats_extract.py | 7 +- espnet2/tts/feats_extract/dio.py | 9 +- espnet2/tts/feats_extract/energy.py | 8 +- .../tts/feats_extract/linear_spectrogram.py | 5 +- espnet2/tts/feats_extract/log_mel_fbank.py | 6 +- espnet2/tts/feats_extract/log_spectrogram.py | 5 +- espnet2/tts/gst/style_encoder.py | 4 +- espnet2/tts/tacotron2/tacotron2.py | 23 +- espnet2/tts/transformer/transformer.py | 29 +- espnet2/tts/utils/__init__.py | 8 +- .../parallel_wavegan_pretrained_vocoder.py | 7 +- espnet2/utils/griffin_lim.py | 7 +- espnet2/utils/types.py | 4 +- setup.cfg | 10 +- setup.py | 8 +- test/espnet2/asr/decoder/test_rnn_decoder.py | 2 +- .../asr/decoder/test_transformer_decoder.py | 20 +- ...st_contextual_block_transformer_encoder.py | 4 +- .../asr/encoder/test_longformer_encoder.py | 3 +- test/espnet2/asr/frontend/test_fused.py | 2 +- test/espnet2/asr/frontend/test_s3prl.py | 5 +- ...t_hugging_face_transformers_postencoder.py | 2 +- test/espnet2/asr/preencoder/test_linear.py | 3 +- test/espnet2/asr/preencoder/test_sinc.py | 4 +- test/espnet2/asr/test_maskctc_model.py | 3 +- test/espnet2/bin/test_aggregate_stats_dirs.py | 3 +- test/espnet2/bin/test_asr_align.py | 7 +- test/espnet2/bin/test_asr_inference.py | 110 +- test/espnet2/bin/test_asr_inference_k2.py | 3 +- .../espnet2/bin/test_asr_inference_maskctc.py | 8 +- test/espnet2/bin/test_asr_train.py | 3 +- test/espnet2/bin/test_diar_inference.py | 4 +- test/espnet2/bin/test_diar_train.py | 3 +- test/espnet2/bin/test_enh_inference.py | 6 +- test/espnet2/bin/test_enh_s2t_train.py | 3 +- test/espnet2/bin/test_enh_scoring.py | 3 +- test/espnet2/bin/test_enh_train.py | 3 +- test/espnet2/bin/test_hubert_train.py | 3 +- test/espnet2/bin/test_lm_calc_perplexity.py | 3 +- test/espnet2/bin/test_lm_train.py | 3 +- test/espnet2/bin/test_pack.py | 3 +- test/espnet2/bin/test_st_inference.py | 73 ++ test/espnet2/bin/test_st_train.py | 14 + test/espnet2/bin/test_tokenize_text.py | 3 +- test/espnet2/bin/test_tts_inference.py | 6 +- test/espnet2/bin/test_tts_train.py | 3 +- test/espnet2/enh/decoder/test_stft_decoder.py | 1 - test/espnet2/enh/layers/test_complex_utils.py | 24 +- test/espnet2/enh/layers/test_conv_utils.py | 3 +- test/espnet2/enh/layers/test_enh_layers.py | 17 +- .../enh/loss/criterions/test_tf_domain.py | 36 +- .../enh/loss/criterions/test_time_domain.py | 14 +- .../enh/loss/wrappers/test_dpcl_solver.py | 17 + .../wrappers/test_multilayer_pit_solver.py | 1 - .../enh/loss/wrappers/test_pit_solver.py | 6 +- test/espnet2/enh/separator/test_beamformer.py | 5 +- .../enh/separator/test_conformer_separator.py | 1 - .../enh/separator/test_dan_separator.py | 129 +++ .../enh/separator/test_dc_crn_separator.py | 6 +- .../enh/separator/test_dccrn_separator.py | 5 +- 
.../enh/separator/test_dpcl_e2e_separator.py | 145 +++ .../enh/separator/test_dpcl_separator.py | 112 +++ .../enh/separator/test_dprnn_separator.py | 1 - .../enh/separator/test_fasnet_separator.py | 1 - .../enh/separator/test_rnn_separator.py | 1 - .../enh/separator/test_skim_separator.py | 1 - .../enh/separator/test_svoice_separator.py | 1 - .../enh/separator/test_tcn_separator.py | 1 - .../separator/test_transformer_separator.py | 1 - test/espnet2/enh/test_espnet_enh_s2t_model.py | 1 - test/espnet2/enh/test_espnet_model.py | 9 +- test/espnet2/fileio/test_npy_scp.py | 6 +- test/espnet2/fileio/test_read_text.py | 3 +- test/espnet2/gan_tts/hifigan/test_hifigan.py | 16 +- test/espnet2/gan_tts/jets/test_jets.py | 944 ++++++++++++++++++ .../gan_tts/joint/test_joint_text2wav.py | 5 +- test/espnet2/gan_tts/melgan/test_melgan.py | 11 +- .../parallel_wavegan/test_parallel_wavegan.py | 14 +- .../gan_tts/style_melgan/test_style_melgan.py | 9 +- test/espnet2/gan_tts/vits/test_generator.py | 10 - test/espnet2/gan_tts/vits/test_vits.py | 18 - test/espnet2/hubert/test_hubert_loss.py | 6 +- .../iterators/test_chunk_iter_factory.py | 4 +- test/espnet2/layers/test_sinc_filters.py | 5 +- test/espnet2/lm/test_seq_rnn_lm.py | 2 +- test/espnet2/lm/test_transformer_lm.py | 2 +- .../test_calculate_all_attentions.py | 4 +- test/espnet2/main_funcs/test_pack_funcs.py | 10 +- test/espnet2/tasks/test_abs_task.py | 4 +- test/espnet2/text/test_phoneme_tokenizer.py | 2 +- .../text/test_sentencepiece_tokenizer.py | 2 +- test/espnet2/torch_utils/test_device_funcs.py | 3 +- test/espnet2/train/test_collate_fn.py | 3 +- test/espnet2/train/test_distributed_utils.py | 10 +- test/espnet2/train/test_reporter.py | 9 +- .../tts/feats_extract/test_log_mel_fbank.py | 2 +- .../tts/feats_extract/test_log_spectrogram.py | 2 +- test/espnet2/utils/test_build_dataclass.py | 2 +- test/espnet2/utils/test_sized_dict.py | 3 +- test/espnet2/utils/test_types.py | 20 +- test/test_asr_init.py | 4 +- test/test_batch_beam_search.py | 9 +- test/test_custom_transducer.py | 11 +- test/test_e2e_asr.py | 5 +- test/test_e2e_asr_conformer.py | 1 + test/test_e2e_asr_maskctc.py | 1 + test/test_e2e_asr_mulenc.py | 2 +- test/test_e2e_asr_transducer.py | 12 +- test/test_e2e_asr_transformer.py | 6 +- test/test_e2e_compatibility.py | 6 +- test/test_e2e_mt.py | 2 +- test/test_e2e_mt_transformer.py | 1 + test/test_e2e_st.py | 2 +- test/test_e2e_st_conformer.py | 1 + test/test_e2e_st_transformer.py | 1 + test/test_e2e_tts_fastspeech.py | 5 +- test/test_e2e_tts_tacotron2.py | 7 +- test/test_e2e_tts_transformer.py | 7 +- test/test_e2e_vc_tacotron2.py | 7 +- test/test_e2e_vc_transformer.py | 7 +- test/test_lm.py | 7 +- test/test_multi_spkrs.py | 4 +- test/test_ngram.py | 4 +- test/test_positional_encoding.py | 9 +- test/test_recog.py | 2 +- test/test_scheduler.py | 8 +- test/test_sentencepiece.py | 1 - test/test_transformer_decode.py | 1 - test/test_utils.py | 6 +- tools/Makefile | 19 +- tools/check_install.py | 10 +- tools/installers/install_chainer.sh | 7 +- tools/installers/install_fairscale.sh | 9 +- tools/installers/install_fairseq.sh | 9 +- tools/installers/install_k2.sh | 11 +- tools/installers/install_longformer.sh | 9 +- tools/installers/install_s3prl.sh | 9 +- tools/installers/install_speechbrain.sh | 5 +- tools/installers/install_torch.sh | 29 +- tools/installers/install_torch_optimizer.sh | 9 +- tools/installers/install_warp-ctc.sh | 9 +- tools/installers/install_warp-transducer.sh | 5 +- utils/addjson.py | 1 - utils/apply-cmvn.py | 5 +- 
utils/calculate_rtf.py | 3 +- utils/compute-cmvn-stats.py | 3 +- utils/compute-fbank-feats.py | 4 +- utils/compute-stft-feats.py | 4 +- utils/convert_fbank_to_wav.py | 6 +- utils/copy-feats.py | 5 +- utils/dump-pcm.py | 2 +- utils/eval-source-separation.py | 10 +- utils/eval_perm_free_error.py | 2 +- utils/feat-to-shape.py | 3 +- utils/feats2npy.py | 7 +- utils/generate_wav_from_fbank.py | 5 +- utils/json2sctm.py | 5 +- utils/make_pair_json.py | 2 +- utils/mcd_calculate.py | 5 +- utils/merge_scp2json.py | 4 +- utils/spm_train | 1 - utils/text2vocabulary.py | 3 +- 796 files changed, 10222 insertions(+), 3264 deletions(-) create mode 100755 egs2/TEMPLATE/asr1/pyscripts/utils/rotate_logfile.py mode change 120000 => 100755 egs2/TEMPLATE/enh_asr1/scripts/utils/show_enh_score.sh create mode 100644 egs2/chime4/asr1/conf/tuning/train_asr_transformer_wavlm_lr1e-3_specaug_accum1_preenc128_warmup20k.yaml create mode 100644 egs2/chime4/enh1/conf/tuning/train_enh_convtasnet_small.yaml create mode 100644 egs2/chime4/enh_asr1/README.md rename egs2/{clarity21/enh_2021 => chime4/enh_asr1}/cmd.sh (100%) create mode 120000 egs2/chime4/enh_asr1/conf/chime4.cfg create mode 100644 egs2/chime4/enh_asr1/conf/decode_asr_transformer.yaml create mode 100644 egs2/chime4/enh_asr1/conf/fbank.conf rename egs2/{clarity21/enh_2021 => chime4/enh_asr1}/conf/pbs.conf (100%) create mode 100644 egs2/chime4/enh_asr1/conf/pitch.conf rename egs2/{clarity21/enh_2021 => chime4/enh_asr1}/conf/queue.conf (100%) rename egs2/{clarity21/enh_2021 => chime4/enh_asr1}/conf/slurm.conf (100%) create mode 120000 egs2/chime4/enh_asr1/conf/train_enh_asr_convtasnet_fbank_transformer.yaml create mode 100644 egs2/chime4/enh_asr1/conf/train_lm_transformer.yaml create mode 100644 egs2/chime4/enh_asr1/conf/tuning/train_enh_asr_convtasnet_init_noenhloss_wavlm_transformer_init_lr1e-4_accum1_adam_specaug_bypass0.yaml create mode 100644 egs2/chime4/enh_asr1/conf/tuning/train_enh_asr_convtasnet_si_snr_fbank_transformer_lr2e-3_accum2_warmup20k_specaug.yaml create mode 120000 egs2/chime4/enh_asr1/db.sh create mode 120000 egs2/chime4/enh_asr1/enh_asr.sh create mode 120000 egs2/chime4/enh_asr1/local/CHiME3_simulate_data_patched_parallel.m create mode 120000 egs2/chime4/enh_asr1/local/bth_chime4_data_prep.sh create mode 120000 egs2/chime4/enh_asr1/local/chime4_asr_data.sh create mode 120000 egs2/chime4/enh_asr1/local/chime4_enh_data.sh create mode 120000 egs2/chime4/enh_asr1/local/clean_chime4_format_data.sh create mode 120000 egs2/chime4/enh_asr1/local/clean_wsj0_data_prep.sh create mode 120000 egs2/chime4/enh_asr1/local/cstr_ndx2flist.pl create mode 100755 egs2/chime4/enh_asr1/local/data.sh create mode 120000 egs2/chime4/enh_asr1/local/find_noisy_transcripts.pl create mode 120000 egs2/chime4/enh_asr1/local/find_transcripts.pl create mode 120000 egs2/chime4/enh_asr1/local/flist2scp.pl create mode 120000 egs2/chime4/enh_asr1/local/localize.m create mode 120000 egs2/chime4/enh_asr1/local/make_stft.sh create mode 120000 egs2/chime4/enh_asr1/local/ndx2flist.pl create mode 120000 egs2/chime4/enh_asr1/local/normalize_transcript.pl rename egs2/{clarity21/enh_2021 => chime4/enh_asr1}/local/path.sh (100%) create mode 120000 egs2/chime4/enh_asr1/local/real_enhan_chime4_data_prep.sh create mode 120000 egs2/chime4/enh_asr1/local/real_ext_chime4_data_prep.sh create mode 120000 egs2/chime4/enh_asr1/local/real_noisy_chime4_data_prep.sh create mode 120000 egs2/chime4/enh_asr1/local/run_beamform_2ch_track.sh create mode 120000 
egs2/chime4/enh_asr1/local/run_beamform_6ch_track.sh create mode 120000 egs2/chime4/enh_asr1/local/show_enhance_results.sh create mode 120000 egs2/chime4/enh_asr1/local/simu_enhan_chime4_data_prep.sh create mode 120000 egs2/chime4/enh_asr1/local/simu_ext_chime4_data_prep.sh create mode 120000 egs2/chime4/enh_asr1/local/simu_noisy_chime4_data_prep.sh create mode 120000 egs2/chime4/enh_asr1/local/sym_channel.py create mode 120000 egs2/chime4/enh_asr1/local/wsj_data_prep.sh create mode 120000 egs2/chime4/enh_asr1/local/wsj_format_data.sh create mode 120000 egs2/chime4/enh_asr1/path.sh create mode 120000 egs2/chime4/enh_asr1/pyscripts create mode 100755 egs2/chime4/enh_asr1/run.sh create mode 120000 egs2/chime4/enh_asr1/scripts create mode 120000 egs2/chime4/enh_asr1/steps create mode 120000 egs2/chime4/enh_asr1/utils create mode 100644 egs2/chime6/asr1/README.md create mode 120000 egs2/chime6/asr1/asr.sh create mode 100644 egs2/chime6/asr1/cmd.sh create mode 100644 egs2/chime6/asr1/conf/decode_asr_transformer.yaml create mode 100644 egs2/chime6/asr1/conf/fbank.conf create mode 100644 egs2/chime6/asr1/conf/pbs.conf create mode 100644 egs2/chime6/asr1/conf/pitch.conf create mode 100644 egs2/chime6/asr1/conf/queue.conf create mode 100644 egs2/chime6/asr1/conf/slurm.conf create mode 100644 egs2/chime6/asr1/conf/train_lm.yaml create mode 100644 egs2/chime6/asr1/conf/tuning/train_asr_transformer_wavlm_lr1e-3_specaug_accum1_preenc128_warmup20k.yaml rename egs2/{clarity21/enh_2021 => chime6/asr1}/db.sh (100%) create mode 120000 egs2/chime6/asr1/local/check_tools.sh create mode 100755 egs2/chime6/asr1/local/data.sh create mode 120000 egs2/chime6/asr1/local/distant_audio_list create mode 120000 egs2/chime6/asr1/local/extract_noises.py create mode 120000 egs2/chime6/asr1/local/generate_chime6_data.sh create mode 120000 egs2/chime6/asr1/local/install_pb_chime5.sh create mode 120000 egs2/chime6/asr1/local/json2text.py create mode 120000 egs2/chime6/asr1/local/make_noise_list.py create mode 100644 egs2/chime6/asr1/local/path.sh create mode 120000 egs2/chime6/asr1/local/prepare_baseline_chime6_data.sh create mode 120000 egs2/chime6/asr1/local/prepare_data.sh create mode 120000 egs2/chime6/asr1/local/prepare_dict.sh create mode 120000 egs2/chime6/asr1/local/run_gss.sh create mode 120000 egs2/chime6/asr1/local/train_lms_srilm.sh create mode 120000 egs2/chime6/asr1/local/wer_output_filter create mode 120000 egs2/chime6/asr1/path.sh rename egs2/{clarity21/enh_2021 => chime6/asr1}/pyscripts (100%) create mode 100755 egs2/chime6/asr1/run.sh rename egs2/{clarity21/enh_2021 => chime6/asr1}/scripts (100%) create mode 120000 egs2/chime6/asr1/steps create mode 120000 egs2/chime6/asr1/utils rename egs2/clarity21/{enh_2021 => enh1}/README.md (100%) create mode 100644 egs2/clarity21/enh1/cmd.sh create mode 100644 egs2/clarity21/enh1/conf/pbs.conf create mode 100644 egs2/clarity21/enh1/conf/queue.conf create mode 100644 egs2/clarity21/enh1/conf/slurm.conf rename egs2/clarity21/{enh_2021 => enh1}/conf/tuning/train_enh_beamformer_mvdr.yaml (100%) create mode 120000 egs2/clarity21/enh1/db.sh rename egs2/clarity21/{enh_2021 => enh1}/enh.sh (100%) rename egs2/clarity21/{enh_2021 => enh1}/local/data.sh (100%) create mode 100644 egs2/clarity21/enh1/local/path.sh rename egs2/clarity21/{enh_2021 => enh1}/local/prep_data.py (99%) rename egs2/clarity21/{enh_2021 => enh1}/path.sh (100%) create mode 120000 egs2/clarity21/enh1/pyscripts rename egs2/clarity21/{enh_2021 => enh1}/run.sh (100%) create mode 120000 
egs2/clarity21/enh1/scripts rename egs2/clarity21/{enh_2021 => enh1}/steps (100%) rename egs2/clarity21/{enh_2021 => enh1}/utils (100%) create mode 100644 egs2/fisher_callhome_spanish/st1/conf/decode_streaming_st.yaml create mode 100644 egs2/fisher_callhome_spanish/st1/conf/train_st_streaming.yaml create mode 100644 egs2/fsc/asr1/conf/train_asr_streaming_transformer.yaml create mode 100644 egs2/kss/tts1/conf/tuning/train_jets.yaml create mode 100644 egs2/ljspeech/tts1/conf/tuning/train_jets.yaml create mode 100755 egs2/mini_an4/st1/cmd.sh create mode 100755 egs2/mini_an4/st1/conf/fbank.conf create mode 100755 egs2/mini_an4/st1/conf/pbs.conf create mode 100755 egs2/mini_an4/st1/conf/pitch.conf create mode 100755 egs2/mini_an4/st1/conf/queue.conf create mode 100755 egs2/mini_an4/st1/conf/slurm.conf create mode 100644 egs2/mini_an4/st1/conf/train_st.yaml create mode 100644 egs2/mini_an4/st1/conf/train_st_streaming.yaml create mode 120000 egs2/mini_an4/st1/db.sh create mode 120000 egs2/mini_an4/st1/downloads.tar.gz create mode 100755 egs2/mini_an4/st1/local/data.sh create mode 120000 egs2/mini_an4/st1/local/data_prep.py create mode 120000 egs2/mini_an4/st1/local/download_and_untar.sh create mode 100755 egs2/mini_an4/st1/local/path.sh create mode 120000 egs2/mini_an4/st1/path.sh create mode 120000 egs2/mini_an4/st1/pyscripts create mode 100755 egs2/mini_an4/st1/run.sh create mode 120000 egs2/mini_an4/st1/scripts create mode 120000 egs2/mini_an4/st1/st.sh create mode 120000 egs2/mini_an4/st1/steps create mode 120000 egs2/mini_an4/st1/utils create mode 100644 egs2/slurp/asr1/conf/train_asr_streaming_transformer.yaml create mode 100644 egs2/wsj0_2mix/enh1/conf/tuning/train_enh_dan_tf.yaml create mode 100644 egs2/wsj0_2mix/enh1/conf/tuning/train_enh_dpcl.yaml create mode 100644 egs2/wsj0_2mix/enh1/conf/tuning/train_enh_dpcl_e2e.yaml create mode 100644 egs2/wsj0_2mix/enh1/conf/tuning/train_enh_mdc.yaml create mode 100644 espnet2/bin/st_inference_streaming.py create mode 100644 espnet2/enh/loss/wrappers/dpcl_solver.py create mode 100644 espnet2/enh/separator/dan_separator.py create mode 100644 espnet2/enh/separator/dpcl_e2e_separator.py create mode 100644 espnet2/enh/separator/dpcl_separator.py create mode 100644 espnet2/gan_tts/jets/__init__.py create mode 100644 espnet2/gan_tts/jets/alignments.py create mode 100644 espnet2/gan_tts/jets/generator.py create mode 100644 espnet2/gan_tts/jets/jets.py create mode 100644 espnet2/gan_tts/jets/length_regulator.py create mode 100644 espnet2/gan_tts/jets/loss.py create mode 100644 test/espnet2/bin/test_st_inference.py create mode 100644 test/espnet2/bin/test_st_train.py create mode 100644 test/espnet2/enh/loss/wrappers/test_dpcl_solver.py create mode 100644 test/espnet2/enh/separator/test_dan_separator.py create mode 100644 test/espnet2/enh/separator/test_dpcl_e2e_separator.py create mode 100644 test/espnet2/enh/separator/test_dpcl_separator.py create mode 100644 test/espnet2/gan_tts/jets/test_jets.py diff --git a/.github/workflows/centos7.yml b/.github/workflows/centos7.yml index 94d5973e859..d365c2e4961 100644 --- a/.github/workflows/centos7.yml +++ b/.github/workflows/centos7.yml @@ -19,7 +19,7 @@ jobs: # ImportError: /lib64/libstdc++.so.6: version `CXXABI_1.3.8' not found # (required by /__w/espnet/espnet/tools/venv/envs/espnet/lib/python3.6/site-packages/pyworld/pyworld.cpython-36m-x86_64-linux-gnu.so) # NOTE(kamo): The issue doens't exist for python3.7? 
- TH_VERSION: 1.10.1 + TH_VERSION: 1.11.0 CHAINER_VERSION: 6.0.0 USE_CONDA: true CC: /opt/rh/devtoolset-7/root/usr/bin/gcc diff --git a/.github/workflows/ci.yaml b/.github/workflows/ci.yaml index f1eb6fb47ae..92e0b29f582 100644 --- a/.github/workflows/ci.yaml +++ b/.github/workflows/ci.yaml @@ -16,19 +16,19 @@ jobs: matrix: os: [ubuntu-18.04] python-version: [3.7] - pytorch-version: [1.4.0, 1.5.1, 1.6.0, 1.7.1, 1.8.1, 1.9.1, 1.10.1] + pytorch-version: [1.4.0, 1.5.1, 1.6.0, 1.7.1, 1.8.1, 1.9.1, 1.10.2, 1.11.0] chainer-version: [6.0.0] # NOTE(kamo): Conda is tested by Circle-CI use-conda: [false] include: - os: ubuntu-20.04 python-version: 3.8 - pytorch-version: 1.10.1 + pytorch-version: 1.11.0 chainer-verssion: 6.0.0 use-conda: false - os: ubuntu-20.04 python-version: 3.9 - pytorch-version: 1.10.1 + pytorch-version: 1.11.0 chainer-verssion: 6.0.0 use-conda: false steps: diff --git a/.github/workflows/debian9.yml b/.github/workflows/debian9.yml index a29e5474ad4..79a68e8383d 100644 --- a/.github/workflows/debian9.yml +++ b/.github/workflows/debian9.yml @@ -15,7 +15,7 @@ jobs: image: debian:9 env: ESPNET_PYTHON_VERSION: 3.7 - TH_VERSION: 1.10.1 + TH_VERSION: 1.11.0 CHAINER_VERSION: 6.0.0 USE_CONDA: true CC: gcc-6 diff --git a/.github/workflows/test_import.yaml b/.github/workflows/test_import.yaml index ead9f587c07..1031d3e5601 100644 --- a/.github/workflows/test_import.yaml +++ b/.github/workflows/test_import.yaml @@ -16,7 +16,7 @@ jobs: matrix: os: [ubuntu-latest] python-version: [3.9] - pytorch-version: [1.10.1] + pytorch-version: [1.11.0] steps: - uses: actions/checkout@v2 - uses: actions/cache@v1 diff --git a/.mergify.yml b/.mergify.yml index 0304250182c..d67959e73ea 100644 --- a/.mergify.yml +++ b/.mergify.yml @@ -4,17 +4,17 @@ pull_request_rules: - "label=auto-merge" - "check-success=test_centos7" - "check-success=test_debian9" - - "check-success=linter_and_test (ubuntu-18.04, 3.7, 1.3.1, 6.0.0, false)" - "check-success=linter_and_test (ubuntu-18.04, 3.7, 1.4.0, 6.0.0, false)" - "check-success=linter_and_test (ubuntu-18.04, 3.7, 1.5.1, 6.0.0, false)" - "check-success=linter_and_test (ubuntu-18.04, 3.7, 1.6.0, 6.0.0, false)" - "check-success=linter_and_test (ubuntu-18.04, 3.7, 1.7.1, 6.0.0, false)" - "check-success=linter_and_test (ubuntu-18.04, 3.7, 1.8.1, 6.0.0, false)" - "check-success=linter_and_test (ubuntu-18.04, 3.7, 1.9.1, 6.0.0, false)" - - "check-success=linter_and_test (ubuntu-18.04, 3.7, 1.10.1, 6.0.0, false)" - - "check-success=linter_and_test (ubuntu-20.04, 3.8, 1.10.1, false, 6.0.0)" - - "check-success=linter_and_test (ubuntu-20.04, 3.9, 1.10.1, false, 6.0.0)" - - "check-success=test_import (ubuntu-latest, 3.9, 1.10.1)" + - "check-success=linter_and_test (ubuntu-18.04, 3.7, 1.10.2, 6.0.0, false)" + - "check-success=linter_and_test (ubuntu-18.04, 3.7, 1.11.0, 6.0.0, false)" + - "check-success=linter_and_test (ubuntu-20.04, 3.8, 1.11.0, false, 6.0.0)" + - "check-success=linter_and_test (ubuntu-20.04, 3.9, 1.11.0, false, 6.0.0)" + - "check-success=test_import (ubuntu-latest, 3.9, 1.11.0)" actions: merge: method: merge diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 9036a09b66d..7a97e340c9a 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -64,21 +64,21 @@ To port models from zenodo using Hugging Face hub, 1. Create a Hugging Face account - https://huggingface.co/ 2. Request to be added to espnet organisation - https://huggingface.co/espnet 3. 
Go to `egs2/RECIPE/*/scripts/utils` and run `./upload_models_to_hub.sh "ZENODO_MODEL_NAME"` - + To upload models using Huggingface-cli, follow these steps: You can also refer to https://huggingface.co/docs/transformers/model_sharing 1. Create a Hugging Face account - https://huggingface.co/ -2. Request to be added to espnet organisation - https://huggingface.co/espnet -3. Run huggingface-cli login (You can get the token request at this step under setting > Access Tokens > espnet token +2. Request to be added to espnet organisation - https://huggingface.co/espnet +3. Run `huggingface-cli login` (you can get the token required at this step under Settings > Access Tokens > espnet token) 4. `huggingface-cli repo create your-model-name --organization espnet` 5. `git clone https://huggingface.co/username/your-model-name` (clone this outside ESPnet to avoid issues, as this is a git repo) 6. `cd your-model-name` 7. `git lfs install` -8. copy contents from exp diretory of your recipe into this directory (Check other models of similar task under ESPNet to confirm your directory structure) +8. Copy contents from the exp directory of your recipe into this directory (check other models of a similar task under ESPnet to confirm your directory structure) 9. `git add . ` 10. `git commit -m "Add model files"` 11. `git push` -12. Check if the inference demo on HF is running successfully to verify the upload +12. Check if the inference demo on HF is running successfully to verify the upload #### 1.3.3 Additional requirements for new recipe @@ -91,12 +91,13 @@ to its differences. - If a recipe for a new corpus is proposed, you should add its name and information to: https://github.com/espnet/espnet/blob/master/egs/README.md if it's an ESPnet1 recipe, or https://github.com/espnet/espnet/blob/master/egs2/README.md + `db.sh` if it's an ESPnet2 recipe. - + #### 1.3.4 Checklist before you submit the recipe-based PR - [ ] be careful about the name for the recipe. It is recommended to follow naming conventions of the other recipes - [ ] common/shared files are linked with **soft link** (see Section 1.3.3) -- [ ] modified or new python scripts should be passed through **latest** black formating (by using python package black). The command to be executed could be `black espnet espnet2 test utils setup.py egs*/*/*/local egs2/TEMPLATE/asr1/pyscripts` +- [ ] modified or new python scripts should be passed through **latest** black formatting (by using the python package black). The command to be executed could be `black espnet espnet2 test utils setup.py egs*/*/*/local egs2/TEMPLATE/*/pyscripts tools/*.py ci/*.py` +- [ ] modified or new python scripts should be passed through **latest** isort formatting (by using the python package isort).
The command to be executed could be `isort espnet espnet2 test utils setup.py egs*/*/*/local egs2/TEMPLATE/*/pyscripts tools/*.py ci/*.py` - [ ] cluster settings should be set as **default** (e.g., cmd.sh conf/slurm.conf conf/queue.conf conf/pbs.conf) - [ ] update `egs/README.md` or `egs2/README.md` with corresponding recipes - [ ] add corresponding entry in `egs2/TEMPLATE/db.sh` for a new corpus @@ -135,7 +136,7 @@ $ pip install -e ".[test]" ### 4.1 Python -Then you can run the entire test suite using [flake8](http://flake8.pycqa.org/en/latest/), [autopep8](https://github.com/hhatto/autopep8), [black](https://github.com/psf/black) and [pytest](https://docs.pytest.org/en/latest/) with [coverage](https://pytest-cov.readthedocs.io/en/latest/reporting.html) by +Then you can run the entire test suite using [flake8](http://flake8.pycqa.org/en/latest/), [autopep8](https://github.com/hhatto/autopep8), [black](https://github.com/psf/black), [isort](https://github.com/PyCQA/isort) and [pytest](https://docs.pytest.org/en/latest/) with [coverage](https://pytest-cov.readthedocs.io/en/latest/reporting.html) by ``` console ./ci/test_python.sh ``` diff --git a/README.md b/README.md index 673739104c2..21a5592b5b9 100644 --- a/README.md +++ b/README.md @@ -2,14 +2,15 @@ # ESPnet: end-to-end speech processing toolkit -| system/pytorch ver. | 1.4.0 | 1.5.1 | 1.6.0 | 1.7.1 | 1.8.1 | 1.9.1 | 1.10.1 | -| :---------------------: | :--------------------------------------------------------------------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------------------------: | -| ubuntu20/python3.9/pip | | | | | | | [![Github Actions](https://github.com/espnet/espnet/workflows/CI/badge.svg)](https://github.com/espnet/espnet/actions) | -| ubuntu20/python3.8/pip | | | | | | | [![Github Actions](https://github.com/espnet/espnet/workflows/CI/badge.svg)](https://github.com/espnet/espnet/actions) | -| ubuntu18/python3.7/pip | [![Github Actions](https://github.com/espnet/espnet/workflows/CI/badge.svg)](https://github.com/espnet/espnet/actions) | [![Github Actions](https://github.com/espnet/espnet/workflows/CI/badge.svg)](https://github.com/espnet/espnet/actions) | [![Github Actions](https://github.com/espnet/espnet/workflows/CI/badge.svg)](https://github.com/espnet/espnet/actions) | [![Github Actions](https://github.com/espnet/espnet/workflows/CI/badge.svg)](https://github.com/espnet/espnet/actions) | [![Github Actions](https://github.com/espnet/espnet/workflows/CI/badge.svg)](https://github.com/espnet/espnet/actions) | [![Github Actions](https://github.com/espnet/espnet/workflows/CI/badge.svg)](https://github.com/espnet/espnet/actions) | [![Github Actions](https://github.com/espnet/espnet/workflows/CI/badge.svg)](https://github.com/espnet/espnet/actions) | -| debian9/python3.7/conda | | | | | | | 
[![debian9](https://github.com/espnet/espnet/workflows/debian9/badge.svg)](https://github.com/espnet/espnet/actions?query=workflow%3Adebian9) | -| centos7/python3.7/conda | | | | | | | [![centos7](https://github.com/espnet/espnet/workflows/centos7/badge.svg)](https://github.com/espnet/espnet/actions?query=workflow%3Acentos7) | -| doc/python3.8 | | | | | | | [![doc](https://github.com/espnet/espnet/workflows/doc/badge.svg)](https://github.com/espnet/espnet/actions?query=workflow%3Adoc) | +|system/pytorch ver.|1.4.0|1.5.1|1.6.0|1.7.1|1.8.1|1.9.1|1.10.2|1.11.0| +| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | +|ubuntu20/python3.9/pip||||||||[![Github Actions](https://github.com/espnet/espnet/workflows/CI/badge.svg)](https://github.com/espnet/espnet/actions)| +|ubuntu20/python3.8/pip||||||||[![Github Actions](https://github.com/espnet/espnet/workflows/CI/badge.svg)](https://github.com/espnet/espnet/actions)| +|ubuntu18/python3.7/pip|[![Github Actions](https://github.com/espnet/espnet/workflows/CI/badge.svg)](https://github.com/espnet/espnet/actions)|[![Github Actions](https://github.com/espnet/espnet/workflows/CI/badge.svg)](https://github.com/espnet/espnet/actions)|[![Github Actions](https://github.com/espnet/espnet/workflows/CI/badge.svg)](https://github.com/espnet/espnet/actions)|[![Github Actions](https://github.com/espnet/espnet/workflows/CI/badge.svg)](https://github.com/espnet/espnet/actions)|[![Github Actions](https://github.com/espnet/espnet/workflows/CI/badge.svg)](https://github.com/espnet/espnet/actions)|[![Github Actions](https://github.com/espnet/espnet/workflows/CI/badge.svg)](https://github.com/espnet/espnet/actions)|[![Github Actions](https://github.com/espnet/espnet/workflows/CI/badge.svg)](https://github.com/espnet/espnet/actions)|[![Github Actions](https://github.com/espnet/espnet/workflows/CI/badge.svg)](https://github.com/espnet/espnet/actions)| +|debian9/python3.7/conda||||||||[![debian9](https://github.com/espnet/espnet/workflows/debian9/badge.svg)](https://github.com/espnet/espnet/actions?query=workflow%3Adebian9)| +|centos7/python3.7/conda||||||||[![centos7](https://github.com/espnet/espnet/workflows/centos7/badge.svg)](https://github.com/espnet/espnet/actions?query=workflow%3Acentos7)| +|doc/python3.8||||||||[![doc](https://github.com/espnet/espnet/workflows/doc/badge.svg)](https://github.com/espnet/espnet/actions?query=workflow%3Adoc)| + [![PyPI version](https://badge.fury.io/py/espnet.svg)](https://badge.fury.io/py/espnet) [![Python Versions](https://img.shields.io/pypi/pyversions/espnet.svg)](https://pypi.org/project/espnet/) @@ -17,7 +18,8 @@ [![GitHub license](https://img.shields.io/github/license/espnet/espnet.svg)](https://github.com/espnet/espnet) [![codecov](https://codecov.io/gh/espnet/espnet/branch/master/graph/badge.svg)](https://codecov.io/gh/espnet/espnet) [![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black) -[![Mergify Status](https://img.shields.io/endpoint.svg?url=https://gh.mergify.io/badges/espnet/espnet&style=flat)](https://mergify.io) +[![Imports: isort](https://img.shields.io/badge/%20imports-isort-%231674b1?style=flat&labelColor=ef8336)](https://pycqa.github.io/isort/) +[![Mergify Status](https://img.shields.io/endpoint.svg?url=https://api.mergify.com/v1/badges/espnet/espnet&style=flat)](https://mergify.com) 
[![Gitter](https://badges.gitter.im/espnet-en/community.svg)](https://gitter.im/espnet-en/community?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge) [**Docs**](https://espnet.github.io/espnet/) @@ -77,11 +79,11 @@ ESPnet uses [pytorch](http://pytorch.org/) as a deep learning engine and also fo - Self-supervised learning representations as features, using upstream models in [S3PRL](https://github.com/s3prl/s3prl) in frontend. - Set `frontend` to be `s3prl` - Select any upstream model by setting the `frontend_conf` to the corresponding name. -- Transfer Learning : + - Transfer Learning : - easy usage and transfers from models previously trained by your group, or models from [ESPnet huggingface repository](https://huggingface.co/espnet). - [Documentation](https://github.com/espnet/espnet/tree/master/egs2/mini_an4/asr1/transfer_learning.md) and [toy example runnable on colab](https://github.com/espnet/notebook/blob/master/espnet2_asr_transfer_learning_demo.ipynb). - Streaming Transformer/Conformer ASR with blockwise synchronous beam search. -- Restricted Self-Attention based on [Longformer](https://arxiv.org/abs/2004.05150) as an encoder for long sequences +- Restricted Self-Attention based on [Longformer](https://arxiv.org/abs/2004.05150) as an encoder for long sequences Demonstration - Real-time ASR demo with ESPnet2 [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/espnet/notebook/blob/master/espnet2_asr_realtime_demo.ipynb) @@ -96,14 +98,15 @@ Demonstration - FastSpeech2 - Conformer FastSpeech & FastSpeech2 - VITS + - JETS - Multi-speaker & multi-language extension - - Pretrined speaker embedding (e.g., X-vector) + - Pretrained speaker embedding (e.g., X-vector) - Speaker ID embedding - Language ID embedding - Global style token (GST) embedding - Mix of the above embeddings - End-to-end training - - End-to-end text-to-wav model (e.g., VITS) + - End-to-end text-to-wav model (e.g., VITS, JETS, etc.) - Joint training of text2mel and vocoder - Various language support - En / Jp / Zh / De / Ru / And more... @@ -125,7 +128,7 @@ To train the neural vocoder, please check the following repositories: > **NOTE**: > - We are moving to ESPnet2-based development for TTS. -> - If you are beginner, we recommend using [ESPnet2-TTS](https://github.com/espnet/espnet/tree/master/egs2/TEMPLATE/tts1). +> - The use of ESPnet1-TTS is deprecated, please use [ESPnet2-TTS](https://github.com/espnet/espnet/tree/master/egs2/TEMPLATE/tts1). ### SE: Speech enhancement (and separation) @@ -133,7 +136,7 @@ - Multi-speaker speech separation - Unified encoder-separator-decoder structure for time-domain and frequency-domain models - Encoder/Decoder: STFT/iSTFT, Convolution/Transposed-Convolution - - Separators: BLSTM, Transformer, Conformer, [TasNet](https://arxiv.org/abs/1809.07454), [DPRNN](https://arxiv.org/abs/1910.06379), [DC-CRN](https://web.cse.ohio-state.edu/~wang.77/papers/TZW.taslp21.pdf), [DCCRN](https://arxiv.org/abs/2008.00264), Neural Beamformers, etc.
+ - Separators: BLSTM, Transformer, Conformer, [TasNet](https://arxiv.org/abs/1809.07454), [DPRNN](https://arxiv.org/abs/1910.06379), [SkiM](https://arxiv.org/abs/2201.10800), [SVoice](https://arxiv.org/abs/2011.02329), [DC-CRN](https://web.cse.ohio-state.edu/~wang.77/papers/TZW.taslp21.pdf), [DCCRN](https://arxiv.org/abs/2008.00264), [Deep Clustering](https://ieeexplore.ieee.org/document/7471631), [Deep Attractor Network](https://pubmed.ncbi.nlm.nih.gov/29430212/), [FaSNet](https://arxiv.org/abs/1909.13387), [iFaSNet](https://arxiv.org/abs/1910.14104), Neural Beamformers, etc. - Flexible ASR integration: working as an individual task or as the ASR frontend - Easy to import pretrained models from [Asteroid](https://github.com/asteroid-team/asteroid) - Both the pre-trained models from Asteroid and the specific configuration are supported. @@ -577,10 +580,10 @@ We list the performance on various SLU tasks and dataset using the metric report | Dialogue Act Classification | Switchboard | Acc | 67.5 | [link](https://github.com/espnet/espnet/tree/master/egs2/swbd_da/asr1/README.md) | | Dialogue Act Classification | Jdcinal (Jp) | Acc | 67.4 | [link](https://github.com/espnet/espnet/tree/master/egs2/jdcinal/asr1/README.md) | | Emotion Recognition | IEMOCAP | Acc | 69.4 | [link](https://github.com/espnet/espnet/tree/master/egs2/iemocap/asr1/README.md) | -| Emotion Recognition | swbd_sentiment | Macro F1 | 61.4 | [link](https://github.com/espnet/espnet/tree/master/egs2/swbd_sentiment/asr1/README.md) | -| Emotion Recognition | slue_voxceleb | Macro F1 | 44.0 | [link](https://github.com/espnet/espnet/tree/master/egs2/slue-voxceleb/asr1/README.md) | +| Emotion Recognition | swbd_sentiment | Macro F1 | 61.4 | [link](https://github.com/espnet/espnet/tree/master/egs2/swbd_sentiment/asr1/README.md) | +| Emotion Recognition | slue_voxceleb | Macro F1 | 44.0 | [link](https://github.com/espnet/espnet/tree/master/egs2/slue-voxceleb/asr1/README.md) | + - If you want to check the results of the other recipes, please check `egs2//asr1/RESULTS.md`. @@ -735,7 +738,7 @@ See the module documentation for more information. It is recommended to use models with RNN-based encoders (such as BLSTMP) for aligning large audio files, rather than using Transformer models that have a high memory consumption on longer audio data. The sample rate of the audio must be consistent with that of the data used in training; adjust with `sox` if needed. - + Also, we can use this tool to provide token-level segmentation information if we prepare a list of tokens instead of a list of utterances in the `text` file. See the discussion in https://github.com/espnet/espnet/issues/4278#issuecomment-1100756463.
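The ci script hunks below swap `distutils.version.LooseVersion` for `packaging.version.parse` in the inline torch version gates (`LooseVersion` is deprecated upstream). A minimal sketch of the resulting pattern, assuming only that the `packaging` package is installed; the threshold and printed message are illustrative, not taken from the scripts:

```python
# Version gate in the style the ci scripts below adopt:
# packaging.version.parse compares versions with PEP 440 semantics,
# replacing the deprecated distutils.version.LooseVersion.
import torch
from packaging.version import parse

if parse(torch.__version__) >= parse("1.8.0"):
    print("torch >= 1.8.0: run the configuration validation step")
```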
diff --git a/ci/install.sh b/ci/install.sh index 5bfed7584ad..7f8498a2a88 100755 --- a/ci/install.sh +++ b/ci/install.sh @@ -48,7 +48,7 @@ python3 -m pip freeze # Check pytorch version python3 < /dev/null; then echo "==== use_k2, num_paths > nll_batch_size, feats_type=raw, token_types=bpe, model_conf.extract_feats_in_collect_stats=False, normalize=utt_mvn ===" ./run.sh --num_paths 500 --nll_batch_size 20 --use_k2 true --ngpu 0 --stage 12 --stop-stage 13 --skip-upload false --feats-type "raw" --token-type "bpe" \ --feats_normalize "utterance_mvn" --lm-args "--max_epoch=1" --python "${python}" \ --asr-args "--model_conf extract_feats_in_collect_stats=false --max_epoch=1" - + echo "==== use_k2, num_paths == nll_batch_size, feats_type=raw, token_types=bpe, model_conf.extract_feats_in_collect_stats=False, normalize=utt_mvn ===" ./run.sh --num_paths 20 --nll_batch_size 20 --use_k2 true --ngpu 0 --stage 12 --stop-stage 13 --skip-upload false --feats-type "raw" --token-type "bpe" \ --feats_normalize "utterance_mvn" --lm-args "--max_epoch=1" --python "${python}" \ @@ -68,7 +68,7 @@ rm -rf exp dump data # NOTE(kan-bayashi): pytorch 1.4 - 1.6 works but 1.6 has a problem with CPU, # so we test this recipe using only pytorch > 1.6 here. # See also: https://github.com/pytorch/pytorch/issues/42446 -if python3 -c 'import torch as t; from distutils.version import LooseVersion as L; assert L(t.__version__) > L("1.6")' &> /dev/null; then +if python3 -c 'import torch as t; from packaging.version import parse as L; assert L(t.__version__) > L("1.6")' &> /dev/null; then ./run.sh --fs 22050 --tts_task gan_tts --feats_extract linear_spectrogram --feats_normalize none --inference_model latest.pth \ --ngpu 0 --stop-stage 8 --skip-upload false --train-args "--num_iters_per_epoch 1 --max_epoch 1" --python "${python}" rm -rf exp dump data @@ -76,7 +76,7 @@ fi cd "${cwd}" # [ESPnet2] test enh recipe -if python -c 'import torch as t; from distutils.version import LooseVersion as L; assert L(t.__version__) >= L("1.2.0")' &> /dev/null; then +if python -c 'import torch as t; from packaging.version import parse as L; assert L(t.__version__) >= L("1.2.0")' &> /dev/null; then cd ./egs2/mini_an4/enh1 echo "==== [ESPnet2] ENH ===" ./run.sh --stage 1 --stop-stage 1 --python "${python}" @@ -101,7 +101,7 @@ if python3 -c "import fairseq" &> /dev/null; then fi # [ESPnet2] test enh_asr1 recipe -if python -c 'import torch as t; from distutils.version import LooseVersion as L; assert L(t.__version__) >= L("1.2.0")' &> /dev/null; then +if python -c 'import torch as t; from packaging.version import parse as L; assert L(t.__version__) >= L("1.2.0")' &> /dev/null; then cd ./egs2/mini_an4/enh_asr1 echo "==== [ESPnet2] ENH_ASR ===" ./run.sh --ngpu 0 --stage 0 --stop-stage 15 --skip-upload_hf false --feats-type "raw" --spk-num 1 --enh_asr_args "--max_epoch=1 --enh_separator_conf num_spk=1" --python "${python}" @@ -110,10 +110,44 @@ if python -c 'import torch as t; from distutils.version import LooseVersion as L cd "${cwd}" fi +# [ESPnet2] test st recipe +cd ./egs2/mini_an4/st1 +echo "==== [ESPnet2] ST ===" +./run.sh --stage 1 --stop-stage 1 +feats_types="raw fbank_pitch" +token_types="bpe char" +for t in ${feats_types}; do + ./run.sh --stage 2 --stop-stage 4 --feats-type "${t}" --python "${python}" +done +for t in ${token_types}; do + ./run.sh --stage 5 --stop-stage 5 --tgt_token_type "${t}" --src_token_type "${t}" --python "${python}" +done +for t in ${feats_types}; do + for t2 in ${token_types}; do + echo "==== feats_type=${t}, 
token_types=${t2} ===" + ./run.sh --ngpu 0 --stage 6 --stop-stage 13 --skip-upload false --feats-type "${t}" --tgt_token_type "${t2}" --src_token_type "${t2}" \ + --st-args "--max_epoch=1" --lm-args "--max_epoch=1" --inference_args "--beam_size 5" --python "${python}" + done +done +echo "==== feats_type=raw, token_types=bpe, model_conf.extract_feats_in_collect_stats=False, normalize=utt_mvn ===" +./run.sh --ngpu 0 --stage 10 --stop-stage 13 --skip-upload false --feats-type "raw" --tgt_token_type "bpe" --src_token_type "bpe" \ + --feats_normalize "utterance_mvn" --lm-args "--max_epoch=1" --inference_args "--beam_size 5" --python "${python}" \ + --st-args "--model_conf extract_feats_in_collect_stats=false --max_epoch=1" + +echo "==== use_streaming, feats_type=raw, token_types=bpe, model_conf.extract_feats_in_collect_stats=False, normalize=utt_mvn ===" +./run.sh --use_streaming true --ngpu 0 --stage 6 --stop-stage 13 --skip-upload false --feats-type "raw" --tgt_token_type "bpe" --src_token_type "bpe" \ + --feats_normalize "utterance_mvn" --lm-args "--max_epoch=1" --inference_args "--beam_size 5" --python "${python}" \ + --st-args "--model_conf extract_feats_in_collect_stats=false --max_epoch=1 --encoder=contextual_block_transformer --decoder=transformer + --encoder_conf block_size=40 --encoder_conf hop_size=16 --encoder_conf look_ahead=16" + +# Remove generated files in order to reduce the disk usage +rm -rf exp dump data +cd "${cwd}" + # [ESPnet2] Validate configuration files echo "" > dummy_token_list echo "==== [ESPnet2] Validation configuration files ===" -if python3 -c 'import torch as t; from distutils.version import LooseVersion as L; assert L(t.__version__) >= L("1.8.0")' &> /dev/null; then +if python3 -c 'import torch as t; from packaging.version import parse as L; assert L(t.__version__) >= L("1.8.0")' &> /dev/null; then for f in egs2/*/asr1/conf/train_asr*.yaml; do if [ "$f" == "egs2/fsc/asr1/conf/train_asr.yaml" ]; then if ! python3 -c "import s3prl" > /dev/null; then @@ -134,6 +168,9 @@ if python3 -c 'import torch as t; from distutils.version import LooseVersion as for f in egs2/*/ssl1/conf/train*.yaml; do ${python} -m espnet2.bin.hubert_train --config "${f}" --iterator_type none --normalize none --dry_run true --output_dir out --token_list dummy_token_list done + for f in egs2/*/enh_asr1/conf/train_enh_asr*.yaml; do + ${python} -m espnet2.bin.enh_s2t_train --config "${f}" --iterator_type none --dry_run true --output_dir out --token_list dummy_token_list + done fi # These files must be same each other. diff --git a/ci/test_python.sh b/ci/test_python.sh index b3f47146198..a191944aef1 100755 --- a/ci/test_python.sh +++ b/ci/test_python.sh @@ -5,11 +5,16 @@ set -euo pipefail -modules="espnet espnet2 test utils setup.py egs*/*/*/local egs2/TEMPLATE/asr1/pyscripts" +modules="espnet espnet2 test utils setup.py egs*/*/*/local egs2/TEMPLATE/*/pyscripts tools/*.py ci/*.py" # black if ! black --check ${modules}; then - printf 'Please apply:\n $ black %s\n' "${modules}" + printf '[INFO] Please apply black:\n $ black %s\n' "${modules}" + exit 1 +fi +# isort +if ! 
isort -c -v ${modules}; then + printf '[INFO] Please apply isort:\n $ isort %s\n' "${modules}" exit 1 fi diff --git a/egs/arctic/tts1/local/clean_text.py b/egs/arctic/tts1/local/clean_text.py index 7b14f47a61a..6fd5ce649e0 100755 --- a/egs/arctic/tts1/local/clean_text.py +++ b/egs/arctic/tts1/local/clean_text.py @@ -8,7 +8,6 @@ from tacotron_cleaner.cleaners import custom_english_cleaners - if __name__ == "__main__": parser = argparse.ArgumentParser() parser.add_argument("text", type=str, help="text to be cleaned") diff --git a/egs/chime6/asr1/local/extract_noises.py b/egs/chime6/asr1/local/extract_noises.py index 7c96666b5c9..b79e6fcaeaa 100755 --- a/egs/chime6/asr1/local/extract_noises.py +++ b/egs/chime6/asr1/local/extract_noises.py @@ -6,11 +6,12 @@ import argparse import json import math -import numpy as np import os -import scipy.io.wavfile as siw import sys +import numpy as np +import scipy.io.wavfile as siw + def get_args(): parser = argparse.ArgumentParser( diff --git a/egs/chime6/asr1/local/make_noise_list.py b/egs/chime6/asr1/local/make_noise_list.py index 1674bb71b4d..b8f84fc3fed 100755 --- a/egs/chime6/asr1/local/make_noise_list.py +++ b/egs/chime6/asr1/local/make_noise_list.py @@ -7,7 +7,6 @@ import os import sys - if len(sys.argv) != 2: print("Usage: {} ".format(sys.argv[0])) raise SystemExit(1) diff --git a/egs/cmu_indic/tts1/local/clean_text.py b/egs/cmu_indic/tts1/local/clean_text.py index 7b14f47a61a..6fd5ce649e0 100755 --- a/egs/cmu_indic/tts1/local/clean_text.py +++ b/egs/cmu_indic/tts1/local/clean_text.py @@ -8,7 +8,6 @@ from tacotron_cleaner.cleaners import custom_english_cleaners - if __name__ == "__main__": parser = argparse.ArgumentParser() parser.add_argument("text", type=str, help="text to be cleaned") diff --git a/egs/covost2/st1/local/process_tsv.py b/egs/covost2/st1/local/process_tsv.py index 2c46d83df75..84609ad1f68 100755 --- a/egs/covost2/st1/local/process_tsv.py +++ b/egs/covost2/st1/local/process_tsv.py @@ -5,8 +5,8 @@ import argparse import codecs -import pandas as pd +import pandas as pd parser = argparse.ArgumentParser(description="extract translation from tsv file") parser.add_argument("tsv_path", type=str, default=None, help="input tsv path") diff --git a/egs/csj/asr1/local/csj_rm_tag.py b/egs/csj/asr1/local/csj_rm_tag.py index 0a23ca59708..dfe5ba5e4f3 100755 --- a/egs/csj/asr1/local/csj_rm_tag.py +++ b/egs/csj/asr1/local/csj_rm_tag.py @@ -6,9 +6,8 @@ import argparse import codecs -from io import open import sys - +from io import open PY2 = sys.version_info[0] == 2 sys.stdin = codecs.getreader("utf-8")(sys.stdin if PY2 else sys.stdin.buffer) diff --git a/egs/iwslt16/mt1/local/extract_recog_text.py b/egs/iwslt16/mt1/local/extract_recog_text.py index bf2dbdfda9e..bba75a17b9a 100755 --- a/egs/iwslt16/mt1/local/extract_recog_text.py +++ b/egs/iwslt16/mt1/local/extract_recog_text.py @@ -4,9 +4,9 @@ """ import argparse import glob -from itertools import takewhile import json import os +from itertools import takewhile def get_args(): diff --git a/egs/iwslt16/mt1/local/generate_json.py b/egs/iwslt16/mt1/local/generate_json.py index 2dd4d66a098..4e81eb8d7f1 100755 --- a/egs/iwslt16/mt1/local/generate_json.py +++ b/egs/iwslt16/mt1/local/generate_json.py @@ -5,11 +5,9 @@ """ import argparse import json -from logging import getLogger import os -from typing import Dict -from typing import List - +from logging import getLogger +from typing import Dict, List logger = getLogger(__name__) diff --git a/egs/iwslt16/mt1/local/generate_vocab.py 
b/egs/iwslt16/mt1/local/generate_vocab.py index c97c4c069c5..f060d3b4aae 100755 --- a/egs/iwslt16/mt1/local/generate_vocab.py +++ b/egs/iwslt16/mt1/local/generate_vocab.py @@ -6,8 +6,8 @@ format: token + whitespace + index """ import argparse -from collections import defaultdict import fileinput +from collections import defaultdict def get_args(): diff --git a/egs/iwslt18/st1/local/parse_xml.py b/egs/iwslt18/st1/local/parse_xml.py index e42f8e2c79e..067926ee50f 100755 --- a/egs/iwslt18/st1/local/parse_xml.py +++ b/egs/iwslt18/st1/local/parse_xml.py @@ -6,10 +6,10 @@ import argparse import codecs -from collections import OrderedDict import os import re import xml.etree.ElementTree as etree +from collections import OrderedDict def main(): diff --git a/egs/iwslt21/asr1/local/filter_parentheses.py b/egs/iwslt21/asr1/local/filter_parentheses.py index 8c27bf39d27..b0c77d3a314 100755 --- a/egs/iwslt21/asr1/local/filter_parentheses.py +++ b/egs/iwslt21/asr1/local/filter_parentheses.py @@ -7,6 +7,7 @@ import argparse import codecs import re + import regex parser = argparse.ArgumentParser() diff --git a/egs/iwslt21_low_resource/st1/local/data_prep.py b/egs/iwslt21_low_resource/st1/local/data_prep.py index 75153cc426f..60df7d00d8c 100644 --- a/egs/iwslt21_low_resource/st1/local/data_prep.py +++ b/egs/iwslt21_low_resource/st1/local/data_prep.py @@ -1,7 +1,6 @@ import argparse import os - if __name__ == "__main__": parser = argparse.ArgumentParser(description="Convert data into kaldi format") parser.add_argument("data_dir", type=str) diff --git a/egs/jnas/asr1/local/filter_text.py b/egs/jnas/asr1/local/filter_text.py index db35c1754da..c5b000ce4c0 100755 --- a/egs/jnas/asr1/local/filter_text.py +++ b/egs/jnas/asr1/local/filter_text.py @@ -6,9 +6,8 @@ import argparse import codecs -from io import open import sys - +from io import open PY2 = sys.version_info[0] == 2 sys.stdin = codecs.getreader("utf-8")(sys.stdin if PY2 else sys.stdin.buffer) diff --git a/egs/ksponspeech/asr1/local/get_space_normalized_hyps.py b/egs/ksponspeech/asr1/local/get_space_normalized_hyps.py index c105b47c578..1f5225bfe83 100755 --- a/egs/ksponspeech/asr1/local/get_space_normalized_hyps.py +++ b/egs/ksponspeech/asr1/local/get_space_normalized_hyps.py @@ -4,11 +4,11 @@ # Copyright 2020 Electronics and Telecommunications Research Institute (Jeong-Uk, Bang) # Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0) -import configargparse import logging import os import sys +import configargparse from numpy import zeros space_sym = "▁" diff --git a/egs/ksponspeech/asr1/local/get_transcriptions.py b/egs/ksponspeech/asr1/local/get_transcriptions.py index 9d1db4b9225..771c377641f 100644 --- a/egs/ksponspeech/asr1/local/get_transcriptions.py +++ b/egs/ksponspeech/asr1/local/get_transcriptions.py @@ -5,13 +5,14 @@ # Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0) import codecs -import configargparse import logging import os import re import shutil import sys +import configargparse + def get_parser(): """Get default arguments.""" diff --git a/egs/libri_css/asr1/local/best_wer_matching.py b/egs/libri_css/asr1/local/best_wer_matching.py index d9688496ad6..67e1d4b808a 100755 --- a/egs/libri_css/asr1/local/best_wer_matching.py +++ b/egs/libri_css/asr1/local/best_wer_matching.py @@ -5,9 +5,10 @@ import io import itertools import math +import sys + import numpy as np from scipy.optimize import linear_sum_assignment -import sys # Helper function to group the list by ref/hyp ids diff --git 
a/egs/libri_css/asr1/local/get_perspeaker_output.py b/egs/libri_css/asr1/local/get_perspeaker_output.py index 3dcdfae1340..3f0361ca320 100755 --- a/egs/libri_css/asr1/local/get_perspeaker_output.py +++ b/egs/libri_css/asr1/local/get_perspeaker_output.py @@ -5,9 +5,9 @@ into per_speaker output (text) file""" import argparse -from collections import defaultdict import itertools import os +from collections import defaultdict def get_args(): diff --git a/egs/libri_css/asr1/local/prepare_data.py b/egs/libri_css/asr1/local/prepare_data.py index f5b2e409f5c..f3800935c47 100755 --- a/egs/libri_css/asr1/local/prepare_data.py +++ b/egs/libri_css/asr1/local/prepare_data.py @@ -7,6 +7,7 @@ import argparse import glob import os + import soundfile as sf import tqdm diff --git a/egs/libri_css/asr1/local/segmentation/apply_webrtcvad.py b/egs/libri_css/asr1/local/segmentation/apply_webrtcvad.py index 08ca2f9d765..e30005fd518 100755 --- a/egs/libri_css/asr1/local/segmentation/apply_webrtcvad.py +++ b/egs/libri_css/asr1/local/segmentation/apply_webrtcvad.py @@ -12,6 +12,7 @@ import os import sys import wave + import webrtcvad diff --git a/egs/ljspeech/tts1/local/clean_text.py b/egs/ljspeech/tts1/local/clean_text.py index 14c6721ece4..ee7c5fcfa1f 100755 --- a/egs/ljspeech/tts1/local/clean_text.py +++ b/egs/ljspeech/tts1/local/clean_text.py @@ -5,8 +5,8 @@ import argparse import codecs -import nltk +import nltk from tacotron_cleaner.cleaners import custom_english_cleaners try: diff --git a/egs/lrs/avsr1/local/se_batch.py b/egs/lrs/avsr1/local/se_batch.py index c5f0a58bf6b..6b78ee965eb 100755 --- a/egs/lrs/avsr1/local/se_batch.py +++ b/egs/lrs/avsr1/local/se_batch.py @@ -5,11 +5,12 @@ License, v. 2.0. If a copy of the MPL was not distributed with this file, You can obtain one at http://mozilla.org/MPL/2.0/.""" -from deepxi.utils import read_wav import glob -import numpy as np import os +import numpy as np +from deepxi.utils import read_wav + def Batch(fdir, snr_l=[]): """REQUIRES REWRITING. 
WILL BE MOVED TO deepxi/utils.py diff --git a/egs/mgb2/asr1/local/process_xml.py b/egs/mgb2/asr1/local/process_xml.py index dadfb97845e..e0fa189d083 100644 --- a/egs/mgb2/asr1/local/process_xml.py +++ b/egs/mgb2/asr1/local/process_xml.py @@ -1,9 +1,10 @@ #!/usr/bin/env python3 import argparse -from bs4 import BeautifulSoup import sys +from bs4 import BeautifulSoup + def get_args(): parser = argparse.ArgumentParser(description="""This script process xml file.""") diff --git a/egs/mgb2/asr1/local/text_segmenting.py b/egs/mgb2/asr1/local/text_segmenting.py index ec9004a20b1..6cfa58fb135 100644 --- a/egs/mgb2/asr1/local/text_segmenting.py +++ b/egs/mgb2/asr1/local/text_segmenting.py @@ -4,6 +4,7 @@ # Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0) import argparse + import pandas as pd diff --git a/egs/polyphone_swiss_french/asr1/local/data_prep.py b/egs/polyphone_swiss_french/asr1/local/data_prep.py index 6926ceabc1f..6a41a0b8717 100755 --- a/egs/polyphone_swiss_french/asr1/local/data_prep.py +++ b/egs/polyphone_swiss_french/asr1/local/data_prep.py @@ -1,11 +1,11 @@ #!/usr/bin/env python3 -from collections import defaultdict import os import pathlib -from random import shuffle import re import subprocess import sys +from collections import defaultdict +from random import shuffle class FrPolyphonePrepper: @@ -570,6 +570,7 @@ def _generate_random(self, corpus, splits): if __name__ == "__main__": import argparse + import yaml example = "{0} --config conf/dataprep.yml".format(sys.argv[0]) diff --git a/egs/puebla_nahuatl/asr1/local/construct_dataset.py b/egs/puebla_nahuatl/asr1/local/construct_dataset.py index 752c915e22f..fd471e44ebf 100644 --- a/egs/puebla_nahuatl/asr1/local/construct_dataset.py +++ b/egs/puebla_nahuatl/asr1/local/construct_dataset.py @@ -1,5 +1,4 @@ import os - from argparse import ArgumentParser diff --git a/egs/puebla_nahuatl/asr1/local/data_prep.py b/egs/puebla_nahuatl/asr1/local/data_prep.py index 959d9a91250..6d90f9a0d4c 100755 --- a/egs/puebla_nahuatl/asr1/local/data_prep.py +++ b/egs/puebla_nahuatl/asr1/local/data_prep.py @@ -5,11 +5,9 @@ import shutil import string import sys - from argparse import ArgumentParser from xml.dom.minidom import parse - s = "".join(chr(c) for c in range(sys.maxunicode + 1)) ws = "".join(re.findall(r"\s", s)) outtab = " " * len(ws) diff --git a/egs/puebla_nahuatl/st1/local/data_prep.py b/egs/puebla_nahuatl/st1/local/data_prep.py index 74a39fdf478..3d07917fdbc 100644 --- a/egs/puebla_nahuatl/st1/local/data_prep.py +++ b/egs/puebla_nahuatl/st1/local/data_prep.py @@ -1,10 +1,10 @@ # -*- coding: UTF-8 -*- -from argparse import ArgumentParser import os import re import string import sys +from argparse import ArgumentParser from xml.dom.minidom import parse s = "".join(chr(c) for c in range(sys.maxunicode + 1)) diff --git a/egs/reverb/asr1/local/filterjson.py b/egs/reverb/asr1/local/filterjson.py index 00dff00fca3..400177e3d17 100755 --- a/egs/reverb/asr1/local/filterjson.py +++ b/egs/reverb/asr1/local/filterjson.py @@ -6,12 +6,11 @@ import argparse import codecs -from io import open import json import logging import re import sys - +from io import open PY2 = sys.version_info[0] == 2 sys.stdin = codecs.getreader("utf-8")(sys.stdin if PY2 else sys.stdin.buffer) diff --git a/egs/reverb/asr1/local/run_wpe.py b/egs/reverb/asr1/local/run_wpe.py index 309cf609d90..84d21b3b5c7 100755 --- a/egs/reverb/asr1/local/run_wpe.py +++ b/egs/reverb/asr1/local/run_wpe.py @@ -6,12 +6,11 @@ import argparse import errno import os -import soundfile as sf -from 
nara_wpe.utils import istft -from nara_wpe.utils import stft -from nara_wpe.wpe import wpe import numpy as np +import soundfile as sf +from nara_wpe.utils import istft, stft +from nara_wpe.wpe import wpe parser = argparse.ArgumentParser() parser.add_argument("--files", "-f", nargs="+") diff --git a/egs/reverb/asr1_multich/local/filterjson.py b/egs/reverb/asr1_multich/local/filterjson.py index 8841d546dc2..400177e3d17 100755 --- a/egs/reverb/asr1_multich/local/filterjson.py +++ b/egs/reverb/asr1_multich/local/filterjson.py @@ -6,11 +6,11 @@ import argparse import codecs -from io import open import json import logging import re import sys +from io import open PY2 = sys.version_info[0] == 2 sys.stdin = codecs.getreader("utf-8")(sys.stdin if PY2 else sys.stdin.buffer) diff --git a/egs/tweb/tts1/local/clean_text.py b/egs/tweb/tts1/local/clean_text.py index 07a34438f24..c7634744928 100755 --- a/egs/tweb/tts1/local/clean_text.py +++ b/egs/tweb/tts1/local/clean_text.py @@ -8,7 +8,6 @@ from tacotron_cleaner.cleaners import custom_english_cleaners - if __name__ == "__main__": parser = argparse.ArgumentParser() parser.add_argument("text", type=str, help="text to be cleaned") diff --git a/egs/vais1000/tts1/local/clean_text.py b/egs/vais1000/tts1/local/clean_text.py index 8f89b943092..d1e320c654e 100755 --- a/egs/vais1000/tts1/local/clean_text.py +++ b/egs/vais1000/tts1/local/clean_text.py @@ -8,7 +8,6 @@ from vietnamese_cleaner.vietnamese_cleaners import vietnamese_cleaner - if __name__ == "__main__": parser = argparse.ArgumentParser() parser.add_argument("text", type=str, help="text to be cleaned") diff --git a/egs/vcc20/tts1_en_fi/local/clean_text_css10.py b/egs/vcc20/tts1_en_fi/local/clean_text_css10.py index 2b64d394028..b83b8159884 100755 --- a/egs/vcc20/tts1_en_fi/local/clean_text_css10.py +++ b/egs/vcc20/tts1_en_fi/local/clean_text_css10.py @@ -9,13 +9,15 @@ import os import nltk -from tacotron_cleaner.cleaners import collapse_whitespace -from tacotron_cleaner.cleaners import expand_abbreviations -from tacotron_cleaner.cleaners import expand_numbers -from tacotron_cleaner.cleaners import expand_symbols -from tacotron_cleaner.cleaners import lowercase -from tacotron_cleaner.cleaners import remove_unnecessary_symbols -from tacotron_cleaner.cleaners import uppercase +from tacotron_cleaner.cleaners import ( + collapse_whitespace, + expand_abbreviations, + expand_numbers, + expand_symbols, + lowercase, + remove_unnecessary_symbols, + uppercase, +) try: # For phoneme conversion, use https://github.com/Kyubyong/g2p. 
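The import hunks above and below are mechanical and all follow one convention: standard-library imports first, then third-party packages, then first-party espnet/espnet2 modules, each group alphabetized and separated by a blank line, with multiple names from one module collapsed into a single (parenthesized) import. A minimal sketch of reproducing this ordering locally — the --profile black option is an inference from the trailing-comma, parenthesized style of the rewritten imports, not something stated in this patch:

    # preview the reordering for one of the files touched above, without writing
    isort --profile black --diff egs/lrs/avsr1/local/se_batch.py
    # rewrite imports in place across both recipe trees
    isort --profile black egs/ egs2/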
diff --git a/egs/vcc20/vc1_task1/local/clean_text_asr_result.py b/egs/vcc20/vc1_task1/local/clean_text_asr_result.py index 9dc253855e6..381f9ffaf6b 100755 --- a/egs/vcc20/vc1_task1/local/clean_text_asr_result.py +++ b/egs/vcc20/vc1_task1/local/clean_text_asr_result.py @@ -5,8 +5,8 @@ import argparse import codecs -import nltk +import nltk from tacotron_cleaner.cleaners import custom_english_cleaners try: diff --git a/egs/vcc20/vc1_task2/local/clean_text_finnish.py b/egs/vcc20/vc1_task2/local/clean_text_finnish.py index fbbe6fa8a76..59e0a23f798 100755 --- a/egs/vcc20/vc1_task2/local/clean_text_finnish.py +++ b/egs/vcc20/vc1_task2/local/clean_text_finnish.py @@ -5,16 +5,18 @@ import argparse import codecs -import nltk -from tacotron_cleaner.cleaners import collapse_whitespace -from tacotron_cleaner.cleaners import custom_english_cleaners -from tacotron_cleaner.cleaners import expand_abbreviations -from tacotron_cleaner.cleaners import expand_numbers -from tacotron_cleaner.cleaners import expand_symbols -from tacotron_cleaner.cleaners import lowercase -from tacotron_cleaner.cleaners import remove_unnecessary_symbols -from tacotron_cleaner.cleaners import uppercase +import nltk +from tacotron_cleaner.cleaners import ( + collapse_whitespace, + custom_english_cleaners, + expand_abbreviations, + expand_numbers, + expand_symbols, + lowercase, + remove_unnecessary_symbols, + uppercase, +) E_lang_tag = "en_US" diff --git a/egs/vcc20/vc1_task2/local/clean_text_german.py b/egs/vcc20/vc1_task2/local/clean_text_german.py index b9123de1578..a10fd4e8f2e 100755 --- a/egs/vcc20/vc1_task2/local/clean_text_german.py +++ b/egs/vcc20/vc1_task2/local/clean_text_german.py @@ -5,11 +5,10 @@ import argparse import codecs -import nltk +import nltk from tacotron_cleaner.cleaners import custom_english_cleaners - E_lang_tag = "en_US" try: diff --git a/egs/vcc20/vc1_task2/local/clean_text_mandarin.py b/egs/vcc20/vc1_task2/local/clean_text_mandarin.py index e1932ceebd0..9a2784f0a2c 100755 --- a/egs/vcc20/vc1_task2/local/clean_text_mandarin.py +++ b/egs/vcc20/vc1_task2/local/clean_text_mandarin.py @@ -5,14 +5,13 @@ import argparse import codecs -import nltk +import nltk +from pypinyin import Style from pypinyin.contrib.neutral_tone import NeutralToneWith5Mixin from pypinyin.converter import DefaultConverter from pypinyin.core import Pinyin -from pypinyin import Style -from pypinyin.style._utils import get_finals -from pypinyin.style._utils import get_initials +from pypinyin.style._utils import get_finals, get_initials from tacotron_cleaner.cleaners import custom_english_cleaners diff --git a/egs/vcc20/voc1/local/subset_data_dir.py b/egs/vcc20/voc1/local/subset_data_dir.py index 841d0fb2bfc..968cd3d02d1 100755 --- a/egs/vcc20/voc1/local/subset_data_dir.py +++ b/egs/vcc20/voc1/local/subset_data_dir.py @@ -5,8 +5,8 @@ # consisting of some specified number of utterances. 
import argparse -from io import open import sys +from io import open def get_parser(): diff --git a/egs/voxforge/asr1/local/filter_text.py b/egs/voxforge/asr1/local/filter_text.py index db35c1754da..c5b000ce4c0 100755 --- a/egs/voxforge/asr1/local/filter_text.py +++ b/egs/voxforge/asr1/local/filter_text.py @@ -6,9 +6,8 @@ import argparse import codecs -from io import open import sys - +from io import open PY2 = sys.version_info[0] == 2 sys.stdin = codecs.getreader("utf-8")(sys.stdin if PY2 else sys.stdin.buffer) diff --git a/egs/wsj/asr1/local/filtering_samples.py b/egs/wsj/asr1/local/filtering_samples.py index 27766d43e58..4b91b004373 100755 --- a/egs/wsj/asr1/local/filtering_samples.py +++ b/egs/wsj/asr1/local/filtering_samples.py @@ -4,16 +4,15 @@ # Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0) -from functools import reduce import json -from operator import mul import sys +from functools import reduce +from operator import mul from espnet.bin.asr_train import get_parser from espnet.nets.pytorch_backend.nets_utils import get_subsample from espnet.utils.dynamic_import import dynamic_import - if __name__ == "__main__": cmd_args = sys.argv[1:] parser = get_parser(required=False) diff --git a/egs/wsj_mix/asr1/local/merge_scp2json.py b/egs/wsj_mix/asr1/local/merge_scp2json.py index 52260785b9d..7cf55f2d35f 100755 --- a/egs/wsj_mix/asr1/local/merge_scp2json.py +++ b/egs/wsj_mix/asr1/local/merge_scp2json.py @@ -3,10 +3,10 @@ import argparse import codecs -from io import open import json import logging import sys +from io import open from espnet.utils.cli_utils import get_commandline_args diff --git a/egs/wsj_mix/asr1/local/mergejson.py b/egs/wsj_mix/asr1/local/mergejson.py index 0926a858469..8b965cb97e5 100755 --- a/egs/wsj_mix/asr1/local/mergejson.py +++ b/egs/wsj_mix/asr1/local/mergejson.py @@ -11,7 +11,6 @@ import logging import sys - if __name__ == "__main__": parser = argparse.ArgumentParser() parser.add_argument("jsons", type=str, nargs="+", help="json files") diff --git a/egs/yoloxochitl_mixtec/asr1/local/data_prep.py b/egs/yoloxochitl_mixtec/asr1/local/data_prep.py index 91fcee41249..e96e633df00 100755 --- a/egs/yoloxochitl_mixtec/asr1/local/data_prep.py +++ b/egs/yoloxochitl_mixtec/asr1/local/data_prep.py @@ -1,14 +1,15 @@ # -*- coding: UTF-8 -*- -from argparse import ArgumentParser import os import re import shutil -import soundfile as sf import string import sys +from argparse import ArgumentParser from xml.dom.minidom import parse +import soundfile as sf + s = "".join(chr(c) for c in range(sys.maxunicode + 1)) ws = "".join(re.findall(r"\s", s)) outtab = " " * len(ws) diff --git a/egs2/README.md b/egs2/README.md index d67bdde2e8c..f4429367fa7 100755 --- a/egs2/README.md +++ b/egs2/README.md @@ -19,6 +19,7 @@ See: https://espnet.github.io/espnet/espnet2_tutorial.html#recipes-using-espnet2 | bur_openslr80 | Burmese ASR training dataset | ASR | BUR | https://openslr.org/80/ | | | catslu | CATSLU-MAPS | SLU | CMN | https://sites.google.com/view/catslu/home | | | chime4 | The 4th CHiME Speech Separation and Recognition Challenge | ASR/Multichannel ASR | ENG | http://spandh.dcs.shef.ac.uk/chime_challenge/chime2016/ | | +| chime6 | The 6th CHiME Speech Separation and Recognition Challenge | ASR | ENG | https://chimechallenge.github.io/chime6/ | | | clarity21 | The First Clarity Enhancement Challenge CEC1 | SE | ENG | https://claritychallenge.github.io/clarity_CEC1_doc/ | | | cmu_indic | CMU INDIC | TTS | 7 languages | http://festvox.org/cmu_indic/ | | | commonvoice | The Mozilla 
Common Voice | ASR | 13 languages | https://voice.mozilla.org/datasets | | diff --git a/egs2/TEMPLATE/asr1/asr.sh b/egs2/TEMPLATE/asr1/asr.sh index f4d7a8ad24a..763aceb7a34 100755 --- a/egs2/TEMPLATE/asr1/asr.sh +++ b/egs2/TEMPLATE/asr1/asr.sh @@ -755,7 +755,7 @@ if ! "${skip_train}"; then log "LM collect-stats started... log: '${_logdir}/stats.*.log'" # NOTE: --*_shape_file doesn't require length information if --batch_type=unsorted, # but it's used only for deciding the sample ids. - # shellcheck disable=SC2086 + # shellcheck disable=SC2046,SC2086 ${train_cmd} JOB=1:"${_nj}" "${_logdir}"/stats.JOB.log \ ${python} -m espnet2.bin.lm_train \ --collect_stats true \ @@ -771,7 +771,7 @@ if ! "${skip_train}"; then --train_shape_file "${_logdir}/train.JOB.scp" \ --valid_shape_file "${_logdir}/dev.JOB.scp" \ --output_dir "${_logdir}/stats.JOB" \ - ${_opts} ${lm_args} || { cat "${_logdir}"/stats.1.log; exit 1; } + ${_opts} ${lm_args} || { cat $(grep -l -i error "${_logdir}"/stats.*.log) ; exit 1; } # 4. Aggregate shape files _opts= @@ -967,7 +967,7 @@ if ! "${skip_train}"; then # NOTE: --*_shape_file doesn't require length information if --batch_type=unsorted, # but it's used only for deciding the sample ids. - # shellcheck disable=SC2086 + # shellcheck disable=SC2046,SC2086 ${train_cmd} JOB=1:"${_nj}" "${_logdir}"/stats.JOB.log \ ${python} -m espnet2.bin.asr_train \ --collect_stats true \ @@ -985,7 +985,7 @@ if ! "${skip_train}"; then --train_shape_file "${_logdir}/train.JOB.scp" \ --valid_shape_file "${_logdir}/valid.JOB.scp" \ --output_dir "${_logdir}/stats.JOB" \ - ${_opts} ${asr_args} || { cat "${_logdir}"/stats.1.log; exit 1; } + ${_opts} ${asr_args} || { cat $(grep -l -i error "${_logdir}"/stats.*.log) ; exit 1; } # 4. Aggregate shape files _opts= @@ -1242,7 +1242,7 @@ if ! "${skip_eval}"; then # 2. Submit decoding jobs log "Decoding started... log: '${_logdir}/asr_inference.*.log'" - # shellcheck disable=SC2086 + # shellcheck disable=SC2046,SC2086 ${_cmd} --gpu "${_ngpu}" JOB=1:"${_nj}" "${_logdir}"/asr_inference.JOB.log \ ${python} -m ${asr_inference_tool} \ --batch_size ${batch_size} \ @@ -1252,7 +1252,7 @@ if ! "${skip_eval}"; then --asr_train_config "${asr_exp}"/config.yaml \ --asr_model_file "${asr_exp}"/"${inference_asr_model}" \ --output_dir "${_logdir}"/output.JOB \ - ${_opts} ${inference_args} + ${_opts} ${inference_args} || { cat $(grep -l -i error "${_logdir}"/asr_inference.*.log) ; exit 1; } # 3. Concatenates the output files from each jobs for f in token token_int score text; do diff --git a/egs2/TEMPLATE/asr1/db.sh b/egs2/TEMPLATE/asr1/db.sh index 3d443b38a84..7cbc4eeb67b 100755 --- a/egs2/TEMPLATE/asr1/db.sh +++ b/egs2/TEMPLATE/asr1/db.sh @@ -23,6 +23,7 @@ REVERB= REVERB_OUT="${PWD}/REVERB" # Output file path CHIME3= CHIME4= +CHIME5= CSJDATATOP= CSJVER=dvd ## Set your CSJ format (dvd or usb). 
## Usage : diff --git a/egs2/TEMPLATE/asr1/pyscripts/audio/format_wav_scp.py b/egs2/TEMPLATE/asr1/pyscripts/audio/format_wav_scp.py index cca465bb93c..06bb01f926b 100755 --- a/egs2/TEMPLATE/asr1/pyscripts/audio/format_wav_scp.py +++ b/egs2/TEMPLATE/asr1/pyscripts/audio/format_wav_scp.py @@ -3,19 +3,19 @@ import logging from io import BytesIO from pathlib import Path -from typing import Tuple, Optional +from typing import Optional, Tuple -import kaldiio import humanfriendly +import kaldiio import numpy as np import resampy import soundfile from tqdm import tqdm from typeguard import check_argument_types -from espnet.utils.cli_utils import get_commandline_args from espnet2.fileio.read_text import read_2column_text from espnet2.fileio.sound_scp import SoundScpWriter +from espnet.utils.cli_utils import get_commandline_args def humanfriendly_or_none(value: str): diff --git a/egs2/TEMPLATE/asr1/pyscripts/utils/convert_text_to_phn.py b/egs2/TEMPLATE/asr1/pyscripts/utils/convert_text_to_phn.py index 21f8f4daf46..052b23ca636 100755 --- a/egs2/TEMPLATE/asr1/pyscripts/utils/convert_text_to_phn.py +++ b/egs2/TEMPLATE/asr1/pyscripts/utils/convert_text_to_phn.py @@ -1,15 +1,16 @@ #!/usr/bin/env python3 -# Copyright 2021 Tomoki Hayashi +# Copyright 2021 Tomoki Hayashi and Gunnar Thor # Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0) """Convert kaldi-style text into phonemized sentences.""" import argparse import codecs +import contextlib -from joblib import delayed -from joblib import Parallel +from joblib import Parallel, delayed, parallel +from tqdm import tqdm from espnet2.text.cleaner import TextCleaner from espnet2.text.phoneme_tokenizer import PhonemeTokenizer @@ -34,13 +35,40 @@ def main(): text = {line.split()[0]: " ".join(line.split()[1:]) for line in lines} if cleaner is not None: text = {k: cleaner(v) for k, v in text.items()} - phns_list = Parallel(n_jobs=args.nj)( - [delayed(phoneme_tokenizer.text2tokens)(sentence) for sentence in text.values()] - ) + with tqdm_joblib(tqdm(total=len(text.values()), desc="Phonemizing")): + phns_list = Parallel(n_jobs=args.nj)( + [ + delayed(phoneme_tokenizer.text2tokens)(sentence) + for sentence in text.values() + ] + ) with codecs.open(args.out_text, "w", encoding="utf8") as g: for utt_id, phns in zip(text.keys(), phns_list): g.write(f"{utt_id} " + " ".join(phns) + "\n") +@contextlib.contextmanager +def tqdm_joblib(tqdm_object): + """Patch joblib to report into tqdm progress bar given as argument. 
+ + Reference: + https://stackoverflow.com/questions/24983493 + + """ + + class TqdmBatchCompletionCallback(parallel.BatchCompletionCallBack): + def __call__(self, *args, **kwargs): + tqdm_object.update(n=self.batch_size) + return super().__call__(*args, **kwargs) + + old_batch_callback = parallel.BatchCompletionCallBack + parallel.BatchCompletionCallBack = TqdmBatchCompletionCallback + try: + yield tqdm_object + finally: + parallel.BatchCompletionCallBack = old_batch_callback + tqdm_object.close() + + if __name__ == "__main__": main() diff --git a/egs2/TEMPLATE/asr1/pyscripts/utils/evaluate_f0.py b/egs2/TEMPLATE/asr1/pyscripts/utils/evaluate_f0.py index e27e57624ee..bc9a3709f99 100755 --- a/egs2/TEMPLATE/asr1/pyscripts/utils/evaluate_f0.py +++ b/egs2/TEMPLATE/asr1/pyscripts/utils/evaluate_f0.py @@ -10,17 +10,13 @@ import logging import multiprocessing as mp import os - -from typing import Dict -from typing import List -from typing import Tuple +from typing import Dict, List, Tuple import librosa import numpy as np import pysptk import pyworld as pw import soundfile as sf - from fastdtw import fastdtw from scipy import spatial diff --git a/egs2/TEMPLATE/asr1/pyscripts/utils/evaluate_mcd.py b/egs2/TEMPLATE/asr1/pyscripts/utils/evaluate_mcd.py index 379438217ea..213dc60b563 100755 --- a/egs2/TEMPLATE/asr1/pyscripts/utils/evaluate_mcd.py +++ b/egs2/TEMPLATE/asr1/pyscripts/utils/evaluate_mcd.py @@ -10,16 +10,12 @@ import logging import multiprocessing as mp import os - -from typing import Dict -from typing import List -from typing import Tuple +from typing import Dict, List, Tuple import librosa import numpy as np import pysptk import soundfile as sf - from fastdtw import fastdtw from scipy import spatial diff --git a/egs2/TEMPLATE/asr1/pyscripts/utils/extract_xvectors.py b/egs2/TEMPLATE/asr1/pyscripts/utils/extract_xvectors.py index e64b82dc515..a58a844be0a 100755 --- a/egs2/TEMPLATE/asr1/pyscripts/utils/extract_xvectors.py +++ b/egs2/TEMPLATE/asr1/pyscripts/utils/extract_xvectors.py @@ -3,14 +3,14 @@ # Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0) import argparse -import kaldiio import logging -from pathlib import Path -import sys -import torch import os -import numpy as np +import sys +from pathlib import Path +import kaldiio +import numpy as np +import torch from tqdm.contrib import tqdm from espnet2.fileio.sound_scp import SoundScpReader diff --git a/egs2/TEMPLATE/asr1/pyscripts/utils/plot_sinc_filters.py b/egs2/TEMPLATE/asr1/pyscripts/utils/plot_sinc_filters.py index 001ba49d34b..56e06d73d51 100755 --- a/egs2/TEMPLATE/asr1/pyscripts/utils/plot_sinc_filters.py +++ b/egs2/TEMPLATE/asr1/pyscripts/utils/plot_sinc_filters.py @@ -12,10 +12,11 @@ """ import argparse +import sys +from pathlib import Path + import matplotlib.pyplot as plt import numpy as np -from pathlib import Path -import sys import torch diff --git a/egs2/TEMPLATE/asr1/pyscripts/utils/rotate_logfile.py b/egs2/TEMPLATE/asr1/pyscripts/utils/rotate_logfile.py new file mode 100755 index 00000000000..aa2818d3a9f --- /dev/null +++ b/egs2/TEMPLATE/asr1/pyscripts/utils/rotate_logfile.py @@ -0,0 +1,59 @@ +#!/usr/bin/env python + +# Copyright 2022 Chaitanya Narisetty +# Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0) + +"""Rotate log-file.""" + +import argparse +import shutil +from pathlib import Path + + +def rotate(path, max_num_log_files=1000): + """Rotate a log-file while retaining past `max_num_log_files` files. 
+ Examples: + /some/path/ + ├──logfile.txt + ├──logfile.1.txt + ├──logfile.2.txt + >>> rotate('/some/path/logfile.txt') + /some/path/ + ├──logfile.1.txt + ├──logfile.2.txt + ├──logfile.3.txt + """ + for i in range(max_num_log_files - 1, -1, -1): + if i == 0: + p = Path(path) + pn = p.parent / (p.stem + ".1" + p.suffix) + else: + _p = Path(path) + p = _p.parent / (_p.stem + f".{i}" + _p.suffix) + pn = _p.parent / (_p.stem + f".{i + 1}" + _p.suffix) + + if p.exists(): + if i == max_num_log_files - 1: + p.unlink() + else: + shutil.move(p, pn) + + +def main(): + parser = argparse.ArgumentParser() + parser.add_argument( + "log_filepath", type=str, help="Path to log-file to be rotated." + ) + parser.add_argument( + "--max-num-log-files", + type=int, + help="Maximum number of log-files to be kept.", + default=1000, + ) + args = parser.parse_args() + + rotate(args.log_filepath, args.max_num_log_files) + + +if __name__ == "__main__": + main() diff --git a/egs2/TEMPLATE/asr1/pyscripts/utils/score_intent.py b/egs2/TEMPLATE/asr1/pyscripts/utils/score_intent.py index 4f0f074c9db..ccfba96010d 100755 --- a/egs2/TEMPLATE/asr1/pyscripts/utils/score_intent.py +++ b/egs2/TEMPLATE/asr1/pyscripts/utils/score_intent.py @@ -5,11 +5,12 @@ # Apache 2.0 +import argparse import os import re import sys + import pandas as pd -import argparse def get_classification_result(hyp_file, ref_file, hyp_write, ref_write): diff --git a/egs2/TEMPLATE/asr1/pyscripts/utils/score_summarization.py b/egs2/TEMPLATE/asr1/pyscripts/utils/score_summarization.py index 35202f1ce88..781ecebfd12 100644 --- a/egs2/TEMPLATE/asr1/pyscripts/utils/score_summarization.py +++ b/egs2/TEMPLATE/asr1/pyscripts/utils/score_summarization.py @@ -1,10 +1,9 @@ -import sys import os -from datasets import load_metric -import numpy as np -from nlgeval import compute_metrics -from nlgeval import NLGEval +import sys +import numpy as np +from datasets import load_metric +from nlgeval import NLGEval, compute_metrics ref_file = sys.argv[1] hyp_file = sys.argv[2] diff --git a/egs2/TEMPLATE/asr1/scripts/utils/evaluate_asr.sh b/egs2/TEMPLATE/asr1/scripts/utils/evaluate_asr.sh index 7d3da2bfbea..0cc2c632591 100755 --- a/egs2/TEMPLATE/asr1/scripts/utils/evaluate_asr.sh +++ b/egs2/TEMPLATE/asr1/scripts/utils/evaluate_asr.sh @@ -173,14 +173,14 @@ if [ ${stage} -le 2 ] && [ ${stop_stage} -ge 2 ]; then # 2. Submit decoding jobs log "Decoding started... log: '${logdir}/asr_inference.*.log'" - # shellcheck disable=SC2086 + # shellcheck disable=SC2046,SC2086 ${_cmd} --gpu "${_ngpu}" JOB=1:"${_nj}" "${logdir}"/asr_inference.JOB.log \ python3 -m espnet2.bin.asr_inference \ --ngpu "${_ngpu}" \ --data_path_and_name_and_type "${wavscp},speech,sound" \ --key_file "${logdir}"/keys.JOB.scp \ --output_dir "${logdir}"/output.JOB \ - "${_opts[@]}" ${inference_args} + "${_opts[@]}" ${inference_args} || { cat $(grep -l -i error "${logdir}"/asr_inference.*.log) ; exit 1; } # 3. Concatenates the output files from each jobs for f in token token_int score text; do diff --git a/egs2/TEMPLATE/asr1/scripts/utils/show_translation_result.sh b/egs2/TEMPLATE/asr1/scripts/utils/show_translation_result.sh index c1c1bdf0882..125bc6e8910 100755 --- a/egs2/TEMPLATE/asr1/scripts/utils/show_translation_result.sh +++ b/egs2/TEMPLATE/asr1/scripts/utils/show_translation_result.sh @@ -1,6 +1,6 @@ #!/usr/bin/env bash mindepth=0 -maxdepth=3 +maxdepth=1 case=tc . 
utils/parse_options.sh @@ -44,24 +44,27 @@ cat << EOF EOF +# only show BLEU score for now metrics="bleu" - while IFS= read -r expdir; do if ls "${expdir}"/*/*/score_*/result.${case}.txt &> /dev/null; then echo "## $(basename ${expdir})" - for type in $metrics; do - cat << EOF + for type in ${metrics}; do + cat << EOF + ### ${type^^} -|dataset|bleu_score|verbose_score| +|dataset|score|verbose_score| |---|---|---| EOF - data=$(echo "${expdir}"/*/*/score_*/result.${case}.txt | cut -d '/' -f4) - bleu=$(sed -n '5p' "${expdir}"/*/*/score_*/result.${case}.txt | cut -d ' ' -f 3 | tr -d ',') - verbose=$(sed -n '7p' "${expdir}"/*/*/score_*/result.${case}.txt | cut -d ' ' -f 3- | tr -d '",') - echo "${data}|${bleu}|${verbose}" + for result in "${expdir}"/*/*/score_"${type}"/result."${case}".txt; do + inference_tag=$(echo "${result}" | rev | cut -d/ -f4 | rev) + test_set=$(echo "${result}" | rev | cut -d/ -f3 | rev) + score=$(sed -n '5p' "${result}" | cut -d ' ' -f 3 | tr -d ',') + verbose=$(sed -n '7p' "${result}" | cut -d ' ' -f 3- | tr -d '",') + echo "|${inference_tag}/${test_set}|${score}|${verbose}|" + done done fi - done < <(find ${exp} -mindepth ${mindepth} -maxdepth ${maxdepth} -type d) diff --git a/egs2/TEMPLATE/diar1/diar.sh b/egs2/TEMPLATE/diar1/diar.sh index 815c73537f4..b711d324eab 100755 --- a/egs2/TEMPLATE/diar1/diar.sh +++ b/egs2/TEMPLATE/diar1/diar.sh @@ -348,7 +348,7 @@ if ! "${skip_train}"; then # NOTE: --*_shape_file doesn't require length information if --batch_type=unsorted, # but it's used only for deciding the sample ids. - # shellcheck disable=SC2086 + # shellcheck disable=SC2046,SC2086 ${train_cmd} JOB=1:"${_nj}" "${_logdir}"/stats.JOB.log \ ${python} -m espnet2.bin.diar_train \ --collect_stats true \ @@ -360,7 +360,7 @@ if ! "${skip_train}"; then --train_shape_file "${_logdir}/train.JOB.scp" \ --valid_shape_file "${_logdir}/valid.JOB.scp" \ --output_dir "${_logdir}/stats.JOB" \ - ${_opts} ${diar_args} || { cat "${_logdir}"/stats.1.log; exit 1; } + ${_opts} ${diar_args} || { cat $(grep -l -i error "${_logdir}"/stats.*.log) ; exit 1; } # 4. Aggregate shape files _opts= @@ -510,7 +510,7 @@ if ! "${skip_eval}"; then # 2. Submit inference jobs log "Diarization started... log: '${_logdir}/diar_inference.*.log'" - # shellcheck disable=SC2086 + # shellcheck disable=SC2046,SC2086 ${_cmd} --gpu "${_ngpu}" JOB=1:"${_nj}" "${_logdir}"/diar_inference.JOB.log \ ${python} -m espnet2.bin.diar_inference \ --ngpu "${_ngpu}" \ @@ -520,7 +520,7 @@ if ! "${skip_eval}"; then --train_config "${diar_exp}"/config.yaml \ --model_file "${diar_exp}"/"${inference_model}" \ --output_dir "${_logdir}"/output.JOB \ - ${_opts} + ${_opts} || { cat $(grep -l -i error "${_logdir}"/diar_inference.*.log) ; exit 1; } # 3. 
Concatenates the output files from each jobs for i in $(seq "${_nj}"); do diff --git a/egs2/TEMPLATE/diar1/pyscripts/utils/convert_rttm.py b/egs2/TEMPLATE/diar1/pyscripts/utils/convert_rttm.py index d5d4b257b36..e3e1047d7bb 100755 --- a/egs2/TEMPLATE/diar1/pyscripts/utils/convert_rttm.py +++ b/egs2/TEMPLATE/diar1/pyscripts/utils/convert_rttm.py @@ -1,19 +1,20 @@ #!/usr/bin/env python3 +import argparse import collections.abc -import humanfriendly +import logging +import os +import re from pathlib import Path from typing import Union -import argparse -import logging +import humanfriendly import numpy as np -import re -import os import soundfile -from espnet2.utils.types import str_or_int from typeguard import check_argument_types +from espnet2.utils.types import str_or_int + def convert_rttm_text( path: Union[Path, str], diff --git a/egs2/TEMPLATE/diar1/pyscripts/utils/make_rttm.py b/egs2/TEMPLATE/diar1/pyscripts/utils/make_rttm.py index f8b9c8c05af..1f08fce0060 100755 --- a/egs2/TEMPLATE/diar1/pyscripts/utils/make_rttm.py +++ b/egs2/TEMPLATE/diar1/pyscripts/utils/make_rttm.py @@ -5,11 +5,13 @@ # Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0) import argparse -from espnet2.fileio.npy_scp import NpyScpReader import logging + +import humanfriendly import numpy as np from scipy.signal import medfilt -import humanfriendly + +from espnet2.fileio.npy_scp import NpyScpReader def get_parser() -> argparse.Namespace: diff --git a/egs2/TEMPLATE/enh1/enh.sh b/egs2/TEMPLATE/enh1/enh.sh index db170043db6..864a0485df0 100755 --- a/egs2/TEMPLATE/enh1/enh.sh +++ b/egs2/TEMPLATE/enh1/enh.sh @@ -494,7 +494,7 @@ if ! "${skip_train}"; then # but it's used only for deciding the sample ids. - # shellcheck disable=SC2086 + # shellcheck disable=SC2046,SC2086 ${train_cmd} JOB=1:"${_nj}" "${_logdir}"/stats.JOB.log \ ${python} -m espnet2.bin.enh_train \ --collect_stats true \ @@ -504,7 +504,7 @@ if ! "${skip_train}"; then --train_shape_file "${_logdir}/train.JOB.scp" \ --valid_shape_file "${_logdir}/valid.JOB.scp" \ --output_dir "${_logdir}/stats.JOB" \ - ${_opts} ${enh_args} || { cat "${_logdir}"/stats.1.log; exit 1; } + ${_opts} ${enh_args} || { cat $(grep -l -i error "${_logdir}"/stats.*.log) ; exit 1; } # 4. Aggregate shape files _opts= @@ -652,7 +652,7 @@ if ! "${skip_eval}"; then # 2. Submit inference jobs log "Enhancement started... log: '${_logdir}/enh_inference.*.log'" - # shellcheck disable=SC2086 + # shellcheck disable=SC2046,SC2086 ${_cmd} --gpu "${_ngpu}" JOB=1:"${_nj}" "${_logdir}"/enh_inference.JOB.log \ ${python} -m espnet2.bin.enh_inference \ --ngpu "${_ngpu}" \ @@ -663,7 +663,7 @@ if ! "${skip_eval}"; then ${inference_enh_config:+--inference_config "$inference_enh_config"} \ --model_file "${enh_exp}"/"${inference_model}" \ --output_dir "${_logdir}"/output.JOB \ - ${_opts} ${inference_args} + ${_opts} ${inference_args} || { cat $(grep -l -i error "${_logdir}"/enh_inference.*.log) ; exit 1; } _spk_list=" " diff --git a/egs2/TEMPLATE/enh_asr1/enh_asr.sh b/egs2/TEMPLATE/enh_asr1/enh_asr.sh index fc720ddf94b..9ec09219613 100755 --- a/egs2/TEMPLATE/enh_asr1/enh_asr.sh +++ b/egs2/TEMPLATE/enh_asr1/enh_asr.sh @@ -794,7 +794,7 @@ if ! "${skip_train}"; then log "LM collect-stats started... log: '${_logdir}/stats.*.log'" # NOTE: --*_shape_file doesn't require length information if --batch_type=unsorted, # but it's used only for deciding the sample ids. 
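The failure handler swapped in below is the same one applied to asr.sh, diar.sh, and enh.sh above: instead of unconditionally printing the first job's log (stats.1.log), the script now greps every per-job log and prints only the ones that actually contain an error. SC2046 is newly silenced because the unquoted $(...) is intended to word-split into one filename per matching log. A standalone sketch of the pattern, where run_jobs is a hypothetical stand-in for the ${train_cmd} JOB=1:N launcher:

    # On failure, dump only the per-job logs that actually report an error.
    run_jobs || { cat $(grep -l -i error "${_logdir}"/stats.*.log); exit 1; }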
- # shellcheck disable=SC2086 + # shellcheck disable=SC2046,SC2086 ${train_cmd} JOB=1:"${_nj}" "${_logdir}"/stats.JOB.log \ ${python} -m espnet2.bin.lm_train \ --collect_stats true \ @@ -810,7 +810,7 @@ if ! "${skip_train}"; then --train_shape_file "${_logdir}/train.JOB.scp" \ --valid_shape_file "${_logdir}/dev.JOB.scp" \ --output_dir "${_logdir}/stats.JOB" \ - ${_opts} ${lm_args} || { cat "${_logdir}"/stats.1.log; exit 1; } + ${_opts} ${lm_args} || { cat $(grep -l -i error "${_logdir}"/stats.*.log) ; exit 1; } # 4. Aggregate shape files _opts= @@ -937,7 +937,7 @@ if ! "${skip_train}"; then if "${use_ngram}"; then log "Stage 9: Ngram Training: train_set=${data_feats}/lm_train.txt" cut -f 2 -d " " ${data_feats}/lm_train.txt | lmplz -S "20%" --discount_fallback -o ${ngram_num} - >${ngram_exp}/${ngram_num}gram.arpa - build_binary -s ${ngram_exp}/${ngram_num}gram.arpa ${ngram_exp}/${ngram_num}gram.bin + build_binary -s ${ngram_exp}/${ngram_num}gram.arpa ${ngram_exp}/${ngram_num}gram.bin else log "Stage 9: Skip ngram stages: use_ngram=${use_ngram}" fi @@ -1335,7 +1335,7 @@ if ! "${skip_eval}"; then # 2. Submit inference jobs log "Enhancement started... log: '${_logdir}/enh_inference.*.log'" - # shellcheck disable=SC2086 + # shellcheck disable=SC2046,SC2086 ${_cmd} --gpu "${_ngpu}" JOB=1:"${_nj}" "${_logdir}"/enh_inference.JOB.log \ ${python} -m espnet2.bin.enh_inference \ --enh_s2t_task true \ @@ -1347,7 +1347,7 @@ if ! "${skip_eval}"; then ${inference_enh_config:+--inference_config "$inference_enh_config"} \ --model_file "${enh_asr_exp}"/"${inference_enh_asr_model}" \ --output_dir "${_logdir}"/output.JOB \ - ${_opts} ${enh_inference_args} + ${_opts} ${enh_inference_args} || { cat $(grep -l -i error "${_logdir}"/enh_inference.*.log) ; exit 1; } # 3. Concatenates the output files from each jobs _spk_list=" " @@ -1632,7 +1632,7 @@ if ! "${skip_upload_hf}"; then # Generate description file # shellcheck disable=SC2034 hf_task=speech-enhancement-recognition - # shellcheck disable=SC2034 + # shellcheck disable=SC2034 espnet_task=EnhS2T # shellcheck disable=SC2034 task_exp=${enh_asr_exp} diff --git a/egs2/TEMPLATE/enh_asr1/scripts/utils/show_enh_score.sh b/egs2/TEMPLATE/enh_asr1/scripts/utils/show_enh_score.sh deleted file mode 120000 index 6d6490d3760..00000000000 --- a/egs2/TEMPLATE/enh_asr1/scripts/utils/show_enh_score.sh +++ /dev/null @@ -1 +0,0 @@ -../../../enh1/scripts/utils/show_enh_score.sh \ No newline at end of file diff --git a/egs2/TEMPLATE/enh_asr1/scripts/utils/show_enh_score.sh b/egs2/TEMPLATE/enh_asr1/scripts/utils/show_enh_score.sh new file mode 100755 index 00000000000..e135d73f91f --- /dev/null +++ b/egs2/TEMPLATE/enh_asr1/scripts/utils/show_enh_score.sh @@ -0,0 +1,84 @@ +#!/usr/bin/env bash +mindepth=0 +maxdepth=1 + +. utils/parse_options.sh + +if [ $# -gt 1 ]; then + echo "Usage: $0 --mindepth 0 --maxdepth 1 [exp]" 1>&2 + echo "" + echo "Show the system environments and the evaluation results in Markdown format." + echo 'The default of <exp> is "exp/".' + exit 1 +fi + +[ -f ./path.sh ] && .
./path.sh +set -euo pipefail +if [ $# -eq 1 ]; then + exp=$(realpath "$1") +else + exp=exp +fi + + +cat << EOF + +# RESULTS +## Environments +- date: \`$(LC_ALL=C date)\` +EOF + +python3 << EOF +import sys, espnet, torch +pyversion = sys.version.replace('\n', ' ') + +print(f"""- python version: \`{pyversion}\` +- espnet version: \`espnet {espnet.__version__}\` +- pytorch version: \`pytorch {torch.__version__}\`""") +EOF + +cat << EOF +- Git hash: \`$(git rev-parse HEAD)\` + - Commit date: \`$(git log -1 --format='%cd')\` + +EOF + + +while IFS= read -r expdir; do + if ls "${expdir}"/*/scoring_enh/result_stoi.txt &> /dev/null; then + echo -e "\n## $(basename ${expdir})\n" + [ -e "${expdir}"/config.yaml ] && grep ^config "${expdir}"/config.yaml + metrics=() + heading="\n|dataset|" + sep="|---|" + for type in pesq estoi stoi sar sdr sir si_snr; do + if ls "${expdir}"/*/scoring_enh/result_${type}.txt &> /dev/null; then + metrics+=("$type") + heading+="${type^^}|" + sep+="---|" + fi + done + echo -e "${heading}\n${sep}" + + setnames=() + for dirname in "${expdir}"/*/scoring_enh/result_stoi.txt; do + dset=$(echo $dirname | sed -e "s#${expdir}/\([^/]*\)/scoring_enh/result_stoi.txt#\1#g") + setnames+=("$dset") + done + for dset in "${setnames[@]}"; do + line="|${dset}|" + for ((i=0; i<${#metrics[@]}; i++)); do + type=${metrics[$i]} + if [ -f "${expdir}"/${dset}/scoring_enh/result_${type}.txt ]; then + score=$(head -n1 "${expdir}"/${dset}/scoring_enh/result_${type}.txt) + else + score="" + fi + line+="${score}|" + done + echo $line + done + echo "" + fi + +done < <(find ${exp} -mindepth ${mindepth} -maxdepth ${maxdepth} -type d) diff --git a/egs2/TEMPLATE/enh_st1/enh_st.sh b/egs2/TEMPLATE/enh_st1/enh_st.sh index eabf49cc29d..b27f986e582 100755 --- a/egs2/TEMPLATE/enh_st1/enh_st.sh +++ b/egs2/TEMPLATE/enh_st1/enh_st.sh @@ -551,7 +551,7 @@ if ! "${skip_data_prep}"; then done utils/combine_data.sh --extra_files "${utt_extra_files} ${_scp_list}" "data/${train_set}_sp" ${_dirs} for extra_file in ${utt_extra_files}; do - python pyscripts/utils/remove_duplicate_keys.py data/"${train_set}_sp"/${extra_file} > data/"${train_set}_sp"/${extra_file}.tmp + python pyscripts/utils/remove_duplicate_keys.py data/"${train_set}_sp"/${extra_file} > data/"${train_set}_sp"/${extra_file}.tmp mv data/"${train_set}_sp"/${extra_file}.tmp data/"${train_set}_sp"/${extra_file} done else @@ -593,7 +593,7 @@ if ! "${skip_data_prep}"; then fi cp ${single_file} "${data_feats}${_suf}/${dset}" expand_utt_extra_files="${expand_utt_extra_files} $(basename ${single_file})" - done + done done echo "${expand_utt_extra_files}" utils/fix_data_dir.sh --utt_extra_files "${expand_utt_extra_files}" "${data_feats}${_suf}/${dset}" @@ -727,9 +727,9 @@ if ! "${skip_data_prep}"; then utils/fix_data_dir.sh --utt_extra_files "${utt_extra_files}" "${data_feats}/${dset}" for utt_extra_file in ${utt_extra_files}; do python pyscripts/utils/remove_duplicate_keys.py ${data_feats}/${dset}/${utt_extra_file} \ - > ${data_feats}/${dset}/${utt_extra_file}.tmp + > ${data_feats}/${dset}/${utt_extra_file}.tmp mv ${data_feats}/${dset}/${utt_extra_file}.tmp ${data_feats}/${dset}/${utt_extra_file} - done + done done # shellcheck disable=SC2002 @@ -934,7 +934,7 @@ if ! "${skip_train}"; then log "LM collect-stats started... log: '${_logdir}/stats.*.log'" # NOTE: --*_shape_file doesn't require length information if --batch_type=unsorted, # but it's used only for deciding the sample ids. 
- # shellcheck disable=SC2086 + # shellcheck disable=SC2046,SC2086 ${train_cmd} JOB=1:"${_nj}" "${_logdir}"/stats.JOB.log \ ${python} -m espnet2.bin.lm_train \ --collect_stats true \ @@ -950,7 +950,7 @@ if ! "${skip_train}"; then --train_shape_file "${_logdir}/train.JOB.scp" \ --valid_shape_file "${_logdir}/dev.JOB.scp" \ --output_dir "${_logdir}/stats.JOB" \ - ${_opts} ${lm_args} || { cat "${_logdir}"/stats.1.log; exit 1; } + ${_opts} ${lm_args} || { cat $(grep -l -i error "${_logdir}"/stats.*.log) ; exit 1; } # 4. Aggregate shape files _opts= @@ -1078,7 +1078,7 @@ if ! "${skip_train}"; then if "${use_ngram}"; then log "Stage 9: Ngram Training: train_set=${data_feats}/lm_train.txt" cut -f 2 -d " " ${data_feats}/lm_train.txt | lmplz -S "20%" --discount_fallback -o ${ngram_num} - >${ngram_exp}/${ngram_num}gram.arpa - build_binary -s ${ngram_exp}/${ngram_num}gram.arpa ${ngram_exp}/${ngram_num}gram.bin + build_binary -s ${ngram_exp}/${ngram_num}gram.arpa ${ngram_exp}/${ngram_num}gram.bin else log "Stage 9: Skip ngram stages: use_ngram=${use_ngram}" fi @@ -1148,7 +1148,7 @@ if ! "${skip_train}"; then # but it's used only for deciding the sample ids. # TODO(jiatong): fix different bpe model - # shellcheck disable=SC2086 + # shellcheck disable=SC2046,SC2086 ${train_cmd} JOB=1:"${_nj}" "${_logdir}"/stats.JOB.log \ ${python} -m espnet2.bin.enh_s2t_train \ --collect_stats true \ @@ -1173,7 +1173,7 @@ if ! "${skip_train}"; then --train_shape_file "${_logdir}/train.JOB.scp" \ --valid_shape_file "${_logdir}/valid.JOB.scp" \ --output_dir "${_logdir}/stats.JOB" \ - ${_opts} ${enh_st_args} || { cat "${_logdir}"/stats.1.log; exit 1; } + ${_opts} ${enh_st_args} || { cat $(grep -l -i error "${_logdir}"/stats.*.log) ; exit 1; } # 4. Aggregate shape files _opts= @@ -1436,7 +1436,7 @@ if ! "${skip_eval}"; then # 2. Submit decoding jobs log "Decoding started... log: '${_logdir}/st_inference.*.log'" - # shellcheck disable=SC2086 + # shellcheck disable=SC2046,SC2086 ${_cmd} --gpu "${_ngpu}" JOB=1:"${_nj}" "${_logdir}"/st_inference.JOB.log \ ${python} -m ${st_inference_tool} \ --enh_s2t_task true \ @@ -1447,7 +1447,7 @@ if ! "${skip_eval}"; then --st_train_config "${enh_st_exp}"/config.yaml \ --st_model_file "${enh_st_exp}"/"${inference_enh_st_model}" \ --output_dir "${_logdir}"/output.JOB \ - ${_opts} ${st_inference_args} + ${_opts} ${st_inference_args} || { cat $(grep -l -i error "${_logdir}"/st_inference.*.log) ; exit 1; } # 3. Concatenates the output files from each jobs for f in token token_int score text; do @@ -1773,11 +1773,11 @@ if ! "${skip_upload_hf}"; then gitlfs=$(git lfs --version 2> /dev/null || true) [ -z "${gitlfs}" ] && \ log "ERROR: You need to install git-lfs first" && \ - exit 1 - + exit 1 + dir_repo=${expdir}/hf_${hf_repo//"/"/"_"} [ ! -d "${dir_repo}" ] && git clone https://huggingface.co/${hf_repo} ${dir_repo} - + if command -v git &> /dev/null; then _creator_name="$(git config user.name)" _checkout="git checkout $(git show -s --format=%H)" @@ -1790,13 +1790,13 @@ if ! 
"${skip_upload_hf}"; then # foo/asr1 -> foo _corpus="${_task%/*}" _model_name="${_creator_name}/${_corpus}_$(basename ${packed_model} .zip)" - + # copy files in ${dir_repo} unzip -o ${packed_model} -d ${dir_repo} # Generate description file # shellcheck disable=SC2034 hf_task=speech-enhancement-translation - # shellcheck disable=SC2034 + # shellcheck disable=SC2034 espnet_task=EnhS2T # shellcheck disable=SC2034 task_exp=${enh_st_exp} diff --git a/egs2/TEMPLATE/mt1/mt.sh b/egs2/TEMPLATE/mt1/mt.sh index 587b4ebf534..02260cb3a4d 100755 --- a/egs2/TEMPLATE/mt1/mt.sh +++ b/egs2/TEMPLATE/mt1/mt.sh @@ -455,7 +455,7 @@ if ! "${skip_data_prep}"; then log "Stage 1: Data preparation for data/${train_set}, data/${valid_set}, etc." # [Task dependent] Need to create data.sh for new corpus local/data.sh ${local_data_opts} - + fi if [ ${stage} -le 2 ] && [ ${stop_stage} -ge 2 ]; then @@ -474,7 +474,7 @@ if ! "${skip_data_prep}"; then # with regex to suuport multi-references for single_file in $(ls data/"${dset}"/${extra_file}*); do cp ${single_file} "${data_feats}${_suf}/${dset}" - done + done done echo "${feats_type}" > "${data_feats}${_suf}/${dset}/feats_type" done @@ -702,7 +702,7 @@ if ! "${skip_train}"; then log "LM collect-stats started... log: '${_logdir}/stats.*.log'" # NOTE: --*_shape_file doesn't require length information if --batch_type=unsorted, # but it's used only for deciding the sample ids. - # shellcheck disable=SC2086 + # shellcheck disable=SC2046,SC2086 ${train_cmd} JOB=1:"${_nj}" "${_logdir}"/stats.JOB.log \ ${python} -m espnet2.bin.lm_train \ --collect_stats true \ @@ -718,7 +718,7 @@ if ! "${skip_train}"; then --train_shape_file "${_logdir}/train.JOB.scp" \ --valid_shape_file "${_logdir}/dev.JOB.scp" \ --output_dir "${_logdir}/stats.JOB" \ - ${_opts} ${lm_args} || { cat "${_logdir}"/stats.1.log; exit 1; } + ${_opts} ${lm_args} || { cat $(grep -l -i error "${_logdir}"/stats.*.log) ; exit 1; } # 4. Aggregate shape files _opts= @@ -845,7 +845,7 @@ if ! "${skip_train}"; then if "${use_ngram}"; then log "Stage 8: Ngram Training: train_set=${data_feats}/lm_train.txt" cut -f 2 -d " " ${data_feats}/lm_train.txt | lmplz -S "20%" --discount_fallback -o ${ngram_num} - >${ngram_exp}/${ngram_num}gram.arpa - build_binary -s ${ngram_exp}/${ngram_num}gram.arpa ${ngram_exp}/${ngram_num}gram.bin + build_binary -s ${ngram_exp}/${ngram_num}gram.arpa ${ngram_exp}/${ngram_num}gram.bin else log "Stage 8: Skip ngram stages: use_ngram=${use_ngram}" fi @@ -1132,7 +1132,7 @@ if ! "${skip_eval}"; then # 2. Submit decoding jobs log "Decoding started... log: '${_logdir}/mt_inference.*.log'" - # shellcheck disable=SC2086 + # shellcheck disable=SC2046,SC2086 ${_cmd} --gpu "${_ngpu}" JOB=1:"${_nj}" "${_logdir}"/mt_inference.JOB.log \ ${python} -m ${mt_inference_tool} \ --batch_size ${batch_size} \ @@ -1142,7 +1142,7 @@ if ! "${skip_eval}"; then --mt_train_config "${mt_exp}"/config.yaml \ --mt_model_file "${mt_exp}"/"${inference_mt_model}" \ --output_dir "${_logdir}"/output.JOB \ - ${_opts} ${inference_args} + ${_opts} ${inference_args} || { cat $(grep -l -i error "${_logdir}"/mt_inference.*.log) ; exit 1; } # 3. Concatenates the output files from each jobs for f in token token_int score text; do @@ -1205,7 +1205,7 @@ if ! 
"${skip_eval}"; then # ) \ # <(<"${_data}/text.${tgt_case}.${tgt_lang}" awk '{ print "(" $2 "-" $1 ")" }') \ # >"${_scoredir}/hyp.trn.org" - + # remove utterance id #perl -pe 's/\([^\)]+\)//g;' "${_scoredir}/ref.trn.org" > "${_scoredir}/ref.trn" #perl -pe 's/\([^\)]+\)//g;' "${_scoredir}/hyp.trn.org" > "${_scoredir}/hyp.trn" @@ -1215,19 +1215,19 @@ if ! "${skip_eval}"; then detokenizer.perl -l ${tgt_lang} -q < "${_scoredir}/hyp.trn" > "${_scoredir}/hyp.trn.detok" if [ ${tgt_case} = "tc" ]; then - echo "Case sensitive BLEU result (single-reference)" >> ${_scoredir}/result.tc.txt + echo "Case sensitive BLEU result (single-reference)" > ${_scoredir}/result.tc.txt sacrebleu "${_scoredir}/ref.trn.detok" \ -i "${_scoredir}/hyp.trn.detok" \ -m bleu chrf ter \ >> ${_scoredir}/result.tc.txt - + log "Write a case-sensitive BLEU (single-reference) result in ${_scoredir}/result.tc.txt" fi # detokenize & remove punctuation except apostrophe remove_punctuation.pl < "${_scoredir}/ref.trn.detok" > "${_scoredir}/ref.trn.detok.lc.rm" remove_punctuation.pl < "${_scoredir}/hyp.trn.detok" > "${_scoredir}/hyp.trn.detok.lc.rm" - echo "Case insensitive BLEU result (single-reference)" >> ${_scoredir}/result.lc.txt + echo "Case insensitive BLEU result (single-reference)" > ${_scoredir}/result.lc.txt sacrebleu -lc "${_scoredir}/ref.trn.detok.lc.rm" \ -i "${_scoredir}/hyp.trn.detok.lc.rm" \ -m bleu chrf ter \ @@ -1252,8 +1252,8 @@ if ! "${skip_eval}"; then ) \ <(<"${_data}/text.${tgt_case}.${tgt_lang}" awk '{ print "(" $2 "-" $1 ")" }') \ >"${_scoredir}/ref.trn.org.${ref_idx}" - - # + + # perl -pe 's/\([^\)]+\)//g;' "${_scoredir}/ref.trn.org.${ref_idx}" > "${_scoredir}/ref.trn.${ref_idx}" detokenizer.perl -l ${tgt_lang} -q < "${_scoredir}/ref.trn.${ref_idx}" > "${_scoredir}/ref.trn.detok.${ref_idx}" remove_punctuation.pl < "${_scoredir}/ref.trn.detok.${ref_idx}" > "${_scoredir}/ref.trn.detok.lc.rm.${ref_idx}" @@ -1279,7 +1279,7 @@ if ! "${skip_eval}"; then # Show results in Markdown syntax scripts/utils/show_translation_result.sh --case $tgt_case "${mt_exp}" > "${mt_exp}"/RESULTS.md - cat "${cat_exp}"/RESULTS.md + cat "${mt_exp}"/RESULTS.md fi else log "Skip the evaluation stages" @@ -1386,11 +1386,11 @@ if ! "${skip_upload_hf}"; then gitlfs=$(git lfs --version 2> /dev/null || true) [ -z "${gitlfs}" ] && \ log "ERROR: You need to install git-lfs first" && \ - exit 1 - + exit 1 + dir_repo=${expdir}/hf_${hf_repo//"/"/"_"} [ ! -d "${dir_repo}" ] && git clone https://huggingface.co/${hf_repo} ${dir_repo} - + if command -v git &> /dev/null; then _creator_name="$(git config user.name)" _checkout="git checkout $(git show -s --format=%H)" @@ -1403,13 +1403,13 @@ if ! 
"${skip_upload_hf}"; then # foo/asr1 -> foo _corpus="${_task%/*}" _model_name="${_creator_name}/${_corpus}_$(basename ${packed_model} .zip)" - + # copy files in ${dir_repo} unzip -o ${packed_model} -d ${dir_repo} # Generate description file # shellcheck disable=SC2034 hf_task=machine-translation - # shellcheck disable=SC2034 + # shellcheck disable=SC2034 espnet_task=MT # shellcheck disable=SC2034 task_exp=${mt_exp} diff --git a/egs2/TEMPLATE/ssl1/hubert.sh b/egs2/TEMPLATE/ssl1/hubert.sh index 8a6f7590cb8..027b6636782 100755 --- a/egs2/TEMPLATE/ssl1/hubert.sh +++ b/egs2/TEMPLATE/ssl1/hubert.sh @@ -143,7 +143,7 @@ Options: # Pretrain related --pretrain_configs # configration files of pretraining stage --n_clusters # number of k-means clusters of pretraining stage - --features_km # feature for k-means clustering of pretraining stage + --features_km # feature for k-means clustering of pretraining stage --pt_args # Arguments for hubert model pretraining (default="${pt_args}"). # e.g., --pt_args "--max_epoch 10" # Note that it will overwrite args in pt config. @@ -180,7 +180,7 @@ fi [ -z "${valid_set}" ] && { log "${help_message}"; log "Error: --valid_set is required"; exit 2; }; # Check pretrain_config, n_clusters and feature list -pretrain_config_list=(${pretrain_configs// / }) +pretrain_config_list=(${pretrain_configs// / }) n_clusters_list=(${n_clusters// / }) feature_list=(${features_km// / }) if ! [ ${pretrain_start_iter} -le ${pretrain_stop_iter} ]; then @@ -227,7 +227,7 @@ fi if [ ${stage} -le 3 ] && [ ${stop_stage} -ge 3 ]; then if [ "${feats_type}" = raw ]; then log "Stage 3: Format wav.scp: data/ -> ${data_feats}" - + # ====== Recreating "wav.scp" ====== # Kaldi-wav.scp, which can describe the file path with unix-pipe, like "cat /some/path |", # shouldn't be used in training process. @@ -235,7 +235,7 @@ if [ ${stage} -le 3 ] && [ ${stop_stage} -ge 3 ]; then # and it can also change the audio-format and sampling rate. # If nothing is need, then format_wav_scp.sh does nothing: # i.e. the input file format and rate is same as the output. 
- + for dset in "${train_set}" "${valid_set}"; do _suf="/org" utils/copy_data_dir.sh --validate_opts --non-print data/"${dset}" "${data_feats}${_suf}/${dset}" @@ -253,7 +253,7 @@ if [ ${stage} -le 3 ] && [ ${stop_stage} -ge 3 ]; then scripts/audio/format_wav_scp.sh --nj "${nj}" --cmd "${train_cmd}" \ --audio-format "${audio_format}" --fs "${fs}" ${_opts} \ "data/${dset}/wav.scp" "${data_feats}${_suf}/${dset}" - + echo "${feats_type}" > "${data_feats}${_suf}/${dset}/feats_type" done else @@ -265,21 +265,21 @@ fi if [ ${stage} -le 4 ] && [ ${stop_stage} -ge 4 ]; then log "Stage 4: Remove long/short data: ${data_feats}/org -> ${data_feats}" - + # NOTE(kamo): Not applying to test_sets to keep original data for dset in "${train_set}" "${valid_set}"; do - + # Copy data dir utils/copy_data_dir.sh --validate_opts --non-print "${data_feats}/org/${dset}" "${data_feats}/${dset}" cp "${data_feats}/org/${dset}/feats_type" "${data_feats}/${dset}/feats_type" - + # Remove short utterances _feats_type="$(<${data_feats}/${dset}/feats_type)" if [ "${_feats_type}" = raw ]; then _fs=$(python3 -c "import humanfriendly as h;print(h.parse_size('${fs}'))") _min_length=$(python3 -c "print(int(${min_wav_duration} * ${_fs}))") _max_length=$(python3 -c "print(int(${max_wav_duration} * ${_fs}))") - + # utt2num_samples is created by format_wav_scp.sh <"${data_feats}/org/${dset}/utt2num_samples" \ awk -v min_length="${_min_length}" -v max_length="${_max_length}" \ @@ -291,11 +291,11 @@ if [ ${stage} -le 4 ] && [ ${stop_stage} -ge 4 ]; then else log "Error: not supported: --feats_type ${feats_type}" fi - + # Remove empty text <"${data_feats}/org/${dset}/text" \ awk ' { if( NF != 1 ) print $0; } ' >"${data_feats}/${dset}/text" - + # fix_data_dir.sh leaves only utts which exist in all files utils/fix_data_dir.sh "${data_feats}/${dset}" done @@ -303,7 +303,7 @@ fi if [ ${stage} -le 7 ] && [ ${stop_stage} -ge 5 ]; then - + for ((iter=${pretrain_start_iter}; iter<=${pretrain_stop_iter};iter++)); do asr_config="${pretrain_config_list[${iter}]}" if [ "${lang}" != noinfo ]; then @@ -311,25 +311,25 @@ if [ ${stage} -le 7 ] && [ ${stop_stage} -ge 5 ]; then else asr_stats_dir="${expdir}/pretrain_iter${iter}_stats_${feats_type}" fi - + if [ -n "${asr_config}" ]; then asr_tag="$(basename "${asr_config}" .yaml)_${feats_type}" else asr_tag="train_${feats_type}" fi - + asr_exp="${expdir}/pretrain_${asr_tag}_iter${iter}" - + train_set_plabel=$(eval "echo ${train_set}_\${feature_list[${iter}]}_km\${n_clusters_list[${iter}]}") valid_set_plabel=$(eval "echo ${valid_set}_\${feature_list[${iter}]}_km\${n_clusters_list[${iter}]}") - + feats_km="${feature_list[${iter}]}" n_clusters="${n_clusters_list[${iter}]}" dictdir="./data/${feats_km}_km${n_clusters}_token_list_iter${iter}/${token_type}" - + if [ ${stage} -le 5 ] && [ ${stop_stage} -ge 5 ]; then log "Stage 5.iter${iter}: Running ${n_clusters} cluster K-means on ${feats_km} feature." 
- + if [ ${iter} -eq 0 ] || [ ${feats_km} == "mfcc" ]; then ./scripts/km.sh \ --train_set "${train_set}" \ @@ -354,21 +354,21 @@ if [ ${stage} -le 7 ] && [ ${stop_stage} -ge 5 ]; then --hubert_dir_path "${expdir}/pretrained_model_iter$((iter-1))"/valid.acc.best.pth fi fi - + if [ ${stage} -le 6 ] && [ ${stop_stage} -ge 6 ]; then _asr_train_dir="${data_feats}/${train_set_plabel}" _asr_valid_dir="${data_feats}/${valid_set_plabel}" - + log "Stage 6.iter${iter}: ${feats_km} pretrain model collect stats: \ train_set=${_asr_train_dir}, valid_set=${_asr_valid_dir}" - + _opts= if [ -n "${asr_config}" ]; then # To generate the config file: e.g. # % python3 -m espnet2.bin.asr_train --print_config --optim adam _opts+="--config ${asr_config} " fi - + _feats_type="$(<${_asr_train_dir}/feats_type)" if [ "${_feats_type}" = raw ]; then _scp=wav.scp @@ -385,14 +385,14 @@ if [ ${stage} -le 7 ] && [ ${stop_stage} -ge 5 ]; then _input_size="$(<${_asr_train_dir}/feats_dim)" _opts+="--input_size=${_input_size} " fi - + # 1. Split the key file _logdir="${asr_stats_dir}/logdir" mkdir -p "${_logdir}" - + # Get the minimum number among ${nj} and the number lines of input files _nj=$(min "${nj}" "$(<${_asr_train_dir}/${_scp} wc -l)" "$(<${_asr_valid_dir}/${_scp} wc -l)") - + key_file="${_asr_train_dir}/${_scp}" split_scps="" for n in $(seq "${_nj}"); do @@ -400,7 +400,7 @@ if [ ${stage} -le 7 ] && [ ${stop_stage} -ge 5 ]; then done # shellcheck disable=SC2086 utils/split_scp.pl "${key_file}" ${split_scps} - + key_file="${_asr_valid_dir}/${_scp}" split_scps="" for n in $(seq "${_nj}"); do @@ -408,18 +408,18 @@ if [ ${stage} -le 7 ] && [ ${stop_stage} -ge 5 ]; then done # shellcheck disable=SC2086 utils/split_scp.pl "${key_file}" ${split_scps} - + # 2. Generate run.sh log "Generate '${asr_stats_dir}/run.sh'. You can resume the process from stage 5.iter${iter} using this script" mkdir -p "${asr_stats_dir}"; echo "${run_args} --stage 6 \"\$@\"; exit \$?" > "${asr_stats_dir}/run.sh"; chmod +x "${asr_stats_dir}/run.sh" - + # 3. Submit jobs log "Hubert pretraining collect-stats started... log: '${_logdir}/stats.*.log'" - + # NOTE: --*_shape_file doesn't require length information if --batch_type=unsorted, # but it's used only for deciding the sample ids. - - # shellcheck disable=SC2086 + + # shellcheck disable=SC2046,SC2086 ${train_cmd} JOB=1:"${_nj}" "${_logdir}"/stats.JOB.log \ ${python} -m espnet2.bin.hubert_train \ --collect_stats true \ @@ -439,8 +439,8 @@ if [ ${stage} -le 7 ] && [ ${stop_stage} -ge 5 ]; then --valid_shape_file "${_logdir}/valid.JOB.scp" \ --output_dir "${_logdir}/stats.JOB" \ --hubert_dict "${dictdir}/dict.txt" \ - ${_opts} ${pt_args} || { cat "${_logdir}"/stats.1.log; exit 1; } - + ${_opts} ${pt_args} || { cat $(grep -l -i error "${_logdir}"/stats.*.log) ; exit 1; } + + # 4. Aggregate shape files _opts= for i in $(seq "${_nj}"); do @@ -448,30 +448,30 @@ if [ ${stage} -le 7 ] && [ ${stop_stage} -ge 5 ]; then done # shellcheck disable=SC2086 ${python} -m espnet2.bin.aggregate_stats_dirs ${_opts} --output_dir "${asr_stats_dir}" - + # Append the num-tokens at the last dimensions.
This is used for batch-bins count <"${asr_stats_dir}/train/text_shape" \ awk -v N="$(<${dictdir}/tokens.txt wc -l)" '{ print $0 "," N }' \ >"${asr_stats_dir}/train/text_shape.${token_type}" - + <"${asr_stats_dir}/valid/text_shape" \ awk -v N="$(<${dictdir}/tokens.txt wc -l)" '{ print $0 "," N }' \ >"${asr_stats_dir}/valid/text_shape.${token_type}" fi - + if [ ${stage} -le 7 ] && [ ${stop_stage} -ge 7 ]; then _asr_train_dir="${data_feats}/${train_set_plabel}" _asr_valid_dir="${data_feats}/${valid_set_plabel}" - + log "Stage 7.iter${iter}: Hubert Pretraining: train_set=${_asr_train_dir}, valid_set=${_asr_valid_dir}" - + _opts= if [ -n "${asr_config}" ]; then # To generate the config file: e.g. # % python3 -m espnet2.bin.hubert_train --print_config --optim adam _opts+="--config ${asr_config} " fi - + _feats_type="$(<${_asr_train_dir}/feats_type)" if [ "${_feats_type}" = raw ]; then _scp=wav.scp @@ -488,14 +488,14 @@ if [ ${stage} -le 7 ] && [ ${stop_stage} -ge 5 ]; then _type=kaldi_ark _fold_length="${asr_speech_fold_length}" _input_size="$(<${_asr_train_dir}/feats_dim)" - _opts+="--input_size=${_input_size} " + _opts+="--input_size=${_input_size} " fi - + if [ "${num_splits_asr}" -gt 1 ]; then # If you met a memory error when parsing text files, this option may help you. # The corpus is split into subsets and each subset is used for training one by one in order, # so the memory footprint can be limited to the memory required for each dataset. - + _split_dir="${asr_stats_dir}/splits${num_splits_asr}" if [ ! -f "${_split_dir}/.done" ]; then rm -f "${_split_dir}/.done" @@ -511,23 +511,23 @@ if [ ${stage} -le 7 ] && [ ${stop_stage} -ge 5 ]; then else log "${_split_dir}/.done exists. Spliting is skipped" fi - + _opts+="--train_data_path_and_name_and_type ${_split_dir}/${_scp},speech,${_type} " _opts+="--train_data_path_and_name_and_type ${_split_dir}/text,text,text " _opts+="--train_shape_file ${_split_dir}/speech_shape " _opts+="--train_shape_file ${_split_dir}/text_shape.${token_type} " _opts+="--multiple_iterator true " - + else _opts+="--train_data_path_and_name_and_type ${_asr_train_dir}/${_scp},speech,${_type} " _opts+="--train_data_path_and_name_and_type ${_asr_train_dir}/text,text,text " _opts+="--train_shape_file ${asr_stats_dir}/train/speech_shape " _opts+="--train_shape_file ${asr_stats_dir}/train/text_shape.${token_type} " fi - + log "Generate '${asr_exp}/run.sh'. You can resume the process from stage 6 using this script" mkdir -p "${asr_exp}"; echo "${run_args} --stage 7 \"\$@\"; exit \$?" > "${asr_exp}/run.sh"; chmod +x "${asr_exp}/run.sh" - + # NOTE(kamo): --fold_length is used only if --batch_type=folded and it's ignored in the other case log "Hubert pretraining started... log: '${asr_exp}/train.log'" if echo "${cuda_cmd}" | grep -e queue.pl -e queue-freegpu.pl &> /dev/null; then @@ -536,7 +536,7 @@ if [ ${stage} -le 7 ] && [ ${stop_stage} -ge 5 ]; then else jobname="${asr_exp}/train.log" fi - + # shellcheck disable=SC2086 ${python} -m espnet2.bin.launch \ --cmd "${cuda_cmd} --name ${jobname}" \ @@ -564,19 +564,19 @@ if [ ${stage} -le 7 ] && [ ${stop_stage} -ge 5 ]; then --output_dir "${asr_exp}" \ --hubert_dict "${dictdir}/dict.txt" \ ${_opts} ${pt_args} - + if [ "${iter}" -ge 0 ]; then log "Create a symbolic link of the pretrained model" if [ -L "${expdir}/pretrained_model_iter${iter}" ]; then log "Symbolic link ${expdir}/pretrained_model_iter${iter} already exists, remove it." rm "${expdir}/pretrained_model_iter${iter}" fi - + if ! 
[ -z "${asr_exp}" ]; then ln -s "../${asr_exp}" "${expdir}/pretrained_model_iter${iter}" fi fi - + log "Model saved in: ${asr_exp}" else log "Skip the pretraining stages" diff --git a/egs2/TEMPLATE/ssl1/pyscripts/dump_km_label.py b/egs2/TEMPLATE/ssl1/pyscripts/dump_km_label.py index 552c84f89ad..5880675c531 100644 --- a/egs2/TEMPLATE/ssl1/pyscripts/dump_km_label.py +++ b/egs2/TEMPLATE/ssl1/pyscripts/dump_km_label.py @@ -1,16 +1,14 @@ import argparse import logging import os +import pdb import sys -import numpy as np - import joblib +import numpy as np import torch import tqdm -import pdb - -from sklearn_km import MfccFeatureReader, get_path_iterator, HubertFeatureReader +from sklearn_km import HubertFeatureReader, MfccFeatureReader, get_path_iterator logging.basicConfig( level=logging.DEBUG, diff --git a/egs2/TEMPLATE/ssl1/pyscripts/feature_loader.py b/egs2/TEMPLATE/ssl1/pyscripts/feature_loader.py index b0dae8a2074..16fdd8c58f2 100644 --- a/egs2/TEMPLATE/ssl1/pyscripts/feature_loader.py +++ b/egs2/TEMPLATE/ssl1/pyscripts/feature_loader.py @@ -7,14 +7,13 @@ # Paper: https://arxiv.org/pdf/2106.07447.pdf # Code in Fairseq: https://github.com/pytorch/fairseq/tree/master/examples/hubert -"""Extract MFCC & intermediate embedding from the Hubert model for k-means clustering.""" +"""Extract MFCC & intermediate embedding from the Hubert model for k-means clustering""" import logging import os import sys import fairseq - import soundfile as sf import torch import torchaudio diff --git a/egs2/TEMPLATE/ssl1/pyscripts/sklearn_km.py b/egs2/TEMPLATE/ssl1/pyscripts/sklearn_km.py index ce0c82fcd3c..d97e9df26c1 100644 --- a/egs2/TEMPLATE/ssl1/pyscripts/sklearn_km.py +++ b/egs2/TEMPLATE/ssl1/pyscripts/sklearn_km.py @@ -8,28 +8,24 @@ import argparse import logging +import math import os import sys -from random import sample import warnings +from random import sample +import fairseq import joblib import numpy as np -import math - import soundfile as sf import torch import torchaudio import tqdm - +from feature_loader import HubertFeatureReader, MfccFeatureReader from sklearn.cluster import MiniBatchKMeans -import fairseq from espnet2.asr.encoder.hubert_encoder import FairseqHubertEncoder -from feature_loader import MfccFeatureReader -from feature_loader import HubertFeatureReader - logging.basicConfig( level=logging.DEBUG, format="%(asctime)s (%(module)s:%(lineno)d) %(levelname)s: %(message)s", diff --git a/egs2/TEMPLATE/st1/st.sh b/egs2/TEMPLATE/st1/st.sh index 895667e1525..b37cd3c5f22 100755 --- a/egs2/TEMPLATE/st1/st.sh +++ b/egs2/TEMPLATE/st1/st.sh @@ -111,6 +111,7 @@ hf_repo= # Decoding related use_k2=false # Whether to use k2 based decoder +use_streaming=false # Whether to use streaming decoding batch_size=1 inference_tag= # Suffix to the result dir for decoding. inference_config= # Config for decoding. @@ -138,7 +139,6 @@ lm_test_text= # Text file path of language model evaluation set. nlsyms_txt=none # Non-linguistic symbol list if existing. cleaner=none # Text cleaner. g2p=none # g2p method (needed if token_type=phn). -lang=noinfo # The language type of corpus. score_opts= # The options given to sclite scoring local_score_opts= # The options given to local/score.sh. st_speech_fold_length=800 # fold_length for speech data during ST training. @@ -249,7 +249,6 @@ Options: --nlsyms_txt # Non-linguistic symbol list if existing (default="${nlsyms_txt}"). --cleaner # Text cleaner (default="${cleaner}"). --g2p # g2p method (default="${g2p}"). - --lang # The language type of corpus (default=${lang}). 
--score_opts # The options given to sclite scoring (default="${score_opts}"). --local_score_opts # The options given to local/score.sh (default="${local_score_opts}"). --st_speech_fold_length # fold_length for speech data during ST training (default="${st_speech_fold_length}"). @@ -306,11 +305,7 @@ utt_extra_files="text.${src_case}.${src_lang} text.${tgt_case}.${tgt_lang}" [ -z "${lm_test_text}" ] && lm_test_text="${data_feats}/${test_sets%% *}/text.${tgt_case}.${tgt_lang}" # Check tokenization type -if [ "${lang}" != noinfo ]; then - token_listdir=data/${lang}_token_list -else - token_listdir=data/token_list -fi +token_listdir=data/${src_lang}_${tgt_lang}_token_list # The tgt bpedir is set for all cases when using bpe tgt_bpedir="${token_listdir}/tgt_bpe_${tgt_bpemode}${tgt_nbpe}" tgt_bpeprefix="${tgt_bpedir}"/bpe @@ -385,11 +380,7 @@ if [ -z "${st_tag}" ]; then else st_tag="train_${feats_type}" fi - if [ "${lang}" != noinfo ]; then - st_tag+="_${lang}_${tgt_token_type}_${tgt_case}" - else - st_tag+="_${tgt_token_type}_${tgt_case}" - fi + st_tag+="_${src_lang}_${tgt_lang}_${tgt_token_type}_${tgt_case}" if [ "${tgt_token_type}" = bpe ]; then st_tag+="${tgt_nbpe}" fi @@ -407,11 +398,7 @@ if [ -z "${lm_tag}" ]; then else lm_tag="train" fi - if [ "${lang}" != noinfo ]; then - lm_tag+="_${lang}_${lm_token_type}" - else - lm_tag+="_${lm_token_type}" - fi + lm_tag+="_${src_lang}_${tgt_lang}_${lm_token_type}" if [ "${lm_token_type}" = bpe ]; then lm_tag+="${tgt_nbpe}" fi @@ -423,11 +410,7 @@ fi # The directory used for collect-stats mode if [ -z "${st_stats_dir}" ]; then - if [ "${lang}" != noinfo ]; then - st_stats_dir="${expdir}/st_stats_${feats_type}_${lang}_${tgt_token_type}" - else - st_stats_dir="${expdir}/st_stats_${feats_type}_${tgt_token_type}" - fi + st_stats_dir="${expdir}/st_stats_${feats_type}_${src_lang}_${tgt_lang}_${tgt_token_type}" if [ "${tgt_token_type}" = bpe ]; then st_stats_dir+="${tgt_nbpe}" fi @@ -436,11 +419,7 @@ if [ -z "${st_stats_dir}" ]; then fi fi if [ -z "${lm_stats_dir}" ]; then - if [ "${lang}" != noinfo ]; then - lm_stats_dir="${expdir}/lm_stats_${lang}_${lm_token_type}" - else - lm_stats_dir="${expdir}/lm_stats_${lm_token_type}" - fi + lm_stats_dir="${expdir}/lm_stats_${src_lang}_${tgt_lang}_${lm_token_type}" if [ "${lm_token_type}" = bpe ]; then lm_stats_dir+="${tgt_nbpe}" fi @@ -504,9 +483,9 @@ if ! "${skip_data_prep}"; then done utils/combine_data.sh --extra_files "${utt_extra_files}" "data/${train_set}_sp" ${_dirs} for extra_file in ${utt_extra_files}; do - python pyscripts/utils/remove_duplicate_keys.py data/"${train_set}_sp"/${extra_file} > data/"${train_set}_sp"/${extra_file}.tmp + python pyscripts/utils/remove_duplicate_keys.py data/"${train_set}_sp"/${extra_file} > data/"${train_set}_sp"/${extra_file}.tmp mv data/"${train_set}_sp"/${extra_file}.tmp data/"${train_set}_sp"/${extra_file} - done + done else log "Skip stage 2: Speed perturbation" fi @@ -539,11 +518,11 @@ if ! "${skip_data_prep}"; then # expand the utt_extra_files for multi-references expand_utt_extra_files="" for extra_file in ${utt_extra_files}; do - # with regex to suuport multi-references + # with regex to support multi-references for single_file in $(ls data/"${dset}"/${extra_file}*); do cp ${single_file} "${data_feats}${_suf}/${dset}" expand_utt_extra_files="${expand_utt_extra_files} $(basename ${single_file})" - done + done done echo "${expand_utt_extra_files}" utils/fix_data_dir.sh --utt_extra_files "${expand_utt_extra_files}" "${data_feats}${_suf}/${dset}" @@ -584,11 +563,11 @@ if !
"${skip_data_prep}"; then # expand the utt_extra_files for multi-references expand_utt_extra_files="" for extra_file in ${utt_extra_files}; do - # with regex to suuport multi-references + # with regex to support multi-references for single_file in $(ls data/"${dset}"/${extra_file}*); do cp ${single_file} "${data_feats}${_suf}/${dset}" expand_utt_extra_files="${expand_utt_extra_files} $(basename ${single_file})" - done + done done for extra_file in ${expand_utt_extra_files}; do LC_ALL=C sort -u -k1,1 "${data_feats}${_suf}/${dset}/${extra_file}" -o "${data_feats}${_suf}/${dset}/${extra_file}" @@ -633,11 +612,11 @@ if ! "${skip_data_prep}"; then # expand the utt_extra_files for multi-references expand_utt_extra_files="" for extra_file in ${utt_extra_files}; do - # with regex to suuport multi-references + # with regex to support multi-references for single_file in $(ls data/"${dset}"/${extra_file}*); do cp ${single_file} "${data_feats}${_suf}/${dset}" expand_utt_extra_files="${expand_utt_extra_files} $(basename ${single_file})" - done + done done utils/fix_data_dir.sh --utt_extra_files "${expand_utt_extra_files}*" "${data_feats}${_suf}/${dset}" for extra_file in ${expand_utt_extra_files}; do @@ -716,20 +695,23 @@ if ! "${skip_data_prep}"; then fi # Remove empty text - <"${data_feats}/org/${dset}/text" \ - awk ' { if( NF != 1 ) print $0; } ' >"${data_feats}/${dset}/text" + for utt_extra_file in ${utt_extra_files}; do + <"${data_feats}/org/${dset}/${utt_extra_file}" \ + awk ' { if( NF != 1 ) print $0; } ' > "${data_feats}/${dset}/${utt_extra_file}" + done # fix_data_dir.sh leaves only utts which exist in all files utils/fix_data_dir.sh --utt_extra_files "${utt_extra_files}" "${data_feats}/${dset}" for utt_extra_file in ${utt_extra_files}; do python pyscripts/utils/remove_duplicate_keys.py ${data_feats}/${dset}/${utt_extra_file} \ - > ${data_feats}/${dset}/${utt_extra_file}.tmp + > ${data_feats}/${dset}/${utt_extra_file}.tmp mv ${data_feats}/${dset}/${utt_extra_file}.tmp ${data_feats}/${dset}/${utt_extra_file} - done + done done # shellcheck disable=SC2002 - cat ${lm_train_text} | awk ' { if( NF != 1 ) print $0; } ' > "${data_feats}/lm_train.txt" + cat ${lm_train_text} | awk ' { if( NF != 1 ) print $0; } ' \ + > "${data_feats}/lm_train.${src_lang}.${tgt_case}.${tgt_lang}.txt" fi if [ ${stage} -le 5 ] && [ ${stop_stage} -ge 5 ]; then @@ -802,10 +784,10 @@ if ! "${skip_data_prep}"; then # Create word-list for word-LM training if ${use_word_lm} && [ "${tgt_token_type}" != word ]; then - log "Generate word level token_list from ${data_feats}/lm_train.txt" + log "Generate word level token_list from ${data_feats}/lm_train.${src_lang}.${tgt_case}.${tgt_lang}.txt" ${python} -m espnet2.bin.tokenize_text \ --token_type word \ - --input "${data_feats}/lm_train.txt" --output "${lm_token_list}" \ + --input "${data_feats}/lm_train.${src_lang}.${tgt_case}.${tgt_lang}.txt" --output "${lm_token_list}" \ --field 2- \ --cleaner "${cleaner}" \ --g2p "${g2p}" \ @@ -891,7 +873,7 @@ fi if ! "${skip_train}"; then if "${use_lm}"; then if [ ${stage} -le 6 ] && [ ${stop_stage} -ge 6 ]; then - log "Stage 6: LM collect stats: train_set=${data_feats}/lm_train.txt, dev_set=${lm_dev_text}" + log "Stage 6: LM collect stats: train_set=${data_feats}/lm_train.${src_lang}.${tgt_case}.${tgt_lang}.txt, dev_set=${lm_dev_text}" _opts= if [ -n "${lm_config}" ]; then @@ -904,9 +886,9 @@ if ! 
"${skip_train}"; then _logdir="${lm_stats_dir}/logdir" mkdir -p "${_logdir}" # Get the minimum number among ${nj} and the number lines of input files - _nj=$(min "${nj}" "$(<${data_feats}/lm_train.txt wc -l)" "$(<${lm_dev_text} wc -l)") + _nj=$(min "${nj}" "$(<${data_feats}/lm_train.${src_lang}.${tgt_case}.${tgt_lang}.txt wc -l)" "$(<${lm_dev_text} wc -l)") - key_file="${data_feats}/lm_train.txt" + key_file="${data_feats}/lm_train.${src_lang}.${tgt_case}.${tgt_lang}.txt" split_scps="" for n in $(seq ${_nj}); do split_scps+=" ${_logdir}/train.${n}.scp" @@ -930,7 +912,7 @@ if ! "${skip_train}"; then log "LM collect-stats started... log: '${_logdir}/stats.*.log'" # NOTE: --*_shape_file doesn't require length information if --batch_type=unsorted, # but it's used only for deciding the sample ids. - # shellcheck disable=SC2086 + # shellcheck disable=SC2046,SC2086 ${train_cmd} JOB=1:"${_nj}" "${_logdir}"/stats.JOB.log \ ${python} -m espnet2.bin.lm_train \ --collect_stats true \ @@ -941,12 +923,12 @@ if ! "${skip_train}"; then --non_linguistic_symbols "${nlsyms_txt}" \ --cleaner "${cleaner}" \ --g2p "${g2p}" \ - --train_data_path_and_name_and_type "${data_feats}/lm_train.txt,text,text" \ + --train_data_path_and_name_and_type "${data_feats}/lm_train.${src_lang}.${tgt_case}.${tgt_lang}.txt,text,text" \ --valid_data_path_and_name_and_type "${lm_dev_text},text,text" \ --train_shape_file "${_logdir}/train.JOB.scp" \ --valid_shape_file "${_logdir}/dev.JOB.scp" \ --output_dir "${_logdir}/stats.JOB" \ - ${_opts} ${lm_args} || { cat "${_logdir}"/stats.1.log; exit 1; } + ${_opts} ${lm_args} || { cat $(grep -l -i error "${_logdir}"/stats.*.log) ; exit 1; } # 4. Aggregate shape files _opts= @@ -968,7 +950,7 @@ if ! "${skip_train}"; then if [ ${stage} -le 7 ] && [ ${stop_stage} -ge 7 ]; then - log "Stage 7: LM Training: train_set=${data_feats}/lm_train.txt, dev_set=${lm_dev_text}" + log "Stage 7: LM Training: train_set=${data_feats}/lm_train.${src_lang}.${tgt_case}.${tgt_lang}.txt, dev_set=${lm_dev_text}" _opts= if [ -n "${lm_config}" ]; then @@ -986,7 +968,7 @@ if ! "${skip_train}"; then if [ ! -f "${_split_dir}/.done" ]; then rm -f "${_split_dir}/.done" ${python} -m espnet2.bin.split_scps \ - --scps "${data_feats}/lm_train.txt" "${lm_stats_dir}/train/text_shape.${lm_token_type}" \ + --scps "${data_feats}/lm_train.${src_lang}.${tgt_case}.${tgt_lang}.txt" "${lm_stats_dir}/train/text_shape.${lm_token_type}" \ --num_splits "${num_splits_lm}" \ --output_dir "${_split_dir}" touch "${_split_dir}/.done" @@ -994,12 +976,12 @@ if ! "${skip_train}"; then log "${_split_dir}/.done exists. Spliting is skipped" fi - _opts+="--train_data_path_and_name_and_type ${_split_dir}/lm_train.txt,text,text " + _opts+="--train_data_path_and_name_and_type ${_split_dir}/lm_train.${src_lang}.${tgt_case}.${tgt_lang}.txt,text,text " _opts+="--train_shape_file ${_split_dir}/text_shape.${lm_token_type} " _opts+="--multiple_iterator true " else - _opts+="--train_data_path_and_name_and_type ${data_feats}/lm_train.txt,text,text " + _opts+="--train_data_path_and_name_and_type ${data_feats}/lm_train.${src_lang}.${tgt_case}.${tgt_lang}.txt,text,text " _opts+="--train_shape_file ${lm_stats_dir}/train/text_shape.${lm_token_type} " fi @@ -1072,9 +1054,9 @@ if ! 
"${skip_train}"; then fi if [ ${stage} -le 9 ] && [ ${stop_stage} -ge 9 ]; then if "${use_ngram}"; then - log "Stage 9: Ngram Training: train_set=${data_feats}/lm_train.txt" - cut -f 2 -d " " ${data_feats}/lm_train.txt | lmplz -S "20%" --discount_fallback -o ${ngram_num} - >${ngram_exp}/${ngram_num}gram.arpa - build_binary -s ${ngram_exp}/${ngram_num}gram.arpa ${ngram_exp}/${ngram_num}gram.bin + log "Stage 9: Ngram Training: train_set=${data_feats}/lm_train.${src_lang}.${tgt_case}.${tgt_lang}.txt" + cut -f 2 -d " " ${data_feats}/lm_train.${src_lang}.${tgt_case}.${tgt_lang}.txt | lmplz -S "20%" --discount_fallback -o ${ngram_num} - >${ngram_exp}/${ngram_num}gram.arpa + build_binary -s ${ngram_exp}/${ngram_num}gram.arpa ${ngram_exp}/${ngram_num}gram.bin else log "Stage 9: Skip ngram stages: use_ngram=${use_ngram}" fi @@ -1412,7 +1394,11 @@ if ! "${skip_eval}"; then key_file=${_data}/${_scp} split_scps="" _nj=$(min "${inference_nj}" "$(<${key_file} wc -l)") - st_inference_tool="espnet2.bin.st_inference" + if "${use_streaming}"; then + st_inference_tool="espnet2.bin.st_inference_streaming" + else + st_inference_tool="espnet2.bin.st_inference" + fi for n in $(seq "${_nj}"); do split_scps+=" ${_logdir}/keys.${n}.scp" @@ -1422,7 +1408,7 @@ if ! "${skip_eval}"; then # 2. Submit decoding jobs log "Decoding started... log: '${_logdir}/st_inference.*.log'" - # shellcheck disable=SC2086 + # shellcheck disable=SC2046,SC2086 ${_cmd} --gpu "${_ngpu}" JOB=1:"${_nj}" "${_logdir}"/st_inference.JOB.log \ ${python} -m ${st_inference_tool} \ --batch_size ${batch_size} \ @@ -1432,7 +1418,7 @@ if ! "${skip_eval}"; then --st_train_config "${st_exp}"/config.yaml \ --st_model_file "${st_exp}"/"${inference_st_model}" \ --output_dir "${_logdir}"/output.JOB \ - ${_opts} ${inference_args} + ${_opts} ${inference_args} || { cat $(grep -l -i error "${_logdir}"/st_inference.*.log) ; exit 1; } # 3. Concatenates the output files from each jobs for f in token token_int score text; do @@ -1478,29 +1464,35 @@ if ! 
"${skip_eval}"; then ) \ <(<"${_data}/utt2spk" awk '{ print "(" $2 "-" $1 ")" }') \ >"${_scoredir}/hyp.trn.org" - + # remove utterance id - perl -pe 's/\([^\)]+\)//g;' "${_scoredir}/ref.trn.org" > "${_scoredir}/ref.trn" - perl -pe 's/\([^\)]+\)//g;' "${_scoredir}/hyp.trn.org" > "${_scoredir}/hyp.trn" + perl -pe 's/\([^\)]+\)$//g;' "${_scoredir}/ref.trn.org" > "${_scoredir}/ref.trn" + perl -pe 's/\([^\)]+\)$//g;' "${_scoredir}/hyp.trn.org" > "${_scoredir}/hyp.trn" # detokenizer detokenizer.perl -l ${tgt_lang} -q < "${_scoredir}/ref.trn" > "${_scoredir}/ref.trn.detok" detokenizer.perl -l ${tgt_lang} -q < "${_scoredir}/hyp.trn" > "${_scoredir}/hyp.trn.detok" + # rotate result files + if [ ${tgt_case} = "tc" ]; then + pyscripts/utils/rotate_logfile.py ${_scoredir}/result.tc.txt + fi + pyscripts/utils/rotate_logfile.py ${_scoredir}/result.lc.txt + if [ ${tgt_case} = "tc" ]; then - echo "Case sensitive BLEU result (single-reference)" >> ${_scoredir}/result.tc.txt + echo "Case sensitive BLEU result (single-reference)" > ${_scoredir}/result.tc.txt sacrebleu "${_scoredir}/ref.trn.detok" \ -i "${_scoredir}/hyp.trn.detok" \ -m bleu chrf ter \ >> ${_scoredir}/result.tc.txt - + log "Write a case-sensitive BLEU (single-reference) result in ${_scoredir}/result.tc.txt" fi # detokenize & remove punctuation except apostrophe remove_punctuation.pl < "${_scoredir}/ref.trn.detok" > "${_scoredir}/ref.trn.detok.lc.rm" remove_punctuation.pl < "${_scoredir}/hyp.trn.detok" > "${_scoredir}/hyp.trn.detok.lc.rm" - echo "Case insensitive BLEU result (single-reference)" >> ${_scoredir}/result.lc.txt + echo "Case insensitive BLEU result (single-reference)" > ${_scoredir}/result.lc.txt sacrebleu -lc "${_scoredir}/ref.trn.detok.lc.rm" \ -i "${_scoredir}/hyp.trn.detok.lc.rm" \ -m bleu chrf ter \ @@ -1525,9 +1517,9 @@ if ! "${skip_eval}"; then ) \ <(<"${_data}/utt2spk" awk '{ print "(" $2 "-" $1 ")" }') \ >"${_scoredir}/ref.trn.org.${ref_idx}" - - # - perl -pe 's/\([^\)]+\)//g;' "${_scoredir}/ref.trn.org.${ref_idx}" > "${_scoredir}/ref.trn.${ref_idx}" + + # remove utterance id + perl -pe 's/\([^\)]+\)$//g;' "${_scoredir}/ref.trn.org.${ref_idx}" > "${_scoredir}/ref.trn.${ref_idx}" detokenizer.perl -l ${tgt_lang} -q < "${_scoredir}/ref.trn.${ref_idx}" > "${_scoredir}/ref.trn.detok.${ref_idx}" remove_punctuation.pl < "${_scoredir}/ref.trn.detok.${ref_idx}" > "${_scoredir}/ref.trn.detok.lc.rm.${ref_idx}" case_sensitive_refs="${case_sensitive_refs} ${_scoredir}/ref.trn.detok.${ref_idx}" @@ -1552,7 +1544,7 @@ if ! "${skip_eval}"; then # Show results in Markdown syntax scripts/utils/show_translation_result.sh --case $tgt_case "${st_exp}" > "${st_exp}"/RESULTS.md - cat "${cat_exp}"/RESULTS.md + cat "${st_exp}"/RESULTS.md fi else log "Skip the evaluation stages" @@ -1641,7 +1633,7 @@ EOF # shellcheck disable=SC2086 espnet_model_zoo_upload \ --file "${packed_model}" \ - --title "ESPnet2 pretrained model, ${_model_name}, fs=${fs}, lang=${lang}" \ + --title "ESPnet2 pretrained model, ${_model_name}, fs=${fs}, lang=${src_lang}_${tgt_lang}" \ --description_file "${st_exp}"/description \ --creator_name "${_creator_name}" \ --license "CC-BY-4.0" \ @@ -1662,11 +1654,11 @@ if ! "${skip_upload_hf}"; then gitlfs=$(git lfs --version 2> /dev/null || true) [ -z "${gitlfs}" ] && \ log "ERROR: You need to install git-lfs first" && \ - exit 1 - + exit 1 + dir_repo=${expdir}/hf_${hf_repo//"/"/"_"} [ ! 
-d "${dir_repo}" ] && git clone https://huggingface.co/${hf_repo} ${dir_repo} - + if command -v git &> /dev/null; then _creator_name="$(git config user.name)" _checkout="git checkout $(git show -s --format=%H)" @@ -1679,13 +1671,13 @@ if ! "${skip_upload_hf}"; then # foo/asr1 -> foo _corpus="${_task%/*}" _model_name="${_creator_name}/${_corpus}_$(basename ${packed_model} .zip)" - + # copy files in ${dir_repo} unzip -o ${packed_model} -d ${dir_repo} # Generate description file # shellcheck disable=SC2034 hf_task=speech-translation - # shellcheck disable=SC2034 + # shellcheck disable=SC2034 espnet_task=ST # shellcheck disable=SC2034 task_exp=${st_exp} diff --git a/egs2/TEMPLATE/tts1/README.md b/egs2/TEMPLATE/tts1/README.md index a94a6cd5913..f7d3258f497 100644 --- a/egs2/TEMPLATE/tts1/README.md +++ b/egs2/TEMPLATE/tts1/README.md @@ -726,6 +726,7 @@ You can train the following models by changing `*.yaml` config for `--train_conf - [FastSpeech2](https://arxiv.org/abs/2006.04558) ([FastPitch](https://arxiv.org/abs/2006.06873)) - [Conformer](https://arxiv.org/abs/2005.08100)-based FastSpeech / FastSpeech2 - [VITS](https://arxiv.org/abs/2106.06103) +- [JETS](https://arxiv.org/abs/2203.16852) You can find example configs of the above models in [`egs2/ljspeech/tts1/conf/tuning`](../../ljspeech/tts1/conf/tuning). @@ -742,6 +743,11 @@ You can find example configs of the above models in: - [`egs2/vctk/tts1/conf/tuning`](../../vctk/tts1/conf/tuning). - [`egs2/libritts/tts1/conf/tuning`](../../vctk/libritts/conf/tuning). +And now we support other toolkit's xvector. +Please check the following options. + +https://github.com/espnet/espnet/blob/df053b8c13c26fe289fc882751801fd781e9d43e/egs2/TEMPLATE/tts1/tts.sh#L69-L71 + ## FAQ ### ESPnet1 model is compatible with ESPnet2? diff --git a/egs2/TEMPLATE/tts1/tts.sh b/egs2/TEMPLATE/tts1/tts.sh index 0bd2e0debb8..13a3aaf2d5d 100755 --- a/egs2/TEMPLATE/tts1/tts.sh +++ b/egs2/TEMPLATE/tts1/tts.sh @@ -644,7 +644,7 @@ if ! "${skip_train}"; then # 3. Submit jobs log "TTS collect_stats started... log: '${_logdir}/stats.*.log'" - # shellcheck disable=SC2086 + # shellcheck disable=SC2046,SC2086 ${train_cmd} JOB=1:"${_nj}" "${_logdir}"/stats.JOB.log \ ${python} -m "espnet2.bin.${tts_task}_train" \ --collect_stats true \ @@ -665,7 +665,7 @@ if ! "${skip_train}"; then --train_shape_file "${_logdir}/train.JOB.scp" \ --valid_shape_file "${_logdir}/valid.JOB.scp" \ --output_dir "${_logdir}/stats.JOB" \ - ${_opts} ${train_args} || { cat "${_logdir}"/stats.1.log; exit 1; } + ${_opts} ${train_args} || { cat $(grep -l -i error "${_logdir}"/stats.*.log) ; exit 1; } # 4. Aggregate shape files _opts= @@ -1008,7 +1008,7 @@ if ! "${skip_eval}"; then # 3. Submit decoding jobs log "Decoding started... log: '${_logdir}/tts_inference.*.log'" - # shellcheck disable=SC2086 + # shellcheck disable=SC2046,SC2086 ${_cmd} --gpu "${_ngpu}" JOB=1:"${_nj}" "${_logdir}"/tts_inference.JOB.log \ ${python} -m espnet2.bin.tts_inference \ --ngpu "${_ngpu}" \ @@ -1019,7 +1019,7 @@ if ! "${skip_eval}"; then --train_config "${tts_exp}"/config.yaml \ --output_dir "${_logdir}"/output.JOB \ --vocoder_file "${vocoder_file}" \ - ${_opts} ${_ex_opts} ${inference_args} + ${_opts} ${_ex_opts} ${inference_args} || { cat $(grep -l -i error "${_logdir}"/tts_inference.*.log) ; exit 1; } # 4. 
Concatenate the output files from each job if [ -e "${_logdir}/output.${_nj}/norm" ]; then diff --git a/egs2/accented_french_openslr57/asr1/local/remove_missing.py b/egs2/accented_french_openslr57/asr1/local/remove_missing.py index 937144f75d8..1469b4a55bb 100644 --- a/egs2/accented_french_openslr57/asr1/local/remove_missing.py +++ b/egs2/accented_french_openslr57/asr1/local/remove_missing.py @@ -4,7 +4,6 @@ import argparse import os - parser = argparse.ArgumentParser(description="Normalize test text.") parser.add_argument("--folder", type=str, help="path of download folder") parser.add_argument("--train", type=str, help="path of train folder") diff --git a/egs2/aishell3/tts1/local/data_prep.py b/egs2/aishell3/tts1/local/data_prep.py index 706c28d5642..679232b9f3e 100644 --- a/egs2/aishell3/tts1/local/data_prep.py +++ b/egs2/aishell3/tts1/local/data_prep.py @@ -1,5 +1,6 @@ import argparse import os + from espnet2.utils.types import str2bool SPK_LABEL_LEN = 7 diff --git a/egs2/aishell4/enh1/local/generate_fe_trainingdata.py.patch b/egs2/aishell4/enh1/local/generate_fe_trainingdata.py.patch index a7666a5a756..9a23ef72207 100644 --- a/egs2/aishell4/enh1/local/generate_fe_trainingdata.py.patch +++ b/egs2/aishell4/enh1/local/generate_fe_trainingdata.py.patch @@ -2,9 +2,9 @@ +++ generate_fe_trainingdata.new.py @@ -1,8 +1,8 @@ #!/usr/bin/env python - + -import io -+from distutils.version import LooseVersion ++from packaging.version import parse as V + import os -import subprocess +import sys @@ -14,17 +14,17 @@ @@ -12,6 +12,10 @@ import librosa import argparse - + + -+is_py_3_3_plus = LooseVersion(sys.version) > LooseVersion("3.3") ++is_py_3_3_plus = V("{}.{}.{}".format(*sys.version_info[:3])) > V("3.3") + + def get_line_context(file_path, line_number): return linecache.getline(file_path, line_number).strip() - + @@ -119,7 +123,7 @@ return data / max_val - + def add_noise(clean, noise, rir, snr): - random.seed(time.clock()) + random.seed(time.perf_counter() if is_py_3_3_plus else time.clock()) @@ -32,9 +32,9 @@ noise = add_reverb(noise, rir[:, 16:24]) noise = noise[:-7999] @@ -189,7 +193,7 @@ - + for i in range(args.wavnum): - + - random.seed(time.clock()) + random.seed(time.perf_counter() if is_py_3_3_plus else time.clock()) wav1idx = random.randint(0, len(open(wavlist1,'r').readlines())-1) diff --git a/egs2/aishell4/enh1/local/prepare_audioset_category_list.py b/egs2/aishell4/enh1/local/prepare_audioset_category_list.py index 2c9a09bb0c6..af591399f3f 100644 --- a/egs2/aishell4/enh1/local/prepare_audioset_category_list.py +++ b/egs2/aishell4/enh1/local/prepare_audioset_category_list.py @@ -2,9 +2,9 @@ # Copyright 2022 Shanghai Jiao Tong University (Author: Wangyou Zhang) # Apache 2.0 -from pathlib import Path import re import sys +from pathlib import Path def prepare_audioset_category(audio_list, audioset_dir, output_file, skip_csv_rows=3): diff --git a/egs2/aishell4/enh1/local/split_train_dev.py b/egs2/aishell4/enh1/local/split_train_dev.py index 8961c40b12d..e7e7d75e239 100755 --- a/egs2/aishell4/enh1/local/split_train_dev.py +++ b/egs2/aishell4/enh1/local/split_train_dev.py @@ -2,14 +2,12 @@ # Copyright 2022 Shanghai Jiao Tong University (Authors: Wangyou Zhang) # Apache 2.0 -from collections import Counter -from collections import defaultdict -from fractions import Fraction import math -from pathlib import Path import random -from typing import List -from typing import Tuple +from collections import Counter, defaultdict +from fractions import Fraction +from pathlib import Path +from typing
import List, Tuple def int_or_float_or_numstr(value): diff --git a/egs2/aishell4/enh1/local/split_train_dev_by_column.py b/egs2/aishell4/enh1/local/split_train_dev_by_column.py index ff50a9407a7..de48ce73b33 100755 --- a/egs2/aishell4/enh1/local/split_train_dev_by_column.py +++ b/egs2/aishell4/enh1/local/split_train_dev_by_column.py @@ -3,13 +3,11 @@ # Copyright 2022 Shanghai Jiao Tong University (Authors: Wangyou Zhang) # Apache 2.0 import argparse +import random from collections import defaultdict from pathlib import Path -import random -from split_train_dev import int_or_float_or_numstr -from split_train_dev import split_train_dev -from split_train_dev import split_train_dev_v2 +from split_train_dev import int_or_float_or_numstr, split_train_dev, split_train_dev_v2 def get_parser(): diff --git a/egs2/aishell4/enh1/local/split_train_dev_by_prefix.py b/egs2/aishell4/enh1/local/split_train_dev_by_prefix.py index c04cfb1a584..c997d9774a8 100755 --- a/egs2/aishell4/enh1/local/split_train_dev_by_prefix.py +++ b/egs2/aishell4/enh1/local/split_train_dev_by_prefix.py @@ -3,13 +3,11 @@ # Copyright 2022 Shanghai Jiao Tong University (Authors: Wangyou Zhang) # Apache 2.0 import argparse +import random from collections import defaultdict from pathlib import Path -import random -from split_train_dev import int_or_float_or_numstr -from split_train_dev import split_train_dev -from split_train_dev import split_train_dev_v2 +from split_train_dev import int_or_float_or_numstr, split_train_dev, split_train_dev_v2 def get_parser(): diff --git a/egs2/bn_openslr53/asr1/local/data_prep.py b/egs2/bn_openslr53/asr1/local/data_prep.py index 4cb5a47596b..5d831435277 100644 --- a/egs2/bn_openslr53/asr1/local/data_prep.py +++ b/egs2/bn_openslr53/asr1/local/data_prep.py @@ -8,7 +8,6 @@ import os import random - if __name__ == "__main__": parser = argparse.ArgumentParser() parser.add_argument("-d", help="downloads directory", type=str, default="downloads") diff --git a/egs2/bur_openslr80/asr1/local/data_prep.py b/egs2/bur_openslr80/asr1/local/data_prep.py index 98180ea4b2e..654779696aa 100644 --- a/egs2/bur_openslr80/asr1/local/data_prep.py +++ b/egs2/bur_openslr80/asr1/local/data_prep.py @@ -8,7 +8,6 @@ import os import random - if __name__ == "__main__": parser = argparse.ArgumentParser() parser.add_argument("-d", help="downloads directory", type=str, default="downloads") diff --git a/egs2/catslu/asr1/local/data_prep.py b/egs2/catslu/asr1/local/data_prep.py index 2ce83727a07..55bf6d2d979 100755 --- a/egs2/catslu/asr1/local/data_prep.py +++ b/egs2/catslu/asr1/local/data_prep.py @@ -4,11 +4,11 @@ # 2021 Carnegie Mellon University # Apache 2.0 +import json import os +import string as string_lib import sys from pathlib import Path -import json -import string as string_lib if len(sys.argv) != 2: print("Usage: python data_prep.py [catslu_root]") diff --git a/egs2/chime4/asr1/conf/tuning/train_asr_transformer_wavlm_lr1e-3_specaug_accum1_preenc128_warmup20k.yaml b/egs2/chime4/asr1/conf/tuning/train_asr_transformer_wavlm_lr1e-3_specaug_accum1_preenc128_warmup20k.yaml new file mode 100644 index 00000000000..cee2e0c896d --- /dev/null +++ b/egs2/chime4/asr1/conf/tuning/train_asr_transformer_wavlm_lr1e-3_specaug_accum1_preenc128_warmup20k.yaml @@ -0,0 +1,90 @@ +# minibatch related +batch_type: folded +batch_size: 32 +accum_grad: 1 +grad_clip: 5 +max_epoch: 50 +patience: none +# The initialization method for model parameters +init: xavier_uniform +val_scheduler_criterion: +- valid +- loss +best_model_criterion: +- - valid + - acc 
+ - max +keep_nbest_models: 10 +unused_parameters: true +# SSL-based frontend is fixed during training for training efficiency, +# however, the gradients are backpropagated through the frontend to the enhancement. +freeze_param: [ + "frontend.upstream" +] + +# network architecture +frontend: s3prl +frontend_conf: + frontend_conf: + upstream: wavlm_large # Note: If the upstream is changed, please change the input_size in the preencoder. + download_dir: ./hub + multilayer_feature: True + +preencoder: linear +preencoder_conf: + input_size: 1024 # Note: If the upstream is changed, please change this value accordingly. + output_size: 128 + +# encoder related +encoder: transformer +encoder_conf: + output_size: 256 + attention_heads: 4 + linear_units: 2048 + num_blocks: 12 + dropout_rate: 0.1 + attention_dropout_rate: 0.0 + input_layer: conv2d2 + normalize_before: true + +# decoder related +decoder: transformer +decoder_conf: + input_layer: embed + attention_heads: 4 + linear_units: 2048 + num_blocks: 6 + dropout_rate: 0.1 + positional_dropout_rate: 0.0 + self_attention_dropout_rate: 0.0 + src_attention_dropout_rate: 0.0 + +model_conf: + ctc_weight: 0.3 + lsm_weight: 0.1 + length_normalized_loss: false + extract_feats_in_collect_stats: false + +optim: adam +optim_conf: + lr: 0.001 +scheduler: warmuplr +scheduler_conf: + warmup_steps: 20000 + +specaug: specaug +specaug_conf: + apply_time_warp: true + time_warp_window: 5 + time_warp_mode: bicubic + apply_freq_mask: true + freq_mask_width_range: + - 0 + - 100 + num_freq_mask: 4 + apply_time_mask: true + time_mask_width_range: + - 0 + - 40 + num_time_mask: 2 + diff --git a/egs2/chime4/asr1/local/sym_channel.py b/egs2/chime4/asr1/local/sym_channel.py index 8a3bdcce2a9..dcffd487c4c 100644 --- a/egs2/chime4/asr1/local/sym_channel.py +++ b/egs2/chime4/asr1/local/sym_channel.py @@ -1,6 +1,6 @@ +import argparse import os from os import path -import argparse def create_sym(data_dir, track, wav): diff --git a/egs2/chime4/enh1/conf/tuning/train_enh_convtasnet_small.yaml b/egs2/chime4/enh1/conf/tuning/train_enh_convtasnet_small.yaml new file mode 100644 index 00000000000..7c73d4c868f --- /dev/null +++ b/egs2/chime4/enh1/conf/tuning/train_enh_convtasnet_small.yaml @@ -0,0 +1,64 @@ +optim: adam +init: xavier_uniform +max_epoch: 100 +batch_type: folded +batch_size: 32 +iterator_type: chunk +chunk_length: 32000 +num_workers: 4 +optim_conf: + lr: 1.0e-03 + eps: 1.0e-08 + weight_decay: 1.0e-05 +patience: 4 +val_scheduler_criterion: +- valid +- loss +best_model_criterion: +- - valid + - si_snr + - max +- - valid + - loss + - min +keep_nbest_models: 1 +scheduler: reducelronplateau +scheduler_conf: + mode: min + factor: 0.5 + patience: 3 +model_conf: + loss_type: si_snr +encoder: conv +encoder_conf: + channel: 256 + kernel_size: 40 + stride: 20 +decoder: conv +decoder_conf: + channel: 256 + kernel_size: 40 + stride: 20 +separator: tcn +separator_conf: + num_spk: 1 + layer: 4 + stack: 2 + bottleneck_dim: 256 + hidden_dim: 512 + kernel: 3 + causal: False + norm_type: "gLN" + nonlinear: relu +criterions: + # The first criterion + - name: si_snr + conf: + eps: 1e-7 + # the wrapper for the current criterion + # for single-talker case, we simply use the fixed_order wrapper + wrapper: + - type: fixed_order + wrapper_conf: + weight: 1.0 + diff --git a/egs2/chime4/enh_asr1/README.md b/egs2/chime4/enh_asr1/README.md new file mode 100644 index 00000000000..f01c087f211 --- /dev/null +++ b/egs2/chime4/enh_asr1/README.md @@ -0,0 +1,97 @@ + +# RESULTS +## Environments +- date: `Thu Apr 28
00:09:17 EDT 2022` +- python version: `3.7.11 (default, Jul 27 2021, 14:32:16) [GCC 7.5.0]` +- espnet version: `espnet 202204` +- pytorch version: `pytorch 1.8.1` +- Git hash: `44971ff962aae30c962226f1ba3d87de057ac00e` + - Commit date: `Wed Apr 27 10:13:03 2022 -0400` + +## enh_asr_train_enh_asr_convtasnet_init_noenhloss_wavlm_transformer_init_lr1e-4_accum1_adam_specaug_bypass0_raw_en_char +- Pretrained model: https://huggingface.co/espnet/simpleoier_chime4_enh_asr_convtasnet_init_noenhloss_wavlm_transformer_init_raw_en_char +### WER + +|dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err| +|---|---|---|---|---|---|---|---|---| +|decode_asr_transformer_normalize_output_wavtrue_lm_lm_train_lm_transformer_en_char_valid.loss.ave_enh_asr_model_valid.acc.ave/dt05_real_isolated_1ch_track|1640|27119|98.3|1.3|0.4|0.2|1.9|21.8| +|decode_asr_transformer_normalize_output_wavtrue_lm_lm_train_lm_transformer_en_char_valid.loss.ave_enh_asr_model_valid.acc.ave/dt05_real_beamformit_2mics|1640|27119|98.5|1.2|0.3|0.2|1.7|19.6| +|decode_asr_transformer_normalize_output_wavtrue_lm_lm_train_lm_transformer_en_char_valid.loss.ave_enh_asr_model_valid.acc.ave/dt05_real_beamformit_5mics|1640|27119|98.6|1.1|0.3|0.2|1.5|18.7| +|decode_asr_transformer_normalize_output_wavtrue_lm_lm_train_lm_transformer_en_char_valid.loss.ave_enh_asr_model_valid.acc.ave/dt05_simu_isolated_1ch_track|1640|27120|97.2|2.1|0.7|0.3|3.1|28.9| +|decode_asr_transformer_normalize_output_wavtrue_lm_lm_train_lm_transformer_en_char_valid.loss.ave_enh_asr_model_valid.acc.ave/dt05_simu_beamformit_2mics|1640|27120|97.9|1.5|0.5|0.2|2.3|25.2| +|decode_asr_transformer_normalize_output_wavtrue_lm_lm_train_lm_transformer_en_char_valid.loss.ave_enh_asr_model_valid.acc.ave/dt05_simu_beamformit_5mics|1640|27120|98.4|1.2|0.4|0.1|1.7|19.9| +|decode_asr_transformer_normalize_output_wavtrue_lm_lm_train_lm_transformer_en_char_valid.loss.ave_enh_asr_model_valid.acc.ave/et05_real_isolated_1ch_track|1320|21409|96.7|2.6|0.7|0.4|3.7|31.6| +|decode_asr_transformer_normalize_output_wavtrue_lm_lm_train_lm_transformer_en_char_valid.loss.ave_enh_asr_model_valid.acc.ave/et05_real_beamformit_2mics|1320|21409|97.4|2.0|0.6|0.3|2.9|27.3| +|decode_asr_transformer_normalize_output_wavtrue_lm_lm_train_lm_transformer_en_char_valid.loss.ave_enh_asr_model_valid.acc.ave/et05_real_beamformit_5mics|1320|21409|97.8|1.8|0.4|0.2|2.5|24.3| +|decode_asr_transformer_normalize_output_wavtrue_lm_lm_train_lm_transformer_en_char_valid.loss.ave_enh_asr_model_valid.acc.ave/et05_simu_isolated_1ch_track|1320|21416|94.6|3.7|1.6|0.5|5.9|37.3| +|decode_asr_transformer_normalize_output_wavtrue_lm_lm_train_lm_transformer_en_char_valid.loss.ave_enh_asr_model_valid.acc.ave/et05_simu_beamformit_2mics|1320|21416|96.6|2.5|1.0|0.3|3.7|32.5| +|decode_asr_transformer_normalize_output_wavtrue_lm_lm_train_lm_transformer_en_char_valid.loss.ave_enh_asr_model_valid.acc.ave/et05_simu_beamformit_5mics|1320|21416|97.5|1.9|0.7|0.3|2.9|28.9| + +### CER + +|dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err| +|---|---|---|---|---|---|---|---|---| +|decode_asr_transformer_normalize_output_wavtrue_lm_lm_train_lm_transformer_en_char_valid.loss.ave_enh_asr_model_valid.acc.ave/dt05_real_isolated_1ch_track|1640|160390|99.4|0.2|0.4|0.2|0.8|21.8| +|decode_asr_transformer_normalize_output_wavtrue_lm_lm_train_lm_transformer_en_char_valid.loss.ave_enh_asr_model_valid.acc.ave/dt05_real_beamformit_2mics|1640|160390|99.5|0.2|0.3|0.2|0.7|19.6| 
+|decode_asr_transformer_normalize_output_wavtrue_lm_lm_train_lm_transformer_en_char_valid.loss.ave_enh_asr_model_valid.acc.ave/dt05_real_beamformit_5mics|1640|160390|99.6|0.1|0.3|0.2|0.6|18.7| +|decode_asr_transformer_normalize_output_wavtrue_lm_lm_train_lm_transformer_en_char_valid.loss.ave_enh_asr_model_valid.acc.ave/dt05_simu_isolated_1ch_track|1640|160400|98.8|0.5|0.7|0.3|1.5|28.9| +|decode_asr_transformer_normalize_output_wavtrue_lm_lm_train_lm_transformer_en_char_valid.loss.ave_enh_asr_model_valid.acc.ave/dt05_simu_beamformit_2mics|1640|160400|99.2|0.3|0.5|0.2|1.1|25.2| +|decode_asr_transformer_normalize_output_wavtrue_lm_lm_train_lm_transformer_en_char_valid.loss.ave_enh_asr_model_valid.acc.ave/dt05_simu_beamformit_5mics|1640|160400|99.5|0.2|0.3|0.1|0.7|19.9| +|decode_asr_transformer_normalize_output_wavtrue_lm_lm_train_lm_transformer_en_char_valid.loss.ave_enh_asr_model_valid.acc.ave/et05_real_isolated_1ch_track|1320|126796|98.6|0.6|0.8|0.4|1.8|31.7| +|decode_asr_transformer_normalize_output_wavtrue_lm_lm_train_lm_transformer_en_char_valid.loss.ave_enh_asr_model_valid.acc.ave/et05_real_beamformit_2mics|1320|126796|98.9|0.4|0.7|0.3|1.4|27.3| +|decode_asr_transformer_normalize_output_wavtrue_lm_lm_train_lm_transformer_en_char_valid.loss.ave_enh_asr_model_valid.acc.ave/et05_real_beamformit_5mics|1320|126796|99.1|0.4|0.5|0.2|1.1|24.3| +|decode_asr_transformer_normalize_output_wavtrue_lm_lm_train_lm_transformer_en_char_valid.loss.ave_enh_asr_model_valid.acc.ave/et05_simu_isolated_1ch_track|1320|126812|97.0|1.2|1.9|0.6|3.7|37.3| +|decode_asr_transformer_normalize_output_wavtrue_lm_lm_train_lm_transformer_en_char_valid.loss.ave_enh_asr_model_valid.acc.ave/et05_simu_beamformit_2mics|1320|126812|98.2|0.6|1.1|0.4|2.1|32.5| +|decode_asr_transformer_normalize_output_wavtrue_lm_lm_train_lm_transformer_en_char_valid.loss.ave_enh_asr_model_valid.acc.ave/et05_simu_beamformit_5mics|1320|126812|98.8|0.4|0.8|0.3|1.5|28.9| + +### Enhancement + +|dataset|STOI|SDR|SI_SNR| +|---|---|---|---| +|dt05_simu_isolated_1ch_track|0.86|4.97|1.77| +|et05_simu_isolated_1ch_track|0.85|5.45|0.88| + + +## enh_asr_train_enh_asr_convtasnet_fbank_transformer_raw_en_char +- Pretrained model: https://huggingface.co/espnet/simpleoier_chime4_enh_asr_train_enh_asr_convtasnet_fbank_transformer_raw_en_char + +### WER + +|dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err| +|---|---|---|---|---|---|---|---|---| +|decode_asr_transformer_normalize_output_wavtrue_lm_lm_train_lm_transformer_en_char_valid.loss.ave_enh_asr_model_valid.acc.ave/dt05_real_isolated_1ch_track|1640|27119|91.8|6.0|2.2|0.8|9.0|57.7| +|decode_asr_transformer_normalize_output_wavtrue_lm_lm_train_lm_transformer_en_char_valid.loss.ave_enh_asr_model_valid.acc.ave/dt05_real_beamformit_2mics|1640|27119|93.0|5.2|1.8|0.6|7.7|53.3| +|decode_asr_transformer_normalize_output_wavtrue_lm_lm_train_lm_transformer_en_char_valid.loss.ave_enh_asr_model_valid.acc.ave/dt05_real_beamformit_5mics|1640|27119|93.9|4.5|1.6|0.5|6.7|49.9| +|decode_asr_transformer_normalize_output_wavtrue_lm_lm_train_lm_transformer_en_char_valid.loss.ave_enh_asr_model_valid.acc.ave/dt05_simu_isolated_1ch_track|1640|27120|89.9|7.6|2.4|1.0|11.1|59.7| +|decode_asr_transformer_normalize_output_wavtrue_lm_lm_train_lm_transformer_en_char_valid.loss.ave_enh_asr_model_valid.acc.ave/dt05_simu_beamformit_2mics|1640|27120|92.2|6.0|1.9|0.7|8.6|55.5| 
+|decode_asr_transformer_normalize_output_wavtrue_lm_lm_train_lm_transformer_en_char_valid.loss.ave_enh_asr_model_valid.acc.ave/dt05_simu_beamformit_5mics|1640|27120|93.6|4.9|1.5|0.6|7.1|51.6| +|decode_asr_transformer_normalize_output_wavtrue_lm_lm_train_lm_transformer_en_char_valid.loss.ave_enh_asr_model_valid.acc.ave/et05_real_isolated_1ch_track|1320|21409|84.6|11.4|4.0|1.5|17.0|69.4| +|decode_asr_transformer_normalize_output_wavtrue_lm_lm_train_lm_transformer_en_char_valid.loss.ave_enh_asr_model_valid.acc.ave/et05_real_beamformit_2mics|1320|21409|86.7|9.7|3.5|1.3|14.5|64.7| +|decode_asr_transformer_normalize_output_wavtrue_lm_lm_train_lm_transformer_en_char_valid.loss.ave_enh_asr_model_valid.acc.ave/et05_real_beamformit_5mics|1320|21409|89.2|7.9|2.9|1.0|11.8|61.2| +|decode_asr_transformer_normalize_output_wavtrue_lm_lm_train_lm_transformer_en_char_valid.loss.ave_enh_asr_model_valid.acc.ave/et05_simu_isolated_1ch_track|1320|21416|82.8|13.1|4.1|1.9|19.1|69.4| +|decode_asr_transformer_normalize_output_wavtrue_lm_lm_train_lm_transformer_en_char_valid.loss.ave_enh_asr_model_valid.acc.ave/et05_simu_beamformit_2mics|1320|21416|86.0|10.5|3.5|1.5|15.5|67.5| +|decode_asr_transformer_normalize_output_wavtrue_lm_lm_train_lm_transformer_en_char_valid.loss.ave_enh_asr_model_valid.acc.ave/et05_simu_beamformit_5mics|1320|21416|88.1|8.9|3.1|1.2|13.1|64.8| + +### CER + +|dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err| +|---|---|---|---|---|---|---|---|---| +|decode_asr_transformer_normalize_output_wavtrue_lm_lm_train_lm_transformer_en_char_valid.loss.ave_enh_asr_model_valid.acc.ave/dt05_real_isolated_1ch_track|1640|160390|95.9|1.7|2.3|0.8|4.8|57.7| +|decode_asr_transformer_normalize_output_wavtrue_lm_lm_train_lm_transformer_en_char_valid.loss.ave_enh_asr_model_valid.acc.ave/dt05_real_beamformit_2mics|1640|160390|96.6|1.4|2.0|0.6|4.0|53.3| +|decode_asr_transformer_normalize_output_wavtrue_lm_lm_train_lm_transformer_en_char_valid.loss.ave_enh_asr_model_valid.acc.ave/dt05_real_beamformit_5mics|1640|160390|97.1|1.1|1.8|0.5|3.4|49.9| +|decode_asr_transformer_normalize_output_wavtrue_lm_lm_train_lm_transformer_en_char_valid.loss.ave_enh_asr_model_valid.acc.ave/dt05_simu_isolated_1ch_track|1640|160400|94.7|2.5|2.9|1.0|6.3|59.7| +|decode_asr_transformer_normalize_output_wavtrue_lm_lm_train_lm_transformer_en_char_valid.loss.ave_enh_asr_model_valid.acc.ave/dt05_simu_beamformit_2mics|1640|160400|95.9|1.7|2.3|0.7|4.8|55.5| +|decode_asr_transformer_normalize_output_wavtrue_lm_lm_train_lm_transformer_en_char_valid.loss.ave_enh_asr_model_valid.acc.ave/dt05_simu_beamformit_5mics|1640|160400|96.8|1.4|1.9|0.6|3.8|51.6| +|decode_asr_transformer_normalize_output_wavtrue_lm_lm_train_lm_transformer_en_char_valid.loss.ave_enh_asr_model_valid.acc.ave/et05_real_isolated_1ch_track|1320|126796|91.5|3.8|4.6|1.6|10.0|69.4| +|decode_asr_transformer_normalize_output_wavtrue_lm_lm_train_lm_transformer_en_char_valid.loss.ave_enh_asr_model_valid.acc.ave/et05_real_beamformit_2mics|1320|126796|92.8|3.2|4.0|1.2|8.4|64.7| +|decode_asr_transformer_normalize_output_wavtrue_lm_lm_train_lm_transformer_en_char_valid.loss.ave_enh_asr_model_valid.acc.ave/et05_real_beamformit_5mics|1320|126796|94.3|2.4|3.3|1.0|6.6|61.2| +|decode_asr_transformer_normalize_output_wavtrue_lm_lm_train_lm_transformer_en_char_valid.loss.ave_enh_asr_model_valid.acc.ave/et05_simu_isolated_1ch_track|1320|126812|90.3|4.8|4.9|2.2|11.9|69.4| 
+|decode_asr_transformer_normalize_output_wavtrue_lm_lm_train_lm_transformer_en_char_valid.loss.ave_enh_asr_model_valid.acc.ave/et05_simu_beamformit_2mics|1320|126812|92.2|3.5|4.2|1.7|9.5|67.5| +|decode_asr_transformer_normalize_output_wavtrue_lm_lm_train_lm_transformer_en_char_valid.loss.ave_enh_asr_model_valid.acc.ave/et05_simu_beamformit_5mics|1320|126812|93.7|2.7|3.5|1.4|7.7|64.8| + +### Enhancement + +|dataset|STOI|SDR|SI_SNR| +|---|---|---|---| +|dt05_simu_isolated_1ch_track|0.87|7.14|4.51| +|et05_simu_isolated_1ch_track|0.85|7.47|3.02| diff --git a/egs2/clarity21/enh_2021/cmd.sh b/egs2/chime4/enh_asr1/cmd.sh similarity index 100% rename from egs2/clarity21/enh_2021/cmd.sh rename to egs2/chime4/enh_asr1/cmd.sh diff --git a/egs2/chime4/enh_asr1/conf/chime4.cfg b/egs2/chime4/enh_asr1/conf/chime4.cfg new file mode 120000 index 00000000000..5b3477ab5c6 --- /dev/null +++ b/egs2/chime4/enh_asr1/conf/chime4.cfg @@ -0,0 +1 @@ +../../asr1/conf/chime4.cfg \ No newline at end of file diff --git a/egs2/chime4/enh_asr1/conf/decode_asr_transformer.yaml b/egs2/chime4/enh_asr1/conf/decode_asr_transformer.yaml new file mode 100644 index 00000000000..8e7518150a7 --- /dev/null +++ b/egs2/chime4/enh_asr1/conf/decode_asr_transformer.yaml @@ -0,0 +1,7 @@ +batch_size: 0 +beam-size: 10 +penalty: 0.0 +maxlenratio: 0.0 +minlenratio: 0.0 +ctc-weight: 0.3 +lm-weight: 1.0 diff --git a/egs2/chime4/enh_asr1/conf/fbank.conf b/egs2/chime4/enh_asr1/conf/fbank.conf new file mode 100644 index 00000000000..82ac7bd0dbc --- /dev/null +++ b/egs2/chime4/enh_asr1/conf/fbank.conf @@ -0,0 +1,2 @@ +--sample-frequency=16000 +--num-mel-bins=80 diff --git a/egs2/clarity21/enh_2021/conf/pbs.conf b/egs2/chime4/enh_asr1/conf/pbs.conf similarity index 100% rename from egs2/clarity21/enh_2021/conf/pbs.conf rename to egs2/chime4/enh_asr1/conf/pbs.conf diff --git a/egs2/chime4/enh_asr1/conf/pitch.conf b/egs2/chime4/enh_asr1/conf/pitch.conf new file mode 100644 index 00000000000..e959a19d5b8 --- /dev/null +++ b/egs2/chime4/enh_asr1/conf/pitch.conf @@ -0,0 +1 @@ +--sample-frequency=16000 diff --git a/egs2/clarity21/enh_2021/conf/queue.conf b/egs2/chime4/enh_asr1/conf/queue.conf similarity index 100% rename from egs2/clarity21/enh_2021/conf/queue.conf rename to egs2/chime4/enh_asr1/conf/queue.conf diff --git a/egs2/clarity21/enh_2021/conf/slurm.conf b/egs2/chime4/enh_asr1/conf/slurm.conf similarity index 100% rename from egs2/clarity21/enh_2021/conf/slurm.conf rename to egs2/chime4/enh_asr1/conf/slurm.conf diff --git a/egs2/chime4/enh_asr1/conf/train_enh_asr_convtasnet_fbank_transformer.yaml b/egs2/chime4/enh_asr1/conf/train_enh_asr_convtasnet_fbank_transformer.yaml new file mode 120000 index 00000000000..920b436ba58 --- /dev/null +++ b/egs2/chime4/enh_asr1/conf/train_enh_asr_convtasnet_fbank_transformer.yaml @@ -0,0 +1 @@ +tuning/train_enh_asr_convtasnet_si_snr_fbank_transformer_lr2e-3_accum2_warmup20k_specaug.yaml \ No newline at end of file diff --git a/egs2/chime4/enh_asr1/conf/train_lm_transformer.yaml b/egs2/chime4/enh_asr1/conf/train_lm_transformer.yaml new file mode 100644 index 00000000000..a502a55381a --- /dev/null +++ b/egs2/chime4/enh_asr1/conf/train_lm_transformer.yaml @@ -0,0 +1,48 @@ +# network architecture +# encoder related +encoder: transformer +encoder_conf: + input_layer: conv2d + num_blocks: 12 + linear_units: 2048 + dropout_rate: 0.1 + output_size: 256 + attention_heads: 4 + attention_dropout_rate: 0.0 + +# decoder related +decoder: transformer +decoder_conf: + input_layer: embed + num_blocks: 6 + linear_units: 2048 + 
dropout_rate: 0.1 + +# hybrid CTC/attention +model_conf: + ctc_weight: 0.3 + lsm_weight: 0.1 + length_normalized_loss: false + +# optimization related +optim: adam +accum_grad: 2 +grad_clip: 5 +patience: 10 +max_epoch: 100 +optim_conf: + lr: 0.005 +scheduler: warmuplr +scheduler_conf: + warmup_steps: 20000 + +# minibatch related +batch_type: folded +batch_size: 32 + +# criterion +best_model_criterion: +- - valid + - acc + - max +keep_nbest_models: 10 \ No newline at end of file diff --git a/egs2/chime4/enh_asr1/conf/tuning/train_enh_asr_convtasnet_init_noenhloss_wavlm_transformer_init_lr1e-4_accum1_adam_specaug_bypass0.yaml b/egs2/chime4/enh_asr1/conf/tuning/train_enh_asr_convtasnet_init_noenhloss_wavlm_transformer_init_lr1e-4_accum1_adam_specaug_bypass0.yaml new file mode 100644 index 00000000000..1eb24dd8134 --- /dev/null +++ b/egs2/chime4/enh_asr1/conf/tuning/train_enh_asr_convtasnet_init_noenhloss_wavlm_transformer_init_lr1e-4_accum1_adam_specaug_bypass0.yaml @@ -0,0 +1,124 @@ +# minibatch related +batch_type: folded +batch_size: 16 # A6000 x 1 +accum_grad: 1 +grad_clip: 5 +max_epoch: 12 +patience: 10 +# The initialization method for model parameters +init: xavier_uniform +val_scheduler_criterion: +- valid +- loss +best_model_criterion: +- - valid + - acc + - max +- - train + - loss + - min +keep_nbest_models: 10 +num_att_plot: 3 +unused_parameters: true +freeze_param: [ + "s2t_model.frontend.upstream", +] +init_param: [ + "../enh1/exp/enh_train_enh_convtasnet_small_raw/valid.loss.ave_1best.pth:encoder:enh_model.encoder", + "../enh1/exp/enh_train_enh_convtasnet_small_raw/valid.loss.ave_1best.pth:separator:enh_model.separator", + "../enh1/exp/enh_train_enh_convtasnet_small_raw/valid.loss.ave_1best.pth:decoder:enh_model.decoder", + "../asr1/exp/asr_train_asr_transformer_wavlm_lr1e-3_specaug_accum1_preenc128_warmup20k_raw_en_char/valid.acc.ave.pth:frontend:s2t_model.frontend", + "../asr1/exp/asr_train_asr_transformer_wavlm_lr1e-3_specaug_accum1_preenc128_warmup20k_raw_en_char/valid.acc.ave.pth:preencoder:s2t_model.preencoder", + "../asr1/exp/asr_train_asr_transformer_wavlm_lr1e-3_specaug_accum1_preenc128_warmup20k_raw_en_char/valid.acc.ave.pth:encoder:s2t_model.encoder", + "../asr1/exp/asr_train_asr_transformer_wavlm_lr1e-3_specaug_accum1_preenc128_warmup20k_raw_en_char/valid.acc.ave.pth:ctc:s2t_model.ctc", + "../asr1/exp/asr_train_asr_transformer_wavlm_lr1e-3_specaug_accum1_preenc128_warmup20k_raw_en_char/valid.acc.ave.pth:decoder:s2t_model.decoder", +] + +# network architecture +enh_encoder: conv +enh_encoder_conf: + channel: 256 + kernel_size: 40 + stride: 20 +enh_decoder: conv +enh_decoder_conf: + channel: 256 + kernel_size: 40 + stride: 20 +enh_separator: tcn +enh_separator_conf: + num_spk: 1 + layer: 4 + stack: 2 + bottleneck_dim: 256 + hidden_dim: 512 + kernel: 3 + causal: False + norm_type: "gLN" + nonlinear: relu + +frontend: s3prl +frontend_conf: + frontend_conf: + upstream: wavlm_large # Note: If the upstream is changed, please change the input_size in the preencoder. + download_dir: ./hub + multilayer_feature: true + +asr_preencoder: linear +asr_preencoder_conf: + input_size: 1024 # Note: If the upstream is changed, please change this value accordingly. 
+ output_size: 128 + +# encoder related +asr_encoder: transformer +asr_encoder_conf: + output_size: 256 + attention_heads: 4 + linear_units: 2048 + num_blocks: 12 + dropout_rate: 0.1 + attention_dropout_rate: 0.0 + input_layer: conv2d2 + normalize_before: true + +# decoder related +asr_decoder: transformer +asr_decoder_conf: + input_layer: embed + attention_heads: 4 + linear_units: 2048 + num_blocks: 6 + dropout_rate: 0.1 + positional_dropout_rate: 0.0 + self_attention_dropout_rate: 0.0 + src_attention_dropout_rate: 0.0 + +asr_model_conf: + ctc_weight: 0.3 + lsm_weight: 0.1 + length_normalized_loss: false + extract_feats_in_collect_stats: false + +model_conf: + calc_enh_loss: false + bypass_enh_prob: 0.0 + +optim: adam +optim_conf: + lr: 0.0001 + +specaug: specaug +specaug_conf: + apply_time_warp: true + time_warp_window: 5 + time_warp_mode: bicubic + apply_freq_mask: true + freq_mask_width_range: + - 0 + - 100 + num_freq_mask: 4 + apply_time_mask: true + time_mask_width_range: + - 0 + - 40 + num_time_mask: 2 diff --git a/egs2/chime4/enh_asr1/conf/tuning/train_enh_asr_convtasnet_si_snr_fbank_transformer_lr2e-3_accum2_warmup20k_specaug.yaml b/egs2/chime4/enh_asr1/conf/tuning/train_enh_asr_convtasnet_si_snr_fbank_transformer_lr2e-3_accum2_warmup20k_specaug.yaml new file mode 100644 index 00000000000..8e30e5edecb --- /dev/null +++ b/egs2/chime4/enh_asr1/conf/tuning/train_enh_asr_convtasnet_si_snr_fbank_transformer_lr2e-3_accum2_warmup20k_specaug.yaml @@ -0,0 +1,119 @@ +# minibatch related +batch_type: folded +batch_size: 16 # A6000 x 1 +accum_grad: 2 +grad_clip: 5 +max_epoch: 50 +patience: 10 +# The initialization method for model parameters +init: xavier_uniform +val_scheduler_criterion: +- valid +- loss +best_model_criterion: +- - valid + - acc + - max +- - train + - loss + - min +keep_nbest_models: 10 +num_att_plot: 0 + +# network architecture +enh_encoder: conv +enh_encoder_conf: + channel: 256 + kernel_size: 40 + stride: 20 +enh_decoder: conv +enh_decoder_conf: + channel: 256 + kernel_size: 40 + stride: 20 +enh_separator: tcn +enh_separator_conf: + num_spk: 1 + layer: 4 + stack: 2 + bottleneck_dim: 256 + hidden_dim: 512 + kernel: 3 + causal: False + norm_type: "gLN" + nonlinear: relu +enh_criterions: + # The first criterion + - name: si_snr + conf: + eps: 1e-7 + # the wrapper for the current criterion + # for single-talker case, we simply use the fixed_order wrapper + wrapper: fixed_order + wrapper_conf: + weight: 1.0 + +frontend: default +frontend_conf: + fs: 16000 + n_fft: 512 + win_length: 400 + hop_length: 160 + frontend_conf: null + apply_stft: True + +# encoder related +asr_encoder: transformer +asr_encoder_conf: + output_size: 256 + attention_heads: 4 + linear_units: 2048 + num_blocks: 12 + dropout_rate: 0.1 + attention_dropout_rate: 0.0 + input_layer: conv2d + normalize_before: true + +# decoder related +asr_decoder: transformer +asr_decoder_conf: + input_layer: embed + attention_heads: 4 + linear_units: 2048 + num_blocks: 6 + dropout_rate: 0.1 + positional_dropout_rate: 0.0 + self_attention_dropout_rate: 0.0 + src_attention_dropout_rate: 0.0 + +asr_model_conf: + ctc_weight: 0.3 + lsm_weight: 0.1 + length_normalized_loss: false + extract_feats_in_collect_stats: false + +model_conf: + bypass_enh_prob: 0.0 + +optim: adam +optim_conf: + lr: 0.002 +scheduler: warmuplr +scheduler_conf: + warmup_steps: 20000 + +specaug: specaug +specaug_conf: + apply_time_warp: true + time_warp_window: 5 + time_warp_mode: bicubic + apply_freq_mask: true + freq_mask_width_range: + - 0 + - 30 +
num_freq_mask: 2 + apply_time_mask: true + time_mask_width_range: + - 0 + - 40 + num_time_mask: 2 \ No newline at end of file diff --git a/egs2/chime4/enh_asr1/db.sh b/egs2/chime4/enh_asr1/db.sh new file mode 120000 index 00000000000..3090b1bc350 --- /dev/null +++ b/egs2/chime4/enh_asr1/db.sh @@ -0,0 +1 @@ +../../TEMPLATE/enh_asr1/db.sh \ No newline at end of file diff --git a/egs2/chime4/enh_asr1/enh_asr.sh b/egs2/chime4/enh_asr1/enh_asr.sh new file mode 120000 index 00000000000..b00d9b13ef7 --- /dev/null +++ b/egs2/chime4/enh_asr1/enh_asr.sh @@ -0,0 +1 @@ +../../TEMPLATE/enh_asr1/enh_asr.sh \ No newline at end of file diff --git a/egs2/chime4/enh_asr1/local/CHiME3_simulate_data_patched_parallel.m b/egs2/chime4/enh_asr1/local/CHiME3_simulate_data_patched_parallel.m new file mode 120000 index 00000000000..8f939c2e007 --- /dev/null +++ b/egs2/chime4/enh_asr1/local/CHiME3_simulate_data_patched_parallel.m @@ -0,0 +1 @@ +../../enh1/local/CHiME3_simulate_data_patched_parallel.m \ No newline at end of file diff --git a/egs2/chime4/enh_asr1/local/bth_chime4_data_prep.sh b/egs2/chime4/enh_asr1/local/bth_chime4_data_prep.sh new file mode 120000 index 00000000000..f94db52c974 --- /dev/null +++ b/egs2/chime4/enh_asr1/local/bth_chime4_data_prep.sh @@ -0,0 +1 @@ +../../asr1/local/bth_chime4_data_prep.sh \ No newline at end of file diff --git a/egs2/chime4/enh_asr1/local/chime4_asr_data.sh b/egs2/chime4/enh_asr1/local/chime4_asr_data.sh new file mode 120000 index 00000000000..58fbb0a9212 --- /dev/null +++ b/egs2/chime4/enh_asr1/local/chime4_asr_data.sh @@ -0,0 +1 @@ +../../asr1/local/data.sh \ No newline at end of file diff --git a/egs2/chime4/enh_asr1/local/chime4_enh_data.sh b/egs2/chime4/enh_asr1/local/chime4_enh_data.sh new file mode 120000 index 00000000000..d30a4dc12a7 --- /dev/null +++ b/egs2/chime4/enh_asr1/local/chime4_enh_data.sh @@ -0,0 +1 @@ +../../enh1/local/data.sh \ No newline at end of file diff --git a/egs2/chime4/enh_asr1/local/clean_chime4_format_data.sh b/egs2/chime4/enh_asr1/local/clean_chime4_format_data.sh new file mode 120000 index 00000000000..4826e8e382a --- /dev/null +++ b/egs2/chime4/enh_asr1/local/clean_chime4_format_data.sh @@ -0,0 +1 @@ +../../enh1/local/clean_chime4_format_data.sh \ No newline at end of file diff --git a/egs2/chime4/enh_asr1/local/clean_wsj0_data_prep.sh b/egs2/chime4/enh_asr1/local/clean_wsj0_data_prep.sh new file mode 120000 index 00000000000..5c61d4de024 --- /dev/null +++ b/egs2/chime4/enh_asr1/local/clean_wsj0_data_prep.sh @@ -0,0 +1 @@ +../../enh1/local/clean_wsj0_data_prep.sh \ No newline at end of file diff --git a/egs2/chime4/enh_asr1/local/cstr_ndx2flist.pl b/egs2/chime4/enh_asr1/local/cstr_ndx2flist.pl new file mode 120000 index 00000000000..50660a2b68e --- /dev/null +++ b/egs2/chime4/enh_asr1/local/cstr_ndx2flist.pl @@ -0,0 +1 @@ +../../enh1/local/cstr_ndx2flist.pl \ No newline at end of file diff --git a/egs2/chime4/enh_asr1/local/data.sh b/egs2/chime4/enh_asr1/local/data.sh new file mode 100755 index 00000000000..dc36d70eae3 --- /dev/null +++ b/egs2/chime4/enh_asr1/local/data.sh @@ -0,0 +1,89 @@ +#!/usr/bin/env bash + +set -e +set -u +set -o pipefail + +log() { + local fname=${BASH_SOURCE[1]##*/} + echo -e "$(date '+%Y-%m-%dT%H:%M:%S') (${fname}:${BASH_LINENO[0]}:${FUNCNAME[1]}) $*" +} +SECONDS=0 + +help_message=$(cat << EOF Usage: $0 --extra-annotations <path> [--stage <stage>] [--stop_stage <stop_stage>] [--nj <nj>] + + required argument: + --extra-annotations: path to a directory containing extra annotations for CHiME4 This is required for preparing
et05_simu_isolated_1ch_track. + NOTE: + You can download it manually from + http://spandh.dcs.shef.ac.uk/chime_challenge/CHiME4/download.html + Then unzip the downloaded file to CHiME4_diff; + You will then find the extra annotations in CHiME4_diff/CHiME3/data/annotations + + optional argument: + [--stage]: 1 (default) or 2 + [--stop_stage]: 1 or 2 (default) + [--nj]: number of parallel pool workers in MATLAB +EOF +) + + +stage=0 +stop_stage=100 +extra_annotations= +local_data_opts= +train_dev=dt05_multi_isolated_1ch_track +log "$0 $*" +. utils/parse_options.sh + + +if [ $# -ne 0 ] || [ -z "${extra_annotations}" ]; then + echo "${help_message}" + exit 2 +fi + +. ./path.sh || exit 1; +. ./cmd.sh || exit 1; +. ./db.sh || exit 1; + + +if [ ${stage} -le 0 ] && [ ${stop_stage} -ge 0 ]; then + log "stage 0: Enh data preparation" + local/chime4_enh_data.sh --extra_annotations ${extra_annotations} ${local_data_opts} +fi + +if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then + log "stage 1: ASR data preparation" + local/chime4_asr_data.sh --stage 0 --stop-stage 1 ${local_data_opts} +fi + +if [ ${stage} -le 2 ] && [ ${stop_stage} -ge 2 ]; then + log "stage 2: Enh_ASR data preparation: combine enh and asr data" + + # dummy spk1.scp + for dset in tr05_real_noisy train_si284 dt05_real_isolated_1ch_track et05_real_isolated_1ch_track dt05_real_beamformit_2mics dt05_simu_beamformit_2mics et05_real_beamformit_2mics et05_simu_beamformit_2mics dt05_real_beamformit_5mics dt05_simu_beamformit_5mics et05_real_beamformit_5mics et05_simu_beamformit_5mics; do + cp data/${dset}/wav.scp data/${dset}/spk1.scp + done + cp data/tr05_simu_isolated_1ch_track/spk1.scp data/tr05_simu_noisy + + # utt2category + data/tr05_simu_noisy/utt2category + data/tr05_real_noisy/utt2category + data/train_si284/utt2category + data/dt05_simu_isolated_1ch_track/utt2category + data/dt05_real_isolated_1ch_track/utt2category + + utils/combine_data.sh --extra_files "utt2category spk1.scp" \ + data/tr05_multi_noisy data/tr05_simu_noisy data/tr05_real_noisy + utils/combine_data.sh --extra_files "utt2category spk1.scp" \ + data/tr05_multi_noisy_si284 data/tr05_multi_noisy data/train_si284 + utils/combine_data.sh --extra_files "utt2category spk1.scp" data/${train_dev} \ + data/dt05_simu_isolated_1ch_track data/dt05_real_isolated_1ch_track +fi + +if [ ${stage} -le 3 ] && [ ${stop_stage} -ge 3 ]; then + log "stage 3: Srctexts preparation" + local/chime4_asr_data.sh --stage 2 --stop-stage 2 +fi \ No newline at end of file diff --git a/egs2/chime4/enh_asr1/local/find_noisy_transcripts.pl b/egs2/chime4/enh_asr1/local/find_noisy_transcripts.pl new file mode 120000 index 00000000000..ae475b3b32d --- /dev/null +++ b/egs2/chime4/enh_asr1/local/find_noisy_transcripts.pl @@ -0,0 +1 @@ +../../enh1/local/find_noisy_transcripts.pl \ No newline at end of file diff --git a/egs2/chime4/enh_asr1/local/find_transcripts.pl b/egs2/chime4/enh_asr1/local/find_transcripts.pl new file mode 120000 index 00000000000..5e58a9d0c0e --- /dev/null +++ b/egs2/chime4/enh_asr1/local/find_transcripts.pl @@ -0,0 +1 @@ +../../enh1/local/find_transcripts.pl \ No newline at end of file diff --git a/egs2/chime4/enh_asr1/local/flist2scp.pl b/egs2/chime4/enh_asr1/local/flist2scp.pl new file mode 120000 index 00000000000..c44f94660eb --- /dev/null +++ b/egs2/chime4/enh_asr1/local/flist2scp.pl @@ -0,0 +1 @@ +../../enh1/local/flist2scp.pl \ No newline at end of file diff --git a/egs2/chime4/enh_asr1/local/localize.m b/egs2/chime4/enh_asr1/local/localize.m new file mode 120000 index 
diff --git a/egs2/chime4/enh_asr1/local/find_noisy_transcripts.pl b/egs2/chime4/enh_asr1/local/find_noisy_transcripts.pl new file mode 120000 index 00000000000..ae475b3b32d --- /dev/null +++ b/egs2/chime4/enh_asr1/local/find_noisy_transcripts.pl @@ -0,0 +1 @@ +../../enh1/local/find_noisy_transcripts.pl \ No newline at end of file diff --git a/egs2/chime4/enh_asr1/local/find_transcripts.pl b/egs2/chime4/enh_asr1/local/find_transcripts.pl new file mode 120000 index 00000000000..5e58a9d0c0e --- /dev/null +++ b/egs2/chime4/enh_asr1/local/find_transcripts.pl @@ -0,0 +1 @@ +../../enh1/local/find_transcripts.pl \ No newline at end of file diff --git a/egs2/chime4/enh_asr1/local/flist2scp.pl b/egs2/chime4/enh_asr1/local/flist2scp.pl new file mode 120000 index 00000000000..c44f94660eb --- /dev/null +++ b/egs2/chime4/enh_asr1/local/flist2scp.pl @@ -0,0 +1 @@ +../../enh1/local/flist2scp.pl \ No newline at end of file diff --git a/egs2/chime4/enh_asr1/local/localize.m b/egs2/chime4/enh_asr1/local/localize.m new file mode 120000 index 00000000000..f93a989f0ad --- /dev/null +++ b/egs2/chime4/enh_asr1/local/localize.m @@ -0,0 +1 @@ +../../enh1/local/localize.m \ No newline at end of file diff --git a/egs2/chime4/enh_asr1/local/make_stft.sh b/egs2/chime4/enh_asr1/local/make_stft.sh new file mode 120000 index 00000000000..cf9038f4ea2 --- /dev/null +++ b/egs2/chime4/enh_asr1/local/make_stft.sh @@ -0,0 +1 @@ +../../asr1/local/make_stft.sh \ No newline at end of file diff --git a/egs2/chime4/enh_asr1/local/ndx2flist.pl b/egs2/chime4/enh_asr1/local/ndx2flist.pl new file mode 120000 index 00000000000..5f79e7991f9 --- /dev/null +++ b/egs2/chime4/enh_asr1/local/ndx2flist.pl @@ -0,0 +1 @@ +../../asr1/local/ndx2flist.pl \ No newline at end of file diff --git a/egs2/chime4/enh_asr1/local/normalize_transcript.pl b/egs2/chime4/enh_asr1/local/normalize_transcript.pl new file mode 120000 index 00000000000..1be067e3703 --- /dev/null +++ b/egs2/chime4/enh_asr1/local/normalize_transcript.pl @@ -0,0 +1 @@ +../../enh1/local/normalize_transcript.pl \ No newline at end of file diff --git a/egs2/clarity21/enh_2021/local/path.sh b/egs2/chime4/enh_asr1/local/path.sh similarity index 100% rename from egs2/clarity21/enh_2021/local/path.sh rename to egs2/chime4/enh_asr1/local/path.sh diff --git a/egs2/chime4/enh_asr1/local/real_enhan_chime4_data_prep.sh b/egs2/chime4/enh_asr1/local/real_enhan_chime4_data_prep.sh new file mode 120000 index 00000000000..13c906eba90 --- /dev/null +++ b/egs2/chime4/enh_asr1/local/real_enhan_chime4_data_prep.sh @@ -0,0 +1 @@ +../../asr1/local/real_enhan_chime4_data_prep.sh \ No newline at end of file diff --git a/egs2/chime4/enh_asr1/local/real_ext_chime4_data_prep.sh b/egs2/chime4/enh_asr1/local/real_ext_chime4_data_prep.sh new file mode 120000 index 00000000000..6620a1d2eb4 --- /dev/null +++ b/egs2/chime4/enh_asr1/local/real_ext_chime4_data_prep.sh @@ -0,0 +1 @@ +../../enh1/local/real_ext_chime4_data_prep.sh \ No newline at end of file diff --git a/egs2/chime4/enh_asr1/local/real_noisy_chime4_data_prep.sh b/egs2/chime4/enh_asr1/local/real_noisy_chime4_data_prep.sh new file mode 120000 index 00000000000..86d5a8cca3b --- /dev/null +++ b/egs2/chime4/enh_asr1/local/real_noisy_chime4_data_prep.sh @@ -0,0 +1 @@ +../../enh1/local/real_noisy_chime4_data_prep.sh \ No newline at end of file diff --git a/egs2/chime4/enh_asr1/local/run_beamform_2ch_track.sh b/egs2/chime4/enh_asr1/local/run_beamform_2ch_track.sh new file mode 120000 index 00000000000..eb7894626ea --- /dev/null +++ b/egs2/chime4/enh_asr1/local/run_beamform_2ch_track.sh @@ -0,0 +1 @@ +../../asr1/local/run_beamform_2ch_track.sh \ No newline at end of file diff --git a/egs2/chime4/enh_asr1/local/run_beamform_6ch_track.sh b/egs2/chime4/enh_asr1/local/run_beamform_6ch_track.sh new file mode 120000 index 00000000000..d8609c18f57 --- /dev/null +++ b/egs2/chime4/enh_asr1/local/run_beamform_6ch_track.sh @@ -0,0 +1 @@ +../../asr1/local/run_beamform_6ch_track.sh \ No newline at end of file diff --git a/egs2/chime4/enh_asr1/local/show_enhance_results.sh b/egs2/chime4/enh_asr1/local/show_enhance_results.sh new file mode 120000 index 00000000000..7be0ac655cd --- /dev/null +++ b/egs2/chime4/enh_asr1/local/show_enhance_results.sh @@ -0,0 +1 @@ +../../asr1/local/show_enhance_results.sh \ No newline at end of file diff --git a/egs2/chime4/enh_asr1/local/simu_enhan_chime4_data_prep.sh b/egs2/chime4/enh_asr1/local/simu_enhan_chime4_data_prep.sh new file mode 120000 index 00000000000..f1227dc8071 --- /dev/null +++ b/egs2/chime4/enh_asr1/local/simu_enhan_chime4_data_prep.sh @@ -0,0 +1 @@
+../../asr1/local/simu_enhan_chime4_data_prep.sh \ No newline at end of file diff --git a/egs2/chime4/enh_asr1/local/simu_ext_chime4_data_prep.sh b/egs2/chime4/enh_asr1/local/simu_ext_chime4_data_prep.sh new file mode 120000 index 00000000000..58b7195ba04 --- /dev/null +++ b/egs2/chime4/enh_asr1/local/simu_ext_chime4_data_prep.sh @@ -0,0 +1 @@ +../../enh1/local/simu_ext_chime4_data_prep.sh \ No newline at end of file diff --git a/egs2/chime4/enh_asr1/local/simu_noisy_chime4_data_prep.sh b/egs2/chime4/enh_asr1/local/simu_noisy_chime4_data_prep.sh new file mode 120000 index 00000000000..da4d7f621c7 --- /dev/null +++ b/egs2/chime4/enh_asr1/local/simu_noisy_chime4_data_prep.sh @@ -0,0 +1 @@ +../../enh1/local/simu_noisy_chime4_data_prep.sh \ No newline at end of file diff --git a/egs2/chime4/enh_asr1/local/sym_channel.py b/egs2/chime4/enh_asr1/local/sym_channel.py new file mode 120000 index 00000000000..9901c190202 --- /dev/null +++ b/egs2/chime4/enh_asr1/local/sym_channel.py @@ -0,0 +1 @@ +../../asr1/local/sym_channel.py \ No newline at end of file diff --git a/egs2/chime4/enh_asr1/local/wsj_data_prep.sh b/egs2/chime4/enh_asr1/local/wsj_data_prep.sh new file mode 120000 index 00000000000..2ba8ba465af --- /dev/null +++ b/egs2/chime4/enh_asr1/local/wsj_data_prep.sh @@ -0,0 +1 @@ +../../asr1/local/wsj_data_prep.sh \ No newline at end of file diff --git a/egs2/chime4/enh_asr1/local/wsj_format_data.sh b/egs2/chime4/enh_asr1/local/wsj_format_data.sh new file mode 120000 index 00000000000..036fb8b8689 --- /dev/null +++ b/egs2/chime4/enh_asr1/local/wsj_format_data.sh @@ -0,0 +1 @@ +../../asr1/local/wsj_format_data.sh \ No newline at end of file diff --git a/egs2/chime4/enh_asr1/path.sh b/egs2/chime4/enh_asr1/path.sh new file mode 120000 index 00000000000..f2720c6899b --- /dev/null +++ b/egs2/chime4/enh_asr1/path.sh @@ -0,0 +1 @@ +../../TEMPLATE/enh_asr1/path.sh \ No newline at end of file diff --git a/egs2/chime4/enh_asr1/pyscripts b/egs2/chime4/enh_asr1/pyscripts new file mode 120000 index 00000000000..008f9bd4bc5 --- /dev/null +++ b/egs2/chime4/enh_asr1/pyscripts @@ -0,0 +1 @@ +../../TEMPLATE/enh_asr1/pyscripts \ No newline at end of file diff --git a/egs2/chime4/enh_asr1/run.sh b/egs2/chime4/enh_asr1/run.sh new file mode 100755 index 00000000000..c42213e9441 --- /dev/null +++ b/egs2/chime4/enh_asr1/run.sh @@ -0,0 +1,45 @@ +#!/usr/bin/env bash +# Set bash to 'debug' mode, it will exit on : +# -e 'error', -u 'undefined variable', -o ... 
'error in pipeline', -x 'print commands', +set -e +set -u +set -o pipefail + + +extra_annotations= + +train_set=tr05_multi_noisy_si284 # tr05_multi_noisy (original training data) or tr05_multi_noisy_si284 (add si284 data) +valid_set=dt05_multi_isolated_1ch_track +test_sets="\ +dt05_real_isolated_1ch_track dt05_simu_isolated_1ch_track et05_real_isolated_1ch_track et05_simu_isolated_1ch_track \ +dt05_real_beamformit_2mics dt05_simu_beamformit_2mics et05_real_beamformit_2mics et05_simu_beamformit_2mics \ +dt05_real_beamformit_5mics dt05_simu_beamformit_5mics et05_real_beamformit_5mics et05_simu_beamformit_5mics \ +" + +enh_asr_config=conf/train_enh_asr_convtasnet_fbank_transformer.yaml +inference_config=conf/decode_asr_transformer.yaml +lm_config=conf/train_lm_transformer.yaml + + +use_word_lm=false +word_vocab_size=65000 + +./enh_asr.sh \ + --lang en \ + --spk_num 1 \ + --ref_channel 3 \ + --local_data_opts "--extra-annotations ${extra_annotations}" \ + --nlsyms_txt data/nlsyms.txt \ + --token_type char \ + --feats_type raw \ + --feats_normalize utt_mvn \ + --enh_asr_config "${enh_asr_config}" \ + --inference_config "${inference_config}" \ + --lm_config "${lm_config}" \ + --use_word_lm ${use_word_lm} \ + --word_vocab_size ${word_vocab_size} \ + --train_set "${train_set}" \ + --valid_set "${valid_set}" \ + --test_sets "${test_sets}" \ + --bpe_train_text "data/${train_set}/text" \ + --lm_train_text "data/${train_set}/text data/local/other_text/text" "$@" diff --git a/egs2/chime4/enh_asr1/scripts b/egs2/chime4/enh_asr1/scripts new file mode 120000 index 00000000000..6c0f28ef23c --- /dev/null +++ b/egs2/chime4/enh_asr1/scripts @@ -0,0 +1 @@ +../../TEMPLATE/enh_asr1/scripts \ No newline at end of file diff --git a/egs2/chime4/enh_asr1/steps b/egs2/chime4/enh_asr1/steps new file mode 120000 index 00000000000..91f2d234e20 --- /dev/null +++ b/egs2/chime4/enh_asr1/steps @@ -0,0 +1 @@ +../../../tools/kaldi/egs/wsj/s5/steps \ No newline at end of file diff --git a/egs2/chime4/enh_asr1/utils b/egs2/chime4/enh_asr1/utils new file mode 120000 index 00000000000..f49247da827 --- /dev/null +++ b/egs2/chime4/enh_asr1/utils @@ -0,0 +1 @@ +../../../tools/kaldi/egs/wsj/s5/utils \ No newline at end of file
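A usage note for the run.sh above: extra_annotations is left empty in the script, and the trailing "$@" forwards any command-line options to enh_asr.sh, whose option parser lets a later option override an earlier one. Under that assumption, a minimal sketch of launching the recipe without editing the script (the path is a placeholder):

# Illustrative launch of the CHiME-4 enh_asr1 recipe; point the path at
# the unpacked CHiME4 extra annotations. Because "$@" comes last, this
# --local_data_opts should take precedence over the empty one in run.sh.
./run.sh --local_data_opts "--extra-annotations /path/to/CHiME4_diff/CHiME3/data/annotations"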
diff --git a/egs2/chime6/asr1/README.md b/egs2/chime6/asr1/README.md new file mode 100644 index 00000000000..45a7200ec9f --- /dev/null +++ b/egs2/chime6/asr1/README.md @@ -0,0 +1,30 @@ + +# RESULTS +## Environments +- date: `Tue May 3 16:47:10 EDT 2022` +- python version: `3.9.12 (main, Apr 5 2022, 06:56:58) [GCC 7.5.0]` +- espnet version: `espnet 202204` +- pytorch version: `pytorch 1.10.1` +- Git hash: `b757b89d45d5574cebf44e225cbe32e3e9e4f522` + - Commit date: `Mon May 2 09:21:08 2022 -0400` + +## asr_train_asr_transformer_wavlm_lr1e-3_specaug_accum1_preenc128_warmup20k_raw_en_bpe1000_sp +- Pretrained model: https://huggingface.co/espnet/simpleoier_chime6_asr_transformer_wavlm_lr1e-3 +### WER + +|dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err| +|---|---|---|---|---|---|---|---|---| +|decode_asr_transformer_asr_model_valid.acc.ave_5best/dev_gss_multiarray|7437|58881|69.4|20.2|10.4|8.6|39.1|75.8| + +### CER + +|dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err| +|---|---|---|---|---|---|---|---|---| +|decode_asr_transformer_asr_model_valid.acc.ave_5best/dev_gss_multiarray|7437|280767|80.6|7.4|12.0|8.9|28.3|76.6| + +### TER + +|dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err| +|---|---|---|---|---|---|---|---|---| +|decode_asr_transformer_asr_model_valid.acc.ave_5best/dev_gss_multiarray|7437|92680|68.9|17.7|13.4|8.2|39.3|76.6| + diff --git a/egs2/chime6/asr1/asr.sh b/egs2/chime6/asr1/asr.sh new file mode 120000 index 00000000000..60b05122cfd --- /dev/null +++ b/egs2/chime6/asr1/asr.sh @@ -0,0 +1 @@ +../../TEMPLATE/asr1/asr.sh \ No newline at end of file diff --git a/egs2/chime6/asr1/cmd.sh b/egs2/chime6/asr1/cmd.sh new file mode 100644 index 00000000000..2aae6919fef --- /dev/null +++ b/egs2/chime6/asr1/cmd.sh @@ -0,0 +1,110 @@ +# ====== About run.pl, queue.pl, slurm.pl, and ssh.pl ====== +# Usage: <cmd>.pl [options] JOB=1:<nj> <log> <command...> +# e.g. +# run.pl --mem 4G JOB=1:10 echo.JOB.log echo JOB +# +# Options: +# --time