Merge release r1.20.0 to main (NVIDIA#7167)

* update package info
Signed-off-by: ericharper <complex451@gmail.com>

* Add ASR with TTS Tutorial. Fix enhancer usage. (NVIDIA#6955)
  * Add ASR with TTS Tutorial
  * Fix enhancer usage
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* install_bs (NVIDIA#7019)
Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* Fix typo and branch in tutorial (NVIDIA#7048)
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* fix syntax error introduced in PR-7079 (NVIDIA#7102)
  * fix syntax error introduced in PR-7079
  Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>
  * fixes for pr review
  Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>
---------
Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* fix links for TN (NVIDIA#7117)
Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update branch (NVIDIA#7135)
Signed-off-by: ericharper <complex451@gmail.com>

* Fixed main and merging this to r1.20 (NVIDIA#7127)
  * Fixed main and merging this to r1.20
  Signed-off-by: Taejin Park <tango4j@gmail.com>
  * Update vad_utils.py
  Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
---------
Signed-off-by: Taejin Park <tango4j@gmail.com>
Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Co-authored-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>

* update branch
Signed-off-by: ericharper <complex451@gmail.com>

* fix version
Signed-off-by: ericharper <complex451@gmail.com>

* resolve conflict the other way
Signed-off-by: ericharper <complex451@gmail.com>

* keep both
Signed-off-by: ericharper <complex451@gmail.com>

* revert keep both
Signed-off-by: ericharper <complex451@gmail.com>

---------
Signed-off-by: ericharper <complex451@gmail.com>
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>
Signed-off-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Taejin Park <tango4j@gmail.com>
Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Co-authored-by: Vladimir Bataev <vbataev@nvidia.com>
Co-authored-by: Nikolay Karpov <karpnv@gmail.com>
Co-authored-by: bene-ges <antonova_sasha@list.ru>
Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
Co-authored-by: Taejin Park <tango4j@gmail.com>
Co-authored-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Signed-off-by: dorotat <dorotat@nvidia.com>
dorotat-nv · Aug 24, 2023 · 0ce42d1
1 parent b94c0ad
Showing 6 changed files with 8 additions and 6 deletions.
diff --git a/Dockerfile b/Dockerfile
@@ -94,7 +94,7 @@ COPY . .
 
 # start building the final container
 FROM nemo-deps as nemo
-ARG NEMO_VERSION=1.20.0
+ARG NEMO_VERSION=1.21.0
 
 # Check that NEMO_VERSION is set. Build will fail without this. Expose NEMO and base container
 # version information as runtime environment variable for introspection purposes

diff --git a/nemo/collections/asr/parts/utils/vad_utils.py b/nemo/collections/asr/parts/utils/vad_utils.py
@@ -732,7 +732,7 @@ def generate_vad_segment_table(
     vad_pred_filepath_list = [os.path.join(vad_pred_dir, x) for x in os.listdir(vad_pred_dir) if x.endswith(suffixes)]
 
     if not out_dir:
-        out_dir_name = "seg_output_"
+        out_dir_name = "seg_output"
         for key in postprocessing_params:
             out_dir_name = out_dir_name + "-" + str(key) + str(postprocessing_params[key])
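
The effect of dropping the trailing underscore can be illustrated with a minimal sketch of the naming loop shown in this hunk. The helper name `build_out_dir_name` and the parameter values are hypothetical and used only for illustration; the loop body mirrors the context lines above.

```python
def build_out_dir_name(postprocessing_params, prefix="seg_output"):
    """Hypothetical helper mirroring the naming loop in the hunk above."""
    out_dir_name = prefix
    for key in postprocessing_params:
        # Each parameter is appended as "-<key><value>".
        out_dir_name = out_dir_name + "-" + str(key) + str(postprocessing_params[key])
    return out_dir_name


# Illustrative parameter values, not taken from the commit.
params = {"onset": 0.5, "offset": 0.3}

old_name = build_out_dir_name(params, prefix="seg_output_")
new_name = build_out_dir_name(params, prefix="seg_output")

print(old_name)  # seg_output_-onset0.5-offset0.3
print(new_name)  # seg_output-onset0.5-offset0.3
```

With the old prefix, the first separator produced an awkward `_-` sequence; the new prefix yields a uniformly dash-separated name.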
 

diff --git a/nemo/package_info.py b/nemo/package_info.py
@@ -14,7 +14,7 @@
 
 
 MAJOR = 1
-MINOR = 20
+MINOR = 21
 PATCH = 0
 PRE_RELEASE = 'rc0'
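
As a rough sketch of what this bump means for the reported version, the components above can be joined into version strings. The `VERSION` tuple and the joining logic below are an assumption about how such a file typically composes its strings; they are not shown in this diff.

```python
# Values as they stand after this commit.
MAJOR = 1
MINOR = 21
PATCH = 0
PRE_RELEASE = 'rc0'

# Assumed composition: numeric parts joined with dots, pre-release tag appended.
VERSION = (MAJOR, MINOR, PATCH, PRE_RELEASE)
short_version = '.'.join(map(str, VERSION[:3]))
full_version = short_version + ''.join(VERSION[3:])

print(short_version)  # 1.21.0
print(full_version)   # 1.21.0rc0
```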
 

diff --git a/tutorials/asr/Offline_ASR.ipynb b/tutorials/asr/Offline_ASR.ipynb
@@ -655,4 +655,4 @@
       "outputs": []
     }
   ]
-}
+}
diff --git a/tutorials/nlp/SpellMapper_English_ASR_Customization.ipynb b/tutorials/nlp/SpellMapper_English_ASR_Customization.ipynb
@@ -934,7 +934,9 @@
         "id": "9T3CZcCAmxCz"
       },
       "source": [
-        "Now we have a folder with generated audios `audio/*.wav` and a nemo manifest with json records like `{\"audio_filepath\": \"audio/0.wav\", \"text\": \"no renal auditory or vestibular toxicity was observed\", \"orig_text\": \"No renal, auditory, or vestibular toxicity was observed.\"}`."
+        "Now we have a folder with generated audios `audio/*.wav` and a nemo manifest with json records like `{\"audio_filepath\": \"audio/0.wav\", \"text\": \"no renal auditory or vestibular toxicity was observed\", \"orig_text\": \"No renal, auditory, or vestibular toxicity was observed.\"}`.",
+        "\n",
+        "Note that TTS model may mispronounce some unknown words, for example, abbreviations like `tRNAs`."
       ]
     },
     {

diff --git a/tutorials/tools/CTC_Segmentation_Tutorial.ipynb b/tutorials/tools/CTC_Segmentation_Tutorial.ipynb
@@ -280,7 +280,7 @@
         "* `max_length` argument - max number of words in a segment for alignment (used only if there are no punctuation marks present in the original text. Long non-speech segments are better for segments split and are more likely to co-occur with punctuation marks. Random text split could deteriorate the quality of the alignment.\n",
         "* out-of-vocabulary words will be removed based on pre-trained ASR model vocabulary, and the text will be changed to lowercase \n",
         "* sentences for alignment with the original punctuation and capitalization will be stored under  `$OUTPUT_DIR/processed/*_with_punct.txt`\n",
-        "* numbers will be converted from written to their spoken form with `num2words` package. For English, it's recommended to use NeMo normalization tool use `--use_nemo_normalization` argument (not supported if running this segmentation tutorial in Colab, see the text normalization tutorial: [`https://github.com/NVIDIA/NeMo-text-processing/blob/r1.19.0/tutorials/Text_(Inverse)_Normalization.ipynb`](https://colab.research.google.com/github/NVIDIA/NeMo-text-processing/blob/r1.19.0/tutorials/Text_(Inverse)_Normalization.ipynb) for more details). Even `num2words` normalization is usually enough for proper segmentation. However, it does not take audio into account. NeMo supports audio-based normalization for English, German and Russian languages that can be applied to the segmented data as a post-processing step. Audio-based normalization produces multiple normalization options. For example, `901` could be normalized as `nine zero one` or `nine hundred and one`. The audio-based normalization chooses the best match among the possible normalization options and the transcript based on the character error rate. See [https://github.com/NVIDIA/NeMo-text-processing/blob/main/nemo_text_processing/text_normalization/normalize_with_audio.py](https://github.com/NVIDIA/NeMo-text-processing/blob/r1.19.0/nemo_text_processing/text_normalization/normalize_with_audio.py) for more details.\n",
+        "* numbers will be converted from written to their spoken form with `num2words` package. For English, it's recommended to use NeMo normalization tool use `--use_nemo_normalization` argument (not supported if running this segmentation tutorial in Colab, see the text normalization tutorial: [`https://github.com/NVIDIA/NeMo-text-processing/blob/main/tutorials/Text_(Inverse)_Normalization.ipynb`](https://colab.research.google.com/github/NVIDIA/NeMo-text-processing/blob/main/tutorials/Text_(Inverse)_Normalization.ipynb) for more details). Even `num2words` normalization is usually enough for proper segmentation. However, it does not take audio into account. NeMo supports audio-based normalization for English, German and Russian languages that can be applied to the segmented data as a post-processing step. Audio-based normalization produces multiple normalization options. For example, `901` could be normalized as `nine zero one` or `nine hundred and one`. The audio-based normalization chooses the best match among the possible normalization options and the transcript based on the character error rate. See [https://github.com/NVIDIA/NeMo-text-processing/blob/main/nemo_text_processing/text_normalization/normalize_with_audio.py](https://github.com/NVIDIA/NeMo-text-processing/blob/main/nemo_text_processing/text_normalization/normalize_with_audio.py) for more details.\n",
         "\n",
         "### Audio preprocessing:\n",
         "* non '.wav' audio files will be converted to `.wav` format\n",
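
The CTC segmentation tutorial text in this diff says that audio-based normalization picks, among several normalization options, the one that best matches the transcript by character error rate. A minimal sketch of that selection idea follows. It is not the NeMo implementation: the function names are hypothetical, and the transcript is assumed to be an ASR output supplied by the caller.

```python
def edit_distance(a: str, b: str) -> int:
    """Character-level Levenshtein distance using a rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def cer(hypothesis: str, reference: str) -> float:
    """Character error rate of hypothesis against reference."""
    return edit_distance(hypothesis, reference) / max(len(reference), 1)


def pick_best_normalization(options, transcript):
    """Return the option with the lowest CER against the transcript."""
    return min(options, key=lambda option: cer(option, transcript))


# The two options for `901` come from the tutorial text; the transcript is illustrative.
options = ["nine zero one", "nine hundred and one"]
best = pick_best_normalization(options, "nine hundred and one")
print(best)  # nine hundred and one
```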