Merge release r1.20.0 to main (NVIDIA#7167)
* update package info

Signed-off-by: ericharper <complex451@gmail.com>

* Add ASR with TTS Tutorial. Fix enhancer usage. (NVIDIA#6955)

* Add ASR with TTS Tutorial
* Fix enhancer usage

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* install_bs (NVIDIA#7019)

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* Fix typo and branch in tutorial (NVIDIA#7048)

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* fix syntax error introduced in PR-7079 (NVIDIA#7102)

* fix syntax error introduced in PR-7079

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* fixes for pr review

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

---------

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* fix links for TN (NVIDIA#7117)

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update branch (NVIDIA#7135)

Signed-off-by: ericharper <complex451@gmail.com>

* Fixed main and merging this to r1.20 (NVIDIA#7127)

* Fixed main and merging this to r1.20

Signed-off-by: Taejin Park <tango4j@gmail.com>

* Update vad_utils.py

Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>

---------

Signed-off-by: Taejin Park <tango4j@gmail.com>
Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Co-authored-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>

* update branch

Signed-off-by: ericharper <complex451@gmail.com>

* fix version

Signed-off-by: ericharper <complex451@gmail.com>

* resolve conflict the other way

Signed-off-by: ericharper <complex451@gmail.com>

* keep both

Signed-off-by: ericharper <complex451@gmail.com>

* revert keep both

Signed-off-by: ericharper <complex451@gmail.com>

---------

Signed-off-by: ericharper <complex451@gmail.com>
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>
Signed-off-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Taejin Park <tango4j@gmail.com>
Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Co-authored-by: Vladimir Bataev <vbataev@nvidia.com>
Co-authored-by: Nikolay Karpov <karpnv@gmail.com>
Co-authored-by: bene-ges <antonova_sasha@list.ru>
Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
Co-authored-by: Taejin Park <tango4j@gmail.com>
Co-authored-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Signed-off-by: dorotat <dorotat@nvidia.com>
7 people authored and dorotat-nv committed Aug 24, 2023
1 parent b94c0ad commit 0ce42d1
Showing 6 changed files with 8 additions and 6 deletions.
2 changes: 1 addition & 1 deletion Dockerfile
@@ -94,7 +94,7 @@ COPY . .

# start building the final container
FROM nemo-deps as nemo
-ARG NEMO_VERSION=1.20.0
+ARG NEMO_VERSION=1.21.0

# Check that NEMO_VERSION is set. Build will fail without this. Expose NEMO and base container
# version information as runtime environment variable for introspection purposes
2 changes: 1 addition & 1 deletion nemo/collections/asr/parts/utils/vad_utils.py
@@ -732,7 +732,7 @@ def generate_vad_segment_table(
vad_pred_filepath_list = [os.path.join(vad_pred_dir, x) for x in os.listdir(vad_pred_dir) if x.endswith(suffixes)]

if not out_dir:
-out_dir_name = "seg_output_"
+out_dir_name = "seg_output"
for key in postprocessing_params:
out_dir_name = out_dir_name + "-" + str(key) + str(postprocessing_params[key])

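The effect of the `vad_utils.py` change on the default output directory name can be illustrated with a minimal sketch. The helper `build_out_dir_name` and the sample parameter values below are hypothetical and only mirror the loop shown in the hunk above; they are not part of NeMo.

```python
def build_out_dir_name(postprocessing_params: dict, prefix: str) -> str:
    """Hypothetical helper mirroring the loop in generate_vad_segment_table."""
    out_dir_name = prefix
    # Each parameter is appended as "-<key><value>", as in the hunk above.
    for key in postprocessing_params:
        out_dir_name = out_dir_name + "-" + str(key) + str(postprocessing_params[key])
    return out_dir_name


# Illustrative values only; real parameters come from the caller's config.
params = {"onset": 0.5, "offset": 0.3}

old_name = build_out_dir_name(params, prefix="seg_output_")  # before this commit
new_name = build_out_dir_name(params, prefix="seg_output")   # after this commit

print(old_name)
print(new_name)
```

Because the loop already inserts a `-` before every key, the old prefix produced a stray `_-` sequence in the directory name; the new prefix avoids it.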
2 changes: 1 addition & 1 deletion nemo/package_info.py
@@ -14,7 +14,7 @@


MAJOR = 1
-MINOR = 20
+MINOR = 21
PATCH = 0
PRE_RELEASE = 'rc0'

2 changes: 1 addition & 1 deletion tutorials/asr/Offline_ASR.ipynb
@@ -655,4 +655,4 @@
"outputs": []
}
]
-}
+}
4 changes: 3 additions & 1 deletion tutorials/nlp/SpellMapper_English_ASR_Customization.ipynb
@@ -934,7 +934,9 @@
"id": "9T3CZcCAmxCz"
},
"source": [
-"Now we have a folder with generated audios `audio/*.wav` and a nemo manifest with json records like `{\"audio_filepath\": \"audio/0.wav\", \"text\": \"no renal auditory or vestibular toxicity was observed\", \"orig_text\": \"No renal, auditory, or vestibular toxicity was observed.\"}`."
+"Now we have a folder with generated audios `audio/*.wav` and a nemo manifest with json records like `{\"audio_filepath\": \"audio/0.wav\", \"text\": \"no renal auditory or vestibular toxicity was observed\", \"orig_text\": \"No renal, auditory, or vestibular toxicity was observed.\"}`.",
+"\n",
+"Note that TTS model may mispronounce some unknown words, for example, abbreviations like `tRNAs`."
]
},
{
2 changes: 1 addition & 1 deletion tutorials/tools/CTC_Segmentation_Tutorial.ipynb
@@ -280,7 +280,7 @@
"* `max_length` argument - max number of words in a segment for alignment (used only if there are no punctuation marks present in the original text. Long non-speech segments are better for segments split and are more likely to co-occur with punctuation marks. Random text split could deteriorate the quality of the alignment.\n",
"* out-of-vocabulary words will be removed based on pre-trained ASR model vocabulary, and the text will be changed to lowercase \n",
"* sentences for alignment with the original punctuation and capitalization will be stored under `$OUTPUT_DIR/processed/*_with_punct.txt`\n",
-"* numbers will be converted from written to their spoken form with `num2words` package. For English, it's recommended to use NeMo normalization tool use `--use_nemo_normalization` argument (not supported if running this segmentation tutorial in Colab, see the text normalization tutorial: [`https://github.com/NVIDIA/NeMo-text-processing/blob/r1.19.0/tutorials/Text_(Inverse)_Normalization.ipynb`](https://colab.research.google.com/github/NVIDIA/NeMo-text-processing/blob/r1.19.0/tutorials/Text_(Inverse)_Normalization.ipynb) for more details). Even `num2words` normalization is usually enough for proper segmentation. However, it does not take audio into account. NeMo supports audio-based normalization for English, German and Russian languages that can be applied to the segmented data as a post-processing step. Audio-based normalization produces multiple normalization options. For example, `901` could be normalized as `nine zero one` or `nine hundred and one`. The audio-based normalization chooses the best match among the possible normalization options and the transcript based on the character error rate. See [https://github.com/NVIDIA/NeMo-text-processing/blob/main/nemo_text_processing/text_normalization/normalize_with_audio.py](https://github.com/NVIDIA/NeMo-text-processing/blob/r1.19.0/nemo_text_processing/text_normalization/normalize_with_audio.py) for more details.\n",
+"* numbers will be converted from written to their spoken form with `num2words` package. For English, it's recommended to use NeMo normalization tool use `--use_nemo_normalization` argument (not supported if running this segmentation tutorial in Colab, see the text normalization tutorial: [`https://github.com/NVIDIA/NeMo-text-processing/blob/main/tutorials/Text_(Inverse)_Normalization.ipynb`](https://colab.research.google.com/github/NVIDIA/NeMo-text-processing/blob/main/tutorials/Text_(Inverse)_Normalization.ipynb) for more details). Even `num2words` normalization is usually enough for proper segmentation. However, it does not take audio into account. NeMo supports audio-based normalization for English, German and Russian languages that can be applied to the segmented data as a post-processing step. Audio-based normalization produces multiple normalization options. For example, `901` could be normalized as `nine zero one` or `nine hundred and one`. The audio-based normalization chooses the best match among the possible normalization options and the transcript based on the character error rate. See [https://github.com/NVIDIA/NeMo-text-processing/blob/main/nemo_text_processing/text_normalization/normalize_with_audio.py](https://github.com/NVIDIA/NeMo-text-processing/blob/main/nemo_text_processing/text_normalization/normalize_with_audio.py) for more details.\n",
"\n",
"### Audio preprocessing:\n",
"* non '.wav' audio files will be converted to `.wav` format\n",
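The tutorial text in the hunk above says audio-based normalization picks, among several normalization options, the one that best matches the transcript by character error rate. A rough sketch of that selection idea follows. It is not the `normalize_with_audio.py` implementation: the function names are hypothetical, and treating the transcript as the reference string is an assumption.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(hypothesis: str, reference: str) -> float:
    """Character error rate: edit distance divided by the reference length."""
    if not reference:
        return 0.0 if not hypothesis else 1.0
    return edit_distance(hypothesis, reference) / len(reference)


def pick_normalization(options: list[str], transcript: str) -> str:
    """Return the option with the lowest CER against the transcript (assumed reference)."""
    return min(options, key=lambda opt: cer(opt, transcript))


# Hypothetical example using the `901` options from the tutorial text.
options = ["nine zero one", "nine hundred and one"]
best = pick_normalization(options, transcript="nine hundred and one")
print(best)
```

Here the transcript stands in for ASR output on the segment's audio; the option that matches it exactly has a CER of zero and is selected.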
