From 643813cf5e695033df6c95a44df799dec7556cb3 Mon Sep 17 00:00:00 2001
From: Elena Rastorgueva
Date: Wed, 8 May 2024 10:16:13 -0700
Subject: [PATCH 1/2] add TN/ITN link in speech tools list

Signed-off-by: Elena Rastorgueva
---
 docs/source/nlp/text_normalization/wfst/intro.rst | 2 +-
 docs/source/tools/intro.rst                       | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/docs/source/nlp/text_normalization/wfst/intro.rst b/docs/source/nlp/text_normalization/wfst/intro.rst
index a5d6ab3a8c5d..9805345b30b8 100644
--- a/docs/source/nlp/text_normalization/wfst/intro.rst
+++ b/docs/source/nlp/text_normalization/wfst/intro.rst
@@ -5,7 +5,7 @@ NeMo-text-processing supports Text Normalization (TN), audio-based TN and Invers
 
 .. warning::
 
-    *TN/ITN transitioned from [NVIDIA/NeMo](https://github.com/NVIDIA/NeMo) repository to a standalone [NVIDIA/NeMo-text-processing](https://github.com/NVIDIA/NeMo-text-processing) repository. All updates and discussions/issues should go to the new repository.*
+    TN/ITN transitioned from the `NVIDIA/NeMo <https://github.com/NVIDIA/NeMo>`__ repository to a standalone `NVIDIA/NeMo-text-processing <https://github.com/NVIDIA/NeMo-text-processing>`__ repository. All updates and discussions/issues should go to the new repository.
 
 WFST-based TN/ITN:
 
diff --git a/docs/source/tools/intro.rst b/docs/source/tools/intro.rst
index 5a08d05f3405..b38e435353c6 100644
--- a/docs/source/tools/intro.rst
+++ b/docs/source/tools/intro.rst
@@ -20,3 +20,4 @@ There are also additional NeMo-related tools hosted in separate github repositor
    :maxdepth: 1
 
    speech_data_processor
+   ../nlp/text_normalization/intro

From 80261f491fce61d889c32e94c12dc9e6b0416f77 Mon Sep 17 00:00:00 2001
From: Elena Rastorgueva
Date: Wed, 8 May 2024 11:05:59 -0700
Subject: [PATCH 2/2] fix TN docs warnings

Signed-off-by: Elena Rastorgueva
---
 .../nn_text_normalization.rst                 | 26 +++++++++----------
 .../text_normalization_as_tagging.rst         |  6 ++---
 .../wfst/wfst_text_normalization.rst          |  5 ++--
 .../wfst/wfst_text_processing_deployment.rst  |  5 ++--
 4 files changed, 20 insertions(+), 22 deletions(-)

diff --git a/docs/source/nlp/text_normalization/nn_text_normalization.rst b/docs/source/nlp/text_normalization/nn_text_normalization.rst
index 0de5ccefef05..87530dbcbc29 100644
--- a/docs/source/nlp/text_normalization/nn_text_normalization.rst
+++ b/docs/source/nlp/text_normalization/nn_text_normalization.rst
@@ -26,7 +26,7 @@ The term *duplex* refers to the fact that our system can be trained to do both T
 Quick Start Guide
 -----------------
 
-To run the pretrained models interactively see :ref:`inference_text_normalization`.
+To run the pretrained models interactively, see :ref:`inference_text_normalization_nn`.
 
 Available models
 ^^^^^^^^^^^^^^^^
@@ -79,7 +79,7 @@ The purpose of the preprocessing scripts is to standardize the format in order t
 We also changed punctuation class `PUNCT` to be treated like a plain token ( label changed from ` to ```), since we want to preserve punctuation even after normalization.
 For text normalization it is crucial to avoid unrecoverable errors, which are linguistically coherent and not semantic preserving.
 We noticed that due to data scarcity the model struggles verbalizing long numbers correctly, so we changed the ground truth for long numbers to digit by digit verbalization.
-We also ignore certain semiotic classes from neural verbalization, e.g. `ELECTRONIC` or `WHITELIST` -- `VERBATIM` and `LETTER` in the original dataset. Instead we label urls/email addresses and abbreviations as plain tokens, and handle it separately with WFST-based grammars, see :ref:`inference_text_normalization`.
+We also ignore certain semiotic classes from neural verbalization, e.g. `ELECTRONIC` or `WHITELIST` -- `VERBATIM` and `LETTER` in the original dataset. Instead we label URLs/email addresses and abbreviations as plain tokens, and handle them separately with WFST-based grammars, see :ref:`inference_text_normalization_nn`.
 This simplifies the task for the model and significantly reduces unrecoverable errors.
 
@@ -199,7 +199,7 @@ To enable training with the tarred dataset, add the following arguments:
 
     data.train_ds.use_tarred_dataset=True \
     data.train_ds.tar_metadata_file=\PATH_TO\\metadata.json
 
-.. _inference_text_normalization:
+.. _inference_text_normalization_nn:
 
 Model Inference
 ---------------
@@ -230,16 +230,16 @@ To run inference from a file adjust the previous command by
 
 This pipeline consists of
 
-    * WFST-based grammars to verbalize hard classes, such as urls and abbreviations.
-    * regex pre-preprocssing of the input, e.g.
-        * adding space around `-` in alpha-numerical words, e.g. `2-car` -> `2 - car`
-        * converting unicode fraction e.g. ½ to 1/2
-        * normalizing greek letters and some special characters, e.g. `+` -> `plus`
-    * Moses :cite:`nlp-textnorm-koehnetal2007moses`. tokenization/preprocessing of the input
-    * inference with neural tagger and decoder
-    * Moses postprocessing/ detokenization
-    * WFST-based grammars to verbalize some `VERBATIM`
-    * punctuation correction for TTS (to match the output punctuation to the input form)
+* WFST-based grammars to verbalize hard classes, such as URLs and abbreviations
+* regex pre-processing of the input, e.g.:
+    * adding space around `-` in alphanumeric words, e.g. `2-car` -> `2 - car`
+    * converting unicode fractions, e.g. ½ to 1/2
+    * normalizing Greek letters and some special characters, e.g. `+` -> `plus`
+* Moses :cite:`nlp-textnorm-koehnetal2007moses` tokenization/preprocessing of the input
+* inference with the neural tagger and decoder
+* Moses postprocessing/detokenization
+* WFST-based grammars to verbalize some `VERBATIM` tokens
+* punctuation correction for TTS (to match the output punctuation to the input form)
 
 Model Architecture
 ------------------
diff --git a/docs/source/nlp/text_normalization/text_normalization_as_tagging.rst b/docs/source/nlp/text_normalization/text_normalization_as_tagging.rst
index 702fb9425026..07e1fbd7702c 100644
--- a/docs/source/nlp/text_normalization/text_normalization_as_tagging.rst
+++ b/docs/source/nlp/text_normalization/text_normalization_as_tagging.rst
@@ -20,7 +20,7 @@ An example bash-script that runs inference and evaluation is provided here: `run
 Quick Start Guide
 -----------------
 
-To run the pretrained models see :ref:`inference_text_normalization`.
+To run the pretrained models, see :ref:`inference_text_normalization_tagging`.
 
 Available models
 ^^^^^^^^^^^^^^^^
@@ -115,7 +115,7 @@ Example of a training command:
 
 
 
-.. _inference_text_normalization:
+.. _inference_text_normalization_tagging:
 
 Model Inference
 ---------------
@@ -162,4 +162,4 @@ References
 .. bibliography:: tn_itn_all.bib
     :style: plain
     :labelprefix: NLP-TEXTNORM-TAG
-    :keyprefix: nlp-textnorm-tag
+    :keyprefix: nlp-textnorm-tag-
diff --git a/docs/source/nlp/text_normalization/wfst/wfst_text_normalization.rst b/docs/source/nlp/text_normalization/wfst/wfst_text_normalization.rst
index 7e1a34c3864e..8fab07e6e278 100644
--- a/docs/source/nlp/text_normalization/wfst/wfst_text_normalization.rst
+++ b/docs/source/nlp/text_normalization/wfst/wfst_text_normalization.rst
@@ -5,8 +5,7 @@ Text (Inverse) Normalization
 
 .. warning::
 
-    *TN/ITN transitioned from [NVIDIA/NeMo](https://github.com/NVIDIA/NeMo) repository to a standalone [NVIDIA/NeMo-text-processing](https://github.com/NVIDIA/NeMo-text-processing) repository. All updates and discussions/issues should go to the new repository.*
-
+    TN/ITN transitioned from the `NVIDIA/NeMo <https://github.com/NVIDIA/NeMo>`_ repository to a standalone `NVIDIA/NeMo-text-processing <https://github.com/NVIDIA/NeMo-text-processing>`_ repository. All updates and discussions/issues should go to the new repository.
 
 The `nemo_text_processing` Python package is based on WFST grammars :cite:`textprocessing-norm-mohri2005weighted` and supports:
 
@@ -188,7 +187,7 @@ Language Support Matrix
 
 See :ref:`Grammar customization ` for grammar customization details.
 
-See :ref:`Text Processing Deployment ` for deployment in C++ details.
+See :doc:`Text Processing Deployment <./wfst_text_processing_deployment>` for details on deployment in C++.
 
 WFST TN/ITN resources could be found in :ref:`here `.
 
diff --git a/docs/source/nlp/text_normalization/wfst/wfst_text_processing_deployment.rst b/docs/source/nlp/text_normalization/wfst/wfst_text_processing_deployment.rst
index 396a7cde578e..4d584e13526b 100644
--- a/docs/source/nlp/text_normalization/wfst/wfst_text_processing_deployment.rst
+++ b/docs/source/nlp/text_normalization/wfst/wfst_text_processing_deployment.rst
@@ -5,14 +5,13 @@ Deploy to Production with C++ backend
 
 .. warning::
 
-    *TN/ITN transitioned from [NVIDIA/NeMo](https://github.com/NVIDIA/NeMo) repository to a standalone [NVIDIA/NeMo-text-processing](https://github.com/NVIDIA/NeMo-text-processing) repository. All updates and discussions/issues should go to the new repository.*
-
+    TN/ITN transitioned from the `NVIDIA/NeMo <https://github.com/NVIDIA/NeMo>`_ repository to a standalone `NVIDIA/NeMo-text-processing <https://github.com/NVIDIA/NeMo-text-processing>`_ repository. All updates and discussions/issues should go to the new repository.
 
 NeMo-text-processing provides tools to deploy :doc:`TN and ITN <./wfst_text_normalization>` for production :cite:`textprocessing-deployment-zhang2021nemo`. It uses `Sparrowhawk <https://github.com/google/sparrowhawk>`_ :cite:`textprocessing-deployment-sparrowhawk` -- an open-source C++ framework by Google. The grammars written with NeMo-text-processing can be exported into an `OpenFST <https://www.openfst.org/>`_ Archive File (FAR) and dropped into Sparrowhawk.
 
- .. image:: images/deployment_pipeline.png
+ .. image:: ./images/deployment_pipeline.png
     :align: center
     :alt: Deployment pipeline
     :scale: 50%
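For readers of the WFST pages touched above (`wfst_text_normalization.rst`, `wfst_text_processing_deployment.rst`), the `nemo_text_processing` package they document can be exercised in a few lines. The sketch below follows the public NVIDIA/NeMo-text-processing README: the `Normalizer`/`InverseNormalizer` classes and their `input_case`, `lang`, and `verbose` arguments come from that README, while the input strings and the commented outputs are illustrative assumptions (exact verbalizations depend on the grammar version, and `pynini` must be available for the grammars to compile).

.. code-block:: python

    # Assumes `pip install nemo_text_processing` (pulls in pynini for WFST compilation).
    from nemo_text_processing.text_normalization.normalize import Normalizer
    from nemo_text_processing.inverse_text_normalization.inverse_normalize import InverseNormalizer

    # TN: written form -> spoken form, e.g. for a TTS front end.
    normalizer = Normalizer(input_case="cased", lang="en")
    print(normalizer.normalize("The meeting is at 10:30.", verbose=False))
    # expected along the lines of: "The meeting is at ten thirty."

    # ITN: spoken form -> written form, e.g. for post-processing ASR output.
    inverse_normalizer = InverseNormalizer(lang="en")
    print(inverse_normalizer.inverse_normalize("twenty three", verbose=False))
    # expected along the lines of: "23"

The same grammars can also be compiled into a FAR archive for the Sparrowhawk/C++ deployment path that `wfst_text_processing_deployment.rst` describes.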