add TN/ITN link in speech tools list #9142

Merged · 4 commits · May 9, 2024
26 changes: 13 additions & 13 deletions docs/source/nlp/text_normalization/nn_text_normalization.rst
@@ -26,7 +26,7 @@ The term *duplex* refers to the fact that our system can be trained to do both T
Quick Start Guide
-----------------

-To run the pretrained models interactively see :ref:`inference_text_normalization`.
+To run the pretrained models interactively see :ref:`inference_text_normalization_nn`.

Available models
^^^^^^^^^^^^^^^^
@@ -79,7 +79,7 @@ The purpose of the preprocessing scripts is to standardize the format in order t
We also changed punctuation class `PUNCT` to be treated like a plain token (label changed from `<sil>` to `<self>`), since we want to preserve punctuation even after normalization.
For text normalization it is crucial to avoid unrecoverable errors, i.e. outputs that are linguistically coherent but not semantics-preserving.
We noticed that due to data scarcity the model struggles to verbalize long numbers correctly, so we changed the ground truth for long numbers to digit-by-digit verbalization.
-We also exclude certain semiotic classes from neural verbalization, e.g. `ELECTRONIC` or `WHITELIST` -- `VERBATIM` and `LETTER` in the original dataset. Instead we label urls/email addresses and abbreviations as plain tokens and handle them separately with WFST-based grammars, see :ref:`inference_text_normalization`.
+We also exclude certain semiotic classes from neural verbalization, e.g. `ELECTRONIC` or `WHITELIST` -- `VERBATIM` and `LETTER` in the original dataset. Instead we label urls/email addresses and abbreviations as plain tokens and handle them separately with WFST-based grammars, see :ref:`inference_text_normalization_nn`.
This simplifies the task for the model and significantly reduces unrecoverable errors.
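
A minimal sketch of the relabeling described above, assuming the Google text normalization dataset's three-column TSV format (semiotic class, token, verbalization); the length threshold and digit lexicon below are illustrative, not the values used by the actual preprocessing scripts:

.. code-block:: python

    DIGITS = "zero one two three four five six seven eight nine".split()

    def relabel(line: str) -> str:
        cls, token, verbalization = line.rstrip("\n").split("\t")
        if cls == "PUNCT":
            verbalization = "<self>"  # preserve punctuation instead of <sil>
        elif cls in ("ELECTRONIC", "VERBATIM", "LETTER"):
            cls, verbalization = "PLAIN", "<self>"  # defer to WFST grammars
        elif cls == "CARDINAL" and token.isdigit() and len(token) > 5:
            # digit-by-digit ground truth for long numbers
            verbalization = " ".join(DIGITS[int(d)] for d in token)
        return "\t".join((cls, token, verbalization))

    # e.g. relabel("PUNCT\t.\t<sil>") -> "PUNCT\t.\t<self>"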


@@ -199,7 +199,7 @@ To enable training with the tarred dataset, add the following arguments:
data.train_ds.use_tarred_dataset=True \
data.train_ds.tar_metadata_file=\PATH_TO\<TARRED_DATA_OUTPUT_DIR>\metadata.json

-.. _inference_text_normalization:
+.. _inference_text_normalization_nn:

Model Inference
---------------
@@ -230,16 +230,16 @@ To run inference from a file adjust the previous command by

This pipeline consists of

-* WFST-based grammars to verbalize hard classes, such as urls and abbreviations.
-* regex pre-processing of the input, e.g.
-  * adding space around `-` in alpha-numerical words, e.g. `2-car` -> `2 - car`
-  * converting unicode fractions, e.g. ½ to 1/2
-  * normalizing Greek letters and some special characters, e.g. `+` -> `plus`
-* Moses :cite:`nlp-textnorm-koehnetal2007moses`. tokenization/preprocessing of the input
-* inference with neural tagger and decoder
-* Moses postprocessing/detokenization
-* WFST-based grammars to verbalize some `VERBATIM`
-* punctuation correction for TTS (to match the output punctuation to the input form)
+* WFST-based grammars to verbalize hard classes, such as urls and abbreviations.
+* regex pre-processing of the input, e.g.
+  * adding space around `-` in alpha-numerical words, e.g. `2-car` -> `2 - car`
+  * converting unicode fractions, e.g. ½ to 1/2
+  * normalizing Greek letters and some special characters, e.g. `+` -> `plus`
+* Moses :cite:`nlp-textnorm-koehnetal2007moses` tokenization/preprocessing of the input
+* inference with neural tagger and decoder
+* Moses postprocessing/detokenization
+* WFST-based grammars to verbalize some `VERBATIM`
+* punctuation correction for TTS (to match the output punctuation to the input form)
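
The regex pre-processing bullets above can be sketched as follows; the patterns and symbol tables are illustrative stand-ins rather than the library's actual rules:

.. code-block:: python

    import re

    FRACTIONS = {"½": "1/2", "¼": "1/4", "¾": "3/4"}    # subset, for illustration
    SYMBOLS = {"+": "plus", "α": "alpha", "β": "beta"}  # illustrative mapping

    def preprocess(text: str) -> str:
        # add space around '-' in alpha-numerical words: '2-car' -> '2 - car'
        text = re.sub(r"(?<=\d)-(?=[A-Za-z])", " - ", text)
        text = re.sub(r"(?<=[A-Za-z])-(?=\d)", " - ", text)
        # convert unicode fractions to their ASCII form
        for frac, ascii_form in FRACTIONS.items():
            text = text.replace(frac, ascii_form)
        # spell out Greek letters and some special characters
        for sym, word in SYMBOLS.items():
            text = text.replace(sym, f" {word} ")
        return re.sub(r"\s+", " ", text).strip()

    assert preprocess("2-car") == "2 - car"
    assert preprocess("½ cup") == "1/2 cup"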

Model Architecture
------------------
@@ -20,7 +20,7 @@ An example bash-script that runs inference and evaluation is provided here: `run
Quick Start Guide
-----------------

-To run the pretrained models see :ref:`inference_text_normalization`.
+To run the pretrained models see :ref:`inference_text_normalization_tagging`.

Available models
^^^^^^^^^^^^^^^^
@@ -115,7 +115,7 @@ Example of a training command:



-.. _inference_text_normalization:
+.. _inference_text_normalization_tagging:

Model Inference
---------------
@@ -162,4 +162,4 @@ References
.. bibliography:: tn_itn_all.bib
    :style: plain
    :labelprefix: NLP-TEXTNORM-TAG
-   :keyprefix: nlp-textnorm-tag
+   :keyprefix: nlp-textnorm-tag-
2 changes: 1 addition & 1 deletion docs/source/nlp/text_normalization/wfst/intro.rst
@@ -5,7 +5,7 @@ NeMo-text-processing supports Text Normalization (TN), audio-based TN and Invers

.. warning::

-   *TN/ITN transitioned from [NVIDIA/NeMo](https://github.com/NVIDIA/NeMo) repository to a standalone [NVIDIA/NeMo-text-processing](https://github.com/NVIDIA/NeMo-text-processing) repository. All updates and discussions/issues should go to the new repository.*
+   TN/ITN transitioned from `NVIDIA/NeMo <https://github.com/NVIDIA/NeMo>`__ repository to a standalone `NVIDIA/NeMo-text-processing <https://github.com/NVIDIA/NeMo-text-processing>`__ repository. All updates and discussions/issues should go to the new repository.


WFST-based TN/ITN:
@@ -5,8 +5,7 @@ Text (Inverse) Normalization

.. warning::

-   *TN/ITN transitioned from [NVIDIA/NeMo](https://github.com/NVIDIA/NeMo) repository to a standalone [NVIDIA/NeMo-text-processing](https://github.com/NVIDIA/NeMo-text-processing) repository. All updates and discussions/issues should go to the new repository.*
-
+   TN/ITN transitioned from `NVIDIA/NeMo <https://github.com/NVIDIA/NeMo>`_ repository to a standalone `NVIDIA/NeMo-text-processing <https://github.com/NVIDIA/NeMo-text-processing>`_ repository. All updates and discussions/issues should go to the new repository.

The `nemo_text_processing` Python package is based on WFST grammars :cite:`textprocessing-norm-mohri2005weighted` and supports:

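As a quick illustration, a minimal usage sketch of the package's Python API (the import paths below match recent NeMo-text-processing releases but may differ between versions):

.. code-block:: python

    from nemo_text_processing.text_normalization.normalize import Normalizer
    from nemo_text_processing.inverse_text_normalization.inverse_normalize import InverseNormalizer

    normalizer = Normalizer(input_case="cased", lang="en")
    print(normalizer.normalize("It costs $5."))  # e.g. "It costs five dollars."

    inverse_normalizer = InverseNormalizer(lang="en")
    print(inverse_normalizer.inverse_normalize("twenty three", verbose=False))  # e.g. "23"
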
@@ -188,7 +187,7 @@ Language Support Matrix

See :ref:`Grammar customization <wfst_customization>` for grammar customization details.

-See :ref:`Text Processing Deployment <wfst_text_processing_deployment>` for deployment in C++ details.
+See :doc:`Text Processing Deployment <./wfst_text_processing_deployment>` for deployment in C++ details.

WFST TN/ITN resources can be found :ref:`here <wfst_resources>`.

@@ -5,14 +5,13 @@ Deploy to Production with C++ backend

.. warning::

-   *TN/ITN transitioned from [NVIDIA/NeMo](https://github.com/NVIDIA/NeMo) repository to a standalone [NVIDIA/NeMo-text-processing](https://github.com/NVIDIA/NeMo-text-processing) repository. All updates and discussions/issues should go to the new repository.*
-
+   TN/ITN transitioned from `NVIDIA/NeMo <https://github.com/NVIDIA/NeMo>`_ repository to a standalone `NVIDIA/NeMo-text-processing <https://github.com/NVIDIA/NeMo-text-processing>`_ repository. All updates and discussions/issues should go to the new repository.

NeMo-text-processing provides tools to deploy :doc:`TN and ITN <wfst_text_normalization>` for production :cite:`textprocessing-deployment-zhang2021nemo`.
It uses `Sparrowhawk <https://github.com/google/sparrowhawk>`_ :cite:`textprocessing-deployment-sparrowhawk` -- an open-source C++ framework by Google.
The grammars written with NeMo-text-processing can be exported into an `OpenFST <https://www.openfst.org/>`_ Archive File (FAR) and dropped into Sparrowhawk.
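
As a rough sketch of that export step, assuming pynini's export helper (the toy grammar and the FAR key below are illustrative, standing in for the real tokenize-and-classify FST):

.. code-block:: python

    import pynini
    from pynini.export import export

    # toy stand-in for a real tokenize-and-classify grammar
    fst = pynini.cross("½", "one half").optimize()

    exporter = export.Exporter("grammars.far")  # writes an OpenFST Archive (FAR)
    exporter["TOKENIZE_AND_CLASSIFY"] = fst
    exporter.close()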

-.. image:: images/deployment_pipeline.png
+.. image:: ./images/deployment_pipeline.png
    :align: center
    :alt: Deployment pipeline
    :scale: 50%
1 change: 1 addition & 0 deletions docs/source/tools/intro.rst
@@ -20,3 +20,4 @@ There are also additional NeMo-related tools hosted in separate github repositor
:maxdepth: 1

speech_data_processor
+   ../nlp/text_normalization/intro