Fix docs errors and most warnings (NVIDIA#9006)
* add various docs fixes

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* make conf.py changes clearer

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* fix Duplicate explicit target name error for links

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* more fixes, mainly citations

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* fix some code formatting

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* update hf space iframe link

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* fix new ERRORs

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Update docs

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

---------

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Co-authored-by: yaoyu-33 <yaoyu.094@gmail.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
3 people authored May 1, 2024
1 parent 0e65f1b commit fc4d67b
Showing 53 changed files with 306 additions and 268 deletions.
53 changes: 24 additions & 29 deletions docs/source/asr/datasets.rst
@@ -261,11 +261,6 @@ Semi Sorted Batching

Sorting samples by duration and splitting them into batches speeds up training, but can degrade the quality of the model. To avoid quality degradation and maintain some randomness in the partitioning process, we add pseudo noise to the sample length when sorting.

.. image:: images/ssb.png
:align: center
:alt: semi sorted batching
:scale: 50%
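
For intuition, the pseudo-noise sorting idea can be sketched in a few lines. This is a hypothetical illustration, not the NeMo implementation; the function name and the ``noise_factor`` parameter are made up for the example.

```python
import random

def semi_sorted_batches(durations, batch_size, noise_factor=0.1):
    """Sort samples by noisy duration, then cut the order into batches."""
    # Perturb each duration by up to +/- noise_factor of its value, so the
    # ordering is roughly by length but the partitioning stays random.
    noisy = [(d * (1 + random.uniform(-noise_factor, noise_factor)), i)
             for i, d in enumerate(durations)]
    noisy.sort()
    order = [i for _, i in noisy]
    return [order[k:k + batch_size] for k in range(0, len(order), batch_size)]
```

Each batch then contains samples of similar (but not strictly sorted) duration, which reduces padding waste while keeping some shuffling between epochs.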

It can result in a training speedup of more than 40 percent with the same quality. To enable semi sorted batching, add the following lines to the config.

.. code::
@@ -772,30 +767,30 @@ To enable multimodal dataloading, we provide several configuration options:

Example 3. Combine an ASR (audio-text) dataset with an MT (text-only) dataset so that mini-batches have some examples from both datasets. Provide a custom prompt field for both datasets (to be leveraged by a relevant dataset class):

```yaml
use_multimodal_sampling: true
batch_tokens: 1024
token_equivalent_duration: 0.08 # 0.01 frame shift * 8 subsampling factor
quadratic_factor: 50
num_buckets: 30
use_bucketing: true
input_cfg:
- type: nemo_tarred
manifest_filepath: /path/to/manifest__OP_0..512_CL_.json
tarred_audio_filepath: /path/to/tarred_audio/audio__OP_0..512_CL_.tar
weight: 0.5
tags:
lang: en
prompt: "Given the following recording, transcribe what the person is saying:"
- type: txt_pair
source_path: /path/to/en__OP_0..512_CL_.txt
target_path: /path/to/pl__OP_0..512_CL_.txt
source_language: en
target_language: pl
weight: 0.5
tags:
prompt: "Translate the following text to Polish:"
```
.. code-block:: yaml

    use_multimodal_sampling: true
    batch_tokens: 1024
    token_equivalent_duration: 0.08  # 0.01 frame shift * 8 subsampling factor
    quadratic_factor: 50
    num_buckets: 30
    use_bucketing: true
    input_cfg:
      - type: nemo_tarred
        manifest_filepath: /path/to/manifest__OP_0..512_CL_.json
        tarred_audio_filepath: /path/to/tarred_audio/audio__OP_0..512_CL_.tar
        weight: 0.5
        tags:
          lang: en
          prompt: "Given the following recording, transcribe what the person is saying:"
      - type: txt_pair
        source_path: /path/to/en__OP_0..512_CL_.txt
        target_path: /path/to/pl__OP_0..512_CL_.txt
        source_language: en
        target_language: pl
        weight: 0.5
        tags:
          prompt: "Translate the following text to Polish:"

.. caution:: We strongly recommend using multiple shards for the text files as well, so that different nodes and dataloading workers are able to randomize the order of text iteration. Otherwise, multi-GPU training has a high risk of duplicating text examples.
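For intuition, the token budget in the config above can be read as follows: an audio example is charged a number of "tokens" equal to its duration divided by ``token_equivalent_duration``, and a mini-batch is filled until ``batch_tokens`` is reached. A minimal sketch of that accounting (the helper name is hypothetical, not the NeMo API):

```python
def example_cost(num_text_tokens=0, audio_seconds=0.0,
                 token_equivalent_duration=0.08):
    # One text token is treated as equivalent to 0.08 s of audio
    # (0.01 s frame shift * 8x encoder subsampling), so audio and text
    # examples can share a single batch_tokens budget.
    return num_text_tokens + int(round(audio_seconds / token_equivalent_duration))
```

Under this accounting, a 10-second recording costs 125 tokens, so a ``batch_tokens`` budget of 1024 admits roughly eight such recordings (before any ``quadratic_factor`` penalty is applied).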

4 changes: 2 additions & 2 deletions docs/source/asr/intro.rst
@@ -156,11 +156,11 @@ Canary-1B is a multi-lingual, multi-task model, supporting automatic speech-to-t

.. raw:: html

<iframe src="https://hf.space/embed/nvidia/canary-1b/+"
<iframe src="https://nvidia-canary-1b.hf.space"
width="100%" class="gradio-asr" allow="microphone *"></iframe>

<script type="text/javascript" language="javascript">
$('.gradio-asr').css('height', $(window).height()+'px');
$('.gradio-asr').css('height', $(window).height() * 0.8+'px');
</script>


4 changes: 3 additions & 1 deletion docs/source/asr/models.rst
@@ -46,12 +46,14 @@ HuggingFace Spaces to try out Parakeet models in your browser:
* `Parakeet-TDT-1.1B <https://huggingface.co/spaces/nvidia/parakeet-tdt-1.1b>`__ space

.. _Conformer_model:

Conformer
---------

.. _Conformer-CTC_model:

Conformer-CTC
~~~~~~~~~~~~~
-------------

Conformer-CTC is a CTC-based variant of the Conformer model introduced in :cite:`asr-models-gulati2020conformer`. Conformer-CTC has a
similar encoder to the original Conformer but uses CTC loss and decoding instead of RNNT/Transducer loss, which makes it a non-autoregressive model.
2 changes: 2 additions & 0 deletions docs/source/asr/speech_intent_slot/api.rst
@@ -15,8 +15,10 @@ Mixins
.. autoclass:: nemo.collections.asr.parts.mixins.ASRModuleMixin
:show-inheritance:
:members:
:no-index:

.. autoclass:: nemo.collections.asr.parts.mixins.ASRBPEMixin
:show-inheritance:
:members:
:no-index:

2 changes: 2 additions & 0 deletions docs/source/asr/ssl/api.rst
@@ -15,10 +15,12 @@ Mixins
.. autoclass:: nemo.collections.asr.parts.mixins.mixins.ASRModuleMixin
:show-inheritance:
:members:
:no-index:

.. autoclass:: nemo.core.classes.mixins.access_mixins.AccessMixin
:show-inheritance:
:members:
:no-index:



4 changes: 2 additions & 2 deletions docs/source/ckpt_converters/dev_guide.rst
@@ -48,7 +48,7 @@ Script Placement and Naming Conventions
Code Template
-------------

The template below addresses the 11 steps in the guideline above. Please also use the `Gemma Huggingface to NeMo converter <https://github.com/NVIDIA/NeMo/tree/main/scripts/checkpoint_converters/convert_gemma_hf_to_nemo.py>`_ as a full example for development.
The template below addresses the 11 steps in the guideline above. Please also use the `Gemma Huggingface to NeMo converter <https://github.com/NVIDIA/NeMo/tree/main/scripts/checkpoint_converters/convert_gemma_hf_to_nemo.py>`__ as a full example for development.

.. code-block:: python
@@ -210,7 +210,7 @@ A Simple Guide for Model Mapping and Conversion

2. **Common issues when converting: results not matching between the community model and the NeMo model**:

a. Megatron Core uses a special QKV layout, which needs careful handling and reshaping from community models, especially when GQA or MQA is used. Refer to the `Gemma Huggingface to NeMo converter <https://github.com/NVIDIA/NeMo/tree/main/scripts/checkpoint_converters/convert_gemma_hf_to_nemo.py#L144>`_ for guidance.
a. Megatron Core uses a special QKV layout, which needs careful handling and reshaping from community models, especially when GQA or MQA is used. Refer to the `Gemma Huggingface to NeMo converter <https://github.com/NVIDIA/NeMo/tree/main/scripts/checkpoint_converters/convert_gemma_hf_to_nemo.py#L144>`__ for guidance.

b. GLU variant weights can also be a common source of error. In Megatron Core, the regular feedforward projection weights and gated forward weights are fused together, requiring careful attention to the order of the two. Refer to the `Gemma Huggingface to NeMo converter <https://github.com/NVIDIA/NeMo/tree/main/scripts/checkpoint_converters/convert_gemma_hf_to_nemo.py#L135>`_ for more details.
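
To make point (a) concrete, here is a hypothetical sketch of the QKV reshaping for grouped-query attention. It assumes an HF-style layout with separate ``q``/``k``/``v`` projection matrices of shape ``[heads * head_size, hidden]``; the function name and shapes are illustrative only, so refer to the converter linked above for the actual code.

```python
import numpy as np

def hf_qkv_to_mcore(q, k, v, num_query_groups, head_size):
    # Megatron Core fuses QKV per query group: all query heads of a group
    # come first, followed by that group's single K head and V head.
    hidden = q.shape[1]
    heads_per_group = q.shape[0] // (num_query_groups * head_size)
    q = q.reshape(num_query_groups, heads_per_group * head_size, hidden)
    k = k.reshape(num_query_groups, head_size, hidden)
    v = v.reshape(num_query_groups, head_size, hidden)
    # Interleave [q_group, k_head, v_head] for every group, then flatten.
    return np.concatenate([q, k, v], axis=1).reshape(-1, hidden)
```

Getting this interleaving wrong still produces tensors of the correct shape, which is why mismatched outputs (rather than load errors) are the typical symptom.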

