Commit
Merge branch 'r1.22.0' of github.com:NVIDIA/NeMo into r1.22.0
karpnv committed Jan 8, 2024
2 parents a50b42c + 9526584 commit 490a9a4
Showing 21 changed files with 228 additions and 103 deletions.
2 changes: 1 addition & 1 deletion Dockerfile
Original file line number Diff line number Diff line change
@@ -14,7 +14,7 @@
# See the License for the specific language governing permissions and
# limitations under the License.

-ARG BASE_IMAGE=nvcr.io/nvidia/pytorch:23.08-py3
+ARG BASE_IMAGE=nvcr.io/nvidia/pytorch:23.10-py3

# build an image that includes only the nemo dependencies, ensures that dependencies
# are included first for optimal caching, and useful for building a development
Expand Down
22 changes: 12 additions & 10 deletions Jenkinsfile
@@ -1,7 +1,7 @@
pipeline {
agent {
docker {
-image 'nvcr.io/nvidia/pytorch:23.09-py3'
+image 'nvcr.io/nvidia/pytorch:23.10-py3'
args '--device=/dev/nvidia0 --gpus all --user 0:128 -v /home/TestData:/home/TestData -v $HOME/.cache:/root/.cache --shm-size=8g --env TRANSFORMERS_OFFLINE=0 --env HYDRA_FULL_ERROR=1'
}
}
@@ -61,21 +61,23 @@ pipeline {
steps {
sh 'git clone https://github.com/NVIDIA/TransformerEngine.git && \
cd TransformerEngine && \
-git fetch origin e6676c53f26f6ef072943c909d136cf2a39c1d90 && \
+git fetch origin cf6fc898286e4ad347ff88925c88663324e2b87d && \
git checkout FETCH_HEAD && \
git submodule init && git submodule update && \
NVTE_FRAMEWORK=pytorch NVTE_WITH_USERBUFFERS=1 MPI_HOME=/usr/local/mpi pip install .'
}
}

-stage('Megatron Core installation') {
-steps {
-sh 'git clone https://github.com/NVIDIA/Megatron-LM.git && \
-cd Megatron-LM && \
-git checkout 973330e9c3681604703bf1eb6b5a265d1b9b9b38 && \
-pip install .'
-}
-}
+// pip package should be working with main, if not we can update the commit here
+// until the pip package is updated
+// stage('Megatron Core installation') {
+// steps {
+// sh 'git clone https://github.com/NVIDIA/Megatron-LM.git && \
+// cd Megatron-LM && \
+// git checkout 973330e9c3681604703bf1eb6b5a265d1b9b9b38 && \
+// pip install .'
+// }
+// }

stage('PyTorch Lightning version') {
steps {
2 changes: 1 addition & 1 deletion README.rst
@@ -108,7 +108,7 @@ Key Features
* `GLUE benchmark <https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/nlp/glue_benchmark.html>`_
* `Information retrieval <https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/nlp/information_retrieval.html>`_
* `Entity Linking <https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/nlp/entity_linking.html>`_
-* `Dialogue State Tracking <https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/nlp/sgd_qa.html>`_
+* `Dialogue State Tracking <https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/nlp/dialogue.html>`_
* `Prompt Learning <https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/nlp/nemo_megatron/prompt_learning.html>`_
* `NGC collection of pre-trained NLP models. <https://ngc.nvidia.com/catalog/collections/nvidia:nemo_nlp>`_
* `Synthetic Tabular Data Generation <https://developer.nvidia.com/blog/generating-synthetic-data-with-transformers-a-solution-for-enterprise-data-challenges/>`_
3 changes: 1 addition & 2 deletions docs/source/asr/asr_language_modeling.rst
@@ -440,8 +440,7 @@ works in lexicon decoding mode, it does not work in lexicon-free mode. Word boos
such that you can manually increase or decrease the probability of emitting certain words. This can be very helpful if you have certain
uncommon or industry-specific words which you want to ensure transcribe correctly.

-For more information on word boosting, see `here <https://docs.nvidia.com/deeplearning/riva/user-guide/docs/tutorials/asr-python-advanced-wordboosting.html>`__
-and `here <https://docs.nvidia.com/deeplearning/riva/user-guide/docs/asr/asr-customizing.html#word-boosting>`__
+For more information on word boosting, see `here <https://docs.nvidia.com/deeplearning/riva/user-guide/docs/asr/asr-customizing.html#word-boosting>`__

In order to use word boosting in Nemo, you need to create a simple tab-separated text file which contains each word to be boosted, followed by
tab, and then the boosted score for that word.
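The doc change above describes the boosting file format: one boosted word per line, followed by a tab and its score. A minimal sketch of generating and reading such a file (the words and scores below are made-up examples, not NeMo defaults):

```python
# Write a word-boosting file in the tab-separated format described above.
# Words and scores here are illustrative assumptions only.
boosted_words = {"NVIDIA": 20.0, "GPU": 10.0}

with open("boosted_words.tsv", "w") as f:
    for word, score in boosted_words.items():
        f.write(f"{word}\t{score}\n")

# Each line is now: word<TAB>score
with open("boosted_words.tsv") as f:
    lines = [line.rstrip("\n").split("\t") for line in f]
```

Positive scores increase the likelihood of emitting a word during decoding; negative scores suppress it.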
2 changes: 1 addition & 1 deletion docs/source/nlp/joint_intent_slot.rst
@@ -233,7 +233,7 @@ Model Evaluation and Inference
There is no separate script for the evaluation and inference of this model in NeMo, however, inside of the example file `examples/nlp/intent_slot_classification/intent_slot_classification.py`
after the training part is finished, you can see the code that evaluates the trained model on an evaluation test set and then an example of doing inference using a list of given queries.

-For the deployment in the production environment, refer to `NVIDIA Riva <https://developer.nvidia.com/nvidia-riva-getting-started>`__ and `NVIDIA TLT documentation <https://docs.nvidia.com/metropolis/TLT/tlt-user-guide/text/nlp/index.html>`__.
+For the deployment in the production environment, refer to `NVIDIA Riva <https://docs.nvidia.com/deeplearning/riva/user-guide/docs/quick-start-guide.html>`__ and `NVIDIA TLT documentation <https://docs.nvidia.com/metropolis/TLT/tlt-user-guide/text/nlp/index.html>`__.

References
----------
2 changes: 1 addition & 1 deletion docs/source/nlp/token_classification.rst
@@ -68,7 +68,7 @@ Dataset Conversion
------------------

To convert an `IOB format <https://en.wikipedia.org/wiki/Inside%E2%80%93outside%E2%80%93beginning_(tagging)>`__ (short for inside, outside, beginning) data to the format required for training, use
-`examples/nlp/token_classification/data/import_from_iob_format.py <https://github.com/NVIDIA/NeMo/blob/stable/examples/nlp/token_classification/data/import_from_iob_format.py)>`_.
+`examples/nlp/token_classification/data/import_from_iob_format.py <https://github.com/NVIDIA/NeMo/blob/stable/examples/nlp/token_classification/data/import_from_iob_format.py>`_.

.. code::
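The doc change above concerns converting IOB-format data. As a quick illustration of what IOB tagging encodes, here is a small sketch that groups B-/I- tags back into entity spans (the sentence, tags, and helper function are illustrative assumptions, not part of the NeMo conversion script):

```python
# A made-up IOB-tagged sentence: tags align with tokens by position.
tokens = ["Hugging", "Face", "is", "based", "in", "New", "York"]
tags = ["B-ORG", "I-ORG", "O", "O", "O", "B-LOC", "I-LOC"]

def extract_entities(tokens, tags):
    """Group B-/I- spans into (entity_text, label) pairs."""
    entities, current, label = [], [], None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity starts; flush any span in progress.
            if current:
                entities.append((" ".join(current), label))
            current, label = [tok], tag[2:]
        elif tag.startswith("I-") and current:
            # Continuation of the current entity.
            current.append(tok)
        else:
            # Outside tag ends any span in progress.
            if current:
                entities.append((" ".join(current), label))
            current, label = [], None
    if current:
        entities.append((" ".join(current), label))
    return entities

entities = extract_entities(tokens, tags)
# entities == [("Hugging Face", "ORG"), ("New York", "LOC")]
```

The conversion script referenced above performs the reverse direction: it reads IOB-tagged text and emits the token/label files NeMo training expects.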
1 change: 1 addition & 0 deletions requirements/requirements_asr.txt
@@ -12,6 +12,7 @@ packaging
pyannote.core
pyannote.metrics
pydub
+resampy
ruamel.yaml
scipy>=0.14
soundfile
3 changes: 2 additions & 1 deletion requirements/requirements_lightning.txt
@@ -1,7 +1,8 @@
hydra-core>1.3,<=1.3.2
+lightning-utilities<0.10.0 # temporary fix, broken named tuples in 0.10.0, remove once fixed
omegaconf<=2.3
pytorch-lightning>=2.0,<=2.0.7
torchmetrics>=0.11.0
-transformers>=4.0.1,<=4.33.3
+transformers>=4.36.0
wandb
webdataset>=0.1.48,<=0.1.62
2 changes: 1 addition & 1 deletion requirements/requirements_nlp.txt
@@ -10,7 +10,7 @@ ijson
jieba
markdown2
matplotlib>=3.3.2
-megatron_core==0.3.0
+megatron_core==0.4.0
nltk>=3.6.5
opencc<1.1.7
pangu
2 changes: 1 addition & 1 deletion scripts/speech_recognition/convert_hf_dataset_to_nemo.py
@@ -361,7 +361,7 @@ def main(cfg: HFDatasetConversionConfig):
split=cfg.split,
cache_dir=None,
streaming=cfg.streaming,
-use_auth_token=cfg.use_auth_token,
+token=cfg.use_auth_token,
)

except Exception as e:
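The change above tracks the rename of the `use_auth_token` keyword to `token` in newer Hugging Face `datasets` releases. A hedged sketch of handling both keyword names, using `inspect` to pick whichever one the installed loader accepts (the helper and the stand-in loader below are illustrative assumptions, not NeMo or `datasets` code):

```python
import inspect

def auth_kwargs(load_fn, token):
    """Return {'token': ...} or {'use_auth_token': ...} depending on which
    keyword this load function accepts. Assumes the loader exposes the auth
    keyword as a named parameter, as datasets.load_dataset does."""
    params = inspect.signature(load_fn).parameters
    key = "token" if "token" in params else "use_auth_token"
    return {key: token}

# Stand-in for datasets.load_dataset using the *new* keyword name.
def fake_load_dataset(path, split=None, streaming=False, token=None):
    return {"path": path, "token": token}

result = fake_load_dataset("librispeech_asr", **auth_kwargs(fake_load_dataset, "hf_xxx"))
```

Pinning the code to one keyword, as this commit does, is the simpler choice when the requirements file already pins a compatible `datasets` range.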