Commit
Merge branch 'r1.22.0' of github.com:NVIDIA/NeMo into r1.22.0
karpnv committed Jan 8, 2024
2 parents a50b42c + 9526584 commit 490a9a4
Showing 21 changed files with 228 additions and 103 deletions.
2 changes: 1 addition & 1 deletion Dockerfile
Original file line number Diff line number Diff line change
@@ -14,7 +14,7 @@
# See the License for the specific language governing permissions and
# limitations under the License.

-ARG BASE_IMAGE=nvcr.io/nvidia/pytorch:23.08-py3
+ARG BASE_IMAGE=nvcr.io/nvidia/pytorch:23.10-py3

# build an image that includes only the nemo dependencies, ensures that dependencies
# are included first for optimal caching, and useful for building a development
Expand Down
22 changes: 12 additions & 10 deletions Jenkinsfile
@@ -1,7 +1,7 @@
pipeline {
agent {
docker {
-image 'nvcr.io/nvidia/pytorch:23.09-py3'
+image 'nvcr.io/nvidia/pytorch:23.10-py3'
args '--device=/dev/nvidia0 --gpus all --user 0:128 -v /home/TestData:/home/TestData -v $HOME/.cache:/root/.cache --shm-size=8g --env TRANSFORMERS_OFFLINE=0 --env HYDRA_FULL_ERROR=1'
}
}
@@ -61,21 +61,23 @@ pipeline {
steps {
sh 'git clone https://github.com/NVIDIA/TransformerEngine.git && \
cd TransformerEngine && \
-git fetch origin e6676c53f26f6ef072943c909d136cf2a39c1d90 && \
+git fetch origin cf6fc898286e4ad347ff88925c88663324e2b87d && \
git checkout FETCH_HEAD && \
git submodule init && git submodule update && \
NVTE_FRAMEWORK=pytorch NVTE_WITH_USERBUFFERS=1 MPI_HOME=/usr/local/mpi pip install .'
}
}

-stage('Megatron Core installation') {
-steps {
-sh 'git clone https://github.com/NVIDIA/Megatron-LM.git && \
-cd Megatron-LM && \
-git checkout 973330e9c3681604703bf1eb6b5a265d1b9b9b38 && \
-pip install .'
-}
-}
+// pip package should be working with main, if not we can update the commit here
+// until the pip package is updated
+// stage('Megatron Core installation') {
+// steps {
+// sh 'git clone https://github.com/NVIDIA/Megatron-LM.git && \
+// cd Megatron-LM && \
+// git checkout 973330e9c3681604703bf1eb6b5a265d1b9b9b38 && \
+// pip install .'
+// }
+// }

stage('PyTorch Lightning version') {
steps {
2 changes: 1 addition & 1 deletion README.rst
@@ -108,7 +108,7 @@ Key Features
* `GLUE benchmark <https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/nlp/glue_benchmark.html>`_
* `Information retrieval <https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/nlp/information_retrieval.html>`_
* `Entity Linking <https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/nlp/entity_linking.html>`_
-* `Dialogue State Tracking <https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/nlp/sgd_qa.html>`_
+* `Dialogue State Tracking <https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/nlp/dialogue.html>`_
* `Prompt Learning <https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/nlp/nemo_megatron/prompt_learning.html>`_
* `NGC collection of pre-trained NLP models. <https://ngc.nvidia.com/catalog/collections/nvidia:nemo_nlp>`_
* `Synthetic Tabular Data Generation <https://developer.nvidia.com/blog/generating-synthetic-data-with-transformers-a-solution-for-enterprise-data-challenges/>`_
3 changes: 1 addition & 2 deletions docs/source/asr/asr_language_modeling.rst
@@ -440,8 +440,7 @@ works in lexicon decoding mode, it does not work in lexicon-free mode. Word boos
such that you can manually increase or decrease the probability of emitting certain words. This can be very helpful if you have certain
uncommon or industry-specific words which you want to ensure transcribe correctly.

-For more information on word boosting, see `here <https://docs.nvidia.com/deeplearning/riva/user-guide/docs/tutorials/asr-python-advanced-wordboosting.html>`__
-and `here <https://docs.nvidia.com/deeplearning/riva/user-guide/docs/asr/asr-customizing.html#word-boosting>`__
+For more information on word boosting, see `here <https://docs.nvidia.com/deeplearning/riva/user-guide/docs/asr/asr-customizing.html#word-boosting>`__

In order to use word boosting in Nemo, you need to create a simple tab-separated text file which contains each word to be boosted, followed by
tab, and then the boosted score for that word.
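The doc change above describes the boosting file format: one boosted word per line, followed by a tab and its score. A minimal sketch of generating and reading such a file (the words and scores below are made-up examples, not NeMo defaults):

```python
# Write a word-boosting file in the tab-separated format described above.
# Words and scores here are illustrative assumptions only.
boosted_words = {"NVIDIA": 20.0, "GPU": 10.0}

with open("boosted_words.tsv", "w") as f:
    for word, score in boosted_words.items():
        f.write(f"{word}\t{score}\n")

# Each line is now: word<TAB>score
with open("boosted_words.tsv") as f:
    lines = [line.rstrip("\n").split("\t") for line in f]
```

Positive scores increase the likelihood of emitting a word during decoding; negative scores suppress it.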
2 changes: 1 addition & 1 deletion docs/source/nlp/joint_intent_slot.rst
@@ -233,7 +233,7 @@ Model Evaluation and Inference
There is no separate script for the evaluation and inference of this model in NeMo, however, inside of the example file `examples/nlp/intent_slot_classification/intent_slot_classification.py`
after the training part is finished, you can see the code that evaluates the trained model on an evaluation test set and then an example of doing inference using a list of given queries.

-For the deployment in the production environment, refer to `NVIDIA Riva <https://developer.nvidia.com/nvidia-riva-getting-started>`__ and `NVIDIA TLT documentation <https://docs.nvidia.com/metropolis/TLT/tlt-user-guide/text/nlp/index.html>`__.
+For the deployment in the production environment, refer to `NVIDIA Riva <https://docs.nvidia.com/deeplearning/riva/user-guide/docs/quick-start-guide.html>`__ and `NVIDIA TLT documentation <https://docs.nvidia.com/metropolis/TLT/tlt-user-guide/text/nlp/index.html>`__.

References
----------
2 changes: 1 addition & 1 deletion docs/source/nlp/token_classification.rst
@@ -68,7 +68,7 @@ Dataset Conversion
------------------

To convert an `IOB format <https://en.wikipedia.org/wiki/Inside%E2%80%93outside%E2%80%93beginning_(tagging)>`__ (short for inside, outside, beginning) data to the format required for training, use
-`examples/nlp/token_classification/data/import_from_iob_format.py <https://github.com/NVIDIA/NeMo/blob/stable/examples/nlp/token_classification/data/import_from_iob_format.py)>`_.
+`examples/nlp/token_classification/data/import_from_iob_format.py <https://github.com/NVIDIA/NeMo/blob/stable/examples/nlp/token_classification/data/import_from_iob_format.py>`_.

.. code::
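The doc change above concerns converting IOB-format data. As a quick illustration of what IOB tagging encodes, here is a small sketch that groups B-/I- tags back into entity spans (the sentence, tags, and helper function are illustrative assumptions, not part of the NeMo conversion script):

```python
# A made-up IOB-tagged sentence: tags align with tokens by position.
tokens = ["Hugging", "Face", "is", "based", "in", "New", "York"]
tags = ["B-ORG", "I-ORG", "O", "O", "O", "B-LOC", "I-LOC"]

def extract_entities(tokens, tags):
    """Group B-/I- spans into (entity_text, label) pairs."""
    entities, current, label = [], [], None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity starts; flush any span in progress.
            if current:
                entities.append((" ".join(current), label))
            current, label = [tok], tag[2:]
        elif tag.startswith("I-") and current:
            # Continuation of the current entity.
            current.append(tok)
        else:
            # Outside tag ends any span in progress.
            if current:
                entities.append((" ".join(current), label))
            current, label = [], None
    if current:
        entities.append((" ".join(current), label))
    return entities

entities = extract_entities(tokens, tags)
# entities == [("Hugging Face", "ORG"), ("New York", "LOC")]
```

The conversion script referenced above performs the reverse direction: it reads IOB-tagged text and emits the token/label files NeMo training expects.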
1 change: 1 addition & 0 deletions requirements/requirements_asr.txt
@@ -12,6 +12,7 @@ packaging
pyannote.core
pyannote.metrics
pydub
+resampy
ruamel.yaml
scipy>=0.14
soundfile
3 changes: 2 additions & 1 deletion requirements/requirements_lightning.txt
@@ -1,7 +1,8 @@
hydra-core>1.3,<=1.3.2
+lightning-utilities<0.10.0 # temporary fix, broken named tuples in 0.10.0, remove once fixed
omegaconf<=2.3
pytorch-lightning>=2.0,<=2.0.7
torchmetrics>=0.11.0
-transformers>=4.0.1,<=4.33.3
+transformers>=4.36.0
wandb
webdataset>=0.1.48,<=0.1.62
2 changes: 1 addition & 1 deletion requirements/requirements_nlp.txt
@@ -10,7 +10,7 @@ ijson
jieba
markdown2
matplotlib>=3.3.2
-megatron_core==0.3.0
+megatron_core==0.4.0
nltk>=3.6.5
opencc<1.1.7
pangu
2 changes: 1 addition & 1 deletion scripts/speech_recognition/convert_hf_dataset_to_nemo.py
@@ -361,7 +361,7 @@ def main(cfg: HFDatasetConversionConfig):
split=cfg.split,
cache_dir=None,
streaming=cfg.streaming,
-use_auth_token=cfg.use_auth_token,
+token=cfg.use_auth_token,
)

except Exception as e:
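The change above tracks the rename of the `use_auth_token` keyword to `token` in newer Hugging Face `datasets` releases. A hedged sketch of handling both keyword names, using `inspect` to pick whichever one the installed loader accepts (the helper and the stand-in loader below are illustrative assumptions, not NeMo or `datasets` code):

```python
import inspect

def auth_kwargs(load_fn, token):
    """Return {'token': ...} or {'use_auth_token': ...} depending on which
    keyword this load function accepts. Assumes the loader exposes the auth
    keyword as a named parameter, as datasets.load_dataset does."""
    params = inspect.signature(load_fn).parameters
    key = "token" if "token" in params else "use_auth_token"
    return {key: token}

# Stand-in for datasets.load_dataset using the *new* keyword name.
def fake_load_dataset(path, split=None, streaming=False, token=None):
    return {"path": path, "token": token}

result = fake_load_dataset("librispeech_asr", **auth_kwargs(fake_load_dataset, "hf_xxx"))
```

Pinning the code to one keyword, as this commit does, is the simpler choice when the requirements file already pins a compatible `datasets` range.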