Open source export and deploy modules #8743
Conversation
There are some major and minor comments that require addressing; overall, though, the PR is very well done.
from .deploy_base import DeployBase


class DeployPyTriton(DeployBase):
The class name has insufficient context: deploy PyTriton with which model? We are planning to deploy PyTriton for streaming ASR and TTS as new tools too, so please call this class DeployPytritonLLM or something else that denotes what it is deploying.
There is no need for a specific model; the goal is to support deployment of any model. You can either pass a NeMo checkpoint using the checkpoint_path param or an in-memory model using the model param. Full in-memory model support will be added later, though.
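In other words, the two construction paths being described would look roughly like this sketch (the import path, the triton_model_name argument, and the placeholder model are illustrative assumptions, not the PR's exact API):

from nemo.deploy import DeployPyTriton  # import path assumed

# path 1: deploy straight from a NeMo checkpoint on disk
nm = DeployPyTriton(checkpoint_path="/models/gpt.nemo", triton_model_name="llm")

# path 2: deploy an in-memory model (full support to be added later, per the reply above)
in_memory_model = ...  # placeholder for a loaded NeMo model
nm = DeployPyTriton(model=in_memory_model, triton_model_name="llm")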
Hmm, so you're saying the same streaming code will apply to both LLM and ASR? Then there's no need to act on this comment, but let's keep it unresolved so the ASR team can look into it later.
@titu1994 Thanks for your review and all of the suggestions. I tried to address all of your comments; I agree with most of them and updated the code based on your suggestions, and I left replies on the rest. Can you please do another review? I'd love to merge this ASAP because many tasks depend on this PR. @ericharper Please let us know what you think as well.
Let's add these folders to the ignore list in setup.py; then the PR is ready to merge.
@titu1994 Added deploy and export to the exclude-package list here: https://github.com/oyilmaz-nvidia/NeMo/blob/oss-export-deploy/setup.py#L237
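For reference, excluding packages from the pip wheel in setup.py typically looks like this setuptools sketch (the exact patterns used in the PR may differ):

from setuptools import find_packages, setup

setup(
    name="nemo_toolkit",  # name assumed for illustration
    # keep the new modules out of the published package, as requested above
    packages=find_packages(exclude=["nemo.deploy", "nemo.deploy.*", "nemo.export", "nemo.export.*"]),
)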
Looks good for now, thanks for all the changes!
jenkins
CodeQL found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.
# Similar to split_save_weight but done on GPU for performance
@torch.no_grad()
def save_weight_torch(tp_rank, saved_dir, split_factor, key, vals, storage_type, act_range, config):
We convert the NeMo weights to the TRT-LLM format, and we have two functions to do this, one using numpy and one using torch; I wonder if they could share some code. Instead of using a chain of if/else statements, the mapping of NeMo names to TRT-LLM names could be stored in a dict that keys index into. That would also make this code easier to maintain, since NeMo param names have been known to change.
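A minimal sketch of the suggested refactor, assuming a shared lookup table keyed by NeMo parameter-name fragments (the entries below are illustrative, not the real mapping in this PR):

# illustrative NeMo -> TRT-LLM name table shared by the numpy and torch converters
NEMO_TO_TRTLLM = {
    "self_attention.query_key_value.weight": "attention.qkv.weight",
    "self_attention.dense.weight": "attention.dense.weight",
    "mlp.dense_h_to_4h.weight": "mlp.fc.weight",
    "mlp.dense_4h_to_h.weight": "mlp.proj.weight",
}

def map_nemo_to_trtllm(nemo_key: str) -> str:
    """Translate a NeMo parameter name to its TRT-LLM counterpart."""
    for nemo_frag, trtllm_frag in NEMO_TO_TRTLLM.items():
        if nemo_frag in nemo_key:
            return nemo_key.replace(nemo_frag, trtllm_frag)
    raise KeyError(f"no TRT-LLM mapping for {nemo_key}")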
"""Returns the tensor_parallel_group config based on tensor_parallel.""" | ||
from mpi4py import MPI | ||
|
||
mpi_rank = MPI.COMM_WORLD.Get_rank() |
Could we use tensorrt_llm.mpi_rank() for self-consistency?
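For reference, the suggested change would look roughly like this (assuming the pinned TRT-LLM version exposes tensorrt_llm.mpi_rank() at the top level):

import tensorrt_llm

# use the TRT-LLM helper instead of going through mpi4py directly
mpi_rank = tensorrt_llm.mpi_rank()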
index += 1


def rename_key(old_key: str, pp_rank: int, num_layers: int, pp_size: int):
Why do we need to rename keys here? Don't we already map the NeMo param names to TRT-LLM names inside split_and_save_weight()?
Overall a great PR, just some minor comments. Could you take a pass through the CodeQL findings? There are a lot of unused imports that need to be cleaned up.
Also, will you follow up with a PR for developer docs? It would be helpful for NeMo developers who want to use these new modules.
share_embedding_table = False
share_weight = None
if share_embedding_table:
    share_weight = self.embedding.vocab_embedding.weight
CodeQL (code scanning) warning: unreachable code.
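The warning fires because share_embedding_table is hardcoded to False, so the branch body can never execute. One hedged way to resolve it, assuming the flag should be driven by the model config rather than a literal (the attribute name is an assumption):

# hypothetical: derive the flag from config so the branch is reachable
share_embedding_table = getattr(config, "share_embedding_table", False)
share_weight = None
if share_embedding_table:
    share_weight = self.embedding.vocab_embedding.weight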
nemo/export/trt_llm/nemo_utils.py (outdated):
import shutil
import sys
import tempfile
import typing
CodeQL (code scanning) note: module is imported with both 'import' and 'import from'.
from pathlib import Path

import numpy as np
import tensorstore  # this is important even though not used
CodeQL (code scanning) note: unused import.
nemo/export/tensorrt_llm.py (outdated):
)

return weights.cpu().detach()
return None
CodeQL (code scanning) warning: unreachable code.
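A minimal sketch of one way to make the fallback reachable, assuming weights can legitimately be None when nothing was loaded (the surrounding function body is not shown here):

# guard the tensor return so the None fallback is no longer dead code
if weights is not None:
    return weights.cpu().detach()
return None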
tests/export/test_nemo_export.py (outdated):
(
    trtllm_accuracy,
    trtllm_accuracy_relaxed,
    trtllm_deployed_accuracy,
    trtllm_deployed_accuracy_relaxed,
) = run_trt_llm_inference(
    model_name=args.model_name,
    model_type=args.model_type,
    prompt=prompt_template,
    checkpoint_path=args.checkpoint_dir,
    trt_llm_model_dir=args.trt_llm_model_dir,
    n_gpu=n_gpus,
    max_batch_size=args.max_batch_size,
    max_input_token=args.max_input_token,
    max_output_token=args.max_output_token,
    ptuning=args.ptuning,
    p_tuning_checkpoint=args.p_tuning_checkpoint,
    lora=args.lora,
    lora_checkpoint=args.lora_checkpoint,
    tp_size=args.tp_size,
    pp_size=args.pp_size,
    top_k=args.top_k,
    top_p=args.top_p,
    temperature=args.temperature,
    run_accuracy=args.run_accuracy,
    debug=args.debug,
    streaming=args.streaming,
    test_deployment=args.test_deployment,
)
CodeQL (code scanning) error: mismatch in multiple assignment (test).
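This error means CodeQL believes the unpacked tuple's arity may not match what run_trt_llm_inference returns on every code path. A defensive sketch (the explicit guard here is an illustration, not necessarily the fix the PR adopted):

results = run_trt_llm_inference(...)  # same keyword arguments as in the snippet above
if not isinstance(results, tuple) or len(results) != 4:
    raise ValueError(f"expected 4 accuracy values, got {results!r}")
(
    trtllm_accuracy,
    trtllm_accuracy_relaxed,
    trtllm_deployed_accuracy,
    trtllm_deployed_accuracy_relaxed,
) = results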
nemo/export/__init__.py (outdated):
# except Exception as e:
#     LOGGER.warning("TensorRTLLM could not be imported.")
CodeQL (code scanning) note: commented-out code.
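The commented-out lines, together with the "wrap imports with try except" commit below, point at a guarded optional import. A sketch of that pattern (the logger name and import path are assumptions):

import logging

LOGGER = logging.getLogger("NeMo")

try:
    from nemo.export.tensorrt_llm import TensorRTLLM  # path assumed
except Exception:
    LOGGER.warning("TensorRTLLM could not be imported.")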
jenkins
LGTM. Thanks!
* export and deploy modules
* Add export tests
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* Address PR reviews
* Add try except
* Moved query_llm to nlp folder
* removed lambada.json
* Reverting the Jenkinsfile
* Exclude deploy and export from the pip
* Address the CodeQL issues
* Addressing reviews
* remove deploy test for now
* Addressing CodeQL comments
* wrap imports with try except
* Add test data param and fix codeql issue

Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
What does this PR do?
This PR open sources the export and deploy modules, which export a NeMo checkpoint to TensorRT-LLM and serve it using PyTriton.
Collection: [None]
Changelog
Usage
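As a hedged illustration, usage of the new modules might look like the following, based on the classes and parameters discussed in this review (constructor and method names are assumptions, not necessarily the PR's exact API):

from nemo.export import TensorRTLLM     # exporter, path assumed from the diff
from nemo.deploy import DeployPyTriton  # deploy class discussed above

# export a NeMo checkpoint to a TensorRT-LLM engine (signature assumed)
exporter = TensorRTLLM(model_dir="/tmp/trt_llm_model_dir")
exporter.export(nemo_checkpoint_path="model.nemo", n_gpus=1)

# serve the exported model with PyTriton (parameters assumed)
server = DeployPyTriton(model=exporter, triton_model_name="llm")
server.deploy()
server.serve()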
Jenkins CI
To run Jenkins, a NeMo user with write access must comment jenkins on the PR.
Before your PR is "Ready for review"
Pre checks:
PR Type:
If you haven't finished some of the above items, you can still open a "Draft" PR.
Who can review?
Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs in various areas.