
Open source export and deploy modules #8743

Merged
merged 25 commits into from
Apr 6, 2024

Conversation

oyilmaz-nvidia
Collaborator

What does this PR do?

Open-sources the export and deploy modules, which export a NeMo checkpoint to TensorRT-LLM and serve it with PyTriton.

Collection: [None]

Changelog

  • Added export and deploy folders under the main folder.

Usage

  • A usage example is shown below.
# A simple example is as follows. 

# Run the following on the server side

from nemo.export import TensorRTLLM
from nemo.deploy import DeployPyTriton

# To export to TRT-LLM

trt_llm_exporter = TensorRTLLM(model_dir="/path/to/trt_llm_engine_folder")
trt_llm_exporter.export(
    nemo_checkpoint_path="/path/to/nemo/ckpt.nemo",
    model_type="model_type",
    n_gpus=1,
)

# To start serving on Triton server

nm = DeployPyTriton(model=trt_llm_exporter, triton_model_name="LLM")
nm.deploy()
nm.run()

# To query the model, run the following on the client side

from nemo.deploy import NemoQuery

nq = NemoQuery(url="localhost:8000", model_name="LLM")
output = nq.query_llm(
    prompts=["what is the color of a banana?"], top_k=1, top_p=0.0, temperature=1.0,
)

Jenkins CI

To run Jenkins, a NeMo User with write access must comment jenkins on the PR.

Before your PR is "Ready for review"

Pre checks:

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items, you can still open a "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
The contributor guidelines list specific people who can review PRs to various areas.

Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>
@github-actions github-actions bot added the CI label Mar 25, 2024
Collaborator

@titu1994 titu1994 left a comment

There are some major and minor comments that require addressing, however the PR overall is very well done.

nemo/deploy/__init__.py (Outdated)
nemo/deploy/deploy_base.py
nemo/deploy/deploy_base.py
from .deploy_base import DeployBase


class DeployPyTriton(DeployBase):
Collaborator

The class name has insufficient context: deploy PyTriton with which model? We are planning to deploy PyTriton for streaming ASR and TTS as new tools too, so please call this class DeployPyTritonLLM or something that denotes what it is deploying.

Collaborator Author

There is no need to tie this to a specific model; the goal is to support the deployment of any model. You can either pass a nemo checkpoint using the checkpoint_path param or an in-memory model using the model param. Full in-memory model support will be added later, though.
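To illustrate the model-agnostic interface described here, a minimal sketch in which either an in-memory model or a checkpoint path is accepted (the class name and validation logic are illustrative, not the actual NeMo implementation):

```python
class ModelAgnosticDeploy:
    """Illustrative sketch of a deploy class that accepts either an
    in-memory model or a checkpoint path, as described above."""

    def __init__(self, triton_model_name, model=None, checkpoint_path=None):
        # Exactly one source of the model must be provided to deploy anything.
        if model is None and checkpoint_path is None:
            raise ValueError("Provide either `model` or `checkpoint_path`.")
        self.triton_model_name = triton_model_name
        self.model = model
        self.checkpoint_path = checkpoint_path
```

With this shape, `DeployPyTriton(model=trt_llm_exporter, triton_model_name="LLM")` from the usage example above fits the in-memory path, while a checkpoint-based deployment would pass `checkpoint_path` instead.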

Collaborator

Hmm, so you're saying the same streaming code will apply to both LLM and ASR? Then no need to act on this comment, but let's keep it unresolved so the ASR team can look into it later.

nemo/deploy/deploy_pytriton.py
nemo/export/utils.py (Outdated)
scripts/export/export_to_trt.py (Outdated)
scripts/export/export_to_trt.py (Outdated)
tests/deploy/lambada.json (Outdated)
tests/deploy/test_nemo_deploy.py (Outdated)
@oyilmaz-nvidia oyilmaz-nvidia marked this pull request as ready for review April 1, 2024 22:06
oyilmaz-nvidia and others added 5 commits April 2, 2024 11:32
Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>
@oyilmaz-nvidia
Collaborator Author

@titu1994 Thanks for your review and all of the suggestions. I tried to address all of your comments: I agree with most of them and updated the code based on your suggestions; on some of them I left a comment. Can you please do another review? I would love to merge this ASAP because many tasks depend on this PR.

@ericharper Please let us know what you think as well.

tests/deploy/lambada.json (Outdated)
Collaborator

@titu1994 titu1994 left a comment

Let's add these folders to the ignore list in setup.py; then the PR is ready to merge.

nemo/export/trt_llm/decoder/__init__.py
Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>
@oyilmaz-nvidia
Collaborator Author

@titu1994 Added the deploy and export folders to the exclude package list here: https://github.com/oyilmaz-nvidia/NeMo/blob/oss-export-deploy/setup.py#L237

titu1994
titu1994 previously approved these changes Apr 3, 2024
Collaborator

@titu1994 titu1994 left a comment

Looks good for now, thanks for all the changes!

@titu1994
Collaborator

titu1994 commented Apr 3, 2024

Jenkins

@ericharper
Collaborator

jenkins


@github-advanced-security github-advanced-security bot left a comment

CodeQL found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.


# Similar to split_save_weight but done on GPU for performance
@torch.no_grad()
def save_weight_torch(tp_rank, saved_dir, split_factor, key, vals, storage_type, act_range, config):
Contributor

We convert the nemo weights to TRT-LLM format with two functions, one using numpy and one using torch. I wonder if these could share some code: instead of a chain of if/else statements, the mapping of nemo names to TRT-LLM names could be stored in a dict, and keys could index into it. That would also make this code easier to maintain, since nemo param names have been known to change.
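A minimal sketch of the dict-based mapping suggested here; the weight names below are invented for illustration and are not the actual NeMo or TRT-LLM parameter names:

```python
# Hypothetical nemo -> TRT-LLM weight-name table. A single table shared by the
# numpy and torch conversion paths would replace the duplicated if/else chains.
NEMO_TO_TRTLLM = {
    "self_attention.query_key_value.weight": "attention.qkv.weight",
    "mlp.dense_h_to_4h.weight": "mlp.fc.weight",
    "mlp.dense_4h_to_h.weight": "mlp.proj.weight",
}

def to_trtllm_name(nemo_key: str) -> str:
    """Look up the TRT-LLM name for a nemo weight name via a single dict access."""
    if nemo_key not in NEMO_TO_TRTLLM:
        raise KeyError(f"No TRT-LLM mapping for nemo key: {nemo_key}")
    return NEMO_TO_TRTLLM[nemo_key]
```

When nemo param names change, only the table needs updating, and both conversion functions stay in sync automatically.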

"""Returns the tensor_parallel_group config based on tensor_parallel."""
from mpi4py import MPI

mpi_rank = MPI.COMM_WORLD.Get_rank()
Contributor

Could we use tensorrt_llm.mpi_rank() for self-consistency?

nemo/export/trt_llm/tensorrt_llm_build.py (Outdated)
nemo/export/trt_llm/tensorrt_llm_model.py
nemo/export/trt_llm/utils.py (Outdated)
tests/infer_data_path.py
nemo/export/trt_llm/tensor_utils.py (Outdated)
index += 1


def rename_key(old_key: str, pp_rank: int, num_layers: int, pp_size: int):
Contributor

Why do we need to rename keys here? Don't we already map the nemo param name to the TRT-LLM name inside split_and_save_weight()?
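For context, one common reason a rename step exists alongside the name mapping is pipeline parallelism: each pp rank stores its layers with local indices, so keys must be rewritten with global layer indices before export. An illustrative sketch, not the actual NeMo implementation:

```python
import re

def rename_layer_key(old_key: str, pp_rank: int, num_layers: int, pp_size: int) -> str:
    """Shift a locally numbered 'layers.N' key to its global layer index.

    Rank r owns num_layers // pp_size layers but stores them starting at
    layers.0, so the global index is local_index + r * (num_layers // pp_size).
    """
    layers_per_rank = num_layers // pp_size
    match = re.search(r"layers\.(\d+)", old_key)
    if match is None:
        return old_key  # not a per-layer weight, e.g. an embedding
    global_idx = int(match.group(1)) + pp_rank * layers_per_rank
    return old_key[: match.start()] + f"layers.{global_idx}" + old_key[match.end():]
```

For example, with 8 layers split across 2 pp ranks, local layer 1 on rank 1 becomes global layer 5.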

Collaborator

@ericharper ericharper left a comment

Overall a great PR, just some minor comments. Could you take a pass through the CodeQL findings? There are a lot of unused imports that need to be cleaned up.

Also, will you follow up with a PR for developer docs? It will be helpful for nemo developers who want to use these new modules.

Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>
nemo/export/tensorrt_llm.py (Dismissed)
share_embedding_table = False
share_weight = None
if share_embedding_table:
    share_weight = self.embedding.vocab_embedding.weight

CodeQL check warning: Unreachable code. This statement is unreachable.
import shutil
import sys
import tempfile
import typing

CodeQL check notice: Module 'typing' is imported with both 'import' and 'import from'.
from pathlib import Path

import numpy as np
import tensorstore # this is important even though not used

CodeQL check notice: Unused import. Import of 'tensorstore' is not used.
)

return weights.cpu().detach()
return None

CodeQL check warning: Unreachable code. This statement is unreachable.
tests/export/test_nemo_export.py (Fixed)
Comment on lines 472 to 500
(
trtllm_accuracy,
trtllm_accuracy_relaxed,
trtllm_deployed_accuracy,
trtllm_deployed_accuracy_relaxed,
) = run_trt_llm_inference(
model_name=args.model_name,
model_type=args.model_type,
prompt=prompt_template,
checkpoint_path=args.checkpoint_dir,
trt_llm_model_dir=args.trt_llm_model_dir,
n_gpu=n_gpus,
max_batch_size=args.max_batch_size,
max_input_token=args.max_input_token,
max_output_token=args.max_output_token,
ptuning=args.ptuning,
p_tuning_checkpoint=args.p_tuning_checkpoint,
lora=args.lora,
lora_checkpoint=args.lora_checkpoint,
tp_size=args.tp_size,
pp_size=args.pp_size,
top_k=args.top_k,
top_p=args.top_p,
temperature=args.temperature,
run_accuracy=args.run_accuracy,
debug=args.debug,
streaming=args.streaming,
test_deployment=args.test_deployment,
)

CodeQL check failure: Mismatch in multiple assignment. Left hand side of assignment contains 4 variables, but right hand side is a tuple of length 2.
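The flagged pattern is easy to reproduce: Python raises a ValueError at runtime when the number of assignment targets does not match the length of the returned tuple. A minimal, self-contained reproduction (the function name is an illustrative stand-in for run_trt_llm_inference):

```python
def run_trt_llm_inference_stub():
    # Stand-in that returns a 2-tuple while the caller unpacks 4 values.
    return 0.9, 0.95

try:
    (acc, acc_relaxed, deployed_acc, deployed_acc_relaxed) = run_trt_llm_inference_stub()
except ValueError as err:
    # Unpacking 2 values into 4 targets fails at runtime.
    print(err)  # not enough values to unpack (expected 4, got 2)
```

The fix is to make the function return all four metrics, or to unpack only the values it actually returns.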
nemo/export/trt_llm/utils.py (Dismissed)
nemo/export/trt_llm/nemo/convert.py (Fixed)
Comment on lines 25 to 26
# except Exception as e:
# LOGGER.warning("TensorRTLLM could not be imported.")

CodeQL check notice: Commented-out code. This comment appears to contain commented-out code.
Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>
@oyilmaz-nvidia
Collaborator Author

jenkins

oyilmaz-nvidia and others added 2 commits April 4, 2024 20:26
@ericharper
Collaborator

jenkins

nemo/export/trt_llm/nemo/nemo_ckpt_convert.py (Dismissed)
@oyilmaz-nvidia
Collaborator Author

jenkins

1 similar comment
@ericharper
Collaborator

jenkins

@oyilmaz-nvidia
Collaborator Author

jenkins

Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>
@oyilmaz-nvidia
Collaborator Author

jenkins

Collaborator

@ericharper ericharper left a comment

LGTM. Thanks!

@ericharper ericharper merged commit 97d1abb into NVIDIA:main Apr 6, 2024
10 checks passed
anmolgupt pushed a commit to anmolgupt/NeMo that referenced this pull request Apr 11, 2024
* export and deploy modules

Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>

* Add export tests

Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Address PR reviews

Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>

* Add try except

Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>

* Moved query_llm to nlp folder

Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removed lambada.json

Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>

* Reverting the Jenkinsfile

Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>

* Exclude deploy and export from the pip

Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>

* Address the CodeQL issues

Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Addressing reviews

Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>

* remove deploy test for now

Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>

* Addressing CodeQL comments

Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>

* wrap imports with try except

Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>

* Add test data param and fix codeql issue

Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>

---------

Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
suiyoubi pushed a commit that referenced this pull request May 2, 2024
Signed-off-by: Ao Tang <aot@nvidia.com>
rohitrango pushed a commit to rohitrango/NeMo that referenced this pull request Jun 25, 2024
5 participants