Adding debug options to trtllm-build to visualize the TRT Network before Engine build #1238
Conversation
tensorrt_llm/commands/build.py (Outdated)
```diff
@@ -139,7 +153,13 @@ def parse_arguments():
     return args


-def build_model(model: PretrainedModel, build_config: BuildConfig) -> Engine:
+def build_model(rank: int,
```
The `build_model` function is moved to `builder.py` and renamed to `build`. We want to keep the interface of the `build` function stable:
```python
def build(model: PretrainedModel, build_config: BuildConfig) -> Engine:
```
So, I suggest we add `dry_run` and `visualize_network` as fields of `BuildConfig`.
Updated the PR to add `dry_run` and `visualize_network` to `BuildConfig` as suggested.
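A minimal sketch of that shape, assuming a simplified config class (this is not the real TensorRT-LLM `BuildConfig`, just an illustration of carrying the two debug switches as fields so the `build(model, build_config)` signature stays unchanged):

```python
from dataclasses import dataclass

# Simplified stand-in for BuildConfig; only the idea is shown, not the real class.
@dataclass
class BuildConfig:
    max_batch_size: int = 1          # existing fields elided for brevity
    dry_run: bool = False            # stop before the engine build/serialization
    visualize_network: bool = False  # dump the finalized TRT network for inspection

config = BuildConfig(dry_run=True, visualize_network=True)
print(config)
```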
tensorrt_llm/commands/build.py (Outdated)
```diff
@@ -529,7 +564,8 @@ def main():
         'weight_only_precision': args.weight_only_precision,
     }
     parallel_build(source, build_config, args.output_dir, workers,
-                   args.log_level, model_cls, **kwargs)
+                   args.log_level, model_cls, args.dry_run,
+                   args.visualize_network, **kwargs)
```
There is a `**kwargs` field, so we can reuse it to avoid too many changes across different functions:
```python
kwargs = {
    'logits_dtype': args.logits_dtype,
    'use_fused_mlp': args.use_fused_mlp,
    'weight_only_precision': args.weight_only_precision,
    'dry_run': args.dry_run,
    'visualize_network': args.visualize_network,
}
```
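For illustration, a toy version of that pass-through (the function names and signatures here are stand-ins, not the actual ones in `build.py`): flags threaded through `**kwargs` reach the inner function without touching the signatures of anything in between.

```python
# Toy example of the **kwargs pass-through; illustrative names only.
def build_model(model, dry_run=False, visualize_network=False, **extra):
    print(f"building {model}: dry_run={dry_run}, visualize_network={visualize_network}")

def parallel_build(model, **kwargs):
    # No signature change needed here: the new flags ride along in kwargs.
    build_model(model, **kwargs)

parallel_build("llama2-70b", dry_run=True, visualize_network=True)
```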
This is no longer necessary since we added `dry_run` and `visualize_network` to `BuildConfig` as suggested.
Hi @Lokiiiiii, sorry for the late response. Thanks for submitting the MR, and we really appreciate your contributions to TensorRT-LLM. Could you please rebase the MR onto the latest main branch?
@QiJune Could you please review this again?
LGTM now. We plan to integrate your contributions as part of our refinement work, and when that work lands on GitHub we will add you as a co-author and acknowledge your efforts.
@QiJune I noticed that this change did not land in the TRT-LLM 0.9.0 release tag. Can you provide an ETA?
Hi @Lokiiiiii, thanks a lot for your contribution and support! We've merged your changes into the internal codebase; they will be included in the update to the GitHub main branch this week and land in the next stable release.
Closing this since we've merged the changes.
Overview
This PR adds two new flags to `trtllm-build` to support debugging:

- `--visualize-network` dumps the finalized TRT Network as SVG files for visual analysis.
- `--dry-run` runs through all the steps except the engine build and serialization, which are typically the operations with the most overhead.

When used together, the TRT Network is dumped in ~10 seconds for llama2-70B.
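For context, a rough sketch of the control flow these flags imply (the helpers below are stand-ins for the real network tracing, SVG export, and engine build, which this PR does not reproduce here):

```python
# Illustrative only: stub helpers stand in for the real builder internals.
def trace_network(model):
    return f"<network for {model}>"         # cheap: construct the TRT network

def dump_svg(network, path):
    with open(path, "w") as f:              # real code would render a graph; this writes a placeholder
        f.write(f"<svg><!-- {network} --></svg>")

def build_engine(network):
    return f"<engine for {network}>"        # expensive step that --dry-run skips

def build(model, dry_run=False, visualize_network=False):
    network = trace_network(model)
    if visualize_network:
        dump_svg(network, "network.svg")    # SVG dump for visual analysis
    if dry_run:
        return None                         # skip engine build and serialization
    return build_engine(network)

build("llama2-70b", dry_run=True, visualize_network=True)  # returns quickly; only network.svg is produced
```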
Testing
These changes have been manually tested for a few configurations of llama2.
Unit Tests
I can add more unit tests to this PR and fix existing unit tests if this basic design is acceptable.