feat: Improve Dynamo partitioning System Performance on Large Models #2175

gs-olive · 2023-08-04T17:40:41Z

Description

Problem Context

The Dynamo partitioning system was very slow for large models (more than 1,000 nodes) with segmentation. The existing partitioner used an exhaustive partitioning mechanism whose cost grew more than quadratically in the number of nodes and worsened with more segmentation. The new system uses a simpler adjacency-based partitioning scheme, which is much more performant on large models.

- Upgrade Dynamo partitioning to use a custom version of the Torch _SplitterBase for efficiency and optimized usage in the Dynamo case
- Validate that existing use cases are still functional, with the same partitioning schema as before
- Upgrade qualified name checking
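To illustrate why an adjacency-based scheme scales better than an exhaustive one, here is a minimal sketch, not the code in this PR. It assumes the graph's nodes arrive as a topologically ordered list and simply groups runs of adjacent supported nodes in a single pass. The `Node` class, the `is_supported` callback, and the `min_block_size` parameter are hypothetical names used for illustration only. The real `_SplitterBase` also reasons about node dependencies, which this toy version ignores.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Node:
    """Hypothetical stand-in for a graph node (not torch.fx.Node)."""
    name: str
    op: str


def adjacency_partition(
    nodes: List[Node],
    is_supported: Callable[[Node], bool],
    min_block_size: int = 1,
) -> List[List[str]]:
    """Group runs of adjacent supported nodes in one linear pass."""
    partitions: List[List[str]] = []
    current: List[str] = []
    for node in nodes:
        if is_supported(node):
            # Extend the current run of supported nodes.
            current.append(node.name)
            continue
        # An unsupported node closes the current run, if it is large enough.
        if current and len(current) >= min_block_size:
            partitions.append(current)
        current = []
    # Flush the final run.
    if current and len(current) >= min_block_size:
        partitions.append(current)
    return partitions


# Usage: "custom" is treated as unsupported, so it splits the graph in two.
nodes = [Node("a", "add"), Node("b", "mul"), Node("c", "custom"), Node("d", "add")]
parts = adjacency_partition(nodes, lambda n: n.op in {"add", "mul"})
print(parts)
```

Each node is visited once, so the cost of this sketch is linear in the number of nodes regardless of how many segments the unsupported operators create.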

Type of change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)

Checklist:

[ x ] My code follows the style guidelines of this project (You can use the linters)
[ x ] I have performed a self-review of my own code
[ x ] I have commented my code, particularly in hard-to-understand areas and hacks
[ x ] I have made corresponding changes to the documentation
[ - ] I have added tests to verify my fix or my feature
[ x ] New and existing unit tests pass locally with my changes
[ x ] I have added the relevant labels to my PR so that relevant reviewers are notified

py/torch_tensorrt/dynamo/lowering/_partition_old.py

py/torch_tensorrt/dynamo/lowering/_partition.py

py/torch_tensorrt/dynamo/partitioning/__init__.py

py/torch_tensorrt/dynamo/compile.py

gs-olive · 2023-08-08T21:23:34Z

- Inform user if none of the nodes are supported/no valid partitions
- Automatically fall back to global partitioning if adjacency/fast partitioning fails
  - Alert user via warning
  - Show trace in debug logs
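The fallback behavior listed above could look roughly like the following sketch. It is an illustration under stated assumptions, not the code merged in this PR: `partition_with_fallback`, `fast_partition`, and `global_partition` are hypothetical names, and the two partitioners are passed in as plain callables so the example stays self-contained.

```python
import logging
from typing import Any, Callable, List

logger = logging.getLogger(__name__)


def partition_with_fallback(
    graph: Any,
    fast_partition: Callable[[Any], List[Any]],
    global_partition: Callable[[Any], List[Any]],
) -> List[Any]:
    """Try the fast partitioner first; fall back to the global one on failure."""
    try:
        partitions = fast_partition(graph)
    except Exception:
        # Alert the user via a warning and keep the full trace at debug level.
        logger.warning(
            "Fast partitioner failed; falling back to global partitioner. "
            "Enable debug logging to see the trace."
        )
        logger.debug("Fast partitioner failure trace:", exc_info=True)
        partitions = global_partition(graph)

    if not partitions:
        # Inform the user when nothing in the graph could be partitioned.
        logger.warning("No supported nodes or valid partitions were found.")
    return partitions


def _always_fails(graph: Any) -> List[Any]:
    # Hypothetical fast partitioner that always raises, to exercise the fallback.
    raise RuntimeError("simulated fast-partitioner failure")


result = partition_with_fallback(["n0", "n1"], _always_fails, lambda g: [g])
```

Passing `exc_info=True` to `logger.debug` attaches the active exception's traceback, so the warning stays short while the full trace remains available to anyone who turns on debug logging.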

- Upgrade Dynamo partitioning to use a custom version of the Torch _SplitterBase for efficiency and optimized usage in the Dynamo case
- Validate existing use cases are still functional, with the same partitioning schema as before
- Upgrade qualified name checking
- Update testing for new partitioner
- Add new directory to store available partitioners

- Fall back to global partitioner if fast partitioner fails

peri044

LGTM

gs-olive requested review from narendasan and peri044 August 4, 2023 17:40

gs-olive self-assigned this Aug 4, 2023

facebook-github-bot added the cla signed label Aug 4, 2023

github-actions bot added component: api [Python] Issues re: Python API component: conversion Issues re: Conversion stage component: lowering Issues re: The lowering / preprocessing passes component: torch_compile labels Aug 4, 2023

gs-olive commented Aug 4, 2023

py/torch_tensorrt/dynamo/lowering/_partition_old.py

narendasan reviewed Aug 4, 2023

py/torch_tensorrt/dynamo/lowering/_partition.py (outdated)

gs-olive commented Aug 4, 2023

py/torch_tensorrt/dynamo/lowering/_partition.py

gs-olive force-pushed the dynamo_partitioning_perf_improvement branch from 6cfbb59 to f5e8dff Compare August 4, 2023 20:16

github-actions bot added the component: tests Issues re: Tests label Aug 4, 2023

gs-olive force-pushed the dynamo_partitioning_perf_improvement branch 2 times, most recently from ca94dca to 292f5ce Compare August 4, 2023 23:18

github-actions bot added the component: build system Issues re: Build system label Aug 4, 2023

gs-olive force-pushed the dynamo_partitioning_perf_improvement branch 4 times, most recently from bd0b0c5 to bbf514f Compare August 7, 2023 16:51

gs-olive requested a review from narendasan August 7, 2023 19:02

gs-olive commented Aug 7, 2023

py/torch_tensorrt/dynamo/partitioning/__init__.py

gs-olive mentioned this pull request Aug 8, 2023

Explore stable diffusion support in Torch-TRT #1545

Closed

gs-olive commented Aug 8, 2023

py/torch_tensorrt/dynamo/compile.py (outdated)

gs-olive added 2 commits August 8, 2023 16:51

feat: Add feature to toggle partitioner

18c0680

gs-olive force-pushed the dynamo_partitioning_perf_improvement branch from bbf514f to ffed9d6 Compare August 9, 2023 01:18

gs-olive added 2 commits August 8, 2023 18:26

fix: Linting and formatting updates

cb57850

feat: Inform user if no valid partitions

631b7b7

- Fall back to global partitioner if fast partitioner fails

gs-olive force-pushed the dynamo_partitioning_perf_improvement branch from ffed9d6 to 631b7b7 Compare August 9, 2023 01:26

fix: Remove unused code in Dynamo compile

734dce1

peri044 approved these changes Aug 15, 2023

gs-olive merged commit b57d83e into pytorch:main Aug 15, 2023

gs-olive deleted the dynamo_partitioning_perf_improvement branch August 15, 2023 21:55
