fix: Error with aten.view across Tensor memory #2464


Merged
merged 1 commit on Nov 27, 2023

Conversation

@gs-olive gs-olive commented Nov 14, 2023

Description

  • Address an error where aten.view is called on TRT output Tensors, which can be in a different memory format than Torch expects
  • Specifically, TRT can modify tensor memory layouts to optimize certain layers, but Torch's view operator depends on specific size/stride configurations, which can be violated at runtime (though not at compile time, since Torch alone would produce layouts that satisfy them)
  • Add a custom lowering pass to replace view with reshape, avoiding this issue (see the sketch after this list). reshape will make a copy of the underlying Tensor only if necessary
  • Torch-TRT's aten.view implementation is the same as that for aten.reshape, and the two share a schema, so no changes are needed on the converter side
  • Add a test case to validate the new lowering pass
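As a rough illustration, the pass amounts to retargeting view nodes to reshape in the FX graph. This is a minimal sketch assuming an FX GraphModule input; the function name and structure are illustrative, not the exact Torch-TRT pass:

```python
import torch

def view_to_reshape(gm: torch.fx.GraphModule) -> torch.fx.GraphModule:
    """Illustrative sketch: retarget aten.view calls to aten.reshape."""
    for node in gm.graph.nodes:
        if node.op == "call_function" and node.target == torch.ops.aten.view.default:
            # view and reshape share a schema (self, size), so the node's
            # existing args can be reused unchanged
            node.target = torch.ops.aten.reshape.default
    gm.graph.lint()
    gm.recompile()
    return gm
```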

Error:

RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.
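A minimal standalone reproduction of this failure mode (illustrative, not taken from the linked issue):

```python
import torch

x = torch.randn(2, 3, 4).transpose(0, 1)  # non-contiguous strides
x.reshape(6, 4)  # succeeds: copies because the layout is incompatible
x.view(6, 4)     # raises the RuntimeError above
```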

Addresses #2415

Type of change

  • Bug fix (non-breaking change which fixes an issue)

Checklist:

  • [x] My code follows the style guidelines of this project (You can use the linters)
  • [x] I have performed a self-review of my own code
  • [x] I have commented my code, particularly in hard-to-understand areas and hacks
  • [x] I have made corresponding changes to the documentation
  • [x] I have added tests to verify my fix or my feature
  • [x] New and existing unit tests pass locally with my changes
  • [x] I have added the relevant labels to my PR so that the relevant reviewers are notified

@gs-olive gs-olive self-assigned this Nov 14, 2023
@gs-olive gs-olive requested a review from zewenli98 November 14, 2023 22:53
@zewenli98 zewenli98 left a comment

This PR looks good to me. Just curious: does this problem arise from something PyTorch does after converting the ops, or after the TRT engines are created (since I think aten.reshape.default and aten.view.default are exactly the same)?

@gs-olive commented Nov 15, 2023

Thanks for the review @zewenli98 - the main issue here is that when an aten.view.default operator falls into a PyTorch segment due to partitioning restrictions (like a high min_block_size), the memory-layout requirements on its input tensor can be invalidated by TensorRT. Specifically, the PyTorch aten.view op makes assumptions about the memory format (contiguous, strided, etc.) based on the flow of the tensor in the graph preceding it, but when that preceding graph is run in TensorRT, those assumptions can be invalidated, causing the error:

RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.

In that sense, this lowering pass is much more for PyTorch (and fallback graphs) than it is for TensorRT.
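For context, here is a hedged sketch of how such a fallback can arise; the model and settings are illustrative, not taken from the linked issue:

```python
import torch
import torch_tensorrt

class Net(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = torch.nn.Conv2d(3, 8, 3)

    def forward(self, x):
        y = self.conv(x)              # accelerated in a TRT engine
        return y.view(y.size(0), -1)  # may land in a PyTorch segment

model = Net().eval().cuda()
inputs = [torch.randn(1, 3, 32, 32).cuda()]
# A high min_block_size can leave aten.view in a PyTorch fallback
# segment whose input is a TRT output tensor
trt_model = torch_tensorrt.compile(
    model, ir="dynamo", inputs=inputs, min_block_size=5
)
```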

@zewenli98 commented

Aha, I see. So does this mean we can omit view because reshape can handle more scenarios than view? Or, put another way, could every scenario where view is used be replaced with reshape?

@gs-olive commented

@zewenli98 - I believe the cases that aten.view can handle are a strict subset of those aten.reshape can handle, because reshape copies the Tensor memory only if needed (otherwise, it behaves like aten.view).
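A quick check of that behavior (a minimal sketch):

```python
import torch

a = torch.arange(6)
b = a.reshape(2, 3)                   # layout permits a view: no copy
assert b.data_ptr() == a.data_ptr()

c = torch.randn(2, 3, 4).transpose(0, 1)
d = c.reshape(6, 4)                   # layout incompatible: reshape copies
assert d.data_ptr() != c.data_ptr()
```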
