fix: Error with aten.view across Tensor memory #2464


Merged
merged 1 commit on Nov 27, 2023

Conversation

@gs-olive gs-olive commented Nov 14, 2023

Description

  • Address an error where aten.view is called on TRT output Tensors, which can be in a different memory format than Torch expects
  • Specifically, TRT can modify tensor memory layouts to optimize certain layers, but Torch's view operator depends on specific size/stride configurations, which can be violated at runtime (though not at compile time, since Torch alone would produce layouts that satisfy them)
  • Add a custom lowering pass to replace view with reshape, avoiding this issue (see the sketch after this list). reshape will make a copy of the underlying Tensor only if necessary
  • Torch-TRT's aten.view implementation is the same as that for aten.reshape, and the two share a schema, so no changes are needed on the converter side
  • Add a test case to validate the new lowering pass
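As a rough illustration, the pass amounts to retargeting view nodes to reshape in the FX graph. This is a minimal sketch assuming an FX GraphModule input; the function name and structure are illustrative, not the exact Torch-TRT pass:

```python
import torch

def view_to_reshape(gm: torch.fx.GraphModule) -> torch.fx.GraphModule:
    """Illustrative sketch: retarget aten.view calls to aten.reshape."""
    for node in gm.graph.nodes:
        if node.op == "call_function" and node.target == torch.ops.aten.view.default:
            # view and reshape share a schema (self, size), so the node's
            # existing args can be reused unchanged
            node.target = torch.ops.aten.reshape.default
    gm.graph.lint()
    gm.recompile()
    return gm
```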

Error:

RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.
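A minimal standalone reproduction of this failure mode (illustrative, not taken from the linked issue):

```python
import torch

x = torch.randn(2, 3, 4).transpose(0, 1)  # non-contiguous strides
x.reshape(6, 4)  # succeeds: copies because the layout is incompatible
x.view(6, 4)     # raises the RuntimeError above
```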

Addresses #2415

Type of change

  • Bug fix (non-breaking change which fixes an issue)

Checklist:

  • [x] My code follows the style guidelines of this project (You can use the linters)
  • [x] I have performed a self-review of my own code
  • [x] I have commented my code, particularly in hard-to-understand areas and hacks
  • [x] I have made corresponding changes to the documentation
  • [x] I have added tests to verify my fix or my feature
  • [x] New and existing unit tests pass locally with my changes
  • [x] I have added the relevant labels to my PR so that the relevant reviewers are notified

@gs-olive gs-olive self-assigned this Nov 14, 2023
@gs-olive gs-olive requested a review from zewenli98 November 14, 2023 22:53
@zewenli98 zewenli98 left a comment

This PR looks good to me. Just curious: does this problem arise from something PyTorch does after converting the ops, or after the TRT engines are created (since I think aten.reshape.default and aten.view.default are exactly the same)?

@gs-olive commented Nov 15, 2023

Thanks for the review @zewenli98 - the main issue here is that when an aten.view.default operator falls into a PyTorch segment due to partitioning restrictions (like a high min_block_size), the memory-layout requirements on its input tensor can be invalidated by TensorRT. Specifically, the PyTorch aten.view op makes assumptions about the memory format (contiguous, strided, etc.) based on the flow of the tensor in the graph preceding it, but when that preceding graph is run in TensorRT, those assumptions can be invalidated, causing the error:

RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.

In that sense, this lowering pass is much more for PyTorch (and fallback graphs) than it is for TensorRT.
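For context, here is a hedged sketch of how such a fallback can arise; the model and settings are illustrative, not taken from the linked issue:

```python
import torch
import torch_tensorrt

class Net(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = torch.nn.Conv2d(3, 8, 3)

    def forward(self, x):
        y = self.conv(x)              # accelerated in a TRT engine
        return y.view(y.size(0), -1)  # may land in a PyTorch segment

model = Net().eval().cuda()
inputs = [torch.randn(1, 3, 32, 32).cuda()]
# A high min_block_size can leave aten.view in a PyTorch fallback
# segment whose input is a TRT output tensor
trt_model = torch_tensorrt.compile(
    model, ir="dynamo", inputs=inputs, min_block_size=5
)
```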

@zewenli98 commented

Aha, I see. So does this mean we can omit view because reshape can handle more scenarios than view? Or, put another way, could every scenario where view is used be replaced with reshape?

@gs-olive commented

@zewenli98 - I believe the cases that aten.view can handle are a strict subset of those aten.reshape can handle, because reshape copies the Tensor memory only if needed (otherwise, it behaves like aten.view).
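A quick check of that behavior (a minimal sketch):

```python
import torch

a = torch.arange(6)
b = a.reshape(2, 3)                   # layout permits a view: no copy
assert b.data_ptr() == a.data_ptr()

c = torch.randn(2, 3, 4).transpose(0, 1)
d = c.reshape(6, 4)                   # layout incompatible: reshape copies
assert d.data_ptr() != c.data_ptr()
```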
