[nvfuser] view size is not compatible with input tensor's size and stride #93648
nvfuser is returning a contiguous tensor while eager mode creates a strided tensor. Here is a standalone repro:
cc @jjsjann123
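The original repro is not included above. As a hypothetical illustration of the underlying symptom (not the issue's actual repro), the sketch below shows why the stride of an output matters: a `view` that is legal on eager mode's strided result can fail, or vice versa, when a backend silently substitutes a contiguous tensor.

```python
import torch

x = torch.randn(2, 3, 4)
y = x.transpose(1, 2)  # eager produces a strided, non-contiguous view

# view requires a size/stride-compatible layout; on the transposed
# tensor this raises the "view size is not compatible with input
# tensor's size and stride" error from the issue title
try:
    y.view(2, 12)
except RuntimeError as e:
    print("view failed:", e)

# a backend that instead hands back a contiguous tensor would make
# this view succeed, diverging from eager behavior
z = y.contiguous().view(2, 12)
print(z.shape)
```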
yeah... this is expected behavior. Note that we do intend to improve memory format support after we wrap up our transpose scheduler. Currently we don't promise any memory format on output tensors: because we lack a performant transpose scheduler, permuting the output could tank our kernel perf. We intended to improve this by following the TS-profiled output memory format. I don't currently have a timeline for that, but the PR for the permutation scheduler is getting there: csarofeen#1927. We'll still need to plumb it through after that PR lands, likely separately for the TS integration and nvprims.
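To make the "memory format on output tensors" point concrete, here is a small hypothetical example (not from the thread): eager-mode pointwise ops propagate the input's memory format, which is exactly the behavior a backend breaks if it always returns contiguous NCHW output.

```python
import torch

# a channels_last (NHWC) input tensor
x = torch.randn(2, 3, 4, 5).to(memory_format=torch.channels_last)

# eager pointwise ops preserve the input's memory format
y = torch.relu(x)
print(y.is_contiguous(memory_format=torch.channels_last))

# a backend that always emits a contiguous (NCHW) result here would
# silently change the layout that downstream code observes
```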
Just to clarify.
hmmm... looking at this example, I'm not sure how things got messed up with nvfuser. We are not fusing transpose, and our pointwise memory format handling should be pretty close to what eager has... We can take a look at the TS graph passed to nvfuser and figure out what went wrong there.
our new attitude is that backends MUST match the original eager striding exactly... at least until @dagitses finishes stride-agnostic PyTorch.
Is that currently a WIP? It sounds like a pretty big endeavor; do we have any guesstimate on delivery time?
it's a big endeavor, no estimate on timing
In the meantime, the existing tracing mechanisms (mostly) correctly propagate strides, with a few quirks here and there, mostly around the ambiguous strides of tensors with 0 elements or size-1 dimensions. Those strides can be queried on the fx graph nodes, so backends don't need to develop their own systems of stride propagation; they only need to query and match the strides the tracer gives them at the boundaries of the compiled region.
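A minimal sketch of querying those propagated strides, assuming the `make_fx` tracer and the `node.meta["val"]` convention of recent PyTorch versions (the exact meta layout is not specified in this thread and may vary across releases):

```python
import torch
from torch.fx.experimental.proxy_tensor import make_fx

def f(x):
    # transpose produces a strided (non-contiguous) result in eager mode
    return torch.relu(x.transpose(0, 1))

# tracing with fake tensors records size/stride metadata on each node
# without running real kernels
gm = make_fx(f, tracing_mode="fake")(torch.randn(3, 4))

# a backend can read the tracer-propagated strides off node.meta["val"]
# and match them at the boundaries of the compiled region
for node in gm.graph.nodes:
    val = node.meta.get("val")
    if isinstance(val, torch.Tensor):
        print(node.op, node.name, tuple(val.shape), val.stride())
```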
It's difficult to come up with a repro. Labeling it as an inductor issue since it's specific to the backend.
FYI, nvfuser has been removed from TorchScript. We should close this issue. I see an
Instructions
python repro/repro.py
The repro is minified using pytorch/torchdynamo#1056
This only fails with the `aot_nvfuser` backend.

cc @ezyang @msaroufim @wconstab @bdhirsh @zou3519 @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @aakhundov @ColinPeppler @soumith @ngimel