No perf advantage for torch.compile on examples from pytorch tutorial #1721
I am not getting the same results (latest llvm-target branch, LTS driver, and pytorch/pytorch@75f64e1):
Perhaps there is some logging we can enable to find the difference? Can you try running with [...]? I did not have time to rebuild pytorch right now, but I can also try the pytorch commit you used, though at first glance mine is much older.
@alexbaden: you did not reproduce my eager mode results, but your torch.compile results are similar to mine. Your pytorch version is very old, and I think eager mode simply falls back to CPU on some aten ops (silently, because you are also missing intel/torch-xpu-ops#318). You are missing at least the following torch-xpu-ops updates, which implemented many aten ops:
Update fyi: I tried pytorch/pytorch@75f64e1 + PR318. The following eager aten ops fall back to CPU:
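As a general way to spot which eager aten ops fall back to CPU, here is a minimal profiler-based sketch. This is an assumed approach, not one mentioned in the thread, and it requires a PyTorch build where `torch.profiler.ProfilerActivity.XPU` is available:

```python
import torch
from torch.profiler import profile, ProfilerActivity

# Hypothetical stand-in workload; substitute the model under test.
model = torch.nn.Conv2d(3, 8, 3).to("xpu")
x = torch.randn(4, 3, 64, 64, device="xpu")

# Profile both CPU and XPU activity; aten ops that show up with CPU time
# but no corresponding device kernel are fallback candidates.
with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.XPU]) as prof:
    model(x)

print(prof.key_averages().table(sort_by="self_cpu_time_total", row_limit=20))
```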
Got it, that makes sense. Let me update PyTorch to latest main and try again.
See #1770 for a potential fix.
I am trying the pytorch tutorial for `torch.compile()`: https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html#demonstrating-speedups, adapting it for the `xpu` backend by `s/cuda/xpu`. Using pytorch/pytorch@f063027. The tutorial has performance examples demonstrating the `torch.compile` advantage over eager mode on Nvidia. Unfortunately, I don't observe similar benefits for xpu: `torch.compile` runs at about the same speed as eager mode. Are there any optimizations currently missing for XPU that affect these tutorials? This occurs for both examples in the tutorial: inference and training.

Results (inference):
Script (inference):
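The script itself is not included in this extract. For context, here is a minimal sketch of the tutorial's inference benchmark after `s/cuda/xpu`, with the wall-clock `timed` change described below folded in. It assumes a PyTorch build with XPU support (`torch.xpu` available) and `torchvision` installed; details of the author's actual script may differ:

```python
import time

import torch
import torchvision.models as models

def timed(fn):
    # Wall-clock e2e timing with explicit device synchronization,
    # instead of the tutorial's device-event timing.
    torch.xpu.synchronize()
    start = time.perf_counter()
    result = fn()
    torch.xpu.synchronize()
    return result, time.perf_counter() - start

def generate_data(b):
    return (
        torch.randn(b, 3, 128, 128, dtype=torch.float32, device="xpu"),
        torch.randint(1000, (b,), device="xpu"),
    )

def init_model():
    return models.resnet18().to("xpu")

model = init_model()
model_opt = torch.compile(model, backend="inductor")

# Warm-up: the first compiled call includes compilation time.
inp = generate_data(16)[0]
with torch.no_grad():
    print("eager warm-up:", timed(lambda: model(inp))[1])
    print("compile warm-up:", timed(lambda: model_opt(inp))[1])

eager_times, compile_times = [], []
for _ in range(10):
    inp = generate_data(16)[0]
    with torch.no_grad():
        _, t_eager = timed(lambda: model(inp))
        _, t_compile = timed(lambda: model_opt(inp))
    eager_times.append(t_eager)
    compile_times.append(t_compile)

print("eager median:", sorted(eager_times)[len(eager_times) // 2])
print("compile median:", sorted(compile_times)[len(compile_times) // 2])
```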
Note that I changed the `def timed` implementation from the tutorial to measure e2e time, due to pytorch/pytorch#131840. Also note that I did try applying pytorch/pytorch#126456; this did not change performance results for the XPU backend.
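For reference, the tutorial's original `timed` uses device events rather than wall-clock time; a direct `s/cuda/xpu` port would look roughly like the sketch below, which is the kind of implementation the note above replaces because of pytorch/pytorch#131840. It assumes `torch.xpu.Event` mirrors the `torch.cuda.Event` API:

```python
import torch

def timed_with_events(fn):
    # Tutorial-style event timing after s/cuda/xpu; assumes torch.xpu.Event
    # supports enable_timing and elapsed_time like its CUDA counterpart.
    start = torch.xpu.Event(enable_timing=True)
    end = torch.xpu.Event(enable_timing=True)
    start.record()
    result = fn()
    end.record()
    torch.xpu.synchronize()
    return result, start.elapsed_time(end) / 1000  # milliseconds -> seconds
```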