-
-
Notifications
You must be signed in to change notification settings - Fork 50
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
building with triton support? #166
Comments
@Tobias-Fischer Are you aware of any quick example demonstrating the usage of triton? No worries if not, I will look upstream. I am now back to home base and can help debug this further |
The first code snippet is an easy example: https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html |
Alright, it fails with
|
@h-vetinari and @hmaarrfk, just fyi. My current assessment is that we will likely need to wait for triton 2.x, then simply add it as run dep and see if things work out. I tried adding |
I just confirmed that this example works: https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html |
Ah, something surprising (?): It also works without installing triton, just by having a |
Made some progress - see conda-forge/triton-feedstock#6 The open question in my mind is still the circular dependency from both torch to triton and from triton to torch. Not sure how to best deal with it .. a |
Probably with a |
I keep running into: cannot find -lcuda: No such file or directory when using Any idea what might be going on? |
We need conda-forge/triton-feedstock#6 |
Bumping this since conda-forge/triton-feedstock#6 was merged. Thanks for the good work! |
Can we quickly test this? Just use the package we have and install triton. Do things magically work or do we need to more tweaks in this feedstock? |
Anecdotal experience, but pytorch and triton from conda-forge seem to pick up each other fine today (as in no segfaults when calling torch.compile with the inductor backend). I believe special care should be taken with dependency versions. Both triton and support for it in torch are really experimental and probably have quite narrow ranges of versions where they are supposed to work together. |
@RaulPPelaez good idea on 2.0.1. Would you be interested in submitting a PR? I could start one now... |
see #172 we could also include formal triton support in that one too |
Has anyone looked at this again recently? |
Here's what we've done at Anaconda for v2.3.0: https://github.com/AnacondaRecipes/triton-feedstock There are some notes in the meta.yaml about various choices we made. Let me know if you've got any questions. |
I'm going to try making a new pull request for 3.1.0, as that's the version required by PyTorch 2.5.1. @danpetry, thanks. Curious enough, I've just tried diffing the PyPI 3.1.0 package against the one provided by PyTorch, and — at least as far as |
I think pytorch call(ed) their conda package torchtriton too? |
I probably need to re-check the logic of that statement, but that's the conclusion I came to when I worked on it, at least I wanted to be cautious and say, "this should not be used except with pytorch, with which it has been explicitly integration tested" |
It uses llvm at runtime, rather than build time |
The issue was that triton did not have wheels and even when it did it took time to get releases in. There was also an issue with rocm support not getting merged. All of these have been resolved I think. |
ok, so we can now use the wheels rather than the git repo to build? IIRC this wasn't possible |
In conda packaging? We don't want to use pre-compiled wheels in conda-build. |
I've pushed my WIP to conda-forge/triton-feedstock#26. |
Worth bearing in mind that LLVM is only used by triton at runtime, to compile cuda kernels. And in the end, the binary format I guess is determined by the cuda compiler rather than llvm. So, keeping it vendored-in isn't an issue so far as compatibility with the rest of the distro is concerned, I think. |
I don't know if anyone else can confirm this? |
I think the bigger issue here is that triton either downloads a prebuilt LLVM version if it detects a supported platform, or uses system LLVM (expecting this specific commit) when it doesn't. |
Cc @amjames. Andrew, you had some useful insights into the PyTorch -> Triton -> LLVM coupling, so you may be interested in this topic and in conda-forge/triton-feedstock#26. Making Triton compatible with a proper LLVM release can be very useful for Conda-forge (and probably other distros as well). conda-forge/triton-feedstock#26 (comment) summarizes how this was achieved for Triton 3.1.0 with the LLVM 19 release. Some manual testing seems to confirm success at the "build and seems to compile stuff with nvcc" level - perhaps you have some suggestions into what subset of the PyTorch test suite to run to confirm that PyTorch + Triton works as designed? |
There's a smoke test which tests torch.compile with cuda (if an environment variable is appropriately set) |
Thanks. Looks like I was wrong and some patching is necessary for regular CC to be able to find CUDA headers:
|
@rgommers Thanks!
What kind of time-limits are we working with? Full coverage is probably a non-starter, but the inductor tests will be the best place to focus on. I would start with:
Some of these tests will require a specific GPU like A100, but in general the tests should be annotated to skip if they have special requirements like that which are not met. |
The plot thickens. After fixing the path errors, I'm getting:
My guess would be that it doesn't support CUDA 12.6 for some reason. But 12.0 is too old, and I don't think we can cleanly do 12.5 just for Triton without changing PyTorch. Will try to figure out a good solution tomorrow, but I'd appreciate any hints. |
Are you sure it’s not picking up your system CUDA (12.5?) and then competing against conda CUDA (12.6?) |
It might be a problem that the feedstock uses |
Actually, it turns out we need one more upstream patch for CUDA 12.6 support. But I also need to fix path search, I'll make a pull request later.
Hmm, is that actually wrong? I didn't think |
AFAIU But Maybe it's to do with the "conda-build" conditional in the link above, so the I/L flags aren't being added? |
Already solved via conda-forge/triton-feedstock#28. Just wondering if I should update the dependencies while at it. |
@mgorny - does any more work need to be done here? |
The remaining question is: do we want to add a requirement from PyTorch to Triton, so that it's pulled in automatically? Unless I'm mistaken, Conda doesn't have "recommends" kind of requirements, so it probably have to be unconditional (and then it would create a cycle). |
On PyPI,
That would be desirable, but only for |
Well, at least of
Yes, that should be fine. |
There's a bunch of So it's optional at most. It's also possible to use Triton with JAX: https://jax.readthedocs.io/en/latest/pallas/design/design.html#lowering-pallas-to-triton-for-gpu. So it looks to me like the dependency in the triton feedstock should be removed. I don't know why it was added in the first place though. |
Likely for the tests. I agree it can be removed! Thanks looking into it! |
Okay, I'll look into it tomorrow. I also think I'm ready to give rattler-build in pytorch another try! |
Lets try to give the windows folk a break on the rebasing. They seem pretty close, and the new resources in the form of CIs make it finally a possibility. Given that pytorch build times are dominated by mathematical compilation, I don't see much of an advantage in working on this immediately |
Per the discussion on pytorch-cpu-feedstock: conda-forge/pytorch-cpu-feedstock#166 (comment) (and below)
Sure, I'll wait.
Well, just for the record, it would make my local dev builds faster, since (with an up-to-date ccache available) conda-build is actually being a bottleneck :-). |
Interesting..... I know your working on alot, but I would be willing to learn if you can document your process of getting ccache working with conda-forge's isolated builds! My strategy is currently to follow a variant of conda-forge/tensorflow-feedstock#360 |
Right, I suppose it makes sense to add it to the README as well. Basically, I'm running Oh, I'm passing
I'll document that when I update some pull request next. |
Originally posted by @Tobias-Fischer in #165 (comment)
--
More background: #151
The text was updated successfully, but these errors were encountered: