-
-
Notifications
You must be signed in to change notification settings - Fork 50
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: yet another attempt to add windows builds #231
base: main
Are you sure you want to change the base?
Conversation
Hi! This is the friendly automated conda-forge-linting service. I just wanted to let you know that I linted all conda-recipes in your PR ( I do have some suggestions for making it better though... For recipe:
|
Hi! This is the friendly automated conda-forge-linting service. I wanted to let you know that I linted all conda-recipes in your PR ( Here's what I've got... For recipe:
For recipe:
|
Hi! This is the friendly automated conda-forge-linting service. I just wanted to let you know that I linted all conda-recipes in your PR ( I do have some suggestions for making it better though... For recipe:
|
Both pipelines failed because they ran out of disk space:
What would be the most idiomatic way to solve this issue? |
Try following https://conda-forge.org/docs/maintainer/conda_forge_yml/#azure to clear some disk space. Set this in
and then rerender the feedstock. |
I think there’s little we can do - the Azure free disk space setting is already enabled. I’d try and see if these build locally. Perhaps there is a way to use the Quansight servers for Windows as well, the same way they are used for Linux builds? If not, I guess if there are some volunteers to build these locally then this would be an option - I did that for aarch64 for a while for qt. Conda-forge has a windows server too, but disk space has always been quite restricted there too so it might be a bit of a pain. |
Perhaps cross-compiling Windows from Linux is worth trying? Here is a different feedstock PR that does this ( conda-forge/polars-feedstock#187 ) If we were to use Quansight resources for Windows, being able to run the build on Linux (so cross-compiling) would be very helpful |
Sadly thats already set: pytorch-cpu-feedstock/conda-forge.yml Line 2 in 9e99e03
I assume you mean the runners provided through open-gpu-server by Quantsight and MetroStar? This PR only build the cpu-only version but if we also start building for Cuda I think this is the only possible way forward (let alone for other related repositories like tensorflow). However, the open-gpu-servers don't seem to provide any Windows images. Do you know who I should contact to get the ball rolling?
That would be an option but Id prefer to automate and open-source things as much as possible. Having something hooked up to this repository would be ideal.
The native code of the example you linked is using Rust which makes this much easier. I doubt that this would be easy to achieve with pytorch. |
I also expect another error when actual linking starts. On my local machine that takes at least 16GB of memory. The cuda version will mostly require more. |
If we don't try, we won't know |
Although that is technically true, its already hard enough to build pytorch natively. Adding cross-compilation in the mix seems to me to complicate this even further. Id much rather first focus on getting native builds working. Even if we need to modify the infrastructure to do so. I think having the ability to do resource intensive windows builds would be a huge benefit for the conda-forge ecosystem in general. However, if all else fails cross-compiling seems like a worthwhile avenue to explore. |
One thing to try is to move the build from
You should have roughly 70 GB free on |
Thanks! I added that to the PR. I quickly searched github and it seems c:\bld\ is used more often so I tried that. |
Just make sure that the directory exists and is writeable. Also, you need to rerender for the variable to be set. This comment should trigger the bot. @conda-forge-admin, please rerender |
A bit of history. Back when this feedstock was created 6 years ago, the pytorch officially suggested that people install two distinct packages I personally feel like for windows users, we would HURT their experience to not have a GPU package in 2024. |
Couldnt agree more. I started with CPU only to be able to make incremental progression. My goal is definitely to be able to build the cuda version too! |
well few things:
Typically we "stop" the compilation on the CIs when we reach your stage (seems like it is working OK enough...). |
Hi @baszalmstra @hmaarrfk - do you have any updates on this? It would be amazing to see this happen :)! |
@Tobias-Fischer Im still working on the Cuda builds but its a slow process because it takes ages to build them locally so iteration times are suuuper slow. In parallel we are also looking into getting large Windows runners into the conda-forge infrastructure. |
I got to the testing stage and noticed this: pytorch-cpu-feedstock/recipe/meta.yaml Line 300 in f9fd731
However this seems to always fail with (this is from the logs of the latest release):
Some dependencies are missing. Particularly:
(as can be seen here https://github.com/pytorch/pytorch/blob/6c8c5ad5eaf47a62fafbb4a2747198cbffbf1ff0/test/run_test.py#L1705) Given that the test is allowed to fail (due to |
The more we fix, the better. If it's really a lot of failures, we might not fix it right away (though depending on the severity of the failures, we might want to think twice about releasing something in that state). In any case, let's leave the testing in, add the required dependencies, and pick up as many fixes as we can. |
Looks like we're getting straigh-up segfaults now:
|
I don't know how segfaults exhibit on Windows, but on Linux "Aborted" usually means some failed assertions. You can then use |
Pytest should produce regular reporting for failed tests. The "aborted" is an irregular shutdown |
Co-authored-by: Michał Górny <mgorny@gentoo.org>
Error for reference:
I'll see if it also happens with a non-mkl build. |
@conda-forge-admin please rerender |
…onda-forge-pinning 2024.12.31.18.36.35
It's another reason why we should get rid of intel-openmp already for this PR |
I still don’t know enough to fix this unfortunately - if you know what to do, please go ahead (or if you can provide guidance/pointers what to do) |
We are back to the cuda_version issue:
/cc @isuruf |
Looks like there's some memory corruption after all:
|
Okay, so I don't think the issues we're seeing are trivial to fix, but I think the overall build architecture is good. And I'd like to give rattler-build a shot here without creating merge conflicts. So I'd like to propose that:
WDYT? |
What concrete benefit do you forsee from a switch to rattler? This has nothing to do with the test errors. I'd prefer to keep this on conda-build for now, not least because the anaconda folks are planning to contribute some of their recipe bits, and this will surely be easier if we stay on the same format. For the rest of the plan I'm fine - we can merge this PR as long as the windows builds are still disabled and the linux builds are passing (and not breaking your local workflow). |
Well, it will definitely make my local testing faster (given ccache makes the actual build very fast), dependency resolution and other processing in conda-build is a bottleneck. But mostly, I'm worried that the growing complexity of the recipe is making the eventual conversion harder, so I'd rather do it sooner than later, so that future changes are already done in the new format. Of course, I'm not insisting. There's definitely some cost involved: like the ugly comment form seen in triton-feedstock now, or the fact that the |
Hmm, I don't think we have As a data point, it looks like the upstream build from PyPI doesn't crash — |
Looking at https://github.com/pytorch/pytorch/actions/runs/11471282816/job/31928694532, upstream is building with mkl as BLAS backend and libiomp5md rather than libomp. |
The openmp change I was talking about is not difficult at all, just replace |
Checklist
0
(if the version changed)conda-smithy
(Use the phrase@conda-forge-admin, please rerender
in a comment in this PR for automated rerendering)Fixes #32
This PR is another attempt to add Windows builds (see #134) .
For now I disabled all other builds to be able to test the windows part first. I made this PR draft so we don't accidentally merge it.