WIP: try CUDA on PPC again #859

Closed (9 commits)

h-vetinari
Member

Based on #848, will be rebased once that's in.

@conda-forge-linter
Contributor

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.

@h-vetinari force-pushed the ppc_cuda branch 2 times, most recently from 9afbf6f to baef5be (September 13, 2022, 20:00)
@h-vetinari
Member Author

@conda-forge-admin, please rerender

@h-vetinari
Member Author

h-vetinari commented Sep 15, 2022

As noted in #659:

I'd like to still spend some time on trying to figure out cross-compilation, because otherwise this feedstock becomes basically infeasible to build.

This is because, if we run aarch/ppc in emulation, we have 16 builds, of which half time out on any given run. This means it'll take 5-6 restarts[1] on average to get any one CI run passing. This is pretty much infeasible IMO, and it also blocks us from building arrow-cpp without python (which would collapse the CI jobs into one per arch that then also builds all the pyarrow packages).

Footnotes

  1. i.e. ~30h for the best case scenario, with 5x optimally timed manual intervention
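
As a rough sanity check on the restart estimate above, here is a minimal Python sketch. It rests on modelling assumptions that are not stated in the thread: each of the 16 jobs times out independently with probability 0.5 per attempt, and only the timed-out jobs are restarted. `expected_attempts` is a hypothetical helper name.

```python
def expected_attempts(n_jobs: int, p_fail: float, tol: float = 1e-12) -> float:
    """Expected number of CI rounds until every job has passed at least once.

    Assumption (illustrative only): jobs fail independently with probability
    p_fail on each attempt, and only failed jobs are rerun in the next round.
    """
    if not 0.0 <= p_fail < 1.0:
        raise ValueError("p_fail must be in [0, 1)")
    total = 0.0
    k = 0
    while True:
        # Probability that at least one job is still unfinished after k rounds.
        still_failing = 1.0 - (1.0 - p_fail ** k) ** n_jobs
        if still_failing < tol:
            break
        # E[rounds] is the sum over k >= 0 of P(rounds > k).
        total += still_failing
        k += 1
    return total


if __name__ == "__main__":
    rounds = expected_attempts(n_jobs=16, p_fail=0.5)
    print(f"expected rounds: {rounds:.2f}")
```

Under these assumptions the model gives roughly 5.4 rounds in total, which is in the same range as the estimate quoted above. The footnote's figures (about 30 h for five interventions) imply roughly 6 h per round.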

@h-vetinari
Member Author

Though I admit it's very possible that cross-compilation will elude us for a while, I'd still like to find out which pieces are missing. Copying a comment from #793:

> @jakirkham: Unfortunately cross-compiling and CUDA builds don't work together today.

> @kkraus14: Doesn't that only apply to device code requiring nvcc? The arrow package just uses the CUDA driver API without any actual device code. We should be able to cross compile against a libcuda stub?

> @jakirkham: Which means we need to use the CUDA Docker images. This is part of the issue.

> @h-vetinari: I think I figured out the images part (or at least the rendering part) in #859. Meaning we have x86 build compilers, but I don't yet know how we get the libcuda stub for host (aarch/ppc) into there. We might be able to download it in the build scripts...?

> @kkraus14: You're not allowed to redistribute a libcuda stub in a container unless the container was based on the nvidia/cuda container.

But downloading it in the host env here is not the same as redistributing it. The way I imagined it (naïvely perhaps), is that we can build against the stub here, but rely on libcuda being available on the user's machine.

@kkraus14
Contributor

kkraus14 commented Sep 15, 2022

> But downloading it in the host env here is not the same as redistributing it. The way I imagined it (naïvely perhaps), is that we can build against the stub here, but rely on libcuda being available on the user's machine.

We have the __cuda virtual package that guarantees a CUDA driver being available and working on the user's machine at runtime.
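
For illustration, the runtime half of that arrangement (build against a stub, find the real driver on the user's machine) could be probed roughly as follows. This is only a sketch and not a description of how conda implements `__cuda`; `cuda_driver_version` is a hypothetical helper, and the default library name `libcuda.so.1` assumes Linux. `cuInit` and `cuDriverGetVersion` are functions of the CUDA driver API.

```python
import ctypes
from typing import Optional, Tuple


def cuda_driver_version(libname: str = "libcuda.so.1") -> Optional[Tuple[int, int]]:
    """Return (major, minor) of the CUDA driver, or None if no usable driver is found."""
    try:
        # Load the driver library at runtime; nothing is linked at build time here.
        libcuda = ctypes.CDLL(libname)
    except OSError:
        return None  # library not present on this machine

    try:
        # Both calls return a CUresult; 0 means CUDA_SUCCESS.
        if libcuda.cuInit(0) != 0:
            return None  # e.g. no GPU, or only a stub library is present
        version = ctypes.c_int(0)
        if libcuda.cuDriverGetVersion(ctypes.byref(version)) != 0:
            return None
    except AttributeError:
        return None  # the loaded library does not export the expected symbols

    # The driver encodes its version as 1000 * major + 10 * minor.
    major, rest = divmod(version.value, 1000)
    return major, rest // 10


if __name__ == "__main__":
    print(cuda_driver_version())
```

On a machine without a working driver the function simply returns `None`, which is the situation the virtual package is meant to rule out before installation.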

How are you downloading a libcuda stub in the host env? There's no conda package for it in conda-forge because it can not be redistributed per the EULA.

@h-vetinari
Member Author

> How are you downloading a libcuda stub in the host env? There's no conda package for it in conda-forge because it can not be redistributed per the EULA.

I'm not yet doing that. It was how I thought the process might work given the constraints and our infrastructure.

AFAIU as long as we don't distribute libcuda, we can still use it in the build process. If there's an EULA-compatible way to do that (anything from curl during the build scripts to dedicated cross-compilation images by Nvidia), then I'd like to try.

@kkraus14
Contributor

Sorry, I may have given a bit of misinformation here. We have ppc64le CUDA images: https://github.com/conda-forge/docker-images/tree/main/linux-anvil-ppc64le-cuda.

These are based off of the nvidia/cuda images and have the libcuda stub library that ships as part of the toolkit. This allows them to be used for emulated builds, but we can't extract things to use for cross compilation.

I believe there was a separate issue of cuda builds timing out in emulation which is why they weren't enabled.

@h-vetinari
Member Author

h-vetinari commented Sep 16, 2022

> I believe there was a separate issue of cuda builds timing out in emulation which is why they weren't enabled.

That was on Travis, apparently? In any case, the CI for 207b451 is green.

> These are based off of the nvidia/cuda images and have the libcuda stub library that ships as part of the toolkit. This allows them to be used for emulated builds, but we can't extract things to use for cross compilation.

That's a pity. How did you imagine doing cross-compilation then? We need both a build_platform compiler and a target_platform library stub.

FWIW, I believe you that we can't extract things, but it's not apparent to me how cross-compiling artefacts would somehow violate the EULA when regular builds do not (with the only visible exception being that there's no ready-made image for cross-compilation; but then quay.io/condaforge/linux-anvil-cuda:11.2 is also not a vanilla image).
