Build for new archs #8

Merged (10 commits into conda-forge:main from new-archs, Oct 6, 2023)
Conversation

@carterbox carterbox commented Sep 28, 2023

Refactor the build scripts to target consistent CUDA archs across toolkit versions and enable PowerPC builds.

CUDA compatibility is increased. The binaries will now run on all architectures supported by each toolkit. Before, the minimum targeted arch for ppc64le and aarch64 was 50; now it is 35.

Compile-time optimization has decreased. I no longer target 61; those devices can run the 60 binaries. I tried to target the lowest arch from each named generation. 35 now only gets PTX, so it has to be compiled by the end users' CUDA driver for those devices; this will increase first startup times on those devices. Actually, for Windows, all archs have been getting PTX only, because it takes too long to generate the machine code here on the feedstock.
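
As a rough sketch, this targeting scheme could be expressed in build.sh roughly like so (assuming CMake >= 3.18 and a project that honors CMAKE_CUDA_ARCHITECTURES; the exact arch list and flags on the feedstock may differ):

    # "-virtual" ships PTX only (JIT-compiled by the end user's driver on
    # first run); "-real" ships ahead-of-time SASS for the lowest arch of
    # each generation. Later generations are elided from this sketch.
    export CMAKE_ARGS="${CMAKE_ARGS} -DCMAKE_CUDA_ARCHITECTURES=35-virtual;50-real;60-real;70-real"

A 60 binary also covers 61 devices because SASS is forward-compatible across minor revisions within the same major compute capability.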

Checklist

  • Used a personal fork of the feedstock to propose changes
  • Bumped the build number (if the version is unchanged)
  • Reset the build number to 0 (if the version changed)
  • Re-rendered with the latest conda-smithy (Use the phrase @conda-forge-admin, please rerender in a comment in this PR for automated rerendering)
  • Ensured the license file is being packaged.

@carterbox carterbox requested a review from a team as a code owner September 28, 2023 15:25
@conda-forge-webservices
Contributor

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.

@carterbox
Member Author

I have tried appending a new CUDAToolkit_ROOT to CMAKE_ARGS. If that doesn't work, I'll try replacing the existing entry instead, so it isn't duplicated.
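
In build.sh, the two approaches would look something like the following sketch (CUDA_HOME stands in for wherever the toolkit root actually lives on the CI image):

    # Approach 1: append a hint for CMake's FindCUDAToolkit. This may leave
    # a duplicate if CMAKE_ARGS already carries a CUDAToolkit_ROOT entry.
    export CMAKE_ARGS="${CMAKE_ARGS} -DCUDAToolkit_ROOT=${CUDA_HOME}"

    # Approach 2: strip any existing entry first, then append, so the flag
    # appears exactly once.
    CMAKE_ARGS="$(printf '%s' "${CMAKE_ARGS}" | sed 's/-DCUDAToolkit_ROOT=[^ ]*//')"
    export CMAKE_ARGS="${CMAKE_ARGS} -DCUDAToolkit_ROOT=${CUDA_HOME}"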

recipe/meta.yaml Outdated
skip: true # [cuda_compiler_version == "None"]
skip: true # [cuda_compiler_version == "10.2"]
skip: true # [cuda_compiler_version == "11.0"]
skip: true # [cuda_compiler_version == "11.1"]
skip: true # [ppc64le]
skip: true # [cuda_compiler_version == "11.2"]
Contributor

Sorry, why is this necessary? Can we not be compatible with both with little effort?

Member Author

@carterbox carterbox Oct 2, 2023

Until what date do you want me to continue to release builds for 11.2? The 11.2 build takes too long for the CI, so I will need to reduce the number of target archs.

Contributor

Can you install the newer CUDA stuff alongside pytorch or tensorflow built with 11.2? If so, then that is OK with me.

We just don't have cuda != 11.2 for either of those.

Member Author

For clarity, the addition of ppc64le is unrelated to the removal of 11.2.

I think you would need an 11.2 build of magma to build pytorch with 11.2. I don't think you can use an 11.8 build of magma to build pytorch with 11.2, so I'll try to get the 11.2 build passing again.

Contributor

We could go the other way too, and try to build pytorch and tensorflow with 11.8.

Member

In terms of migrating to CUDA 11.8, both PyTorch ( conda-forge/pytorch-cpu-feedstock#195 ) and TensorFlow ( conda-forge/tensorflow-feedstock#344 ) have updated.

Regarding CUDA 11.2, PyTorch dropped it ( conda-forge/pytorch-cpu-feedstock#195 ) and TensorFlow is planning to ( conda-forge/tensorflow-feedstock#347 (comment) )

So maybe we can simplify the builds here

Member Author

Thanks! I'll keep that in mind for the next release!

Member

Note: Tracking in issue ( #12 )

@carterbox
Member Author

The variability in build times is frustrating; it varies by something like 15%!

@carterbox carterbox added the automerge Merge the PR when CI passes label Oct 4, 2023
@github-actions
Contributor

github-actions bot commented Oct 5, 2023

Hi! This is the friendly conda-forge automerge bot!

I considered the following status checks when analyzing this PR:

  • linter: passed
  • azure: failed

Thus the PR was not passing and not merged.

@github-actions
Contributor

github-actions bot commented Oct 5, 2023

Hi! This is the friendly conda-forge automerge bot!

Commits were made to this PR after the automerge label was added. For security reasons, I have disabled automerge by removing the automerge label. Please add the automerge label again (or ask a maintainer to do so) if you'd like to enable automerge again!

@github-actions github-actions bot removed the automerge Merge the PR when CI passes label Oct 5, 2023
@hmaarrfk
Contributor

hmaarrfk commented Oct 5, 2023

So did you end up "reducing" anything? Who will be affected if you did?

@carterbox
Member Author

Compatibility is actually increased. The binaries will run on all architectures supported by the toolkit. Before, the minimum arch for ppc64le or aarch64 was 50; now it is 35.

Optimization has decreased. I no longer target 61 (these devices can run the 60 binaries), and 35 only gets PTX, so it has to be compiled by the end users' CUDA driver for those devices. Actually, for Windows, all archs have been getting PTX only, because it takes too long to generate the machine code.

@carterbox carterbox merged commit 1765fda into conda-forge:main Oct 6, 2023
@carterbox carterbox deleted the new-archs branch October 6, 2023 16:28
@hmaarrfk
Contributor

hmaarrfk commented Oct 6, 2023

Does Nvidia provide a table for what "61" means in terms of the products that you buy?

GTX????
RTX????
