Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

REF: Reorganize windows artifacts #14

Merged

Conversation

carterbox
Copy link
Member

@carterbox carterbox commented Dec 4, 2023

Windows DLLs are not needed at compile/link time; only headers and LIB files are needed. Thus to silence downstream overlinking warnings/errors, Windows DLLs should be moved from the noarch target_platform output (which is a second-order dependency downstream) to the native output which is listed in the run_exports directly.

More Explanation in #13 (comment)

Checklist

  • Used a personal fork of the feedstock to propose changes
  • Bumped the build number (if the version is unchanged)
  • Reset the build number to 0 (if the version changed)
  • Re-rendered with the latest conda-smithy (Use the phrase @conda-forge-admin, please rerender in a comment in this PR for automated rerendering)
  • Ensured the license file is being packaged.

Closes #13
Closes #16 (as superseded)

Redistribute Windows artifacts according to whether they
are needed at compile-link-time OR runtime. For Windows,
DLLs are only needed at Windows runtime. LIBs and headers
are only needed at build time. For Unix, SOs are needed both
at build and runtime.

Include target_name directory for Windows to support
cross-compile (in future?).
@carterbox carterbox requested a review from adibbley as a code owner December 4, 2023 00:58
@conda-forge-webservices
Copy link
Contributor

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.

@carterbox
Copy link
Member Author

@conda-forge-admin, please rerender

@jakirkham
Copy link
Member

Thanks for exploring this Daniel! 🙏

Think one thing we need to be aware of is cuda-nvcc_{{ target_platform }} is making use of the current structure (as noted in comment: #13 (comment) ). In particular it expects cuda-cudart_{{ target_platform }} (pulled in via cuda-cudart-dev_{{ target_platform }}) to have the DLLs in Library/bin. So if we were to restructure cuda-cudart, we would also need to restructure cuda-nvcc. Think (though could be wrong) that there is a greater risk of breakage when changing both cuda-cudart and cuda-nvcc. Also this likely would result in a divergence between Linux and Windows (potentially leading to additional maintenance burden)

Recognize though that the introduction of cross-compiler support for CUDA Linux and the attempt to mimic this with CUDA Windows (where cross-compiler support is absent) has resulted in some ambiguity (like what cuda-cudart_{{ target_platform }} means). Though would prefer to solve these issues with the smallest set of changes possible. At least so far the best we have come up with is one more item in run_exports ( #13 (comment) )

@carterbox
Copy link
Member Author

Also this likely would result in a divergence between Linux and Windows (potentially leading to additional maintenance burden)

This PR does not increase the divergence between Linux and Windows packages; it makes them more similar. This PR does not remove any package; it does not change any dependency relationships. In fact, it adds the static library packages that were missing from the Windows variant of this recipe and only redistributes the Windows artifacts so they are (correctly) sorted according to what is needed at compile/link-time vs run-time.

Think one thing we need to be aware of is cuda-nvcc_{{ target_platform }} is making use of the current structure (as noted in comment: #13 (comment) ). In particular it expects cuda-cudart_{{ target_platform }} (pulled in via cuda-cudart-dev_{{ target_platform }}) to have the DLLs in Library/bin.

As I tried to explain in this comment, the different purposes/intention of cuda-cudart_{{ target_platform }} vs cuda-cudart are to provide artifacts for compile-time vs to provide artifacts for runtime. You install cuda-cudart_ppc64le-linux when you want compile for powerpc linux, but you run_export/install cuda-cudart when you want to run on powerpc linux (natively). Thus, it is incorrect for cuda-cudart-{{ target_platform }} to contain Windows DLLs because they are never needed at compile-time, LIB files are needed at compile-time. A compiler package (like nvcc) should not need any DLLs from cudart because DLLs are unused when compiling a CUDA program (and the nvcc compiler binary itself does not link to the cudart runtime).

Recognize though that the introduction of cross-compiler support for CUDA Linux and the attempt to mimic this with CUDA Windows (where cross-compiler support is absent) has resulted in some ambiguity (like what cuda-cudart_{{ target_platform }} means).

It seems unambiguous to me that the purpose of a {{ target_platform }} package is to provide artifacts that are needed at compile-time to target the target_platform, because in my experience, {{ target_platform }} packages seem to be only used when cross-compiling is needed. I am very curious if you could explain an alternative interpretation of the purpose of {{ target_platform }}.

So if we were to restructure cuda-cudart, we would also need to restructure cuda-nvcc.

Restructure No. Patch Yes. As explained above, DLLs are not needed at compile time. Only LIBs are needed at compile-time, and LIBs are included in cuda-cudart-dev_{{ target_platform }}. However, this PR does move the libs down a folder from 'Library\lib' to 'Library\lib\target_name', so a patch to the nvcc feedstock is probably needed.

Please let me know if I can answer any more questions.

recipe/meta.yaml Outdated
Comment on lines 158 to 160
- Library\lib\{{ target_name }}\cuda.lib # [win]
- Library\lib\{{ target_name }}\cudadevrt.lib # [win]
- Library\lib\{{ target_name }}\cudart.lib # [win]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moving these libs down a directory probably requires a patch to the NVCC feedstock. However, this PR could probably be refactored to not use Library\lib\{{ target_name }}.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would this look like if we kept .libs in Library\lib?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With no target_name directory, you cannot install multiple cuda-cudart-dev_{{ target_platform }} packages at the same time because the artifacts are not separated and will clobber. However, this is probably not a common use case and is not possible yet for Windows because there is no alternative to x86 for Windows.

Copy link
Member Author

@carterbox carterbox Dec 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without implementing {{ target_name }} for Windows, this PR is just moving the DLL from cuda-cudart_{{ target_platform }} to cuda-cudart. Which is what isuru originally suggested.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can actually shrink the diff more by removing the existence tests I added.

@carterbox carterbox requested a review from jakirkham December 9, 2023 05:08
@carterbox
Copy link
Member Author

@conda-forge-admin, please rerender

Copy link
Contributor

Hi! This is the friendly automated conda-forge-webservice.

I tried to rerender for you, but it looks like there was nothing to do.

This message was generated by GitHub actions workflow run https://github.com/conda-forge/cuda-cudart-feedstock/actions/runs/7188148583.

@carterbox carterbox closed this Dec 12, 2023
@carterbox carterbox reopened this Dec 12, 2023
@carterbox
Copy link
Member Author

I deny responsibility for these recently failing builds on Linux because I haven't changed any of the Linux packages in this recipe, and it looks like the Windows build fails even before the build script is executed.

@jakirkham
Copy link
Member

The Linux build fails to apparent package corruption. Wonder if some build tool was updated that caused issues

SafetyError: The package for cuda-cudart_linux-64 located at /home/conda/feedstock_root/build_artifacts/pkg_cache/cuda-cudart_linux-64-12.0.107-h59595ed_7
appears to be corrupted. The path 'targets/x86_64-linux/lib/libcudart.so.12.0.107'
has an incorrect size.
  reported size: 691481 bytes
  actual size: 704097 bytes

@jakirkham
Copy link
Member

jakirkham commented Dec 13, 2023

In any event, think there are two potential proposals to point out here to other readers. Namely whether Windows uses a target structure

  1. With
  2. Without

@msarahan
Copy link
Member

I deny responsibility for these recently failing builds

I'm saving this. Genius! I may or may not ping and attribute you every time I use it. At least until you tell me to stop.

In all seriousness, I think the target structure in lib is not appropriate. The lib folder is assumed to be native by conda conventions. If you want to provide non-native libraries to link against, I think the place to do it is in another folder, like the $PREFIX/targets folder discussed in other cuda feedstocks.

@carterbox
Copy link
Member Author

I'm saving this. Genius! I may or may not ping and attribute you every time I use it. At least until you tell me to stop.

Fine

In all seriousness, I think the target structure in lib is not appropriate. The lib folder is assumed to be native by conda conventions. If you want to provide non-native libraries to link against, I think the place to do it is in another folder, like the $PREFIX/targets folder discussed in other cuda feedstocks.

I was just trying to follow the upstream artifact layout. However, following conda convention makes more sense.

@carterbox
Copy link
Member Author

OK. As @jakirkham suggested, the Windows fail was my fault for checking the errorcode before it was set. That is now fixed.

I still deny responsibility for the Linux fail, however. I have built the linux package locally without Docker and it doesn't fail via file corruption.

@adibbley
Copy link
Contributor

The linux failures are from our rpath tests, I made this update in another PR to avoid conflicting rpath patching. Seems to be some varying order of operations that happens around this.

Otherwise this LGTM. Windows users will still end up with cuda-cudart in their build environment following the chain of dependencies from cuda-nvcc_{{ cross_target_platform }} -> cuda-nvcc-impl -> cuda-cudart-dev

@conda-forge-webservices
Copy link
Contributor

Hi! This is the friendly automated conda-forge-linting service.

I wanted to let you know that I linted all conda-recipes in your PR (recipe) and found some lint.

Here's what I've got...

For recipe:

  • Failed to even lint the recipe, probably because of a conda-smithy bug 😢. This likely indicates a problem in your meta.yaml, though. To get a traceback to help figure out what's going on, install conda-smithy and run conda smithy recipe-lint . from the recipe directory.

@conda-forge-webservices
Copy link
Contributor

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.

@carterbox
Copy link
Member Author

Thanks, @jakirkham @adibbley!

@jakirkham
Copy link
Member

Thanks Daniel for the PR! And Michael and Alex for the reviews 🙏

Think we've landed on a good approach here after a bit of iteration. Thanks for sticking with it Daniel

Sounds like we are on-board with this change and it should behave well with minimal disruption

Am going to configure to merge once CI passes (added one last stylistic tweak)

We can follow up on anything else separately

@jakirkham jakirkham added the automerge Merge the PR when CI passes label Dec 14, 2023
@github-actions github-actions bot merged commit 1e16072 into conda-forge:main Dec 14, 2023
Copy link
Contributor

Hi! This is the friendly conda-forge automerge bot!

I considered the following status checks when analyzing this PR:

  • linter: passed
  • travis: passed
  • azure: passed

Thus the PR was passing, but not in a mergeable state (mergeable=None, mergeable_state=unknown).

Copy link
Contributor

Hi! This is the friendly conda-forge automerge bot!

I considered the following status checks when analyzing this PR:

  • linter: passed
  • travis: passed
  • azure: passed

Thus the PR was passing and merged! Have a great day!

jakirkham added a commit that referenced this pull request Dec 14, 2023
Rebuild for PR: #14
Comment on lines 26 to +27
build:
number: 6
number: 7
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even though this bumped the build/number to 7, there was already a bump to 7 from PR ( #15 ), which was merged previously. As a result there was no build/number bump for this PR and packages were not published

Have manually bumped the build/number to 8 with commit ( 00246a6 ). So we should soon see new packages with the fix

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!

@carterbox carterbox deleted the reorganize-windows-artifacts branch December 14, 2023 16:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
automerge Merge the PR when CI passes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

conda-build may raise false alarm on Windows?
4 participants