Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

V6.2.0 #121

Merged
merged 16 commits into from
Aug 25, 2024
Merged

V6.2.0 #121

merged 16 commits into from
Aug 25, 2024

Conversation

KrisThielemans
Copy link
Contributor

No description provided.

@conda-forge-webservices
Copy link

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe/meta.yaml) and found it was in an excellent condition.

@KrisThielemans
Copy link
Contributor Author

KrisThielemans commented Aug 21, 2024

  • Some Linux cuda jobs with python 3.12 fail in initial Python import test with
    ImportError: libcudart.so.12: cannot open shared object file: No such file or directory
    
    Checking a few log files, it looks like these jobs installed libparallelproj-cpu, as opposed to *cuda. This leads to 2 questions:
    • Why do some cuda jobs not select the cuda version of libparallelproj. @gschramm any idea?
    • What is the STIR recipe missing to get libcudart if we wouldn't import libparallelproj? the run section does have
       - __cuda  # [cuda_compiler_version != "None"]
      
  • Linux cuda jobs with python 3.11 or older (and hence install a root_base version that makes STIR switch to C++ 20) fail in CMake with
    CMake Error in src/recon_buildblock/CMakeLists.txt:
    Target "recon_buildblock" requires the language dialect "CUDA20" .  But the
    current compiler "NVIDIA" does not support this, or CMake does not know the
    flags to enable it.
    
    which is the same error as in https://stackoverflow.com/a/75038504 where the answer says you need CMake 3.25.2 for C++ 20 support and CUDA, but the build is already using CMake 3.30.2.
  • Linux cuda jobs that do not have the above problems fail with
     Running OSMAPOSL CUDA/OSMAPOSL_test_sim_Parallelproj_CUDARDP.par 
     ./run_test_simulate_and_recon.sh: line 118:  4484 Segmentation fault 
    
    This is because there's no CUDA run-time on the azure infrastructure. We should therefore test for nvidia-smi here as well.
  • osx jobs fail with
    ld: library not found for -latomic
    
    these use clang. This will be due to the work-around I introduced in STIR for CLang OpenMP linking errors, see https://gitlab.kitware.com/cmake/cmake/-/issues/26037
  • Windows CUDA jobs fail with
    import stir
    File "D:\bld\stir_1724185476851\_test_env\lib\site-packages\stir.py", line 12, in <module>
      from _stir import *
    ImportError: DLL load failed while importing _stir: The specified module could not be found.
    
    which is likely the same problem as libcudart.so not found. The DLL checks above report
    WARNING (stir,Library/bin/extract_segments.exe): $RPATH/cudart64_12.dll not found in packages, sysroot(s) nor the missing_dso_whitelist.
    .. is this binary repackaging?
    
    However, I see this problem in at least one job that does install libparallel-proj-cuda, e.g. here

@gschramm
Copy link
Contributor

Hi Kris,

  1. No idea why the builds with cuda comp v12 want to install the cpu version of libarallelproj :/
  2. Regarding libcudart, the recipe for libparallelproj is below. There are extra cuda-version requirements for host and run, but I don't remember why we added those (with the help of the conda-forge people). Maybe those will solve the problems.
    requirements:
      build:
        - {{ stdlib("c") }}
        - {{ compiler('c') }}
        - {{ compiler('cxx') }}  # [cuda_compiler_version != "None"]
        - {{ compiler('cuda') }}  # [cuda_compiler_version != "None"]
        - cmake >=3.23
        - ninja
        - python
      host:
        - libgomp      # [linux]
        - llvm-openmp  # [osx]
        - cuda-version {{ cuda_compiler_version }}  # [(cuda_compiler_version or "None") != "None"]
      run:
        - __cuda  # [cuda_compiler_version != "None"]
        - cuda-version {{ cuda_compiler_version }}  # [(cuda_compiler_version or "None") != "None"]

@KrisThielemans
Copy link
Contributor Author

There are extra cuda-version requirements for host and run,

I did check that cuda-version is installed, so I think that won't help. but doing this check more carefully, I see in this job

There's differences therefore between host and build in both version and installed libraries. As everything builds fine, I suspect that this is ok (and maybe we don't need cuda at all on the host). My feeling is that the problem is really just the last bullet. I suppose I could explicitly add cuda-cudart in the run section.... (I have no idea what __cuda adds).

@gschramm
Copy link
Contributor

https://conda-forge.org/docs/maintainer/knowledge_base/#cuda-builds

Conda exposes the maximum CUDA version supported by the installed Nvidia drivers through a virtual package named __cuda. By default, conda will install the highest version available for the packages involved. To override this behaviour, you can define a CONDA_OVERRIDE_CUDA environment variable. More details in the Conda docs.

@KrisThielemans
Copy link
Contributor Author

@gchramm, why do you have

      ignore_run_exports:
        - cudatoolkit  # [(cuda_compiler_version or "None").startswith("11")]

No ignore for 12?

@KrisThielemans
Copy link
Contributor Author

More info on conda-forge/conda-forge.github.io#1963

@KrisThielemans
Copy link
Contributor Author

and https://github.com/conda-forge/cuda-feedstock/blob/main/recipe/doc/recipe_guide.md. Seems that the cuda-version might be a good idea after all. There's far too much there for me to spend time on though. Sigh.

@gschramm
Copy link
Contributor

Indeed challenging to keep up to date. That's why I usually ask for help on the conda-forge gitter (which usually works very well)

@KrisThielemans
Copy link
Contributor Author

@conda-forge/core I hope you can help. The main change here is that STIR now has CUDA code itself, so we need to take that into account. Summarising above problems related to CUDA:

  • For CUDA12 jobs, libcudart.so.12 (or DLL) is missing for the test, so the "import stir` test fails. Do we need an explicit dependency?
  • The hope was that if CUDA is detected, libparallelproj would choose its CUDA version as well, but that doesn't seem to be guaranteed.

Obviously, your help with the CMake issue would also be appreciated, but I'm not expecting any.

@hmaarrfk
Copy link
Contributor

See https://github.com/conda-forge/pytorch-cpu-feedstock/blob/main/recipe/meta.yaml for an example of how to build cuda recipes.

@KrisThielemans
Copy link
Contributor Author

  • Linux CUDA 12 jobs now work (older CUDA versoins fail due to unrelated CMake C++-20 problem).
  • Windows CUDA 12 fail with cuda-driver-dev does not exist. Older CUDA version still fail with DLL not found. That's not a surprise as the previous fix was only for CUDA 12, but I have no idea what to do for CUDA 11. Anyone?
  • OSX jobs still fail due to STIR CMake clang problem mentioned above

@hmaarrfk
Copy link
Contributor

but I have no idea what to do for CUDA 11. Anyone?

its probably fine to drop in 2024 at the feedstock level. don't let it hold you back.

If somebody wants it back they can make a PR and bring back support.

@hmaarrfk
Copy link
Contributor

I think this can be used as an example for windows maybe conda-forge/pytorch-cpu-feedstock#231

@conda-forge-webservices
Copy link

Hi! This is the friendly automated conda-forge-linting service.

I wanted to let you know that I linted all conda-recipes in your PR (recipe/meta.yaml) and found some lint.

Here's what I've got...

For recipe/meta.yaml:

  • Selectors are suggested to take a <two spaces>#<one space>[<expression>] form. See lines [26, 28]

@KrisThielemans
Copy link
Contributor Author

@conda-forge-admin please rerender

@conda-forge-webservices
Copy link

Hi! This is the friendly automated conda-forge-linting service.

I wanted to let you know that I linted all conda-recipes in your PR (recipe/meta.yaml) and found some lint.

Here's what I've got...

For recipe/meta.yaml:

  • Selectors are suggested to take a <two spaces>#<one space>[<expression>] form. See lines [26, 28]
  • The feedstock has no .ci_support files and thus will not build any packages.

@conda-forge-webservices
Copy link

Hi! This is the friendly automated conda-forge-linting service.

I wanted to let you know that I linted all conda-recipes in your PR (recipe/meta.yaml) and found some lint.

Here's what I've got...

For recipe/meta.yaml:

  • The feedstock has no .ci_support files and thus will not build any packages.

@KrisThielemans
Copy link
Contributor Author

@conda-forge-admin please rerender

Copy link

Hi! This is the friendly automated conda-forge-webservice.

I tried to rerender for you, but it looks like there was nothing to do.

This message was generated by GitHub actions workflow run https://github.com/conda-forge/stir-feedstock/actions/runs/10522325040.

@KrisThielemans
Copy link
Contributor Author

@conda-forge-admin please rerender

Copy link

Hi! This is the friendly automated conda-forge-webservice.

I tried to rerender for you, but it looks like there was nothing to do.

This message was generated by GitHub actions workflow run https://github.com/conda-forge/stir-feedstock/actions/runs/10522647113.

@KrisThielemans
Copy link
Contributor Author

Seems that I know skip everything and can't get rid of it. @casperdcl any ideas?

@KrisThielemans
Copy link
Contributor Author

@conda-forge-admin, please lint

@conda-forge-webservices
Copy link

Hi! This is the friendly automated conda-forge-linting service.

I wanted to let you know that I linted all conda-recipes in your PR (recipe/meta.yaml) and found some lint.

Here's what I've got...

For recipe/meta.yaml:

  • The feedstock has no .ci_support files and thus will not build any packages.

@gschramm
Copy link
Contributor

@KrisThielemans : When I look at the "Files Changed" tab of this PR, it seems that indeed all the files .ci_support files have been deleted (not sure if that was done by any of the bots)

@KrisThielemans
Copy link
Contributor Author

yup, the rerender. I'm a bit confused by the sequence here. I had some white-space issues in the recipe etc, but I suppose I just have the wrong skip clause (trying to do only cuda 12 on Windows, and skipping osx for now)

recipe/meta.yaml Outdated Show resolved Hide resolved
@h-vetinari
Copy link
Member

@conda-forge-admin, please rerender

@conda-forge-webservices
Copy link

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe/meta.yaml) and found it was in an excellent condition.

recipe/meta.yaml Outdated Show resolved Hide resolved
recipe/meta.yaml Outdated Show resolved Hide resolved
recipe/meta.yaml Outdated Show resolved Hide resolved
Co-authored-by: h-vetinari <h.vetinari@gmx.com>
@KrisThielemans
Copy link
Contributor Author

@conda-forge-admin, please rerender

@KrisThielemans
Copy link
Contributor Author

Thanks @h-vetinari for fixing my (stupid) mistake with the "bare" skip and the suggestion. As always, great support from the core!

Hopefully we're there now.

recipe/meta.yaml Outdated Show resolved Hide resolved
recipe/meta.yaml Outdated Show resolved Hide resolved
cuda_major<12 includes non-cuda builds.
@KrisThielemans
Copy link
Contributor Author

@conda-forge-admin, please rerender

@KrisThielemans KrisThielemans merged commit fdf820c into conda-forge:main Aug 25, 2024
42 checks passed
@KrisThielemans KrisThielemans deleted the v6.2.0 branch August 25, 2024 09:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants