Bart pics with GPU generates only empty output when compiled with cuda 12.5.1, works with cuda 12.0.0 #334

Open
dabosch opened this issue Jul 23, 2024 · 1 comment

Comments


dabosch commented Jul 23, 2024

Hi,

I'm compiling BART within an Apptainer (Singularity) container. Here's the recipe:

Singularity container recipe
Bootstrap: docker
From: nvidia/cuda:12.5.1-devel-ubuntu22.04
Stage: first

%post
    toolVersion='0.9.00'
    toolName='bart'
    apt-get update --yes
    apt-get install --yes make gcc libfftw3-dev liblapacke-dev libpng-dev libopenblas-dev curl
    cd /opt
    mkdir /opt/${toolName}-${toolVersion}
    curl -fsSL --retry 5 https://github.com/mrirecon/bart/archive/v${toolVersion}.tar.gz | tar -f- -xz -C /opt/${toolName}-${toolVersion}/ --strip-components 1
    cd /opt/${toolName}-${toolVersion}
    NVCCFLAGS="-gencode arch=compute_80,code=sm_80" CUDA_BASE=/usr/local/cuda/ CUDA_LIB=lib64 CUDA=1 make

When running bart pics with GPU, I only get empty data:

Apptainer> /opt/bart-0.9.00/bart pics -l2 -g ksp sens im
GPU reconstruction
Size: 6029312 Samples: 626688 Acc: 9.62
WARN: Estimated scale is zero. Set to one.l2 regularization: 0.000000
Regularization terms: 1, Supporting variables: 0
conjugate gradients
WARN: Warning: data empty
Total Time: 3.950549

However, when I change the base to cuda:12.0.0-devel-ubuntu22.04, I can reconstruct an image:

Apptainer> /opt/bart-0.9.00/bart pics -l2 -g ksp sens im
GPU reconstruction
Size: 6029312 Samples: 626688 Acc: 9.62
WARN: Estimated scale is zero. Set to one.l2 regularization: 0.000000
Regularization terms: 1, Supporting variables: 0
conjugate gradients
Total Time: 6.013617

This seems to be reproducible across builds. My host system has an Nvidia RTX A4000 GPU.
I tested several CUDA versions and can report the following:

Cuda 12.0.0: works
Cuda 12.3.2: does not work
Cuda 12.4.1: does not work
Cuda 12.5.1: does not work

Best,
Dario

Contributor

mblum94 commented Aug 12, 2024

Hi Dario,

I can reproduce the error. When I run the GPU unit tests (make utest_gpu) in the respective containers, they pass in the 12.0 container and fail in the 12.5 container with "CUDA Error: the provided PTX was compiled with an unsupported toolchain". The same error occurs in the pics tool but is not printed, because execution on the GPU is asynchronous.
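Since pics itself stays silent about the failure, one way to surface it is to scan the unit test output for the error string before trusting a reconstruction. The following is only a minimal sketch: the helper name log_has_cuda_error is hypothetical, and the match string is simply the prefix of the message quoted above.

```shell
#!/bin/sh
# Hypothetical helper: reads log text on stdin and succeeds (exit 0)
# if any line contains the "CUDA Error" prefix quoted in this thread.
log_has_cuda_error() {
    grep -q "CUDA Error"
}

# Possible usage inside the container, from the BART source directory
# (assumes make utest_gpu writes the message to stdout or stderr):
#   make utest_gpu 2>&1 | log_has_cuda_error && echo "GPU build looks broken"
```

A check like this only reports that an error was logged; it does not explain the cause.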

My host driver supports CUDA 12.2, as reported by nvidia-smi both on the host and in the container. I think the error is due to a mismatch between the driver and the CUDA toolkit version. However, I'm currently not able to update the host driver for testing.
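The suspected mismatch can be checked against the versions reported in this thread. The sketch below is an illustration of that hypothesis, not a confirmed diagnosis: the function name toolkit_supported_by_driver is hypothetical, the comparison looks at major.minor only, and it assumes a sort that supports -V (GNU coreutils, as in the Ubuntu base image). The driver value 12.2 is the one reported by nvidia-smi above, and the toolkit versions are the four that were tested.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) if the toolkit version ($1) is
# not newer than the highest CUDA version the driver reports ($2).
# Only major.minor is compared; patch levels are ignored.
toolkit_supported_by_driver() {
    toolkit=$(printf '%s' "$1" | cut -d. -f1,2)
    driver=$(printf '%s' "$2" | cut -d. -f1,2)
    # sort -V orders version strings; the last line is the newer one.
    newest=$(printf '%s\n%s\n' "$toolkit" "$driver" | sort -V | tail -n 1)
    [ "$newest" = "$driver" ]
}

# Toolkit versions tested in this thread, against a driver reporting 12.2.
for v in 12.0.0 12.3.2 12.4.1 12.5.1; do
    if toolkit_supported_by_driver "$v" 12.2; then
        echo "$v: toolkit not newer than driver"
    else
        echo "$v: toolkit newer than driver"
    fi
done
```

Under these assumptions, only 12.0.0 falls at or below the driver's version, which lines up with the works / does not work list in the original report.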

Best,
Moritz
