TVdenoising invalid argument on multiple GPUs #657

Open
SamiLaubo opened this issue Apr 30, 2025 · 2 comments

@SamiLaubo

Hello, I'm running into memory problems while reconstructing some data with ossart_tv. Plain ossart works fine, and I've narrowed the problem down to the tvdenoise function (from _tv_proximal) called in utilities/im_3d_denoise.py. My goal is to reconstruct the data into a 2000^3 volume. Other algorithms such as fdk, sirt, os_pcsd, and os_asd_pocs work fine.

I use TIGRE version 3.0.0 in Python with the following versions:

  • Python/3.10.4
  • GCC/11.3.0
  • CUDA/12.1.1
  • cuDNN/8.9.2.26

Hardware (HPC):

  • CPU: Intel Xeon Platinum 8470
  • GPU: NVIDIA H100 80GB HBM3

I've tried the code with 1, 2, and 3 GPUs.

A minimal example which replicates im_3d_denoise.py:

import numpy as np
from _tv_proximal import tvdenoise
from tigre.utilities.gpu import GpuIds

# Increase the volume size until tvdenoise fails.
for i in range(1000, 2100, 100):
    print(f'{i = }')

    img = np.random.rand(i, i, i).astype(np.float32)  # random i^3 float32 volume
    gpuids = GpuIds()  # all available GPUs
    lmbda = 50
    iterations = 50

    print(f'Image memory consumption: {img.nbytes / 1024 / 1024 / 1024:.4f} GB.')

    img = tvdenoise(img, iterations, lmbda, gpuids)

This code produces the following errors for different combinations of CPU memory and number of GPUs:

CPU memory 512GB, 1 GPU: Error at i=2000 - Image memory = 30GB
Common/CUDA/TIGRE_common.cpp (18): tvDenoise:tvdenoising:Memory TV dneoising requires 5 times the image memory. Your GPU(s) do not have the required memory.
This memory will be attempted to allocate on the CPU, Whic may fail or slow the computation by a very significant amount.

CPU memory 512GB, 2 GPUs: Error at i=1600 - Image memory = 16GB
Common/CUDA/TIGRE_common.cpp (7): TV minimization
Common/CUDA/TIGRE_common.cpp (14): CBCT:CUDA:TVdenoising invalid argument

CPU memory 512GB, 3 GPUs: Error at i=1600 - Image memory = 16GB
Common/CUDA/TIGRE_common.cpp (7): TV minimization
Common/CUDA/TIGRE_common.cpp (14): CBCT:CUDA:TVdenoising invalid argument

CPU memory 128GB, 3 GPUs: Error at i=1600 - Image memory = 16GB
Common/CUDA/TIGRE_common.cpp (7): TV minimization
Common/CUDA/TIGRE_common.cpp (14): CBCT:CUDA:TVdenoising invalid argument

From my understanding, I would need around 150GB of GPU memory (5 × ~30GB for the 2000^3 volume), i.e. 2 or even 3 GPUs (80GB each) should be enough.
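
A quick back-of-the-envelope check of that number, using the "5 times the image memory" figure from the first error message above:

import numpy as np

# Rough estimate of the memory TV denoising needs for an n^3 float32 volume.
n = 2000
image_gb = n**3 * np.dtype(np.float32).itemsize / 1024**3
print(f'Image: {image_gb:.1f} GB, TV denoising: ~{5 * image_gb:.1f} GB')  # ~29.8 GB and ~149 GB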

  • Do you know why this happens? Maybe it makes sense, but my proficiency in CUDA code is not the best.
  • Do you have any suggestions as to what I can try to fix this?
@AnderBiguri
Member

Heya!

So, TIGRE can deal with images bigger than the GPU RAM, as long as you have enough CPU RAM (which you do!).

However, there is some strange bug at certain combinations of image size, GPU model, number of GPUs, and RAM size that causes this issue. It's a bug in the code, I think, but none of my machines seem to be able to reproduce it...

A suggestion could be to break up the image yourself and run the TV denoising in pieces. Maybe that helps? If I have time I would like to fix this bug, but realistically I won't anytime soon :(
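
Something along these lines, perhaps. This is an untested sketch of the chunking idea, not an existing TIGRE helper; tvdenoise_chunked, n_chunks, and overlap are made up here, and it assumes the seams between chunks are acceptable:

import numpy as np
from _tv_proximal import tvdenoise

def tvdenoise_chunked(img, iterations, lmbda, gpuids, n_chunks=4, overlap=16):
    # Split the volume along axis 0, denoise each padded chunk independently,
    # and copy the non-overlapping part of each result back into place.
    # The extra `overlap` slices on either side are only there to reduce
    # seam artefacts and are discarded afterwards.
    out = np.empty_like(img)
    bounds = np.linspace(0, img.shape[0], n_chunks + 1, dtype=int)
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        lo_pad = max(lo - overlap, 0)
        hi_pad = min(hi + overlap, img.shape[0])
        chunk = np.ascontiguousarray(img[lo_pad:hi_pad])
        denoised = tvdenoise(chunk, iterations, lmbda, gpuids)
        out[lo:hi] = denoised[lo - lo_pad : lo - lo_pad + (hi - lo)]
    return out

For the 2000^3 case, n_chunks=4 would bring each call down to roughly 7.5 GB of image (plus overlap) instead of ~30 GB, which should fit comfortably on a single GPU even with the 5x working memory.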

@SamiLaubo
Author

Thanks for the quick answer!

Interesting that you could not replicate this on other hardware/software; maybe I can experiment with some different versions. Breaking up the image is also a good suggestion.

I'll post any updates here if I find any :-)
