CUDA runtime error - open3D v0.14.1 #4679

Closed
3 tasks done
BBO-repo opened this issue Feb 1, 2022 · 6 comments · Fixed by #6440
Labels
build/install Build or installation issue cuda

Comments


BBO-repo commented Feb 1, 2022

Checklist

Describe the issue

With the following configuration:

  • Ubuntu "18.04.6 LTS (Bionic Beaver)"
  • gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
  • nvcc version: Cuda compilation tools, release 11.6, V11.6.55
  • cmake version 3.22.2

I get a runtime error when running DenseSlam.cpp with "--device CUDA:0", but everything works fine with "--device CPU:0".

The error is the following:
[Open3D INFO] Using device: CUDA:0
terminate called after throwing an instance of 'std::runtime_error'
  what():  [Open3D Error] (void open3d::core::__OPEN3D_CUDA_CHECK(cudaError_t, const char*, int)) /home/ubuntu/Work/Projects/handheld-scanning-prototype/Open3DBox/build/open3d/src/external_open3d/cpp/open3d/core/CUDAUtils.cpp:301: /home/ubuntu/Work/Projects/handheld-scanning-prototype/Open3DBox/build/open3d/src/external_open3d/cpp/open3d/core/MemoryManagerCUDA.cpp:43 CUDA runtime error: operation not supported

I do not understand what the issue is, since I tested my CUDA install by running the deviceQuery CUDA sample application, which outputs the following:
bin/x86_64/linux/release/deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "Quadro M1000M"
CUDA Driver Version / Runtime Version 11.6 / 11.6
....
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 11.6, CUDA Runtime Version = 11.6, NumDevs = 1 Result = PASS

Could you please provide any support to solve my issue?

I am also attaching the CMake file I use to build Open3D:

# Option 1: Use ExternalProject_Add, as shown in this CMake example.
# Option 2: Install Open3D first and use find_package, see
#           http://www.open3d.org/docs/release/cpp_project.html for details.
include(ExternalProject)
ExternalProject_Add(
    external_open3d
    PREFIX open3d
    GIT_REPOSITORY https://github.com/intel-isl/Open3D.git
    GIT_TAG v0.14.1
    GIT_SHALLOW ON
    UPDATE_COMMAND ""
    # Check out https://github.com/intel-isl/Open3D/blob/master/CMakeLists.txt
    # For the full list of available options.
    CMAKE_ARGS
        -DCMAKE_INSTALL_PREFIX=<INSTALL_DIR>
        -DCMAKE_BUILD_TYPE=${CMAKE_BUILD_TYPE}
        -DCMAKE_C_COMPILER=${CMAKE_C_COMPILER}
        -DCMAKE_CXX_COMPILER=${CMAKE_CXX_COMPILER}
        -DGLIBCXX_USE_CXX11_ABI=${GLIBCXX_USE_CXX11_ABI}
        -DSTATIC_WINDOWS_RUNTIME=${STATIC_WINDOWS_RUNTIME}
        -DBUILD_SHARED_LIBS=ON
        -DBUILD_PYTHON_MODULE=OFF
        -DBUILD_EXAMPLES=OFF
        -DBUILD_WEBRTC=OFF
        -DBUILD_CUDA_MODULE=ON
)

Steps to reproduce the bug

On an Ubuntu 18.04.6 Linux machine with a CUDA-capable GPU
Install CUDA 11.6
Build Open3D as an external project with CMake's ExternalProject_Add
Run the DenseSLAM example with the flag "--device CUDA:0"

Error message

[Open3D INFO] Using device: CUDA:0
terminate called after throwing an instance of 'std::runtime_error'
  what():  [Open3D Error] (void open3d::core::__OPEN3D_CUDA_CHECK(cudaError_t, const char*, int)) /home/ubuntu/Work/Projects/handheld-scanning-prototype/Open3DBox/build/open3d/src/external_open3d/cpp/open3d/core/CUDAUtils.cpp:301: /home/ubuntu/Work/Projects/handheld-scanning-prototype/Open3DBox/build/open3d/src/external_open3d/cpp/open3d/core/MemoryManagerCUDA.cpp:43 CUDA runtime error: operation not supported

Expected behavior

DenseSLAM should run without crashing, as it does with the flag "--device CPU:0".

Open3D, Python and System information

- Operating system: Ubuntu 18.04.6
- Open3D version: 0.14.1
- System type: 64 bit machine
- Is this remote workstation?: no
- How did you install Open3D?: build from source
- Compiler version (if built from source): gcc 7.5


BBO-repo added the "bug" label (Not a build issue, this is likely a bug.) Feb 1, 2022
Contributor

theNded commented Feb 1, 2022

Quadro M1000M is an old card and I suspect cudaMallocAsync is not supported on these machines; see JuliaGPU/CUDA.jl#637
(@yxlao we may want to add this check in addition to the CUDART version macro).

One potential fix is to replace all the functions that have the Async suffix with their non-async versions.

theNded added the "build/install" (Build or installation issue) and "cuda" labels and removed the "bug" label Feb 1, 2022
Author

BBO-repo commented Feb 1, 2022

Hi @theNded
Thank you for your fast answer.
You were right, the Async calls were causing the issue.
In the file open3d/src/external_open3d/cpp/open3d/core/MemoryManagerCUDA.cpp, I commented out lines 41 to 44 to disable cudaMallocAsync and lines 58 to 62 to disable cudaFreeAsync.

DenseSLAM now runs up to a point where I hit another error, an out-of-memory error:

[Open3D INFO] Processing 925/2407...
[Open3D INFO] Processing 926/2407...
terminate called after throwing an instance of 'std::runtime_error'
  what():  [Open3D Error] (void open3d::core::__OPEN3D_CUDA_CHECK(cudaError_t, const char*, int)) /home/ubuntu/Work/Projects/handheld-scanning-prototype/Open3DBox/build/open3d/src/external_open3d/cpp/open3d/core/CUDAUtils.cpp:301: /home/ubuntu/Work/Projects/Open3DBox/build/open3d/src/external_open3d/cpp/open3d/core/MemoryManagerCUDA.cpp:45 CUDA runtime error: out of memory

It always fails at this 926th RGB-D image.
Do you have any idea how to solve this?

Contributor

theNded commented Feb 1, 2022

Quadro M1000M only has 4 GB of GPU memory, so there is not much we can do with it. One potential change is to increase the voxel size by a factor of, say, 2, but that will sacrifice tracking and reconstruction quality.
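
To illustrate why doubling the voxel size helps with memory, here is a rough back-of-the-envelope sketch. It assumes a dense cubic volume, which is a simplification: the actual allocation strategy in Open3D may differ, so the 8x figure is an intuition rather than a measurement. `DenseVoxelCount` is a hypothetical helper, not an Open3D function, and the 4 m extent is an arbitrary example value.

```cpp
#include <cstdint>

// Hypothetical helper for illustration only; not part of Open3D.
// Number of voxels needed to densely cover a cube of side `extent_mm`
// at a resolution of `voxel_mm` millimetres per voxel. Integer
// millimetres are used to avoid floating-point rounding surprises.
std::uint64_t DenseVoxelCount(std::uint64_t extent_mm, std::uint64_t voxel_mm) {
    // Round up so that a partially covered voxel still counts.
    const std::uint64_t per_axis = (extent_mm + voxel_mm - 1) / voxel_mm;
    return per_axis * per_axis * per_axis;
}
```

For a 4 m cube, `DenseVoxelCount(4000, 10)` gives 64,000,000 voxels while `DenseVoxelCount(4000, 20)` gives 8,000,000: doubling the voxel size halves the resolution along each of the three axes, so the count drops by a factor of 2^3 = 8.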

theNded closed this as completed Feb 1, 2022
Author

BBO-repo commented Feb 2, 2022

Ok then it is all solved!
Thank you.

Contributor

ao2 commented Aug 22, 2023

Hi,

@theNded I would like to re-open this issue as there are new findings about it.

To recap, the "CUDA runtime error: operation not supported" error arises because the cudaMallocAsync() function does not work on some GPUs, namely the Quadro M1000M and the Quadro M3000M (the one I have).

Digging into the CUDA documentation, we find that the Stream Ordered Memory Allocator is not available on all NVIDIA GPUs and that support should be verified at runtime; see https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__MEMORY__POOLS.html

In practical terms, this means that the compile-time check on the CUDA runtime version (CUDART_VERSION) performed in

    #if CUDART_VERSION >= 11020
        OPEN3D_CUDA_CHECK(cudaMallocAsync(static_cast<void**>(&ptr), byte_size,
                                          cuda::GetStream()));
    #else
        OPEN3D_CUDA_CHECK(cudaMalloc(static_cast<void**>(&ptr), byte_size));
    #endif

should be replaced with something like:

    if (device.cudaSupportsMemoryPools()) {
        OPEN3D_CUDA_CHECK(cudaMallocAsync(static_cast<void**>(&ptr), byte_size,
                                          cuda::GetStream()));
    } else {
        OPEN3D_CUDA_CHECK(cudaMalloc(static_cast<void**>(&ptr), byte_size));
    }

and the implementation of device.cudaSupportsMemoryPools() could be something like this:

    int driverVersion = 0;
    int deviceSupportsMemoryPools = 0;
    OPEN3D_CUDA_CHECK(cudaDriverGetVersion(&driverVersion));
    if (driverVersion >= 11020) { // avoid invalid value error in cudaDeviceGetAttribute
        OPEN3D_CUDA_CHECK(cudaDeviceGetAttribute(&deviceSupportsMemoryPools, cudaDevAttrMemoryPoolsSupported, device));
    }

    return !!deviceSupportsMemoryPools;
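
Because an allocator is called very often, a check like the one above would probably be worth caching per device rather than repeating on every allocation. The following is a minimal sketch of that idea, not the code from the eventual fix. All names are hypothetical, and the real CUDA queries (cudaDriverGetVersion and cudaDeviceGetAttribute) are abstracted behind an injected callback so the caching logic can be compiled and exercised without a GPU. It is not thread-safe as written.

```cpp
#include <functional>
#include <unordered_map>
#include <utility>

// Hypothetical class for illustration; none of this is Open3D API.
// `Query` stands in for the real CUDA capability check shown above.
class MemoryPoolSupportCache {
public:
    using Query = std::function<bool(int device_id)>;

    explicit MemoryPoolSupportCache(Query query) : query_(std::move(query)) {}

    // Returns whether `device_id` supports memory pools, running the
    // underlying query at most once per device id.
    bool Supports(int device_id) {
        auto it = cache_.find(device_id);
        if (it == cache_.end()) {
            it = cache_.emplace(device_id, query_(device_id)).first;
        }
        return it->second;
    }

private:
    Query query_;
    std::unordered_map<int, bool> cache_;  // device id -> cached answer
};
```

An allocator could then branch on `cache.Supports(device_id)` to pick the async or the synchronous allocation path, paying the cost of the capability query only on the first allocation for each device.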

I'll try to propose a patch for this, but if someone more familiar with the Open3D codebase wants to beat me to it, please go ahead.

Thank you, Antonio

ao2 added a commit to ao2/Open3D that referenced this issue Oct 21, 2023
… time (isl-org#4679)

Some CUDA GPUs, like the Quadro M3000M don't support Memory Pools
operations like cudaMallocAsync/cudaFreeAsync even on driver versions
newer than 11020, and this can result in errors like:

  CUDA runtime error: operation not supported

So check for support at runtime instead of compile time.
Contributor

ao2 commented Oct 21, 2023

Pushed a tentative fix to #6440

ao2 added a commit to ao2/Open3D that referenced this issue Oct 28, 2023
Some CUDA GPUs, like the Quadro M3000M don't support Memory Pools
operations like cudaMallocAsync/cudaFreeAsync even on driver versions
newer than 11.2, and this can result in errors like:

  CUDA runtime error: operation not supported

So check for support at runtime instead of compile time.

Still keep the compile time check to support building with CUDA versions
older than 11.2.
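
For reference, the integer 11020 used in the version checks in this thread and the "11.2" in the commit message denote the same release: CUDA version integers are encoded as 1000 * major + 10 * minor. A small sketch of the conversion follows; the struct and function names are hypothetical and exist only for illustration.

```cpp
// Hypothetical helpers for illustration; not part of CUDA or Open3D.
struct CudaVersion {
    int major_version;
    int minor_version;
};

// CUDA version integers are encoded as 1000 * major + 10 * minor.
constexpr int EncodeCudaVersion(int major_version, int minor_version) {
    return 1000 * major_version + 10 * minor_version;
}

// Inverse of EncodeCudaVersion.
constexpr CudaVersion DecodeCudaVersion(int encoded) {
    return {encoded / 1000, (encoded % 1000) / 10};
}

static_assert(EncodeCudaVersion(11, 2) == 11020, "11.2 encodes to 11020");
```

By the same rule, the 11.6 toolkit reported in the original post corresponds to 11060, which is above the 11020 threshold. This is why a version check alone could not catch the unsupported GPU.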
ssheorey pushed a commit that referenced this issue Oct 31, 2023