SIBR_gaussianViewer_app build and runtime issues - "PyTorch is not linked with support for CUDA devices" #77

Open
wright7 opened this issue Aug 14, 2024 · 2 comments

wright7 commented Aug 14, 2024

Hello,

I'm trying to build SIBR_gaussianViewer_app from your sources, but I'm running into runtime issues. Here is what I did to build it, focusing on the debug configuration:

  1. I've cloned the repository; I'm using the main branch
  2. I've configured, generated and opened the project
    2.1. On Windows 11
    2.2. CMake GUI 3.29.2, with an added OpenCV_RUNTIME=vc17 entry
    2.3. With Visual Studio 2022 LTSC 17.8 (17.8.12)
    2.4. Python 3.10
    2.5. CUDA 11.8
  3. I've copied the src/core/viewer folder from the original SIBR repository
  4. I've downloaded the debug version of libtorch from the link in your README and put it into the extlibs/ directory
  5. I've configured both the SIBR_gaussianViewer_app and sibr_gaussian projects in Visual Studio according to your README
  6. I've built SIBR_gaussianViewer_app (debug configuration) and copied all missing DLLs to the install directory

With this build I get an unhandled exception:
Unhandled exception at 0x00007FFE2FE1FABC in SIBR_gaussianViewer_app_d.exe: Microsoft C++ exception: c10::Error at memory location 0x0000009458EFA7D0.
at the line
TORCH_CHECK(p, "PyTorch is not linked with support for ", type, " devices");
in SIBR_viewers\extlibs\libtorch\debug\include\c10\core\impl\DeviceGuardImplInterface.h:318
It is caused by p being null, i.e. device_guard_impl_registry[1] (CUDA is device type 1) is null.
The stack trace shows that the DLL being used is torch_cpu.dll.
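To make the failure mode concrete, here is a simplified stand-in for such a registry. It is not c10's actual code: the types GuardImpl and DeviceType and the functions register_backend and get_guard_impl are hypothetical, and only the shape of the check mirrors the TORCH_CHECK line above. The point it illustrates is that a slot stays null until some backend code registers itself, so if the CUDA part of libtorch is never loaded (an assumption about the cause, not something confirmed here), the lookup for CUDA fails with exactly this kind of message.

```cpp
#include <array>
#include <stdexcept>
#include <string>

// Simplified model of a per-device-type registry (NOT the real c10 implementation).
enum class DeviceType : int { CPU = 0, CUDA = 1 };

struct GuardImpl {
    const char* backend;  // label only, for illustration
};

// One slot per device type; every slot starts out as nullptr.
static std::array<const GuardImpl*, 2> registry{};

// A backend fills its own slot, typically when its library is loaded.
void register_backend(DeviceType type, const GuardImpl* impl) {
    registry[static_cast<int>(type)] = impl;
}

// Lookup that fails the same way as the check quoted above when the slot is empty.
const GuardImpl* get_guard_impl(DeviceType type) {
    const GuardImpl* p = registry[static_cast<int>(type)];
    if (p == nullptr) {
        const std::string name = (type == DeviceType::CUDA) ? "CUDA" : "CPU";
        throw std::runtime_error("PyTorch is not linked with support for " + name + " devices");
    }
    return p;
}
```

In this model, registering only a CPU implementation and then asking for CUDA throws, while registering a CUDA implementation first makes the same lookup succeed.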

I tried other builds of torch 1.10, but then loading the model from file failed, which looks like a library version compatibility issue:
opacity_mlp_module = torch::jit::load(opacity_mlp_path, _libtorch_device);

Could you help me with this issue?

@tongji-rkr
Collaborator

Can you successfully compile with the release build?

Author

wright7 commented Sep 2, 2024

Sorry for the delay in responding, I have been on holiday.

Yes, I can successfully compile the release build (using the release versions of libtorch, the DLLs, etc.), and the result looks similar. It is probably the same issue as in debug, with torch not being able to use CUDA:

[SIBR] --  INFOS  --:   Initialization of GLFW
[SIBR] --  INFOS  --:   OpenGL Version: 4.6.0 NVIDIA 560.94[major: 4, minor: 6]
[SIBR] --  INFOS  --:   Dataset type:
Number of input Images to read: 279
Number of Cameras set up: 279
LOADSFM: Try to open D:\repos\Scaffold-GS-Official\SIBR_viewers\data\kitchen/sparse/0/points3D.bin
Num 3D pts 241367
[SIBR] --  INFOS  --:   SfM Mesh 'D:\repos\Scaffold-GS-Official\SIBR_viewers\data\kitchen/sparse/0/points3d.bin successfully loaded.  (241367) vertices detected. Init GL ...
[SIBR] --  INFOS  --:   Init GL mesh complete
[SIBR] --  INFOS  --:   Loading models from: D:\repos\Scaffold-GS-Official\SIBR_viewers\ckpt\kitchen/point_cloud//
[SIBR] --  INFOS  --:   opacity_mlp : 1
[SIBR] --  INFOS  --:   cov_mlp : 1
[SIBR] --  INFOS  --:   color_mlp : 1
[SIBR] --  INFOS  --:   embedding_appearance : 0

D:/repos/Scaffold-GS-Official/SIBR_viewers/install/bin\SIBR_gaussianViewer_app.exe (process 46968) exited with code -1073740791.

It seems that the CUDA device count test in sibr::GaussianView::GaussianView passes (so the CUDA Toolkit recognizes the GPU as a CUDA device, cudaGetDeviceCount returns 1, and the device is then set), but libtorch does not see this device and crashes.
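The exit code in the log can be decoded to support this reading. Windows reports process exit codes as signed 32-bit integers, and reinterpreting -1073740791 as unsigned gives 0xC0000409 (STATUS_STACK_BUFFER_OVERRUN), a status that is also used for fail-fast termination. That is consistent with, though not proof of, an unhandled exception ending the release build in the same place. A minimal sketch of the conversion, with hypothetical names to_ntstatus and kStatusStackBufferOverrun:

```cpp
#include <cstdint>

// Reinterpret a signed process exit code as an unsigned 32-bit status value.
// Conversion from a signed to an unsigned integer is well defined (modulo 2^32).
constexpr std::uint32_t to_ntstatus(std::int32_t exit_code) {
    return static_cast<std::uint32_t>(exit_code);
}

// Value of STATUS_STACK_BUFFER_OVERRUN; the constant name is illustrative.
constexpr std::uint32_t kStatusStackBufferOverrun = 0xC0000409u;

// The exit code from the log above decodes to that status.
static_assert(to_ntstatus(-1073740791) == kStatusStackBufferOverrun,
              "exit code -1073740791 is 0xC0000409");
```

The static_assert is evaluated at compile time, so the snippet compiling at all confirms the arithmetic.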
