RuntimeError: CUDA error: an illegal memory access was encountered #1

Open
SunHongyang10 opened this issue Jul 21, 2024 · 39 comments

@SunHongyang10

Hello~ Wonderful work!

I am trying to run train_coarse.py, but I get an error:

[image]

I tried to solve it, but I failed😭

Is there a problem with my virtual environment?

@Vilour

Vilour commented Jul 21, 2024

Same error here. Have you solved it?

@Snosixtyboo
Collaborator

Hi,

Unfortunately, this error is not very specific. We have seen it before in 3D Gaussian Splatting and tried our best to replicate it, but we were never able to trigger it on any of our machines, so we never worked out how to debug it...

Could you let us know your OS and GPU (and how many GPUs are in your machine)? Getting the latest NVIDIA drivers might help, but bottom line, without full access to a setup where it happens, it might be really tough to track down.

@Vilour

Vilour commented Jul 21, 2024

Hi,

I'm working on Ubuntu 18.04 with one GPU (RTX 3090). I used this setup for 3D Gaussian Splatting before and it worked fine. By the way, my nvcc -V returns 11.6 and I created the virtual environment as the repo describes. Is this related to the CUDA version? Is visibility_filter really working here? Maybe I can just comment out this line..

@Vilour

Vilour commented Jul 21, 2024

The error persists even if I comment out this line..

@ameuleman
Collaborator

Hi,
Did you install the PyTorch build that corresponds to CUDA 11.x?
pip install torch==2.3.0 torchvision==0.18.0 torchaudio==2.3.0 --index-url https://download.pytorch.org/whl/cu118

@Vilour

Vilour commented Jul 21, 2024

Yes, I installed PyTorch with this command. By the way, the error on my machine appeared at iteration 10040, not from the beginning.

@Vilour

Vilour commented Jul 21, 2024

Hi,

I tried on another machine (Ubuntu 20.04/A6000) with the same dataset, and the error appears again at iteration 10040. I guess this issue is related to the dataset?

@ameuleman
Collaborator

I just tried downloading SmallCity and running full_train.py. The coarse optimization went smoothly. Are we working with the same dataset?

@Vilour

Vilour commented Jul 21, 2024

I was working with a dataset that I collected myself. I'm trying SmallCity right now.

@Vilour

Vilour commented Jul 21, 2024

I just tried with SmallCity and the error appeared immediately.

@Snosixtyboo
Collaborator

@Vilour @SunHongyang10 When it fails, could you try to keep an eye on the GPU memory consumption? Is it possible that the system runs out of video memory? This should not happen on a 3090...
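One way to watch memory while training runs is to poll nvidia-smi from a second terminal. The sketch below is only an illustration, not part of the repository: the helper names parse_memory_csv and query_gpu_memory are made up for this example, and it assumes nvidia-smi is available on PATH.

```python
import subprocess

# nvidia-smi query that prints "used, total" in MiB, one line per GPU.
QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_memory_csv(text):
    """Parse 'used, total' lines (MiB) into a list of (used, total) ints."""
    rows = []
    for line in text.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        rows.append((used, total))
    return rows


def query_gpu_memory():
    """Run nvidia-smi once and return [(used_mib, total_mib), ...]."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_memory_csv(result.stdout)
```

Calling query_gpu_memory() in a loop with a short sleep while train_coarse.py is running should show whether usage climbs toward the card's total right before the crash.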

@SunHongyang10
Author

I just tried with the small_city dataset and it fails immediately. My device is a 3090.

@anchun

anchun commented Jul 22, 2024

Same here on Ubuntu 20.04, with the following call stack:

  File "train_coarse.py", line 190, in <module>
    training(lp.extract(args), op.extract(args), pp.extract(args), args.save_iterations, args.checkpoint_iterations, args.start_checkpoint, args.debug_from)
  File "train_coarse.py", line 106, in training
    loss.backward()
  File "/home/anchun/software/miniconda3/envs/hierarchical_3d_gaussians/lib/python3.12/site-packages/torch/_tensor.py", line 525, in backward
    torch.autograd.backward(
  File "/home/anchun/software/miniconda3/envs/hierarchical_3d_gaussians/lib/python3.12/site-packages/torch/autograd/__init__.py", line 267, in backward
    _engine_run_backward(
  File "/home/anchun/software/miniconda3/envs/hierarchical_3d_gaussians/lib/python3.12/site-packages/torch/autograd/graph.py", line 744, in _engine_run_backward
    return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: CUDA error: an illegal memory access was encountered

@Vilour

Vilour commented Jul 22, 2024

The video memory consumption is OK. It only takes a few gigabytes.

@ameuleman
Collaborator

Hi,

Could you please provide your CUDA and NVIDIA driver versions?

@PLUS-WAVE

I have the same issue. Here are my versions:

  • 3070 Laptop
  • Driver Version: 560.70
  • cuda 12.1

@Vilour

Vilour commented Jul 22, 2024

The driver version is 525.105.17, and torch.version.cuda returns 11.8

@ameuleman
Collaborator

Thanks for providing details. I managed to replicate the error using nvidia/cuda:12.1.0-devel-ubuntu20.04. We will look into it.

@kevintsq

#1 (comment)

my nvcc -V returns 11.6 and I created the virtual environment as the repo describes

#1 (comment)

pip install torch==2.3.0 torchvision==0.18.0 torchaudio==2.3.0 --index-url https://download.pytorch.org/whl/cu118

It seems there is a version mismatch here: nvcc reports 11.6, while the PyTorch wheel targets CUDA 11.8 (cu118). The version reported by nvcc -V must be the same as the CUDA version of the PyTorch build you install.
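A quick way to make this comparison explicit is to parse the release number out of the nvcc -V text and compare it with the string that torch.version.cuda returns. This is a minimal sketch under those assumptions; the function names are hypothetical and the code only handles the usual "release X.Y" wording of the nvcc banner.

```python
import re


def parse_nvcc_release(nvcc_output):
    """Extract 'major.minor' from the text printed by `nvcc -V`, or None."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    return f"{match.group(1)}.{match.group(2)}" if match else None


def versions_match(nvcc_version, torch_cuda_version):
    """True only when both strings name the same CUDA major.minor."""
    return nvcc_version is not None and nvcc_version == torch_cuda_version
```

In a real environment, the first argument would come from running nvcc -V (for example through subprocess.run) and the second from torch.version.cuda.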

@han-xiangyu

han-xiangyu commented Jul 23, 2024

Hi,

I hit the same problem running on an RTX 6000 Ada with Ubuntu 24.04, CUDA 11.8 and driver version 550.90.07. Thanks for your help!

The error message is:

$ python scripts/full_train.py --project_dir dataset/example_dataset/

creating output dir: dataset/example_dataset/output
Optimizing dataset/example_dataset/output/scaffold
Output folder: dataset/example_dataset/output/scaffold [23/07 01:21:16]
Converting point3d.bin to .ply, will happen only the first time you open the scene. [23/07 01:21:16]
Reading camera 1158/1158 [23/07 01:21:17]
0 test images [23/07 01:21:17]
1158 train images [23/07 01:21:17]
Making Training Dataset [23/07 01:21:17]
Making Test Dataset [23/07 01:21:17]
Number of points at initialisation :  329992 [23/07 01:21:17]
Training progress:   0%|                                                                            | 0/30000 [00:00<?, ?it/s]Traceback (most recent call last):
  File "/home/xiangyu/Projects/hierarchical-3d-gaussians/train_coarse.py", line 190, in <module>
    training(lp.extract(args), op.extract(args), pp.extract(args), args.save_iterations, args.checkpoint_iterations, args.start_checkpoint, args.debug_from)
  File "/home/xiangyu/Projects/hierarchical-3d-gaussians/train_coarse.py", line 110, in training
    gaussians.max_radii2D[visibility_filter] = torch.max(gaussians.max_radii2D[visibility_filter], radii)
                                                         ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Training progress:   0%|                                                                            | 0/30000 [00:00<?, ?it/s]
Error executing train_coarse: Command 'python train_coarse.py -s dataset/example_dataset/camera_calibration/aligned --save_iterations -1 -i ../rectified/images --skybox_num 100000 --model_path dataset/example_dataset/output/scaffold --alpha_masks ../rectified/masks ' returned non-zero exit status 1.
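Because the message says the reported line may be wrong, rerunning the failing command with CUDA_LAUNCH_BLOCKING=1 should give a more trustworthy stack trace. A minimal sketch of doing that from Python follows; blocking_env and run_blocking are hypothetical helper names, not part of the repository.

```python
import os
import subprocess


def blocking_env(base_env=None):
    """Copy the environment and force synchronous CUDA kernel launches."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def run_blocking(command):
    """Run a command with CUDA_LAUNCH_BLOCKING=1 and return its exit code."""
    return subprocess.run(command, env=blocking_env()).returncode
```

For the run above, that would be something like run_blocking(["python", "train_coarse.py", "-s", "dataset/example_dataset/camera_calibration/aligned", ...]) with the remaining arguments taken from the failing command. Training will be slower in this mode, so it is only meant for debugging.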

@ameuleman
Collaborator

Hi,
We are still working on a fix. In the meantime, I ran it without issue with Ubuntu 22.04 and CUDA 12.5.
Here is the corresponding Dockerfile:

FROM nvidia/cuda:12.5.1-cudnn-devel-ubuntu22.04
ARG USER_ID=1000
ARG GROUP_ID=1000
ENV DEBIAN_FRONTEND=noninteractive

RUN apt-get update && \
    apt-get install -y --no-install-recommends git wget unzip bzip2 sudo build-essential ca-certificates openssh-server vim ffmpeg libsm6 libxext6 python3-opencv gcc-11 g++-11 cmake

# conda
ENV PATH /opt/conda/bin:$PATH 
RUN wget --quiet \
    https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh && \
    echo 'export PATH=/opt/conda/bin:$PATH' > /etc/profile.d/conda.sh && \
    /bin/bash Miniconda3-latest-Linux-x86_64.sh -b -p /opt/conda && \
    rm -rf /tmp/*

# Create the user
RUN addgroup --gid $GROUP_ID user
RUN useradd --create-home -s /bin/bash --uid $USER_ID --gid $GROUP_ID docker
RUN adduser docker sudo
RUN echo "docker ALL=(ALL) NOPASSWD: ALL" >> /etc/sudoers
USER docker

# Setup hierarchical_3d_gaussians
RUN /opt/conda/bin/python -m ensurepip
RUN /opt/conda/bin/python -m pip install torch==2.3.0 torchvision==0.18.0 torchaudio==2.3.0 --index-url https://download.pytorch.org/whl/cu121
RUN /opt/conda/bin/python -m pip install plyfile tqdm joblib exif scikit-learn timm==0.4.5 opencv-python==4.9.0.80 gradio_imageslider gradio==4.29.0 matplotlib

With this saved as 125.Dockerfile in hierarchical-3d-gaussians/ (the commands after docker run are executed inside the container):

DATASET_DIR=<Path to dataset>
docker build -t hierarchical_3d_gaussians125 -f 125.Dockerfile .
docker run -it --gpus=all --rm -v ${PWD}:/host -v ${DATASET_DIR}:/data --network=host --ipc=host hierarchical_3d_gaussians125 /bin/sh -c "cd /host; bash"
rm -r submodules/hierarchy-rasterizer/build submodules/simple-knn/build submodules/gaussianhierarchy/build
cd submodules/gaussianhierarchy
cmake . -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build -j --config Release
cd ../..
/opt/conda/bin/python scripts/full_train.py --project_dir /data

@PLUS-WAVE

PLUS-WAVE commented Jul 23, 2024

I installed CUDA 12.5, uninstalled the original PyTorch 2.3.0, and then reinstalled the latest version of PyTorch (2.3.1). After that, I reran pip install -r requirements.txt. Now it can run normally on SmallCity.

Commands executed:

conda remove pytorch torchvision torchaudio pytorch-cuda=12.1
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
pip install -r requirements.txt

pip install -r requirements.txt will reinstall diff_gaussian_rasterization, gaussian_hierarchy-0.0.0, and simple_knn-0.0.0.

I suspect the fix came from reinstalling diff_gaussian_rasterization, gaussian_hierarchy-0.0.0, and simple_knn-0.0.0 after upgrading to CUDA 12.5: upgrading to PyTorch 2.3.1 alone did not solve the issue, but it was resolved after running pip install -r requirements.txt.
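If the fix really comes from recompiling the extensions against the new toolkit, it may help to remove leftover build directories before reinstalling, so that nothing compiled with the old toolkit can be reused. The sketch below is an assumption-laden illustration: the function names are hypothetical, and the directory list is taken from the rm -r command the maintainers posted earlier in this thread.

```python
from pathlib import Path
import shutil

# Build directories named in the maintainers' rm -r command in this thread.
SUBMODULE_BUILD_DIRS = (
    "submodules/hierarchy-rasterizer/build",
    "submodules/simple-knn/build",
    "submodules/gaussianhierarchy/build",
)


def stale_build_dirs(repo_root):
    """Return the build directories that currently exist under repo_root."""
    root = Path(repo_root)
    return [root / rel for rel in SUBMODULE_BUILD_DIRS if (root / rel).is_dir()]


def clean_build_dirs(repo_root):
    """Delete existing build directories so the next install recompiles them."""
    removed = stale_build_dirs(repo_root)
    for path in removed:
        shutil.rmtree(path)
    return removed
```

After clean_build_dirs("."), rerunning pip install -r requirements.txt from the repository root would rebuild the three extensions from scratch.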

@Snosixtyboo
Collaborator

There seems to be an issue associated with CUB, which is failing to compute the sum over a CUDA array, for no obvious reason. We are checking what can be done.

@Snosixtyboo
Collaborator

There seem to be unspecified PyTorch/CUB compatibility issues on Ubuntu. We will try to figure out where they come from, or whether we can find a more robust alternative. In the meantime, if you can, combining PyTorch built for CUDA 12.1 with a CUDA Toolkit 12.5 installation (yes, this should be fine, minor version mismatches are allowed) seems like a good choice on Ubuntu, according to our Docker tests.
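The "minor version mismatches are allowed" rule can be written down as a small check. This is only a sketch of the rule as stated in this comment, with hypothetical function names; it does not guarantee that a given pairing actually avoids the crash.

```python
def cuda_major_minor(version):
    """Split a version string such as '12.1' or '12.5.1' into (major, minor)."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def classify_mismatch(torch_cuda, toolkit_cuda):
    """Classify the pairing as 'match', 'minor-mismatch' or 'major-mismatch'."""
    torch_major, torch_minor = cuda_major_minor(torch_cuda)
    kit_major, kit_minor = cuda_major_minor(toolkit_cuda)
    if torch_major != kit_major:
        return "major-mismatch"
    return "match" if torch_minor == kit_minor else "minor-mismatch"
```

With this rule, PyTorch built for 12.1 on top of Toolkit 12.5.1 is a minor mismatch, while a cu118 wheel on a 12.x toolkit would be a major mismatch.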

@Linkersem

Hi, I built a Docker image from the provided Dockerfile, modified to suit my graphics driver, to run the code. This is my Dockerfile:

FROM nvcr.io/nvidia/cuda:12.1.0-cudnn8-devel-ubuntu22.04
ARG USER_ID=1000
ARG GROUP_ID=1000
ENV DEBIAN_FRONTEND=noninteractive

RUN apt-get update && \
    apt-get install -y --no-install-recommends git wget unzip bzip2 sudo build-essential ca-certificates openssh-server vim ffmpeg libsm6 libxext6 python3-opencv gcc-11 g++-11 cmake

# conda
ENV PATH /opt/conda/bin:$PATH 
RUN wget --quiet \
    https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh && \
    echo 'export PATH=/opt/conda/bin:$PATH' > /etc/profile.d/conda.sh && \
    /bin/bash Miniconda3-latest-Linux-x86_64.sh -b -p /opt/conda && \
    rm -rf /tmp/*

# Create the user
RUN addgroup --gid $GROUP_ID user
RUN useradd --create-home -s /bin/bash --uid $USER_ID --gid $GROUP_ID docker
RUN adduser docker sudo
RUN echo "docker ALL=(ALL) NOPASSWD: ALL" >> /etc/sudoers
# USER docker

# Setup hierarchical_3d_gaussians
RUN /opt/conda/bin/python -m ensurepip
RUN /opt/conda/bin/python -m pip install torch==2.3.0 torchvision==0.18.0 torchaudio==2.3.0 --index-url https://download.pytorch.org/whl/cu121
RUN /opt/conda/bin/python -m pip install plyfile tqdm joblib exif scikit-learn timm==0.4.5 opencv-python==4.9.0.80 gradio_imageslider gradio==4.29.0 matplotlib

but it keeps getting stuck here:
[image]

@Linkersem

Same problem here: it is still stuck after about half an hour of waiting.

@Snosixtyboo
Collaborator

Snosixtyboo commented Jul 24, 2024

@Linkersem

Hi,
would you mind trying to build a new Docker image, but with CUDA 12.5.1? There seem to be issues with CUDA 12.1.

@Linkersem

Hi, I'm sorry, but this is a bit difficult for me. The workstation is shared, so if I change the graphics driver and CUDA (the maximum available version is 12.2), it may affect other people.

@Linkersem

Hello, it doesn't seem to be necessary to run in a CUDA 12.5 environment. I have a machine with CUDA 12.3 and PyTorch 2.3.0 that works fine. Hope that helps.

@SunHongyang10
Author

So far, CUDA 12.3 with PyTorch 2.3.0 works.

@kevintsq

Currently CUDA 12.4 + PyTorch 2.4 works on Windows.

@ForeverAurorak

Hi, I solved the problem.
In submodules/gaussianhierarchy/setup.py, change "extra_compile_args" to
{"cxx": ["-Xcompiler", "-fno-gnu-unique","-I" + os.path.join(os.path.dirname(os.path.abspath(file)), "dependencies/eigen/")]}.
In submodules/hierarchy-rasterizer/setup.py, change "extra_compile_args" to
{"nvcc": ["-Xcompiler", "-fno-gnu-unique","-I" + os.path.join(os.path.dirname(os.path.abspath(file)), "third_party/glm/")]}).
Then reinstall hierarchy-rasterizer with "pip install submodules/hierarchy-rasterizer".
Related issues:
graphdeco-inria/gaussian-splatting#41
graphdeco-inria/diff-gaussian-rasterization#10

@kevintsq

Yes it works! Just a caveat, -fno-gnu-unique can only be used on Linux and -Xcompiler can only be passed to nvcc.

@GoroYeh-HRI

I got the same error here (Ubuntu 18.04, nvcc -V 11.6)
I installed using command:

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

[image]

After applying @ForeverAurorak's solution, it works and it's training now! Thank you so much!!!
[image]

@rowellz

rowellz commented Aug 1, 2024

Thank you to everyone in this thread for their awesome contributions, especially @ameuleman for providing a working Dockerfile. I was able to take it and put together a working docker-compose environment. Everything appears to be working but I still haven't figured out a way to connect to the remote viewer. If anyone is interested in running H3DGS via docker compose, here is the link to the complete diff: https://github.com/graphdeco-inria/hierarchical-3d-gaussians/pull/31/files

BTW I am running an RTX 3060 12GB with CUDA 12.3 installed on my host machine.

@Gaaaavin

Gaaaavin commented Aug 2, 2024

In @ForeverAurorak's fix above, I think it should be __file__ instead of file in the two lines.

@kevintsq's caveat that -fno-gnu-unique can only be used on Linux and that -Xcompiler can only be passed to nvcc is a good point. In conclusion, the following code modification works for me on Ubuntu:
line 29 in submodules/hierarchy-rasterizer/setup.py:

extra_compile_args={"nvcc": ["-Xcompiler", "-fno-gnu-unique","-I" + os.path.join(os.path.dirname(os.path.abspath(__file__)), "third_party/glm/")]})

line 29 in submodules/gaussianhierarchy/setup.py:

extra_compile_args={"cxx": ["-fno-gnu-unique","-I" + os.path.join(os.path.dirname(os.path.abspath(__file__)), "dependencies/eigen/")]}

After modifying the two files above and reinstalling via pip install -r requirements.txt, I got it to run smoothly.
I'm using an Ubuntu 20.04 machine with PyTorch 2.3.0 (built for CUDA 12.1) and the CUDA 12.1 runtime (nvcc --version = 12.1).
Note that this doesn't work if I use the CUDA 12.5 runtime as instructed above.
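Since -fno-gnu-unique is Linux-only, the two edits could be made platform-conditional so the same setup.py files still build on Windows. The following is a sketch under that assumption, not the maintainers' fix; the helper names are hypothetical, while the flags and include paths are the ones from the two lines above.

```python
import os
import sys


def rasterizer_compile_args(setup_dir, platform=sys.platform):
    """Build extra_compile_args for the nvcc side of hierarchy-rasterizer."""
    nvcc_args = ["-I" + os.path.join(setup_dir, "third_party/glm/")]
    if platform.startswith("linux"):
        # -fno-gnu-unique is a GCC flag; -Xcompiler forwards it through nvcc.
        nvcc_args = ["-Xcompiler", "-fno-gnu-unique"] + nvcc_args
    return {"nvcc": nvcc_args}


def hierarchy_compile_args(setup_dir, platform=sys.platform):
    """Build extra_compile_args for the host-compiler side of gaussianhierarchy."""
    cxx_args = ["-I" + os.path.join(setup_dir, "dependencies/eigen/")]
    if platform.startswith("linux"):
        # The host compiler takes the flag directly, without -Xcompiler.
        cxx_args = ["-fno-gnu-unique"] + cxx_args
    return {"cxx": cxx_args}
```

In each setup.py, the call would look like extra_compile_args=rasterizer_compile_args(os.path.dirname(os.path.abspath(__file__))), and likewise with hierarchy_compile_args for gaussianhierarchy.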

@alanvinx
Collaborator

alanvinx commented Aug 7, 2024

Hi, thank you for your feedback. I pushed the fix to https://github.com/graphdeco-inria/hierarchy-rasterizer; please update your rasterizer using git submodule update --remote.

Regarding gaussianhierarchy, I could run full_train.py successfully without modifying submodules/gaussianhierarchy/setup.py, using the Dockerfile and commands that @ameuleman posted above, with CUDA 11.8 and 12.1.

@haofengsiji

haofengsiji commented Aug 28, 2024

@Gaaaavin's modification above fixed my problem, thanks!!!

pytorch 2.3.0+cu121, nvcc 12.1

@alancneves

@Gaaaavin's modification above fixed my problem on Ubuntu 22.04 + CUDA 11.8 + PyTorch 2.3.0 in Docker!
