Question for render feature? #7

Closed
SCUTykLin opened this issue Mar 23, 2024 · 4 comments

Comments

@SCUTykLin

Hello,
I previously attempted to render features with 256 dimensions, but CUDA reported insufficient shared memory and allowed at most about 40 dimensions to be rendered. May I ask what changes you made to enable rendering 256 dimensions?

@41xu

41xu commented Apr 3, 2024

As far as I understand, in the rasterization process they use shared memory both to collect the features/colors and for the gradient calculation. Shared memory is limited by the specific GPU. In this paper, they dynamically allocate a CUDA array as a cache for the collected features to avoid using shared memory (it is, of course, a trade-off between the required feature dimension and the shared-memory limit). You can see the implementation here:

cudaMalloc((void**)&collected_semantic_feature, NUM_SEMANTIC_CHANNELS * BLOCK_SIZE * sizeof(float));

If I misunderstand, please correct me.
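
To make the trade-off concrete, here is a minimal sketch of the idea, not the repo's actual rasterizer kernel: the names `collect_features`, `point_ids`, and the toy launch in `main` are assumptions for illustration, while BLOCK_SIZE and NUM_SEMANTIC_CHANNELS mirror the quoted snippet. It shows why a per-block `__shared__` feature array cannot hold 256 channels and how a slice of a `cudaMalloc`'ed global buffer can stand in for it:

```cpp
// Minimal sketch, assuming a simplified kernel (not the repo's actual rasterizer).
#include <cuda_runtime.h>
#include <cstddef>

#define BLOCK_SIZE 256             // 16x16 tile -> 256 threads per block
#define NUM_SEMANTIC_CHANNELS 256  // target feature dimension

// A per-block __shared__ cache would need
//   NUM_SEMANTIC_CHANNELS * BLOCK_SIZE * sizeof(float) = 256 * 256 * 4 B = 256 KB,
// far over the ~48 KB static shared-memory limit (roughly why ~40 channels is the
// ceiling: 40 * 256 * 4 B ≈ 40 KB). Instead, each block uses its own slice of a
// global-memory buffer allocated once on the host.
__global__ void collect_features(const float* __restrict__ features,   // [num_points, CHANNELS]
                                 const int*   __restrict__ point_ids,  // one id per thread
                                 float* feature_cache)                  // global cache
{
    // Slice that plays the role of the old __shared__ array for this block.
    float* cache = feature_cache
                 + (size_t)blockIdx.x * NUM_SEMANTIC_CHANNELS * BLOCK_SIZE;

    const int tid = threadIdx.x;
    const int gid = point_ids[blockIdx.x * BLOCK_SIZE + tid];
    for (int c = 0; c < NUM_SEMANTIC_CHANNELS; ++c)
        cache[c * BLOCK_SIZE + tid] = features[(size_t)gid * NUM_SEMANTIC_CHANNELS + c];
    __syncthreads();  // the whole block can now read the cached features
}

int main()
{
    const int num_blocks = 4;  // hypothetical tile count, just for the sketch
    float* feature_cache = nullptr;
    // Same idea as the quoted cudaMalloc line, here sized for every block of
    // this toy launch rather than a single BLOCK_SIZE slab.
    cudaMalloc((void**)&feature_cache,
               (size_t)num_blocks * NUM_SEMANTIC_CHANNELS * BLOCK_SIZE * sizeof(float));
    // ... fill `features` / `point_ids` on the device, then:
    // collect_features<<<num_blocks, BLOCK_SIZE>>>(features, point_ids, feature_cache);
    cudaFree(feature_cache);
    return 0;
}
```

The price of this swap is that the cache lives in global memory (slower than shared memory), which is the dimension-vs-shared-memory trade-off mentioned above.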

@JrMeng0312

graphdeco-inria/gaussian-splatting#41 (comment) You can try this: adding the "-Xcompiler -fno-gnu-unique" option at line 29 of submodules/diff-gaussian-rasterization/setup.py resolves the illegal memory access error in training.

extra_compile_args={"nvcc": ["-Xcompiler", "-fno-gnu-unique", "-I" + os.path.join(os.path.dirname(os.path.abspath(__file__)), "third_party/glm/")]})
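
(For background, and not from the linked thread: -fno-gnu-unique tells GCC not to emit STB_GNU_UNIQUE symbols, which can otherwise prevent the compiled extension's shared object from being unloaded and reloaded cleanly; that seems to be why the flag helps with the illegal memory access error.)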

@SCUTykLin
Author


Thanks very very very much.

@SCUTykLin
Author


Thanks
