Below we explain the argument `determ`.
The `atomicAdd` operation in CUDA allows multiple threads to add values to the same memory location concurrently. However, the order in which these threads perform their additions can vary from run to run, so contention for the location introduces non-determinism. This is particularly problematic for floating-point numbers in `float32` format. Floating-point addition rounds its result after every operation, so it is not associative. For instance, given three `float32` variables `a`, `b` and `c`, the result of their summation can differ in its last few bits depending on the order of addition (e.g., `(a + b) + c` vs. `(a + c) + b`). This is a consequence of the limited precision of `float32` arithmetic.
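A minimal Python sketch of this order dependence follows. Python floats are 64-bit, so the helpers `f32` and `add_f32` (hypothetical names introduced here) round every intermediate result to `float32` using the standard `struct` module. The magnitudes are chosen so that the rounding error is large enough to see. With realistic feature values, the discrepancy would typically appear only in the last bits.

```python
import struct


def f32(x: float) -> float:
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def add_f32(x: float, y: float) -> float:
    """Add two float32 values; float64 has enough spare precision that
    rounding the float64 sum once to float32 gives the float32 sum."""
    return f32(x + y)


a, b, c = f32(1e8), f32(1.0), f32(-1e8)

left = add_f32(add_f32(a, b), c)   # 1e8 + 1 rounds back to 1e8 in float32, so the 1 is lost
right = add_f32(add_f32(a, c), b)  # 1e8 - 1e8 is exactly 0, so the 1 survives
print(left, right)  # 0.0 1.0
```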
This issue becomes significant in our inter-Gaussian context models, where we assign Gaussians to CUDA threads. Several threads may need to add values to the same memory location (i.e., a voxel's feature) while the grids are being created. Unfortunately, the slight randomness that different summation orders introduce into `atomicAdd` can cause small inconsistencies between encoding and decoding. Although these differences are minor, they can change the context. Decoding only succeeds when the decoder reproduces exactly the same probability model the encoder used, so a changed context causes the decoding process to fail.
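The scenario can be emulated on the CPU. The sketch below is an illustration, not the project's kernel. The function `scatter_add_f32`, the feature values and the voxel assignment are all hypothetical. The thread schedule is passed in explicitly as `order` and stands in for whatever order the GPU happens to serialize the atomic operations in. Two schedules over identical inputs play the roles of the encoding and decoding runs.

```python
import struct


def f32(x: float) -> float:
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def scatter_add_f32(features, voxel_ids, num_voxels, order):
    """Emulate float32 atomicAdd: thread i adds features[i] into voxel voxel_ids[i].

    `order` is the sequence in which threads happen to perform their atomic
    update; on a GPU it is decided at run time and may differ between runs.
    """
    grid = [0.0] * num_voxels
    for i in order:
        v = voxel_ids[i]
        grid[v] = f32(grid[v] + features[i])  # round after every addition, as float32 does
    return grid


# Hypothetical data: four Gaussians, three of which fall into voxel 0.
features = [1e8, 1.0, -1e8, 0.5]
voxel_ids = [0, 0, 0, 1]

encode_grid = scatter_add_f32(features, voxel_ids, 2, order=[0, 1, 2, 3])
decode_grid = scatter_add_f32(features, voxel_ids, 2, order=[0, 2, 1, 3])
print(encode_grid, decode_grid)  # [0.0, 0.5] [1.0, 0.5]
```

Same inputs, different schedules, different voxel features: this mismatch would feed different contexts to the encoder and decoder.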
To mitigate this, setting `determ` to `True` switches `atomicAdd` from `float32` to `int32`. The values are scaled by a factor of `1e4`, added as integers in CUDA, and then divided by `1e4` in Python to recover the original scale. Integer addition is exact, unlike `float32` addition, so the addition order does not affect the result (i.e., `a + b + c` always equals `a + c + b` in `int32`). The cost is that the features are only resolved to a fixed step of `1e-4`. This approach makes the grid construction consistent between encoding and decoding.
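A CPU sketch of the fixed-point scheme follows. It is an illustration under stated assumptions, not the actual CUDA code. The name `scatter_add_int32` is hypothetical. Values are assumed to be rounded to the nearest integer after scaling, although the real kernel may truncate instead. 32-bit wraparound is emulated explicitly because Python integers are unbounded. The test checks every permutation of the thread order and expects the same grid each time.

```python
import itertools

SCALE = 10_000  # the 1e4 factor described above


def to_fixed(x: float) -> int:
    """Scale a float and round it to an integer (assumption: round-to-nearest)."""
    return round(x * SCALE)


def wrap_int32(x: int) -> int:
    """Emulate 32-bit two's-complement wraparound, since Python ints are unbounded."""
    return (x + 2**31) % 2**32 - 2**31


def scatter_add_int32(features, voxel_ids, num_voxels, order):
    """Emulate int32 atomicAdd on pre-scaled values, then divide by SCALE to rescale."""
    grid = [0] * num_voxels
    for i in order:
        v = voxel_ids[i]
        grid[v] = wrap_int32(grid[v] + to_fixed(features[i]))  # exact integer addition
    return [g / SCALE for g in grid]


# Hypothetical features: three Gaussians in voxel 0, one in voxel 1.
features = [1.2345, -0.5, 3.14159, 0.25]
voxel_ids = [0, 0, 0, 1]

results = {
    tuple(scatter_add_int32(features, voxel_ids, 2, list(order)))
    for order in itertools.permutations(range(len(features)))
}
print(results)  # {(3.8761, 0.25)}
```

This design has two consequences. First, with a scale of `1e4`, each voxel's accumulated value must stay within about ±214748 (2^31 / 1e4) to avoid overflow. Wraparound is still order-independent, so overflow would not break consistency, but it would corrupt the recovered value. Second, each input is quantized once, before any addition. The rounding therefore no longer depends on the summation order.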
While this trick works well in most cases, there is still a small chance (about 2%) that a 3DGS scene cannot be decoded properly. We welcome help from the community to solve this issue more reliably.