# Statement about the AtomicAdd Operation

Below we explain the `determ` argument.

The `atomicAdd` operation in CUDA allows different threads to add values to a shared memory location concurrently. However, because these threads may perform their additions in different orders across runs, contention for the memory location introduces a degree of non-determinism. This is particularly problematic for floating-point numbers in float32 format, whose addition is not associative due to limited precision: given three float32 variables a, b, and c, the result of their summation can differ in the low-order digits depending on the order of addition (e.g., (a + b) + c vs. (a + c) + b).

This issue becomes significant in our inter-Gaussian context models, where we assign Gaussians to CUDA threads. These threads may need to add values to the same memory location (i.e., a voxel's feature) while building the grids. Unfortunately, the slight randomness introduced by different summation orders in `atomicAdd` can cause small inconsistencies between encoding and decoding, leading to differences in context, which in turn cause failures in the decoding process, even though the numerical differences themselves are minor.
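The non-associativity of float32 addition can be demonstrated on the CPU alone; the values below are illustrative and chosen only to make the rounding effect obvious:

```python
import numpy as np

# A small value next to a large one: float32 has ~7 decimal digits of
# precision, so 0.1 is lost when it is absorbed into 1e7 first.
a = np.float32(0.1)
b = np.float32(1e7)
c = np.float32(-1e7)

left = (a + b) + c   # 0.1 is rounded away inside (a + b)
right = a + (b + c)  # b and c cancel exactly, so 0.1 survives

# The two summation orders give different results in float32.
print(left, right)
assert left != right
```

This is exactly the effect that makes concurrent float32 `atomicAdd` non-deterministic: the hardware serializes the additions in an unpredictable order, and different orders can round differently.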

To mitigate this, setting `determ` to True switches `atomicAdd` from float32 to int32. The values are scaled by a factor of 1e4 and added as integers in CUDA, then divided by 1e4 in Python to recover the original scale. Since int32 addition is exact (barring overflow), it does not suffer from the precision issues of float32, and the addition order does not affect the result (i.e., a + b + c always equals a + c + b in int32). This approach ensures consistent decoding.
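The scaling trick can be sketched in plain Python; the function name `quantized_sum` is hypothetical, but the scale factor 1e4 matches the one described above:

```python
import itertools
import numpy as np

def quantized_sum(values, scale=10_000):
    """Accumulate values as scaled int32, then rescale (illustrative sketch)."""
    # Quantize each value to an int32, mimicking the scaling done before
    # the CUDA-side integer atomicAdd.
    ints = [np.int32(round(v * scale)) for v in values]
    total = np.int32(0)
    for x in ints:
        total = np.int32(total + x)
    # Divide by the scale in Python to recover the original magnitude.
    return float(total) / scale

vals = [0.1234, 5.6789, -2.3456]
# Integer addition is associative, so every summation order agrees exactly.
results = {quantized_sum(list(p)) for p in itertools.permutations(vals)}
assert len(results) == 1
```

The trade-off is a quantization error of up to half a step (5e-5 here) per contribution, and a risk of int32 overflow if many large values accumulate; the scale factor balances precision against overflow headroom.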

While this trick works well in most cases, a small chance (about 2%) remains that a 3DGS scene cannot be properly decoded. We are seeking help from the community to better solve this issue.