[Bug] Precision issue when working with floating point constants (FloatImm) #17276
I have been working with TVM+Ansor (auto-scheduler) to generate code for a set of operators for both CPU (LLVM backend) and GPU (CUDA backend).
The operators use trigonometric functions in some steps, and I set the value of PI with `pi_const = te.const(np.pi, X.dtype)`.
One thing I noticed was that CPU and GPU results were diverging.
I started to check what could be the source of this issue in my code and I found out that COS and SIN were yielding different values, which led me to believe it was a problem in the scheduling or code generation steps.
To check if the schedule exploration with Ansor was in some way causing this, I tested similar operators with AutoTVM, and the same problem was evident.
The only thing left to check was the code generation pipeline, so I started to read the codegen source code and found what I believe is the root cause of this behavior.
When generating CUDA code, FloatImm values are treated as shown in the code snippet extracted from codegen_cuda.cc
So Float32 and Float64 (double) are treated the same, and when the source code is generated a value such as 3.141592653589793 is reduced to 3.141593e+00. This precision loss, caused by the string conversion when generating the CUDA source code, leads to the problem I am having. I tried changing the `case 64` rule to something like `temp << std::fixed << std::setprecision(15) << op->value;` and the results start to converge.
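The effect of the string conversion can be reproduced without TVM. The sketch below is only an illustration in plain Python, not the TVM code path: it assumes that Python's `%e` formatting behaves like a C++ stream using `std::scientific` at its default precision of 6, and the variable names are made up for this example.

```python
import math

pi = math.pi  # 3.141592653589793 as a float64

# Scientific notation with the default 6 digits after the decimal point.
sci_default = "%e" % pi
# Fixed notation with 15 decimals, mirroring the std::setprecision(15) change.
fixed_15 = "%.15f" % pi
# 17 significant digits, enough to round-trip any float64.
sig_17 = "%.17g" % pi

for label, text in [("%e", sci_default), ("%.15f", fixed_15), ("%.17g", sig_17)]:
    parsed = float(text)  # what a compiler would read back from the source
    print(
        f"{label:>6}: {text:<20} "
        f"round-trips={parsed == pi} "
        f"cos(x/2)={math.cos(parsed / 2):+.3e}"
    )
```

Only the first line of output fails to round-trip, and it is also the only one where the cosine of half the value comes out negative. Fixed notation with 15 decimals is enough for PI, but it would drop digits for constants of small magnitude (for example `1e-20` prints as all zeros), so printing 17 significant digits in scientific notation may be the safer general choice for doubles.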
I believe this issue also happens with other backends such as C, but when using the LLVM backend there is no issue.
Expected behavior
Close COS and SIN values for CPU (LLVM) and GPU (CUDA).
Actual behavior
Divergent values - only CPU results match results obtained with the NumPy ground truth.
The output below can be obtained using the code listed in Steps to reproduce.
The main issue for my use case is the COS of PI/2, which comes out as a negative number. This value matches `np.cos(3.141593/2)`, where 3.141593 is the value the float constant is rounded to when it is printed in scientific notation (3.141593e+00).
Steps to reproduce
Here are a couple of simplified modules that reproduce this issue
The CUDA code produced is also listed below, which shows that 3.141592653589793 (`np.pi`) is being changed to 3.141593e+00.
Triage