
1e100 is considered convertible to int64 and the result is invalid #93

Closed
alaviss opened this issue Dec 4, 2021 · 10 comments · Fixed by #157

Comments

@alaviss (Contributor) commented Dec 4, 2021

Example

This was extracted from tcompiletime_range_checks.nim:45, where the declaration should have produced an error but didn't.

echo int64(1e100)

Current Output

0

Expected Output

A compile-time error

Possible Solution

There are two parts to this issue:

First is this conditional:

elif src.kind in nkFloatLit..nkFloat64Lit and
    (classify(src.floatVal) in {fcNan, fcNegInf, fcInf} or
      src.floatVal.int64 notin firstOrd(c.config, targetTyp)..lastOrd(c.config, targetTyp)):
  result = convNotInRange

Since this value won't fit in int64, the conversion becomes low(int64), which of course is a valid int64 value (though note that the C standard leaves this conversion undefined). I think we should either do the comparison in floats or use int128, which leads to the second part:

of tyFloat..tyFloat64:
  result = newIntNodeT(toInt128(getFloat(a)), n, idgen, g)

toInt128(1e100) produces 0 here, which is invalid, and also means we can't use int128 as-is to perform our checks.
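The zero result is not arbitrary: the float64 closest to 1e100 is an integer of the form m * 2^280 (a 53-bit mantissa scaled by a large power of two), so its low 128 bits are all zero, and a conversion that only keeps 128 bits yields 0. This can be checked in Python, whose floats are IEEE-754 float64 and whose integers are arbitrary-precision:

```python
import math

x = 1e100
m, e = math.frexp(x)             # x == m * 2**e with 0.5 <= m < 1
print(e)                         # 333: the value is about 2**332.2
# The exact integer value of the float has at least e - 53 trailing zero
# bits (here well over 128), so truncation to the low 128 bits gives 0.
assert int(x) % (1 << 128) == 0
```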

Tagging @krux02 here, as he is the author of this code. I'm not sure what the best approach is, though I think we should add a range check for numbers that won't fit in int128 (as a fallback in case the prior check fails).
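For illustration, the shape of the flawed predicate can be modeled in Python (a toy sketch with hypothetical names; it bakes in the x86 behavior of returning INT64_MIN for out-of-range conversions, which the C standard leaves undefined):

```python
import math

INT64_MIN, INT64_MAX = -2**63, 2**63 - 1

def c_float_to_int64(x: float) -> int:
    # Models what x86's cvttsd2si does for an out-of-range input: it
    # returns INT64_MIN. ISO C calls this conversion undefined behavior.
    i = math.trunc(x)
    return i if INT64_MIN <= i <= INT64_MAX else INT64_MIN

def flawed_range_check(x: float) -> bool:
    # The buggy shape of the check: convert first, range-check second.
    # The converted value is always a valid int64, so it never fires.
    return INT64_MIN <= c_float_to_int64(x) <= INT64_MAX

print(flawed_range_check(1e100))  # True, even though 1e100 does not fit
```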

Additional Information

  • Found while testing nimskull on ARM machines
$ nim -v
Nim Compiler Version 1.6.0 [Linux: amd64]
Copyright (c) 2006-2021 by Andreas Rumpf

Source hash: ffb7dc7223ff0eb6768add1e065e406af3397ed6
Source date: 2021-12-03

active boot switches: -d:release

@krux02 (Contributor) commented Dec 5, 2021

diff --git a/compiler/int128.nim b/compiler/int128.nim
index ee88be97c..efe890d13 100644
--- a/compiler/int128.nim
+++ b/compiler/int128.nim
@@ -3,7 +3,7 @@
 ## hold all from ``low(BiggestInt)`` to ``high(BiggestUInt)``, This
 ## type is for that purpose.
 
-from math import trunc
+from math import trunc, pow
 
 type
   Int128* = object
@@ -541,7 +541,14 @@ proc ldexp(x: float64, exp: cint): float64 {.importc: "ldexp", header: "<math.h>
 
 template bitor(a,b,c: Int128): Int128 = bitor(bitor(a,b), c)
 
+proc inInt128Range*(arg: float64): bool =
+  const lowerBound = -pow(2'f64, 127) 
+  const upperBound = +pow(2'f64, 127) # 2^127-1 can't be represented with float64 precision
+  lowerBound <= arg and arg < upperBound
+
 proc toInt128*(arg: float64): Int128 =
+  assert(inInt128Range(arg), "out of range")
+
   let isNegative = arg < 0
   let v0 = ldexp(abs(arg), -100)
   let w0 = uint64(trunc(v0))
diff --git a/compiler/semexprs.nim b/compiler/semexprs.nim
index e3defe03a..9e82902e8 100644
--- a/compiler/semexprs.nim
+++ b/compiler/semexprs.nim
@@ -156,10 +156,11 @@ proc checkConvertible(c: PContext, targetTyp: PType, src: PNode): TConvStatus =
       if src.kind in nkCharLit..nkUInt64Lit and
           src.getInt notin firstOrd(c.config, targetTyp)..lastOrd(c.config, targetTyp):
         result = convNotInRange
-      elif src.kind in nkFloatLit..nkFloat64Lit and
-          (classify(src.floatVal) in {fcNan, fcNegInf, fcInf} or
-            src.floatVal.int64 notin firstOrd(c.config, targetTyp)..lastOrd(c.config, targetTyp)):
-        result = convNotInRange
+      elif src.kind in nkFloatLit..nkFloat64Lit:
+        if not src.floatVal.inInt128Range:
+          result = convNotInRange
+        elif src.floatVal.toInt128 notin firstOrd(c.config, targetTyp)..lastOrd(c.config, targetTyp):
+          result = convNotInRange
     elif targetBaseTyp.kind in tyFloat..tyFloat64:
       if src.kind in nkFloatLit..nkFloat64Lit and
           not floatRangeCheck(src.floatVal, targetTyp):
diff --git a/todo.org b/todo.org
index 51576fa2a..28a152335 100644

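A side note on the strict `<` upper bound in inInt128Range above: 2^127 - 1 has no exact float64 representation and rounds up to exactly 2^127, so a `<=` comparison against it would wrongly accept an out-of-range value. A quick Python check (Python floats are IEEE-754 float64):

```python
# 2**127 is exactly representable in float64 (it is a power of two);
# 2**127 - 1 is not, and rounds up to the very same float:
assert float(2**127 - 1) == 2.0**127

upper = 2.0**127
x = 2.0**127                     # does not fit in Int128 (max is 2**127 - 1)
assert x <= float(2**127 - 1)    # a <= bound would wrongly accept it...
assert not (x < upper)           # ...the patch's strict < correctly rejects it
```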
@krux02 (Contributor) commented Dec 9, 2021

Here is more inconsistency:

proc foo1(arg: float64): string =
  let x = int64(arg)
  $x

proc foo2(arg: float64): string {.compileTime.} =
  let x = int64(arg)
  $x

echo foo1(1e100)
echo foo2(1e100)
echo int64(1e100)

output:

-9223372036854775808
-9223372036854775808
0

@zacharycarter (Contributor) commented

I'm currently working on replacing the compiler's int128 implementation with signed and unsigned int128 implementations. I know we want to revisit the design of the compiler overall, but for now I think providing a better implementation will make these casts easier to understand and simpler to spec. Once I have finished the implementations and the existing tests are passing, I will add new ones for the float64 to int128 conversion.

@krux02 (Contributor) commented Jan 13, 2022

@zacharycarter I've written the int128 type and of course I am biased here, but I don't think the root cause of this bug is the int128 type or its implementation. I've worked a lot on this type, I know how it works, and I've written a lot of integer handling for different sizes in the VM. There is certainly some shared knowledge between this type and the VM type; reimplementing int128 as a signed type (sign + absolute value) would throw all of that away, and therefore I don't like the idea.

So I would really like to understand your problems with the int128 type as it is.

Yes, there are bugs, but to my knowledge they are not because of the int128 type; they are because of a problematic specification of Nim and developers pulling the language in different directions.

My suggestion for this fix would be the following:

  • Make damn sure that compile time and runtime behave exactly the same.

This means int64(1e100) would become -9223372036854775808, because that is what it is at runtime and at compile time (in the VM), and that can be implemented in semfold as well. Yes, this means less error detection at semfold.

@alaviss (Contributor, Author) commented Jan 13, 2022

> This means int64(1e100) would become -9223372036854775808, because that is what it is at runtime and at compile time (in the VM), and that can be implemented in semfold as well.

Note that this is not guaranteed. Nim does not have float range checks, so the behavior varies wildly between architectures. We should look into defining runtime checks for float conversions as well.

@zacharycarter (Contributor) commented

> @zacharycarter I've written the int128 type and of course I am biased here, but I don't think the root cause of this bug is the int128 type or its implementation. I've worked a lot on this type, I know how it works, and I've written a lot of integer handling for different sizes in the VM. There is certainly some shared knowledge between this type and the VM type; reimplementing int128 as a signed type (sign + absolute value) would throw all of that away, and therefore I don't like the idea.
>
> So I would really like to understand your problems with the int128 type as it is.

@krux02 - thanks for the reply my dude! I can't say I necessarily have a problem with the implementation; I just don't understand parts of it, and since I needed some frame of reference to test my findings against, I thought the best way would be to port an implementation I knew was correct and could test against.

> Yes, there are bugs, but to my knowledge they are not because of the int128 type; they are because of a problematic specification of Nim and developers pulling the language in different directions.

@krux02 - good to know! I can stop wasting my time here then xD

> My suggestion for this fix would be the following:
>
>   • Make damn sure that compile time and runtime behave exactly the same.
>
> This means int64(1e100) would become -9223372036854775808, because that is what it is at runtime and at compile time (in the VM), and that can be implemented in semfold as well. Yes, this means less error detection at semfold.

Okay, I see - thanks for the explanation. I don't really know anything about the compiler backend other than uni textbook knowledge of compilers, and I've barely done anything with Nim's, but I'll look into it.

> This means int64(1e100) would become -9223372036854775808, because that is what it is at runtime and at compile time (in the VM), and that can be implemented in semfold as well.
>
> Note that this is not guaranteed. Nim does not have float range checks, so the behavior varies wildly between architectures. We should look into defining runtime checks for float conversions as well.

Okay, first I will check out what Arne is suggesting regarding semfold, and then I'll look into this.

@saem (Collaborator) commented Jan 14, 2022

For those of us peripherally following along, it seems there are two lessons here:

  1. literal typing/conversion is broken, we should track width and detect narrowing
  2. VM math needs tighter specification, see below

The direction is that VM math operations in the compiler, especially floating point ones, need to be agnostic of the compiler host platform.

Instead they should work per some canonical understanding of the target backend, or a single agreed-upon understanding.

That way, when I compile code on some funky CPU, all compile-time evaluation is identical to that of some more popular CPU/microarchitecture for the same code and parameters.

This especially matters for floating point numbers, for very wide types that are rarely natively supported (128-bit integers), and for targets that don't have real int support (JS).

Does that sound about right?

@zacharycarter (Contributor) commented

> Does that sound about right?

It does but I'm still not sure what to do here in order to make any progress. I think I'm in slightly over my head 😅

I might try to find something simpler to work on - lately I haven't had much time to work on anything and I don't want to commit to something I barely understand to begin with.

@krux02 (Contributor) commented Jan 15, 2022

> Okay, I see - thanks for the explanation. I don't really know anything about the compiler backend other than uni textbook knowledge of compilers, and I've barely done anything with Nim's, but I'll look into it.

Well I can give you a brief overview of how Nim works here. There are three ways to compute an expression in Nim:

  1. compile to C/C++/JS and execute at runtime
  2. compile to bytecode and execute at compile time (the VM)
  3. semfold

I think options 1 and 2 are pretty much clear - not in detail, but generally enough for this conversation. Semfold, though, is a different thing. It is just a compiler pass that does AST substitutions for simple expressions that can be computed at compile time. This type of code execution is usually implemented in compilers (such as C compilers) as an optimization pass, so that (1+5) becomes a literal 6. I think C++ template metaprogramming is based on something like this. But in Nim, semfold is completely redundant with the VM (and with the semfold equivalent in the backend compiler), except for the part where semfold generates errors for out-of-range values.
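A constant-folding pass of this shape can be sketched in Python (a toy analogue; semfold itself operates on Nim's typed AST and does much more):

```python
import ast

def fold(expr: str) -> str:
    """Toy constant folder: bottom-up substitution of computable
    subtrees, so that e.g. (1 + 5) * x becomes 6 * x."""
    tree = ast.parse(expr, mode="eval")

    class Folder(ast.NodeTransformer):
        def visit_BinOp(self, node: ast.BinOp) -> ast.AST:
            self.generic_visit(node)  # fold children first
            if isinstance(node.left, ast.Constant) and isinstance(node.right, ast.Constant):
                ops = {ast.Add: lambda a, b: a + b,
                       ast.Sub: lambda a, b: a - b,
                       ast.Mult: lambda a, b: a * b}
                op = ops.get(type(node.op))
                if op is not None:
                    return ast.copy_location(
                        ast.Constant(op(node.left.value, node.right.value)), node)
            return node

    return ast.unparse(Folder().visit(tree))

print(fold("(1 + 5) * x"))  # 6 * x
```

The point of doing this in a compiler (rather than just running the code) is exactly the part krux02 mentions: the fold site is also where out-of-range literals can be diagnosed at compile time.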

Currently I don't know what to do, or whether my suggestion is still a good one. After all, maybe it is good if the conversion of one googol (yes, 1e100 has a name) to int64 raised an error at runtime and in the VM as well.

@alaviss (Contributor, Author) commented Jan 15, 2022

Right now the only defined rules in Nim are:

  • intX <-> intY is range-checked.
  • intX -> uintY is not range-checked and is a bit-reinterpretation iirc.
  • uintX -> intY is range-checked.
  • floatX -> intY is undefined if floatX, with the part after the `.` dropped, cannot be represented by intY (inherited from C).
  • intX -> floatY produces the nearest value representable by the target floating point type (inherited from C). The resulting float might be larger or smaller than intX itself.

I think the right thing to do here is to make floatX -> intY defined by enforcing range check for it across runtime and compile time.
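A range-checked floatX -> intY conversion along those lines could look like the following sketch (a hypothetical Python helper, not Nim's implementation; note the strict `<` upper bound, since 2^63 - 1 is not representable in float64):

```python
import math

def range_checked_int64(x: float) -> int:
    """A defined float64 -> int64 conversion: truncate toward zero, but
    raise on NaN, infinities, and out-of-range values instead of leaving
    the result undefined."""
    if math.isnan(x) or math.isinf(x):
        raise ValueError(f"cannot convert {x} to int64")
    # 2**63 - 1 is not representable in float64, so compare with a strict
    # `<` against 2.0**63; the lower bound -2**63 is exactly representable.
    if not (-2.0**63 <= x < 2.0**63):
        raise ValueError(f"{x} out of range for int64")
    return math.trunc(x)

print(range_checked_int64(1.9))   # 1
# range_checked_int64(1e100) would raise ValueError instead of wrapping
```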

bors bot added a commit that referenced this issue Jan 28, 2022
157: use improved range check of 128 for float64 r=alaviss a=krux02

 * Add a range check function to test if a float64 is in range of the `int128` type.
 * Use that range check function for `semexpr.checkConvertible`
 * fixes #93

While this fixes the original issue mentioned above, it does not fix or implement the discussions in that issue. They need to be ported to their own issue.

Co-authored-by: Arne Döring <arne.doering@gmx.net>
bors bot closed this as completed in 1cc5367 via #157 on Jan 28, 2022