use dragonbox algorithm; alternative to #18008 #18139

Araq · 2021-05-31T20:42:19Z

Note: the string handling code is still nuts, impossible to verify. We should rewrite it to write into var openArray and benefit from the bounds checking.

timotheecour · 2021-06-01T03:25:28Z

after benchmarking this PR vs #18008 (with -d:danger), this results in slower code, loosing important benefits of using dragonbox for people interested in performance, float to string can be a bottleneck in some applications;

in addition, this causes significantly slower code in VM for addFloat and slight increase in compile times for processing system/dragonbox.nim, and we lose connection with original code in case it gets updates; there will be more such cases where we should handle external C, C++ depdendencies anyways.

I'd rather follow your suggestion from #18008 (comment)

We need an external "bundler" tool that is available at boot time (maybe by generating C sources for it).

and update the original PR to do that instead, deferring the addDependency approach to future work

(unrelated: the bootstrap failure in this PR is yet another reason why updating csources_v1 to a recent nim release reduces friction when writing compiler code; in this case >= 1.4.0 would make it work; refs timotheecour#251); at least now we have a better CI/build script integration that should correctly handle csources_v1 updates, and soon a nimdigger tool ready to be used

Araq · 2021-06-01T03:48:37Z

after benchmarking this PR vs #18008 (with -d:danger), this results in slower code

Which benchmark, how much is it slower and why? We'll be in the "omg, we need to use existing (risky) C(++) code for good because we lose performance otherwise" world for good. I like the NIH world better -- it's also what Go/D/Rust do btw.

Araq · 2021-06-01T03:52:49Z

there will be more such cases where we should handle external C, C++ dependencies anyways.

But not for this case. Code like

  var n: int = c_sprintf(addr buf, "%.16g", value)
  var hasDot = false
  for i in 0..n-1:
    if buf[i] == ',':
      buf[i] = '.'
      hasDot = true
    elif buf[i] in {'a'..'z', 'A'..'Z', '.'}:
      hasDot = true
  if not hasDot:
    buf[n] = '.'
    buf[n+1] = '0'
    buf[n+2] = '\0'
    result = n + 2
  else:
    result = n
  # On Windows nice numbers like '1.#INF', '-1.#INF' or '1.#NAN' or 'nan(ind)'
  # of '-1.#IND' are produced.
  # We want to get rid of these here:
  if buf[n-1] in {'n', 'N', 'D', 'd', ')'}:
    writeToBuffer(buf, "nan")
    result = 3
  elif buf[n-1] == 'F':
    if buf[0] == '-':
      writeToBuffer(buf, "-inf")
      result = 4
    else:
      writeToBuffer(buf, "inf")
      result = 3

does scream "we need to be in control of the library!"...

timotheecour · 2021-06-01T04:35:25Z

Which benchmark, how much is it slower and why?

1.2 to 1.6 X slower depending on distribution (in VM it's up to 2.8X slower, it matters less but still)
will share benchmarks tmrw after double checking

as to why, i don't know yet, but it could be due to something getting lost in c2nim/manual postprocessing translation ; i tried different things including replacing -d:danger with --passc:-O3 --stacktrace:off --checks:off to get closer to what original PR uses to compile dragonbox.cc, but no effect there

While 1.6X doesn't matter for most programs, it will matter for some programs where string to float is on the critical path, and it's easy enough to just erase the gap.

Araq · 2021-06-01T20:29:42Z

Independent CI failure, merging.

Varriount · 2021-06-01T21:29:34Z

If one must use the C++ version, would it be possible to use patchFile to substitute an alternate implementation?

timotheecour · 2021-06-02T01:19:36Z

@Araq here's the benchmark:

results

(taking best of 5 runs independently for CT vs RT, but it doesn't vary much between runs)
PR #18008:
CT: 0.195
RT: 2.188

PR #18139 (which just got merged in devel):
CT: 0.484
RT: 3.109

PR #18008 is ~1.4X faster at RT and ~2.5X faster at CT compared to #18139

details

# for PR #18008
nim r -o:/tmp/z03 -d:case_pr_18008 -d:danger --experimental:vmopsdanger -f t12341.nim
# for PR #18139
nim r -o:/tmp/z02 -d:danger --experimental:vmopsdanger -f t12341.nim

--experimental:vmopsdanger -f are only needed for static: main() so that cpuTime works, and -f to force recompiling so that VM code re-runs, you can remove those if only looking at non-VM code
n is taken to be smaller for VM to avoid having to pass --maxLoopIterationsVM
in RT code, I'm using the fastest float to string code available for each approach, ie taking an array buffer instead of calling addFloat on a var string, to avoid the corresponding (small but nonzero) overhead for applications caring about optimal performance achievable
getFloat can be changed to any other code generating floats, but this template was chosen to minimize the cost (which would affect benchmarking with unrelated code) as well as generate realistic floats (hence starting at 1.0 instead of at 0, which would just generate subnormal numbers); using rand(1.0) is possible too but would add to the timing because rand isn't as cheap;
cast[float](start + (i * 2038074743)) (with mult by 100000000'th prime) or similar is another sensible choice which would generate uniform log-spaced floats from 1.0 to 4e13; the RT factor in this case is 1.3X instead of 1.4X, presumably because of the extra cost for multiplication

benchmark code

# t12341.nim:
when true:
  import std/times
  when defined case_pr_18008:
    # PR https://github.com/nim-lang/Nim/pull/18008
    # 3080108d57ea553142036764a118921845c9005f, as of revert the diff in system.nim (moved it to another PR)
    import std/strfloats
    const N = 64
  else:
    # after 63db2b19bf0787db89ce89c8861e976d44e33cbd https://github.com/nim-lang/Nim/pull/18139
    import system/formatfloat
    const N = 65

  const start = cast[uint](1.0)
  template getFloat(i): float =
    cast[float](start + i) # consecutive floats
    # cast[float](start + (i * 2038074743)) # consecutive floats spaced by the 100000000'th prime

  proc main() =
    var s = ""
    var c = 0
    var n = 0'u
    when nimvm:
      n = 1_000_000'u # can increase if you also increase `--maxLoopIterationsVM`
    else:
      n = 100_000_000'u
    var buf: array[N, char]
    let t = cpuTime()
    for i in 0'u ..< n:
      let f = getFloat(i)
      when nimvm:
        s.setLen 0
        s.addFloat f
        c+=s.len
      else:
        # fastest float to string (avoiding addFloat overhead)
        when defined case_pr_18008:
          let m = toString(buf, f)
        else:
          let m = writeFloatToBuffer(buf, f)
        c+=m
    echo (c, cpuTime() - t)
  static: main() # requires # --experimental:vmopsdanger -f
  main()

conclusions

IMO this benchmark is fair, if you disagree please explain
the gap narrowed since the previous version of this PR (refs use dragonbox algorithm; alternative to #18008 #18139 (comment)), which is good, maybe the last few commits improved performance
the 2.5X slowdown wrt 10-70x faster $float, roundtrip + correct rounding via dragonbox #18008 in VM code should be mostly fixable by adding a vmops for addFloat (as was done in 10-70x faster $float, roundtrip + correct rounding via dragonbox #18008), so I'm not worried about that
the 1.4X slowdown at RT compared to 10-70x faster $float, roundtrip + correct rounding via dragonbox #18008 is fine for most applications, but not for applications where float to string is a hotspot, so we should IMO provide at least an opt-in way to swap in the C++ code via a -d: compiler flag, or at please provide an easy way for users (or narrow the gap in the nim port)
the float32 case still needs to be addressed; the dragonbox.cc code only handled float64:

var a = 1.1'f32
doAssert $a == "1.1", $a # fails

as explained in #18008 (comment), Drachennest-schubfach_32 seems like the simplest approach, as I did here: timotheecour#732 using lib/vendor/drachennest/schubfach_32.cc from https://github.com/abolz/Drachennest/blob/master/src/schubfach_32.cc; this would be the code to port/wrap
=> followup #18148

independently of all this, IMO addDependency is still a desirable feature for reasons described here 10-70x faster $float, roundtrip + correct rounding via dragonbox #18008 (comment); that's the simplest/most flexible way to handle complex dependencies like those needed for std/decimals, std/bigints, etc, which are significantly more involved than dragonbox.cc (for which c2nim required already manual patching)

With the float string representation fixed in Nim devel (See nim-lang/Nim#18139), that uncovered a bug in the logic for parsing float TOML values. For e.g. to parse "foo = 0.123", internally, 0.1 + 0.02 + 0.003 was done which would evaluate to 0.12300000000000001. Earlier float stringification bug caused that value to print out as "0.123" instead of "0.12300000000000001". This is now fixed by using parseutils.parsefloat, which parses the "0.123" float value as 0.123 and not 0.12300000000000001. Fixes NimParsers#45.

With the float string representation fixed in Nim devel (See nim-lang/Nim#18139), that uncovered a bug in the logic for parsing float TOML values. For e.g. to parse "foo = 0.123", internally, 0.1 + 0.02 + 0.003 was done which would evaluate to 0.12300000000000001. Earlier float stringification bug caused that value to print out as "0.123" instead of "0.12300000000000001". This is now fixed by using parseutils.parsefloat, which parses the "0.123" float value as 0.123 and not 0.12300000000000001. Fixes #45.

Fixes nim-lang#14407 . This issue was fixed by nim-lang#18139.

Fixes #14407 . This issue was fixed by #18139.

lib/system/dragonbox.nim

* use dragonbox algorithm; alternative to nim-lang#18008 * removed unsafe code

Fixes nim-lang#14407 . This issue was fixed by nim-lang#18139.

use dragonbox algorithm; alternative to #18008

92de0cc

Araq added 7 commits June 1, 2021 13:54

progress

cf6ef7e

progress

ec55bd1

progress

1b540d3

progress

dca0edc

progress

953a560

removed unsafe code

a74887e

progress

6ed07bf

Araq merged commit 63db2b1 into devel Jun 1, 2021

Araq deleted the araq-dragonbox branch June 1, 2021 20:29

This was referenced Jun 2, 2021

10-70x faster $float, roundtrip + correct rounding via dragonbox #18008

Closed

string to float not roundtrip for float32 #18148

Closed

kaushalmodi mentioned this pull request Jun 7, 2021

Fix float-parsing logic NimParsers/parsetoml#46

Merged

kaushalmodi added a commit to kaushalmodi/Nim that referenced this pull request Jun 10, 2021

float parsing: Add test for a fixed issue

76ca2fd

Fixes nim-lang#14407 . This issue was fixed by nim-lang#18139.

kaushalmodi mentioned this pull request Jun 10, 2021

float parsing: Add test for a fixed issue #18232

Merged

timotheecour mentioned this pull request Jun 21, 2021

Fast and accurate float to string conversion (ryu) nim-lang/needed-libraries#104

Closed

kaushalmodi added a commit to kaushalmodi/Nim that referenced this pull request Jun 23, 2021

float parsing: Add test for a fixed issue

915be25

Fixes nim-lang#14407 . This issue was fixed by nim-lang#18139.

kaushalmodi added a commit to kaushalmodi/Nim that referenced this pull request Jun 24, 2021

float parsing: Add test for a fixed issue

3a6b7f2

Fixes nim-lang#14407 . This issue was fixed by nim-lang#18139.

Araq pushed a commit that referenced this pull request Jun 24, 2021

float parsing: Add test for a fixed issue (#18232)

f128f79

Fixes #14407 . This issue was fixed by #18139.

arnetheduck reviewed Jul 16, 2021

View reviewed changes

lib/system/dragonbox.nim Show resolved Hide resolved

PMunch pushed a commit to PMunch/Nim that referenced this pull request Mar 28, 2022

use dragonbox algorithm; alternative to nim-lang#18008 (nim-lang#18139)

4a1c4b5

* use dragonbox algorithm; alternative to nim-lang#18008 * removed unsafe code

PMunch pushed a commit to PMunch/Nim that referenced this pull request Mar 28, 2022

float parsing: Add test for a fixed issue (nim-lang#18232)

0032615

Fixes nim-lang#14407 . This issue was fixed by nim-lang#18139.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

use dragonbox algorithm; alternative to #18008 #18139

use dragonbox algorithm; alternative to #18008 #18139

Araq commented May 31, 2021 •

edited

Loading

timotheecour commented Jun 1, 2021 •

edited

Loading

Araq commented Jun 1, 2021

Araq commented Jun 1, 2021

timotheecour commented Jun 1, 2021 •

edited

Loading

Araq commented Jun 1, 2021

Varriount commented Jun 1, 2021

timotheecour commented Jun 2, 2021 •

edited

Loading

use dragonbox algorithm; alternative to #18008 #18139

use dragonbox algorithm; alternative to #18008 #18139

Conversation

Araq commented May 31, 2021 • edited Loading

timotheecour commented Jun 1, 2021 • edited Loading

Araq commented Jun 1, 2021

Araq commented Jun 1, 2021

timotheecour commented Jun 1, 2021 • edited Loading

Araq commented Jun 1, 2021

Varriount commented Jun 1, 2021

timotheecour commented Jun 2, 2021 • edited Loading

results

details

benchmark code

conclusions

Araq commented May 31, 2021 •

edited

Loading

timotheecour commented Jun 1, 2021 •

edited

Loading

timotheecour commented Jun 1, 2021 •

edited

Loading

timotheecour commented Jun 2, 2021 •

edited

Loading