
Add some optimizations to utf8 decoding #1948

Merged (6 commits) on Jul 12, 2020

Conversation

@johnynek (Contributor) commented Jul 9, 2020

This does four things:

  1. adds Chunk.Queue.startsWith so we can be a bit more precise when checking for the utf8 byte order mark
  2. is miserly with allocations in internal methods, using null instead of Option (avoids allocating a Some on every loop iteration).
  3. uses a Builder in an internal method to avoid returning a tuple and reversing a list.
  4. avoids fold on an internal loop and just writes out the while loop (see the sketch after this list).
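For concreteness, here is a minimal, self-contained sketch of the builder-plus-while-loop shape from items 3 and 4; the names and the trivial decoding below are illustrative assumptions, not the PR's actual code:

// Illustrative only: a List builder (item 3) means no tuple return and no
// final reverse, and an explicit while loop (item 4) replaces a fold.
def decodeChunks(chunks: List[Array[Byte]]): List[String] = {
  val out = List.newBuilder[String]
  val it = chunks.iterator
  while (it.hasNext) {
    val bytes = it.next()
    out += new String(bytes, java.nio.charset.StandardCharsets.UTF_8)
  }
  out.result()
}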


@non left a comment


This looks good. Hopefully you got some great performance improvements from fewer allocations and less memory pressure?

I had a few small suggestions but none of them really affect the correctness of the PR, so take them or leave them.

if (c == counter)
  res = 0
else
  res = counter + 1


I'm pretty sure res is already 0, so I think you just want if (c != counter) res = counter + 1, don't you?

else
  res = counter + 1
// exit the loop
idx = 0

If bs.size is 1 then what would happen? It seems like minIdx would be set to 0 (0.max(-2)) and idx would also be set to 0 (1-1). In that case, idx = 0 won't break you out of the loop. I think idx = -1 is a safer way to do this (although I would just use return myself).

Oh I see, you're counting on the decrement operator that occurs later. That makes sense, although IMO using -1 or return here might be a bit more reassuring to the reader. (That said, there's no bug so maybe it's not a big deal.)
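
To make the control flow concrete, here is a hypothetical reconstruction of the loop shape being discussed (not the PR's exact code): when bs.length is 1, minIdx is 0 and idx starts at 0, so the idx = 0 assignment alone would not end the loop; it is the decrement that follows which pushes idx below minIdx.

// Hypothetical sketch only, reconstructed from the review comments above.
// continuationBytes is assumed to return -1 for bytes that don't start a
// multi-byte sequence and the expected number of continuation bytes otherwise.
def lastIncompleteBytesSketch(bs: Array[Byte], continuationBytes: Byte => Int): Int = {
  val minIdx = 0.max(bs.length - 3)
  var idx = bs.length - 1
  var counter = 0
  var res = 0
  while (minIdx <= idx) {
    val c = continuationBytes(bs(idx))
    if (c >= 0) {
      if (c != counter) res = counter + 1
      idx = 0 // exit relies on the idx -= 1 below running before the loop check
    }
    idx -= 1
    counter += 1
  }
  res
}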

  new String(allBytes.take(splitAt).toArray, utf8Charset) :: output,
  Chunk.bytes(allBytes.drop(splitAt))
)
if (splitAt == allBytes.size) {

It's a small nit but I'd use length instead of size when working with arrays:

scala> val arr = Array(1,2,3)
val arr: Array[Int] = Array(1, 2, 3)

scala> u.reify { arr.size }
val res1: reflect.runtime.universe.Expr[Int] = Expr[Int](Predef.intArrayOps(arr).size)

scala> u.reify { arr.length }
val res2: reflect.runtime.universe.Expr[Int] = Expr[Int](arr.length)

@johnynek (Contributor, Author)

Benchmark results:

on main:
[info] Benchmark                  (asciiStringSize)   Mode  Cnt       Score       Error  Units
[info] TextBenchmark.asciiDecode                128  thrpt    6  195334.730 ± 18599.880  ops/s
[info] TextBenchmark.asciiDecode               1024  thrpt    6  153746.264 ± 61576.197  ops/s
[info] TextBenchmark.asciiDecode               4096  thrpt    6  118761.218 ±  4695.784  ops/s

on this branch:
[info] Benchmark                  (asciiStringSize)   Mode  Cnt       Score      Error  Units
[info] TextBenchmark.asciiDecode                128  thrpt    6  218236.471 ± 1145.261  ops/s
[info] TextBenchmark.asciiDecode               1024  thrpt    6  192552.031 ± 1689.683  ops/s
[info] TextBenchmark.asciiDecode               4096  thrpt    6  146657.477 ±  589.035  ops/s

So this is 11% to 25% faster depending on the size, for this simple benchmark.
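
For readers unfamiliar with the output format, a JMH throughput benchmark producing a table like the one above has roughly the following shape; this is only a hypothetical illustration, not fs2's actual TextBenchmark:

// Hypothetical JMH benchmark shape, for illustration only.
import org.openjdk.jmh.annotations._

@State(Scope.Thread)
class AsciiDecodeBench {
  @Param(Array("128", "1024", "4096")) // shows up as (asciiStringSize) above
  var asciiStringSize: Int = _

  var bytes: Array[Byte] = _

  @Setup
  def setup(): Unit = bytes = Array.fill(asciiStringSize)('a'.toByte)

  @Benchmark // reported in ops/s when run in throughput (thrpt) mode
  def asciiDecode: String =
    new String(bytes, java.nio.charset.StandardCharsets.US_ASCII)
}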

@mpilquist mpilquist merged commit 150d846 into typelevel:main Jul 12, 2020
@mpilquist mpilquist added this to the 2.4.3 milestone Aug 18, 2020