-
Notifications
You must be signed in to change notification settings - Fork 605
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add some optimizations to utf8 decoding #1948
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good. Hopefully you got some great performance improvements from fewer allocations and less memory pressure?
I had a few small suggestions but none of them really affect the correctness of the PR, so take them or leave them.
if (c == counter) | ||
res = 0 | ||
else | ||
res = counter + 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty sure res
is already 0
so i think you just want if (c != counter) res = counter + 1
don't you?
else | ||
res = counter + 1 | ||
// exit the loop | ||
idx = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If bs.size
is 1
then what would happen? It seems like minIdx
would be set to 0
(0.max(-2)
) and idx
would also be set to 0
(1-1
). In that case, idx = 0
won't break you out of the loop. I think idx = -1
is a safer way to do this (although I would just use return
myself).
Oh I see, you're counting on the decrement operator that occurs later. That makes sense, although IMO using -1
or return
here might be a bit more reassuring to the reader. (That said, there's no bug so maybe it's not a big deal.)
new String(allBytes.take(splitAt).toArray, utf8Charset) :: output, | ||
Chunk.bytes(allBytes.drop(splitAt)) | ||
) | ||
if (splitAt == allBytes.size) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a small nit but I'd use length
instead of size
when working with arrays:
scala> val arr = Array(1,2,3)
val arr: Array[Int] = Array(1, 2, 3)
scala> u.reify { arr.size }
val res1: reflect.runtime.universe.Expr[Int] = Expr[Int](Predef.intArrayOps(arr).size)
scala> u.reify { arr.length }
val res2: reflect.runtime.universe.Expr[Int] = Expr[Int](arr.length)
So this is 11% to 25% faster depending on the size, for this simple benchmark. |
This does four things: