Specialize nextind and prevind for String #16648

TotalVerb · 2016-05-29T17:25:16Z

Now that there is only one String type in Base, it might be worth optimizing it. The specializations here give roughly a twofold performance boost over the generic variants.

In the benchmarks below, Base.{next|prev}ind refers to the old version, and {next|prev}ind to the new version.

julia> @benchmark sum(prevind("Hello World", i) for i in -1:11)
Trial(130.00 ns)

julia> @benchmark sum(Base.prevind("Hello World", i) for i in -1:11)
Trial(238.00 ns)

julia> @benchmark sum(nextind("Hello World", i) for i in -1:11)
Trial(122.00 ns)

julia> @benchmark sum(Base.nextind("Hello World", i) for i in -1:11)
Trial(286.00 ns)

julia> @benchmark sum(prevind("αβγδϵζ😄🍕", i) for i in 0:21)
Trial(224.00 ns)

julia> @benchmark sum(Base.prevind("αβγδϵζ😄🍕", i) for i in 0:21)
Trial(490.00 ns)

julia> @benchmark sum(nextind("αβγδϵζ😄🍕", i) for i in 0:21)
Trial(199.00 ns)

julia> @benchmark sum(Base.nextind("αβγδϵζ😄🍕", i) for i in 0:21)
Trial(556.00 ns)

TotalVerb · 2016-05-29T17:30:55Z

Despite tests passing locally, it seems that some behaviour is broken. Closing temporarily.

TotalVerb · 2016-05-29T17:37:24Z

I am reopening because I am of the opinion that the changed behaviour is probably inconsequential. There isn't a strong reason to prefer the current behaviour over the new behaviour. In fact, the new behaviour is monotonic, which might even be more elegant (not that it matters in these cases).

julia> const test = "🍕"
"🍕"

julia> test.data
4-element Array{UInt8,1}:
 0xf0
 0x9f
 0x8d
 0x95

julia> Base.nextind(test, 1)
5

julia> Base.nextind(test, 2)
3

julia> Base.nextind(test, 3)
4

julia> Base.nextind(test, 4)
5

julia> nextind(test, 1)
5

julia> nextind(test, 2)
5

julia> nextind(test, 3)
5

julia> nextind(test, 4)
5
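The difference shown above can be reproduced with a minimal sketch. It assumes Julia 1.x byte accessors (ncodeunits, codeunit) rather than the 2016 s.data field, and valid UTF-8 input. The names is_continuation, last_start, nextind_generic, and nextind_bytes are hypothetical stand-ins for illustration, not the code from this PR.

```julia
# Hypothetical helpers; assumes Julia 1.x and valid UTF-8 input.

# A byte of the form 0b10xxxxxx continues a multi-byte UTF-8 character.
is_continuation(b::UInt8) = (b & 0xc0) == 0x80

# Index of the last character start (what the thread calls `endof`).
function last_start(s::String)
    i = ncodeunits(s)
    while i > 1 && is_continuation(codeunit(s, i))
        i -= 1
    end
    return i
end

# Old-style behaviour: any index past the last character start yields i + 1.
function nextind_generic(s::String, i::Integer)
    i < 1 && return 1
    e = last_start(s)
    i > e && return Int(i) + 1
    j = Int(i) + 1
    while j <= ncodeunits(s) && is_continuation(codeunit(s, j))
        j += 1
    end
    return j
end

# New-style behaviour: skip continuation bytes, never compute the last start.
function nextind_bytes(s::String, i::Integer)
    j = Int(i)
    j < 1 && return 1
    n = ncodeunits(s)
    j += 1
    while j <= n && is_continuation(codeunit(s, j))
        j += 1
    end
    return j
end

println([nextind_generic("🍕", i) for i in 1:4])  # prints [5, 3, 4, 5]
println([nextind_bytes("🍕", i) for i in 1:4])    # prints [5, 5, 5, 5]
```

The second result is non-decreasing in i, which is the monotonicity mentioned above. The first jumps from 5 back to 3 because indices 2 to 4 lie past the last character start.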

nalimilan · 2016-05-29T18:13:35Z

At least, the current behavior is consistent in returning i+1 when i > endof(s). OTOH, the one you suggest will always return the end of the underlying array (which is an implementation detail), except when i is higher than that.

Regarding performance, I would have thought the two definitions would be essentially identical after inlining. It would be interesting to compare the generated code. Also, why are you taking the sum in your benchmarks?

TotalVerb · 2016-05-29T20:35:22Z

I can't see what the advantage is in returning end+1. The end of the underlying array is already revealed by nextind on endof.

I think part of the speed bonus comes from avoiding unnecessary bounds checks in isvalid and not doing an expensive endof computation unless necessary. Computing endof is potentially expensive for strings that end in a long character, and has locality issues.

As for sum, it's a way to prevent the compiler from optimizing out computations that aren't used. I don't know if this optimization is actually performed in this case.

nalimilan · 2016-05-29T20:55:04Z

> I can't see what the advantage is in returning end+1. The end of the underlying array is already revealed by nextind on endof.

I'm not saying that it matters a lot, but you said the new behavior was better.

> I think part of the speed bonus comes from avoiding unnecessary bounds checks in isvalid and not doing an expensive endof computation unless necessary. Computing endof is potentially expensive for strings that end in a long character, and has locality issues.

Makes sense.

nalimilan · 2016-05-29T20:56:21Z

base/strings/basic.jl

+    if i > stop
+        return endof(s)
+    end
+    i -= oftype(i, 1)


I don't think you should call oftype here. The index type for String is Int at the moment; we only accept Integer as input for convenience. Anyway, it doesn't make sense to be more general here than for AbstractString below.

TotalVerb · 2016-05-30

At least the functions should be made type-stable. Currently, if a type bigger than the machine Int is passed in, the return type is a Union, and that's not good. I'll make both functions (including the generic ones) always return Int.
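The type-stability concern can be sketched as follows. This is an illustration under assumptions, not the PR's code: prev_loose and prev_stable are hypothetical names, and stop stands in for the value that the snippet above compares i against.

```julia
# Hypothetical functions for illustration only.

# Mirrors the pattern in the diff: one branch returns an Int,
# the other returns a value of the same type as the input index.
function prev_loose(i::Integer, stop::Int)
    i > stop && return stop     # always Int
    return i - oftype(i, 1)     # typeof(i), e.g. Int128
end

# Converting up front makes every branch return Int.
function prev_stable(i::Integer, stop::Int)
    j = Int(i)
    j > stop && return stop
    return j - 1
end

println(prev_loose(Int128(10), 4) isa Int)     # prints true
println(prev_loose(Int128(3), 4) isa Int128)   # prints true
println(prev_stable(Int128(3), 4) isa Int)     # prints true
```

With an Int128 argument, prev_loose can return either an Int or an Int128 depending on the branch taken, which is the Union return type mentioned above. prev_stable returns an Int on every path.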

nalimilan · 2016-05-29T21:06:07Z

I even wonder whether it's a good idea to provide default nextind and prevind methods for AbstractString, since in many cases they will have the same performance issue due to endof. We could only make this both generic and efficient by introducing a companion to endof which would return the last possible index (since this is generally faster than endof).

TotalVerb · 2016-05-30T01:07:14Z

I think the current nextind and prevind are reasonably fast. They should be made faster for String because it's the standard type, but they're good enough for user-defined types.

Variable-length encodings cause efficiency problems in other ways too. Our strlen (length) is as fast as C's, for example, which means that for large strings it's very, very slow. I don't think we can avoid the programmer having to keep in mind the complexity of every operation.
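To illustrate why counting characters is linear in the size of the string, here is a minimal sketch that visits every byte. It assumes Julia 1.x and valid UTF-8 input; count_chars and is_continuation are hypothetical names, and this is not how Base implements length.

```julia
# Hypothetical helpers; assumes Julia 1.x and valid UTF-8 input.

# A byte of the form 0b10xxxxxx continues a multi-byte UTF-8 character.
is_continuation(b::UInt8) = (b & 0xc0) == 0x80

# Every byte must be inspected, so the cost grows with the byte length.
count_chars(s::String) = count(!is_continuation, codeunits(s))

println(count_chars("Hello World"))   # prints 11
println(count_chars("αβγδϵζ😄🍕"))     # prints 8
println(ncodeunits("αβγδϵζ😄🍕"))      # prints 20
```

The last two lines show the gap between the character count and the byte count, which is why the character count cannot be read off in constant time.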

tkelman · 2016-05-30T05:13:50Z

base/strings/basic.jl

-nextind(s::DirectIndexString, i::Integer) = i+1
-nextind(s::AbstractArray   , i::Integer) = i+1
+prevind(s::DirectIndexString, i::Integer) = convert(Int, i)-1
+prevind(s::AbstractArray   , i::Integer) = convert(Int, i)-1


Int(i-1) would be a little more concise - the alignment also looks funny here, though it was that way before your change

TotalVerb · 2016-05-30

I was hoping to do ::Int on the function instead, but this doesn't seem to work yet. I used Int(x)-1 as a stopgap in the meantime.

tkelman · 2016-05-30

~~Int(i-1) would have better behavior near overflow corner cases IMO~~

Actually, it could go either way: smaller types would be better converted before subtracting, larger types after. Smaller types are probably more likely to be seen near overflow.

TotalVerb · 2016-05-30

Reasonable to keep it like this then. If a big integer type would overflow, then it's not a good index anyhow.

nalimilan · 2016-05-31

Could you change convert(Int, ...) to Int so that we can merge the PR?
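The overflow trade-off raised in this thread can be illustrated with two corner cases. This is a sketch assuming Julia 1.x semantics; convert_first, subtract_first, and throws_inexact are hypothetical names, and the inputs were picked for illustration rather than taken from the discussion.

```julia
# Hypothetical functions for illustration only.
convert_first(i::Integer)  = Int(i) - 1   # convert, then subtract
subtract_first(i::Integer) = Int(i - 1)   # subtract, then convert

# Returns true if f(x) throws an InexactError, false otherwise.
function throws_inexact(f, x)
    try
        f(x)
        return false
    catch err
        return err isa InexactError
    end
end

# Unsigned zero: subtracting in UInt64 wraps around before the conversion.
println(convert_first(UInt64(0)))                    # prints -1
println(throws_inexact(subtract_first, UInt64(0)))   # prints true

# A wide value just above typemax(Int): only subtracting first fits in Int.
wide = Int128(typemax(Int)) + 1
println(throws_inexact(convert_first, wide))         # prints true
println(subtract_first(wide) == typemax(Int))        # prints true
```

Neither order is safe for every input type, which matches the conclusion above that it could go either way.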

TotalVerb · 2016-06-04T01:32:28Z

Thanks for the review. The issues have been addressed and I have squashed commits. Anything more?

TotalVerb closed this May 29, 2016

TotalVerb reopened this May 29, 2016

nalimilan reviewed May 29, 2016
tkelman reviewed May 30, 2016
Specialize nextind and prevind for String

9c47bd2

TotalVerb force-pushed the fast-utf8string branch from 3e63a0e to 9c47bd2 on June 1, 2016 00:33

ViralBShah added the strings "Strings!" label Jun 4, 2016

tkelman mentioned this pull request Jun 7, 2016

misc. benchmark regressions since 0.4 #16128

Closed

JeffBezanson merged commit 11e4031 into JuliaLang:master Jun 8, 2016
