-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Specialize nextind and prevind for String #16648
Conversation
Despite tests passing locally, it seems that some behaviour is broken. Closing temporarily. |
I am reopening because I am of the opinion that the changed behaviour is probably inconsequential. There isn't a strong reason to prefer the current behaviour over the new behaviour. In fact, the new behaviour is monotonic, which might even be more elegant (not that it matters in these cases). julia> const test = "🍕"
"🍕"
julia> test.data
4-element Array{UInt8,1}:
0xf0
0x9f
0x8d
0x95
julia> Base.nextind(test, 1)
5
julia> Base.nextind(test, 2)
3
julia> Base.nextind(test, 3)
4
julia> Base.nextind(test, 4)
5
julia> nextind(test, 1)
5
julia> nextind(test, 2)
5
julia> nextind(test, 3)
5
julia> nextind(test, 4)
5 |
At least, the current behavior is consistent in returning Regarding performance, I would have thought the two definition would be essentially identical after inlining. It would be interesting to compare the generated code. Also, why are you taking the sum in your benchmarks? |
I can't see what the advantage is in returning end+1. The end of the underlying array is already revealed by nextind on endof. I think part of the speed bonus comes from avoiding unnecessary bounds checks in isvalid and not doing an expensive endof computation unless necessary. Computing endof is potentially expensive for strings that end in a long character, and has locality issues. As for sum, it's a way to prevent the compiler from optimizing out computations that aren't used. I don't know if this optimization is actually performed in this case. |
I'm not saying that it matters a lot, but you said the new behavior was better.
Makes sense. |
if i > stop | ||
return endof(s) | ||
end | ||
i -= oftype(i, 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you should call oftype
here. The index type for String
is Int
at the moment, we only accept Integer
as input for convenience. Anyway, it doesn't make sense to be more general here than for AsbtractString
below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At least the functions should be made type stable. Currently if a bigger type than machine Int
is passed in, the return type is a Union
and that's not good. I'll make both functions (including the generic ones) return Int
always.
I even wonder whether it's a good idea to provide default |
I think the current Variable-length encodings can be efficiency problems in different ways. Our |
nextind(s::DirectIndexString, i::Integer) = i+1 | ||
nextind(s::AbstractArray , i::Integer) = i+1 | ||
prevind(s::DirectIndexString, i::Integer) = convert(Int, i)-1 | ||
prevind(s::AbstractArray , i::Integer) = convert(Int, i)-1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Int(i-1)
would be a little more concise - the alignment also looks funny here, though it was that way before your change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was hoping to do ::Int
on the function instead but this doesn't seem to work yet. Used Int(x)-1
as a stopgap in the meantime.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Int(i-1)
would have better behavior near overflow corner cases IMO
actually could go either way, the smaller sizes would be better to convert before subtracting, larger sizes would be better to convert after. smaller sizes are probably more likely to be seen near overflow
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reasonable to keep it like this then. If a big integer type would overflow then it's not a good index anyhow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you change convert(Int, ...)
to Int
so that we can merge the PR?
3e63a0e
to
9c47bd2
Compare
Thanks for the review. The issues have been addressed and I have squashed commits. Anything more? |
Now that there is only one
String
type inBase
, it might be worth optimizing it. The specializations here get around two-fold performance boost compared to the generic variants:In the benchmarks below,
Base.{next|prev}ind
refers to the old version, and{next|prev}ind
to the new version.