strings: fix performance of nextind
The recursion (for invalid bytes) was preventing inlining, as was the
length of the function. For ASCII data, the cost of the call far exceeds
the cost of decoding the data.

Closes #51624
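The commit message describes a common Julia idiom, which the diff applies to `nextind` with `_nextind_continued`. The hot path stays short and non-recursive so that it can be inlined, and everything else moves into a `@noinline` closure that is called immediately. Below is a minimal sketch of the same shape. It uses a hypothetical `utf8_byte_kind` function (not part of Base) that classifies a single UTF-8 code unit. Only the ASCII test sits in the inlinable body, and the remaining cases live in the closure.

```julia
# Hypothetical illustration of the fast-path/slow-path split; not Base API.
@inline function utf8_byte_kind(b::UInt8)
    # Hot path: ASCII bytes return immediately, keeping the inlined body tiny.
    b < 0x80 && return :ascii
    # Cold path: an immediately-called @noinline closure, so call sites that
    # inline `utf8_byte_kind` only pay for a call when the byte is non-ASCII.
    (@noinline function _utf8_byte_kind_slow(b)
        b < 0xc0 && return :continuation  # 10xxxxxx
        b < 0xe0 && return :lead2         # 110xxxxx
        b < 0xf0 && return :lead3         # 1110xxxx
        b < 0xf8 && return :lead4         # 11110xxx
        return :invalid                   # 0xf8 to 0xff never appear in UTF-8
    end)(b)
end

utf8_byte_kind(UInt8('a'))  # :ascii
utf8_byte_kind(0xe2)        # :lead3 (first byte of '€')
```

To check whether a particular caller really inlines the outer function, inspect `@code_typed` (from `InteractiveUtils`) on that caller.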
vtjnash committed Oct 11, 2023
1 parent e10c8f5 commit 48fed49
Showing 1 changed file with 10 additions and 2 deletions.
base/strings/string.jl (12 changes: 10 additions, 2 deletions)
@@ -177,9 +177,16 @@ end
     @boundscheck between(i, 1, n) || throw(BoundsError(s, i))
     @inbounds l = codeunit(s, i)
     (l < 0x80) | (0xf8 ≤ l) && return i+1
+    (@noinline function _nextind_continued(s, i, n, l) # mark the rest of the function as a slow-path
     if l < 0xc0
+        # handle invalid codeunit index by scanning back to the start of this index
+        # (which may be the same as this index)
         i′ = @inbounds thisind(s, i)
-        return i′ < i ? @inbounds(nextind(s, i′)) : i+1
+        i′ >= i && return i+1
+        i = i′
+        @inbounds l = codeunit(s, i)
+        (l < 0x80) | (0xf8 ≤ l) && return i+1
+        @assert l >= 0xc0
     end
     # first continuation byte
     (i += 1) > n && return i
@@ -192,7 +199,8 @@ end
     ((i += 1) > n) | (l < 0xf0) && return i
     # third continuation byte
     @inbounds b = codeunit(s, i)
-    ifelse(b & 0xc0 ≠ 0x80, i, i+1)
+    return ifelse(b & 0xc0 ≠ 0x80, i, i+1)
+    end)(s, i, n, l)
 end
 
 ## checking UTF-8 & ACSII validity ##
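The sketch below mirrors the control flow of the patched method on a plain byte vector, without Base internals. `iscont`, `thisind_bytes` and `nextind_bytes` are hypothetical helpers. `thisind_bytes` is assumed to follow the same back-scan rules as Base's `thisind(::String, ::Int)`: a continuation byte belongs to an earlier lead byte only if that lead allows enough continuation bytes. The slow path shows the commit's key change. When the index lands on a continuation byte, the code moves back once to its lead byte and continues the forward scan from there, instead of calling `nextind` recursively.

```julia
# Hypothetical byte-vector version of the post-commit `nextind` logic; not Base API.
iscont(b::UInt8) = b & 0xc0 == 0x80   # continuation byte: 10xxxxxx

# Start of the character containing byte index i (assumes 1 <= i <= length(s)).
function thisind_bytes(s::AbstractVector{UInt8}, i::Int)
    (iscont(s[i]) && i > 1) || return i
    b = s[i-1]
    0xc0 <= b <= 0xf7 && return i - 1   # any 2-, 3- or 4-byte lead
    (iscont(b) && i > 2) || return i
    b = s[i-2]
    0xe0 <= b <= 0xf7 && return i - 2   # only 3- or 4-byte leads
    (iscont(b) && i > 3) || return i
    b = s[i-3]
    0xf0 <= b <= 0xf7 && return i - 3   # only 4-byte leads
    return i
end

@inline function nextind_bytes(s::AbstractVector{UInt8}, i::Int)
    i == 0 && return 1
    n = length(s)
    1 <= i <= n || throw(BoundsError(s, i))
    l = s[i]
    # Fast path: ASCII, or a byte (0xf8 and up) that is always a 1-byte invalid char.
    (l < 0x80) | (0xf8 <= l) && return i + 1
    (@noinline function _nextind_slow(s, i, n, l)
        if l < 0xc0
            # i is on a continuation byte: step back once, no recursive call.
            i′ = thisind_bytes(s, i)
            i′ >= i && return i + 1   # stray continuation byte is its own char
            i = i′
            l = s[i]
            @assert l >= 0xc0         # thisind_bytes only moves back to a lead byte
        end
        # Walk forward over at most three continuation bytes.
        (i += 1) > n && return i
        iscont(s[i]) || return i
        ((i += 1) > n) | (l < 0xe0) && return i
        iscont(s[i]) || return i
        ((i += 1) > n) | (l < 0xf0) && return i
        return iscont(s[i]) ? i + 1 : i
    end)(s, i, n, l)
end

bytes = Vector{UInt8}("aé€😀")           # code units: 1 | 2 3 | 4 5 6 | 7 8 9 10
[nextind_bytes(bytes, i) for i in 1:10]  # [2, 4, 4, 7, 7, 7, 11, 11, 11, 11]
```

The back-scan looks at no more than three preceding bytes, so the slow path does a bounded amount of work and never calls `nextind_bytes` again. According to the commit message, that self-call was one of the two things blocking inlining.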
