-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
throw ArgumentError when accessing an invalid UTF-8 indice, rather th… #11423
Conversation
👍 |
I had just forotten about this. It should be fine to merge |
I'm not really qualified to hit merge on anything unicode-related, could someone else review and merge if you agree with this change? |
@@ -185,10 +185,9 @@ function search(s::AbstractString, c::Chars, i::Integer) | |||
return 1 <= i <= nextind(s,endof(s)) ? i : | |||
throw(BoundsError(s, i)) | |||
end | |||
if i < 1 | |||
if i < 1 || i > nextind(s,endof(s)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't look like you're catching all invalid indices here, are you? Shouldn't this be i > endof(s)
?
julia> s = "café"
"café"
julia> nextind(s,endof(s))
6
julia> s[5]
ERROR: ArgumentError: invalid UTF-8 character index
in next at ./utf8.jl:80
in getindex at string.jl:62
From what I can tell it's OK (but I may have missed details). Regarding my remarks about bounds checking, do as you prefer! |
i == sizeof(s) + 1 && return 0 | ||
(i < 1 || i > sizeof(s)) && throw(BoundsError(s, i)) | ||
return c < Char(0x80) ? search(s.data,c%UInt8,i) : 0 | ||
end | ||
rsearch(s::ASCIIString, c::Char, i::Integer) = c < Char(0x80) ? rsearch(s.data,c%UInt8,i) : 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why doesn't this also do the bounds checking that was added for search? Is it handled in the other rsearch method?
I think the bounds check needs to be done before you return 0 (for c < 0x80
)
Anybody concerned about the issue in rsearch? |
@tkelman 😀 How about addressing the issues I raised reviewing your PR? Seriously, it would be nice to get these sorts of fixes in... |
I don't actually know anything about what this code is doing. I just wanted to ping @vtjnash by opening this so he wouldn't forget about it. |
Oh, sorry, I saw the "Author" tag, and had forgotten that you'd just put his branch into a PR... |
@ScottPJones kindly respect others' time by not pinging them frivolously. This is a borderline violation of our community standards, viz.
|
yes, but i was not prepared when making this PR to extend it further and test rsearch as well. that should be a separate commit, and perhaps a separate PR anyways. |
@jiahao That was NOT a frivolous ping. It was simply an honest mistake, pinging tkelman instead of vtjnash, because of the line at the top: "tkelman wants to merge...", which I'd already said I was sorry for. Your ping could be considered frivolous, OTOH...
|
@ScottPJones it is counterproductive to try to point blame back at another volunteer in the community.
Is this in the context of rsearch? I believe I covered this case for search. i'm definitely trying to get back to the PR, but still trying to get caught back up on everything else right now. |
@vtjnash I wasn't trying to "point blame". I was a bit taken aback by being told it was a "frivolous" ping, when pinging tkelman... which was an honest mistake, because it was he who brought me into this thread in the first place! I just wanted him to remember this issue, since nobody had responded to my comment here, and he was in contact with me constantly on a couple other threads, reviewing the Unicode PRs I've been trying to get in. Hence the smiley face. If I'd known that this PR was really you, I wouldn't have said anything, assuming you could be too busy to respond yet. Looking again at it (and not from on my phone), that is rsearch, you have taken care of it for search. |
Arguing about pinging back and forth is indeed unproductive. Given the new unicode error handling framework from #11573, the patch in this PR would need to be adapted to fit into it to resolve the conflict. |
de8f898
to
6bd2aeb
Compare
…n attempting to adjust it (in getindex, next, search). fix #7811
6bd2aeb
to
3e53553
Compare
I've pushed an update for that.
afaict, none of the rsearch methods do this. also, as a breadcrumb, i have not attempted to add validation the endpoint index passed to a few of these methods ( |
Ok, good point about the other places lacking validation. 👍 to this change |
throw ArgumentError when accessing an invalid UTF-8 indice
I think I see a problem with part of this change, in |
…an attempting to adjust it. fix #7811
cc @vtjnash (any changes to make here? it's your branch, i'm just turning it into a PR) @ScottPJones