Make getindex for String check if indices are valid #22572
Actually, some library functions used indexing at invalid points. I will work on fixing them.
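For context, a Julia `String` is indexed by byte (code unit) offsets, and only an offset where a character starts is a valid index. The following is a minimal sketch, assuming Julia 1.x semantics for `eachindex`, `isvalid`, `thisind` and `nextind`, of which positions count as valid:

```julia
# Assumes Julia 1.x string semantics.
s = "aαb"                      # 'a' is byte 1, 'α' is bytes 2:3, 'b' is byte 4
valid = collect(eachindex(s))  # only these byte offsets start a character
println(valid)                 # [1, 2, 4]
println(ncodeunits(s))         # 4 bytes in total
println(isvalid(s, 3))         # false: byte 3 is the second byte of 'α'
println(thisind(s, 3))         # 2: start of the character containing byte 3
println(nextind(s, 2))         # 4: the character after 'α' starts at byte 4
```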
I hope I have fixed all places in the code where `getindex` used non-strict bounds. @nalimilan As the REPL is a very sensitive part of Julia to mess with, do you know whom to ask for advice on how to proceed (which in the end might be to abandon this change)?
Yeah, fixing the REPL code is going to be tedious. But don't worry too much: if you break it and the tests don't catch it, people will report it soon enough. :-) Anyway, it would be nice to clean this code of all undue assumptions about string indexing. I guess for cases which are really too obscure, you could simply call
base/strings/string.jl
```julia
@inbounds si = codeunit(s, i)
if is_valid_continuation(si)
    throw(UnicodeError(UTF_ERR_INVALID_INDEX, i, si))
```
Better to keep throwing a `UnicodeError` rather than a `BoundsError`, as the former gives more details about the problematic code point. This can be helpful if you import invalid Unicode data and want to understand what the problem is.
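The two-tier check discussed here (out of range vs. inside a character) can be sketched as below. This assumes Julia 1.x, where the string-specific exception is `StringIndexError` rather than the `UnicodeError` used at the time of this PR; `strict_char_at` is a hypothetical name and this is not the PR's implementation:

```julia
# Hypothetical helper, not the PR's code. Assumes Julia 1.x, where
# Base.StringIndexError plays the role UnicodeError played in this PR.
function strict_char_at(s::String, i::Int)
    # Positions outside 1:ncodeunits(s) are an ordinary bounds violation.
    1 <= i <= ncodeunits(s) || throw(BoundsError(s, i))
    # In range, but inside a multi-byte character: raise the string-specific
    # error, which records the string and the offending index for diagnostics.
    isvalid(s, i) || throw(Base.StringIndexError(s, i))
    return s[i]
end

println(strict_char_at("α.jl", 1))   # α
# strict_char_at("α.jl", 2)  would throw StringIndexError (byte 2 is inside 'α')
# strict_char_at("α.jl", 9)  would throw BoundsError
```

Keeping the two error types separate lets callers distinguish "index past the end" from "index lands mid-character", which is the extra detail the reviewer wants preserved.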
I did run `git blame`, and that is why I refrained from starting the fixes: the code was written by core Julia developers who probably understand the whole ecosystem far beyond my level.
Additionally, my thinking was that if we have problems in the REPL, it is possible that many packages were also developed under this assumption (which is actually sometimes convenient, especially when you use byte buffers: you pass In summary, it is a breaking change and it might break a lot of code that relies on the previous behavior. That is why I started having second thoughts.
It'd be hard to contribute to a project if you felt intimidated whenever you saw that the code had been written by core developers. :-) I can't guarantee that the PR will be merged, so if you're afraid you could waste your time, don't do it. But in general, the best way to find out whether a change would be too breaking is to make it work in Base (which is admittedly the most complex and ancient Julia codebase around). Then, if it looks acceptable, it may be merged. If the only places to fix are the ones you listed, I would say that's pretty reasonable. The ability to write
Great, looks like you managed to pass the bootstrap phase! The remaining failures should be easier to fix.
Looks good to me, modulo a few comments. The PR is surprisingly small, which is a good sign that the stricter indexing shouldn't break too much code.
Please add a mention of this breaking change in NEWS.md.
base/repl/LineEdit.jl
```diff
@@ -222,7 +222,7 @@ function refresh_multi_line(termbuf::TerminalBuffer, terminal::UnixTerminal, buf
 # in this case, we haven't yet written the cursor position
 line_pos -= slength # '\n' gets an extra pos
 if line_pos < 0 || !moreinput
-    num_chars = (line_pos >= 0 ? llength : strwidth(l[1:(line_pos + slength)]))
+    num_chars = (line_pos >= 0 ? llength : strwidth(l[1:prevind(l, line_pos + slength+1)]))
```
For consistency, you should add spaces around `+`.
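The `prevind(l, n + 1)` idiom in this hunk turns a byte count `n` into a valid end index: the start of the character that contains byte `n`, so the slice keeps that whole character instead of ending mid-character. A standalone sketch of the pattern, assuming Julia 1.x semantics; `prefix_upto_byte` is a hypothetical name:

```julia
# Hypothetical helper illustrating the idiom; assumes Julia 1.x.
# prevind(s, n + 1) is the start of the character that contains byte n,
# so the slice never ends in the middle of a multi-byte character.
prefix_upto_byte(s::String, n::Int) = s[1:prevind(s, n + 1)]

s = "aαb"                        # 'a' = byte 1, 'α' = bytes 2:3, 'b' = byte 4
println(prefix_upto_byte(s, 3))  # aα  (s[1:3] would throw: byte 3 is inside 'α')
println(prefix_upto_byte(s, 4))  # aαb
```

For `n == 0`, `prevind(s, 1)` is `0`, so the helper returns the empty string rather than failing.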
base/repl/REPL.jl
```diff
@@ -853,7 +853,7 @@ function setup_interface(
 end
 # Check if input line starts with "julia> ", remove it if we are in prompt paste mode
 jl_prompt_len = 7
-if (firstline || isprompt_paste) && (oldpos + jl_prompt_len <= sizeof(input) && input[oldpos:oldpos+jl_prompt_len-1] == JULIA_PROMPT)
+if (firstline || isprompt_paste) && (oldpos + jl_prompt_len <= sizeof(input) && input[oldpos:prevind(input, oldpos+jl_prompt_len)] == JULIA_PROMPT)
```
A cleaner way of doing this check, which wouldn't involve playing with indices, would be `startswith(SubString(input, oldpos), JULIA_PROMPT)`.
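A sketch of the suggested check, assuming Julia 1.x and assuming the prompt constant is `"julia> "` (7 bytes, consistent with `jl_prompt_len = 7`). `starts_with_prompt` is a hypothetical wrapper, not REPL code:

```julia
# Sketch of the reviewer's suggestion; assumes Julia 1.x.
# Assumed prompt value: "julia> " (7 bytes, matching jl_prompt_len = 7).
const JULIA_PROMPT = "julia> "

# Hypothetical helper: true if `input` has the prompt starting at byte `oldpos`.
# No end index is computed, so none can fall inside a multi-byte character.
starts_with_prompt(input::String, oldpos::Int) =
    startswith(SubString(input, oldpos), JULIA_PROMPT)

println(starts_with_prompt("julia> 1 + 1", 1))  # true
println(starts_with_prompt("αjulia> x", 3))     # true: 'α' occupies bytes 1:2
println(starts_with_prompt("julααα", 1))        # false; input[1:7] would throw here
```

The last case shows the benefit: `"julααα"` is 9 bytes, so the length guard passes, yet byte 7 is the second byte of an `'α'` and the index-based comparison would raise an error.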
base/strings/string.jl
```julia
l = sizeof(s)
if i < 1 || i > l
```
Just a styling issue, but I don't see the point of changing these from `if` to `&&`, especially since you add a new `if` block for `UnicodeError`.
@StefanKarpinski @stevengj Good to go? If so, we should probably run CI again in case changes since the PR was opened broke the tests.
Closes JuliaLang#22548. Fixes a bug with the use of `prevind` in dates/io.jl.
Rebased to pick up any local changes and re-run CI.
Looks like a new failure related to this PR has been introduced.
```diff
@@ -534,10 +536,11 @@ function completions(string, pos)
 # <Mod>/src/<Mod>.jl
 # <Mod>.jl/src/<Mod>.jl
 if isfile(joinpath(dir, pname))
-    endswith(pname, ".jl") && push!(suggestions, pname[1:end-3])
+    endswith(pname, ".jl") && push!(suggestions,
+        pname[1:prevind(pname, end-2)])
```
This should have been OK, since we know the last three characters are ASCII. `-2` has no reason to be more correct than `-3`, right? Same below.
Unfortunately not:
```julia
julia> x="α.jl"
"α.jl"

julia> x[end-3]
ERROR: UnicodeError: invalid character index
Stacktrace:
 [1] slow_utf8_next(::Ptr{UInt8}, ::UInt8, ::Int64, ::Int64) at .\strings\string.jl:172
 [2] next at .\strings\string.jl:204 [inlined]
 [3] getindex(::String, ::Int64) at .\strings\basic.jl:32
```
I can use `end-2` precisely because I know that the last three characters are ASCII.
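The byte arithmetic behind this, spelled out as a small sketch (assuming Julia 1.x). The `chop(s; tail = n)` call is shown only as an index-free alternative from Julia 1.x Base, not what the PR uses:

```julia
# Assumes Julia 1.x.
pname = "α.jl"   # 'α' = bytes 1:2, '.' = 3, 'j' = 4, 'l' = 5; lastindex(pname) == 5

# end - 3 == 2 is the second byte of 'α', so pname[1:end-3] throws.
# end - 2 == 3 is the ASCII '.', which is always a valid index here;
# prevind then steps back to the start of the last character of the stem.
stem = pname[1:prevind(pname, end - 2)]
println(stem)    # α

# Index-free alternative: drop the last three characters (not bytes).
stem2 = chop(pname, tail = 3)   # returns a SubString
println(stem2)   # α
```

`end - 2` is safe because a byte below 0x80 is never a UTF-8 continuation byte, so the position of `'.'` is always a character start once `endswith(pname, ".jl")` holds.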
OK, carry on, I need to learn how to count up to three. :-)
Unfortunately, there's yet another test failure.
@nalimilan Sorry about that. Unfortunately, the errors pop up in random places (and different CI builds even print different stack traces for the same error). I will try to be even more careful and make the next commit pass :). Actually, the pain of fixing this PR made me submit #23765 as strict Of course, I still believe we need this PR, because otherwise bugs will silently go through.
base/repl/REPLCompletions.jl
```diff
@@ -11,7 +11,11 @@ function completes_global(x, name)
 end

 function appendmacro!(syms, macros, needle, endchar)
-    append!(syms, s[2:end-sizeof(needle)]*endchar for s in filter(x -> endswith(x, needle), macros))
+    r = Regex("^.(.*)$needle\$")
```
Perhaps `Regex("^.(.*)\\Q$needle\\E\$")`, since you don't want special characters in `needle` to be interpreted as regex syntax?
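A sketch of how the escaped pattern could be used in full, assuming Julia 1.x and PCRE's `\Q...\E` literal quoting. `appendmacro_sketch!` is a hypothetical name and this is not the code that was merged. One caveat: `\Q...\E` quoting would itself break if `needle` contained the sequence `\E`.

```julia
# Hypothetical variant of appendmacro!, not the merged code; assumes Julia 1.x.
# \Q...\E tells PCRE to match `needle` literally, so characters such as '.'
# or '*' in it lose their regex meaning. Capturing with a regex also avoids
# computing byte offsets like end - sizeof(needle) by hand.
function appendmacro_sketch!(syms, macros, needle, endchar)
    r = Regex("^.(.*)\\Q$needle\\E\$")   # '.' skips the leading '@'
    for s in macros
        m = match(r, s)
        m === nothing && continue        # name does not end with needle
        push!(syms, m.captures[1] * endchar)
    end
    return syms
end

syms = appendmacro_sketch!(String[], ["@foo_str", "@bar", "@αβ_str"], "_str", "\"")
println(syms)    # ["foo\"", "αβ\""]
```

Because Julia regexes run in UTF mode, the leading `.` consumes one whole character, so a non-ASCII macro name such as `@αβ_str` is handled without any index arithmetic.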
Sure :). I will fix it once I am sure that CI goes through correctly (as it now seems it will, without errors).
The CI looks fine now, failures are unrelated.
Unfortunately, wrapping regexp with When I look how
Let's merge?
Great PR. This will probably find a lot of invalid use cases in user code.
Make `getindex` for `String` check if indices are valid. See #22548 for discussion.

Benchmark code (`new_getindex` is the proposed implementation):

produces: