-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add colno to invalid-char parse error #28373
Conversation
if (c == '\n') | ||
s->u_colno = 0; | ||
else | ||
s->u_colno += utf8proc_charwidth(*pwc); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd be a little worried about the performance of calling this on every character.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did a little benchmark parsing s = "begin;\n" * "i = 1\n"^10000 * "end\n"
and @time Meta.parse(s)
:
- Julia 0.6: ~0.38 seconds
- Julia 0.7 master: ~0.42 seconds
- this PR: ~0.43 seconds
The difference is probably in the noise, and I would think that parsing more realistic code would be even more work (so that the overhead of the column count would be even less).
@@ -1107,6 +1109,10 @@ int ios_getutf8(ios_t *s, uint32_t *pwc) | |||
c0 = (char)c; | |||
if ((unsigned char)c0 < 0x80) { | |||
*pwc = (uint32_t)(unsigned char)c0; | |||
if (c == '\n') | |||
s->u_colno = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
= 1
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. It only becomes colno = 1
when it reads the first char.
For example, if the first character read is an invalid character, you want the error to say that the invalid character is in column 1, not column 2.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I think assigning colno = 1
on a newline was an error in #9579.)
42bf7c1
to
e813022
Compare
Rebased. Hopefully this should be good to go once CI runs again. |
test/strings/basic.jl
Outdated
@@ -150,7 +150,7 @@ end | |||
@test String(sym) == string(Char(0xdcdb)) | |||
@test Meta.lower(Main, sym) === sym | |||
res = string(Meta.parse(string(Char(0xdcdb)," = 1"),1,raise=false)[1]) | |||
@test res == """\$(Expr(:error, "invalid character \\\"\\udcdb\\\"\"))""" | |||
@test res == """\$(Expr(:error, "invalid character \\\"\\udcdb\\\" near column 0\"))""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make this say "near column 1"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will fix
@@ -518,6 +518,8 @@ | |||
(and (char>=? c #\u200c) (char<=? c #\u200f)) | |||
(memv c '(#\u00ad #\u2061 #\u115f)))) | |||
|
|||
(define (scolno port) (string " near column " (input-port-column port))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we have line number too: " near line 2 column 3" ("near line " (input-port-line port) " column " (input-port-column port)
)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't we already get a line number from the LoadError
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For example:
julia> include_string(Main, string(Char(0xdcdb)," = 1"))
ERROR: LoadError: syntax: invalid character "???" near column 0
Stacktrace:
[1] include_string(::Module, ::String, ::String) at ./loading.jl:1008
[2] include_string(::Module, ::String) at ./loading.jl:1012
[3] top-level scope at none:0
in expression starting at string:1
The line number is already there (at string:1
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, ok. I see that was added into #16748
Although that also only applies to cases where it is using include
. Direct uses of the parser still would need to handle that for themselves:
julia> Meta.parse(string("\n\n\n", Char(0xdcdb), " = 1"), 1, raise=false)
($(Expr(:error, "invalid character \"\udcdb\""), 7)
Although I guess those cases might be better served by CSTParser.jl anyways.
Fixed the colno to add 1 for in the error message for width=0 chars. |
Fixes #28339. Now we get error messages like:
(Colno-tracking code adapted somewhat from #9579, but accumulates unicode charwidths rather than codeunits/bytes.)