Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add colno to invalid-char parse error #28373

Merged
merged 4 commits into from
Nov 2, 2018
Merged

add colno to invalid-char parse error #28373

merged 4 commits into from
Nov 2, 2018

Conversation

stevengj
Copy link
Member

@stevengj stevengj commented Jul 31, 2018

Fixes #28339. Now we get error messages like:

julia> -0.6626458266981849E−01
ERROR: syntax: invalid character "−" near column 21

(Colno-tracking code adapted somewhat from #9579, but accumulates unicode charwidths rather than codeunits/bytes.)

@stevengj stevengj added error handling Handling of exceptions by Julia or the user parser Language parsing and surface syntax labels Jul 31, 2018
if (c == '\n')
s->u_colno = 0;
else
s->u_colno += utf8proc_charwidth(*pwc);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be a little worried about the performance of calling this on every character.

Copy link
Member Author

@stevengj stevengj Jul 31, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a little benchmark parsing s = "begin;\n" * "i = 1\n"^10000 * "end\n" and @time Meta.parse(s):

  • Julia 0.6: ~0.38 seconds
  • Julia 0.7 master: ~0.42 seconds
  • this PR: ~0.43 seconds

The difference is probably in the noise, and I would think that parsing more realistic code would be even more work (so that the overhead of the column count would be even less).

@@ -1107,6 +1109,10 @@ int ios_getutf8(ios_t *s, uint32_t *pwc)
c0 = (char)c;
if ((unsigned char)c0 < 0x80) {
*pwc = (uint32_t)(unsigned char)c0;
if (c == '\n')
s->u_colno = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

= 1?

Copy link
Member Author

@stevengj stevengj Aug 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. It only becomes colno = 1 when it reads the first char.

For example, if the first character read is an invalid character, you want the error to say that the invalid character is in column 1, not column 2.

Copy link
Member Author

@stevengj stevengj Aug 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I think assigning colno = 1 on a newline was an error in #9579.)

@stevengj
Copy link
Member Author

stevengj commented Oct 22, 2018

Rebased. Hopefully this should be good to go once CI runs again.

@@ -150,7 +150,7 @@ end
@test String(sym) == string(Char(0xdcdb))
@test Meta.lower(Main, sym) === sym
res = string(Meta.parse(string(Char(0xdcdb)," = 1"),1,raise=false)[1])
@test res == """\$(Expr(:error, "invalid character \\\"\\udcdb\\\"\"))"""
@test res == """\$(Expr(:error, "invalid character \\\"\\udcdb\\\" near column 0\"))"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make this say "near column 1"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will fix

@@ -518,6 +518,8 @@
(and (char>=? c #\u200c) (char<=? c #\u200f))
(memv c '(#\u00ad #\u2061 #\u115f))))

(define (scolno port) (string " near column " (input-port-column port)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have line number too: " near line 2 column 3" ("near line " (input-port-line port) " column " (input-port-column port))?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we already get a line number from the LoadError?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example:

julia> include_string(Main, string(Char(0xdcdb)," = 1"))
ERROR: LoadError: syntax: invalid character "???" near column 0
Stacktrace:
 [1] include_string(::Module, ::String, ::String) at ./loading.jl:1008
 [2] include_string(::Module, ::String) at ./loading.jl:1012
 [3] top-level scope at none:0
in expression starting at string:1

The line number is already there (at string:1).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, ok. I see that was added into #16748

Although that also only applies to cases where it is using include. Direct uses of the parser still would need to handle that for themselves:

julia> Meta.parse(string("\n\n\n", Char(0xdcdb), " = 1"), 1, raise=false)
($(Expr(:error, "invalid character \"\udcdb\""), 7)

Although I guess those cases might be better served by CSTParser.jl anyways.

@stevengj
Copy link
Member Author

stevengj commented Oct 29, 2018

Fixed the colno to add 1 for in the error message for width=0 chars.

@stevengj stevengj merged commit 05fb47d into master Nov 2, 2018
@stevengj stevengj deleted the sgj/invalidchar branch November 2, 2018 01:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
error handling Handling of exceptions by Julia or the user parser Language parsing and surface syntax
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants