clearer parse errors for invalid characters #28339
Comments
See also #26193; it's not completely crazy to normalize U+2212 (minus) to hyphen, but it would require some additional parser changes discussed in #25157, so it's almost certain not to happen for Julia 1.0. I'm not exactly certain what a better error message would look like. Maybe
although it is somewhat tricky to get the |
Hello,
in my opinion it would be sufficient to give the column number on the line; I assume most editors display the cursor column, so it would be easier to figure out which character it is. Something like
Also, when reading from a file, it would be helpful to indicate that, from the parser's viewpoint, the problem is a UTF character and not some other formatting issue (caused by a typo earlier in the source file; we all know those).
Concerning #26193: I have no opinion on that.
EDIT: I should add that the error when using the module of course came first and led to quite a bit of confusion. The REPL test was just to demonstrate the issue. Having the column number there is only of limited value, but still better than nothing.
Best Regards
That also depends on Julia's (utf8proc's) charwidth matching the editor's charwidths. (Some Unicode characters take up 0 columns, some take up 1 column, and some take up 2 columns, but there isn't universal agreement on which is which; see e.g. #3721.) However, it will match for most text, and I suppose we could say "near column 21" for the rare cases where the charwidths don't agree.
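To make the 0/1/2-column point concrete, here is a minimal sketch in Python (not Julia, and not what utf8proc actually does). The helper names `char_width` and `display_column` are hypothetical, and the width rule is a rough approximation built on the standard `unicodedata` module; real terminals and editors can disagree with it, which is exactly the caveat above.

```python
import unicodedata

def char_width(ch: str) -> int:
    """Rough display width of one character: 0, 1, or 2 columns (an approximation)."""
    if unicodedata.combining(ch) or unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0  # combining marks and format characters occupy no column
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2  # wide and fullwidth characters occupy two columns
    return 1      # everything else is treated as one column

def display_column(line: str, index: int) -> int:
    """1-based display column of line[index], summing the widths of what precedes it."""
    return 1 + sum(char_width(ch) for ch in line[:index])
```

For plain ASCII text before the offending character, this gives the same answer as simply counting characters; the two only diverge when wide or zero-width characters appear earlier on the line.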
We could just approximate the column number with the character number. After all, most people code in fixed-width fonts, and for a lot of code this approximation will be correct. It seems better than not providing any information for fear of doing so imperfectly.
Summing the charwidths is a better approximation, and is easy to compute … One point of confusion here, @StefanKarpinski: charwidths (as computed by e.g. the Julia |
Cf. #9579, which also added support for the necessary column number tracking in the parser.
#9579 counts code units, which is even cruder than counting characters… I wonder why the count wasn't incremented in |
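As an illustration of why code units are cruder than characters, the following Python sketch compares the two counts for the same position. It assumes UTF-8 as the encoding, and `position_counts` is a hypothetical helper, not anything from the Julia parser.

```python
def position_counts(line: str, index: int) -> dict:
    """1-based position of line[index], counted two different ways."""
    prefix = line[:index]
    return {
        # UTF-8 code units (bytes): multi-byte characters inflate this count
        "code_units": len(prefix.encode("utf-8")) + 1,
        # Unicode characters (code points): one per character
        "characters": len(prefix) + 1,
    }

# U+03B1 (Greek alpha) is two bytes in UTF-8, so the counts diverge after it.
counts = position_counts("\u03b1 = \u22121", 4)
```

Here the minus sign is the fifth character on the line, but a code-unit count places it at position 6.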
Thank you!
Hello,
I would like to file an issue about the current (0.7.0-beta2.0, Linux) error reporting with UTF characters. Copying and pasting the following numeric value from a PDF (https://arxiv.org/pdf/cond-mat/0110585) via the X11 clipboard into Julia, or reading the pasted numerical value from a file, gives an error:
Reading from a file this becomes
I see two issues with this error message:
Considering that Julia explicitly allows a lot of UTF characters in normal source, this error message is even more unsatisfying. Note: I do not at all propose that the parser should accept this UTF character as a valid "minus"! But it would be very helpful if the error message could at least indicate the actual character (and its position on the input line).
Best Regards
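A minimal sketch, in Python, of the kind of message requested here: one that names the offending character and its position. Everything in it is an assumption for illustration: the `LOOKALIKES` table, the `describe_invalid_char` helper, and the message wording are hypothetical and are not Julia's actual diagnostics. The position is a character count, not a display column.

```python
import unicodedata

# Illustrative, incomplete table of characters easily mistaken for ASCII '-'.
LOOKALIKES = {
    "\u2212": "-",  # MINUS SIGN
    "\u2013": "-",  # EN DASH
    "\u2014": "-",  # EM DASH
}

def describe_invalid_char(line: str, lineno: int):
    """Return a message for the first look-alike character, or None if there is none."""
    for index, ch in enumerate(line):
        if ch in LOOKALIKES:
            name = unicodedata.name(ch, "unnamed character")
            return (
                f"line {lineno}, character {index + 1}: invalid character "
                f"U+{ord(ch):04X} ({name}); did you mean '{LOOKALIKES[ch]}'?"
            )
    return None

message = describe_invalid_char("x = 1.0e\u22123", 1)
```

Printing the code point and the Unicode name makes it clear that the problem is a specific character and not a formatting issue elsewhere in the file.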