Random error using variables with unicode characters #5712
Comments
Is your terminal set to a utf8 locale?
|
I've also just noticed this error happening sporadically in my IJulia notebook instance running locally.
In[173]: versioninfo()
Julia Version 0.3.0-prerelease+1388
Commit 9fa2d17* (2014-02-04 20:15 UTC)
Platform Info:
System: Darwin (x86_64-apple-darwin13.0.2)
CPU: Intel(R) Core(TM) i5-4258U CPU @ 2.40GHz
WORD_SIZE: 64
BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY)
LAPACK: libopenblas
LIBM: libopenlibm |
Sure, otherwise I wouldn't be able to accurately copy the character anyway...
One thing I forgot to mention is that I generally recalled and edited the last executed lines from history (using the keyboard arrows) instead of retyping everything. So I would also consider anything related to how history is implemented in the REPL... |
This just happened for me too, but then I restarted the notebook and couldn't reproduce it. Does anyone have a reproducible test case? |
I wonder if it is a bug introduced somehow by the unicode normalization in #5462? If you have a reproducible problem, maybe try adding a line
before |
I haven't been able to reproduce the problem in a controlled way yet. It happens eventually if I keep retrying with a variable. I even thought it could be a REPL bug related to backspace operating on half of a UTF-8 character, but I couldn't confirm that. I'll be on vacation for the next few weeks. On my return I may be able to give it a try with #define normalize(s) |
I'm pretty sure it isn't a REPL bug, because both jiahao and I have seen it in IJulia. |
This is really, really annoying. Do we have any idea what's going on? |
fwiw, I suspect that some sort of memory corruption is resulting in characters not being parsed correctly and thus being normalized to the generic Unicode replacement character � = '\ufffd' |
The question is if it's a utf8proc error, error in how utf8proc is being used, or an unrelated memory corruption. |
I have been unable to reproduce with one unicode character, and intermittently the problem shows up with a second character. |
The most frequent error I'm seeing is "malformed expression". I just came across some code that works when loaded from a file I edited in Sublime, but fails when executed from IJulia in Chrome.
When loaded from a file:
Note that I literally copy-pasted this from Chrome into Sublime and it started working. The code is in this gist: https://gist.github.com/loladiro/9221793. (GitHub wouldn't allow me to post it.) I don't have much time right now to debug, but maybe this is helpful. |
IJulia notebook bug is fixed in ipython. See ipython/ipython#5618. I also haven't seen the original REPL bug anymore and I do use unicode a lot (but feel free to reopen if it does happen). |
Actually I just encountered this bug again yesterday when introducing the empty set. I haven't been able to reproduce it with a debugger attached though... |
I have been getting this with some frequency lately. No minimal working example as it seems nondeterministic, but it only appears at the REPL (not when running a script with a
Maybe should be reopened? |
I noticed that if the Unicode character is sandwiched between ASCII then the error won't occur
|
This last looks like an error I've made in the past, assuming the byte index of the last character == the byte index of the last byte for UTF-8. |
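To make the distinction in the comment above concrete, here is a minimal sketch in C. It is not taken from the Julia or flisp sources; `is_continuation` and `last_char_start` are hypothetical helper names, and the input is assumed to be valid, non-empty UTF-8. It shows that the byte index where the last character starts can differ from the index of the last byte.

```c
#include <stddef.h>

/* True if b is a UTF-8 continuation byte (bit pattern 10xxxxxx). */
static int is_continuation(unsigned char b) {
    return (b & 0xC0) == 0x80;
}

/* Hypothetical helper: byte index at which the last character of s starts.
   Assumes s holds valid UTF-8 and len > 0. */
static size_t last_char_start(const char *s, size_t len) {
    size_t i = len - 1;                 /* index of the last byte */
    while (i > 0 && is_continuation((unsigned char)s[i]))
        i--;                            /* step back to the lead byte */
    return i;
}
```

For the three-byte string `"2\xcf\x80"` (the UTF-8 encoding of `2π`), the last byte is at index 2 but the last character starts at index 1. For pure ASCII input the two indices coincide, which would be consistent with the error not showing up when only ASCII is involved.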
This could be nothing but I noticed that so far the error has only occurred on my 32-bit desktop but not my 64-bit laptop. Edit: n/m I got it to happen |
I also see this on my (64bit) mac. |
Findings so far: The replacement character is introduced by u8_toutf8 directly when called from flisp. It's being passed junk values (they currently always seem to look like 0xff65bxxx[x], i.e. the ff65b is always there, but it differs in position and in the random junk that follows), which I can't make sense of. |
Curiously, it also seems to sometimes evaluate correctly even when hitting the replacement char case. (I did verify that the character gets introduced there, by replacing the replacement character with a different one, which did indeed show up in the error message.) |
Valgrind with MEMDEBUG2 is very vocal: https://gist.github.com/Keno/6c52aad3b1b3a17f407e |
@JeffBezanson could the problem be that we are peeking into unallocated memory, which may look like a continuation byte, hence giving us the wrong character? |
That sounds possible, but it does check |
Why do you compare against seqlen-1? |
I probably wrote that because the code had already looked at one byte, but it doesn't consume that byte, so yes that looks wrong. Definitely try changing that. |
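The off-by-one discussed above can be illustrated with a hedged sketch. This is not the flisp code: `utf8_seqlen` and `decode_one` are hypothetical names, and the sketch assumes the lead byte has been inspected but not yet consumed, so the number of bytes still available must be compared against the full sequence length rather than `seqlen - 1`. It does not reject overlong encodings or surrogates.

```c
#include <stddef.h>
#include <stdint.h>

#define REPLACEMENT_CHAR 0xFFFDu

/* Sequence length implied by a lead byte, or 0 if it cannot start a sequence. */
static size_t utf8_seqlen(unsigned char lead) {
    if (lead < 0x80) return 1;
    if ((lead & 0xE0) == 0xC0) return 2;
    if ((lead & 0xF0) == 0xE0) return 3;
    if ((lead & 0xF8) == 0xF0) return 4;
    return 0;
}

/* Hypothetical decoder: decode one code point starting at buf[*pos], where
   buf holds `size` valid bytes and *pos < size. Advances *pos past what was
   consumed. Returns U+FFFD for a malformed or truncated sequence. */
static uint32_t decode_one(const unsigned char *buf, size_t size, size_t *pos) {
    static const unsigned char lead_mask[5] = {0, 0x7F, 0x1F, 0x0F, 0x07};
    size_t start = *pos;
    size_t n = utf8_seqlen(buf[start]);

    /* The lead byte is still unconsumed, so all n bytes must be available.
       Comparing against n - 1 here would allow a read one byte past the end. */
    if (n == 0 || size - start < n) {
        *pos = start + 1;
        return REPLACEMENT_CHAR;
    }

    uint32_t cp = buf[start] & lead_mask[n];
    for (size_t i = 1; i < n; i++) {
        unsigned char b = buf[start + i];
        if ((b & 0xC0) != 0x80) {       /* not a continuation byte */
            *pos = start + 1;
            return REPLACEMENT_CHAR;
        }
        cp = (cp << 6) | (b & 0x3F);
    }
    *pos = start + n;
    return cp;
}
```

With the bytes `0x32 0xCF 0x80` (`2π`), decoding from position 1 with all three bytes available yields U+03C0. If the buffer is cut to two bytes, the check reports U+FFFD instead of reading whatever happens to follow the buffer, which may or may not look like a continuation byte.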
💯 |
The bug is still present with, for instance, "a subscript t".
|
That's a different issue, I believe: #7582 |
Thank you @Keno, you're right. |
At first I thought it could have been an issue with how I copied and pasted the pi character, because after pasting it again it simply worked. But after playing with multiplications, I get seemingly random errors like:
ERROR: syntax: invalid character "�"
Notice how 2 * π evaluates as expected but 2π raises an exception...
Any clues?
EDIT: adding versioninfo