Random error using variables with unicode characters #5712

nlebedenco · 2014-02-07T02:36:03Z

At first I thought this was related to how I copied and pasted the pi character, because after pasting it again it simply worked. But after playing with multiplications I get seemingly random errors like:

ERROR: syntax: invalid character "�"

Notice how 2 * π evaluates as expected but 2π raises an exception...

notroot@dev-mint ~ $ julia
               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: http://docs.julialang.org
   _ _   _| |_  __ _   |  Type "help()" to list help topics
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.3.0-prerelease
 _/ |\__'_|_|_|\__'_|  |  Official http://julialang.org release
|__/                   |  i686-linux-gnu

julia> ☃ = 1
1

julia> ☃
1

julia> ☃ 
1

julia> ☃ 
1

julia> ☃
1

julia> 3☃
3

julia> 5☃
5

julia> 5☃8
ERROR: ☃8 not defined

julia> 5☃
5

julia> 5☃*2
10

julia> s = "This is a string."
"This is a string."

julia> s = "âThis is a string."
"âThis is a string."

julia> s[1]
'â'

julia> s[2]
ERROR: invalid UTF-8 character index
 in getindex at utf8.jl:63

julia> s[3]
'T'

julia> @printf "%d is less than %f" 4.5 5.3 # casa
5 is less than 5.300000
julia> bla! = 2
2

julia> Bla! = 2
2

julia> Bla! = 6
6

julia> bla!
2

julia> Bla!
6

julia> 2 * π
6.283185307179586

julia> 2π
ERROR: syntax: invalid character "�"

julia> ☃
ERROR: syntax: invalid character "�"

julia> π
π = 3.1415926535897...

julia> ☃
ERROR: syntax: invalid character "�"

julia> 5☃
5

julia> ☃
1

julia> 5☃*2
10

julia> 5☃
ERROR: syntax: invalid character "�"

julia> 5☃*2
10

julia> 5☃
5

julia> 5☃*2
10

julia> 5☃
5

julia> 5☃*2
10

julia> 5☃
5

julia> ☃
1

julia> π
π = 3.1415926535897...

julia>

Any clues?

EDIT: adding versioninfo

julia> versioninfo()
Julia Version 0.3.0-prerelease
Platform Info:
  System: Linux (i686-linux-gnu)
  CPU: Intel(R) Core(TM) i5 CPU       M 450  @ 2.40GHz
  WORD_SIZE: 32
  BLAS: libblas.so.3
  LAPACK: liblapack.so.3
  LIBM: libopenlibm

JeffBezanson · 2014-02-07T02:39:50Z

Is your terminal set to a utf8 locale?

jiahao · 2014-02-07T02:43:20Z

I've also just noticed this error happening sporadically in my IJulia notebook instance running locally.

In[173]: versioninfo()
Julia Version 0.3.0-prerelease+1388
Commit 9fa2d17* (2014-02-04 20:15 UTC)
Platform Info:
  System: Darwin (x86_64-apple-darwin13.0.2)
  CPU: Intel(R) Core(TM) i5-4258U CPU @ 2.40GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY)
  LAPACK: libopenblas
  LIBM: libopenlibm

nlebedenco · 2014-02-07T02:43:48Z

sure, otherwise I wouldn't be able to accurately copy the character anyway...

notroot@dev-mint ~ $ echo $LANG
pt_BR.UTF-8

One thing I forgot to mention was that I generally invoked and edited the last executed lines from history (using the keyboard arrows) instead of always retyping everything. So I would also consider anything related to how history is implemented in repl...

stevengj · 2014-02-07T22:55:53Z

This just happened for me too, but then I restarted the notebook and couldn't reproduce it. Does anyone have a reproducible test case?

stevengj · 2014-02-07T22:59:08Z

I wonder if it is a bug introduced somehow by the unicode normalization in #5462? If you have a reproducible problem, maybe try adding a line

#define normalize(s) s

before static symbol_t *mk_symbol(const char *str) in src/flisp/flisp.c and see if the problem goes away?
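
For readers unfamiliar with the normalization step being suspected here: Unicode lets the same visible character be spelled with different code point sequences, and normalization maps them to one canonical spelling so that identifiers compare equal. The following is a minimal Python sketch using the standard `unicodedata` module, not the C code in flisp, and it assumes NFC as the normal form purely for illustration.

```python
import unicodedata

composed = "\u00e9"       # 'é' as a single code point
decomposed = "e\u0301"    # 'e' followed by COMBINING ACUTE ACCENT


def same_identifier(a, b, form="NFC"):
    """Compare two names after normalizing both to the same form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# The raw strings differ even though they render identically.
raw_equal = composed == decomposed
# After normalization they are treated as the same name.
normalized_equal = same_identifier(composed, decomposed)
print(raw_equal, normalized_equal)
```

Defining `normalize(s)` as `s`, as suggested above, would skip this mapping entirely, which is why it works as a quick test of whether normalization is involved.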

nlebedenco · 2014-02-10T19:48:25Z

I couldn't reproduce the problem in a controlled way yet. It happens eventually if I insist enough with a variable. I even thought it could have been a repl bug related to backspace operating on half of a utf8 character but couldn't confirm that. I'll be on vacation for the following weeks. On my return I may be able to give it a try with #define normalize(s)

stevengj · 2014-02-10T20:21:16Z

I'm pretty sure it isn't a REPL bug, because both jiahao and I have seen it in IJulia.

Keno · 2014-02-18T02:03:05Z

This is really, really annoying. Do we have any idea what's going on?

jiahao · 2014-02-18T03:13:30Z

fwiw, I suspect that some sort of memory corruption is resulting in characters not being parsed correctly and thus being normalized to the generic Unicode replacement character � = '\ufffd'
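
To illustrate how a damaged byte sequence ends up as U+FFFD, here is a small Python sketch. It is only an analogy for the hypothesis above: the corruption is simulated by overwriting one byte by hand, and Python's lenient decoder stands in for whatever the parser actually does.

```python
REPLACEMENT = "\ufffd"

valid = "π".encode("utf-8")          # two bytes: 0xCF 0x80
# Simulated corruption: the continuation byte is overwritten by a space.
corrupted = valid[:1] + b"\x20"


def decode_lenient(data):
    """Decode UTF-8, substituting U+FFFD for anything malformed."""
    return data.decode("utf-8", errors="replace")


good = decode_lenient(valid)
bad = decode_lenient(corrupted)
print(repr(good), repr(bad))
```

The lead byte 0xCF announces a two-byte sequence, so when the second byte is not a continuation byte the decoder cannot produce π and emits the replacement character instead.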

StefanKarpinski · 2014-02-18T04:03:13Z

The question is whether it's a utf8proc error, an error in how utf8proc is being used, or an unrelated memory corruption.

jiahao · 2014-02-18T04:10:08Z

I have been unable to reproduce with one unicode character, and intermittently the problem shows up with a second character.

Keno · 2014-02-26T01:47:48Z

The most frequent error I'm seeing is "malformed expression". I just came across some code that works when loaded from a file I edited in Sublime, but fails when executed from IJulia in Chrome.
I diffed the raw bytes and there is a difference in how chi is encoded. When posted via the browser:

julia> a[254:260]
 0xed
 0xa0
 0xb5
 0xed
 0xbc
 0x92
 0x20

When loaded from a file:

julia> b[254:258]
 0xf0
 0x9d
 0x9c
 0x92
 0x20

Note that I literally copy-pasted this from chrome into sublime and it started working. The code is in this gist: https://gist.github.com/loladiro/9221793. (Github wouldn't allow me to post it). I don't have much time right now to debug but maybe this is helpful.
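
The two byte dumps above can be compared directly. In this Python sketch the byte values are copied from the comment (without the trailing 0x20 space); the interpretation is that the browser path encoded each UTF-16 surrogate half as its own three-byte sequence, while the file holds the single four-byte encoding of the same character. The helper names are made up for the illustration.

```python
from_browser = bytes([0xED, 0xA0, 0xB5, 0xED, 0xBC, 0x92])
from_file = bytes([0xF0, 0x9D, 0x9C, 0x92])


def strict_utf8_ok(data):
    """True if `data` is accepted by a strict UTF-8 decoder."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def repair_surrogates(data):
    """Decode bytes that contain encoded surrogate halves and re-pair them."""
    halves = data.decode("utf-8", errors="surrogatepass")
    return halves.encode("utf-16-le", errors="surrogatepass").decode("utf-16-le")


char = from_file.decode("utf-8")
repaired = repair_surrogates(from_browser)
print(hex(ord(char)), strict_utf8_ok(from_browser), repaired == char)
```

A strict decoder rejects the browser bytes because surrogate code points are not valid in UTF-8, which is consistent with the code failing only when it arrives through the notebook.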

Keno · 2014-04-20T08:56:27Z

IJulia notebook bug is fixed in ipython. See ipython/ipython#5618. I also haven't seen the original REPL bug anymore and I do use unicode a lot (but feel free to reopen if it does happen).

jiahao · 2014-04-20T11:45:42Z

Actually I just encountered this bug again yesterday when introducing the empty set. I haven't been able to reproduce it with a debugger attached though...

joehuchette · 2014-04-24T05:58:13Z

I have been getting this with some frequency lately. No minimal working example, as it seems nondeterministic, but it only appears at the REPL (not when running a script with a julia foo.jl invocation). E.g.

julia> (1-ɛ)/ɛ
ERROR: syntax: invalid character "�"

julia> (1-ɛ)/ɛ
ERROR: syntax: invalid character "�"

julia> (1-ɛ)/ɛ
18.999999999999996

julia> (1-ɛ)/ɛ
18.999999999999996

julia> (1-ɛ)/ɛ
18.999999999999996

Maybe should be reopened?

lstagner · 2014-05-23T03:02:59Z

I noticed that if the Unicode character is sandwiched between ASCII then the error won't occur

julia> e₁e = 2
2

julia> e₁ = 2
2

julia> e₁
ERROR: syntax: invalid character "�"

julia> e₁e
2

julia> e₁e
2

julia> e₁e
2
.
.
.

elextr · 2014-05-23T07:09:13Z

This last looks like an error I've made in the past, assuming the byte index of the last character == the byte index of the last byte for UTF-8.
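
The distinction being described can be shown with the identifiers from the previous comment. This is a Python sketch with 0-based byte offsets (Julia's string indices are 1-based), and the helper name is hypothetical; it shows the general pitfall and does not claim to be the cause of this bug.

```python
def last_char_and_last_byte(s):
    """Return (byte offset where the last character starts, offset of the last byte)."""
    encoded = s.encode("utf-8")
    last_char_len = len(s[-1].encode("utf-8"))
    return len(encoded) - last_char_len, len(encoded) - 1


# 'e₁' ends in a three-byte character, so the two offsets differ.
ends_in_unicode = last_char_and_last_byte("e₁")
# 'e₁e' ends in ASCII, so the two offsets coincide.
ends_in_ascii = last_char_and_last_byte("e₁e")
print(ends_in_unicode, ends_in_ascii)
```

Code that treats the two offsets as interchangeable only goes wrong when the final character is multi-byte, which matches the observation that a name ending in ASCII does not trigger the error.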

lstagner · 2014-05-24T08:28:34Z

This could be nothing but I noticed that so far the error has only occurred on my 32-bit desktop but not my 64-bit laptop.

Edit: n/m I got it to happen

Keno · 2014-05-24T08:34:33Z

I also see this on my (64bit) mac.

Keno · 2014-06-20T04:02:35Z

Findings so far: The replacement character is introduced by u8_toutf8 directly when called from flisp. It's being passed junk values (they currently always seem to look like 0xff65bxxx[x], i.e. the ff65b is always there, but its position and the random junk that follows differ), which I can't make sense of.
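
As a rough picture of why a junk value comes out as the replacement character: an encoder that is handed a number outside the Unicode range has nothing valid to emit. The Python sketch below assumes the common convention of substituting U+FFFD in that case; `encode_or_replace` is a hypothetical stand-in and not a transcription of `u8_toutf8`, and the junk value is an example shaped like the ones reported above.

```python
MAX_CODE_POINT = 0x10FFFF
REPLACEMENT = "\ufffd"


def encode_or_replace(code_point):
    """Encode a code point as UTF-8, substituting U+FFFD for invalid values."""
    out_of_range = code_point < 0 or code_point > MAX_CODE_POINT
    is_surrogate = 0xD800 <= code_point <= 0xDFFF
    if out_of_range or is_surrogate:
        return REPLACEMENT.encode("utf-8")
    return chr(code_point).encode("utf-8")


snowman_bytes = encode_or_replace(0x2603)        # a valid code point
junk_bytes = encode_or_replace(0xFF65B123)       # example junk of the reported shape
print(snowman_bytes, junk_bytes)
```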

Keno · 2014-06-20T04:05:42Z

Curiously, it also seems to sometimes evaluate correctly even when hitting the replacement char case (I did verify that the character gets introduced there, by replacing the replacement character with a different one, which did indeed show up in the error message).

Keno · 2014-06-20T04:41:01Z

Valgrind with MEMDEBUG2 is very vocal: https://gist.github.com/Keno/6c52aad3b1b3a17f407e

Keno · 2014-06-20T05:25:35Z

@JeffBezanson could the problem be that we are peeking into unallocated memory, which may look like a continuation byte, hence giving us the wrong character?

JeffBezanson · 2014-06-20T05:41:23Z

That sounds possible, but it does check u8_seqlen to make sure enough bytes are available.

Keno · 2014-06-20T05:51:45Z

Why do you compare against seqlen-1?

JeffBezanson · 2014-06-20T13:44:29Z

I probably wrote that because the code had already looked at one byte, but it doesn't consume that byte, so yes that looks wrong. Definitely try changing that.
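
The off-by-one discussed in the last few comments can be modeled in a few lines. This Python sketch is a simplified model under stated assumptions, not the flisp reader: `read_char`, `valid_end` and the `off_by_one` flag are invented for the illustration, and the "stale" byte stands for whatever happens to sit in memory past the data that has actually arrived.

```python
def utf8_seqlen(lead):
    """Number of bytes in the UTF-8 sequence that starts with `lead`."""
    if lead < 0x80:
        return 1
    if lead >> 5 == 0b110:
        return 2
    if lead >> 4 == 0b1110:
        return 3
    if lead >> 3 == 0b11110:
        return 4
    raise ValueError("not a UTF-8 lead byte")


def read_char(memory, pos, valid_end, off_by_one):
    """Decode one character at `pos`; bytes at index >= valid_end are stale.

    Returns None when the availability check decides more input is needed.
    """
    need = utf8_seqlen(memory[pos])
    available = valid_end - pos
    # The lead byte was only peeked at, so all `need` bytes must be available.
    required = need - 1 if off_by_one else need
    if available < required:
        return None
    return bytes(memory[pos:pos + need]).decode("utf-8", errors="replace")


snowman = "☃".encode("utf-8")              # b'\xe2\x98\x83'
partial = snowman[:2] + b"\xff"            # third byte missing, junk in its place

buggy = read_char(partial, 0, valid_end=2, off_by_one=True)
fixed = read_char(partial, 0, valid_end=2, off_by_one=False)
# If the stale byte happens to be the right continuation byte, the bug is invisible.
lucky = read_char(snowman[:2] + b"\x83", 0, valid_end=2, off_by_one=True)
print(repr(buggy), fixed, repr(lucky))
```

With the `seqlen-1` comparison the reader proceeds with one byte too few and decodes whatever follows, so the outcome depends on memory contents. That would fit both the intermittent errors and the cases where the expression still evaluated correctly.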

joehuchette · 2014-06-20T21:22:57Z

💯

juliohm · 2014-08-01T16:27:24Z

The bug is still present with, for instance, "a subscript t".

Julia Version 0.3.0-rc1+260
Commit 727733d (2014-07-29 22:14 UTC)
Platform Info:
  System: Linux (x86_64-unknown-linux-gnu)
  CPU: Intel(R) Core(TM) i7-3632QM CPU @ 2.20GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Sandybridge)
  LAPACK: libopenblas
  LIBM: libopenlibm
  LLVM: libLLVM-3.3

Keno · 2014-08-01T16:28:37Z

That's a different issue, I believe: #7582

juliohm · 2014-08-01T16:35:29Z

Thank you @Keno, you're right.

JeffBezanson added the unicode label Feb 24, 2014

jiahao mentioned this issue Mar 3, 2014

Creating symbols from invalid Unicode characters causes segfault #6027

Closed

Keno closed this as completed Apr 20, 2014

Keno reopened this Apr 24, 2014

JeffBezanson mentioned this issue May 22, 2014

add unicode superscripts and subscripts to latex substitutions #6927

Merged

stevengj mentioned this issue May 23, 2014

Intermittent error with unicode characters in REPL #6934

Closed

stevengj added the bug label May 23, 2014

jakebolewski mentioned this issue Jun 3, 2014

Fix remaining unicode issues JuliaLang/JuliaParser.jl#1

Closed

jakebolewski mentioned this issue Jun 20, 2014

Julia 0.3.0-prerelease on XP x32 can't start #7303

Closed

Keno closed this as completed in 9ae4c94 Jun 20, 2014

vasile-c mentioned this issue Sep 4, 2019

Iteration fails for strings with unicode characters #33157

Closed
