unicode crash #56

Open
dvarkin opened this issue Jul 25, 2011 · 15 comments

@dvarkin

dvarkin commented Jul 25, 2011

We were trying to send Unicode data over the websocket, but it crashed with this error:

** Generic server <0.2107.0> terminating
** Last message in was {websocket,
[126,109,126,50,49,126,109,126,126,106,126,123,34,
116,101,120,116,34,58,34,209,139,209,132,208,178,
208,176,208,178,209,139,208,176,34,125],
{misultin_ws,
{ws,#Port<0.5164>,http,false,
{10,1,5,51},
61776,undefined,
{'draft-hixie',76},
"http://localhost:7879","10.1.5.51:7879",
"/socket.io/websocket",
[{'Upgrade',"WebSocket"},
{'Connection',"Upgrade"},
{'Host',"10.1.5.51:7879"},
{"Origin","http://localhost:7879"},
{"Sec-Websocket-Key1",
"3U4 06< <86 0O0 ) 48"},
{"Sec-Websocket-Key2",
"3 6 8g. )r2 1 4 W_2,370"}],
false},
<0.2105.0>}}
** When Server state == {state,"e95bc077-5f92-4bf4-9bbc-ba98e4636d5a",
undefined,socketio_http_misultin,
{websocket,
{misultin_ws,
{ws,#Port<0.5164>,http,false,
{10,1,5,51},
61776,undefined,
{'draft-hixie',76},
"http://localhost:7879","10.1.5.51:7879",
"/socket.io/websocket",
[{'Upgrade',"WebSocket"},
{'Connection',"Upgrade"},
{'Host',"10.1.5.51:7879"},
{"Origin","http://localhost:7879"},
{"Sec-Websocket-Key1",
"3U4 06< <86 0O0 ) 48"},
{"Sec-Websocket-Key2","3 6 8g. )r2 1 4 W_2,370"}],
false},
<0.2105.0>}},
1,
{#Ref<0.0.0.4154>,10000},
<0.2108.0>,<0.82.0>}
** Reason for termination ==
** {badarg,[{jsx_eep0018,collect,3},
{socketio_data,json,2},
{socketio_transport_websocket,handle_call,3},
{gen_server,handle_msg,5},
{proc_lib,init_p_do_apply,3}]}

After inserting this stupid code, everything works fine:

json(_Length1, Body) ->
    %% Workaround: ignore the length from the frame header and take the whole body.
    Length = erlang:length(Body),
    io:format("~n++++ Length ~p Body ~p~n", [Length, Body]),
    {Object, Rest} = lists:split(Length, Body),
    io:format("~n++++ Object ~p Rest ~p~n", [Object, Rest]),
    [#msg{content=jsx:json_to_term(list_to_binary(Object), [{strict,false}]), json=true} |
     handle_rest(Rest)].

socketio_data:json/2

@yrashk
Owner

yrashk commented Jul 25, 2011

@ferd, any comments?

@ferd
Collaborator

ferd commented Jul 25, 2011

I don't know what the original code seemed to be or why the output fails. Trying this:

8> socketio_data:encode(#msg{content=[126,109,126,50,49,126,109,126,126,106,126,123,34, 116,101,120,116,34,58,34,209,139,209,132,208,178,08,176,208,178,209,139,208,176,34,125], json=true}).   
"~m~139~m~~j~[126,109,126,50,49,126,109,126,126,106,126,123,34,116,101,120,116,34,58,34,209,139,209,132,208,178,8,176,208,178,209,139,208,176,34,125]"

Yields valid output when encoding, which tells me the conversions we build ourselves aren't an issue. However, it appears that the string itself has some invalid characters when we try to input it. A valid JSON string representation stops at character 21-22 of the string:

23> lists:sublist([126,109,126,50,49,126,109,126,126,106,126,123,34, 116,101,120,116,34,58,34,209,139,209,132,208,178,08,176,208,178,209,139,208,176,34,125], 21).
"~m~21~m~~j~{\"text\":\"Ñ"

However, if we print it and let the Erlang drivers figure out whatever encoding:

73> io:format("~s~n",[[126,109,126,50,49,126,109,126,126,106,126,123,34, 116,101,120,116,34,58,34,209,139,209,132,208,178,08,176,208,178,209,139,208,176,34,125]]).  
~m~21~m~~j~{"text":"Ñ�Ñ�в^H°Ð²Ñ�Ð

has a valid string representation (in my shell. It looks broken here). This, to me, seems to point to a problem within jsx's JSON conversion when reading from unicode strings given the shell is able to figure something out that jsx doesn't seem able to. There is a big HOWEVER there.

The unicode string given is quite a mess. I'm wondering if it's possible that some of the characters given are control characters (I strongly suspect some are). Such control characters must be encoded using \u and 4 hex digits. So if you're having control characters through that unicode string, it's quite normal for jsx to die on it -- they need to be properly escaped. dvarkin, can you check to see if you have any of these?

@dvarkin
Author

dvarkin commented Jul 26, 2011

Hi! Thanks for the reply.

This JSON has no control characters, only some Cyrillic.

I think the problem is an incorrectly calculated size for the Unicode string, which socketio_data:json/2 receives as its "Length" argument.

Maybe this code has an incorrect match:

header(?FRAME ++ Rest=[_|_], Acc) ->
    Length = list_to_integer(lists:reverse(Acc)),
    body(Length, Rest);

%%% I have ~m~21~m~
header([N|Rest], Acc) when N >= $0, N =< $9 ->
    header(Rest, [N|Acc]).

I don't know the best solution in this case, maybe working with unicode as binary?
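
The size mismatch can be checked directly on the payload from the crash report. The following is a minimal sketch, assuming the list holds UTF-8 bytes and that the 21 in the frame header counts characters as the client sees them. The module name `length_check` and its functions are made up for illustration and are not part of socket.io-erlang.

```erlang
-module(length_check).
-export([payload/0, byte_count/0, char_count/0]).

%% Payload from the crash report, without the "~m~21~m~" frame header.
payload() ->
    [126,106,126,123,34,116,101,120,116,34,58,34,
     209,139,209,132,208,178,208,176,208,178,209,139,208,176,
     34,125].

%% Number of list elements, i.e. the number of UTF-8 bytes.
byte_count() ->
    length(payload()).

%% Number of code points once the bytes are decoded as UTF-8.
char_count() ->
    length(unicode:characters_to_list(list_to_binary(payload()), utf8)).
```

Here `length_check:byte_count()` gives 28 while `length_check:char_count()` gives 21, so splitting the byte list at 21 cuts through the middle of a two-byte Cyrillic character.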

@ferd
Collaborator

ferd commented Jul 26, 2011

That could be something related to that. The way code points are seen, it sounds like a very good candidate for the issue where we'd need to read a binary character per character (pattern matching on a utf8 type).
We'd need to test it a good bit more just to make sure.

Sadly, everything related to unicode in Erlang has to be switched to binaries. The server (misultin) as it is uses lists by default and it then becomes rather unclear what we should do. For JSON strings, we can likely switch things to binary and accumulate data until we have the right length. For regular text though, we'd have no way to know what encoding the user had and then we risk interpreting them in a type they didn't intend.

We could force utf8 by default, but I'm not sure it's the best of ideas. In any case, we need to resolve this. Any opinions?

Sidenote: the fix dvarkin posted doesn't seem safe to me, as it ignores the parsed message size: it drops the length from the message header and just takes whatever is left after that. The socket.io client will sometimes concatenate two messages (~m~3~m~abc~m~3~m~def or something like that), so if two or more messages are appended together, the fix will break the app.
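
To illustrate why the header length has to be honoured, here is a rough sketch of a splitter for concatenated frames. It assumes the `~m~<length>~m~<body>` framing seen in the crash report, a UTF-8 binary as input, and lengths that count code points (the JavaScript client may actually count UTF-16 units, which differs for characters outside the Basic Multilingual Plane). The module `frame_split` is hypothetical, and it simply crashes on malformed input.

```erlang
-module(frame_split).
-export([split/1]).

%% Decode the UTF-8 binary to code points first, so that the lengths in
%% the headers are applied to characters and not to bytes.
split(Bin) when is_binary(Bin) ->
    frames(unicode:characters_to_list(Bin, utf8)).

frames([]) ->
    [];
frames("~m~" ++ Rest) ->
    IsDigit = fun(C) -> C >= $0 andalso C =< $9 end,
    %% Read the decimal length, then expect the second "~m~" marker.
    {Digits, "~m~" ++ Tail} = lists:splitwith(IsDigit, Rest),
    %% Take exactly Length code points; the remainder may hold more frames.
    {Body, Next} = lists:split(list_to_integer(Digits), Tail),
    [Body | frames(Next)].
```

With this approach `frame_split:split(<<"~m~3~m~abc~m~3~m~def">>)` yields two separate bodies, whereas a parser that ignores the length would return everything after the first header as one message.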

@dvarkin
Author

dvarkin commented Jul 26, 2011

Yes, it's not a fix. Don't use that! We are using quoted-printable encoding, but this is unsafe too, because of the client side.

@ferd
Collaborator

ferd commented Jul 26, 2011

I'll try to take some time during my lunch break to generate a good failing test case for the decoding to try and fix the length issue. I'll be waiting for comments, but in the meantime, it feels that assuming UTF8 is the safest option -- it'll play well with ASCII and ISO-8859-1 users, although it might be problematic for the UCS-2 and UCS-4 (Windows, Python), UTF16, UTF32, etc. We'll at least always be safe with JSON parsing.

@ferd
Collaborator

ferd commented Jul 29, 2011

I studied the problem a bit. The character that causes problems is a unicode control character (U+008B => 139). There are dozens of them, and they can be inserted before, between and after any set of characters to modify them into a single visible character.

To me, this would be the reason as to why the length of the string gets messed up.

If I just take the subsequence [209,139], I get the following reasoning:

The resulting character under a binary representation (ы) is the direct two list entries ([209,139]) as a 2-bytes binary (<<209,139>>). However, if I take [209,139] as their literal meaning, it is the "Ñ" character plus a hidden value.

This means that while in both cases I need to get somewhat very clever to solve the issue, I first have to figure out if the faulty input you have was meant to be [209, 139] vs. <<209,139>>. Dvarkin, if you could tell me which one of the two it is, I can know the format in which we do things (choosing between unicode:characters_to_list(list_to_binary(Str)) vs. leaving it as it is vs. list_to_binary(Str) vs. unicode:characters_to_binary(Str), etc.) and try to fix stuff.
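
The two readings of [209,139] can be compared side by side. This is only a sketch using the standard `unicode` module; the module name `two_readings` is invented for the example.

```erlang
-module(two_readings).
-export([as_utf8_bytes/0, as_code_points/0]).

%% Reading 1: the two integers are UTF-8 bytes.
%% Decoding them gives a single code point, 1099 (U+044B, the letter "ы").
as_utf8_bytes() ->
    unicode:characters_to_list(<<209,139>>, utf8).

%% Reading 2: the two integers are already code points
%% (U+00D1 "Ñ" followed by U+008B, a C1 control character).
%% Encoding them as UTF-8 takes four bytes.
as_code_points() ->
    unicode:characters_to_binary([209,139], unicode, utf8).
```

Under the first reading the string is one character long, under the second it is two, which is exactly the kind of difference that throws off a length taken from the frame header.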

Then I'll need to figure out how the hell we're supposed to find the length of a string based on its control characters. Usually, languages have functions for that, but it seems we don't in Erlang. I'll have to see whatever Javascript does and try to re-implement it here, given that the length given in the serialized string is likely based on that. This is going to prove challenging for bare messages, but I could just let jsx do some stream parsing if I find a json structure, which should be way, way easier.

I'm not sure what the performance impacts might be.

@ghost ghost assigned ferd Sep 16, 2011
@ferd
Collaborator

ferd commented Sep 18, 2011

Hi @dvarkin, Check out my branch (https://github.com/ferd/socket.io-erlang) (or wait until it is merged) to see if the changes I brought fix your issues. Hopefully they will.

@sinnus

sinnus commented Sep 26, 2011

Hi @ferd, I tried your fix by sending the following message via the xhr-polling transport:
socket.send("Привет!");
and got a big exception stack trace.

@yrashk
Owner

yrashk commented Sep 26, 2011

@sinnus, can you share that stack trace, please?

@sinnus

sinnus commented Sep 26, 2011

=ERROR REPORT==== 26-Sep-2011::12:24:16 ===
** Generic server <0.86.0> terminating
** Last message in was {'xhr-polling',data,
{misultin_req,
{req,#Port<0.5379>,http,
{127,0,0,1},
47763,undefined,keep_alive,59,
{1,1},
'POST',
{abs_path,
"/socket.io/xhr-polling/73a69a1d-8ce3-4843-ba1d-ed818d1bab78/send"},
[],
[{'Host',"localhost:7878"},
{'User-Agent',
"Mozilla/5.0 (X11; Linux x86_64; rv:6.0.2) Gecko/20100101 Firefox/6.0.2"},
{'Accept',
"text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"},
{'Accept-Language',"en-us,en;q=0.5"},
{'Accept-Encoding',"gzip, deflate"},
{'Accept-Charset',
"windows-1251,utf-8;q=0.7,*;q=0.7"},
{'Connection',"keep-alive"},
{'Cookie',"socketio=xhr-polling"},
{'Content-Type',
"application/x-www-form-urlencoded; charset=utf-8"},
{'Cache-Control',"no-cache"},
{'Pragma',"no-cache"},
{'Content-Length',"59"}],
false,
<<"data=%7Em%7E7%7Em%7E%D0%9F%D1%80%D0%B8%D0%B2%D0%B5%D1%82%21">>},
<0.105.0>}}
** When Server state == {state,"73a69a1d-8ce3-4843-ba1d-ed818d1bab78",[],
socketio_http_misultin,
{'xhr-polling',connected},
{misultin_req,
{req,#Port<0.5033>,http,
{127,0,0,1},
47761,undefined,keep_alive,undefined,
{1,1},
'GET',
{abs_path,
"/socket.io/xhr-polling/73a69a1d-8ce3-4843-ba1d-ed818d1bab78/1317025455364"},
[],
[{'Host',"localhost:7878"},
{'User-Agent',
"Mozilla/5.0 (X11; Linux x86_64; rv:6.0.2) Gecko/20100101 Firefox/6.0.2"},
{'Accept',
"text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"},
{'Accept-Language',"en-us,en;q=0.5"},
{'Accept-Encoding',"gzip, deflate"},
{'Accept-Charset',
"windows-1251,utf-8;q=0.7,*;q=0.7"},
{'Connection',"keep-alive"},
{'Cookie',"socketio=xhr-polling"}],
false,<<>>},
<0.83.0>},
{<0.103.0>,#Ref<0.0.0.1534>},
undefined,
{#Ref<0.0.0.1535>,20000},
8000,<0.87.0>,<0.57.0>}
** Reason for termination ==
** {function_clause,
[{socketio_data,header,[[178,208,181,209,130,33]]},
{socketio_data,message,3},
{socketio_transport_polling,'-handle_call/3-lc$^0/1-0-',1},
{socketio_transport_polling,handle_call,3},
{gen_server,handle_msg,5},
{proc_lib,init_p_do_apply,3}]}

=CRASH REPORT==== 26-Sep-2011::12:24:16 ===
crasher:
initial call: socketio_transport_polling:init/1
pid: <0.86.0>
registered_name: []
exception exit: {function_clause,
[{socketio_data,header,[[178,208,181,209,130,33]]},
{socketio_data,message,3},
{socketio_transport_polling,
'-handle_call/3-lc$^0/1-0-',1},
{socketio_transport_polling,handle_call,3},
{gen_server,handle_msg,5},
{proc_lib,init_p_do_apply,3}]}
in function gen_server:terminate/6
ancestors: [socketio_client_sup,socketio_listener_sup,
socketio_listener_sup_sup,socketio_sup,<0.54.0>]
messages: []
links: [<0.74.0>,<0.87.0>,<0.60.0>,#Port<0.5033>]
dictionary: [{{grapheme_break_property,grapheme_break_property},
#Fun<ux_unidata_filelist.2.99711654>}]
trap_exit: true
status: running
heap_size: 46368
stack_size: 24
reductions: 86376
neighbours:

=ERROR REPORT==== 26-Sep-2011::12:24:16 ===
** Generic server <0.60.0> terminating
** Last message in was {request,'POST',
["send","73a69a1d-8ce3-4843-ba1d-ed818d1bab78",
"xhr-polling","socket.io"],
{misultin_req,
{req,#Port<0.5379>,http,
{127,0,0,1},
47763,undefined,keep_alive,59,
{1,1},
'POST',
{abs_path,
"/socket.io/xhr-polling/73a69a1d-8ce3-4843-ba1d-ed818d1bab78/send"},
[],
[{'Host',"localhost:7878"},
{'User-Agent',
"Mozilla/5.0 (X11; Linux x86_64; rv:6.0.2) Gecko/20100101 Firefox/6.0.2"},
{'Accept',
"text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"},
{'Accept-Language',"en-us,en;q=0.5"},
{'Accept-Encoding',"gzip, deflate"},
{'Accept-Charset',
"windows-1251,utf-8;q=0.7,*;q=0.7"},
{'Connection',"keep-alive"},
{'Cookie',"socketio=xhr-polling"},
{'Content-Type',
"application/x-www-form-urlencoded; charset=utf-8"},
{'Cache-Control',"no-cache"},
{'Pragma',"no-cache"},
{'Content-Length',"59"}],
false,
<<"data=%7Em%7E7%7Em%7E%D0%9F%D1%80%D0%B8%D0%B2%D0%B5%D1%82%21">>},
<0.105.0>}}
** When Server state == {state,demo_erl__escript__1317__25445__902576,53276,
<0.58.0>,<0.57.0>,#Ref<0.0.0.158>,
socketio_http_misultin,
["socket.io"]}
** Reason for termination ==
** {{function_clause,
[{socketio_data,header,[[178,208,181,209,130,33]]},
{socketio_data,message,3},
{socketio_transport_polling,'-handle_call/3-lc$^0/1-0-',1},
{socketio_transport_polling,handle_call,3},
{gen_server,handle_msg,5},
{proc_lib,init_p_do_apply,3}]},
{gen_server,call,
[<0.86.0>,
{'xhr-polling',data,
{misultin_req,
{req,#Port<0.5379>,http,
{127,0,0,1},
47763,undefined,keep_alive,59,
{1,1},
'POST',
{abs_path,
"/socket.io/xhr-polling/73a69a1d-8ce3-4843-ba1d-ed818d1bab78/send"},
[],
[{'Host',"localhost:7878"},
{'User-Agent',
"Mozilla/5.0 (X11; Linux x86_64; rv:6.0.2) Gecko/20100101 Firefox/6.0.2"},
{'Accept',
"text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"},
{'Accept-Language',"en-us,en;q=0.5"},
{'Accept-Encoding',"gzip, deflate"},
{'Accept-Charset',"windows-1251,utf-8;q=0.7,*;q=0.7"},
{'Connection',"keep-alive"},
{'Cookie',"socketio=xhr-polling"},
{'Content-Type',
"application/x-www-form-urlencoded; charset=utf-8"},
{'Cache-Control',"no-cache"},
{'Pragma',"no-cache"},
{'Content-Length',"59"}],
false,
<<"data=%7Em%7E7%7Em%7E%D0%9F%D1%80%D0%B8%D0%B2%D0%B5%D1%82%21">>},
<0.105.0>}}]}}

=CRASH REPORT==== 26-Sep-2011::12:24:16 ===
crasher:
initial call: socketio_http:init/1
pid: <0.60.0>
registered_name: []
exception exit: {{function_clause,
[{socketio_data,header,[[178,208,181,209,130,33]]},
{socketio_data,message,3},
{socketio_transport_polling,
'-handle_call/3-lc$^0/1-0-',1},
{socketio_transport_polling,handle_call,3},
{gen_server,handle_msg,5},
{proc_lib,init_p_do_apply,3}]},
{gen_server,call,
[<0.86.0>,
{'xhr-polling',data,
{misultin_req,
{req,#Port<0.5379>,http,
{127,0,0,1},
47763,undefined,keep_alive,59,
{1,1},
'POST',
{abs_path,
"/socket.io/xhr-polling/73a69a1d-8ce3-4843-ba1d-ed818d1bab78/send"},
[],
[{'Host',"localhost:7878"},
{'User-Agent',
"Mozilla/5.0 (X11; Linux x86_64; rv:6.0.2) Gecko/20100101 Firefox/6.0.2"},
{'Accept',
"text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"},
{'Accept-Language',"en-us,en;q=0.5"},
{'Accept-Encoding',"gzip, deflate"},
{'Accept-Charset',
"windows-1251,utf-8;q=0.7,*;q=0.7"},
{'Connection',"keep-alive"},
{'Cookie',"socketio=xhr-polling"},
{'Content-Type',
"application/x-www-form-urlencoded; charset=utf-8"},
{'Cache-Control',"no-cache"},
{'Pragma',"no-cache"},
{'Content-Length',"59"}],
false,
<<"data=%7Em%7E7%7Em%7E%D0%9F%D1%80%D0%B8%D0%B2%D0%B5%D1%82%21">>},
<0.105.0>}}]}}
in function gen_server:terminate/6
ancestors: [socketio_listener_sup,socketio_listener_sup_sup,
socketio_sup,<0.54.0>]
messages: [{'EXIT',<0.86.0>,
{function_clause,
[{socketio_data,header,[[178,208,181,209,130,33]]},
{socketio_data,message,3},
{socketio_transport_polling,
'-handle_call/3-lc$^0/1-0-',1},
{socketio_transport_polling,handle_call,3},
{gen_server,handle_msg,5},
{proc_lib,init_p_do_apply,3}]}}]
links: [<0.61.0>,<0.57.0>]
dictionary: []
trap_exit: true
status: running
heap_size: 4181
stack_size: 24
reductions: 558
neighbours:

=CRASH REPORT==== 26-Sep-2011::12:24:16 ===
crasher:
initial call: gen_event:init_it/6
pid: <0.87.0>
registered_name: []
exception exit: {function_clause,
[{socketio_data,header,[[178,208,181,209,130,33]]},
{socketio_data,message,3},
{socketio_transport_polling,
'-handle_call/3-lc$^0/1-0-',1},
{socketio_transport_polling,handle_call,3},
{gen_server,handle_msg,5},
{proc_lib,init_p_do_apply,3}]}
in function gen_event:terminate_server/4
ancestors: [<0.86.0>,socketio_client_sup,socketio_listener_sup,
socketio_listener_sup_sup,socketio_sup,<0.54.0>]
messages: []
links: []
dictionary: []
trap_exit: true
status: running
heap_size: 377
stack_size: 24
reductions: 136
neighbours:

=SUPERVISOR REPORT==== 26-Sep-2011::12:24:16 ===
Supervisor: {local,socketio_listener_sup}
Context: child_terminated
Reason: {{function_clause,
[{socketio_data,header,[[178,208,181,209,130,33]]},
{socketio_data,message,3},
{socketio_transport_polling,'-handle_call/3-lc$^0/1-0-',1},
{socketio_transport_polling,handle_call,3},
{gen_server,handle_msg,5},
{proc_lib,init_p_do_apply,3}]},
{gen_server,call,
[<0.86.0>,
{'xhr-polling',data,
{misultin_req,
{req,#Port<0.5379>,http,
{127,0,0,1},
47763,undefined,keep_alive,59,
{1,1},
'POST',
{abs_path,
"/socket.io/xhr-polling/73a69a1d-8ce3-4843-ba1d-ed818d1bab78/send"},
[],
[{'Host',"localhost:7878"},
{'User-Agent',
"Mozilla/5.0 (X11; Linux x86_64; rv:6.0.2) Gecko/20100101 Firefox/6.0.2"},
{'Accept',
"text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"},
{'Accept-Language',"en-us,en;q=0.5"},
{'Accept-Encoding',"gzip, deflate"},
{'Accept-Charset',"windows-1251,utf-8;q=0.7,*;q=0.7"},
{'Connection',"keep-alive"},
{'Cookie',"socketio=xhr-polling"},
{'Content-Type',
"application/x-www-form-urlencoded; charset=utf-8"},
{'Cache-Control',"no-cache"},
{'Pragma',"no-cache"},
{'Content-Length',"59"}],
false,
<<"data=%7Em%7E7%7Em%7E%D0%9F%D1%80%D0%B8%D0%B2%D0%B5%D1%82%21">>},
<0.105.0>}}]}}
Offender: [{pid,<0.60.0>},
{name,socketio_http},
{mfargs,
{socketio_http,start_link,
[socketio_http_misultin,7878,
["socket.io"],
undefined,demo_erl__escript__1317__25445__902576,
<0.57.0>]}},
{restart_type,permanent},
{shutdown,5000},
{child_type,worker}]

=SUPERVISOR REPORT==== 26-Sep-2011::12:24:16 ===
Supervisor: {local,socketio_client_sup}
Context: child_terminated
Reason: {function_clause,
[{socketio_data,header,[[178,208,181,209,130,33]]},
{socketio_data,message,3},
{socketio_transport_polling,
'-handle_call/3-lc$^0/1-0-',1},
{socketio_transport_polling,handle_call,3},
{gen_server,handle_msg,5},
{proc_lib,init_p_do_apply,3}]}
Offender: [{pid,<0.86.0>},
{name,socketio_client},
{mfargs,
{socketio_client,start_link,
[<0.57.0>,socketio_transport_polling,
"73a69a1d-8ce3-4843-ba1d-ed818d1bab78",
socketio_http_misultin,
{'xhr-polling',
{misultin_req,
{req,#Port<0.5033>,http,
{127,0,0,1},
47761,undefined,keep_alive,undefined,
{1,1},
'GET',
{abs_path,"/socket.io/xhr-polling//1317025455177"},
[],
[{'Host',"localhost:7878"},
{'User-Agent',
"Mozilla/5.0 (X11; Linux x86_64; rv:6.0.2) Gecko/20100101 Firefox/6.0.2"},
{'Accept',
"text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"},
{'Accept-Language',"en-us,en;q=0.5"},
{'Accept-Encoding',"gzip, deflate"},
{'Accept-Charset',"windows-1251,utf-8;q=0.7,*;q=0.7"},
{'Connection',"keep-alive"},
{'Cookie',"socketio=xhr-polling"}],
false,<<>>},
<0.83.0>}}]}},
{restart_type,transient},
{shutdown,5000},
{child_type,worker}]

=PROGRESS REPORT==== 26-Sep-2011::12:24:16 ===
supervisor: {<0.109.0>,misultin}
started: [{pid,<0.111.0>},
{name,server},
{mfargs,{misultin_server,start_link,[{1024}]}},
{restart_type,permanent},
{shutdown,60000},
{child_type,worker}]

=CRASH REPORT==== 26-Sep-2011::12:24:16 ===
crasher:
initial call: supervisor:misultin_acceptors_sup/1
pid: <0.112.0>
registered_name: []
exception exit: {bad_return,
{misultin_acceptors_sup,init,{error,eaddrinuse}}}
in function gen_server:init_it/6
ancestors: [<0.109.0>,<0.107.0>,socketio_listener_sup,
socketio_listener_sup_sup,socketio_sup,<0.54.0>]
messages: []
links: [<0.109.0>]
dictionary: []
trap_exit: true
status: running
heap_size: 377
stack_size: 24
reductions: 316
neighbours:

=SUPERVISOR REPORT==== 26-Sep-2011::12:24:16 ===
Supervisor: {<0.109.0>,misultin}
Context: start_error
Reason: {bad_return,{misultin_acceptors_sup,init,{error,eaddrinuse}}}
Offender: [{pid,undefined},
{name,acceptors_sup},
{mfargs,
{misultin_acceptors_sup,start_link,
[<0.109.0>,7878,
[binary,
{packet,raw},
{ip,{0,0,0,0}},
{reuseaddr,true},
{active,false},
{backlog,128},
inet],
10,30000,http,
{custom_opts,4194304,2000,false,
#Fun<socketio_http_misultin.0.67680361>,true,
#Fun<socketio_http_misultin.1.39101188>,false,
false,false}]}},
{restart_type,permanent},
{shutdown,infinity},
{child_type,supervisor}]

=CRASH REPORT==== 26-Sep-2011::12:24:16 ===
crasher:
initial call: socketio_http:init/1
pid: <0.107.0>
registered_name: []
exception exit: {{badmatch,{error,shutdown}},
[{socketio_http,init,1},
{gen_server,init_it,6},
{proc_lib,init_p_do_apply,3}]}
in function gen_server:init_it/6
ancestors: [socketio_listener_sup,socketio_listener_sup_sup,
socketio_sup,<0.54.0>]
messages: [{'EXIT',<0.109.0>,shutdown}]
links: [<0.57.0>]
dictionary: []
trap_exit: true
status: running
heap_size: 233
stack_size: 24
reductions: 156
neighbours:

=SUPERVISOR REPORT==== 26-Sep-2011::12:24:16 ===
Supervisor: {local,socketio_listener_sup}
Context: start_error
Reason: {{badmatch,{error,shutdown}},
[{socketio_http,init,1},
{gen_server,init_it,6},
{proc_lib,init_p_do_apply,3}]}
Offender: [{pid,<0.60.0>},
{name,socketio_http},
{mfargs,
{socketio_http,start_link,
[socketio_http_misultin,7878,
["socket.io"],
undefined,demo_erl__escript__1317__25445__902576,
<0.57.0>]}},
{restart_type,permanent},
{shutdown,5000},
{child_type,worker}]

=ERROR REPORT==== 26-Sep-2011::12:24:16 ===
module: misultin_http
line: 543
error in custom loop: {{{function_clause,
[{socketio_data,header,[[178,208,181,209,130,33]]},
{socketio_data,message,3},
{socketio_transport_polling,
'-handle_call/3-lc$^0/1-0-',1},
{socketio_transport_polling,handle_call,3},
{gen_server,handle_msg,5},
{proc_lib,init_p_do_apply,3}]},
{gen_server,call,
[<0.86.0>,
{'xhr-polling',data,
{misultin_req,
{req,#Port<0.5379>,http,
{127,0,0,1},
47763,undefined,keep_alive,59,
{1,1},
'POST',
{abs_path,
"/socket.io/xhr-polling/73a69a1d-8ce3-4843-ba1d-ed818d1bab78/send"},
[],
[{'Host',"localhost:7878"},
{'User-Agent',
"Mozilla/5.0 (X11; Linux x86_64; rv:6.0.2) Gecko/20100101 Firefox/6.0.2"},
{'Accept',
"text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"},
{'Accept-Language',"en-us,en;q=0.5"},
{'Accept-Encoding',"gzip, deflate"},
{'Accept-Charset',
"windows-1251,utf-8;q=0.7,*;q=0.7"},
{'Connection',"keep-alive"},
{'Cookie',"socketio=xhr-polling"},
{'Content-Type',
"application/x-www-form-urlencoded; charset=utf-8"},
{'Cache-Control',"no-cache"},
{'Pragma',"no-cache"},
{'Content-Length',"59"}],
false,
<<"data=%7Em%7E7%7Em%7E%D0%9F%D1%80%D0%B8%D0%B2%D0%B5%D1%82%21">>},
<0.105.0>}}]}},
{gen_server,call,
[<0.60.0>,
{request,'GET',
["1317025455364",
"73a69a1d-8ce3-4843-ba1d-ed818d1bab78",
"xhr-polling","socket.io"],
{misultin_req,
{req,#Port<0.5033>,http,
{127,0,0,1},
47761,undefined,keep_alive,undefined,
{1,1},
'GET',
{abs_path,
"/socket.io/xhr-polling/73a69a1d-8ce3-4843-ba1d-ed818d1bab78/1317025455364"},
[],
[{'Host',"localhost:7878"},
{'User-Agent',
"Mozilla/5.0 (X11; Linux x86_64; rv:6.0.2) Gecko/20100101 Firefox/6.0.2"},
{'Accept',
"text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"},
{'Accept-Language',"en-us,en;q=0.5"},
{'Accept-Encoding',"gzip, deflate"},
{'Accept-Charset',
"windows-1251,utf-8;q=0.7,*;q=0.7"},
{'Connection',"keep-alive"},
{'Cookie',"socketio=xhr-polling"}],
false,<<>>},
<0.83.0>}},
infinity]}} serving request: {req,#Port<0.5033>,
http,
{127,0,0,1},
47761,undefined,
keep_alive,undefined,
{1,1},
'GET',
{abs_path,
"/socket.io/xhr-polling/73a69a1d-8ce3-4843-ba1d-ed818d1bab78/1317025455364"},
[],
[{'Host',
"localhost:7878"},
{'User-Agent',
"Mozilla/5.0 (X11; Linux x86_64; rv:6.0.2) Gecko/20100101 Firefox/6.0.2"},
{'Accept',
"text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"},
{'Accept-Language',
"en-us,en;q=0.5"},
{'Accept-Encoding',
"gzip, deflate"},
{'Accept-Charset',
"windows-1251,utf-8;q=0.7,*;q=0.7"},
{'Connection',
"keep-alive"},
{'Cookie',
"socketio=xhr-polling"}],
false,<<>>}

=ERROR REPORT==== 26-Sep-2011::12:24:16 ===
module: misultin_socket
line: 106
error sending data: closed

=ERROR REPORT==== 26-Sep-2011::12:24:16 ===
module: misultin_server
line: 221
http process <0.83.0> has died with reason: kill, removing from references of open connections and websockets

=ERROR REPORT==== 26-Sep-2011::12:24:16 ===
module: misultin_http
line: 543
error in custom loop: {{{function_clause,
[{socketio_data,header,[[178,208,181,209,130,33]]},
{socketio_data,message,3},
{socketio_transport_polling,
'-handle_call/3-lc$^0/1-0-',1},
{socketio_transport_polling,handle_call,3},
{gen_server,handle_msg,5},
{proc_lib,init_p_do_apply,3}]},
{gen_server,call,
[<0.86.0>,
{'xhr-polling',data,
{misultin_req,
{req,#Port<0.5379>,http,
{127,0,0,1},
47763,undefined,keep_alive,59,
{1,1},
'POST',
{abs_path,
"/socket.io/xhr-polling/73a69a1d-8ce3-4843-ba1d-ed818d1bab78/send"},
[],
[{'Host',"localhost:7878"},
{'User-Agent',
"Mozilla/5.0 (X11; Linux x86_64; rv:6.0.2) Gecko/20100101 Firefox/6.0.2"},
{'Accept',
"text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"},
{'Accept-Language',"en-us,en;q=0.5"},
{'Accept-Encoding',"gzip, deflate"},
{'Accept-Charset',
"windows-1251,utf-8;q=0.7,*;q=0.7"},
{'Connection',"keep-alive"},
{'Cookie',"socketio=xhr-polling"},
{'Content-Type',
"application/x-www-form-urlencoded; charset=utf-8"},
{'Cache-Control',"no-cache"},
{'Pragma',"no-cache"},
{'Content-Length',"59"}],
false,
<<"data=%7Em%7E7%7Em%7E%D0%9F%D1%80%D0%B8%D0%B2%D0%B5%D1%82%21">>},
<0.105.0>}}]}},
{gen_server,call,
[<0.60.0>,
{request,'POST',
["send","73a69a1d-8ce3-4843-ba1d-ed818d1bab78",
"xhr-polling","socket.io"],
{misultin_req,
{req,#Port<0.5379>,http,
{127,0,0,1},
47763,undefined,keep_alive,59,
{1,1},
'POST',
{abs_path,
"/socket.io/xhr-polling/73a69a1d-8ce3-4843-ba1d-ed818d1bab78/send"},
[],
[{'Host',"localhost:7878"},
{'User-Agent',
"Mozilla/5.0 (X11; Linux x86_64; rv:6.0.2) Gecko/20100101 Firefox/6.0.2"},
{'Accept',
"text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"},
{'Accept-Language',"en-us,en;q=0.5"},
{'Accept-Encoding',"gzip, deflate"},
{'Accept-Charset',
"windows-1251,utf-8;q=0.7,*;q=0.7"},
{'Connection',"keep-alive"},
{'Cookie',"socketio=xhr-polling"},
{'Content-Type',
"application/x-www-form-urlencoded; charset=utf-8"},
{'Cache-Control',"no-cache"},
{'Pragma',"no-cache"},
{'Content-Length',"59"}],
false,
<<"data=%7Em%7E7%7Em%7E%D0%9F%D1%80%D0%B8%D0%B2%D0%B5%D1%82%21">>},
<0.105.0>}},
infinity]}} serving request: {req,#Port<0.5379>,
http,
{127,0,0,1},
47763,undefined,
keep_alive,59,
{1,1},
'POST',
{abs_path,
"/socket.io/xhr-polling/73a69a1d-8ce3-4843-ba1d-ed818d1bab78/send"},
[],
[{'Host',
"localhost:7878"},
{'User-Agent',
"Mozilla/5.0 (X11; Linux x86_64; rv:6.0.2) Gecko/20100101 Firefox/6.0.2"},
{'Accept',
"text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"},
{'Accept-Language',
"en-us,en;q=0.5"},
{'Accept-Encoding',
"gzip, deflate"},
{'Accept-Charset',
"windows-1251,utf-8;q=0.7,*;q=0.7"},
{'Connection',
"keep-alive"},
{'Cookie',
"socketio=xhr-polling"},
{'Content-Type',
"application/x-www-form-urlencoded; charset=utf-8"},
{'Cache-Control',
"no-cache"},
{'Pragma',"no-cache"},
{'Content-Length',
"59"}],
false,
<<"data=%7Em%7E7%7Em%7E%D0%9F%D1%80%D0%B8%D0%B2%D0%B5%D1%82%21">>}

@sinnus

sinnus commented Sep 26, 2011

My hotfix for the issue:
sinnus@ec4b305 (sorry, I forgot to remove the logger :)
I think the best solution would be to write our own misultin_req:parse_post function.

@ferd
Collaborator

ferd commented Sep 26, 2011

OK, looking at the stack trace, I see: <<"data=%7Em%7E7%7Em%7E%D0%9F%D1%80%D0%B8%D0%B2%D0%B5%D1%82%21">>. Decoding it with misultin, it returns:

3> misultin_utility:parse_qs(S).
[{"data",
   [126,109,126,55,126,109,126,208,159,209,128,208,184,208,178,208,181,209,130,33]}]
4> io:format("~ts~n",[[126,109,126,55,126,109,126,208,159,209,128,208,184,208,178,208,181,209,130,33]]).              
~m~7~m~��иве�!
ok
5> io:format("~ts~n", [list_to_binary([126,109,126,55,126,109,126,208,159,209,128,208,184,208,178,208,181,209,130,33])]).
~m~7~m~Привет!
ok

Socket.io-erlang handles unicode correctly. The problem has to do with how misultin does it. The issue, as far as I can see, is that the unicode is parsed as a binary, which turns all code points into UTF-8 bytes (0..255). The binaries are then blindly turned into lists, but a list of bytes is not the same thing as a unicode string in Erlang -- you actually need to convert them to code points (which can be greater than 255), and then output them with ~ts instead of just ~s.

This is covered on our side, but visibly not on Ostinelli's side (misultin). We can likely hot-patch it in the data parser the way you did, but I'll be filing a bug report with Ostinelli to see if he can make Misultin right in the first place instead.
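
As a rough illustration of such a hot-patch, the byte list that misultin returns could be converted to code points before the frame parser sees it. This is a sketch only: the module `qs_fix` and its function names are hypothetical, and it assumes the client always sends UTF-8.

```erlang
-module(qs_fix).
-export([to_code_points/1, naive_rest/0, decoded_length/0]).

%% Body bytes for "Привет!" as returned by misultin (frame header stripped).
body_bytes() ->
    [208,159,209,128,208,184,208,178,208,181,209,130,33].

%% Turn a list of UTF-8 bytes into a list of code points.
to_code_points(Bytes) when is_list(Bytes) ->
    case unicode:characters_to_list(list_to_binary(Bytes), utf8) of
        Chars when is_list(Chars) -> {ok, Chars};
        {error, _, _}             -> {error, invalid_utf8};
        {incomplete, _, _}        -> {error, invalid_utf8}
    end.

%% What is left over when the header length (7) is applied to bytes.
naive_rest() ->
    {_Taken, Rest} = lists:split(7, body_bytes()),
    Rest.

%% Length of the body once it has been decoded to code points.
decoded_length() ->
    {ok, Chars} = to_code_points(body_bytes()),
    length(Chars).
```

`qs_fix:naive_rest()` returns [178,208,181,209,130,33], the same list that `socketio_data:header/1` received in the stack trace above, while `qs_fix:decoded_length()` is 7 and matches the length in the `~m~7~m~` header.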

Here's the issue on misultin: ostinelli/misultin#61

@ferd
Collaborator

ferd commented Sep 27, 2011

Yes, this is related to how utf-8 is encoded and the size of bytes it uses.
I'm working with Roberto Ostinelli to fix misultin and make sure things are
fine for unicode in general.

@sinnus

sinnus commented Sep 28, 2011

To summarize my fixes:
sinnus@ec4b305
sinnus@c5abb9b
sinnus@e4d01df
sinnus@4991ba5

sinnus added a commit to sinnus/socket.io-erlang that referenced this issue Sep 30, 2011