Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Requests getting lost when handle_request receives non-selectively #42

Open
ferd opened this issue May 31, 2011 · 14 comments
Open

Requests getting lost when handle_request receives non-selectively #42

ferd opened this issue May 31, 2011 · 14 comments
Labels

Comments

@ferd
Copy link
Collaborator

ferd commented May 31, 2011

TL;DR: When we're using 'handle_request/5' to deal with a request, the control of the socketio_http process is given up to the user and queries can get lost when doing a non-selective receive during quick successions of client requests


I used the handle_request/5 call and had it communicating with non-socket.io processes. As part of its task, the callback had to do a non-selective receive and digest everything it received in a while to handle them later (much like a gen_server would do it).

However, socketio_http runs the 'handle_request' function calls within its own process, so whenever socketio_http_misultin or some other instance calls the socketio_http server for the current connection more than once (can be simulated by reloading many tabs of the same browser really fast), the 'handle_request' currently running is the one getting the {request, 'GET', ...} message bypassing the socketio_http process altogether. The process model used in this case should be modified to avoid such issues.

In the meantime (and for external readers), I solved the issue by basically spawning a new process in handle_request, then monitoring it and waiting only for the monitor of that process to be done. During that time the other process can send the data to the socket as much as it wants, but we're forcing keep-alive/pipelined calls to be serialized when they're in the same section. This exact model should probably be moved from my own application to socket.io-erlang itself (to be discussed).

@ferd
Copy link
Collaborator Author

ferd commented Jun 1, 2011

One question I meant to ask is that if we go with my solution for the whole thing, do we accept to duplicate the 'Req' value with all of its contents? I know misultin already does sandboxing that way -- we'd end up with 3 copies of the socket by acting that way, but I see no other good fix that doesn't involve doing things that way. Is it worth it for the corner cases where people do it my way and punish everyone, or punish the few people who have that issue by needing them to look at documentation (which we will need to write at some point).

@sinnus
Copy link

sinnus commented Sep 29, 2011

@ferd, could you copy paste solution?
Thank you!

@ferd
Copy link
Collaborator Author

ferd commented Sep 29, 2011

Here you go, @sinnus. The code is a bit customized for some closed-source app I had, and you can disregard some options such as SessionMods and SessionMethod. They were application-specific things.

%% How to start the connection and whatnot
start_link(Options) ->
    SessionMods = proplists:get_value(sessions, Options, []),
    SessionMethod = proplists:get_value(session_method, Options, [{http_cookie, "sessionid"}]),
    Port = proplists:get_value(port, Options),
    {ok, Pid} = socketio_listener:start([{http_port, Port},
            {default_http_handler, {?MODULE, [SessionMods, SessionMethod]}}]),
    EventMgr = socketio_listener:event_manager(Pid),
    ok = gen_event:add_handler(EventMgr, ?MODULE, [SessionMods, SessionMethod]).

%% Default HTTP Handler
 handle_request(_Method, _Path, Req, SessionMods, SessionMethod) ->
    %% Call some function to handle this HTTP call!
    %% This depends on messy knowledge abouy misultin sadly.
    %% This spawning business is necessary due to the following
    %% socket.io issue: https://github.com/yrashk/socket.io-erlang/issues/42
    Pid = spawn(fun() -> (my_module:loop(SessionMods, SessionMethod))(Req) end),
    Ref = erlang:monitor(process,Pid),
    receive
        {'DOWN', Ref, process, _, _} -> ok
    end.

You can see the trick is starting at Pid = spawn(....), and finishes with the receive block. Let me know if you need more explanations.

@sinnus
Copy link

sinnus commented Sep 29, 2011

@ferd, I can't understand how messages can get lost?

@ferd
Copy link
Collaborator Author

ferd commented Sep 29, 2011

New socket calls are received as messages by the TCP socket controlling process. Misultin dispatches a process to do that, which socket.io's code sits in to redirect the messages to the right spot.

So if the handle_request/N function is a long running one (could take a few seconds), sits in the same process as socket.io (because it's a callback) and thus in the misultin process, and uses selective receives, then when more data is coming form the socket (say, pipelined calls) or whatever, the long-running function will be consuming these messages instead of leaving them to the socket.io process that usually deals with them.

These messages were likely meant for the socketio process in charge of handling queries, but might have been consumed or discarded by your handler. That means that on the other end of the request (the browser), the query will never be replied to and it will time out.

This can also be true of messages sent to the socketio process in charge of handling queries, unrelated to sockets (internal communications). Because of this, I had to split my queries into two processes. One one hand, the handle_request function will be guaranteed not to consume messages it wasn't meant to, and by spawning a new process to handle the actual request, I allow myself to do and receive whatever I want in my_module:loop/N.

@sinnus
Copy link

sinnus commented Sep 29, 2011

@ferd, is it Misultin issue to bypass messages if a function takes a lot of time?

@ferd
Copy link
Collaborator Author

ferd commented Sep 29, 2011

I don't think we could pinpoint the problem to a single place. The problem comes from the interaction of many logical actors (Misultin's side, socket.io-erlang's side, the callback's side) running with the same mailbox. You can fix the issue on any side. I don't know of a way to do it that will not require adding processes, though.

@sinnus
Copy link

sinnus commented Sep 29, 2011

@ferd, thank you for your answers! As I can understand the problem is that mailbox is "shared" between other sides,

@ferd
Copy link
Collaborator Author

ferd commented Sep 29, 2011

Exactly. Then ownership of messages for each query or worker becomes extremely vague. Misultin doesn't read messages, but you can have conflicts between your callbacks and socket.io-erlang. Using processes the way I did it solves the problem.

@sinnus
Copy link

sinnus commented Sep 30, 2011

Hello, @ferd! I tested "long running" callback for longpolling and websocket using timer:sleep:

-module(socketio_transport_websocket).

    ...
%% Websockets
handle_call({websocket, Data, _Ws}, _From, #state{ heartbeat_interval = Interval, event_manager = EventManager } = State) ->
    error_logger:info_msg("Before timer~n", []),
    timer:sleep(1000),
    error_logger:info_msg("After timer~n", []),
    ...

-module(socketio_transport_polling).

    %% Incoming data
handle_call({_TransportType, data, Req}, From, #state{ server_module = ServerModule,
                                                       event_manager = EventManager,
                                                       polling_duration = Interval,
                                                       sup = Sup } = State) ->
    error_logger:info_msg("Before timer~n", []),
    timer:sleep(1000),
    error_logger:info_msg("After timer~n", []),
    ...

I used flash socket.io client to send data to server and all messages was processed by server side. I only faced with longpolling timeout when sleep:timer > longpolling timeout.
Can you help me to write test to reproduce issue?

@ferd
Copy link
Collaborator Author

ferd commented Sep 30, 2011

I don't sure I understand? You're trying to reproduce the issue I had?

@sinnus
Copy link

sinnus commented Sep 30, 2011

Yes, I'm trying to reproduce the issue without "reloading many tabs of the same browser really fast".

@sinnus
Copy link

sinnus commented Sep 30, 2011

@ferd, I have reproduced your issue. My test code above was in wrong place because of misunderstanding, sorry.

My steps to reproduce:

  1. Open index.html
  2. Send data from flash client
  3. No echo on client side
  4. After 10 seconds (timeout) echo will be received
handle_request('GET', [], Req) ->
    error_logger:info_msg("Before timer", []),
    timer:sleep(10000),
    error_logger:info_msg("after timer", []),
    Req:file(filename:join([filename:dirname(code:which(?MODULE)), "index.html"]));

@sinnus
Copy link

sinnus commented Sep 30, 2011

As we can see handle_request('GET', [], Req) blocks socket.io server. All clients will be wait handle_request function.

sinnus added a commit to sinnus/socket.io-erlang that referenced this issue Oct 2, 2011
sinnus added a commit to sinnus/socket.io-erlang that referenced this issue Oct 3, 2011
sinnus added a commit to sinnus/socket.io-erlang that referenced this issue Oct 3, 2011
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants