Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

shotgun FSM fixes #100

Merged
merged 3 commits into from
Oct 23, 2015
Merged

shotgun FSM fixes #100

merged 3 commits into from
Oct 23, 2015

Conversation

kennethlakin
Copy link
Contributor

This PR replaces PR #99 and completely addresses issue #96 , but there are a few vaguely troubling issues surrounding timed-out request handling.

566676f adds a work queue to the shotgun FSM. Details and caveats are in the commit message. I'm not yet sure how to handle cancellation of timed-out requests, or if I should at all. Also, note that the work queue is implemented as a list, rather than a queue. It wouldn't be difficult to switch to a queue, if needed.

46ddcdc does two things:

  1. Allows one to pass Ranch transport options along with the call to shotgun:open.
  2. Make shotgun:open block until gun:await_up returns, either the user-specified (or default 5 second) timeout is reached, or gun shuts down due to failure.

11bcbd7 bumps the version number to 0.1.13.

e0625fc makes it possible to perform requests on URIs that contain query strings, or URIs that point to files or whatever.

08245d6 and e63b2e7 are housekeeping commits.

#{
transport_opts => [],
timeout => pos_integer() | infinity }.
%% transport_opts are passed to Ranch's TCP transport, which is -itself- a thin layer over gen_tcp. <br/>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to Elvis:

Line 91 is too long: %% transport_opts are passed to Ranch's TCP transport, which is -itself- a thin layer over gen_tcp.
.

@@ -443,7 +543,8 @@ clean_state() ->
headers => undefined,
async => false,
async_mode => binary,
buffer => <<"">>
buffer => <<"">>,
pending_requests => Reqs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you mention in the comment, I would use a queue here instead of a list. We would lose the ability to pattern match on function heads and use guards but appending new work would be O(1) amortized.

@jfacorro
Copy link
Contributor

@kennethlakin This is great work, thank you! You have implemented issue #21 (Queue operations into shotgun) which we've had pending for quite a while.

Could you please check out the comments from elvis and the ones I've left? They are mostly just style related.

Thanks again for your contribution!

@kennethlakin
Copy link
Contributor Author

Okay. Elvis appears to be appeased, the work queue is actually a queue, and -where appropriate- there are spaces around all '='.

Three things:

  1. How do you feel about e0625fc ? Neither the commit that introduced check_uri ( 5d1419c ) nor the associated issue ( Warn user about missing slash when using the verb functions #33 ) nor PR ( [Closes #33] Throw error for missing leading slash #50 ) was very clear on why that functionality was needed. Additionally, my changes to check_uri mean that it no longer searches for a trailing slash, but is happy with URIs that contain any slashes.
  2. The potential for spurious messages in process mailboxes still concerns me. If I were to -at some point in the near future- change shotgun:request to spawn_link a child that performs the gen_fsm call and either reports back or is killed by the timeout, would this be a welcome change? Or would that be overcomplicating things?
  3. The code duplication in the three-argument versions of wait_response, receive_data, receive_chunk bothers me. Would replacing them sometime in the near future with a stringifying macro be acceptable, or is that a step too far?

Anyway. Glad you're pleased with the work! This was a fun little project.

@jfacorro
Copy link
Contributor

@kennethlakin You can find the replies inline:

  1. How do you feel about e0625fc ? Neither the commit that introduced check_uri ( 5d1419c ) nor the associated issue ( Warn user about missing slash when using the verb functions #33 ) nor PR ( [Closes #33] Throw error for missing leading slash #50 ) was very clear on why that functionality was needed. Additionally, my changes to check_uri mean that it no longer searches for a trailing slash, but is happy with URIs that contain any slashes.

I had actually missed that change in check_uri. Even though I don't quite remember the reason at that time, not including a leading slash in the URI results in cowlib generating an invalid HTTP request. According to the HTTP spec the Request-URI should be one of Request-URI = "*" | absoluteURI | abs_path | authority. So it seems neither our approaches is 100% correct. If you could please revert your changes on the check_uri function we can open an issue so that it complies with the HTTP Request spec.

  1. The potential for spurious messages in process mailboxes still concerns me. If I were to -at some point in the near future- change shotgun:request to spawn_link a child that performs the gen_fsm call and either reports back or is killed by the timeout, would this be a welcome change? Or would that be overcomplicating things?

I think the current approach is enough for most cases. Doing something more sophisticated would probably introduce unnecessary (for now) complexity.

  1. The code duplication in the three-argument versions of wait_response, receive_data, receive_chunk bothers me. Would replacing them sometime in the near future with a stringifying macro be acceptable, or is that a step too far?

We try to avoid macros at all costs since they make the code harder to read. But we can have a function that is called in all those cases.

@kennethlakin
Copy link
Contributor Author

... not including a leading slash in the URI results in cowlib generating an invalid HTTP request.... If you could please revert your changes on the check_uri function we can open an issue so that it complies with the HTTP Request spec.

Based on a little bit of testing, it seems that gun, cowlib, and/or ranch no longer do not attempt to ensure that the URI passed in is valid.

To test, I started a Cowboy server that answered all requests with the same file. Here's the Cowboy route:

{"/[...]", cowboy_static, {file, "static/index.html"}}

I then changed shotgun:check_uri to return 'ok', regardless of the URI passed in, and recompiled.

I ran the program at the end of this comment in a shell with shotgun loaded and got the following result:

Erlang/OTP 18 [erts-7.0] [source] [smp:2:2] [async-threads:10] [hipe] [kernel-poll:false]

Eshell V7.0  (abort with ^G)
1> c(test).
{ok,test}
2> test:run().
ok
3> 

It looks like none of shotgun's dependencies do any URI validity checking. It makes more sense to me to leave such validation out of the transport libraries. I would like to remove check_uri, but -ultimately- you're the boss. :)

Edit: And if the practical concern is that cowlib generates an invalid request when the URI is missing a leading slash, then it would be easy enough to change check_uri to only check for that. (Cowboy returns 400 for all of the cases in my little test program where the URI is missing a leading slash, so I can see how throwing an error in shotgun could save the programmer a fair bit of time while debugging.)

I think the current approach is enough for most cases.

Understood. Maybe I'll get bored and amuse myself by coding it up anyway, if it turns out to be not too complicated. :D

We try to avoid macros at all costs since they make the code harder to read.

Even if the macro invocaton reads:

?enqueue_work_and_stay_in_state(wait_response).

? (I have spent many years as a C++ programmer, so I do understand that my tolerance for both tracking down convoluted definitions and making sense of unwieldy code is probably substantially higher than most folks. I'm fairly sure that this trait is not exactly advantageous.)

The little test program mentioned above follows:

-module(test).
-compile(export_all).

start() ->
  application:ensure_all_started(shotgun),
  {ok, Pid} = shotgun:open("localhost", 8080),
  put(pid, Pid),
  ok.

q(Uri) ->
  case shotgun:get(get(pid), Uri) of
    {ok, R} ->
      case maps:find(body, R) of
        {ok, _} -> {ok, body};
        error -> {ok, no_body}
      end;
    Err -> {error, Err}
  end.

run() ->
  case(get(pid)) of
    undefined -> start();
    _ -> ok
  end,
  lists:foreach(fun({U, Expected}) ->
                    case q(U) of
                      Expected -> ok;
                      Err -> 
                        io:format("~p unexpected result ~p~n", [U, Err])
                    end
                end,
                [{"/", {ok, body}},
                 {"/thing", {ok, body}},
                 {"/thing/&q", {ok, body}},
                 {"/th&in;g/", {ok, body}},
                 {"/th&in;g", {ok, body}},
                 {"", {ok, no_body}},
                 {"thing", {ok, no_body}},
                 {"th&in;g", {ok, no_body}}]).

@jfacorro
Copy link
Contributor

@kennethlakin As you say, the check_uri is there so that shotgun users don't bang their head against the wall when the response they get is not what they expected and it turns out they were missing the leading slash all along 😛. Let's keep check_uri function unchanged and address that problem in issue #101.

This commit changes shotgun:open to optionally accept a map
containing Ranch transport options (keyed to transport_opts)
and/or a gun:await_up timeout (keyed to timeout). The default
values are [] and 5000, respectively.

shotgun:open now *blocks* until either gun establishes a connection,
or the user-specified timeout. This makes it more difficult to
send a request to gun before gun has established a connection with
the remote host.

If the timeout is reached, the FSM will be stopped with reason
'gun_open_timeout'. If gun fails while attempting to establish a connection,
the FSM will be stopped with reason 'gun_open_failed'. See the comments near
the code that handles this stuff for some somewhat important information.

In addition to the changes surrounding the use of gun:await_up, one can now
pass options through to the underlying Ranch transport. This allows you to
-for instance- bind a particular shotgun connection to a particular interface
on a multi-homed machine.

Existing users of open/2 or open/3 will not be broken by this change.
@kennethlakin
Copy link
Contributor Author

Sorry. I got off to a late start today.

So. Because I was going to be rebasing to remove my changes to check_uri, I took the liberty of reordering and merging some commits, replacing my hand-defined queue type with the queue:queue() type, and moving the redundant code in the $FSM_STATE/3 functions into one place (as you were gently suggesting that I do).

My little test program mentioned here #96 (comment) still works as expected.

I will be fixing up the text of the PR to reflect the new changes in a few moments.

I had assumed that one could use the Github branch diffing tool to easily review the differences between the current request-fixes branch and its previous state (saved as the request-fixes-old-PR-100 branch). But, despite Github telling me that the tip of both branches is the same as what's on my local machine, the results of its diff tool are radically different than the diff that git gives me. I have no idea what I'm doing wrong.

EDIT: Bah. I forgot to put spaces around the '=' character in a function head that I added. Fixed.

So, here's the diff between the two branches:

$ git diff request-fixes-old-PR-100 request-fixes
diff --git a/src/shotgun.erl b/src/shotgun.erl
index e1a8d10..9c16fbf 100644
--- a/src/shotgun.erl
+++ b/src/shotgun.erl
@@ -64,8 +64,6 @@
          receive_chunk/3
         ]).

--type queue() :: [].
-
 -type response() ::
         #{
            status_code => integer(),
@@ -405,24 +403,12 @@ at_rest({HttpVerb, Args, From}, State = #{pid := Pid}) ->

 -spec at_rest(term(), pid(), term()) -> term().
 at_rest(Event, From, State) ->
-  case create_work(Event, From) of
-    {ok, Work} ->
-      NewState = append_work(Work, State),
-      {next_state, at_rest, NewState, 0};
-    not_work ->
-      {stop, {unexpected, Event}, State}
-  end.
+  enqueue_work_or_stop(at_rest, Event, From, State).

 %% @private
 -spec wait_response(term(), pid(), term()) -> term().
 wait_response(Event, From, State) ->
-  case create_work(Event, From) of
-    {ok, Work} ->
-      NewState = append_work(Work, State),
-      {next_state, wait_response, NewState};
-    not_work ->
-      {stop, {unexpected, Event}, State}
-  end.
+  enqueue_work_or_stop(wait_response, Event, From, State).

 %% @private
 -spec wait_response(term(), term()) -> term().
@@ -468,13 +454,7 @@ wait_response(Event, State) ->
 %% @private
 -spec receive_data(term(), pid(), term()) -> term().
 receive_data(Event, From, State) ->
-  case create_work(Event, From) of
-    {ok, Work} ->
-      NewState = append_work(Work, State),
-      {next_state, receive_data, NewState};
-    not_work ->
-      {stop, {unexpected, Event}, State}
-  end.
+  enqueue_work_or_stop(receive_data, Event, From, State).

 %% @private
 %% @doc Regular response
@@ -502,13 +482,7 @@ receive_data({gun_error, _Pid, StreamRef, _Reason},
 %% @private
 -spec receive_chunk(term(), pid(), term()) -> term().
 receive_chunk(Event, From, State) ->
-  case create_work(Event, From) of
-    {ok, Work} ->
-      NewState = append_work(Work, State),
-      {next_state, receive_chunk, NewState};
-    not_work ->
-      {stop, {unexpected, Event}, State}
-  end.
+  enqueue_work_or_stop(receive_chunk, Event, From, State).

 %% @private
 %% @doc Chunked data response
@@ -536,7 +510,7 @@ clean_state() ->

 %% @private
 -spec clean_state(map()) -> map();
-                 (queue()) -> map().
+                 (queue:queue()) -> map().
 clean_state(State) when is_map(State) ->
   clean_state(get_pending_reqs(State));

@@ -649,6 +623,26 @@ sse_events(IsFin, Data, State = #{buffer := Buffer}) ->
     end.

 %% @private
+check_uri([$/ | _]) -> ok;
+check_uri(_) -> throw(missing_slash_uri).
+
+%% @private
+enqueue_work_or_stop(FSM = at_rest, Event, From, State) ->
+  enqueue_work_or_stop(FSM, Event, From, State, 0);
+enqueue_work_or_stop(FSM, Event, From, State) ->
+  enqueue_work_or_stop(FSM, Event, From, State, infinity).
+
+%% @private
+enqueue_work_or_stop(FSM, Event, From, State, Timeout) ->
+  case create_work(Event, From) of
+    {ok, Work} ->
+      NewState = append_work(Work, State),
+      {next_state, FSM, NewState, Timeout};
+    not_work ->
+      {stop, {unexpected, Event}, State}
+  end.
+
+%% @private
 create_work({M = get_async, {HandleEvent, AsyncMode}, Args}, From) ->
   {ok, {M, {HandleEvent, AsyncMode}, Args, From}};
 create_work({M, Args}, From)
@@ -682,9 +676,3 @@ append_work(Work, State) ->
 get_pending_reqs(State) ->
   maps:get(pending_requests, State).

-%% @private
-check_uri(U) ->
-  case string:chr(U, $/) of
-    0 -> throw(missing_slash_uri);
-    _ -> ok
-  end.

This commit adds a request queue to the shotgun FSM and three argument
versions of each FSM state function whose only purpose is to ether
stick valid client request data into the work queue and get back to
waiting, or stop the FSM if the client request data was invalid.
create_work/2 describes valid and invalid requests.

This lets us have multiple callers -directly or indirectly- call
shotgun:request while a gun request is still pending. Previously,
if the call happened when we were in any state other than at_rest,
the FSM would probably crash. This also means that if one's request
times out, one can immediately re-submit that request without
crashing shotgun. However, there *are* a few caveats:

* shotgun only processes one request at a time.
* Your new request is put on the end of the request queue.
* Your *previous* request is not cancelled. If shotgun never got
  around to servicing it, or is not done servicing it, then your
  old request that you no longer care about will be serviced before
  your new request.
* If the process that initiated a timed-out request is still alive,
  it will get a message in its mailbox when the request eventually
  succeeds. It also might get a message when the request eventually
  fails, I'm not sure.

The guts of at_rest/3 have been moved to at_rest/2. at_rest/3 now only
sticks client request data into the work queue and triggers the call
of at_rest/2.

Every transition from another state to at_rest now includes a timeout
value of 0. The purpose of at_rest(timeout, State) is to check for work
and either dispatch it to the FSM, or go back to the idle at_rest state.

There are also a few code style changes to appease Elvis.
jfacorro added a commit that referenced this pull request Oct 23, 2015
@jfacorro jfacorro merged commit 1b5bb0d into inaka:master Oct 23, 2015
@jfacorro
Copy link
Contributor

@kennethlakin Merged! 🎉 Thanks for your contribution.

We need to create tests for this library as soon as possible since it is getting bigger and quite complex.

@unbalancedparentheses
Copy link

👍

@kennethlakin kennethlakin deleted the request-fixes branch November 4, 2015 00:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants