Hackney hangs in connect #183
It appears to be blocked in
...but the pool appears to have not noticed the call -- still sitting in
Any ideas how I can debug this further?
Hmmm. If I reset hackney in
I'd still like to have a proper solution, though...
Update: Spoke too soon 😞
Update update: Could be a red herring. I've been playing with CT timetrap and various other things that time out. Restarting hackney appears to work.
Are you correctly releasing the sockets in your tests? I.e. by reading the body? Anyway, there is a new pool system coming among other changes. I had to put that work on hold these last 2 weeks, but I expect to release the new hackney version at the end of the week. Among the changes, the number of connections is not limited; the pool is a lot simpler on that point. It also improves the handling of SSL connections.
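For anyone hitting this, here is a minimal sketch of what "releasing the socket" means in practice, using hackney's documented request/body/skip_body calls (the URL and the function name are illustrative):

```erlang
%% Sketch: every request must read (or skip) the body, or close the client
%% ref; otherwise the socket is never returned to the pool.
ping_server() ->
    Url = <<"http://localhost:8080/ping">>,   %% illustrative test-server URL
    {ok, Status, _Headers, ClientRef} =
        hackney:request(get, Url, [], <<>>, []),
    case Status of
        200 ->
            %% Reading the body hands the socket back to the pool.
            {ok, _Body} = hackney:body(ClientRef);
        _ ->
            %% Not interested in the body? Skip it (or hackney:close/1)
            %% so the connection is still released.
            ok = hackney:skip_body(ClientRef)
    end.
```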
I believe so. Is there any way to find out which sockets are still open / bodies still unread by poking around in the state? I've got a lot of tests, and it'd be useful to figure out which ones aren't cleaning up. It'd be nice if -- for simple scenarios -- I didn't need to worry about it; see #184.
I guess my question wrt debugging this is:
Sorry, somehow I didn't notice any notification on that topic. Pooling works that way in 1.x.x:
If the process handling the request goes down, the socket is closed. To debug you can use the trace facility available in 1.1.0, or the metrics, to check the number of sockets running, waiting in the pool, ..
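A rough sketch of those two debugging routes; treat the exact option names as assumptions to check against your hackney version's docs:

```erlang
%% Sketch: enable hackney's metrics (folsom backend) and trace facility,
%% then inspect the default pool. Assumes hackney >= 1.1.0 with folsom available.
debug_hackney() ->
    %% The metrics backend must be configured before hackney starts.
    application:set_env(hackney, mod_metrics, folsom),
    {ok, _} = application:ensure_all_started(hackney),
    %% Trace facility: dump hackney's internal traces to stdout.
    ok = hackney_trace:enable(max, io),
    %% Example query: sockets currently checked out of the default pool.
    folsom_metrics:get_metric_value([hackney_pool, default, in_use_count]).
```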
@benoitc Is using the
@zyro using this option will have the same result as getting the body. So yes, once the body is fetched the socket is released to the pool.
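The option under discussion is cut off above; assuming it is hackney's with_body request option, a minimal sketch of the equivalence being described:

```erlang
%% Sketch (assumes the option in question is `with_body`): the body is
%% returned directly and the socket goes back to the pool, so there is
%% no separate hackney:body/1 call to forget.
{ok, Status, RespHeaders, Body} =
    hackney:request(get, <<"http://example.com">>, [], <<>>, [with_body]).
```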
I had a similar problem: not getting bodies for all requests, thus eventually consuming the whole pool. Would it be possible to raise an exception in this case instead of timing out forever? That would have helped a lot in troubleshooting.
I've also had similar problems. I've worked around it by doing all requests without a pool.
I think we need to use something else instead of
The problem comes from
@lexmag That's not a solution to the underlying problem (if I understand the problem, that is). The problem is that all sockets in the pool are busy because they hang somewhere, so the pool is essentially empty forever. Adding a timeout to the checkout won't help, because the checkout would never succeed until the pool is restarted. Non-infinity timeouts are almost always a good idea, but with this design infinity is needed: if the call times out just as the pool replies with a socket, you will have a leak.
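To illustrate the race being described, a sketch against a hypothetical pool API (not hackney's): a finite call timeout can cross the pool's reply in flight, leaving a socket checked out with no owner:

```erlang
%% Hypothetical pool checkout with a finite timeout. If the call times out
%% just as the pool replies, the reply carrying the socket is already sent
%% and the socket is leaked unless the pool supports some cancel protocol.
checkout(Pool) ->
    try
        gen_server:call(Pool, checkout, 5000)
    catch
        exit:{timeout, _} ->
            %% The pool may still have granted the socket after this point;
            %% without a cancel message it honours, that socket never comes back.
            gen_server:cast(Pool, {cancel_checkout, self()}),
            {error, checkout_timeout}
    end.
```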
@ericmj I'm actually not saying it'll solve this problem anyhow. 😄
Right now the pool expects that all requests release or close their sockets. The max limit really means that you can't have more sockets out of the pool until you release them or notify the pool to ignore them (close). This is how it was redesigned years ago to answer some users' needs (not running out of fds). I do not think it should be handled that way today. There are probably better solutions to fix it directly in your app. The coming version (first bits will land tonight) fixes it, removing the barrier and only solving the following:
I will commit a branch later in the day. Any tests will be much appreciated :)
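Until that lands, one way to live with the current max limit is a dedicated pool sized for your workload; a sketch using hackney_pool options (the pool name and numbers are illustrative):

```erlang
%% Sketch: a dedicated pool with a higher connection limit, so a handful of
%% unreleased sockets does not exhaust the default pool.
ok = hackney_pool:start_pool(my_app_pool,
                             [{timeout, 150000},        %% keepalive timeout (ms)
                              {max_connections, 100}]),
{ok, _Status, _Headers, Ref} =
    hackney:request(get, <<"http://example.com">>, [], <<>>,
                    [{pool, my_app_pool}]).
```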
Hello, any news on this problem or a workaround? We are using the library in production and we have found a lot of processes blocked by this issue. Silviu
@silviucpp do you leave any requests open? (i.e. not reading the socket?) I have a coming patch but it took more time than expected.
No I don't :) I just call: hackney:request(post, Url, Headers, Payload, Options), and from time to time this method never returns. Looking at the process where it is blocked I see
Silviu
fwiw, I finally found one spot in the system tests (mentioned in the original description) where we weren't correctly reading the body. Being able to control the timeout, or being able to inspect the pool statistics, would be incredibly useful.
Thanks for the feedback! Makes sense: we are reading the body only when the result was 20x, but not for errors :) I will review my code.
I checked my code and it looks like:
{ok, StatusCode, _RespHeaders, ClientRef} = hackney:request(Method, Url, Headers, PayloadJson, Options),
{ok, Body} = hackney:body(ClientRef),
So the body is not read only if:
1. hackney throws an exception
I don't think in this case you need to call body/1...
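A sketch of one way to guard that path: handle the error return from hackney:body/1 (rather than crashing on the match) and close the client ref, so the socket cannot stay checked out forever; the function name is illustrative:

```erlang
%% Sketch: read the body, but make sure the socket is released even when
%% reading it fails.
request_json(Method, Url, Headers, PayloadJson, Options) ->
    {ok, StatusCode, _RespHeaders, ClientRef} =
        hackney:request(Method, Url, Headers, PayloadJson, Options),
    case hackney:body(ClientRef) of
        {ok, Body} ->
            {ok, StatusCode, Body};
        {error, Reason} ->
            %% Closing the ref cleans up the connection instead of
            %% leaving it checked out of the pool.
            hackney:close(ClientRef),
            {error, Reason}
    end.
```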
@rlipscombe you can use the metrics, but it can definitely be improved. Which kind of timeout? The connect timeout? @silviucpp how many connections do you request at the same time? The current implementation blocks at max pool size until a connection becomes available. I will have a new connection system available for tests by tomorrow anyway. Will update the thread asap with it.
Hello, My pool looks like:
I see processes frozen in
Also I see other connections running fine, so only some of them get frozen. The big timeout might be the problem? Silviu
@silviucpp I was seeing exactly the same with hex.
which timeout are you talking about?
The one from start_pool
the default keepalive timeout can be changed in the application environment. It's the "timeout" setting.
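A sketch of that setting (the 30000 ms value is just an example); it can equally go in sys.config under the hackney application:

```erlang
%% Shrink the default keepalive timeout; must be set before hackney starts.
%% Equivalent sys.config entry: {hackney, [{timeout, 30000}]}.
application:set_env(hackney, timeout, 30000).
```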
@silviucpp did you try to change it? Also, did you enable the metrics? I would be interested in the pool metrics.
No, I didn't try to change this. I have metrics enabled in production using folsom. Let me know what exactly you need. Silviu
If you can provide me a sample of the connect time and the metrics from the pool, that would be great. I will be blocked in a flight during the next 11h anyway, so will have time to look at it.
Hello, on the default pool I see at this moment, on the node where the stats are generated, around 10 frozen processes. Stats are:
io:format(<<"~p">>,[folsom_metrics:get_metric_value([hackney_pool, default, take_rate ])]).
[]
io:format(<<"~p ~n">>,[folsom_metrics:get_metric_value([hackney_pool, default, no_socket ])]).
74
io:format(<<"~p">>,[folsom_metrics:get_metric_value([hackney_pool, default, in_use_count])]).
[11,16,9,5,30,49,40,38,36,14,31,16,26,38,7,42,17,13,29,0,27,28,37,10,40,18,15,
10,8,26,22,45,34,42,33,44,24,4,21,23,2,36,32,22,46,28,48,6,6,12,34,12,19,14,
18,48,30,25,2,32,41,47,11,43,44,46,24,4,1,3,20,20,39,35]
io:format(<<"~p ~n">>,[folsom_metrics:get_metric_value([hackney_pool, default, free_count ])]).
[-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1]
io:format(<<"~p ~n">>,[folsom_metrics:get_metric_value([hackney_pool, default, queue_counter ])]).
[]
Something interesting as well. I removed my node from production and am just looking at the following stats:
folsom_metrics:get_metric_value([hackney, nb_requests]).
63
folsom_metrics:get_metric_value([hackney, total_requests]).
62408
folsom_metrics:get_metric_value([hackney, finished_requests]).
62345
I'm sure at this moment there is no request in progress, and I was expecting nb_requests to be 0...
Superseded by #276. Thanks for all the feedback. WIP :)
I'm using hackney in my system tests. Each suite starts a web server (implemented in node.js) and then I use hackney to talk to that web server.
After a number of tests have run, hackney simply hangs, and I have to wait for CT to kill the test suite.
I use httpc:get() to ensure that the web server is up before running the tests, so that's not the problem.
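For completeness, a sketch of that readiness check using inets/httpc (the URL, retry interval, and function name are illustrative):

```erlang
%% Sketch: poll the node.js test server until it answers, before the suite runs.
wait_for_server(Url) ->
    {ok, _} = application:ensure_all_started(inets),
    case httpc:request(get, {Url, []}, [{timeout, 1000}], []) of
        {ok, _Response} ->
            ok;
        {error, _Reason} ->
            timer:sleep(200),
            wait_for_server(Url)
    end.
```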