Resource leak with aleph.http/create-connection and unreachable hosts #152

Closed
benmoss opened this issue Feb 17, 2015 · 12 comments

Comments

@benmoss

benmoss commented Feb 17, 2015

I'm not sure if this is really a problem with aleph or with manifold, but posting it here.

The original problem I had was in a test where I spin up an HTTP server in a daemon, run some tests against it, and then shut it down. The problem first manifested as the tests passing the first time, then failing with timed-out HTTP requests on subsequent runs in the same JVM process. After a lot of tracing, I've boiled it down to this gist: https://gist.github.com/benmoss/79acf300d8d2ba573648.

When you run lookup on an unreachable host, like in the comments at the bottom of the gist, you'll see the traced aleph.http/create-connection getting called repeatedly, with pauses in between. With a reachable host, the problem doesn't show up. My hunch is that timeout! isn't fully cancelling the repeated attempts on http/request.
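For reference, a minimal sketch of the pattern described above; the exact URL handling and the 1000 ms timeout are illustrative assumptions, and the full repro is in the linked gist:

(require '[aleph.http :as http]
         '[manifold.deferred :as d])

(defn lookup [url]
  ;; timeout! errors the response deferred after 1000 ms, but (as discussed
  ;; below) it does not necessarily cancel the underlying request machinery
  (d/timeout! (http/get url) 1000))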

@ztellman
Collaborator

Yes, timeout! will affect the underlying response deferred, but not necessarily the entire request machinery. Have you tried setting the :pool-timeout, :connection-timeout, or :request-timeout on the request itself?

@benmoss
Author

benmoss commented Feb 19, 2015

I hadn't, but trying them now with a timeout value of 1 on all three of them, I am still seeing requests go out via create-connection long after the function was called and the deferred resolved.

@ztellman
Collaborator

Okay, I'll investigate further, thanks.

@benmoss
Author

benmoss commented Feb 19, 2015

From my investigation so far:

  • It can be triggered with a GET request to "0.0.0.0:6666"; any local port with nothing listening works
  • It doesn't actually have anything to do with manifold; all the code I have now is just
(defn lookup [url]
  (http/get url {:pool-timeout 1
                 :request-timeout 1
                 :connection-timeout 1}))
  • Seems to only occur upon repeated failed requests
  • The shouldIncrement and generate fns seem to be called once per request the first two times my lookup fn is called, but 3-4 times on subsequent calls. shouldIncrement always returns true. After some number of calls (6-7), shouldIncrement and generate just keep getting called, as sketched below.
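A hedged sketch of how the repeated calls can be observed at a REPL; clojure.tools.trace is an assumption here (the original report only says that aleph.http/create-connection was traced):

(require '[aleph.http :as http]
         '[manifold.deferred :as d]
         '[clojure.tools.trace :as trace])

;; print every call to create-connection
(trace/trace-vars aleph.http/create-connection)

;; with nothing listening on the port, each call fails quickly, yet the trace
;; keeps showing create-connection being invoked after the deferreds resolve
(dotimes [_ 10]
  @(d/catch (lookup "http://0.0.0.0:6666")
            (fn [_] :failed)))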

@ztellman
Collaborator

I'm very close to having a fix for this; thanks for digging into it, though.

@ztellman
Collaborator

This looks to have been fallout from #140, which I should have caught. I've made it so that your original code, which applied timeout! to the response, will short-circuit acquiring a connection and sending the request if the timeout fires in time.

I don't see any further room for this sort of issue, but it's possible there's still some corner case lurking. Please let me know if you see anything else.

@benmoss
Author

benmoss commented Feb 19, 2015

Great, thank you!

@Gonzih

Gonzih commented Mar 3, 2016

Hi guys,

Was this change released? I just spotted another issue: workers leak in the connection pool if I remove the target server in the middle of a request (so the connection is not closed properly). After that, the num-workers stat constantly fluctuates between two values (even when no requests are coming to the server), so in the stats logs I see something like this:

connections.localhost-9002 -> 620
connections.localhost-9002 -> 557
connections.localhost-9002 -> 620
connections.localhost-9002 -> 557
connections.localhost-9002 -> 620
connections.localhost-9002 -> 557
connections.localhost-9002 -> 620
connections.localhost-9002 -> 557

I tried all the timeout options without success.
I was able to reproduce it by creating a lot of http/get requests to a simple web server that just calls Thread/sleep and blocks them. Before the timeout fires I send kill -9 to the target server to simulate failure, and after some time not all connections are released. I started digging into this because I noticed resources slowly leaking from my server until there are no available workers left. Is this related to this issue? Should I create a separate issue? Am I missing something here?

Thanks a lot!

UPD

This is the minimal code snippet I was able to reproduce the issue with:

;; assumed requires: [aleph.http :as aleph] and [aleph.flow :as flow]
(defonce client-connection-pool
  (aleph/connection-pool
   {:response-executor
    (flow/utilization-executor
      0.9 512
      {:stats-callback (partial stats-callback :client)})
    :connections-per-host 7000
    :total-connections 8000
    :target-utilization 0.9
    :stats-callback connections-stats-callback ; dumps stats into a file
    :connection-options {:keep-alive? false}}))

;; fire 15000 requests at the test server and force them with doall
(->> 15000
     range
     (map (fn [_]
            (aleph/get "http://localhost:4567"
                       {:follow-redirects false
                        :throw-exceptions false
                        :pool client-connection-pool
                        :pool-timeout 1e4
                        :connection-timeout 1e4
                        :request-timeout 4e4})))
     doall
     (map (fn [_] nil)))
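
Since the two stats callbacks above aren't shown, here is a hypothetical stand-in for them; the originals dump stats to a file, and nothing is assumed about the shape of the stats argument beyond it being printable:

;; hypothetical stand-ins for the callbacks referenced above
(defn stats-callback [component stats]
  ;; used as (partial stats-callback :client) for the response executor
  (println component stats))

(defn connections-stats-callback [stats]
  ;; append the raw connection-pool stats to a file
  (spit "connection-stats.log" (str stats "\n") :append true))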

Server:

require 'sinatra'

get '/' do
  sleep 120
  "HI"
end

Killing the ruby process in the middle of execution leaks workers in the pool.

@ztellman
Collaborator

ztellman commented Mar 3, 2016

This fix was included in 0.4.0, so I don't think these two issues are the same.

I'm a little unclear on the nature of your issue, though. You mention num-workers, which I think relates to the response executor, but also the number of connections, which is unrelated to the threads that are actively processing the requests. Are the total connections for the pool being exhausted, or threads in the response executor?

@Gonzih

Gonzih commented Mar 3, 2016

Connections for the connection pool are being exhausted.

@ztellman
Collaborator

ztellman commented Mar 3, 2016

Okay, can you open a new issue for this? I'll take a look.

@Gonzih

Gonzih commented Mar 3, 2016

Created #217, thanks!
