Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Connection pool leak issue - 1.0 #486

Closed
salahqasem opened this issue Oct 16, 2019 · 2 comments
Closed

Connection pool leak issue - 1.0 #486

salahqasem opened this issue Oct 16, 2019 · 2 comments
Assignees
Labels

Comments

@salahqasem
Copy link
Contributor

in Styx 1.0 there is a connection pool leak issue, the pool sometime stuck in a state where is the borrowed connection count equal to the MaxConnectionPerHost count, in this state no new connections can be created and the styx will start returning 500s.

Steps to reproduce:

1- create a connection pool with:
a- low number of MaxConnectionPerHost & MaxPendingConnections
b- low number of MaxPendingConnectionTimeOut for example 1s

2- set a breakpoint in the SimpleConnectionPool class at line 77 Connection connection = dequeue(); inside the borrowConnection method.

3- open the browser and open an N number of tabs where N = MaxConnectionPerHost + MaxPendingConnections.

4- make a request for the origin in each tab.

5- start releasing the breakpoint one by one.

After these steps you should start seeing the behaviour of the 500s. any request you will make after this will return a 500 response code.

@mikkokar
Copy link
Contributor

mikkokar commented Oct 17, 2019

Root cause analysis

I see a problem in the connection pool (thanks for providing the necessary pointers @salahqasem ). There is definitely a window for a race condition, when:

Overview

  1. A waiting subscriber has been disposed (cancelled, pending connection timeout, etc), but not yet removed from the waitingSubscribers queue.
  2. A TCP connection becomes available, and it is handed out to this waiting subscriber. A borrowed count is incremented, but the connection itself leaks because the Mono subscriber is already in the terminal state.
  3. The leaked connection cannot be returned back to pool, therefore the borrowed count drifts up permanently.

A connection can become available in two ways:

  • Borrower returns a connection back
  • A TCP connection establishment completes

Threading model

borrowConnection is usually called from styx-proxy thread pool. In some corner cases it can be called from the styx-client threads.

When a TCP establishment completes, queueNewConnection is called from styx-client threads.

Max pending connection timeout timer runs from Reactor core computation threads.

Code

This is where a newly available connection is handed out to a waiting subscriber. If the subscriber is already in terminal state, the connection leaks, and the borrowedCount drifts up by one:

     private void queueNewConnection(Connection connection) {
         MonoSink<Connection> subscriber = waitingSubscribers.poll();
         if (subscriber == null) {
             availableConnections.add(connection);
>        } else {
>            borrowedCount.incrementAndGet();
>            subscriber.success(connection);
>        }
    }

@OwenLindsell
Copy link
Contributor

Fixed by PR #504

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants