unexpected connection_lost() call when cancelling a direct_tls connection with an active request can lead to a connection leak #1211

jraby commented Nov 15, 2024

  • asyncpg version: 0.30.0
  • PostgreSQL version: 16
  • Do you use a PostgreSQL SaaS? If so, which? Can you reproduce the issue with a local PostgreSQL install?: CloudSQL with the Cloud SQL connector, but it is possible to reproduce the issue locally with a direct_tls setup.
  • Python version: 3.12
  • Platform: linux / macos
  • Do you use pgbouncer?: no
  • Did you install asyncpg with pip?: poetry
  • If you built asyncpg locally, which version of Cython did you use?:
  • Can the issue be reproduced under both asyncio and uvloop?: the reproducer uses asyncio, but I hit the problem "in prod" with uvicorn / uvloop.

Hi, I reported the issue first to sqlalchemy (sqlalchemy/sqlalchemy#12099) but managed to reproduce it using only asyncpg.

From what I could observe, if a direct_tls connection is waiting for postgres to reply and the task in which this connection lives is cancelled, calling connection.close() in an exception handler fails with unexpected connection_lost() call, leaving the connection open. (A call to connection.terminate() after that doesn't close the connection either.)

The problem only occurs with direct_tls=True.
The issue was first observed with sqlalchemy and a connection pool of asyncpg connections: SQLAlchemy thought the connections were closed and would open new ones, which led to using up all the available connection slots on the postgres side.

This script can reproduce the issue; however, it needs a postgres setup where direct_tls can be used.
There is a docker-compose file at https://github.com/brittlesoft/repro-starlette-sa-conn-leak that can be used to get a working setup quickly, and a rough local stand-in is sketched just below, before the reproducer script.
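If you don't have that setup handy, a minimal TLS-terminating proxy in front of a plain postgres can act as a stand-in (this sketch is not part of the repro repo; it assumes a self-signed server.crt/server.key pair and postgres listening on localhost:5432, and asyncpg's ssl mode must not verify the self-signed cert):

# Rough stand-in for the docker-compose setup: terminate TLS on port 5443 and
# forward the decrypted bytes to a plain postgres on localhost:5432, so that
# asyncpg can connect with direct_tls=True.
import asyncio
import ssl


async def pipe(reader, writer):
    try:
        while data := await reader.read(65536):
            writer.write(data)
            await writer.drain()
    finally:
        writer.close()


async def handle(client_reader, client_writer):
    # One plaintext connection to the real backend per TLS client.
    pg_reader, pg_writer = await asyncio.open_connection("localhost", 5432)
    await asyncio.gather(
        pipe(client_reader, pg_writer),
        pipe(pg_reader, client_writer),
        return_exceptions=True,
    )


async def main():
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.load_cert_chain("server.crt", "server.key")  # self-signed pair, e.g. from openssl
    server = await asyncio.start_server(handle, "127.0.0.1", 5443, ssl=ctx)
    async with server:
        await server.serve_forever()


if __name__ == "__main__":
    asyncio.run(main())

The reproducer script: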

import asyncio
import asyncpg

import logging
logging.basicConfig(level=logging.DEBUG)

async def do(i):
    try:
        # connect using direct_tls leads to `unexpected connection_lost() call` when calling conn.close()
        conn = await asyncpg.connect('postgresql://postgres:postgres@localhost:5443/postgres', direct_tls=True)

        # connect using default params works fine
        #conn = await asyncpg.connect('postgresql://postgres:postgres@localhost:5432/postgres')

        # lock and simulate work (or block if lock already taken) -- using select pg_sleep(10) would also work
        await conn.execute("select pg_advisory_lock(1234)")
        await asyncio.sleep(10)

    except BaseException as e:
        print(i,"got exc:", e, type(e))
        try:
            await conn.close(timeout=2)
        except BaseException as e:
            print(i, "close got exc: ",e)

            # NOTE: aborting transport here seems to release the connection
            #conn._transport.abort()

            try:
                print(i, "calling terminate")
                conn.terminate()
            except BaseException as e:
                print(i, "terminate got exc: ",e)


async def main():
    ts = []
    for i in range(10):
        ts.append(asyncio.create_task(do(i)))

    async def timeouter():
        await asyncio.sleep(1)
        for t in ts:
            t.cancel()

    asyncio.create_task(timeouter())

    try:
        await asyncio.gather(*ts)
    except asyncio.CancelledError:
        print("cancelled")

    # Sleep so we can observe the state of the connections
    await asyncio.sleep(30)

if __name__ == '__main__':
    asyncio.run(main())

output:

DEBUG:asyncio:Using selector: KqueueSelector
0 got exc:  <class 'asyncio.exceptions.CancelledError'>
1 got exc:  <class 'asyncio.exceptions.CancelledError'>
2 got exc:  <class 'asyncio.exceptions.CancelledError'>
3 got exc:  <class 'asyncio.exceptions.CancelledError'>
4 got exc:  <class 'asyncio.exceptions.CancelledError'>
5 got exc:  <class 'asyncio.exceptions.CancelledError'>
6 got exc:  <class 'asyncio.exceptions.CancelledError'>
7 got exc:  <class 'asyncio.exceptions.CancelledError'>
8 got exc:  <class 'asyncio.exceptions.CancelledError'>
9 got exc:  <class 'asyncio.exceptions.CancelledError'>
1 close got exc:  unexpected connection_lost() call
1 calling terminate
7 close got exc:  unexpected connection_lost() call
7 calling terminate
0 close got exc:  unexpected connection_lost() call
0 calling terminate
4 close got exc:  unexpected connection_lost() call
4 calling terminate
5 close got exc:  unexpected connection_lost() call
5 calling terminate
3 close got exc:  unexpected connection_lost() call
3 calling terminate
9 close got exc:  unexpected connection_lost() call
9 calling terminate
6 close got exc:  unexpected connection_lost() call
6 calling terminate
8 close got exc:  unexpected connection_lost() call
8 calling terminate

And in postgres we see this:

 2024-11-15 13:29:02.164078+00 | 2024-11-15 13:29:02.164078+00 | 2024-11-15 13:29:02.164078+00 | Lock            | advisory            | active |             |         1584 |          | select pg_advisory_lock(1234)    | client backend
 2024-11-15 13:29:02.188851+00 | 2024-11-15 13:29:02.188851+00 | 2024-11-15 13:29:02.188852+00 | Lock            | advisory            | active |             |         1584 |          | select pg_advisory_lock(1234)    | client backend
                               | 2024-11-15 13:29:02.163992+00 | 2024-11-15 13:29:03.04934+00  | Client          | ClientRead          | idle   |             |              |          | select pg_advisory_lock(1234)    | client backend
 2024-11-15 13:29:02.163992+00 | 2024-11-15 13:29:02.163992+00 | 2024-11-15 13:29:02.163994+00 | Lock            | advisory            | active |             |         1584 |          | select pg_advisory_lock(1234)    | client backend
 2024-11-15 13:29:02.189558+00 | 2024-11-15 13:29:02.189558+00 | 2024-11-15 13:29:02.189558+00 | Lock            | advisory            | active |             |         1584 |          | select pg_advisory_lock(1234)    | client backend
 2024-11-15 13:29:02.188884+00 | 2024-11-15 13:29:02.188884+00 | 2024-11-15 13:29:02.188885+00 | Lock            | advisory            | active |             |         1584 |          | select pg_advisory_lock(1234)    | client backend
 2024-11-15 13:29:02.188826+00 | 2024-11-15 13:29:02.188826+00 | 2024-11-15 13:29:02.188829+00 | Lock            | advisory            | active |             |         1584 |          | select pg_advisory_lock(1234)    | client backend
 2024-11-15 13:29:02.189478+00 | 2024-11-15 13:29:02.189478+00 | 2024-11-15 13:29:02.189479+00 | Lock            | advisory            | active |             |         1584 |          | select pg_advisory_lock(1234)    | client backend
 2024-11-15 13:29:02.19116+00  | 2024-11-15 13:29:02.19116+00  | 2024-11-15 13:29:02.191161+00 | Lock            | advisory            | active |             |         1584 |          | select pg_advisory_lock(1234)    | client backend
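As noted in the comment inside the reproducer, aborting the underlying transport does release the connection. As a stopgap, a cleanup helper along these lines can be used (a sketch only; _transport is a private asyncpg attribute, so this is a workaround rather than a supported API):

async def force_close(conn, timeout=2):
    # Try the normal close first; if it fails with
    # "unexpected connection_lost() call", abort the transport directly,
    # which tears down the socket and frees the server-side slot.
    try:
        await conn.close(timeout=timeout)
    except Exception:
        transport = getattr(conn, "_transport", None)
        if transport is not None:
            transport.abort()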