
Prepared statements being recreated on every call of fetch #453

Closed
mklokocka opened this issue Jun 6, 2019 · 3 comments

Comments

@mklokocka

  • asyncpg version: 0.18.3
  • PostgreSQL version: 9.4
  • Do you use a PostgreSQL SaaS? If so, which? Can you reproduce
    the issue with a local PostgreSQL install?: I use the official
    Docker image with a custom schema and test data.
  • Python version: 3.7
  • Platform: macOS
  • Do you use pgbouncer?: No
  • Did you install asyncpg with pip?: Yes
  • If you built asyncpg locally, which version of Cython did you use?: -
  • Can the issue be reproduced under both asyncio and uvloop?: Yes

Hello,

I am facing a peculiar problem with the way prepared statements are handled. I use the following architecture:

An aiohttp application, which initializes a pool of 1 to 20 DB connections on startup.

Data is periodically refreshed from the DB (once every few minutes for most tables). A special class handles loading data from the DB and caches it in memory and in Redis (since multiple containers of the same app are running and I would like to minimize fetches from the DB). This class is instantiated by a factory method which creates (besides other arguments) a load coroutine, and the query is passed into it by the factory.

The queries have no parameters and are static throughout the runtime.

The load coroutine works by getting a connection from the pool and calling connection.fetch with the given query. As I understand it, the query should then be turned into a prepared statement, cached in the built-in LRU cache, and reused in later calls. However, each periodic call to load seems to get a fresh, empty LRU cache, so the prepared statements are created anew. When I run connection.fetch on SELECT * FROM pg_prepared_statements, I see that the number of prepared statements held by the connection grows with each call to fetch.

Indeed, after adding some prints to connection.py I found that the statements get recreated and put into the cache on each call, because the cache is empty. I thought this might be because the connections I get from the pool differ, but since pg_prepared_statements is local to a session (a connection), I think that is not the case. Indeed, limiting the pool to max_size=1 did not solve the issue.

This causes my Postgres to slowly consume more and more memory until the connections are reset. Disabling the LRU cache with statement_cache_size=0 avoids this, but I believe this behaviour is not intended.

I tried to make a minimal reproducer but haven't yet succeeded.

@mklokocka (Author)

Further experimentation proved that the problem lies in setting a custom type codec after getting a connection from the pool.

Minimal reproducer (it should not matter which Postgres instance you connect to or which table you fetch from):

import asyncio
import asyncpg

dsn = "postgres://postgres:postgres@localhost:5432/postgres"

postgres_pool = None

async def init():
    print("init")
    global postgres_pool

    postgres_pool = await asyncpg.create_pool(dsn=dsn, min_size=1, max_size=20)

async def close():
    print("close")
    global postgres_pool

    await postgres_pool.close()

async def run_fetch():
    global postgres_pool

    while True:
        print("fetching")
        async with postgres_pool.acquire() as con:
            await con.set_type_codec(
                "numeric", schema="pg_catalog", format="text",
                encoder=str, decoder=float,
            )
            async with con.transaction():
                await con.fetch("SELECT * FROM test_table")
            stmts = await con.fetch("SELECT * FROM pg_prepared_statements")
            print(len(stmts))
        await asyncio.sleep(10)


async def run():
    await init()

    try:
        await run_fetch()
    except asyncio.CancelledError:
        await close()


if __name__ == "__main__":
    loop = asyncio.get_event_loop()

    loop.run_until_complete(run())

@mklokocka (Author)

The reason for this is

self._drop_local_statement_cache()

I see that the solution is to use the init argument of the pool, as in #221.

However, shouldn't the prepared statements already kept in Postgres be invalidated on codec change to avoid eating useless memory?

@un-def (Contributor)

un-def commented Jun 7, 2019

It has already been fixed, see issue #416 and PR #417.
However, a new version with that fix hasn't been released yet.
