Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simple query runs long when DB schema contains thousands of tables #186

Closed
sergeyspatar opened this issue Sep 7, 2017 · 7 comments
Closed
Assignees
Labels

Comments

@sergeyspatar
Copy link

  • asyncpg version: 0.12
  • PostgreSQL version: 9.6.4
  • Python version: 3.5.3
  • Platform: Linux x86_64 (Debian Stretch)
  • Do you use pgbouncer?: no
  • Did you install asyncpg with pip?: no
  • If you built asyncpg locally, which version of Cython did you use?: 0.25.2
  • Can the issue be reproduced under both asyncio and
    uvloop?
    : yes

My DB schema contains thousands of tables inherited from one table. These tables store some logs or historical information and have around 100 columns. As a result, PostgreSQL information schema table pg_attribute contains enormous number of records.

I use my own DB user sergey to connect to PostgreSQL who is a superuser. When I connect as this user, some simple queries run very long in asyncpg.

Preparation. This code creates 5000 inherited tables with 100 columns to prepare similar environment for the test cases below:

import asyncpg
import asyncio

loop = asyncio.get_event_loop()

async def run():
    conn = await asyncpg.connect(user='sergey')
    await conn.execute(
        'create table testlog ({})'.format(
            ','.join('c{:02} varchar'.format(n) for n in range(100))
        )
    )
    for n in range(5000):
        await conn.execute(
            'create table testlog_{:04} () inherits (testlog)'.format(n)
        )

loop.run_until_complete(run())

Example 1: Connect as sergey and run this simple query:

import asyncpg
import asyncio

#import uvloop
#asyncio.set_event_loop(uvloop.new_event_loop())

loop = asyncio.get_event_loop()

async def run():
    conn = await asyncpg.connect(user='sergey')
    return await conn.fetch('select $1::integer[]', [1, 2, 3])

print(loop.run_until_complete(run()))

While it was running for about 2 minutes, I saw this query in pg_activity_stat:

WITH RECURSIVE typeinfo_tree(
    oid, ns, name, kind, basetype, has_bin_io, elemtype, elemdelim,
    range_subtype, elem_has_bin_io, attrtypoids, attrnames, depth)
AS (
    WITH composite_attrs
    AS (
    SELECT
        c.reltype                    AS comptype_oid,
        array_agg(ia.atttypid ORDER BY ia.attnum)    AS typoids,
        array_agg(ia.attname::text ORDER BY ia.attnum)   AS names
    FROM
        pg_attribute ia
        INNER JOIN pg_class c
        ON (ia.attrelid = c.oid)
    WHERE
        ia.attnum > 0 AND NOT ia.attisdropped
    GROUP BY
        c.reltype
    ),

    typeinfo
    AS (
    SELECT
        t.oid               AS oid,
        ns.nspname              AS ns,
        t.typname               AS name,
        t.typtype               AS kind,
        (CASE WHEN t.typtype = 'd' THEN
        (WITH RECURSIVE typebases(oid, depth) AS (

Unfortunately, pg_activity_stat does not show the whole query but only first part. I guess the slow execution has to do with the size of pg_attribute.

Example 2: Now connect as user postgres and run the same query. It is still not instant but much faster (~1 second):

import asyncpg
import asyncio

#import uvloop
#asyncio.set_event_loop(uvloop.new_event_loop())

loop = asyncio.get_event_loop()

async def run():
    conn = await asyncpg.connect(user='postgres')
    return await conn.fetch('select $1::integer[]', [1, 2, 3])

print(loop.run_until_complete(run()))

Example 3: Not all queries execute long. This one is fast:

import asyncpg
import asyncio

#import uvloop
#asyncio.set_event_loop(uvloop.new_event_loop())

loop = asyncio.get_event_loop()

async def run():
    conn = await asyncpg.connect(user='sergey')
    return await conn.fetch('select $1::integer', 1)

print(loop.run_until_complete(run()))

P.S. I tried vacuum full analyze pg_attribute and vacuum full analyze pg_class, it did not help.

@elprans
Copy link
Member

elprans commented Sep 7, 2017

Yes, seems to be a pathology in the type introspection query.

@elprans elprans added the bug label Sep 7, 2017
@elprans elprans self-assigned this Sep 7, 2017
elprans added a commit that referenced this issue Sep 7, 2017
Stop using CTEs in the type introspection query, otherwise it runs for
ages on databases with a large number of composite attributes (i. e.
tons of tables with tons of columns).

Fixes: #186
@sergeyspatar
Copy link
Author

@elprans, I tested your fix. Now it's blazing fast. Thank you for quick fixing!

@shvechikov
Copy link

I'm having the same issue with the last version of asyncpg (0.19.0). Still thinking on how to reproduce it and share.

@elprans
Copy link
Member

elprans commented Nov 8, 2019

@shvechikov Are you running postgres with jit = on? If so, that might be what's causing latency on cold connections.

@s-ilyas
Copy link

s-ilyas commented Apr 8, 2020

I'm on asyncpg (0.19.0) and also still seeing this in my auto_explain generated log.
I am investigating a slow query and have suspicions (not fully sure yet) that this may be a culprit.
Would be immensely grateful if @elprans you could assist!

LOG:  duration: 1.058 ms  plan:
	Query Text: WITH RECURSIVE typeinfo_tree(
	    oid, ns, name, kind, basetype, has_bin_io, elemtype, elemdelim,
	    range_subtype, elem_has_bin_io, attrtypoids, attrnames, depth)
	AS (
	    SELECT
	        ti.oid, ti.ns, ti.name, ti.kind, ti.basetype, ti.has_bin_io,
	        ti.elemtype, ti.elemdelim, ti.range_subtype, ti.elem_has_bin_io,
	        ti.attrtypoids, ti.attrnames, 0
	    FROM
	            (
	        SELECT
	            t.oid                           AS oid,
	            ns.nspname                      AS ns,
	            t.typname                       AS name,
	            t.typtype                       AS kind,
	            (CASE WHEN t.typtype = 'd' THEN
	                (WITH RECURSIVE typebases(oid, depth) AS (
	                    SELECT
	                        t2.typbasetype      AS oid,
	                        0                   AS depth
	                    FROM
	                        pg_type t2
	                    WHERE
	                        t2.oid = t.oid
	
	                    UNION ALL
	
	                    SELECT
	                        t2.typbasetype      AS oid,
	                        tb.depth + 1        AS depth
	                    FROM
	                        pg_type t2,
	                        typebases tb
	                    WHERE
	                       tb.oid = t2.oid
	                       AND t2.typbasetype != 0
	               ) SELECT oid FROM typebases ORDER BY depth DESC LIMIT 1)
	
	               ELSE NULL
	            END)                            AS basetype,
	            t.typreceive::oid != 0 AND t.typsend::oid != 0
	                                            AS has_bin_io,
	            t.typelem                       AS elemtype,
	            elem_t.typdelim                 AS elemdelim,
	            range_t.rngsubtype              AS range_subtype,
	            (CASE WHEN t.typtype = 'r' THEN
	                (SELECT
	                    range_elem_t.typreceive::oid != 0 AND
	                        range_elem_t.typsend::oid != 0
	                FROM
	                    pg_catalog.pg_type AS range_elem_t
	                WHERE
	                    range_elem_t.oid = range_t.rngsubtype)
	            ELSE
	                elem_t.typreceive::oid != 0 AND
	                    elem_t.typsend::oid != 0
	            END)                            AS elem_has_bin_io,
	            (CASE WHEN t.typtype = 'c' THEN
	                (SELECT
	                    array_agg(ia.atttypid ORDER BY ia.attnum)
	                FROM
	                    pg_attribute ia
	                    INNER JOIN pg_class c
	                        ON (ia.attrelid = c.oid)
	                WHERE
	                    ia.attnum > 0 AND NOT ia.attisdropped
	                    AND c.reltype = t.oid)
	
	                ELSE NULL
	            END)                            AS attrtypoids,
	            (CASE WHEN t.typtype = 'c' THEN
	                (SELECT
	                    array_agg(ia.attname::text ORDER BY ia.attnum)
	                FROM
	                    pg_attribute ia
	                    INNER JOIN pg_class c
	                        ON (ia.attrelid = c.oid)
	                WHERE
	                    ia.attnum > 0 AND NOT ia.attisdropped
	                    AND c.reltype = t.oid)
	
	                ELSE NULL
	            END)                            AS attrnames
	        FROM
	            pg_catalog.pg_type AS t
	            INNER JOIN pg_catalog.pg_namespace ns ON (
	                ns.oid = t.typnamespace)
	            LEFT JOIN pg_type elem_t ON (
	                t.typlen = -1 AND
	                t.typelem != 0 AND
	                t.typelem = elem_t.oid
	            )
	            LEFT JOIN pg_range range_t ON (
	                t.oid = range_t.rngtypid
	            )
	    )
	 AS ti
	    WHERE
	        ti.oid = any($1::oid[])
	
	    UNION ALL
	
	    SELECT
	        ti.oid, ti.ns, ti.name, ti.kind, ti.basetype, ti.has_bin_io,
	        ti.elemtype, ti.elemdelim, ti.range_subtype, ti.elem_has_bin_io,
	        ti.attrtypoids, ti.attrnames, tt.depth + 1
	    FROM
	            (
	        SELECT
	            t.oid                           AS oid,
	            ns.nspname                      AS ns,
	            t.typname                       AS name,
	            t.typtype                       AS kind,
	            (CASE WHEN t.typtype = 'd' THEN
	                (WITH RECURSIVE typebases(oid, depth) AS (
	                    SELECT
	                        t2.typbasetype      AS oid,
	                        0                   AS depth
	                    FROM
	                        pg_type t2
	                    WHERE
	                        t2.oid = t.oid
	
	                    UNION ALL
	
	                    SELECT
	                        t2.typbasetype      AS oid,
	                        tb.depth + 1        AS depth
	                    FROM
	                        pg_type t2,
	                        typebases tb
	                    WHERE
	                       tb.oid = t2.oid
	                       AND t2.typbasetype != 0
	               ) SELECT oid FROM typebases ORDER BY depth DESC LIMIT 1)
	
	               ELSE NULL
	            END)                            AS basetype,
	            t.typreceive::oid != 0 AND t.typsend::oid != 0
	                                            AS has_bin_io,
	            t.typelem                       AS elemtype,
	            elem_t.typdelim                 AS elemdelim,
	            range_t.rngsubtype              AS range_subtype,
	            (CASE WHEN t.typtype = 'r' THEN
	                (SELECT
	                    range_elem_t.typreceive::oid != 0 AND
	                        range_elem_t.typsend::oid != 0
	                FROM
	                    pg_catalog.pg_type AS range_elem_t
	                WHERE
	                    range_elem_t.oid = range_t.rngsubtype)
	            ELSE
	                elem_t.typreceive::oid != 0 AND
	                    elem_t.typsend::oid != 0
	            END)                            AS elem_has_bin_io,
	            (CASE WHEN t.typtype = 'c' THEN
	                (SELECT
	                    array_agg(ia.atttypid ORDER BY ia.attnum)
	                FROM
	                    pg_attribute ia
	                    INNER JOIN pg_class c
	                        ON (ia.attrelid = c.oid)
	                WHERE
	                    ia.attnum > 0 AND NOT ia.attisdropped
	                    AND c.reltype = t.oid)
	
	                ELSE NULL
	            END)                            AS attrtypoids,
	            (CASE WHEN t.typtype = 'c' THEN
	                (SELECT
	                    array_agg(ia.attname::text ORDER BY ia.attnum)
	                FROM
	                    pg_attribute ia
	                    INNER JOIN pg_class c
	                        ON (ia.attrelid = c.oid)
	                WHERE
	                    ia.attnum > 0 AND NOT ia.attisdropped
	                    AND c.reltype = t.oid)
	
	                ELSE NULL
	            END)                            AS attrnames
	        FROM
	            pg_catalog.pg_type AS t
	            INNER JOIN pg_catalog.pg_namespace ns ON (
	                ns.oid = t.typnamespace)
	            LEFT JOIN pg_type elem_t ON (
	                t.typlen = -1 AND
	                t.typelem != 0 AND
	                t.typelem = elem_t.oid
	            )
	            LEFT JOIN pg_range range_t ON (
	                t.oid = range_t.rngtypid
	            )
	    )
	 ti,
	        typeinfo_tree tt
	    WHERE
	        (tt.elemtype IS NOT NULL AND ti.oid = tt.elemtype)
	        OR (tt.attrtypoids IS NOT NULL AND ti.oid = any(tt.attrtypoids))
	        OR (tt.range_subtype IS NOT NULL AND ti.oid = tt.range_subtype)
	)
	
	SELECT DISTINCT
	    *
	FROM
	    typeinfo_tree
	ORDER BY
	    depth DESC
	
	Unique  (cost=264942.97..264970.66 rows=791 width=216)
	  CTE typeinfo_tree
	    ->  Recursive Union  (cost=8.84..264889.07 rows=791 width=216)
	          ->  Nested Loop Left Join  (cost=8.84..357.64 rows=1 width=216)
	                Join Filter: (t.oid = range_t.rngtypid)
	                ->  Nested Loop Left Join  (cost=8.84..23.79 rows=1 width=154)
	                      Join Filter: ((t.typlen = '-1'::integer) AND (t.typelem <> '0'::oid))
	                      ->  Nested Loop  (cost=8.57..15.48 rows=1 width=147)
	                            Join Filter: (t.typnamespace = ns.oid)
	                            ->  Seq Scan on pg_namespace ns  (cost=0.00..1.06 rows=6 width=68)
	                            ->  Materialize  (cost=8.57..14.25 rows=2 width=87)
	                                  ->  Bitmap Heap Scan on pg_type t  (cost=8.57..14.24 rows=2 width=87)
	                                        Recheck Cond: (oid = ANY ('{17456,17451}'::oid[]))
	                                        ->  Bitmap Index Scan on pg_type_oid_index  (cost=0.00..8.57 rows=2 width=0)
	                                              Index Cond: (oid = ANY ('{17456,17451}'::oid[]))
	                      ->  Index Scan using pg_type_oid_index on pg_type elem_t  (cost=0.28..8.29 rows=1 width=13)
	                            Index Cond: (t.typelem = oid)
	                ->  Seq Scan on pg_range range_t  (cost=0.00..1.06 rows=6 width=8)
	                SubPlan 2
	                  ->  Limit  (cost=214.85..214.85 rows=1 width=8)
	                        CTE typebases
	                          ->  Recursive Union  (cost=0.28..214.58 rows=11 width=8)
	                                ->  Index Scan using pg_type_oid_index on pg_type t2  (cost=0.28..8.29 rows=1 width=8)
	                                      Index Cond: (oid = t.oid)
	                                ->  Hash Join  (cost=0.33..20.61 rows=1 width=8)
	                                      Hash Cond: (t2_1.oid = tb.oid)
	                                      ->  Seq Scan on pg_type t2_1  (cost=0.00..20.25 rows=5 width=8)
	                                            Filter: (typbasetype <> '0'::oid)
	                                      ->  Hash  (cost=0.20..0.20 rows=10 width=8)
	                                            ->  WorkTable Scan on typebases tb  (cost=0.00..0.20 rows=10 width=8)
	                        ->  Sort  (cost=0.28..0.30 rows=11 width=8)
	                              Sort Key: typebases.depth DESC
	                              ->  CTE Scan on typebases  (cost=0.00..0.22 rows=11 width=8)
	                SubPlan 3
	                  ->  Index Scan using pg_type_oid_index on pg_type range_elem_t  (cost=0.28..8.30 rows=1 width=1)
	                        Index Cond: (oid = range_t.rngsubtype)
	                SubPlan 4
	                  ->  Aggregate  (cost=54.75..54.76 rows=1 width=32)
	                        ->  Nested Loop  (cost=0.28..54.72 rows=11 width=6)
	                              ->  Seq Scan on pg_class c  (cost=0.00..22.59 rows=2 width=4)
	                                    Filter: (reltype = t.oid)
	                              ->  Index Scan using pg_attribute_relid_attnum_index on pg_attribute ia  (cost=0.28..16.02 rows=5 width=10)
	                                    Index Cond: ((attrelid = c.oid) AND (attnum > 0))
	                                    Filter: (NOT attisdropped)
	                SubPlan 5
	                  ->  Aggregate  (cost=54.78..54.79 rows=1 width=32)
	                        ->  Nested Loop  (cost=0.28..54.72 rows=11 width=66)
	                              ->  Seq Scan on pg_class c_1  (cost=0.00..22.59 rows=2 width=4)
	                                    Filter: (reltype = t.oid)
	                              ->  Index Scan using pg_attribute_relid_attnum_index on pg_attribute ia_1  (cost=0.28..16.02 rows=5 width=70)
	                                    Index Cond: ((attrelid = c_1.oid) AND (attnum > 0))
	                                    Filter: (NOT attisdropped)
	          ->  Hash Left Join  (cost=28.32..26451.56 rows=79 width=216)
	                Hash Cond: (t_1.oid = range_t_1.rngtypid)
	                ->  Hash Left Join  (cost=27.19..165.69 rows=79 width=158)
	                      Hash Cond: (t_1.typelem = elem_t_1.oid)
	                      Join Filter: ((t_1.typlen = '-1'::integer) AND (t_1.typelem <> '0'::oid))
	                      ->  Nested Loop  (cost=1.14..139.44 rows=79 width=151)
	                            Join Filter: (((tt.elemtype IS NOT NULL) AND (t_1.oid = tt.elemtype)) OR ((tt.attrtypoids IS NOT NULL) AND (t_1.oid = ANY (tt.attrtypoids))) OR ((tt.range_subtype IS NOT NULL) AND (t_1.oid = tt.range_subtype)))
	                            ->  WorkTable Scan on typeinfo_tree tt  (cost=0.00..0.20 rows=10 width=44)
	                            ->  Materialize  (cost=1.14..24.10 rows=387 width=147)
	                                  ->  Hash Join  (cost=1.14..22.17 rows=387 width=147)
	                                        Hash Cond: (t_1.typnamespace = ns_1.oid)
	                                        ->  Seq Scan on pg_type t_1  (cost=0.00..18.80 rows=580 width=87)
	                                        ->  Hash  (cost=1.06..1.06 rows=6 width=68)
	                                              Buckets: 1024  Batches: 1  Memory Usage: 10kB
	                                              ->  Seq Scan on pg_namespace ns_1  (cost=0.00..1.06 rows=6 width=68)
	                      ->  Hash  (cost=18.80..18.80 rows=580 width=13)
	                            ->  Seq Scan on pg_type elem_t_1  (cost=0.00..18.80 rows=580 width=13)
	                ->  Hash  (cost=1.06..1.06 rows=6 width=8)
	                      ->  Seq Scan on pg_range range_t_1  (cost=0.00..1.06 rows=6 width=8)
	                SubPlan 7
	                  ->  Limit  (cost=214.85..214.85 rows=1 width=8)
	                        CTE typebases
	                          ->  Recursive Union  (cost=0.28..214.58 rows=11 width=8)
	                                ->  Index Scan using pg_type_oid_index on pg_type t2_2  (cost=0.28..8.29 rows=1 width=8)
	                                      Index Cond: (oid = t_1.oid)
	                                ->  Hash Join  (cost=0.33..20.61 rows=1 width=8)
	                                      Hash Cond: (t2_3.oid = tb_1.oid)
	                                      ->  Seq Scan on pg_type t2_3  (cost=0.00..20.25 rows=5 width=8)
	                                            Filter: (typbasetype <> '0'::oid)
	                                      ->  Hash  (cost=0.20..0.20 rows=10 width=8)
	                                            ->  WorkTable Scan on typebases tb_1  (cost=0.00..0.20 rows=10 width=8)
	                        ->  Sort  (cost=0.28..0.30 rows=11 width=8)
	                              Sort Key: typebases_1.depth DESC
	                              ->  CTE Scan on typebases typebases_1  (cost=0.00..0.22 rows=11 width=8)
	                SubPlan 8
	                  ->  Index Scan using pg_type_oid_index on pg_type range_elem_t_1  (cost=0.28..8.30 rows=1 width=1)
	                        Index Cond: (oid = range_t_1.rngsubtype)
	                SubPlan 9
	                  ->  Aggregate  (cost=54.75..54.76 rows=1 width=32)
	                        ->  Nested Loop  (cost=0.28..54.72 rows=11 width=6)
	                              ->  Seq Scan on pg_class c_2  (cost=0.00..22.59 rows=2 width=4)
	                                    Filter: (reltype = t_1.oid)
	                              ->  Index Scan using pg_attribute_relid_attnum_index on pg_attribute ia_2  (cost=0.28..16.02 rows=5 width=10)
	                                    Index Cond: ((attrelid = c_2.oid) AND (attnum > 0))
	                                    Filter: (NOT attisdropped)
	                SubPlan 10
	                  ->  Aggregate  (cost=54.78..54.79 rows=1 width=32)
	                        ->  Nested Loop  (cost=0.28..54.72 rows=11 width=66)
	                              ->  Seq Scan on pg_class c_3  (cost=0.00..22.59 rows=2 width=4)
	                                    Filter: (reltype = t_1.oid)
	                              ->  Index Scan using pg_attribute_relid_attnum_index on pg_attribute ia_3  (cost=0.28..16.02 rows=5 width=70)
	                                    Index Cond: ((attrelid = c_3.oid) AND (attnum > 0))
	                                    Filter: (NOT attisdropped)
	  ->  Sort  (cost=53.90..55.87 rows=791 width=216)
	        Sort Key: typeinfo_tree.depth DESC, typeinfo_tree.oid, typeinfo_tree.ns, typeinfo_tree.name, typeinfo_tree.kind, typeinfo_tree.basetype, typeinfo_tree.has_bin_io, typeinfo_tree.elemtype, typeinfo_tree.elemdelim, typeinfo_tree.range_subtype, typeinfo_tree.elem_has_bin_io, typeinfo_tree.attrtypoids, typeinfo_tree.attrnames
	        ->  CTE Scan on typeinfo_tree  (cost=0.00..15.82 rows=791 width=216)

@elprans
Copy link
Member

elprans commented Apr 8, 2020

duration: 1.058 ms

It doesn't seem to actually be slow for you.

@s-ilyas
Copy link

s-ilyas commented Apr 8, 2020

Ahhhh, I thought that was the time taken by the query planner! Turns out auto_explain measures duration even without analyze being on. Apologies.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants