Change postgres `get_catalog` to not use `information_schema` #1540

elexisvenator · 2019-06-14T06:11:15Z

information_schema in Postgres is not very performant due to the complex views used to create it
use underlying pg_catalog tables/views instead
returns the same rows/columns as the information_schema version
order of rows is different, this is because there was only a partial sort on the information_schema version
column_type will return different values to before
- some arrays were ARRAY, will now be {type}[]
- user-defined types were previously USER_DEFINED, now will be the name of the user-defined type
  ^ main point of this PR
performance is 2-5x faster, depending on query caching

- `information_schema` in Postgres is not very performant due to the complex views used to create it - use underlying `pg_catalog` tables/views instead - returns the same rows/columns as the `information_schema` version - order of rows is different, this is because there was only a partial sort on the `information_schema` version - `column_type` will return different values to before - some arrays were `ARRAY`, will now be `type[]` - user-defined types were previously `USER_DEFINED`, now will be the name of the user-defined type <-- main point of this PR - performance is 2-5x faster, depending on query caching

elexisvenator · 2019-06-14T06:15:34Z

Docs with custom data types :D

elexisvenator · 2019-06-14T07:38:12Z

No sure what is failing in tests, looks like a date formatting issue? Doesn't seem related to the code I touched.

drewbanin · 2019-06-14T18:17:52Z

Thanks for the PR @elexisvenator - this looks really great! Don't worry about the integration tests - we'll kick those off for you.

I'd need to do some searching here, but I recall reading that different tables/views are present in pg_tables vs information_schema vs other system tables depending on the permissions that a given user has. Do you know anything about that? Is it something we'd need to consider here?

I'd like to play around with this one a little bit, but this looks very, very good :). Thanks again!

elexisvenator · 2019-06-15T00:11:22Z

It's interesting to check the where clauses of the other views to see how they work. For comparison, here is information_schema.tables:

  WHERE ((pg_class.relkind = ANY (ARRAY['r'::"char", 'v'::"char", 'f'::"char", 'p'::"char"])) 
    AND (NOT pg_is_other_temp_schema(npg_class.oid)) 
    AND (   pg_has_role(pg_class.relowner, 'USAGE'::text) 
         OR has_table_privilege(pg_class.oid, 'SELECT, INSERT, UPDATE, DELETE, TRUNCATE, REFERENCES, TRIGGER'::text) 
         OR has_any_column_privilege(pg_class.oid, 'SELECT, INSERT, UPDATE, REFERENCES'::text)));

and here is the pg_catalog ones:

-- pg_catalog.pg_tables
  WHERE (pg_class.relkind = ANY (ARRAY['r'::"char", 'p'::"char"]));
-- pg_catalog.pg_views
  WHERE (pg_class.relkind = 'v'::"char");

So information_schema includes foreign tables, but then also does a bunch of privilege checks that the pg_catalog views dont. a relkind of p apparently means partitioned table but that wasn't actually documented in the postgres docs :/.

Happy to go back and add the permission checks and tweaks

drewbanin · 2019-06-17T15:56:01Z

I spent some time playing around with this and noticed that there are columns in the response that have names like ........pg.dropped.6......... Have you seen anything like that in your experiments? Looks like we can filter on col.attisdropped, but I'd be curious to know if there are any other subtle differences like this that might pop up

drewbanin · 2019-06-27T20:38:06Z

@elexisvenator I'd love to target this for a patch release like 0.14.1 -- is this something you think you'll have time to iterate on? I think it's very, very close, and @beckjake and I are big fans of your implementation :)

elexisvenator · 2019-06-27T22:55:22Z

Sure, ill tweak it on the weekend :)

I suspect the pg.dropped columns are getting cleaned up by the privilege checks, will experiment further

elexisvenator · 2019-07-01T23:13:50Z

So ran some more tests, these changes should stop the deleted columns appearing.

The one big thing that the information_schema views do that I left out is the table and column permission checks. They seem to be intended for a system where multiple users create their own isolated set of tables or even columns within the same database. This isn't really the type of scenario that dbt would be used in, and given that dbt ignores results for any schemas/tables that dont belong to things that dbt creates I dont see (the unlikely chance of) any extra results due to no permission checks being a problem.

drewbanin · 2019-07-02T14:26:20Z

Thanks @elexisvenator - I buy all of this. You're right - dbt will match up the relations it finds in the catalog with the models/sources/seeds/etc that exist in the dbt project, so I don't think visibility into the existence of objects that a user can't access should be a problem.

Let me kick off the tests here, then we should be good to merge this. We'll probably want to change the target branch - let me chat with @beckjake about that one.

v = view r, f, p = all are different forms of table

drewbanin · 2019-08-02T17:45:02Z

@elexisvenator I lost track of this one! Kicked off the tests here. When they pass, this will be ready to roll for (hopefully!) 0.14.1 :)

drewbanin · 2019-08-04T23:29:18Z

Merging this for dbt v0.14.1 -- thanks for your contribution (and your patience :p) @elexisvenator!

Fix incorrectly named column table_type

03bc581

Cleaned up filtering to be in line with information_schema logic

38c2d82

drewbanin changed the base branch from dev/wilt-chamberlain to dev/0.14.1 July 2, 2019 18:13

Fix table type

2645667

v = view r, f, p = all are different forms of table

cmcarthur requested a review from drewbanin August 1, 2019 15:05

drewbanin added this to the 0.14.1 milestone Aug 4, 2019

drewbanin merged commit b2f2e69 into dbt-labs:dev/0.14.1 Aug 4, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Change postgres `get_catalog` to not use `information_schema` #1540

Change postgres `get_catalog` to not use `information_schema` #1540

elexisvenator commented Jun 14, 2019

elexisvenator commented Jun 14, 2019

elexisvenator commented Jun 14, 2019

drewbanin commented Jun 14, 2019

elexisvenator commented Jun 15, 2019 •

edited

Loading

drewbanin commented Jun 17, 2019

drewbanin commented Jun 27, 2019

elexisvenator commented Jun 27, 2019

elexisvenator commented Jul 1, 2019

drewbanin commented Jul 2, 2019

drewbanin commented Aug 2, 2019

drewbanin commented Aug 4, 2019

Change postgres get_catalog to not use information_schema #1540

Change postgres get_catalog to not use information_schema #1540

Conversation

elexisvenator commented Jun 14, 2019

elexisvenator commented Jun 14, 2019

elexisvenator commented Jun 14, 2019

drewbanin commented Jun 14, 2019

elexisvenator commented Jun 15, 2019 • edited Loading

drewbanin commented Jun 17, 2019

drewbanin commented Jun 27, 2019

elexisvenator commented Jun 27, 2019

elexisvenator commented Jul 1, 2019

drewbanin commented Jul 2, 2019

drewbanin commented Aug 2, 2019

drewbanin commented Aug 4, 2019

Change postgres `get_catalog` to not use `information_schema` #1540

Change postgres `get_catalog` to not use `information_schema` #1540

elexisvenator commented Jun 15, 2019 •

edited

Loading