Change postgres get_catalog to not use information_schema #1540
- `information_schema` in Postgres is not very performant due to the complex views used to create it
- use the underlying `pg_catalog` tables/views instead
- returns the same rows/columns as the `information_schema` version
- the order of rows is different; this is because there was only a partial sort on the `information_schema` version
- `column_type` will return different values than before:
  - some arrays were `ARRAY`; they will now be `{type}[]`
  - user-defined types were previously `USER-DEFINED`; they will now be the name of the user-defined type <-- main point of this PR
- the query is 2-5x faster, depending on query caching
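The ordering change described above can be illustrated with a small Python sketch. It assumes catalog rows carry `table_schema`, `table_name` and `column_index` fields (names chosen here for illustration, not taken from the PR's SQL), and the `column_type` values are made-up examples in the new style (`text[]` for an array, a type name for a user-defined type). Sorting on the full key gives every row a fixed position, unlike a partial sort.

```python
from typing import NamedTuple


class CatalogRow(NamedTuple):
    # Hypothetical row shape; field names are assumptions for illustration.
    table_schema: str
    table_name: str
    column_index: int
    column_name: str
    column_type: str


def sort_catalog(rows: list[CatalogRow]) -> list[CatalogRow]:
    """Sort by schema, then table, then column position, so the order is fully determined."""
    return sorted(rows, key=lambda r: (r.table_schema, r.table_name, r.column_index))


rows = [
    CatalogRow("public", "orders", 2, "status", "order_status"),  # user-defined type name
    CatalogRow("analytics", "events", 1, "id", "integer"),
    CatalogRow("public", "orders", 1, "tags", "text[]"),  # array shown as {type}[]
]
for row in sort_catalog(rows):
    print(row.table_schema, row.table_name, row.column_index, row.column_type)
```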
Not sure what is failing in the tests; it looks like a date formatting issue? It doesn't seem related to the code I touched.
Thanks for the PR @elexisvenator - this looks really great! Don't worry about the integration tests - we'll kick those off for you. I'd need to do some searching here, but I recall reading that different tables/views are present in … I'd like to play around with this one a little bit, but this looks very, very good :). Thanks again!
It's interesting to check the `WHERE` clauses of the other views to see how they work. For comparison, here is the `information_schema` one:

```sql
WHERE ((pg_class.relkind = ANY (ARRAY['r'::"char", 'v'::"char", 'f'::"char", 'p'::"char"]))
  AND (NOT pg_is_other_temp_schema(pg_namespace.oid))
  AND ( pg_has_role(pg_class.relowner, 'USAGE'::text)
     OR has_table_privilege(pg_class.oid, 'SELECT, INSERT, UPDATE, DELETE, TRUNCATE, REFERENCES, TRIGGER'::text)
     OR has_any_column_privilege(pg_class.oid, 'SELECT, INSERT, UPDATE, REFERENCES'::text)));
```

and here are the `pg_catalog` ones:

```sql
-- pg_catalog.pg_tables
WHERE (pg_class.relkind = ANY (ARRAY['r'::"char", 'p'::"char"]));
-- pg_catalog.pg_views
WHERE (pg_class.relkind = 'v'::"char");
```

So `information_schema` includes foreign tables, but it also does a bunch of privilege checks that the `pg_catalog` views don't. A relkind of … Happy to go back and add the permission checks and tweaks.
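The two filters quoted above can be compared with a short Python sketch that reduces them to their `relkind` conditions only; the temp-schema and privilege checks are deliberately left out. The codes are Postgres's own (`r` ordinary table, `v` view, `f` foreign table, `p` partitioned table), while the relation names are made up for the example.

```python
# relkind values accepted by each WHERE clause quoted in the comment above.
INFORMATION_SCHEMA_RELKINDS = {"r", "v", "f", "p"}
PG_TABLES_RELKINDS = {"r", "p"}  # pg_catalog.pg_tables
PG_VIEWS_RELKINDS = {"v"}  # pg_catalog.pg_views


def admitted_by_information_schema(relkind: str) -> bool:
    return relkind in INFORMATION_SCHEMA_RELKINDS


def admitted_by_pg_catalog_views(relkind: str) -> bool:
    # Reading pg_tables and pg_views together covers the union of both filters.
    return relkind in PG_TABLES_RELKINDS or relkind in PG_VIEWS_RELKINDS


relations = {"orders": "r", "orders_summary": "v", "remote_orders": "f", "events": "p"}
for name, relkind in relations.items():
    print(name, admitted_by_information_schema(relkind), admitted_by_pg_catalog_views(relkind))
```

Only the foreign table (`f`) is treated differently, which matches the gap pointed out in the comment; the privilege checks are a separate difference that this sketch does not model.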
I spent some time playing around with this and noticed that there are columns in the response that have names like …
@elexisvenator I'd love to target this for a patch release like 0.14.1 -- is this something you think you'll have time to iterate on? I think it's very, very close, and @beckjake and I are big fans of your implementation :)
Sure, I'll tweak it on the weekend :) I suspect the …
So I ran some more tests; these changes should stop the deleted columns from appearing. The one big thing that the …
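Background on the deleted columns: Postgres does not remove a dropped column from `pg_catalog.pg_attribute`; its row stays, with `attisdropped` set to true, and system columns such as `ctid` have `attnum` values below 1. A `pg_catalog` query therefore needs a filter like `attnum > 0 AND NOT attisdropped`. The Python sketch below mirrors that condition over rows shaped like `(attname, attnum, attisdropped)`; it is an illustration, not necessarily the exact condition used in the PR.

```python
from typing import NamedTuple


class Attribute(NamedTuple):
    # Subset of pg_catalog.pg_attribute columns relevant to the filter.
    attname: str
    attnum: int
    attisdropped: bool


def user_visible_columns(attrs: list[Attribute]) -> list[str]:
    """Mirror `attnum > 0 AND NOT attisdropped`, ordered by column position."""
    kept = [a for a in attrs if a.attnum > 0 and not a.attisdropped]
    return [a.attname for a in sorted(kept, key=lambda a: a.attnum)]


attrs = [
    Attribute("ctid", -1, False),  # system column
    Attribute("id", 1, False),
    Attribute("........pg.dropped.2........", 2, True),  # placeholder left by a dropped column
    Attribute("email", 3, False),
]
print(user_visible_columns(attrs))  # ['id', 'email']
```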
Thanks @elexisvenator - I buy all of this. You're right - dbt will match up the relations it finds in the catalog with the models/sources/seeds/etc that exist in the dbt project, so I don't think visibility into the existence of objects that a user can't access should be a problem. Let me kick off the tests here, then we should be good to merge this. We'll probably want to change the target branch - let me chat with @beckjake about that one.
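The matching step described in the comment above can be sketched in Python, assuming both the catalog and the project's nodes are keyed by `(schema, name)`. The `match_catalog` function, the node ids and the data shapes are hypothetical and are not dbt's internal API. Catalog relations with no matching node are simply dropped, which is why seeing extra objects in the catalog is harmless.

```python
def match_catalog(
    catalog: dict[tuple[str, str], dict],
    nodes: dict[str, tuple[str, str]],
) -> dict[str, dict]:
    """Return catalog entries for project nodes only; unmatched relations are ignored."""
    return {node_id: catalog[key] for node_id, key in nodes.items() if key in catalog}


# Hypothetical data: "payroll" exists in the database but not in the project.
catalog = {
    ("public", "orders"): {"columns": ["id", "status"]},
    ("public", "payroll"): {"columns": ["salary"]},
}
nodes = {
    "model.shop.orders": ("public", "orders"),
    "model.shop.customers": ("public", "customers"),  # not built yet, so absent from the catalog
}
print(match_catalog(catalog, nodes))
```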
`v` = view; `r`, `f`, `p` = all are different forms of table
@elexisvenator I lost track of this one! Kicked off the tests here. When they pass, this will be ready to roll for (hopefully!) 0.14.1 :) |
Merging this for dbt v0.14.1 -- thanks for your contribution (and your patience :p) @elexisvenator! |