Fix(bigquery)!: parse information schema views into a single identifier #4336
I’m mostly familiar with MySQL’s information_schema. In MySQL, information_schema is a database, so I think this would be wrong?
Does this PR imply that BigQuery creates default information_schema tables in each database? And if so, I wonder how common that is.
Ah, got it - thanks for catching this. I did assume there would be an information schema under each db, but I'll double-check this assumption. Regarding BigQuery, yes, they need the information schema to be qualified with a region or dataset:
This means you can have up to 4 parts in a Table reference, if you were to include both the region and the project name.
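To make the part count concrete, here's a rough sketch in plain Python. It is not sqlglot code: `split_table_path` is a hypothetical helper, and the project and region names are made up for the example. It just splits a fully qualified information schema reference on dots:

```python
def split_table_path(path: str) -> list[str]:
    """Split a dotted table path into parts, dropping backtick quoting.

    Hypothetical helper for illustration only; it does not handle dots
    that appear inside a quoted identifier.
    """
    return [part.strip("`") for part in path.split(".")]


# Assumed example names: project "my-project", region qualifier "region-us".
parts = split_table_path("`my-project`.`region-us`.INFORMATION_SCHEMA.COLUMNS")
print(len(parts), parts)
```

With both the project and the region present, the reference has one more part than the usual catalog.db.table shape.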
@barakalon it turns out your hunch was right: none of the other engines treat the information schema views like BigQuery does, so this fix really needs to be specific to it. Thanks for catching this, I should've investigated more thoroughly.
There's some more work to be done here around 4-part table references in the schema module, because right now we're expecting the references to have the same depth, but information schema views break this invariant.
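To illustrate the invariant, here's a simplified sketch. It is not the actual schema module: `mapping_depth` and the sample names are hypothetical, and it assumes a schema is a nested dict whose leaves are column-name-to-type mappings. It reports the depths at which tables are found:

```python
def mapping_depth(mapping: dict) -> set[int]:
    """Return the set of depths at which column mappings are found.

    A column mapping is assumed to be a dict whose values are all strings
    (column name -> type). Hypothetical helper, for illustration only.
    """
    if all(isinstance(value, str) for value in mapping.values()):
        return {0}
    depths: set[int] = set()
    for child in mapping.values():
        depths |= {depth + 1 for depth in mapping_depth(child)}
    return depths


# Every table is addressed as catalog.db.table: a single depth.
uniform = {"proj": {"ds": {"tbl": {"a": "INT64"}}}}

# Treating INFORMATION_SCHEMA as its own level adds a fourth part.
mixed = {
    "proj": {
        "ds": {
            "tbl": {"a": "INT64"},
            "INFORMATION_SCHEMA": {"COLUMNS": {"column_name": "STRING"}},
        }
    }
}

print(mapping_depth(uniform), mapping_depth(mixed))
```

A mapping that yields more than one depth is the situation described above.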
- 'WITH "x" AS (SELECT "y"."a" AS "a" FROM "DB"."y" AS "y" CROSS JOIN "a"."b"."INFORMATION_SCHEMA"."COLUMNS" AS "COLUMNS") SELECT "x"."a" AS "a" FROM "x" AS "x"',
+ 'WITH "x" AS (SELECT "y"."a" AS "a" FROM "DB"."y" AS "y" CROSS JOIN "a"."b"."INFORMATION_SCHEMA.COLUMNS" AS "columns") SELECT "x"."a" AS "a" FROM "x" AS "x"',
This change should be safe because BigQuery aliases are case-insensitive.
Behavior in main today:
This PR makes it so that we will always represent INFORMATION_SCHEMA.X views using a single Identifier expression for BigQuery. The motivation here is that, previously, we wouldn't properly qualify information schema references, because we'd think there's a db/catalog present.

Choosing Identifier over a Dot is motivated by the fact that the schema module can't handle multiple mapping depths, so this should keep the nesting consistent even if there are information schema view references.

Xref: TobikoData/sqlmesh#3317
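As a rough illustration of the idea (not the parser change itself, which works on tokens and expressions rather than strings), the sketch below folds INFORMATION_SCHEMA and the view name that follows it into one part. `normalize_parts` is a hypothetical helper, and matching the keyword case-insensitively is an assumption made for the example:

```python
def normalize_parts(parts: list[str]) -> list[str]:
    """Fold INFORMATION_SCHEMA and the view name after it into one part.

    Hypothetical sketch of the idea only; the keyword is matched
    case-insensitively here as an assumption.
    """
    for i, part in enumerate(parts[:-1]):
        if part.upper() == "INFORMATION_SCHEMA":
            merged = f"{part}.{parts[i + 1]}"
            return parts[:i] + [merged] + parts[i + 2:]
    return parts


print(normalize_parts(["a", "b", "INFORMATION_SCHEMA", "COLUMNS"]))
print(normalize_parts(["DB", "y"]))
```

The first call ends up with three parts, like any other catalog.db.table reference, and the second is returned unchanged.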