Fix(bigquery)!: parse information schema views into a single identifier #4336

Merged
georgesittas merged 6 commits into main from jo/make_information_schema_parsing_consistent
Nov 5, 2024


@georgesittas georgesittas commented Nov 2, 2024

Behavior in main today:

>>> from sqlglot import parse_one
>>> parse_one("select * from db.information_schema.tables", "bigquery")
Select(
  expressions=[
    Star()],
  from=From(
    this=Table(
      this=Identifier(this=tables, quoted=False),
      db=Identifier(this=information_schema, quoted=False),
      catalog=Identifier(this=db, quoted=False))))
>>> parse_one("select * from c.db.information_schema.tables", "bigquery")
Select(
  expressions=[
    Star()],
  from=From(
    this=Table(
      this=Dot(
        this=Identifier(this=information_schema, quoted=False),
        expression=Identifier(this=tables, quoted=False)),
      db=Identifier(this=db, quoted=False),
      catalog=Identifier(this=c, quoted=False))))

With this PR, BigQuery INFORMATION_SCHEMA.X views are always represented by a single Identifier expression. The motivation is that, previously, we wouldn't properly qualify information schema references, because we'd think a db/catalog was already present.

Choosing Identifier over Dot is motivated by the fact that the schema module can't handle multiple mapping depths, so this should keep the nesting consistent even when there are information schema view references.
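
To make the target shape concrete, here is a minimal standalone sketch of the collapsing rule described above. It is not sqlglot's implementation: `TableRef` and `split_table_ref` are hypothetical names used only for illustration, and the naive split on `.` deliberately ignores quoted identifiers.

```python
from typing import NamedTuple, Optional


class TableRef(NamedTuple):
    """Hypothetical three-level table reference: catalog, db, table name."""

    catalog: Optional[str]
    db: Optional[str]
    name: str


def split_table_ref(path: str) -> TableRef:
    """Split a dotted table path, merging INFORMATION_SCHEMA.<view> into one name.

    Assumption: identifiers contain no quoted dots, so a plain split is enough.
    """
    parts = path.split(".")

    # Collapse the last two parts when the second-to-last one is the
    # information schema, so the reference never exceeds three levels.
    if len(parts) > 1 and parts[-2].upper() == "INFORMATION_SCHEMA":
        parts = parts[:-2] + [f"{parts[-2]}.{parts[-1]}"]

    if len(parts) > 3:
        raise ValueError(f"Too many parts in table reference: {path!r}")

    # Left-pad with None so that the table name is always the last field.
    padded = [None] * (3 - len(parts)) + parts
    return TableRef(*padded)


three_part = split_table_ref("db.information_schema.tables")
four_part = split_table_ref("c.db.information_schema.tables")
```

Under these assumptions, both inputs from the examples above end up with `information_schema.tables` as the table name, with `db` as the db in both cases and `c` as the catalog only in the four-part one.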

Xref: TobikoData/sqlmesh#3317


@barakalon barakalon left a comment

I’m mostly familiar with MySQL’s information_schema. In MySQL, information_schema is a database, so I think this would be wrong?

Does this PR imply that BigQuery creates default information_schema tables in each database? And if so, I wonder how common that is.


georgesittas commented Nov 2, 2024

> I’m mostly familiar with MySQL’s information_schema. In MySQL, information_schema is a database, so I think this would be wrong?
>
> Does this PR imply that BigQuery creates default information_schema tables in each database? And if so, I wonder how common that is.

Ah, got it - thanks for catching this. I did assume there would be an information schema under each db, but I'll double-check this assumption.

Regarding BigQuery, yes, they need the information schema to be qualified with a region or dataset:

> An INFORMATION_SCHEMA view needs to be qualified with a dataset or region.

This means you can have up to 4 parts in a Table reference, if you were to include both the region and the project name.

@georgesittas georgesittas marked this pull request as draft November 2, 2024 12:30
@georgesittas georgesittas changed the title Fix(parser)!: always parse INFORMATION_SCHEMA.X table ref into a dot Fix(parser)!: parse INFORMATION_SCHEMA.X table ref into a dot Nov 4, 2024
@georgesittas georgesittas force-pushed the jo/make_information_schema_parsing_consistent branch from dea8e76 to 378efbb Compare November 4, 2024 15:17
@georgesittas georgesittas marked this pull request as ready for review November 4, 2024 15:17

georgesittas commented Nov 4, 2024

@barakalon it turns out your hunch was right, none of the other engines treats the information schema views like BigQuery does, so this fix really needs to be specific to it. Thanks for catching this, should've investigated more thoroughly.

@georgesittas georgesittas changed the title Fix(parser)!: parse INFORMATION_SCHEMA.X table ref into a dot Fix(bigquery)!: parse INFORMATION_SCHEMA.X table ref into a dot Nov 4, 2024

georgesittas commented Nov 4, 2024

There's some more work to be done here around 4-part table references in the schema module, because right now we're expecting the references to have the same depth but information schema views break this invariant.
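
As a rough illustration of the depth invariant mentioned here, the sketch below checks whether every table in a nested mapping sits at the same depth. It is a toy model, not the schema module's actual code: `leaf_depths` and the sample mappings are hypothetical, and the mapping layout (catalog, then db, then table, then column to type) is an assumption.

```python
def leaf_depths(mapping: dict, level: int = 1) -> set:
    """Return the set of nesting levels at which non-dict values occur."""
    depths = set()
    for value in mapping.values():
        if isinstance(value, dict):
            depths |= leaf_depths(value, level + 1)
        else:
            depths.add(level)
    return depths


# Information schema view stored as two separate levels: the branch is deeper
# than the one for a regular table, so the depths disagree.
mixed = {
    "c": {
        "db": {
            "t": {"col": "INT64"},
            "information_schema": {"tables": {"table_name": "STRING"}},
        }
    }
}

# Information schema view stored under a single key: all branches share one depth.
uniform = {
    "c": {
        "db": {
            "t": {"col": "INT64"},
            "information_schema.tables": {"table_name": "STRING"},
        }
    }
}

mixed_is_consistent = len(leaf_depths(mixed)) == 1
uniform_is_consistent = len(leaf_depths(uniform)) == 1
```

In this toy model, treating the view as one key keeps the mapping at a single depth, which is the property the single-identifier representation is meant to preserve.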

@georgesittas georgesittas force-pushed the jo/make_information_schema_parsing_consistent branch from 378efbb to 143924e Compare November 4, 2024 17:44
@georgesittas georgesittas changed the title Fix(bigquery)!: parse INFORMATION_SCHEMA.X table ref into a dot Fix(bigquery)!: parse information schema views into a single identifier Nov 5, 2024
@georgesittas georgesittas force-pushed the jo/make_information_schema_parsing_consistent branch 2 times, most recently from de57202 to 6d12372 Compare November 5, 2024 13:22
@georgesittas georgesittas force-pushed the jo/make_information_schema_parsing_consistent branch from 6d12372 to 00da5bd Compare November 5, 2024 13:24
- 'WITH "x" AS (SELECT "y"."a" AS "a" FROM "DB"."y" AS "y" CROSS JOIN "a"."b"."INFORMATION_SCHEMA"."COLUMNS" AS "COLUMNS") SELECT "x"."a" AS "a" FROM "x" AS "x"',
+ 'WITH "x" AS (SELECT "y"."a" AS "a" FROM "DB"."y" AS "y" CROSS JOIN "a"."b"."INFORMATION_SCHEMA.COLUMNS" AS "columns") SELECT "x"."a" AS "a" FROM "x" AS "x"',
Collaborator Author

This change should be safe because BigQuery aliases are case-insensitive.

@georgesittas georgesittas merged commit 84f78aa into main Nov 5, 2024
6 checks passed
@georgesittas georgesittas deleted the jo/make_information_schema_parsing_consistent branch November 5, 2024 16:23