Fix(bigquery)!: parse information schema views into a single identifier #4336

georgesittas · 2024-11-02T08:55:35Z

Behavior in main today:

>>> from sqlglot import parse_one
>>> parse_one("select * from db.information_schema.tables", "bigquery")
Select(
  expressions=[
    Star()],
  from=From(
    this=Table(
      this=Identifier(this=tables, quoted=False),
      db=Identifier(this=information_schema, quoted=False),
      catalog=Identifier(this=db, quoted=False))))
>>> parse_one("select * from c.db.information_schema.tables", "bigquery")
Select(
  expressions=[
    Star()],
  from=From(
    this=Table(
      this=Dot(
        this=Identifier(this=information_schema, quoted=False),
        expression=Identifier(this=tables, quoted=False)),
      db=Identifier(this=db, quoted=False),
      catalog=Identifier(this=c, quoted=False))))

This PR makes the BigQuery parser always represent INFORMATION_SCHEMA.X views using a single Identifier expression. Previously, we wouldn't qualify information schema references properly: as the first example shows, INFORMATION_SCHEMA was parsed as the db and the part before it as the catalog, so we'd think a db/catalog was already present.

Choosing Identifier over a Dot is motivated by the fact that the schema module can't handle multiple mapping depths, so this should keep the nesting consistent even if there are information schema view references.
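
To illustrate the representation described above, here is a minimal, self-contained sketch in plain Python. It does not use sqlglot and is not the PR's implementation; `collapse_information_schema` and the list-of-strings representation are hypothetical, chosen only to show how folding INFORMATION_SCHEMA.X into one name part leaves the remaining parts free to act as db/catalog.

```python
from typing import List

INFO_SCHEMA = "INFORMATION_SCHEMA"


def collapse_information_schema(parts: List[str]) -> List[str]:
    """Hypothetical helper: merge a trailing INFORMATION_SCHEMA.<view> pair into one part.

    `parts` is a dotted table reference already split into its name parts.
    References that do not end in INFORMATION_SCHEMA.<view> are returned unchanged.
    """
    if len(parts) >= 2 and parts[-2].upper() == INFO_SCHEMA:
        # Fold the last two parts into a single dotted name.
        return parts[:-2] + [f"{parts[-2]}.{parts[-1]}"]
    return list(parts)


# The two references from the examples above, plus an ordinary table.
print(collapse_information_schema(["db", "information_schema", "tables"]))
# ['db', 'information_schema.tables']
print(collapse_information_schema(["c", "db", "information_schema", "tables"]))
# ['c', 'db', 'information_schema.tables']
print(collapse_information_schema(["c", "db", "tbl"]))
# ['c', 'db', 'tbl']
```

With the pair folded, neither reference treats INFORMATION_SCHEMA as a db or the part before it as a catalog, and the four-part reference has the same three-part shape as an ordinary fully qualified table.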

Xref: TobikoData/sqlmesh#3317

barakalon

I’m mostly familiar with MySQL’s information_schema. In MySQL, information_schema is a database, so I think this would be wrong?

Does this PR imply that Bigquery creates default information_schema tables in each database? And if so, I wonder how common that is.

georgesittas · 2024-11-02T12:30:04Z

> I’m mostly familiar with MySQL’s information_schema. In MySQL, information_schema is a database, so I think this would be wrong?
>
> Does this PR imply that Bigquery creates default information_schema tables in each database? And if so, I wonder how common that is.

Ah, got it - thanks for catching this. I did assume there would be an information schema under each db, but I'll double-check this assumption.

Regarding BigQuery, yes, they need the information schema to be qualified with a region or dataset:

> An INFORMATION_SCHEMA view needs to be qualified with a dataset or region.

This means a Table reference can have up to 4 parts if you include both the project name and the region: project, region, INFORMATION_SCHEMA, and the view name.

georgesittas · 2024-11-04T15:18:26Z

@barakalon it turns out your hunch was right: none of the other engines treat information schema views the way BigQuery does, so this fix really needs to be specific to it. Thanks for catching this; I should've investigated more thoroughly.

georgesittas · 2024-11-04T15:20:48Z

There's some more work to be done here around 4-part table references in the schema module, because right now we expect all references to have the same depth, but information schema views break this invariant.
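
The depth invariant mentioned above can be sketched with a plain nested dictionary. This is an illustration only, not sqlglot's schema module: `SCHEMA`, `find_columns`, and the fixed depth of 3 are assumptions made for the example, and the column types are placeholders.

```python
from typing import Any, Dict, Optional, Sequence

# Hypothetical 3-level mapping: catalog -> db -> table -> {column: type}.
SCHEMA: Dict[str, Any] = {
    "c": {
        "db": {
            "tbl": {"a": "INT64"},
            # The information schema view is stored under a single dotted key.
            "INFORMATION_SCHEMA.COLUMNS": {"column_name": "STRING"},
        }
    }
}


def find_columns(
    schema: Dict[str, Any], parts: Sequence[str], depth: int = 3
) -> Optional[Dict[str, str]]:
    """Look up a table's columns, assuming every reference has exactly `depth` parts."""
    if len(parts) != depth:
        # A reference with a different number of parts breaks the invariant.
        return None
    node: Any = schema
    for part in parts:
        node = node.get(part)
        if node is None:
            return None
    return node


# A 4-part reference does not fit the 3-level mapping.
print(find_columns(SCHEMA, ["c", "db", "INFORMATION_SCHEMA", "COLUMNS"]))
# None

# With the view name collapsed into one part, the depth matches again.
print(find_columns(SCHEMA, ["c", "db", "INFORMATION_SCHEMA.COLUMNS"]))
# {'column_name': 'STRING'}
```

Treating the view as one name part is what lets a fixed-depth lookup like this keep working without a special case for information schema references.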

georgesittas · 2024-11-05T13:27:30Z

tests/test_optimizer.py

-            'WITH "x" AS (SELECT "y"."a" AS "a" FROM "DB"."y" AS "y" CROSS JOIN "a"."b"."INFORMATION_SCHEMA"."COLUMNS" AS "COLUMNS") SELECT "x"."a" AS "a" FROM "x" AS "x"',
+            'WITH "x" AS (SELECT "y"."a" AS "a" FROM "DB"."y" AS "y" CROSS JOIN "a"."b"."INFORMATION_SCHEMA.COLUMNS" AS "columns") SELECT "x"."a" AS "a" FROM "x" AS "x"',


The alias changing from "COLUMNS" to "columns" should be safe because BigQuery aliases are case-insensitive.

georgesittas requested review from tobymao, barakalon and VaggelisD November 2, 2024 08:55

georgesittas mentioned this pull request Nov 2, 2024

Fix: enable fetching schema for models querying INFORMATION_SCHEMA TobikoData/sqlmesh#3324

Merged

barakalon reviewed Nov 2, 2024

georgesittas marked this pull request as draft November 2, 2024 12:30

Fix(parser)!: always parse INFORMATION_SCHEMA.X table ref into a dot

9a23e28

georgesittas changed the title ~~Fix(parser)!: always parse INFORMATION_SCHEMA.X table ref into a dot~~ Fix(parser)!: parse INFORMATION_SCHEMA.X table ref into a dot Nov 4, 2024

PR feedback

378efbb

georgesittas force-pushed the jo/make_information_schema_parsing_consistent branch from dea8e76 to 378efbb Compare November 4, 2024 15:17

georgesittas marked this pull request as ready for review November 4, 2024 15:17

georgesittas changed the title ~~Fix(parser)!: parse INFORMATION_SCHEMA.X table ref into a dot~~ Fix(bigquery)!: parse INFORMATION_SCHEMA.X table ref into a dot Nov 4, 2024

georgesittas requested a review from barakalon November 4, 2024 15:19

barakalon approved these changes Nov 4, 2024

georgesittas added 3 commits November 4, 2024 19:37

Refactor: produce a single identifeir for information schema view

faff23e

Fix(parser)!: always parse INFORMATION_SCHEMA.X table ref into a dot

fd7a845

PR feedback

143924e

georgesittas force-pushed the jo/make_information_schema_parsing_consistent branch from 378efbb to 143924e Compare November 4, 2024 17:44

georgesittas changed the title ~~Fix(bigquery)!: parse INFORMATION_SCHEMA.X table ref into a dot~~ Fix(bigquery)!: parse information schema views into a single identifier Nov 5, 2024

georgesittas force-pushed the jo/make_information_schema_parsing_consistent branch 2 times, most recently from de57202 to 6d12372 Compare November 5, 2024 13:22

Refactor by representing info schema views using a single identifier

00da5bd

georgesittas force-pushed the jo/make_information_schema_parsing_consistent branch from 6d12372 to 00da5bd Compare November 5, 2024 13:24

georgesittas commented Nov 5, 2024

barakalon approved these changes Nov 5, 2024

georgesittas merged commit 84f78aa into main Nov 5, 2024
6 checks passed

georgesittas deleted the jo/make_information_schema_parsing_consistent branch November 5, 2024 16:23
