Fix!: bump sqlglot to v25.29.0, fix info schema view handling in bigquery #3332

Merged: 1 commit merged on Nov 6, 2024

Conversation

georgesittas
Contributor

Fixes #3317

The goal of this PR is to enable SQLMesh to correctly handle information schema view references in BigQuery. The main problem with those until now was that, in their fully-qualified form, they comprised 4 identifiers:

project.dataset_or_region.INFORMATION_SCHEMA.SOME_VIEW

This means that we'd end up with Table references of mixed nesting, since other references, e.g. model names, comprise only 3 identifiers:

project.dataset.model_name

SQLGlot's schema module prohibits mixing multiple nesting levels in table references [1, 2], in order to avoid ambiguity. So, the workaround here is to represent information schema views using 3 identifiers at parse time, only for BigQuery. Based on my investigation, other engines don't allow more than 3 identifiers in their table references.
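To illustrate the idea, here is a minimal sketch of collapsing the 4-part form into 3 parts. It is not the actual SQLGlot parser change: `normalize_bigquery_parts` is a hypothetical helper, it works on plain strings rather than AST nodes, and it ignores quoting.

```python
from typing import List

INFO_SCHEMA = "INFORMATION_SCHEMA"


def normalize_bigquery_parts(parts: List[str]) -> List[str]:
    """Merge INFORMATION_SCHEMA and the view name into a single part.

    Hypothetical helper: only the 4-part form
    project.dataset_or_region.INFORMATION_SCHEMA.SOME_VIEW is rewritten;
    any other input is returned unchanged (as a copy).
    """
    if len(parts) == 4 and parts[2].upper() == INFO_SCHEMA:
        # Treat "INFORMATION_SCHEMA.SOME_VIEW" as one table-level name.
        return [parts[0], parts[1], f"{parts[2]}.{parts[3]}"]
    return list(parts)


view_parts = normalize_bigquery_parts(
    "project.dataset_or_region.INFORMATION_SCHEMA.SOME_VIEW".split(".")
)
model_parts = normalize_bigquery_parts("project.dataset.model_name".split("."))
```

After normalization both references have the same depth, so they can coexist in a structure that expects uniform nesting.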

I went with this approach because making the schema module more lenient, i.e. allowing multiple nesting depths, was quite complex. Several places rely on the invariant that the depth is the same, and the scope of the current approach seemed much smaller in comparison.
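For a rough picture of that invariant, here is a toy sketch. It is an assumption-laden illustration and not SQLGlot's schema module: `UniformDepthSchema` and `add_table` are made-up names, and real table names would need proper parsing instead of `str.split`.

```python
from typing import Dict, Optional, Tuple


class UniformDepthSchema:
    """Toy schema that requires every table name to have the same number of parts."""

    def __init__(self) -> None:
        self._tables: Dict[Tuple[str, ...], Dict[str, str]] = {}
        self._depth: Optional[int] = None

    def add_table(self, name: str, columns: Dict[str, str]) -> None:
        parts = tuple(name.split("."))
        if self._depth is None:
            # The first table fixes the nesting depth for the whole schema.
            self._depth = len(parts)
        elif len(parts) != self._depth:
            raise ValueError(
                f"Table {name!r} has {len(parts)} parts, expected {self._depth}"
            )
        self._tables[parts] = columns


schema = UniformDepthSchema()
schema.add_table("project.dataset.model_name", {"id": "INT64"})
# A 3-part information schema reference fits alongside the model name.
schema.add_table("project.dataset.INFORMATION_SCHEMA_VIEW", {"table_name": "STRING"})
```

Adding a 4-part name to this schema would raise a `ValueError`, which is the kind of conflict the 3-identifier representation avoids.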

One thing we'll need to be careful about is that parsing BigQuery's information schema views without specifying the dialect can result in an incorrect AST representation. In that case, the first example with the 4 identifiers is represented using a Dot, instead of merging the last two parts into a single Identifier as BigQuery's parser does.

For additional context, please refer to:

@georgesittas georgesittas requested review from tobymao and a team November 5, 2024 20:00
Collaborator

@erindru erindru left a comment

I'm.. amazed this works

@georgesittas georgesittas merged commit 6af38f6 into main Nov 6, 2024
23 checks passed
@georgesittas georgesittas deleted the jo/bump_sqlglot_to_v25_29_0 branch November 6, 2024 09:05

Successfully merging this pull request may close these issues.

Cannot generate external model for BigQuery information schema
3 participants