Fix!: bump sqlglot to v25.29.0, fix info schema view handling in bigquery #3332
Fixes #3317
The goal of this PR is to enable SQLMesh to correctly handle information schema view references in BigQuery. The main problem until now was that, in their fully qualified form, these references comprise 4 identifiers:
This means that we'd end up with `Table` references of mixed nesting, e.g. model names comprise 3 identifiers:

Mixing multiple nesting levels in table references is prohibited by SQLGlot's schema module [1, 2] in order to avoid ambiguity. One workaround is to represent information schema views using 3 identifiers at parse time, for BigQuery only. Based on my investigation, other engines don't allow more than 3 identifiers in their table references.
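To make the workaround concrete, here is a minimal sketch of the idea in plain Python. It is not the code from this PR or from SQLGlot: the helper name `collapse_info_schema_parts` and the sample project, dataset, and view names are hypothetical, and the sketch operates on a list of name parts rather than on AST nodes.

```python
from typing import List

INFO_SCHEMA = "INFORMATION_SCHEMA"


def collapse_info_schema_parts(parts: List[str]) -> List[str]:
    """Merge the last two parts of a 4-part information schema reference.

    Hypothetical helper: a 4-part reference whose third part is
    INFORMATION_SCHEMA becomes a 3-part reference, so it has the same
    nesting depth as a regular 3-part model name.
    """
    if len(parts) == 4 and parts[2].upper() == INFO_SCHEMA:
        return [parts[0], parts[1], f"{parts[2]}.{parts[3]}"]
    # Anything else is left untouched.
    return parts


# Illustrative names only.
view_ref = ["my_project", "my_dataset", "INFORMATION_SCHEMA", "TABLES"]
model_ref = ["my_project", "my_dataset", "my_model"]

print(collapse_info_schema_parts(view_ref))
print(collapse_info_schema_parts(model_ref))
```

After collapsing, both references have 3 parts, which is the property the schema module needs.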
I went with this approach because making the schema module more lenient, i.e. allowing multiple nesting depths, would have been quite complex. We rely on the invariant that the depth is uniform in several places, and the scope of the current approach seemed much smaller in comparison.
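For readers unfamiliar with the invariant, the following toy registry sketches what "uniform depth" means. It is an illustration under simplifying assumptions, not SQLGlot's schema implementation: the class `UniformDepthSchema` is hypothetical, and names are split naively on dots with no handling of quoted identifiers.

```python
from typing import List, Optional, Tuple


class UniformDepthSchema:
    """Toy table registry that requires every table name to have the same depth."""

    def __init__(self) -> None:
        self.depth: Optional[int] = None
        self.tables: List[Tuple[str, ...]] = []

    def add_table(self, name: str) -> None:
        # Naive split: assumes no quoted identifiers containing dots.
        parts = tuple(name.split("."))
        if self.depth is None:
            # The first table fixes the depth for the whole schema.
            self.depth = len(parts)
        elif len(parts) != self.depth:
            raise ValueError(
                f"Expected {self.depth} parts, got {len(parts)} in '{name}'"
            )
        self.tables.append(parts)


schema = UniformDepthSchema()
schema.add_table("my_project.my_dataset.my_model")

try:
    # A 4-part reference violates the depth fixed by the first table.
    schema.add_table("my_project.my_dataset.INFORMATION_SCHEMA.TABLES")
except ValueError as error:
    print(f"rejected: {error}")
```

Relaxing a check like this means revisiting every lookup that assumes a fixed depth, which is why normalizing the reference at parse time has a smaller footprint.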
One thing we'll need to be careful about is that parsing BigQuery's information schema views without specifying the dialect can result in an incorrect AST representation, because we represent the first example (the one with 4 identifiers) using a `Dot` instead of merging the last two parts into a single `Identifier`, as BigQuery's parser does.

For additional context, please refer to: