Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Column names in CTE names are not preserved during model parse #3556

Open
Bilbottom opened this issue Dec 22, 2024 · 1 comment
Open

Column names in CTE names are not preserved during model parse #3556

Bilbottom opened this issue Dec 22, 2024 · 1 comment
Labels
Bug Something isn't working

Comments

@Bilbottom
Copy link

Bilbottom commented Dec 22, 2024

Summary

In many SQL dialects, column names can be specified as part of the table alias.

SQLMesh does not preserve the column aliases when specified in CTE names, so model parsing fails when the column names are defined here.

Note that SQLMesh does preserve column aliases in subquery and table aliases.

MWE (DuckDB)

The examples below are run with the following environment:

  • OS: Windows 10
  • Python: 3.11.5
  • SQLMesh: 0.141.1
  • DuckDB: 1.1.3

A failing example

Here's an example where the column names are lost during the parse, so model evaluation fails:

model (name test.model);
with

l(id, val_l) as (values (1, 'a')),
r(id, val_r) as (values (1, 'X'))

from l inner join r using (id)

The CTE names specify the column names, and DuckDB would return the following table for this query:

id val_l val_r
1 a X

However, running sqlmesh evaluate test.model throw the following error:

...

duckdb.duckdb.BinderException: Binder Error: Values list "l" does not have a column named "id"
LINE 1: ...OM "l" AS "l" INNER JOIN "r" AS "r" ON "l"."id" = "r"."id" LIMIT 1000

This is because the rendered version of the model does not have the column names in the CTE names; the output of sqlmesh render test.model is:

WITH "l" AS (
  VALUES
    (1, 'a')
), "r" AS (
  VALUES
    (1, 'X')
)
SELECT
  *
FROM "l" AS "l"
INNER JOIN "r" AS "r"
  ON "l"."id" = "r"."id"

Two passing examples

As mentioned above, SQLMesh does preserve column names when specified as a subquery/table alias. Here are two examples that work as expected (included just for comparison/completeness):

from (values (1, 'a')) as l(id, val_l)
    inner join (values (1, 'X')) as r(id, val_r)
        using (id)
with

l as (values (1, 'a')),
r as (values (1, 'X'))

from l as l(id, val_l)
    inner join r as r(id, val_r)
        using (id)
@treysp
Copy link
Contributor

treysp commented Dec 22, 2024

Hello - thanks for reporting this, looks like we need to update this behavior in sqlglot. cc @georgesittas

@treysp treysp added the Bug Something isn't working label Dec 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants