Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(duckdb)!: Fix STRUCT cast generation #4366

Merged
merged 1 commit into from
Nov 11, 2024
Merged

Conversation

VaggelisD
Copy link
Collaborator

Fixes #4365

BigQuery supports inline STRUCT constructors such as the following:

bq> SELECT STRUCT<a STRING, b INTEGER>('str', 1) AS strct;
strct.a   strct.b
'str'        1

#3751 introduced support in SQLGlot by canonicalizing those expressions to a CAST:

>> sqlglot.parse_one("SELECT STRUCT<a STRING, b INTEGER>('str', 1)", read="bigquery").sql("bigquery")
`CAST(STRUCT('str', 1) AS STRUCT<a STRING, b INTEGER>)`

Note thought that transpilation towards DuckDB was still not complete as it requires explicit key-value pairs to construct STRUCTs:

D SELECT CAST(STRUCT_PACK('str', 1) AS STRUCT("a" TEXT, "b" INT));
Binder Error: Need named argument for struct pack, e.g. STRUCT_PACK(a := b)

To solve this issue efficiently, the PR was extended so that DuckDB can understand when a STRUCT was constructed under a STRUCT cast and turn that into CAST(ROW(...) AS STRUCT(...) instead:

D select CAST(ROW('str', 1) AS STRUCT("a" TEXT, "b" INT));
┌────────────────────────────────────────────────────────────┐
│ CAST(main."row"('str', 1) AS STRUCT(a VARCHAR, b INTEGER)) │
│                struct(a varchar, b integer)                │
├────────────────────────────────────────────────────────────┤
│ {'a': str, 'b': 1}                                         │
└────────────────────────────────────────────────────────────┘

However, #4365 shows how this is prone to errors if the ROW field def order does not match with the cast:

D SELECT CAST(ROW(1, 'foo') AS STRUCT("s" TEXT, "i" INT));
Conversion Error: Could not convert string 'foo' to INT32

To mitigate this issue, the #3751 solution can be further limited to generate a ROW only if the STRUCT being casted does not have key - value pairs; This seems enough to (1) preserve existing BQ -> DuckDB transpilation cases and (2) restore DuckDB's roundtrips which are order-agnostic:

D SELECT CAST({'i':1, 's':'foo'} AS STRUCT("s" TEXT, "i" INT));
┌────────────────────────────────────────────────────────────────────────────┐
│ CAST(main.struct_pack(i := 1, s := 'foo') AS STRUCT(s VARCHAR, i INTEGER)) │
│                        struct(s varchar, i integer)                        │
├────────────────────────────────────────────────────────────────────────────┤
│ {'s': foo, 'i': 1}                                                         │
└────────────────────────────────────────────────────────────────────────────┘

@georgesittas georgesittas merged commit 8f5e70f into main Nov 11, 2024
6 checks passed
@georgesittas georgesittas deleted the vaggelisd/ddb_struct branch November 11, 2024 12:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG(duckdb): Compiling STRUCTs throws away field name info, which is sometimes needed
2 participants