Error in incremental models when using struct columns #202

SCouto · 2021-08-13T13:02:20Z

Describe the bug

Hi everyone

I think I've found an issue when using struct columns in incremental models.

Below is the CREATE TABLE statement for the source table:

CREATE TABLE `<someSchema>`.`<sourceTable>` (
  `properties` STRUCT<`site`: STRING>,
  `channel` STRING,
  `timestamp` STRING,
  `anotherDate` STRING,
  `aDate` STRING)
  USING parquet
  PARTITIONED BY (aDate)
  LOCATION 's3a://<someBucket>'

If the model copies the struct field as-is to the sink table (with a simple select), the first run works but the second one fails.

Here is the CREATE TABLE statement for the example sink table:

CREATE TABLE `<someSchema>`.`dbtsink` (
      `properties` STRUCT<`site`: STRING>,
      `channel` STRING,
      `timestamp` STRING,
      `anotherDate` STRING,
      `aDate` STRING)
USING parquet
PARTITIONED BY (anotherDate)

As I said, the second execution raises this error:

Runtime Error in model dbtsink (models/anotherDate/dbtsink.sql)
  Database Error
    Error running query: org.apache.spark.sql.AnalysisException: cannot resolve '`site`' given input columns: [dbtsink__dbt_tmp.channel, dbtsink__dbt_tmp.anotherDate, dbtsink__dbt_tmp.aDate, dbtsink__dbt_tmp.properties, dbtsink__dbt_tmp.timestamp]; line 4 pos 25;
    'InsertIntoStatement 'UnresolvedRelation [someSchema, dbtsink], false, false
    +- 'Project [properties#6526, 'site, channel#6527, timestamp#6528, aDate#6541, anotherDate#6540]
       +- SubqueryAlias dbtsink__dbt_tmp
          +- Project [properties#6526, channel#6527, timestamp#6528, anotherDate#6540, aDate#6541]
             +- Filter (((aDate#6541 > 2021060100) AND (aDate#6541 <= 2021070609)) AND (anotherDate#6540 = 2021070609))
                +- SubqueryAlias spark_catalog.someSchema.sourceTable
                   +- Relation[context#6524,traits#6525,properties#6526,channel#6527,timestamp#6528,projectId#6529,integrations#6530,messageId#6531,originalTimestamp#6532,receivedAt#6533,sentAt#6534,userId#6535,anonymousId#6536,type#6537,providerId#6538,version#6539,anotherDate#6540,aDate#6541] parquet

Steps To Reproduce

Create a table with a struct column, as shown in the section above.
Populate that table and run the model below twice.
The first run works fine; the error appears on the second run.

select 
properties,
channel,
timestamp,
anotherDate,
aDate
from {{ source('someSchema', 'sourceTable') }}
where aDate > '{{ var("aDateLowerLimit") }}' and aDate <= '{{ var("aDateUpperLimit") }}'
and anotherDate = '{{ var("anotherDate") }}'

Expected behavior

It should move data to the corresponding partition without raising an error


The output of dbt --version:

installed version: 0.20.1
   latest version: 0.20.1

Up to date!

Plugins:
  - bigquery: 0.20.1
  - snowflake: 0.20.1
  - redshift: 0.20.1
  - postgres: 0.20.1
  - spark: 0.20.1

The operating system you're using:
macOS

The output of python --version:

python --version
Python 3.9.6

Additional context

After looking into it, I think the problem may be in the code linked below, which uses a regex to parse column names:
https://github.com/dbt-labs/dbt-spark/blob/master/dbt/adapters/spark/impl.py#L222-L249

The regex:

    INFORMATION_COLUMNS_REGEX = re.compile(
        r"\|-- (.*): (.*) \(nullable = (.*)\b", re.MULTILINE)

It matches every column, including the fields nested inside the struct.
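To illustrate the behaviour described above, here is a minimal sketch. It assumes the schema text being parsed uses Spark's tree-style layout, where nested fields are indented under their parent; `SAMPLE_SCHEMA`, `column_names`, and `TOP_LEVEL_COLUMNS_REGEX` are hypothetical names made up for this example. The anchored variant is only one possible way to skip nested fields, not necessarily the fix that was merged.

```python
import re

# Regex quoted in this issue: it is not anchored to the start of a line,
# so it can match anywhere inside a line.
INFORMATION_COLUMNS_REGEX = re.compile(
    r"\|-- (.*): (.*) \(nullable = (.*)\b", re.MULTILINE)

# Hypothetical variant: anchored so that only top-level entries
# (a single leading space before "|--") are matched.
TOP_LEVEL_COLUMNS_REGEX = re.compile(
    r"^ \|-- (.*): (.*) \(nullable = (.*)\b", re.MULTILINE)

# Assumed schema text for the source table, in tree-style layout.
SAMPLE_SCHEMA = "\n".join([
    "root",
    " |-- properties: struct (nullable = true)",
    " |    |-- site: string (nullable = true)",
    " |-- channel: string (nullable = true)",
    " |-- timestamp: string (nullable = true)",
    " |-- anotherDate: string (nullable = true)",
    " |-- aDate: string (nullable = true)",
])


def column_names(pattern, schema_text):
    """Return the column names (first capture group) found by the pattern."""
    return [match.group(1) for match in pattern.finditer(schema_text)]


if __name__ == "__main__":
    print(column_names(INFORMATION_COLUMNS_REGEX, SAMPLE_SCHEMA))
    print(column_names(TOP_LEVEL_COLUMNS_REGEX, SAMPLE_SCHEMA))
```

With this sample, the unanchored regex also returns `site` right after `properties`, which lines up with the unresolved `'site` column in the failing `Project` above. The anchored variant returns only the five top-level columns.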

I'm working on a fix already.

jtcohen6 · 2021-08-13T13:11:16Z

Nice digging @SCouto! And thanks for working on the fix :)

SCouto added the bug and triage labels Aug 13, 2021

jtcohen6 removed the triage label Aug 13, 2021

This was referenced Aug 16, 2021:

Feature/parse struct fields #203 (Closed)

fix issue parsing structs #204 (Merged)

jtcohen6 closed this as completed in #204 Aug 23, 2021

jtcohen6 mentioned this issue Oct 13, 2021:

get_columns_in_relation return all elements in a field with struct datatype #220 (Closed)
