Describe the bug

Hi everyone,

I think I've found an issue when using structs with an incremental model.

Below is the CREATE TABLE statement for the source table:
```sql
CREATE TABLE `<someSchema>`.`<sourceTable>` (
  `properties` STRUCT<`site`: STRING>,
  `channel` STRING,
  `timestamp` STRING,
  `anotherDate` STRING,
  `aDate` STRING
)
USING parquet
PARTITIONED BY (aDate)
LOCATION 's3a://<someBucket>'
```
If the model moves the struct field as-is to the sink table (with a simple select), it works the first time but fails on the second run.

Here is the CREATE TABLE statement for the example sink table:
```sql
CREATE TABLE `<someSchema>`.`dbtsink` (
  `properties` STRUCT<`site`: STRING>,
  `channel` STRING,
  `timestamp` STRING,
  `anotherDate` STRING,
  `aDate` STRING
)
USING parquet
PARTITIONED BY (anotherDate)
```
As I said, the second execution raises this error:
```
Runtime Error in model dbtsink (models/anotherDate/dbtsink.sql)
  Database Error
    Error running query: org.apache.spark.sql.AnalysisException: cannot resolve '`site`' given input columns: [dbtsink__dbt_tmp.channel, dbtsink__dbt_tmp.anotherDate, dbtsink__dbt_tmp.aDate, dbtsink__dbt_tmp.properties, dbtsink__dbt_tmp.timestamp]; line 4 pos 25;
'InsertIntoStatement 'UnresolvedRelation [someSchema, dbtsink], false, false
+- 'Project [properties#6526, 'site, channel#6527, timestamp#6528, aDate#6541, anotherDate#6540]
   +- SubqueryAlias dbtsink__dbt_tmp
      +- Project [properties#6526, channel#6527, timestamp#6528, anotherDate#6540, aDate#6541]
         +- Filter (((aDate#6541 > 2021060100) AND (aDate#6541 <= 2021070609)) AND (anotherDate#6540 = 2021070609))
            +- SubqueryAlias spark_catalog.someSchema.sourceTable
               +- Relation[context#6524,traits#6525,properties#6526,channel#6527,timestamp#6528,projectId#6529,integrations#6530,messageId#6531,originalTimestamp#6532,receivedAt#6533,sentAt#6534,userId#6535,anonymousId#6536,type#6537,providerId#6538,version#6539,anotherDate#6540,aDate#6541] parquet
```
Steps To Reproduce

The incremental model is a simple select over the source:

```sql
select properties, channel, timestamp, anotherDate, aDate
from {{ source('someSchema', 'sourceTable') }}
where aDate > '{{ var("aDateLowerLimit") }}'
  and aDate <= '{{ var("aDateUpperLimit") }}'
  and anotherDate = '{{ var("anotherDate") }}'
```
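The issue doesn't include the model's config block; for context, a setup along these lines (hypothetical values, assuming the insert_overwrite incremental strategy dbt-spark uses for partitioned tables) would match the behavior described:

```sql
-- hypothetical config for the model above; the exact values are assumptions
{{ config(
    materialized='incremental',
    incremental_strategy='insert_overwrite',
    partition_by=['anotherDate'],
    file_format='parquet'
) }}
```

As I understand it, on the second run dbt builds an insert from the existing table's column list, which is where the column parsing described under Additional context comes into play.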
Expected behavior

It should move the data to the corresponding partition without raising an error.
Screenshots and log output

If applicable, add screenshots or log output to help explain your problem.
The output of dbt --version:

```
installed version: 0.20.1
   latest version: 0.20.1

Up to date!

Plugins:
  - bigquery: 0.20.1
  - snowflake: 0.20.1
  - redshift: 0.20.1
  - postgres: 0.20.1
  - spark: 0.20.1
```
The operating system you're using: macOS
The output of python --version:

```
Python 3.9.6
```
Additional context

After checking the issue, I think the problem may be here, where a regex is used to parse column names: https://github.com/dbt-labs/dbt-spark/blob/master/dbt/adapters/spark/impl.py#L222-L249
The regex:

```python
INFORMATION_COLUMNS_REGEX = re.compile(
    r"\|-- (.*): (.*) \(nullable = (.*)\b", re.MULTILINE)
```

is retrieving all the columns, even the inner ones from the struct.
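To illustrate the over-matching, here is a minimal sketch. The schema text below is a hypothetical example of the tree Spark prints for a table like the one above, not output taken from the issue:

```python
import re

# The regex from dbt-spark's impl.py, as quoted in this issue.
INFORMATION_COLUMNS_REGEX = re.compile(
    r"\|-- (.*): (.*) \(nullable = (.*)\b", re.MULTILINE)

# Hypothetical schema tree as Spark prints it; note the indented
# "|    |--" line for the struct's inner field.
information = """\
root
 |-- properties: struct (nullable = true)
 |    |-- site: string (nullable = true)
 |-- channel: string (nullable = true)
 |-- timestamp: string (nullable = true)
"""

# findall matches "|-- " anywhere in a line, so the nested struct
# field "site" is captured alongside the real top-level columns.
names = [name for name, _, _ in INFORMATION_COLUMNS_REGEX.findall(information)]
print(names)  # ['properties', 'site', 'channel', 'timestamp']
```

That stray `site` entry is exactly the column Spark later fails to resolve in the insert.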
I'm working on a fix already.
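One way the over-matching could be avoided is to anchor the pattern to the start of the line, so only top-level entries match. This is just a sketch of the idea, not necessarily the actual fix:

```python
import re

# Hypothetical stricter pattern: require the line to begin with exactly
# " |-- ". Top-level columns print as " |-- name: type", while nested
# struct fields print as " |    |-- name: type" and so no longer match.
TOP_LEVEL_COLUMNS_REGEX = re.compile(
    r"^ \|-- (.*): (.*) \(nullable = (.*)\b", re.MULTILINE)

schema_tree = """\
root
 |-- properties: struct (nullable = true)
 |    |-- site: string (nullable = true)
 |-- channel: string (nullable = true)
"""

top_level = [name for name, _, _ in TOP_LEVEL_COLUMNS_REGEX.findall(schema_tree)]
print(top_level)  # ['properties', 'channel']
```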
Nice digging @SCouto! And thanks for working on the fix :)