Fix: column order for incremental insert overwrite #60
Small caveat
In #39, we changed `spark__get_columns_in_relation` from `describe` to `describe extended`. Spark has the interesting behavior of including a table's partition column(s) twice in the output: once among the set of columns, and again in the `# Partition Information` section. Thus, `get_columns_in_relation` returns the partition column(s) twice, and the incremental `insert overwrite` fails. I see two potential approaches:
1. Apply the `| unique` Jinja filter here.
2. Truncate the returned rows at the first empty row or at the `# Partition Information` meta row, and thereby avoid double-counting partition column(s).

I think the latter solution is better, and I've commented on #39 to that effect. I'll wait for that update before pulling `pr/39` into this branch and getting a clean integration test on this PR. At that point, we can merge #39, with this PR immediately following.
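The truncation idea in option 2 can be sketched roughly as follows. This is a hypothetical illustration, not the actual macro code: the `parse_columns` helper and the sample rows are assumptions loosely modeled on the shape of Spark's `describe extended` output, where an empty row and a `# Partition Information` marker separate the real column list from the repeated partition metadata.

```python
# Hypothetical sketch of approach 2: stop consuming `describe extended`
# rows at the first empty row or at the `# Partition Information` marker,
# so partition column(s) are not counted twice.

def parse_columns(rows):
    """rows: (col_name, data_type, comment) tuples from `describe extended`."""
    columns = []
    for col_name, data_type, comment in rows:
        # An empty name or the metadata marker ends the real column list.
        if not col_name or col_name.startswith("# Partition Information"):
            break
        columns.append((col_name, data_type))
    return columns

# Example rows, loosely modeled on Spark's `describe extended` output:
rows = [
    ("id", "bigint", None),
    ("name", "string", None),
    ("dt", "date", None),            # partition column, listed once here...
    ("", "", ""),
    ("# Partition Information", "", ""),
    ("# col_name", "data_type", "comment"),
    ("dt", "date", None),            # ...and again here
]

print(parse_columns(rows))  # → [('id', 'bigint'), ('name', 'string'), ('dt', 'date')]
```

Option 1 would instead leave the duplicate rows in place and deduplicate downstream with Jinja's `unique` filter; the appeal of option 2 is that the fix lands at the source, so every caller of `get_columns_in_relation` sees a clean column list.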