Fix: column order for incremental insert overwrite #60

Merged
merged 3 commits into from
Mar 16, 2020

jtcohen6
Contributor

@jtcohen6 jtcohen6 commented Mar 9, 2020

Small caveat

In #39, we change spark__get_columns_in_relation from describe to describe extended. Spark has the interesting behavior of including a table's partition column(s) twice in the output, once among the set of columns and again in the # Partition Information section.

describe extended dbt_jcohen.incremental_relation
col_name                        data_type   comment
first_name                      string      null
last_name                       string      null
email                           string      null
gender                          string      null
ip_address                      string      null
id                              bigint      null
# Partition Information
# col_name                      data_type   comment
id                              bigint      null

# Detailed Table Information
...                             ...         ...

Thus, get_columns_in_relation returns

`first_name`, `last_name`, `email`, `gender`, `ip_address`, `id`, `id`

And the incremental insert overwrite fails.
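
To make the double-counting concrete, here is a minimal Python sketch. It is an illustration only, not the actual dbt-spark parsing code: the row dicts and the `naive_column_names` helper are hypothetical, and it assumes a parser that skips `#` meta rows and empty rows but keeps everything else.

```python
# Hypothetical rows shaped like the `describe extended` output above.
rows = [
    {"col_name": "first_name", "data_type": "string"},
    {"col_name": "last_name", "data_type": "string"},
    {"col_name": "email", "data_type": "string"},
    {"col_name": "gender", "data_type": "string"},
    {"col_name": "ip_address", "data_type": "string"},
    {"col_name": "id", "data_type": "bigint"},
    {"col_name": "# Partition Information", "data_type": ""},
    {"col_name": "# col_name", "data_type": "data_type"},
    {"col_name": "id", "data_type": "bigint"},
    {"col_name": "", "data_type": ""},
    {"col_name": "# Detailed Table Information", "data_type": ""},
]


def naive_column_names(rows):
    """Keep every row that looks like a column, wherever it appears."""
    return [
        row["col_name"]
        for row in rows
        if row["col_name"] and not row["col_name"].startswith("#")
    ]


columns = naive_column_names(rows)
print(", ".join(f"`{name}`" for name in columns))
# `first_name`, `last_name`, `email`, `gender`, `ip_address`, `id`, `id`
```

The second `id` comes from the partition section, which is why the generated column list no longer matches the table.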

I see two potential approaches:

  • Solve this particular problem by using the | unique Jinja filter here
  • Solve for the general case by updating this line to stop parsing rows at the first empty row or at the # Partition Information meta row, and thereby avoid double-counting partition column(s)
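
The two approaches can be sketched side by side in Python. This is a rough sketch under stated assumptions, not the code in #39 or in this PR: `ROWS`, `unique_in_order`, and `columns_before_metadata` are hypothetical names, and the rows are simplified to `(col_name, data_type)` pairs.

```python
# Hypothetical (col_name, data_type) pairs mirroring the `describe extended` output.
ROWS = [
    ("first_name", "string"),
    ("last_name", "string"),
    ("email", "string"),
    ("gender", "string"),
    ("ip_address", "string"),
    ("id", "bigint"),
    ("# Partition Information", ""),
    ("# col_name", "data_type"),
    ("id", "bigint"),
    ("", ""),
    ("# Detailed Table Information", ""),
]


def unique_in_order(names):
    """Approach 1: drop repeated names, keeping the first occurrence of each."""
    return list(dict.fromkeys(names))


def columns_before_metadata(rows):
    """Approach 2: stop at the first empty row or the partition meta row."""
    columns = []
    for col_name, data_type in rows:
        name = col_name.strip()
        if not name or name == "# Partition Information":
            break
        columns.append((name, data_type))
    return columns


deduped = unique_in_order(
    ["first_name", "last_name", "email", "gender", "ip_address", "id", "id"]
)
parsed = columns_before_metadata(ROWS)

print(deduped)
print([name for name, _ in parsed])
```

Both print the six distinct column names in table order. The difference is where the fix lives: the first only cleans up the one list of names at the point of use, while the second keeps the partition rows out of the parsed result for every caller.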

I think the latter solution is better, and I've commented on #39 to that effect. I'll wait for that update before pulling pr/39 into this branch and getting a clean integration test on this PR. At that point, we can merge #39, with this PR immediately following.

@jtcohen6 jtcohen6 merged commit 8a5168a into master Mar 16, 2020
@jtcohen6 jtcohen6 deleted the fix/incremental-col-order branch March 16, 2020 21:25
This was referenced Mar 18, 2020
Development

Successfully merging this pull request may close these issues.

Incremental insert overwrite requires identical column ordering