Fix: column order for incremental insert overwrite #60
Small caveat
In #39, we changed `spark__get_columns_in_relation` from `describe` to `describe extended`. Spark has the interesting behavior of including a table's partition column(s) twice in the output: once among the set of columns, and again in the `# Partition Information` section. Thus, `get_columns_in_relation` returns the partition column(s) twice, and the incremental `insert overwrite` fails. I see two potential approaches:
1. Apply the `| unique` Jinja filter here.
2. Truncate the returned rows at the first empty row or at the `# Partition Information` meta row, and thereby avoid double-counting partition column(s).

I think the latter solution is better, and I've commented on #39 to that effect. I'll wait for that update before pulling `pr/39` into this branch and getting a clean integration test on this PR. At that point, we can merge #39, with this PR immediately following.
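The truncation idea in option 2 can be sketched roughly as follows. This is a hypothetical illustration, not the actual macro code: the `parse_columns` helper and the sample rows are assumptions loosely modeled on the shape of Spark's `describe extended` output, where an empty row and a `# Partition Information` marker separate the real column list from the repeated partition metadata.

```python
# Hypothetical sketch of approach 2: stop consuming `describe extended`
# rows at the first empty row or at the `# Partition Information` marker,
# so partition column(s) are not counted twice.

def parse_columns(rows):
    """rows: (col_name, data_type, comment) tuples from `describe extended`."""
    columns = []
    for col_name, data_type, comment in rows:
        # An empty name or the metadata marker ends the real column list.
        if not col_name or col_name.startswith("# Partition Information"):
            break
        columns.append((col_name, data_type))
    return columns

# Example rows, loosely modeled on Spark's `describe extended` output:
rows = [
    ("id", "bigint", None),
    ("name", "string", None),
    ("dt", "date", None),            # partition column, listed once here...
    ("", "", ""),
    ("# Partition Information", "", ""),
    ("# col_name", "data_type", "comment"),
    ("dt", "date", None),            # ...and again here
]

print(parse_columns(rows))  # → [('id', 'bigint'), ('name', 'string'), ('dt', 'date')]
```

Option 1 would instead leave the duplicate rows in place and deduplicate downstream with Jinja's `unique` filter; the appeal of option 2 is that the fix lands at the source, so every caller of `get_columns_in_relation` sees a clean column list.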