Add snapshot support #66
Conversation
tongqqiu commented on Mar 20, 2020
- Cast as string (the base implementation uses varchar, which does not work on Spark); see the sketch below.
- Snapshots only work with the Delta file format.
- Slightly change the base implementation to fit Delta merge needs.
- Make snapshots work with Delta Lake.
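
To make the first bullet concrete, here is a minimal sketch, not code from this PR; the column and relation names are hypothetical:

```sql
-- Hypothetical illustration only: Spark SQL's native string type is `string`,
-- so a cast that other databases express as varchar, e.g.
--   cast(dbt_scd_id as varchar(50))
-- would be written for Spark as:
select
    cast(dbt_scd_id as string) as dbt_scd_id
from snapshot_staging_table  -- hypothetical relation name
```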
…pr/support_snapshot (conflicts: dbt/adapters/spark/impl.py, dbt/include/spark/macros/materializations/seed.sql), commits a24e4e3 to 9b13d12
@tongqqiu This is really cool! I'm going to take your code for a spin over the next few days. It would be awesome to ship this as part of a 0.16.0 release.
There are two categories of code addition going on here:
Specific problem

There are two issues with the …
The solution in this PR is to create two separate temp tables, one for updates and one for inserts, and then perform two … While I can't determine the exact implementation right now, if we're already overwriting a bunch of code, I wonder if we don't instead create a unioned table of all updates + inserts, and then perform a single (atomic) merge:

merge into {{ target }} as DBT_INTERNAL_DEST
using {{ source.include(schema=false) }} as DBT_INTERNAL_SOURCE
on DBT_INTERNAL_SOURCE.dbt_scd_id = DBT_INTERNAL_DEST.dbt_scd_id
when matched
and DBT_INTERNAL_DEST.dbt_valid_to is null
and DBT_INTERNAL_SOURCE.dbt_change_type = 'update'
then update
set dbt_valid_to = DBT_INTERNAL_SOURCE.dbt_valid_to
when not matched
and DBT_INTERNAL_SOURCE.dbt_change_type = 'insert'
then insert *
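
To make the unioned-staging idea above concrete, here is a minimal sketch, not the PR's actual implementation: the view name snapshot_staging and the relations updates_source and inserts_source are hypothetical stand-ins for the two temp tables described above.

```sql
-- Hypothetical sketch only: combine update rows and insert rows into one staging
-- relation, tagged with dbt_change_type, so a single merge can apply both at once.
create or replace temporary view snapshot_staging as

    select 'update' as dbt_change_type, updates_source.*
    from updates_source        -- hypothetical relation of changed rows

    union all

    select 'insert' as dbt_change_type, inserts_source.*
    from inserts_source        -- hypothetical relation of brand-new rows
```
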
General problem

How can we override these macros? Right now, the only sane way to do this is by adding a prefix (e.g. …). I'd be interested to talk through a better solution for overriding parts (but not all) of the snapshot macro-stack, working toward something that could be more easily generalized to other plugins.

Working version

I was able to get the code in this PR working on 0.16.0 by:
I think, no matter how we implement this, we should ensure that the user knows early and often that this is Delta-only functionality. Part of that looks like updating the README, and part of it looks like raising a compilation error if the snapshot's file_format is not delta.
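
A minimal sketch of what such a guard could look like, assuming a hypothetical macro name and error message (the PR's real implementation may differ); config.get and exceptions.raise_compiler_error are standard members of dbt's Jinja context:

```sql
{# Hypothetical sketch: fail at compile time when a snapshot is not using Delta.
   The macro name and message are illustrative, not taken from this PR. #}
{% macro spark_validate_snapshot_file_format() %}
    {% if config.get('file_format', default='parquet') != 'delta' %}
        {% do exceptions.raise_compiler_error(
            "Snapshots on Spark require file_format = 'delta'"
        ) %}
    {% endif %}
{% endmacro %}
```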
Suggestion: Name spark-unique macros that aren't intentionally overriding a core macro …
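
For illustration, a hypothetical example of that naming convention (the macro name and body are invented, not taken from this PR):

```sql
{# Hypothetical illustration: a Spark-only helper carries an explicit prefix so it
   is obviously not an accidental override of a core dbt macro of the same name. #}
{% macro spark_build_snapshot_staging(relation) %}
    select * from {{ relation }}
{% endmacro %}
```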
@tongqqiu Are you up for making the few small changes to this PR required to get it working? I believe it would just involve:
@beckjake As far as how to add a test for Spark snapshots, I'm thinking that for now it's something we could add to the …
@jtcohen6 Could you help with those changes?
I'm happy to take it from here, if that's okay with you!
@jtcohen6 Yes, that would be great!