support schema evolution for delta lake #162
Comments
I think it was implemented by means of |
I'm using the latest dbt-spark, 0.19.1, with the merge incremental strategy, and I still have the same issues reported in #125. When the select statement returns more columns, nothing is changed on the target table and the run is reported as successful. |
Hi @laiyuanliu, thanks for opening this ticket. It depends on what your aim is. Schema evolution for full refreshes (i.e. non-incremental mode) is implemented in #125 in an atomic way: we don't first drop the table and then recreate it, because the table would be unavailable while it is being recreated. However, when using the
I would lean toward the second one; it is more predictable and fits in nicely with the ELT way of thinking. |
Thanks @laiyuanliu. I wasn't aware of that feature, thanks for pointing it out. You should be able to test this using:
Can you verify if this works? We could also integrate this easily into the codebase. Adding a block like:
|
Setting spark.databricks.delta.schema.autoMerge.enabled=true in a pre-hook works like a charm. With this setting, in incremental mode, new columns are automatically added, with a default value of null for the records that were not modified. More interestingly, if we remove some columns in our dbt code: I feel this is perfect. I will go ahead and close the issue. Thanks all for the help. |
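For anyone landing here later, a minimal sketch of what the pre-hook approach described above might look like in a model file. The model name, the `stg_events` source, and the `event_id` and `loaded_at` columns are hypothetical placeholders, not taken from this thread; only the Spark setting itself comes from the comment above.

```sql
-- models/events_incremental.sql (hypothetical model name)
-- The pre-hook turns on Delta schema auto-merge before the incremental merge runs.
{{
  config(
    materialized='incremental',
    incremental_strategy='merge',
    file_format='delta',
    unique_key='event_id',
    pre_hook="SET spark.databricks.delta.schema.autoMerge.enabled = true"
  )
}}

-- stg_events, event_id and loaded_at are placeholders for illustration only.
select *
from {{ ref('stg_events') }}
{% if is_incremental() %}
where loaded_at > (select max(loaded_at) from {{ this }})
{% endif %}
```

The setting is a Spark session configuration, so this sketch assumes the pre-hook and the merge run in the same session. If that does not hold in your setup, setting the same property in the cluster's Spark configuration may be an alternative.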
I have added the pre_hook but in incremental mode, new columns are not being added. |
Describe the feature
We are using dbt + Spark on Delta for incremental loads. As we are getting data from various sources, one of the key features we need is support for schema evolution. Delta Lake does support it with the merge command, as documented here
Can this be supported by dbt?
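To make the request concrete, here is a rough sketch of the Delta behaviour in question, written as plain Spark SQL. The table names `target` and `updates` and the key column `id` are hypothetical. As far as I understand the Delta documentation, automatic schema evolution in a merge applies when the statement uses `UPDATE SET *` and/or `INSERT *`, so treat the details as an assumption to verify against your Delta version.

```sql
-- Enable automatic schema evolution for merges in this Spark session.
SET spark.databricks.delta.schema.autoMerge.enabled = true;

-- target and updates are hypothetical Delta tables keyed on id.
-- Columns present in updates but missing from target are expected to be
-- added to target, with null for rows the merge does not touch.
MERGE INTO target AS t
USING updates AS s
ON t.id = s.id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *;
```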
Describe alternatives you've considered
The current alternative is a full refresh, which has two issues:
Who will this benefit?
I saw another issue, #124, that was submitted for a similar case, but it was closed for some reason. Supporting schema evolution with Delta would be extremely helpful for anyone who is using the Delta incremental strategy.