Incremental models do not fail when schema changes #226
Worth noting that I discovered #198 after I submitted this, so I can see that on_schema_change is not supported yet. Still, maybe there is a bug here in terms of the inconsistent default behavior of incremental models when a schema change does occur.
@jnatkins Thanks for opening. The main reason that we're missing this functionality in dbt-spark==0.21.0 is that we need to call a few more macros within the spark-specific incremental materialization.

As for why this isn't in place yet, and the reason for my hesitation in #198: I think you're on the money. The default behavior is inconsistent between other databases and Spark/Databricks (specifically Delta). On Delta, we can run:

```sql
merge into <target>
using <source>
on <condition>
when matched then update set *
when not matched then insert *
```

Those `*`s update or insert whichever columns the two tables have in common, so the statement succeeds even when the source schema has changed. On most other databases, by contrast, dbt's generated DML names columns explicitly, so the real default behavior is to fail when the schemas diverge.

Next steps

Here's where I'm arriving at, conceptually, for the different `on_schema_change` options, based on the way they were implemented in dbt 0.21. What do you think?
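For reference, `on_schema_change` is configured per model on the adapters that support it in dbt 0.21; a minimal sketch (the model body and ref name here are assumptions, not from the thread):

```sql
{{ config(
    materialized='incremental',
    -- one of: 'ignore' (default), 'fail', 'append_new_columns', 'sync_all_columns'
    on_schema_change='fail'
) }}

select * from {{ ref('incremental_source') }}
```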
@jtcohen6 The mention of spark.databricks.delta.schema.autoMerge.enabled is definitely worth thinking through, but what I've seen on some other tickets is that that config is session-based, and adding it as a pre-hook for a model does not seem to persist the setting through to the statement that actually creates the table. I'm not sure whether there's a Spark-specific config option to enable it on the connection that executes the DDL, but that would potentially be a solution. You can see #162 and #217 for more context. Anecdotally, I've heard the same complaint from some other users I've been working with. Is there a correct way to handle this? I'm open to either option: implementing parity with how on_schema_change functions on other warehouses (probably a good idea, since dbt provides a useful abstraction layer that obviates code changes when migrating platforms and helps with future-proofing), or solving it at the platform level via first-class support for autoMerge in dbt-spark.
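A sketch of the pre-hook approach described above; it sets the session-scoped Delta config, which (per the reports in #162 and #217) does not reliably carry over to the connection that executes the DDL (the model body is assumed):

```sql
{{ config(
    materialized='incremental',
    -- session-scoped setting; may not apply to the connection running the merge
    pre_hook="set spark.databricks.delta.schema.autoMerge.enabled = true"
) }}

select * from {{ ref('incremental_source') }}
```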
Yes, this is something we need to do a much better job documenting. It sounds like the […]
Describe the bug

In other dbt adapters, when the schema of a source query for an incremental model changes, the model fails, requiring a --full-refresh (or alternatively, in 0.21, some handling with on_schema_change). However, in the Spark adapter, the incremental materialization does not appear to check the schema, and so has no opportunity to handle a schema change.

Steps To Reproduce
As a relatively trivial example, I have a source for my incremental model:
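The original snippet is not reproduced in this copy of the issue; a hypothetical minimal source model consistent with the description (the column names are invented):

```sql
-- models/incremental_source.sql (hypothetical reconstruction)
select 1 as id, current_timestamp() as created_at
```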
The actual incremental model looks like this:
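A hypothetical incremental model matching the description (the config block, unique_key, and filter column are assumptions):

```sql
-- models/incremental_model.sql (hypothetical sketch)
{{ config(
    materialized='incremental',
    unique_key='id'
) }}

select * from {{ ref('incremental_source') }}

{% if is_incremental() %}
  -- only process rows newer than what is already in the table
  where created_at > (select max(created_at) from {{ this }})
{% endif %}
```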
If I run this using dbt run -m incremental_source+, the model is created as expected the first time. Now, let's make a slight modification to the incremental_source.sql file: all I've done is add a new_col column to each record. In other adapters, running this would cause the incremental to fail. In dbt-spark, it succeeds and simply ignores the change. The result is that Spark behaves differently from other relational warehouses, and on_schema_change has no effect.
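A hypothetical version of the modified incremental_source.sql; only the new_col addition comes from the issue text, the other columns are invented:

```sql
-- models/incremental_source.sql after the change (hypothetical)
select 1 as id, current_timestamp() as created_at, 'anything' as new_col
```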
Expected behavior
Incremental materializations fail, by default, in the event of a source query schema change.
- The output of dbt --version: 0.21.0
- The operating system you're using: dbt Cloud
- The output of python --version: N/A