account for changing column types in incremental tables #175
Comments
I don't believe that running Python in the middle of a transaction means we need to "split" it... I agree that there is complexity involved in the running of this, and that it's similar to what we've spoken about before, where a single operation can be split across multiple nodes in the graph. I think the important thing to do there is to make it such that multiple nodes can participate in a single transaction.

(otherwise, I'm 👍 on everything you outlined)
0.5.1 release

* Add a `compare_columns` arg to the equality test

  The `compare_columns` argument allows a user to specify a subset of columns to compare when checking the equality of two models' data. This helps address a few shortcomings of the previous implementation:

  - You cannot compare two ephemeral models, as the comparison depends on the ability to introspect the relation.
  - If two models differ in a predictable and expected way (e.g., the addition of a surrogate key column), you would have to create an additional materialized model that selects all but the added column in order to compare equality.

* Update documentation for the equality test
Incremental tables are a powerful dbt feature, but there's at least one edge case which makes working with them difficult. During the first run of an incremental model, Redshift will infer a type for every column in the table. Subsequent runs can insert new data which does not conform to the expected type. One example is a `varchar(16)` field which is inserted into a `varchar(8)` field. In practice, this error looks like:
The recommended solution, currently, is to change the model materialization to `table`, re-run dbt, then change it back to `incremental`.

One possible solution is to use the temp table generated by dbt during incremental model updates. The types of every temp table column can be compared to the types of the corresponding columns in the existing table. If the types differ, then dbt can alter the offending column to the type of the temp table column.
This is kind of tricky in practice, because it could be that the existing column is `varchar(8)` while the new column is `varchar(1)`... Some logic is required to ensure that columns are not altered to more-restrictive data types.

Additionally, all incremental model operations currently occur within a single transaction. If we were to implement this proposed solution, we'd need to run Python code in between the temp table phase and the delete/insert phase of the incremental model update, thus splitting up the transaction.
CC @jthandy