Deduplication when using ReplicatedMergeTree with delete+insert incremental strategy #213
Comments
Just to clarify, the problem is that in Step 4, the lightweight DELETE occurs, but the reinsert does not because of built-in ClickHouse deduplication, so those records are just missing? It seems reasonable to disable deduplication for the destination table; I'll take a look.
Yep, that's what seems to happen.
For the short term, it seems like instead of using a hook you should be able to add the
Is that a fix you can validate quickly? My current thought is to add
Ah, that looks like a neater solution. Give me 15 mins to try it out.
Ugh, apparently it doesn't work for a table level setting. I'm curious how your I'm thinking an alternative solution is just to provide a unique
For reference, I added it like this:
Oh, these are table-level settings... I figured these were settings on the INSERT? If so then
Yeah, the above works, thanks. I wasn't aware you could set the table SETTINGS this way.
Probably need the
Cool, that seems like a reasonable solution. And yeah, I think that needs to be called out better in the documentation.
Still, maybe worth defaulting to this option for this strategy, do you think?
Yes, I think it's a good default for any of the dbt incremental strategies. The ClickHouse automatic deduplication is sort of an unexpected gotcha in such a high-level tool.
Yeah, definitely had me bamboozled earlier. Thanks for the speedy response. I'll leave this open for you to decide how you want to deal with this. Thanks again.
You're welcome, thanks for the report. It just so happens I'm in the middle of a lot of dbt-clickhouse work, so this is good timing :)
Describe the bug
Steps to reproduce
Expected behaviour
Step 4 should delete the records inserted in step 3 and reinsert them.
Actual behaviour
The records are not reinserted in step 4.
Explanation
It looks like when you insert exactly the same set of records in the incremental update, the replication deduplication mechanism kicks in and ignores the new block of rows.
I am working around this by adding an `alter table... modify setting` hook, but it is not ideal. It would be nice if I could specify this as a setting when the table is created, or alternatively have it set automatically when using the delete+insert strategy, as otherwise the strategy wouldn't be guaranteed to work.
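To make the failure mode concrete, here is a minimal Python sketch of the behaviour described above. It is a toy model only, not ClickHouse or dbt-clickhouse code: the class `ToyReplicatedTable`, the `dedup_window` parameter, and the helper `delete_insert` are all hypothetical names, and the assumption is simply that the table remembers hashes of recently inserted blocks and silently skips an identical block even after its rows have been deleted.

```python
import hashlib


class ToyReplicatedTable:
    """Toy model of block-level insert deduplication (not ClickHouse code)."""

    def __init__(self, dedup_window=100):
        self.rows = []
        # Assumption for this sketch: a window of 0 disables deduplication.
        self.dedup_window = dedup_window
        self.recent_block_hashes = []

    def insert(self, block):
        """Insert a block of rows; return False if it was skipped as a duplicate."""
        digest = hashlib.sha256(repr(block).encode()).hexdigest()
        if self.dedup_window > 0:
            if digest in self.recent_block_hashes:
                return False  # identical block seen recently: silently ignored
            self.recent_block_hashes.append(digest)
            # Keep only the most recent hashes, up to the window size.
            self.recent_block_hashes = self.recent_block_hashes[-self.dedup_window:]
        self.rows.extend(block)
        return True

    def delete(self, predicate):
        """Remove matching rows. The remembered block hashes are left untouched."""
        self.rows = [r for r in self.rows if not predicate(r)]


def delete_insert(table, block, key):
    """Simplified delete+insert: delete rows whose key is in the block, then reinsert."""
    keys = {key(r) for r in block}
    table.delete(lambda r: key(r) in keys)
    return table.insert(block)


batch = [(1, "a"), (2, "b")]

table = ToyReplicatedTable()
delete_insert(table, batch, key=lambda r: r[0])  # first run: rows are inserted
delete_insert(table, batch, key=lambda r: r[0])  # second run: delete succeeds, insert is skipped
print(table.rows)  # the rows are now missing

no_dedup = ToyReplicatedTable(dedup_window=0)
delete_insert(no_dedup, batch, key=lambda r: r[0])
delete_insert(no_dedup, batch, key=lambda r: r[0])
print(no_dedup.rows)  # the rows survive the second run
```

In this toy, setting `dedup_window=0` plays the role of the "disable deduplication for the destination table" idea from the comments; the real setting names and where they must be applied are whatever the thread and the ClickHouse documentation specify.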