
Handle hard deletes in snapshots #2355

Closed. clrcrl wanted to merge 1 commit from the snapshot-hard-deletes branch.

@clrcrl (Contributor) commented Apr 23, 2020

resolves #249

Related Discourse post

Description

When a record is deleted from a snapshot query's result set, its current row in the target snapshot table has dbt_valid_to set to the current timestamp.
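The rule above can be illustrated with a minimal Python sketch. It assumes an in-memory list of snapshot rows keyed on a hypothetical integer `id` column; `SnapshotRow` and `end_date_missing_rows` are illustrative names, not part of dbt, and the PR itself works by changing dbt's SQL snapshot macros rather than anything in Python.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Set


@dataclass
class SnapshotRow:
    """One version of a record in the target snapshot (hypothetical shape)."""
    id: int                                  # assumed unique key of the source record
    dbt_valid_from: datetime                 # when this version became current
    dbt_valid_to: Optional[datetime] = None  # None means "still current"


def end_date_missing_rows(
    snapshot: List[SnapshotRow], source_ids: Set[int], now: datetime
) -> List[int]:
    """End-date current rows whose key no longer appears in the source query.

    Rows that were already closed out are left untouched; only versions with
    dbt_valid_to set to None are considered. Returns the ids that were closed.
    """
    closed = []
    for row in snapshot:
        if row.dbt_valid_to is None and row.id not in source_ids:
            row.dbt_valid_to = now  # treat the record as deleted as of this run
            closed.append(row.id)
    return closed


t0 = datetime(2020, 4, 1)
t1 = datetime(2020, 4, 23)
rows = [SnapshotRow(1, t0), SnapshotRow(2, t0)]
# Record 2 has disappeared from the snapshot query's result set.
print(end_date_missing_rows(rows, source_ids={1}, now=t1))  # [2]
```

One consequence of this rule: if the source query returns no rows at all (for example, a truncated table), every current row is end-dated in the same run.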

Checklist

  • I have signed the CLA
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • I have updated the CHANGELOG.md and added information about my change to the "dbt next" section.

@cla-bot added the cla:yes label Apr 23, 2020
@clrcrl marked this pull request as draft April 23, 2020 16:12
@jrandrews commented:

Will there be an option to disable this behavior and have snapshots simply ignore deletes?

I ask because we've occasionally had issues where all the rows in our source tables were truncated but new rows were never inserted (thanks, BigQuery!). In that scenario, I think every row in our snapshot would be end-dated because the source table would be empty, correct? We'd like to avoid that behavior.

Also, what would snapshots do if a row was end-dated because it was hard-deleted in the source, but a new row with the same PK value subsequently showed up in the source? For example, in our scenario, suppose all the rows in the snapshot with NULL end-dates were first end-dated because the source was truncated but not reloaded, and then on the next run the source table had been repopulated?

@clrcrl (Contributor, Author) commented Apr 24, 2020

For context, I wrote this code with a community member the other day and didn't want to lose it! I don't think we're going to merge this as-is; it needs a lot more testing. For now, I'm just leaving it open as a draft PR.

@tayloramurphy commented:

@clrcrl, what is the effect of switching the join type? It looks like the macro no longer exists and has been replaced by snapshot_staging_table (based on what I see in https://github.com/fishtown-analytics/dbt/blob/dev/marian-anderson/core/dbt/include/global_project/macros/materializations/snapshot/snapshot.sql#L26).

If we add the timestamp for deleted rows and override this internal macro, will it break anything else?

@clrcrl (Contributor, Author) commented Jul 8, 2020

Here's a Loom on the topic

And the example data that I'm talking through

@clrcrl (Contributor, Author) commented Sep 10, 2020

Closing as there's no intent to merge this at this time.

@clrcrl closed this Sep 10, 2020
@kwigley deleted the snapshot-hard-deletes branch February 12, 2021 14:06

Linked issue: snapshots should handle hard deletes
3 participants