Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature] Exclude foreign key references for deleted records #87

Closed
2 of 4 tasks
kylemcnair opened this issue Oct 3, 2022 · 6 comments
Closed
2 of 4 tasks

[Feature] Exclude foreign key references for deleted records #87

kylemcnair opened this issue Oct 3, 2022 · 6 comments
Labels
priority:p3 Affects many users; can wait status:in_review Currently in review type:enhancement New functionality or enhancement update_type:models Primary focus requires model updates

Comments

@kylemcnair
Copy link

kylemcnair commented Oct 3, 2022

Is there an existing feature request for this?

  • I have searched the existing issues

Describe the Feature

Problem

Deleted records for deal, company, and ticket are excluded from their respective models, each with a final select of:

select *
from fields
where not coalesce(is_deleted, false)

However, ALL keys for these entities can appear in other models. The exclusion of deleted records isn't accounted for in other models, and can result in orphaned records.

For example, we had two companies merged into a single company, resulting in a deleted company record. Both companies shared the same deal_id. The deals_company table contains a record for each company, but only one company has a corresponding record in the hubspot_company table. This relationship resulted in a failing uniqueness test.


Solution

To maintain referential integrity, references to deleted entities should be removed from all models. ie, if a deleted company is excluded from the core company model, that deleted company_id should not appear in any other models.

This could be implemented via a macro that looks for foreign keys referencing any of the three models with deletes.

Some version of:

  • if source_columns in company_id
  • append "company id in (select company_id from stg_company) "

If this general approach makes sense, I can flesh out more specifics with the macro.

Describe alternatives you've considered

No response

Are you interested in contributing this feature?

  • Yes.
  • Yes, but I will need assistance and will schedule time during your office hours for guidance.
  • No.

Anything else?

original post in the #tools-fivetran channel in the dbt Slack

@fivetran-joemarkiewicz
Copy link
Contributor

Hi @kylemcnair thanks so much for creating this feature request. I see the issue you are outlining and appreciate you thinking through a series of solutions.

My team and I are going to think through this a bit more and determine possible avenues forward here. Be sure to watch this thread as we will likely come back with follow up questions as we dig further into this.

@fivetran-jamie
Copy link
Contributor

@kylemcnair what are thoughts on your solution vs not deleting soft-deleted entities and persisting the is_entity_deleted columns downstream?

would you rather have no reporting on these deleted entities at all, or complete reporting with this is-deleted flag to pick and choose places to ignore them?

@kylemcnair
Copy link
Author

@kylemcnair what are thoughts on your solution vs not deleting soft-deleted entities and persisting the is_entity_deleted columns downstream?

would you rather have no reporting on these deleted entities at all, or complete reporting with this is-deleted flag to pick and choose places to ignore them?

@fivetran-jamie I like your proposed solution - it leaves a more complete dataset and would be much simpler to implement than an elaborate macro.

@kylemcnair
Copy link
Author

@fivetran-jamie should I run with this and open a PR? more than happy to, just lmk

@fivetran-jamie
Copy link
Contributor

@kylemcnair yeah if the below plan sounds good/makes sense to you! i scoped it out a bit more to identify exactly where we need to make all these changes

Plan:

  • Adjust the staging models for the following tables to NOT filter out deleted records. Persist the is_deleted or _fivetran_deleted field in the final CTE of the model
    • Company
    • Contact
    • Contact list
    • Contact list member
    • Deal
    • Deal pipeline
    • Deal pipeline stage
    • Ticket, ticket pipeline, and ticket pipeline stage, though they are not referenced in the transform package
  • Add the deleted flag to the staging table yml docs.
  • In the transform package, in each instance in the following models, add an is_<foreign key table>_deleted flag where we are joining entities with their potentially deleted foreign key references
    • Email_events_joined macro
      • Add is_contact_deleted field to the final join CTE
    • Int_hubspot__email_metrics__by_contact_list
      • Add is_contact_list_member_deleted to join CTE
    • Int_hubspot__deals_enhanced
      • Add is_deal_pipeline_deleted flag to final join CTE
      • Add is_pipeline_stage_deleted flag to final join CTE
      • Add is_deal_deleted flag to final join CTE
    • Hubspot__deal_stages
      • Add is_deal_pipeline_deleted flag to final join CTE
      • Add is_pipeline_stage_deleted flag to final join CTE
      • Add is_deal_deleted flag to final join CTE
  • Add the new field to the affected end model yml docs
  • Note somewhere (docs, readme, decision log, changelog, etc) that soft-deleted entities are included and can be removed via their boolean flags

happy to help wherever ya need or explain anything further

@fivetran-sheringuyen fivetran-sheringuyen added type:enhancement New functionality or enhancement and removed enhancement labels Dec 22, 2022
@kylemcnair kylemcnair mentioned this issue Jan 4, 2023
14 tasks
@fivetran-sheringuyen fivetran-sheringuyen added status:in_review Currently in review update_type:models Primary focus requires model updates priority:p3 Affects many users; can wait labels Jan 6, 2023
@fivetran-joemarkiewicz
Copy link
Contributor

Closing out this feature request as the above linked merged PRs address the request.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
priority:p3 Affects many users; can wait status:in_review Currently in review type:enhancement New functionality or enhancement update_type:models Primary focus requires model updates
Projects
None yet
Development

No branches or pull requests

4 participants