Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bugfix/uniqueness-test-adjustment #94

Merged
merged 9 commits into from
Feb 6, 2023

Conversation

fivetran-joemarkiewicz
Copy link
Contributor

@fivetran-joemarkiewicz fivetran-joemarkiewicz commented Feb 1, 2023

Are you a current Fivetran customer?

Fivetran created PR

What change(s) does this PR introduce?

Bug Fixes

  • Following the release of v0.8.0, the end model uniqueness tests were not updated to account for the added flexibility of the inclusion of deleted records. As such the respective end model tests have been adjusted to test uniqueness only on non-deleted records. (#94)

Under the Hood

  • Addition of the dbt-expectations package to be used for more robust testing of uniqueness for end models. (#94)

Did you update the CHANGELOG?

  • Yes

Does this PR introduce a breaking change?

  • Yes (please provide breaking change details below.)
  • No (please provide an explanation as to how the change is non-breaking below.)

This is simply an update to adjust the uniqueness tests following the release from v0.8.0. This will not be a breaking change.

Did you update the dbt_project.yml files with the version upgrade (please leverage standard semantic versioning)? (In both your main project and integration_tests)

  • Yes

Is this PR in response to a previously created Bug or Feature Request

How did you test the PR changes?

  • Buildkite
  • Local (please provide additional testing details below)

Select which warehouse(s) were used to test the PR

  • BigQuery
  • Redshift
  • Snowflake
  • Postgres
  • Databricks
  • Other (provide details below)

Provide an emoji that best describes your current mood

🦄

Feedback

We are so excited you decided to contribute to the Fivetran community dbt package! We continue to work to improve the packages and would greatly appreciate your feedback on our existing dbt packages or what you'd like to see next.

@fivetran-joemarkiewicz
Copy link
Contributor Author

@fivetran-jamie FYI I also explored adding more rigorous testing instead of just removing the uniqueness tests. However, I was having a hard time rationalizing the need for a complicated test when I felt the removal would suffice. It was either add a complicated test that could be brittle and break along the way, or remove it and rely on the not_null test only.

Open to your thoughts.

@fivetran-joemarkiewicz fivetran-joemarkiewicz marked this pull request as ready for review February 1, 2023 16:12
Copy link
Collaborator

@fivetran-jamie fivetran-jamie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so i think we need to adjust the tests for the email event models, not for models like contacts or deal_stages. i followed up in the issue to clarify which models are failing

i also think it would be best to include the is_contact_deleted fields in a dbt_utils.unique_combination_of_columns test rather than just removing the uniqueness tests

also also, i noticed that some _fivetran_synced entries have the _fivetran_deleted doc definition if you'd have bandwidth to fix those in here (or i can make an issue)

@fivetran-joemarkiewicz
Copy link
Contributor Author

so i think we need to adjust the tests for the email event models, not for models like contacts or deal_stages. i followed up in the issue to clarify which models are failing

i also think it would be best to include the is_contact_deleted fields in a dbt_utils.unique_combination_of_columns test rather than just removing the uniqueness tests

also also, i noticed that some _fivetran_synced entries have the _fivetran_deleted doc definition if you'd have bandwidth to fix those in here (or i can make an issue)

Thanks for following up here with these notes @fivetran-jamie! I definitely will have the bandwidth to make the adjustments on the docs definitions.

Additionally, I had one follow up regarding the addition of the tests. Would it technically work to simply include the deleted column to the uniqueness test? What if there are multiple deleted entries for a single contact? Wouldn't that still return a test failure as there could be five deleted entries and only one non deleted? I would assume that uniqueness test on multiple columns would fail.

If we wanted to truly test this, I think we would need to use a more robust test. Maybe something along the lines of the dbt-expectations expect_column_values_to_be_unique test to test on the condition where we filter out deleted records.

tests:
  - dbt_expectations.expect_column_values_to_be_unique:
      row_condition: "not is_deleted_field"

The inverse of this approach would be to just remove the uniqueness test altogether.

@fivetran-jamie
Copy link
Collaborator

@fivetran-joemarkiewicz ah i definitely like the expect_column_values_to_be_unique approach, since i suppose in theory a contact could be deleted multiple times 🤔 and the test reflects our actual thinking more appropriately

@fivetran-joemarkiewicz
Copy link
Contributor Author

Just applied the updates and the users shared that the test updates fixed the issues on their end (see the issue for details).

This is now ready for re-review. I will plan to regen docs following your review/approval.

Copy link
Collaborator

@fivetran-jamie fivetran-jamie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left a couple of questions! nearly there

@@ -20,7 +20,8 @@ models:
- name: event_id
description: The ID of the event.
tests:
- unique
- dbt_expectations.expect_column_values_to_be_unique:
row_condition: "not is_contact_deleted"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would we ever need to coalesce?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a great point. My thought was that since we did the coalesce in the staging model then it didn't need to be done here. However, we did the coalesce in the where clause. So, yes a coalesce would be wise to add here. Will update!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

models/sales/sales.yml Show resolved Hide resolved
@@ -1,3 +1,5 @@
packages:
- package: fivetran/hubspot_source
version: [">=0.8.0", "<0.9.0"]
- package: calogica/dbt_expectations
version: [">=0.8.0", "<0.9.0"]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should add this to the list of package dependencies in the README

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch! Will update.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated and added dbt-date as well.

Copy link
Collaborator

@fivetran-jamie fivetran-jamie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!

@fivetran-joemarkiewicz fivetran-joemarkiewicz merged commit 123b3ef into main Feb 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants