update macro used for passing through all columns to ensure quoting #129

Merged: @fivetran-reneeli merged 30 commits into main from bugfix/column_quote_all_passthrough_columns on Oct 16, 2024


@fivetran-reneeli (Contributor) commented on Oct 4, 2024

PR Overview

This PR will address the following Issue/Feature:
#128
This PR will result in the following new package version: 0.16.0

Although the changes are made 'behind the scenes' so that models now run successfully with both hubspot__pass_through_all_columns and hubspot__<>_pass_through_columns, this is a breaking change because it leverages the remove_duplicate_and_prefix_from_columns macro. That macro can remove duplicate fields, which changes your schema. See the v0.12.0 release notes for more information.

Please provide the finalized CHANGELOG entry which details the relevant changes included in this PR:

Breaking Changes

  • Switched from the fivetran_utils.remove_prefix_from_columns macro to the hubspot_source.remove_duplicate_and_prefix_from_columns macro when hubspot__pass_through_all_columns is enabled and you are passing through all columns in the stg_hubspot__company, stg_hubspot__contact, stg_hubspot__deal, and stg_hubspot__ticket models. This also ensures that all passed-through source fields are quoted from the outset. This is a breaking change because this macro can remove duplicate fields, which changes your schema. See the v0.12.0 release notes for more information.

Bug Fixes

  • Introduced a HubSpot-specific version of the fivetran_utils.pass_through_columns macro, titled hubspot_add_pass_through_columns, which adds quoting around the source fields brought in as passthrough columns. This ensures that your warehouse parses the SQL correctly, particularly if a field name contains special characters or syntax.

Under the Hood

  • Updated seed data to include fields with special syntax in order to test the above changes.
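To illustrate the quoting bug fix, here is a minimal Python sketch of how passthrough columns might be rendered into a SELECT list with each source field quoted. It is not the hubspot_add_pass_through_columns macro itself, which is written in Jinja/SQL. The config keys name, alias, and transform_sql are assumed to follow the usual Fivetran passthrough-column convention, the function names are hypothetical, and ANSI double-quote quoting is an assumption, since real quoting depends on the warehouse adapter.

```python
def quote_identifier(identifier: str) -> str:
    """Wrap an identifier in ANSI double quotes, escaping embedded quotes.

    Assumption: the target warehouse uses double quotes for identifiers.
    """
    return '"' + identifier.replace('"', '""') + '"'


def render_pass_through_columns(columns: list[dict[str, str]]) -> list[str]:
    """Render each passthrough-column config entry as a SELECT-list fragment.

    Hypothetical config shape: {"name": ..., "alias": ..., "transform_sql": ...},
    where only "name" is required.
    """
    fragments = []
    for column in columns:
        # Quote the source field so special characters or reserved words parse.
        quoted_name = quote_identifier(column["name"])
        # Sketch assumption: an explicit alias is already a valid identifier;
        # otherwise the quoted source name is reused as the output name.
        alias = column.get("alias") or quoted_name
        # A user-supplied transform is emitted verbatim in place of the raw field.
        expression = column.get("transform_sql") or quoted_name
        fragments.append(f"{expression} as {alias}")
    return fragments
```

For example, a field named `deal type` renders as `"deal type" as "deal type"`, a reference that would fail to parse if left unquoted.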

PR Checklist

Basic Validation

Please acknowledge that you have successfully performed the following commands locally:

  • dbt run --full-refresh && dbt test
  • dbt run (if incremental models are present) && dbt test

Before marking this PR as "ready for review" the following have been applied:

  • The appropriate issue has been linked, tagged, and properly assigned
  • All necessary documentation and version upgrades have been applied
  • docs were regenerated (unless this PR does not include any code or yml updates)
  • BuildKite integration tests are passing
  • Detailed validation steps have been provided below

Detailed Validation

Please share any and all of your validation steps:

see internal ticket

If you had to summarize this PR in an emoji, which would it be?

💃

@fivetran-joemarkiewicz (Contributor) left a comment

@fivetran-reneeli this PR looks great! However, I do have a request to make this a breaking change. I reviewed how we handled the switch from fivetran_utils.remove_prefix_from_columns to remove_duplicate_and_prefix_from_columns: it shipped in the v0.12.0 release, where it is listed as a breaking change.

This should be a breaking change because the duplicate removal applies a preference for the property_hs_* column names and keeps only the fields with this naming convention. For example, if a customer has both property_hs_contact_name and contact_name fields, the macro will not select contact_name; instead, it will retain property_hs_contact_name and rename it to contact_name.

Because of the above, customers could experience certain columns being removed from their staging and downstream models if they are passing through all columns. As a result, we will need to make this a breaking change and apply a breaking change to the downstream dbt_hubspot package as well.
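The column preference described above can be illustrated with a small Python sketch. It models only the behavior in the example: a property_hs_ prefix is stripped, and when both property_hs_<x> and <x> exist, the prefixed column is kept and renamed to <x> while the unprefixed one is dropped. The function name dedupe_and_strip_prefix is hypothetical, and the real remove_duplicate_and_prefix_from_columns macro is Jinja and may handle other prefixes or edge cases differently.

```python
PREFIX = "property_hs_"  # assumption: the only prefix modeled in this sketch


def dedupe_and_strip_prefix(columns: list[str]) -> list[tuple[str, str]]:
    """Return (source_column, output_name) pairs after prefix removal and dedup.

    Prefixed columns win: if both "property_hs_x" and "x" exist, only the
    prefixed column is selected, and it is renamed to "x".
    """
    # Output names that some prefixed column will claim once its prefix is stripped.
    claimed = {c[len(PREFIX):] for c in columns if c.startswith(PREFIX)}
    selected = []
    for column in columns:
        if column.startswith(PREFIX):
            selected.append((column, column[len(PREFIX):]))
        elif column not in claimed:
            # An unprefixed column with no prefixed twin is kept unchanged.
            selected.append((column, column))
        # Otherwise the unprefixed duplicate is dropped from the output.
    return selected
```

This shows why the schema impact is easy to miss: the output still contains a contact_name column, but its values now come from property_hs_contact_name rather than from the original contact_name field.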

Two review threads on CHANGELOG.md (outdated, resolved)
fivetran-reneeli and others added 4 commits October 15, 2024 15:38
Co-authored-by: Joe Markiewicz <74217849+fivetran-joemarkiewicz@users.noreply.github.com>
Co-authored-by: Joe Markiewicz <74217849+fivetran-joemarkiewicz@users.noreply.github.com>
@fivetran-joemarkiewicz (Contributor) left a comment

@fivetran-reneeli LGTM with a few final CHANGELOG change requests. Nothing required to block the approval and move to release review. Thanks for working through these updates!

Review thread on CHANGELOG.md (outdated):
[PR #129](https://github.com/fivetran/dbt_hubspot_source/pull/129) includes the following updates:

## Breaking Changes
- Switched from using the `fivetran_utils.remove_prefix_from_columns` macro to the `hubspot_source.remove_duplicate_and_prefix_from_columns` macro for when `hubspot__pass_through_all_columns` is enabled and you are passing through all columns in the `stg_hubspot__company`, `stg_hubspot__contact`, `stg_hubspot__deal`, and `stg_hubspot__ticket` models. This also ensures the source fields passed through are all quoted from the onset. This is a breaking change because this macro can remove duplicate fields, resulting in an impact to your schema. See the [v0.12.0 release notes](https://github.com/fivetran/dbt_hubspot_source/releases/tag/v0.12.0) for more information.
Contributor comment:

I don't think we necessarily need to link the v0.12.0 release notes, since that may cause more confusion. I think what you had before is great; there's no need to add the last sentence.

Contributor comment:

Also, the stg_hubspot__ticket model was not updated in this release for this change. You can, however, include it in the bug fix below.

Review thread on CHANGELOG.md (outdated, resolved)
fivetran-reneeli and others added 2 commits October 16, 2024 15:33
Co-authored-by: Joe Markiewicz <74217849+fivetran-joemarkiewicz@users.noreply.github.com>
@fivetran-reneeli (Contributor, Author):

Thanks @fivetran-joemarkiewicz, I made the changes to the CHANGELOG. Will move on to release review.

# hubspot_service_enabled: true # enable when generating docs
# hubspot_deal_enabled: true # enable when generating docs
# hubspot_contact_enabled: true # enable when generating docs
hubspot_sales_enabled: true # enable when generating docs
@fivetran-avinash (Contributor):

Should lines 15-16 be commented out before merging?

@fivetran-reneeli (Contributor, Author):

Thanks Avinash, commented out.

@@ -65,15 +67,37 @@ vars:
hubspot_email_event_dropped_identifier: "email_event_dropped_data"
hubspot_merged_deal_identifier: "merged_deal_data"

# hubspot__pass_through_all_columns: true
hubspot__company_pass_through_columns:
@fivetran-avinash (Contributor):

Should lines 71-73 be commented out (or removed completely) now that we've thoroughly tested and validated that this solution works, or is there a reason for keeping them in?

@fivetran-reneeli (Contributor, Author):

No particular reason for keeping them in. I'll comment them out.

@fivetran-avinash (Contributor) left a comment

@fivetran-reneeli Nice work, particularly with the cross-warehouse seed testing. Just a few quick comments before approval!

@fivetran-avinash (Contributor) left a comment

@fivetran-reneeli merged commit 5a4301c into main on Oct 16, 2024 (8 checks passed).
@fivetran-reneeli deleted the bugfix/column_quote_all_passthrough_columns branch on October 16, 2024 at 21:43.