Bugfix/duplicate columns #115

fivetran-joemarkiewicz · 2023-09-27T19:43:31Z

PR Overview

This PR will address the following Issue/Feature: dbt_hubspot Issue 119

This PR will result in the following new package version: v0.12.0

This is technically a breaking change, since it removes (via a coalesce) existing impacted fields when duplicates are identified. As such, I would feel more comfortable releasing this as a breaking change so customers are aware of the upgrade being applied.

Please detail what change(s) this PR introduces and any additional information that should be known during the review of this PR:

🚨 Breaking Changes 🚨

The following models have been updated to leverage a new custom macro that removes the property_hs_ prefix from the source columns in the staging models. If a column with the prefix removed has the same name as an existing column (for example, property_hs_meeting_outcome and meeting_outcome are both fields in the source table), the new macro will coalesce the two fields, giving preference to the property_hs_ field, as this is likely the most relevant field per the latest HubSpot API upgrade.
- stg_hubspot__engagement_call
- stg_hubspot__engagement_company
- stg_hubspot__engagement_contact
- stg_hubspot__engagement_deal
- stg_hubspot__engagement_email
- stg_hubspot__engagement_meeting
- stg_hubspot__engagement_note
- stg_hubspot__engagement_task
- stg_hubspot__ticket
- stg_hubspot__ticket_company
- stg_hubspot__ticket_contact
- stg_hubspot__ticket_deal
- stg_hubspot__ticket_engagement
- stg_hubspot__ticket_property_history
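
As a rough illustration of the coalescing behavior described above, here is a minimal Python sketch. It is not the macro itself (which is written in Jinja); the helper name `build_select_expressions` and the sample column names other than the `meeting_outcome` pair are assumptions made for the example.

```python
def build_select_expressions(column_names, prefix="property_hs_"):
    """Build SQL select expressions with the prefix stripped.

    Hypothetical helper for illustration only; the package does this in Jinja.
    """
    existing = {name.lower() for name in column_names}
    expressions = []
    for name in column_names:
        lowered = name.lower()
        if lowered.startswith(prefix):
            stripped = lowered[len(prefix):]
            if stripped in existing:
                # Both property_hs_<x> and <x> exist: coalesce, prefixed field first.
                expressions.append(f"coalesce({lowered}, {stripped}) as {stripped}")
            else:
                expressions.append(f"{lowered} as {stripped}")
        elif prefix + lowered in existing:
            # The unprefixed twin is already covered by the coalesce above.
            continue
        else:
            expressions.append(lowered)
    return expressions


# Assumed sample columns, mirroring the meeting_outcome example in the text.
columns = ["id", "meeting_outcome", "property_hs_meeting_outcome", "property_hs_title"]
select_list = build_select_expressions(columns)
print(select_list)
```

In this sketch each output name appears only once, which is the property that avoids the duplicate-column error.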

Feature Updates

A new macro, remove_duplicate_and_prefix_from_columns, has been included. It builds on the fivetran_utils.remove_prefix_from_columns macro by removing any duplicate columns that result from the prefix removal.

PR Checklist

Basic Validation

Please acknowledge that you have successfully performed the following commands locally:

dbt compile
dbt run --full-refresh
dbt run
dbt test
[n/a] dbt run --vars (if applicable)

Before marking this PR as "ready for review" the following have been applied:

The appropriate issue has been linked and tagged
You are assigned to the corresponding issue and this PR
BuildKite integration tests are passing

Detailed Validation

Please acknowledge that the following validation checks have been performed prior to marking this PR as "ready for review":

You have validated these changes and assure this PR will address the respective Issue/Feature.
You are reasonably confident these changes will not impact any other components of this package or any dependent packages.
You have provided details below around the validation steps performed to gain confidence in these changes.

These steps were validated by recreating the issue with the seed data for our integration tests and also via validation with the customer that the fix does in fact resolve the error they are seeing.

Standard Updates

Please acknowledge that your PR contains the following standard updates:

Package versioning has been appropriately indexed in the following locations:
- indexed within dbt_project.yml
- indexed within integration_tests/dbt_project.yml
CHANGELOG has individual entries for each respective change in this PR
README updates have been applied (if applicable)
[n/a] DECISIONLOG updates have been applied (if applicable)
[n/a] Appropriate yml documentation has been added (if applicable)

dbt Docs

Please acknowledge that after the above were all completed the below were applied to your branch:

docs were regenerated (unless this PR does not include any code or yml updates)

If you had to summarize this PR in an emoji, which would it be?

2️⃣

Loop twice

fivetran-catfritz

I was able to reproduce the issue using seed data and confirm the error is resolved. I also confirmed that the property_hs_ column was preferred over the non-prefixed column.

lgtm!

greg-finley

Thanks much! Will it be released soon?

greg-finley · 2023-09-29T18:10:38Z

macros/remove_duplicate_and_prefix_from_columns.sql

+        {% for col in columns if col.name not in exclude %}
+        {%- for dupe in columns if col.name[prefix|length:]|lower == dupe.name|lower -%}


This nested loop thing is O(n^2), right? I'm still learning Jinja, but it seems it also supports dictionaries: https://documentation.bloomreach.com/engagement/docs/datastructures#dictionaries

@greg-finley correct, it does result in O(n^2), and Jinja does support dictionaries. How are you proposing we leverage a dictionary to replace the nested loops?

It may be worthwhile to move forward with this solution in the immediate term so users can resolve the error, and to consider optimizing the macro down the road.

Yep, 100%, let's ship it.

If we had the list of columns in a dictionary or set, we could check in O(1) whether the duplicate name exists, instead of looking through the column list again.
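
A minimal Python sketch of that idea, for comparison with the nested loop. This is an illustration rather than the macro's Jinja; the function names and sample column names are assumptions, and the `startswith` check is a simplification of the slicing in the diff above.

```python
def duplicates_nested(names, prefix="property_hs_"):
    """O(n^2): for each prefixed column, scan the whole list for a match."""
    found = []
    for col in names:
        if not col.lower().startswith(prefix):
            continue
        stripped = col.lower()[len(prefix):]
        for dupe in names:
            if dupe.lower() == stripped:
                found.append(stripped)
    return found


def duplicates_with_set(names, prefix="property_hs_"):
    """O(n) on average: build a set once, then do constant-time membership checks."""
    lowered = {n.lower() for n in names}
    found = []
    for col in names:
        name = col.lower()
        if name.startswith(prefix) and name[len(prefix):] in lowered:
            found.append(name[len(prefix):])
    return found


# Assumed sample columns; mixed casing mirrors the "lower functions" commit.
names = ["id", "Meeting_Outcome", "property_hs_meeting_outcome", "property_hs_title", "title"]
nested_result = duplicates_nested(names)
set_result = duplicates_with_set(names)
print(nested_result == set_result)
```

For the handful of columns in a typical staging table the difference is unlikely to be noticeable, which fits the decision to ship first and optimize later.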

fivetran-joemarkiewicz · 2023-09-29T18:17:43Z

> Thanks much! Will it be released soon?

I am very close to releasing this! Likely Monday morning at this point. I heard from our product team that these additional fields (the non property_hs_ fields) were due to a bug in the connector and we may want to ignore the non property_hs_ fields altogether. I am just waiting on confirmation from our product team if I should keep the coalesce or remove it entirely.

greg-finley · 2023-09-29T18:19:26Z

> > Thanks much! Will it be released soon?
>
> I am very close to releasing this! Likely Monday morning at this point. I heard from our product team that these additional fields (the non property_hs_ fields) were due to a bug in the connector and we may want to ignore the non property_hs_ fields altogether. I am just waiting on confirmation from our product team if I should keep the coalesce or remove it entirely.

From looking at my own data, they seem to be all nulls, so I think removing or coalescing would have the same effect (though I guess slightly more efficient to avoid the coalesce)
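
A small Python sketch of why the two options would agree when the non-prefixed field is entirely null. The rows below are made-up sample data, and `coalesce` is a hypothetical stand-in for the SQL function.

```python
def coalesce(*values):
    """Return the first value that is not None, like SQL coalesce."""
    for value in values:
        if value is not None:
            return value
    return None


# Hypothetical rows: the non-prefixed field is always null.
rows = [
    {"property_hs_meeting_outcome": "COMPLETED", "meeting_outcome": None},
    {"property_hs_meeting_outcome": None, "meeting_outcome": None},
    {"property_hs_meeting_outcome": "NO_SHOW", "meeting_outcome": None},
]

coalesced = [coalesce(r["property_hs_meeting_outcome"], r["meeting_outcome"]) for r in rows]
prefixed_only = [r["property_hs_meeting_outcome"] for r in rows]
print(coalesced == prefixed_only)
```

The equivalence only holds under that assumption; if the non-prefixed field ever carried a value where the prefixed one was null, the coalesce would surface it and dropping the field would not.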

fivetran-joemarkiewicz and others added 4 commits September 27, 2023 13:25

bugfix/duplicate-columns

29d4d28

lower functions for casing issues

a03b219

Loop twice

1677fd5

Merge pull request #114 from greg-finley/greg-dupe2

2d74ab4

Loop twice

fivetran-joemarkiewicz self-assigned this Sep 27, 2023

fivetran-joemarkiewicz added 2 commits September 27, 2023 14:45

final updates

078af34

succinct changelog

b62f5c5

fivetran-joemarkiewicz mentioned this pull request Sep 27, 2023

bugfix/duplicate-columns fivetran/dbt_hubspot#120

Merged

15 tasks

fivetran-joemarkiewicz marked this pull request as ready for review September 27, 2023 20:12

fivetran-catfritz approved these changes Sep 27, 2023

greg-finley approved these changes Sep 29, 2023

greg-finley reviewed Sep 29, 2023

remove coalesce from macro

f31c7c8

fivetran-joemarkiewicz merged commit b79137d into main Oct 2, 2023


greg-finley Sep 29, 2023

fivetran-joemarkiewicz Oct 2, 2023

greg-finley Oct 2, 2023
