Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deal Pipeline Stage Label is duplicated #75

Closed
2 of 4 tasks
moseleyi opened this issue Aug 1, 2022 · 22 comments
Closed
2 of 4 tasks

Deal Pipeline Stage Label is duplicated #75

moseleyi opened this issue Aug 1, 2022 · 22 comments
Assignees
Labels
type:bug Something is broken or incorrect

Comments

@moseleyi
Copy link

moseleyi commented Aug 1, 2022

Is there an existing issue for this?

  • I have searched the existing issues

Describe the issue

Here is an extract for a deal that has two different stages:

image

The stage_ids are properly mapped in stg_hubspot__deal_pipeline_stage

But then, hubspot__deal_stages shows the same id for both stages

image

Relevant error log or model output

No response

Expected behavior

The first row should have Pipeline_stage_label = New and Deal_pipeline_stage_id = 6309924

dbt Project configurations

vars:
  hubspot_source:
    hubspot_database: FIVETRAN
    hubspot_schema: hubspot
  hubspot_email_event_print_enabled: false
  hubspot_email_event_forward_enabled: false
  hubspot_contact_merge_audit_enabled: true
  hubspot__pass_through_all_columns: true

Package versions

packages:

  • package: dbt-labs/audit_helper
    version: 0.5.0
  • package: fivetran/hubspot
    version: [">=0.5.0", "<0.6.0"]

What database are you using dbt with?

snowflake

dbt Version

1.1.1

Additional Context

No response

Are you willing to open a PR to help address this issue?

  • Yes.
  • Yes, but I will need assistance and will schedule time during our office hours for guidance
  • No.
@moseleyi moseleyi added the bug label Aug 1, 2022
@fivetran-joemarkiewicz
Copy link
Contributor

Hi @moseleyi thanks so much for opening this Issue!

This definitely seems like a behavior that is not intended and should be addressed. I did some querying on my side as well and found the same results you are mentioning where the same stage is applied to each stage in the final model for the deal.

I looked a bit closer and found that we seem to filter out inactive stages in our staging model.
https://github.com/fivetran/dbt_hubspot_source/blob/3888cde8d1d87ef65e406a9511509bb2af5128d1/models/stg_hubspot__deal_pipeline_stage.sql#L37

This in turn ensures that each stage in the hubspot__deal_stages only references the active stage (which I imagine is not the desired outcome). It seems that removing that where clause in the staging model could resolve this issue. Do you see any issue on your end with including inactive stages in the downstream transformations?

@moseleyi
Copy link
Author

moseleyi commented Aug 2, 2022

Since it's a historical table, the active flag is completely redundant, as we want to see all past stages regardless of their current status.

I ran the models without the WHERE clause you mentioned. Suddenly the deal ID has disappeared from hubspot__deal_stages AND hubspot__deals, even though it's still there in stg_hubspot__deal_stage.

I'm not sure this is it because when I look at stg_hubspot__deal_pipeline_stage_tmp all the stages have FALSE in _fivetran_deleted:

image

@moseleyi
Copy link
Author

moseleyi commented Aug 4, 2022

Any news? It's critical bug as it stops being able to analyse deal stages

@fivetran-joemarkiewicz
Copy link
Contributor

Hi @moseleyi thanks for getting back with these details. I was able to look into this some more and found that removing the filter didn't have the same effect you are mentioning where the deal_id seems to be removed. Would you be able to swap your packages.yml dependency on the hubspot package for the one below:

packages:
  - git: https://github.com/fivetran/dbt_hubspot.git
    revision: bugfix/deleted-stages
    warn-unpinned: false

Let me know if this resolves the issue on your end!

@fivetran-joemarkiewicz fivetran-joemarkiewicz self-assigned this Aug 5, 2022
@moseleyi
Copy link
Author

moseleyi commented Aug 5, 2022

Still seeing the same results:

hubspot.deal_stages
image

stg_hubspot__deal_stage
image

@moseleyi
Copy link
Author

moseleyi commented Aug 5, 2022

This code:

 select
        deal_stage.deal_id || '-' || row_number() over(partition by deal_stage.deal_id order by deal_stage.date_entered asc) as deal_stage_id,
        deal_stage.deal_stage_name, 
        deal_stage._fivetran_start as date_stage_entered,
        deal_stage._fivetran_end as date_stage_exited,
        deal_stage._fivetran_active as is_stage_active,
        deals_enhanced.*

    from deal_stage

    left join deals_enhanced
        on deal_stage.deal_id = deals_enhanced.deal_id

Is a little bit weird. Why do we bring all the deals' data to deal stages? That's why my query returns the label only of the current one. I expected the pipeline_stage_label to be related to deal_stage_name not to the current deal's one

@moseleyi
Copy link
Author

moseleyi commented Aug 5, 2022

I added this:

SELECT
  deal_stage_id,
  deal_stage_name,
  date_stage_entered,
  date_stage_exited,
  deal_id,
  dps.pipeline_stage_label,
  dps.deal_pipeline_id,
  dp.pipeline_label
FROM final
  LEFT JOIN analytics.base_hubspot.stg_hubspot__deal_pipeline_stage dps ON dps.deal_pipeline_stage_id = final.deal_stage_name
  LEFT JOIN analytics.base_hubspot.stg_hubspot__deal_pipeline dp ON dp.deal_pipeline_id = dps.deal_pipeline_id
WHERE
  deal_id = 9631290514

To get the name of each stage, and here are the results for the same deal:

image

@fivetran-jamie
Copy link
Collaborator

hey @moseleyi - so yes you are correct in that we are only grabbing pipeline stage information for the deal's current stage, which doesn't make sense for the hubspot__deal_stages model.

i've adapted your logic from the query above and joined in information for each stage if you wanna try out this branch again!

packages:
  - git: https://github.com/fivetran/dbt_hubspot.git
    revision: bugfix/deleted-stages
    warn-unpinned: false

@moseleyi
Copy link
Author

Thanks @fivetran-jamie for the update. Because we bring all the data from Deals to Deal Stages, I'm still confused as to which field corresponds to the stages. pipeline_stage_label is stil a current stage because it's being taken from deals. Bringing all the data from parent to child table causes a lot of confusion especially since it's a historical table. There's no differentiation in hubspot__deal_stages as to which fields are historical (deal_stage_name, deal_stage_id) and which ones are current (pipeline_stage_label, deal_pipeline_stage_id).

Two things:

  1. I wouldn't bring all the data from parent to this table
  2. Pipeline Stage Label is still current one and doesn't correspond to deal_stage_id

That's because Deal Stage Name is still a numerical way of hubspot naming the stages. What we need is a label:
image

@fivetran-jamie
Copy link
Collaborator

fivetran-jamie commented Aug 18, 2022

@moseleyi I completely agree -- having the parent deal data in there is confusing (and unnecessary, as one could just join with the hubspot__deals model)

i've removed the deal info (with the exceptions of deal_id and deal_name) from the bugfix/deleted-stages branch if you could re-run and check if things seem clearer/correct

This was referenced Aug 18, 2022
@moseleyi
Copy link
Author

Do I need to update the package information?

  - git: https://github.com/fivetran/dbt_hubspot.git
    revision: bugfix/deleted-stages
    warn-unpinned: false

When I do dbt build, I can't even see the latest code

image

@fivetran-jamie
Copy link
Collaborator

oh hm, if you're in dbt cloud i don't think you can dbt clean, but did you dbt deps after switching the hubspot package reference?

@moseleyi
Copy link
Author

You need to change reference to the deal name field, it's dealname not deal_name from deals_enhanced

@moseleyi
Copy link
Author

Apart from that, things are working as expected :)

@fivetran-jamie
Copy link
Collaborator

great! i'm a little confused about the dealname issue though...i believe we rename the column to deal_name in the staging model -- are you seeing an error somewhere?

@moseleyi
Copy link
Author

moseleyi commented Aug 23, 2022

Indeed I do see an error (after doing dbt deps and dbt build):
image

When I look at Fivetran source hubspot.deal table it's property_dealname

Then it's renamed to to deal_name in stg_hubspot__deal but only in one of the IF statements

{% if var('hubspot__pass_through_all_columns', false) %}
        -- just pass everything through
        {{ 
            fivetran_utils.remove_prefix_from_columns(
                columns=adapter.get_columns_in_relation(ref('stg_hubspot__deal_tmp')), 
                prefix='property_') 
        }}
    from base

{% else %}
        -- just default columns + explicitly configured passthrough columns
        _fivetran_synced,
        is_deleted,
        deal_id,
        cast(deal_pipeline_id as {{ dbt_utils.type_string() }}) as deal_pipeline_id,
        cast(deal_pipeline_stage_id as {{ dbt_utils.type_string() }}) as deal_pipeline_stage_id,
        owner_id,
        portal_id,
        property_dealname as deal_name,
        property_description as description,
        property_amount as amount,
        property_closedate as closed_at,
        property_createdate as created_at

        --The below macro adds the fields defined within your hubspot__deal_pass_through_columns variable into the staging model
        {{ fivetran_utils.fill_pass_through_columns('hubspot__deal_pass_through_columns') }}

        -- The below macro add the ability to create calculated fields using the hubspot__deal_calculated_fields variable.
        {{ fivetran_utils.calculated_fields('hubspot__deal_calculated_fields') }}

    from macro
{% endif %}

But my value for this variable is actually true (so it should be renamded) but if I look into stg_hubspot__deal the name is dealname.

Shouldn't the statement in the IF say {% if var('hubspot__pass_through_all_columns', true) %} ?

@fivetran-jamie
Copy link
Collaborator

Ohhhh great catch ... yeah we're renaming property_dealname (and property_closedate and property_createdate) inconsistently, depending on the value of hubspot__pass_through_all_columns

I've just pushed some code that renames these columns to deal_name, closed_at, and created_at regardless of your value for hubspot__pass_through_all_columns if you could re-install and try it out! thanks

@moseleyi
Copy link
Author

Works now : )

@moseleyi
Copy link
Author

@fivetran-jamie Do I need to wait for the release?

image

@fivetran-jamie
Copy link
Collaborator

ah yeah the release should come out tomorrow -- we were rolling out the source package release but uncovered a bug with dbt core that impacts some new functionality in the packages

@moseleyi
Copy link
Author

Thanks!

@fivetran-jamie
Copy link
Collaborator

release is live so closing this one out!

@fivetran-sheringuyen fivetran-sheringuyen added the type:bug Something is broken or incorrect label Dec 22, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type:bug Something is broken or incorrect
Projects
None yet
Development

No branches or pull requests

4 participants