Add service end models #123

Merged · 24 commits · Oct 30, 2023
7 changes: 6 additions & 1 deletion .buildkite/scripts/run_models.sh
@@ -20,7 +20,12 @@ dbt seed --target "$db" --full-refresh
dbt compile --target "$db" --select hubspot # source does not compile at this time
dbt run --target "$db" --full-refresh
dbt test --target "$db"
dbt run --vars '{hubspot_marketing_enabled: true, hubspot_sales_enabled: false}' --target "$db" --full-refresh
dbt run --target "$db" --vars '{hubspot_service_enabled: true}' --full-refresh
dbt run --target "$db" --vars '{hubspot_service_enabled: true}'
dbt test --target "$db"
dbt run --vars '{hubspot_service_enabled: true, hubspot_marketing_enabled: true, hubspot_sales_enabled: false}' --target "$db" --full-refresh
dbt run --vars '{hubspot_service_enabled: true, hubspot_marketing_enabled: true, hubspot_sales_enabled: false}' --target "$db"
dbt test --target "$db"
dbt run --vars '{hubspot_marketing_enabled: true, hubspot_contact_merge_audit_enabled: true, hubspot_sales_enabled: false}' --target "$db" --full-refresh
dbt run --vars '{hubspot_marketing_enabled: false, hubspot_sales_enabled: true, hubspot__pass_through_all_columns: true, hubspot_using_all_email_events: false}' --target "$db" --full-refresh
dbt test --target "$db"
8 changes: 8 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,11 @@
# dbt_hubspot v0.14.0

## New Model Alert 😮
Introducing Service end models! These are disabled by default but can be enabled by setting `hubspot_service_enabled` to `true` ([PR #123](https://github.com/fivetran/dbt_hubspot/pull/123)):
- `hubspot__tickets` - [Docs](https://fivetran.github.io/dbt_hubspot/#!/model/model.hubspot.hubspot__tickets)
- `hubspot__daily_ticket_history` - [Docs](https://fivetran.github.io/dbt_hubspot/#!/model/model.hubspot.hubspot__daily_ticket_history)
- See additional configurations for the history model in [README](https://github.com/fivetran/dbt_hubspot/tree/main#daily-ticket-history)

# dbt_hubspot v0.13.0
## 🚨 Breaking Changes 🚨
- This release includes breaking changes due to the removal of the below dependencies.
45 changes: 41 additions & 4 deletions README.md
@@ -35,6 +35,8 @@ The following table provides a detailed list of all models materialized within this package:
| [hubspot__deals](https://fivetran.github.io/dbt_hubspot/#!/model/model.hubspot.hubspot__deals) | Each record represents a deal in Hubspot, enriched with metrics about engagement activities. |
| [hubspot__deal_stages](https://fivetran.github.io/dbt_hubspot/#!/model/model.hubspot.hubspot__deal_stages) | Each record represents a deal stage in Hubspot, enriched with metrics about deal activities. |
| [hubspot__deal_history](https://fivetran.github.io/dbt_hubspot/#!/model/model.hubspot.hubspot__deal_history) | Each record represents a change to a deal in Hubspot, with `valid_to` and `valid_from` information. |
| [hubspot__tickets](https://fivetran.github.io/dbt_hubspot/#!/model/model.hubspot.hubspot__tickets) | Each record represents a ticket in Hubspot, enriched with metrics about engagement activities and information on associated deals, contacts, companies, and owners. |
| [hubspot__daily_ticket_history](https://fivetran.github.io/dbt_hubspot/#!/model/model.hubspot.hubspot__daily_ticket_history) | Each record represents a ticket's day in Hubspot with tracked properties pivoted out into columns. |
| [hubspot__email_campaigns](https://fivetran.github.io/dbt_hubspot/#!/model/model.hubspot.hubspot__email_campaigns) | Each record represents an email campaign in Hubspot, enriched with metrics about email activities. |
| [hubspot__email_event_*](https://fivetran.github.io/dbt_hubspot/#!/model/model.hubspot.hubspot__email_event_bounce) | Each record represents an email event in Hubspot, joined with relevant tables to make them analysis-ready. |
| [hubspot__email_sends](https://fivetran.github.io/dbt_hubspot/#!/model/model.hubspot.hubspot__email_sends) | Each record represents a sent email in Hubspot, enriched with metrics about opens, clicks, and other email activity. |
@@ -57,6 +59,14 @@

```yml
dispatch:
  - macro_namespace: dbt_utils
    search_order: ['spark_utils', 'dbt_utils']
```

### Database Incremental Strategies
Some of the models in this package (`+hubspot__daily_ticket_history`) are materialized incrementally. We have chosen `insert_overwrite` as the default strategy for **BigQuery** and **Databricks** databases, as it is only available for these dbt adapters. For **Snowflake**, **Redshift**, and **Postgres** databases, we have chosen `delete+insert` as the default strategy.

`insert_overwrite` is our preferred incremental strategy because it properly handles updates to records that fall outside the immediate incremental window. That is, because it leverages partitions, `insert_overwrite` updates existing rows that have been changed upstream instead of inserting duplicates of them, all without requiring a full table scan.

`delete+insert` is our second choice because it resembles `insert_overwrite` but lacks partitions. This strategy handles incremental loads correctly as long as past records have not changed. However, if a past record has been updated and falls outside of the incremental window, `delete+insert` will insert a duplicate record. 😱
> Because of this, we highly recommend that **Snowflake**, **Redshift**, and **Postgres** users periodically run a `--full-refresh` to ensure a high level of data quality and remove any possible duplicates.
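
For reference, the new `hubspot__daily_ticket_history` model in this PR selects its strategy from the target adapter at compile time. A condensed sketch of its config block (see the full model file at the bottom of this diff):

```sql
{{
    config(
        materialized='incremental',
        unique_key='ticket_day_id',
        incremental_strategy = 'insert_overwrite' if target.type not in ('snowflake', 'postgres', 'redshift') else 'delete+insert'
    )
}}
```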

## Step 2: Install the package
Include the following hubspot package version in your `packages.yml` file:
> TIP: Check [dbt Hub](https://hub.getdbt.com/) for the latest installation instructions or [read the dbt docs](https://docs.getdbt.com/docs/package-management) for more information on installing packages.
@@ -133,11 +143,10 @@

```yml
vars:
hubspot_owner_enabled: false

# Service
hubspot_service_enabled: true # Enables all service models
hubspot_ticket_deal_enabled: true
hubspot_service_enabled: true # Enables all service/ticket models. Default = false
hubspot_ticket_deal_enabled: true # Default = false
```
## (Optional) Step 5: Additional configurations
<details><summary>Expand for configurations</summary>

### Configure email metrics
This package allows you to specify which email metrics (total count and total unique count) you would like calculated for specified fields within the `hubspot__email_campaigns` model. By default, the `email_metrics` variable below includes all of the shown fields. If you would like to remove any field metrics from the final model, copy the below snippet into your root `dbt_project.yml` and remove any fields you want ignored in the final model.
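
A purely illustrative sketch of the variable's shape (these field names are assumed; the authoritative default list lives in the collapsed portion of the README below):

```yml
vars:
  email_metrics: ['opens', 'clicks', 'bounces'] # assumed example fields; the real default list is longer
```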
@@ -210,6 +219,35 @@

```yml
vars:
  hubspot_using_all_email_events: false # True by default
```

### Daily ticket history
The `hubspot__daily_ticket_history` model is disabled by default, but will materialize if `hubspot_service_enabled` is set to `true`. See additional configurations for this model below.

> **Note**: `hubspot__daily_ticket_history` and its parent intermediate models are incremental. After making any of the below configurations, you will need to run a full refresh.
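
For example (a sketch; the `+` selector includes the model's upstream parents):

```sh
dbt run --full-refresh --select +hubspot__daily_ticket_history
```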

#### **Tracking ticket properties**
By default, `hubspot__daily_ticket_history` will track each ticket's state, pipeline, and pipeline stage and pivot these properties into columns. However, any property from the source `TICKET_PROPERTY_HISTORY` table can be tracked and pivoted out into columns. To add other properties to this end model, add the following configuration to your `dbt_project.yml` file:

```yml
vars:
hubspot__ticket_property_history_columns:
- the
- list
- of
- property
- names
```
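
For example, to additionally track ticket priority and owner (these property names are assumed for illustration; use column names that exist in your `TICKET_PROPERTY_HISTORY` table):

```yml
vars:
  hubspot__ticket_property_history_columns:
    - hs_ticket_priority # assumed example property
    - hubspot_owner_id # assumed example property
```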

#### **Extending ticket history past closing date**
This package will create a row in `hubspot__daily_ticket_history` for each day that a ticket is open, starting at its creation date. A Hubspot ticket can be altered after being closed, so its properties can change after this date.

By default, the package will track a ticket up to its closing date (or the current date, if still open). To capture post-closure changes, you may want to extend a ticket's history past the close date. To do so, add the following configuration to your root `dbt_project.yml` file:

```yml
vars:
hubspot:
ticket_history_extension_days: integer_number_of_days # default = 0
```
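
For example, to keep building daily history rows for 30 days after a ticket closes (the value is assumed for illustration):

```yml
vars:
  hubspot:
    ticket_history_extension_days: 30
```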

### Changing the Build Schema
By default, this package builds the HubSpot staging models within a schema titled (`<target_schema>` + `_stg_hubspot`) and the HubSpot final models within a schema titled (`<target_schema>` + `_hubspot`) in your target database. If this is not where you would like your modeled HubSpot data written, add the following configuration to your root `dbt_project.yml` file:
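
A sketch of the typical override used across Fivetran packages (the exact snippet sits in the collapsed region below; this assumes the standard `+schema` config pattern, and the schema names are placeholders):

```yml
models:
  hubspot:
    +schema: my_new_schema_name
    staging:
      +schema: my_new_staging_schema_name
```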

@@ -230,7 +268,6 @@
If an individual source table has a different name than the package expects, add the following configuration to your root `dbt_project.yml` file:

```yml
vars:
hubspot_<default_source_table_name>_identifier: your_table_name
```
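
For example, if your ticket source table arrives under a custom name (the table name here is assumed for illustration):

```yml
vars:
  hubspot_ticket_identifier: my_custom_ticket_table
```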
</details>

## (Optional) Step 6: Orchestrate your models with Fivetran Transformations for dbt Core™
<details><summary>Expand for details</summary>
10 changes: 10 additions & 0 deletions dbt_project.yml
@@ -42,6 +42,16 @@ vars:
engagement_meeting: "{{ ref('stg_hubspot__engagement_meeting') }}"
engagement_note: "{{ ref('stg_hubspot__engagement_note') }}"
engagement_task: "{{ ref('stg_hubspot__engagement_task') }}"

ticket_company: "{{ ref('stg_hubspot__ticket_company') }}"
ticket_contact: "{{ ref('stg_hubspot__ticket_contact') }}"
ticket_deal: "{{ ref('stg_hubspot__ticket_deal') }}"
ticket_engagement: "{{ ref('stg_hubspot__ticket_engagement') }}"
ticket_pipeline: "{{ ref('stg_hubspot__ticket_pipeline') }}"
ticket_pipeline_stage: "{{ ref('stg_hubspot__ticket_pipeline_stage') }}"
ticket_property_history: "{{ ref('stg_hubspot__ticket_property_history') }}"
ticket: "{{ ref('stg_hubspot__ticket') }}"

hubspot_contact_merge_audit_enabled: false
models:
hubspot:
3 changes: 3 additions & 0 deletions integration_tests/dbt_project.yml
@@ -158,6 +158,9 @@ seeds:
created: timestamp
obsoleted_by_created: timestamp
sent_by_created: timestamp
ticket_property_history_data:
+column_types:
timestamp_instant: timestamp

dispatch:
- macro_namespace: dbt_utils
3 changes: 2 additions & 1 deletion integration_tests/requirements.txt
@@ -4,4 +4,5 @@ dbt-redshift>=1.3.0,<2.0.0
dbt-postgres>=1.3.0,<2.0.0
dbt-spark>=1.3.0,<2.0.0
dbt-spark[PyHive]>=1.3.0,<2.0.0
dbt-databricks>=1.3.0,<2.0.0
dbt-databricks>=1.3.0,<2.0.0
oscrypto @ git+https://github.com/wbond/oscrypto.git@d5f3437
23 changes: 11 additions & 12 deletions integration_tests/seeds/ticket_property_history_data.csv
@@ -1,12 +1,11 @@
_fivetran_synced,ticket_id,name,source,source_id,timestamp_instant,value
2020-07-09 11:06:21.056,1,seed name, source name, sg5923,2020-07-09 11:06:21.056,value
2020-07-09 11:06:21.056,1,seed data is cool, source name1, sg5923,2020-07-09 11:06:21.056,value
2020-07-09 11:06:21.056,1,i can make these names whatever i want, source name2, sg5923,2020-07-09 11:06:21.056,value
2020-07-09 11:06:21.056,1,seed name1, source name3, sg5923,2020-07-09 11:06:21.056,value
2020-07-09 11:06:21.056,1,seed name2, source name4, sg5923,2020-07-09 11:06:21.056,value
2020-07-09 11:06:21.056,1,seed name3, source name5, sg5923,2020-07-09 11:06:21.056,value
2020-07-09 11:06:21.056,1,stop being lazy with names, source name99, sg5923,2020-07-09 11:06:21.056,value
2020-07-09 11:06:21.056,1,seed name4, source name6?, sg5923,2020-07-09 11:06:21.056,value
2020-07-09 11:06:21.056,1,seed name5, source name7, sg5923,2020-07-09 11:06:21.056,value
2020-07-09 11:06:21.056,1,seed name6, source name8, sg5923,2020-07-09 11:06:21.056,value
2020-07-09 11:06:21.056,1,i got lazy with names, source name99, sg5923,2020-07-09 11:06:21.056,value
_fivetran_start,name,ticket_id,_fivetran_active,_fivetran_end,_fivetran_synced,source,source_id,timestamp_instant,value
2023-10-04 12:39:39.423000,hs_object_id,123456,true,9999-12-31 23:59:59.999000,2023-10-26 13:45:13.164000,CONVERSATIONS,,2023-10-04 12:39:39.423000,123456
2023-10-04 13:13:34.875000,hs_lastmodifieddate,123456,true,9999-12-31 23:59:59.999000,2023-10-26 13:45:13.164000,CALCULATED,,2023-10-04 13:13:34.875000,1696425214875
2023-10-04 13:13:20.101000,hs_lastmodifieddate,123456,false,2023-10-04 13:13:34.874000,2023-10-26 13:45:13.164000,CALCULATED,,2023-10-04 13:13:20.101000,1696425200101
2023-10-04 12:43:14.219000,hs_lastmodifieddate,123456,false,2023-10-04 13:13:20.100000,2023-10-26 13:45:13.164000,CALCULATED,,2023-10-04 12:43:14.219000,1696423394219
2023-10-04 12:41:57.805000,hs_lastmodifieddate,123456,false,2023-10-04 12:43:14.218000,2023-10-26 13:45:13.164000,CALCULATED,,2023-10-04 12:41:57.805000,1696423317805
2023-10-04 12:39:42.413000,hs_lastmodifieddate,123456,false,2023-10-04 12:41:57.804000,2023-10-26 13:45:13.164000,CALCULATED,,2023-10-04 12:39:42.413000,1696423182413
2023-10-04 12:39:41.528000,hs_lastmodifieddate,123456,false,2023-10-04 12:39:42.412000,2023-10-26 13:45:13.164000,CONVERSATIONS,,2023-10-04 12:39:41.528000,1696423181528
2023-10-04 12:39:39.423000,hs_lastmodifieddate,123456,false,2023-10-04 12:39:41.527000,2023-10-26 13:45:13.164000,CONVERSATIONS,,2023-10-04 12:39:39.423000,1696423179423
2023-10-04 12:39:39.423000,hs_object_source,123456,true,9999-12-31 23:59:59.999000,2023-10-26 13:45:13.164000,CONVERSATIONS,,2023-10-04 12:39:39.423000,CONVERSATIONS
2023-10-04 12:39:41.528000,hs_thread_ids_to_restore,123456,true,9999-12-31 23:59:59.999000,2023-10-26 13:45:13.164000,CONVERSATIONS,,2023-10-04 12:39:41.528000,5523941243
144 changes: 144 additions & 0 deletions models/service/hubspot__daily_ticket_history.sql
@@ -0,0 +1,144 @@
{{
config(
enabled=var('hubspot_service_enabled', False),
materialized='incremental',
partition_by = {'field': 'date_day', 'data_type': 'date'}
if target.type not in ['spark', 'databricks'] else ['date_day'],
unique_key='ticket_day_id',
incremental_strategy = 'insert_overwrite' if target.type not in ('snowflake', 'postgres', 'redshift') else 'delete+insert',
file_format = 'parquet'
)
}}

{%- set change_data_columns = adapter.get_columns_in_relation(ref('int_hubspot__scd_daily_ticket_history')) -%}

with change_data as (

select *
from {{ ref('int_hubspot__scd_daily_ticket_history') }}

{% if is_incremental() %}
where date_day >= (select max(date_day) from {{ this }})

-- If no ticket properties have been updated since the last incremental run, the change_data CTE above will return no rows.
-- When this is the case, we need to grab the most recent day's records from the previously built table so that we can persist
-- those values into the future.

), most_recent_data as (

select
*
from {{ this }}
where date_day = (select max(date_day) from {{ this }} )
{% endif %}

), calendar as (

select *
from {{ ref('int_hubspot__ticket_calendar_spine') }}

{% if is_incremental() %}
where date_day >= (select max(date_day) from {{ this }})
{% endif %}

), pipeline as (

select *
from {{ var('ticket_pipeline')}}

), pipeline_stage as (

select *
from {{ var('ticket_pipeline_stage')}}

), joined as (

select
calendar.date_day,
calendar.ticket_id
{% if is_incremental() %}
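    -- prefer a property's value from today's change data; otherwise carry forward the most recent known value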
{% for col in change_data_columns if col.name|lower not in ['ticket_id','date_day','id'] %}
, coalesce(change_data.{{ col.name }}, most_recent_data.{{ col.name }}) as {{ col.name }}
{% endfor %}

{% else %}
{% for col in change_data_columns if col.name|lower not in ['ticket_id','date_day','id'] %}
, {{ col.name }}
{% endfor %}
{% endif %}

from calendar
left join change_data
on calendar.ticket_id = change_data.ticket_id
and calendar.date_day = change_data.date_day

{% if is_incremental() %}
left join most_recent_data
on calendar.ticket_id = most_recent_data.ticket_id
and calendar.date_day = most_recent_data.date_day
{% endif %}

), set_values as (

select
date_day,
ticket_id

{% for col in change_data_columns if col.name|lower not in ['ticket_id','date_day','id'] %}
, {{ col.name }}
-- create a batch/partition once a new value is provided
, sum(case when joined.{{ col.name }} is null then 0 else 1 end) over (
partition by ticket_id
order by date_day rows unbounded preceding) as {{ col.name }}_partition
{% endfor %}

from joined

), fill_values as (

select
date_day,
ticket_id

{% for col in change_data_columns if col.name|lower not in ['ticket_id','date_day','id'] %}
-- grab the value that started this batch/partition
, first_value( {{ col.name }} ) over (
partition by ticket_id, {{ col.name }}_partition
order by date_day asc rows between unbounded preceding and current row) as {{ col.name }}
{% endfor %}

from set_values

), fix_null_values as (

select
date_day,
ticket_id,
pipeline_stage.ticket_state,
pipeline.pipeline_label as hs_pipeline_label,
pipeline_stage.pipeline_stage_label as hs_pipeline_stage_label

{% for col in change_data_columns if col.name|lower not in ['ticket_id','date_day','id'] %}
-- we de-nulled the true null values earlier in order to differentiate them from nulls that just needed to be backfilled
, case when cast( {{ col.name }} as {{ dbt.type_string() }} ) = 'is_null' then null else {{ col.name }} end as {{ col.name }}
{% endfor %}

from fill_values

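-- decode pipeline and pipeline-stage ids into their human-readable labels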
left join pipeline
on cast(fill_values.hs_pipeline as {{ dbt.type_int() }}) = pipeline.ticket_pipeline_id
left join pipeline_stage
on cast(fill_values.hs_pipeline_stage as {{ dbt.type_int() }}) = pipeline_stage.ticket_pipeline_stage_id
and pipeline.ticket_pipeline_id = pipeline_stage.ticket_pipeline_id

), surrogate as (

select
{{ dbt_utils.generate_surrogate_key(['date_day','ticket_id']) }} as ticket_day_id,
*

from fix_null_values
)

select *
from surrogate