Add service end models #123

Merged · 24 commits · Oct 30, 2023
7 changes: 6 additions & 1 deletion .buildkite/scripts/run_models.sh
@@ -20,7 +20,12 @@ dbt seed --target "$db" --full-refresh
dbt compile --target "$db" --select hubspot # source does not compile at this time
dbt run --target "$db" --full-refresh
dbt test --target "$db"
dbt run --vars '{hubspot_marketing_enabled: true, hubspot_sales_enabled: false}' --target "$db" --full-refresh
dbt run --target "$db" --vars '{hubspot_service_enabled: true}' --full-refresh
dbt run --target "$db" --vars '{hubspot_service_enabled: true}'
dbt test --target "$db"
dbt run --vars '{hubspot_service_enabled: true, hubspot_marketing_enabled: true, hubspot_sales_enabled: false}' --target "$db" --full-refresh
dbt run --vars '{hubspot_service_enabled: true, hubspot_marketing_enabled: true, hubspot_sales_enabled: false}' --target "$db"
dbt test --target "$db"
dbt run --vars '{hubspot_marketing_enabled: true, hubspot_contact_merge_audit_enabled: true, hubspot_sales_enabled: false}' --target "$db" --full-refresh
dbt run --vars '{hubspot_marketing_enabled: false, hubspot_sales_enabled: true, hubspot__pass_through_all_columns: true, hubspot_using_all_email_events: false}' --target "$db" --full-refresh
dbt test --target "$db"
8 changes: 8 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,11 @@
# dbt_hubspot v0.14.0

## New Model Alert 😮
Introducing Service end models! These are disabled by default but can be enabled by setting `hubspot_service_enabled` to `true` ([PR #123](https://github.com/fivetran/dbt_hubspot/pull/123)):
- `hubspot__tickets` - [Docs](https://fivetran.github.io/dbt_hubspot/#!/model/model.hubspot.hubspot__tickets)
- `hubspot__daily_ticket_history` - [Docs](https://fivetran.github.io/dbt_hubspot/#!/model/model.hubspot.hubspot__daily_ticket_history)
- See additional configurations for the history model in [README](https://github.com/fivetran/dbt_hubspot/tree/main#daily-ticket-history)

# dbt_hubspot v0.13.0
## 🚨 Breaking Changes 🚨
- This release includes breaking changes due to the removal of the below dependencies.
45 changes: 41 additions & 4 deletions README.md
@@ -35,6 +35,8 @@ The following table provides a detailed list of all models materialized within this package:
| [hubspot__deals](https://fivetran.github.io/dbt_hubspot/#!/model/model.hubspot.hubspot__deals) | Each record represents a deal in Hubspot, enriched with metrics about engagement activities. |
| [hubspot__deal_stages](https://fivetran.github.io/dbt_hubspot/#!/model/model.hubspot.hubspot__deal_stages) | Each record represents a deal stage in Hubspot, enriched with metrics about deal activities. |
| [hubspot__deal_history](https://fivetran.github.io/dbt_hubspot/#!/model/model.hubspot.hubspot__deal_history) | Each record represents a change to a deal in Hubspot, with `valid_to` and `valid_from` information. |
| [hubspot__tickets](https://fivetran.github.io/dbt_hubspot/#!/model/model.hubspot.hubspot__tickets) | Each record represents a ticket in Hubspot, enriched with metrics about engagement activities and information on associated deals, contacts, companies, and owners. |
| [hubspot__daily_ticket_history](https://fivetran.github.io/dbt_hubspot/#!/model/model.hubspot.hubspot__daily_ticket_history) | Each record represents a ticket's day in Hubspot with tracked properties pivoted out into columns. |
| [hubspot__email_campaigns](https://fivetran.github.io/dbt_hubspot/#!/model/model.hubspot.hubspot__email_campaigns) | Each record represents an email campaign in Hubspot, enriched with metrics about email activities. |
| [hubspot__email_event_*](https://fivetran.github.io/dbt_hubspot/#!/model/model.hubspot.hubspot__email_event_bounce) | Each record represents an email event in Hubspot, joined with relevant tables to make them analysis-ready. |
| [hubspot__email_sends](https://fivetran.github.io/dbt_hubspot/#!/model/model.hubspot.hubspot__email_sends) | Each record represents a sent email in Hubspot, enriched with metrics about opens, clicks, and other email activity. |
@@ -57,6 +59,14 @@

```yml
dispatch:
  - macro_namespace: dbt_utils
    search_order: ['spark_utils', 'dbt_utils']
```

### Database Incremental Strategies
Some of the models in this package (`+hubspot__daily_ticket_history`) are materialized incrementally. We have chosen `insert_overwrite` as the default strategy for **BigQuery** and **Databricks** databases, as it is only available for these dbt adapters. For **Snowflake**, **Redshift**, and **Postgres** databases, we have chosen `delete+insert` as the default strategy.

`insert_overwrite` is our preferred incremental strategy because it properly handles updates to records that fall outside the immediate incremental window. That is, because it leverages partitions, `insert_overwrite` updates existing rows that have been changed upstream instead of inserting duplicates of them, all without requiring a full table scan.

`delete+insert` is our second choice because it resembles `insert_overwrite` but lacks partitions. This strategy handles incremental loads correctly as long as past records have not changed. However, if a past record has been updated and falls outside of the incremental window, `delete+insert` will insert a duplicate record. 😱
> Because of this, we highly recommend that **Snowflake**, **Redshift**, and **Postgres** users periodically run a `--full-refresh` to ensure a high level of data quality and remove any possible duplicates.
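
For reference, the new `hubspot__daily_ticket_history` model in this PR selects its strategy from the target adapter at compile time. A condensed sketch of its config block (see the full model file at the bottom of this diff):

```sql
{{
    config(
        materialized='incremental',
        unique_key='ticket_day_id',
        incremental_strategy = 'insert_overwrite' if target.type not in ('snowflake', 'postgres', 'redshift') else 'delete+insert'
    )
}}
```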

## Step 2: Install the package
Include the following hubspot package version in your `packages.yml` file:
> TIP: Check [dbt Hub](https://hub.getdbt.com/) for the latest installation instructions or [read the dbt docs](https://docs.getdbt.com/docs/package-management) for more information on installing packages.
@@ -133,11 +143,10 @@

```yml
vars:
hubspot_owner_enabled: false

# Service
hubspot_service_enabled: true # Enables all service models
hubspot_ticket_deal_enabled: true
hubspot_service_enabled: true # Enables all service/ticket models. Default = false
hubspot_ticket_deal_enabled: true # Default = false
```
## (Optional) Step 5: Additional configurations
<details><summary>Expand for configurations</summary>

### Configure email metrics
This package allows you to specify which email metrics (total count and total unique count) you would like calculated for specified fields within the `hubspot__email_campaigns` model. By default, the `email_metrics` variable below includes all of the shown fields. If you would like to remove any field metrics from the final model, copy the below snippet into your root `dbt_project.yml` and remove any fields you want ignored in the final model.
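
A purely illustrative sketch of the variable's shape (these field names are assumed; the authoritative default list lives in the collapsed portion of the README below):

```yml
vars:
  email_metrics: ['opens', 'clicks', 'bounces'] # assumed example fields; the real default list is longer
```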
@@ -210,6 +219,35 @@

```yml
vars:
  hubspot_using_all_email_events: false # True by default
```

### Daily ticket history
The `hubspot__daily_ticket_history` model is disabled by default, but will materialize if `hubspot_service_enabled` is set to `true`. See additional configurations for this model below.

> **Note**: `hubspot__daily_ticket_history` and its parent intermediate models are incremental. After making any of the below configurations, you will need to run a full refresh.
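
For example (a sketch; the `+` selector includes the model's upstream parents):

```sh
dbt run --full-refresh --select +hubspot__daily_ticket_history
```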

#### **Tracking ticket properties**
By default, `hubspot__daily_ticket_history` will track each ticket's state, pipeline, and pipeline stage and pivot these properties into columns. However, any property from the source `TICKET_PROPERTY_HISTORY` table can be tracked and pivoted out into columns. To add other properties to this end model, add the following configuration to your `dbt_project.yml` file:

```yml
vars:
hubspot__ticket_property_history_columns:
- the
- list
- of
- property
- names
```
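
For example, to additionally track ticket priority and owner (these property names are assumed for illustration; use column names that exist in your `TICKET_PROPERTY_HISTORY` table):

```yml
vars:
  hubspot__ticket_property_history_columns:
    - hs_ticket_priority # assumed example property
    - hubspot_owner_id # assumed example property
```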

#### **Extending ticket history past closing date**
This package will create a row in `hubspot__daily_ticket_history` for each day that a ticket is open, starting at its creation date. A Hubspot ticket can be altered after being closed, so its properties can change after this date.

By default, the package will track a ticket up to its closing date (or the current date, if still open). To capture post-closure changes, you may want to extend a ticket's history past the close date. To do so, add the following configuration to your root `dbt_project.yml` file:

```yml
vars:
hubspot:
ticket_history_extension_days: integer_number_of_days # default = 0
```
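
For example, to keep building daily history rows for 30 days after a ticket closes (the value is assumed for illustration):

```yml
vars:
  hubspot:
    ticket_history_extension_days: 30
```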

### Changing the Build Schema
By default, this package builds the HubSpot staging models within a schema titled (`<target_schema>` + `_stg_hubspot`) and the HubSpot final models within a schema titled (`<target_schema>` + `_hubspot`) in your target database. If this is not where you would like your modeled HubSpot data written, add the following configuration to your root `dbt_project.yml` file:
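
A sketch of the typical override used across Fivetran packages (the exact snippet sits in the collapsed region below; this assumes the standard `+schema` config pattern, and the schema names are placeholders):

```yml
models:
  hubspot:
    +schema: my_new_schema_name
    staging:
      +schema: my_new_staging_schema_name
```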

@@ -230,7 +268,6 @@
If an individual source table has a different name than the package expects, add the following configuration to your root `dbt_project.yml` file:

```yml
vars:
hubspot_<default_source_table_name>_identifier: your_table_name
```
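
For example, if your ticket source table arrives under a custom name (the table name here is assumed for illustration):

```yml
vars:
  hubspot_ticket_identifier: my_custom_ticket_table
```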
</details>

## (Optional) Step 6: Orchestrate your models with Fivetran Transformations for dbt Core™
<details><summary>Expand for details</summary>
10 changes: 10 additions & 0 deletions dbt_project.yml
@@ -42,6 +42,16 @@ vars:
engagement_meeting: "{{ ref('stg_hubspot__engagement_meeting') }}"
engagement_note: "{{ ref('stg_hubspot__engagement_note') }}"
engagement_task: "{{ ref('stg_hubspot__engagement_task') }}"

ticket_company: "{{ ref('stg_hubspot__ticket_company') }}"
ticket_contact: "{{ ref('stg_hubspot__ticket_contact') }}"
ticket_deal: "{{ ref('stg_hubspot__ticket_deal') }}"
ticket_engagement: "{{ ref('stg_hubspot__ticket_engagement') }}"
ticket_pipeline: "{{ ref('stg_hubspot__ticket_pipeline') }}"
ticket_pipeline_stage: "{{ ref('stg_hubspot__ticket_pipeline_stage') }}"
ticket_property_history: "{{ ref('stg_hubspot__ticket_property_history') }}"
ticket: "{{ ref('stg_hubspot__ticket') }}"

hubspot_contact_merge_audit_enabled: false
models:
hubspot:
3 changes: 3 additions & 0 deletions integration_tests/dbt_project.yml
@@ -158,6 +158,9 @@ seeds:
created: timestamp
obsoleted_by_created: timestamp
sent_by_created: timestamp
ticket_property_history_data:
+column_types:
timestamp_instant: timestamp

dispatch:
- macro_namespace: dbt_utils
3 changes: 2 additions & 1 deletion integration_tests/requirements.txt
@@ -4,4 +4,5 @@ dbt-redshift>=1.3.0,<2.0.0
dbt-postgres>=1.3.0,<2.0.0
dbt-spark>=1.3.0,<2.0.0
dbt-spark[PyHive]>=1.3.0,<2.0.0
dbt-databricks>=1.3.0,<2.0.0
dbt-databricks>=1.3.0,<2.0.0
oscrypto @ git+https://github.com/wbond/oscrypto.git@d5f3437
23 changes: 11 additions & 12 deletions integration_tests/seeds/ticket_property_history_data.csv
@@ -1,12 +1,11 @@
_fivetran_synced,ticket_id,name,source,source_id,timestamp_instant,value
2020-07-09 11:06:21.056,1,seed name, source name, sg5923,2020-07-09 11:06:21.056,value
2020-07-09 11:06:21.056,1,seed data is cool, source name1, sg5923,2020-07-09 11:06:21.056,value
2020-07-09 11:06:21.056,1,i can make these names whatever i want, source name2, sg5923,2020-07-09 11:06:21.056,value
2020-07-09 11:06:21.056,1,seed name1, source name3, sg5923,2020-07-09 11:06:21.056,value
2020-07-09 11:06:21.056,1,seed name2, source name4, sg5923,2020-07-09 11:06:21.056,value
2020-07-09 11:06:21.056,1,seed name3, source name5, sg5923,2020-07-09 11:06:21.056,value
2020-07-09 11:06:21.056,1,stop being lazy with names, source name99, sg5923,2020-07-09 11:06:21.056,value
2020-07-09 11:06:21.056,1,seed name4, source name6?, sg5923,2020-07-09 11:06:21.056,value
2020-07-09 11:06:21.056,1,seed name5, source name7, sg5923,2020-07-09 11:06:21.056,value
2020-07-09 11:06:21.056,1,seed name6, source name8, sg5923,2020-07-09 11:06:21.056,value
2020-07-09 11:06:21.056,1,i got lazy with names, source name99, sg5923,2020-07-09 11:06:21.056,value
_fivetran_start,name,ticket_id,_fivetran_active,_fivetran_end,_fivetran_synced,source,source_id,timestamp_instant,value
2023-10-04 12:39:39.423000,hs_object_id,123456,true,9999-12-31 23:59:59.999000,2023-10-26 13:45:13.164000,CONVERSATIONS,,2023-10-04 12:39:39.423000,123456
2023-10-04 13:13:34.875000,hs_lastmodifieddate,123456,true,9999-12-31 23:59:59.999000,2023-10-26 13:45:13.164000,CALCULATED,,2023-10-04 13:13:34.875000,1696425214875
2023-10-04 13:13:20.101000,hs_lastmodifieddate,123456,false,2023-10-04 13:13:34.874000,2023-10-26 13:45:13.164000,CALCULATED,,2023-10-04 13:13:20.101000,1696425200101
2023-10-04 12:43:14.219000,hs_lastmodifieddate,123456,false,2023-10-04 13:13:20.100000,2023-10-26 13:45:13.164000,CALCULATED,,2023-10-04 12:43:14.219000,1696423394219
2023-10-04 12:41:57.805000,hs_lastmodifieddate,123456,false,2023-10-04 12:43:14.218000,2023-10-26 13:45:13.164000,CALCULATED,,2023-10-04 12:41:57.805000,1696423317805
2023-10-04 12:39:42.413000,hs_lastmodifieddate,123456,false,2023-10-04 12:41:57.804000,2023-10-26 13:45:13.164000,CALCULATED,,2023-10-04 12:39:42.413000,1696423182413
2023-10-04 12:39:41.528000,hs_lastmodifieddate,123456,false,2023-10-04 12:39:42.412000,2023-10-26 13:45:13.164000,CONVERSATIONS,,2023-10-04 12:39:41.528000,1696423181528
2023-10-04 12:39:39.423000,hs_lastmodifieddate,123456,false,2023-10-04 12:39:41.527000,2023-10-26 13:45:13.164000,CONVERSATIONS,,2023-10-04 12:39:39.423000,1696423179423
2023-10-04 12:39:39.423000,hs_object_source,123456,true,9999-12-31 23:59:59.999000,2023-10-26 13:45:13.164000,CONVERSATIONS,,2023-10-04 12:39:39.423000,CONVERSATIONS
2023-10-04 12:39:41.528000,hs_thread_ids_to_restore,123456,true,9999-12-31 23:59:59.999000,2023-10-26 13:45:13.164000,CONVERSATIONS,,2023-10-04 12:39:41.528000,5523941243
144 changes: 144 additions & 0 deletions models/service/hubspot__daily_ticket_history.sql
@@ -0,0 +1,144 @@
{{
config(
enabled=var('hubspot_service_enabled', False),
materialized='incremental',
partition_by = {'field': 'date_day', 'data_type': 'date'}
if target.type not in ['spark', 'databricks'] else ['date_day'],
unique_key='ticket_day_id',
incremental_strategy = 'insert_overwrite' if target.type not in ('snowflake', 'postgres', 'redshift') else 'delete+insert',
file_format = 'parquet'
)
}}

{%- set change_data_columns = adapter.get_columns_in_relation(ref('int_hubspot__scd_daily_ticket_history')) -%}

with change_data as (

select *
from {{ ref('int_hubspot__scd_daily_ticket_history') }}

{% if is_incremental() %}
where date_day >= (select max(date_day) from {{ this }})

-- If no ticket properties have been updated since the last incremental run, the change_data CTE above will return no rows.
-- When this is the case, we need to grab the most recent day's records from the previously built table so that we can persist
-- those values into the future.

), most_recent_data as (

select
*
from {{ this }}
where date_day = (select max(date_day) from {{ this }} )
{% endif %}

), calendar as (

select *
from {{ ref('int_hubspot__ticket_calendar_spine') }}

{% if is_incremental() %}
where date_day >= (select max(date_day) from {{ this }})
{% endif %}

), pipeline as (

select *
from {{ var('ticket_pipeline')}}

), pipeline_stage as (

select *
from {{ var('ticket_pipeline_stage')}}

), joined as (

select
calendar.date_day,
calendar.ticket_id
{% if is_incremental() %}
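    -- prefer a property's value from today's change data; otherwise carry forward the most recent known value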
{% for col in change_data_columns if col.name|lower not in ['ticket_id','date_day','id'] %}
, coalesce(change_data.{{ col.name }}, most_recent_data.{{ col.name }}) as {{ col.name }}
{% endfor %}

{% else %}
{% for col in change_data_columns if col.name|lower not in ['ticket_id','date_day','id'] %}
, {{ col.name }}
{% endfor %}
{% endif %}

from calendar
left join change_data
on calendar.ticket_id = change_data.ticket_id
and calendar.date_day = change_data.date_day

{% if is_incremental() %}
left join most_recent_data
on calendar.ticket_id = most_recent_data.ticket_id
and calendar.date_day = most_recent_data.date_day
{% endif %}

), set_values as (

select
date_day,
ticket_id

{% for col in change_data_columns if col.name|lower not in ['ticket_id','date_day','id'] %}
, {{ col.name }}
-- create a batch/partition once a new value is provided
, sum(case when joined.{{ col.name }} is null then 0 else 1 end) over (
partition by ticket_id
order by date_day rows unbounded preceding) as {{ col.name }}_partition
{% endfor %}

from joined

), fill_values as (

select
date_day,
ticket_id

{% for col in change_data_columns if col.name|lower not in ['ticket_id','date_day','id'] %}
-- grab the value that started this batch/partition
, first_value( {{ col.name }} ) over (
partition by ticket_id, {{ col.name }}_partition
order by date_day asc rows between unbounded preceding and current row) as {{ col.name }}
{% endfor %}

from set_values

), fix_null_values as (

select
date_day,
ticket_id,
pipeline_stage.ticket_state,
pipeline.pipeline_label as hs_pipeline_label,
pipeline_stage.pipeline_stage_label as hs_pipeline_stage_label

{% for col in change_data_columns if col.name|lower not in ['ticket_id','date_day','id'] %}
-- we de-nulled the true null values earlier in order to differentiate them from nulls that just needed to be backfilled
, case when cast( {{ col.name }} as {{ dbt.type_string() }} ) = 'is_null' then null else {{ col.name }} end as {{ col.name }}
{% endfor %}

from fill_values

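-- decode pipeline and pipeline-stage ids into their human-readable labels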
left join pipeline
on cast(fill_values.hs_pipeline as {{ dbt.type_int() }}) = pipeline.ticket_pipeline_id
left join pipeline_stage
on cast(fill_values.hs_pipeline_stage as {{ dbt.type_int() }}) = pipeline_stage.ticket_pipeline_stage_id
and pipeline.ticket_pipeline_id = pipeline_stage.ticket_pipeline_id

), surrogate as (

select
{{ dbt_utils.generate_surrogate_key(['date_day','ticket_id']) }} as ticket_day_id,
*

from fix_null_values
)

select *
from surrogate