Employ base tables to resolve unioning-null issue #25

Merged
merged 17 commits into from
Mar 3, 2025
29 changes: 29 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,3 +1,32 @@
# dbt_unified_rag v0.1.0-a7
This release introduces the following updates that **require a full refresh**.

## Bug Fixes
- Fixed an issue in which [unioned](https://github.com/fivetran/dbt_unified_rag?tab=readme-ov-file#union-multiple-connections) source connections were producing null models. ([#25](https://github.com/fivetran/dbt_unified_rag/pull/25))
  - The solution required the addition of a base staging model layer. For each staging model, there is a `*_base` counterpart in which we run our `union_data` macro. This layer is necessary for our unioning and column-filling macros to work together, which ensures the models do not fail if you are missing an expected column.
  - For each connector type, this adds:
    - **10 more models if HubSpot is enabled**
    - **5 more models if Jira is enabled**
    - **3 more models if Zendesk is enabled**
- Updated `stg_rag_hubspot__owner` to correctly find columns from the owner source. Previously, this erroneously looked at the columns from the HubSpot `contact` table. ([#23](https://github.com/fivetran/dbt_unified_rag/pull/23))
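
As a rough illustration of the column-filling idea described above, the following Python sketch builds a select list in which any expected column missing from the base relation is replaced by a typed null. It is not the `fivetran_utils.fill_staging_columns` implementation; the function name `fill_columns`, the column names, and the shape of the generated SQL are all assumptions made for illustration.

```python
def fill_columns(source_columns, staging_columns):
    """Return select-list entries, substituting typed nulls for missing columns.

    source_columns: names actually present in the (hypothetical) base relation.
    staging_columns: (name, datatype) pairs the staging model expects.
    """
    present = {name.lower() for name in source_columns}
    select_list = []
    for name, datatype in staging_columns:
        if name.lower() in present:
            select_list.append(name)
        else:
            # Missing upstream: emit a typed null so downstream SQL still compiles.
            select_list.append(f"cast(null as {datatype}) as {name}")
    return select_list


# Hypothetical example: the base relation lacks `owner_email`.
expected = [("owner_id", "int"), ("owner_email", "string")]
print(fill_columns(["OWNER_ID", "_fivetran_synced"], expected))
```

The point of the sketch is that the lookup only helps if it inspects the relation the staging model actually selects from, which is presumably why the staging models in this PR read their columns from the `*_base` models.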

## Feature Updates
- Adjusted joins so that records without any comments persist in each document model ([#25](https://github.com/fivetran/dbt_unified_rag/pull/25)). This may increase the volume of data in each model:
  - `rag_hubspot__document`: HubSpot deals without comments are now included.
  - `rag_jira__document`: Jira issues without comments are now included.
  - `rag_zendesk__document`: Zendesk tickets without comments are now included.
  - `rag__unified_document`: Includes all of the above.
- For each record without any comments, the `most_recent_chunk_update` and `update_date` fields will reflect the deal/issue/ticket creation date. The `chunk_index` and `chunk_tokens_approximate` fields will be `0`. ([#25](https://github.com/fivetran/dbt_unified_rag/pull/25))
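
The behavior described above can be sketched with a left join and `coalesce` defaults. The example below runs against an in-memory SQLite database from Python; the table layout, sample rows, and join are hypothetical stand-ins and are not taken from the package's document models.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, simplified tables for illustration only.
    create table ticket (ticket_id integer, created_on text);
    create table comment_chunk (
        ticket_id integer,
        chunk_index integer,
        chunk_tokens_approximate integer,
        most_recent_chunk_update text
    );
    insert into ticket values (1, '2025-01-01'), (2, '2025-02-01');
    insert into comment_chunk values (1, 2, 120, '2025-01-05');
""")

# A left join keeps ticket 2 even though it has no comment rows;
# coalesce supplies 0 for the chunk fields and the creation date for the update field.
rows = conn.execute("""
    select
        t.ticket_id,
        coalesce(c.chunk_index, 0) as chunk_index,
        coalesce(c.chunk_tokens_approximate, 0) as chunk_tokens_approximate,
        coalesce(c.most_recent_chunk_update, t.created_on) as most_recent_chunk_update
    from ticket as t
    left join comment_chunk as c on c.ticket_id = t.ticket_id
    order by t.ticket_id
""").fetchall()

for row in rows:
    print(row)
```

With an inner join, ticket 2 would drop out of the result entirely, which matches the pre-release behavior this changelog entry describes.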

## Under the Hood
- Added the `created_on` field to the following intermediate models to support the above inclusion of comment-less document records. ([#25](https://github.com/fivetran/dbt_unified_rag/pull/25))
  - `int_rag_hubspot__deal_document`
  - `int_rag_jira__issue_document`
  - `int_rag_zendesk__ticket_document`

## Contributors
- [@levonkorganyan](https://github.com/JustMaris) ([#23](https://github.com/fivetran/dbt_unified_rag/pull/23))

Contributor

@fivetran-jamie Since Levon contributed this PR to this release, we should probably add him as a contributor?

Contributor Author

added!

# dbt_unified_rag v0.1.0-a6

## Bug Fixes (requires `--full-refresh`)
Expand Down
2 changes: 1 addition & 1 deletion LICENSE
Original file line number Diff line number Diff line change
Expand Up @@ -186,7 +186,7 @@
same "printed page" as the copyright notice for easier
identification within third-party archives.

Copyright [yyyy] [name of copyright owner]
Copyright © 2025 Fivetran Inc.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
Expand Down
8 changes: 4 additions & 4 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -33,9 +33,9 @@ Each Quickstart transformation job run materializes the following model counts f

| **Connector** | **Model Count** |
| ------------- | --------------- |
| HubSpot | 11 |
| Jira | 6 |
| Zendesk | 4 |
| HubSpot | 21 |
| Jira | 11 |
| Zendesk | 7 |
| (Combined) | 1 |

<!--section-end-->
Expand All @@ -58,7 +58,7 @@ Include the following package_display_name package version in your `packages.yml
```yml
packages:
- package: fivetran/unified_rag
version: 0.1.0-a6
version: 0.1.0-a7
```

### Step 3: Define database and schema variables
Expand Down
2 changes: 1 addition & 1 deletion docs/catalog.json

Large diffs are not rendered by default.

274 changes: 211 additions & 63 deletions docs/index.html

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion docs/manifest.json

Large diffs are not rendered by default.

10 changes: 5 additions & 5 deletions integration_tests/ci/sample.profiles.yml
Original file line number Diff line number Diff line change
Expand Up @@ -16,13 +16,13 @@ integration_tests:
pass: "{{ env_var('CI_REDSHIFT_DBT_PASS') }}"
dbname: "{{ env_var('CI_REDSHIFT_DBT_DBNAME') }}"
port: 5439
schema: rag_integration_tests_5
schema: rag_integration_tests_04
threads: 8
bigquery:
type: bigquery
method: service-account-json
project: 'dbt-package-testing'
schema: rag_integration_tests_5
schema: rag_integration_tests_04
threads: 8
keyfile_json: "{{ env_var('GCLOUD_SERVICE_KEY') | as_native }}"
snowflake:
Expand All @@ -33,7 +33,7 @@ integration_tests:
role: "{{ env_var('CI_SNOWFLAKE_DBT_ROLE') }}"
database: "{{ env_var('CI_SNOWFLAKE_DBT_DATABASE') }}"
warehouse: "{{ env_var('CI_SNOWFLAKE_DBT_WAREHOUSE') }}"
schema: rag_integration_tests_5
schema: rag_integration_tests_04
threads: 8
postgres:
type: postgres
Expand All @@ -42,13 +42,13 @@ integration_tests:
pass: "{{ env_var('CI_POSTGRES_DBT_PASS') }}"
dbname: "{{ env_var('CI_POSTGRES_DBT_DBNAME') }}"
port: 5432
schema: rag_integration_tests_5
schema: rag_integration_tests_04
threads: 8
databricks:
catalog: "{{ env_var('CI_DATABRICKS_DBT_CATALOG') }}"
host: "{{ env_var('CI_DATABRICKS_DBT_HOST') }}"
http_path: "{{ env_var('CI_DATABRICKS_DBT_HTTP_PATH') }}"
schema: rag_integration_tests_5
schema: rag_integration_tests_04
threads: 2
token: "{{ env_var('CI_DATABRICKS_DBT_TOKEN') }}"
type: databricks
4 changes: 2 additions & 2 deletions integration_tests/dbt_project.yml
Original file line number Diff line number Diff line change
Expand Up @@ -9,8 +9,8 @@ models:
+schema: "unified_rag_{{ var('directed_schema','dev') }}"

vars:
consistency_test_exclude_fields: ['title'] # for now
consistency_test_exclude_fields: []

rag_hubspot_schema: "rag_integration_tests_04"
rag_zendesk_schema: "rag_integration_tests_04"
rag_jira_schema: "rag_integration_tests_04"
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -96,6 +96,7 @@ engagement_markdown as (
title,
source_relation,
url_reference,
created_on,
cast( {{ dbt.concat([
"'Deal Name : '", "title", "'\\n\\n'",
"'Created By : '", "contact_name", "' ('", "created_by", "')\\n'",
Expand Down
1 change: 1 addition & 0 deletions models/intermediate/jira/int_rag_jira__issue_document.sql
Original file line number Diff line number Diff line change
Expand Up @@ -69,6 +69,7 @@ final as (
title,
source_relation,
url_reference,
created_on,
{{ dbt.concat([
"'# issue : '", "title", "'\\n\\n'",
"'Created By : '", "user_name", "' ('", "created_by", "')\\n'",
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -32,6 +32,7 @@ with tickets as (
title,
source_relation,
url_reference,
created_on,
{{ dbt.concat([
"'# Ticket : '", "title", "'\\n\\n'",
"'Created By : '", "user_name", "' ('", "created_by", "')\\n'",
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
{{ config(enabled=var('rag__using_hubspot', True)) }}

{{
fivetran_utils.union_data(
table_identifier='company',
database_variable='rag_hubspot_database',
schema_variable='rag_hubspot_schema',
default_database=target.database,
default_schema='rag_hubspot',
default_variable='hubspot_company',
union_schema_variable='rag_hubspot_union_schemas',
union_database_variable='rag_hubspot_union_databases'
)
}}
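
To give a feel for what a `*_base` model like the one above conceptually resolves to when several schemas are unioned, here is a small Python sketch that builds a naive `union all` statement. This is only an approximation under stated assumptions: the helper `build_union_sql`, the `source_schema` label, and the schema names are hypothetical, and the real `union_data` macro also has to cope with tables whose columns differ between schemas.

```python
def build_union_sql(table_identifier, schemas):
    """Build a naive `union all` over the same table in several schemas.

    Each branch is tagged with the schema it came from so rows stay
    traceable after stacking. Unlike a real unioning macro, this sketch
    assumes every schema's table has identical columns.
    """
    branches = [
        f"select '{schema}' as source_schema, * from {schema}.{table_identifier}"
        for schema in schemas
    ]
    return "\nunion all\n".join(branches)


# Hypothetical schema names, as might be listed in `rag_hubspot_union_schemas`.
print(build_union_sql("company", ["hubspot_us", "hubspot_eu"]))
```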
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
{{ config(enabled=var('rag__using_hubspot', True)) }}

{{
fivetran_utils.union_data(
table_identifier='contact',
database_variable='rag_hubspot_database',
schema_variable='rag_hubspot_schema',
default_database=target.database,
default_schema='rag_hubspot',
default_variable='hubspot_contact',
union_schema_variable='rag_hubspot_union_schemas',
union_database_variable='rag_hubspot_union_databases'
)
}}
14 changes: 14 additions & 0 deletions models/staging/hubspot_staging/base/stg_rag_hubspot__deal_base.sql
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
{{ config(enabled=var('rag__using_hubspot', True)) }}

{{
fivetran_utils.union_data(
table_identifier='deal',
database_variable='rag_hubspot_database',
schema_variable='rag_hubspot_schema',
default_database=target.database,
default_schema='rag_hubspot',
default_variable='hubspot_deal',
union_schema_variable='rag_hubspot_union_schemas',
union_database_variable='rag_hubspot_union_databases'
)
}}
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
{{ config(enabled=var('rag__using_hubspot', True)) }}

{{
fivetran_utils.union_data(
table_identifier='engagement',
database_variable='rag_hubspot_database',
schema_variable='rag_hubspot_schema',
default_database=target.database,
default_schema='rag_hubspot',
default_variable='hubspot_engagement',
union_schema_variable='rag_hubspot_union_schemas',
union_database_variable='rag_hubspot_union_databases'
)
}}
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
{{ config(enabled=var('rag__using_hubspot', True)) }}

{{
fivetran_utils.union_data(
table_identifier='engagement_company',
database_variable='rag_hubspot_database',
schema_variable='rag_hubspot_schema',
default_database=target.database,
default_schema='rag_hubspot',
default_variable='hubspot_engagement_company',
union_schema_variable='rag_hubspot_union_schemas',
union_database_variable='rag_hubspot_union_databases'
)
}}
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
{{ config(enabled=var('rag__using_hubspot', True)) }}

{{
fivetran_utils.union_data(
table_identifier='engagement_contact',
database_variable='rag_hubspot_database',
schema_variable='rag_hubspot_schema',
default_database=target.database,
default_schema='rag_hubspot',
default_variable='hubspot_engagement_contact',
union_schema_variable='rag_hubspot_union_schemas',
union_database_variable='rag_hubspot_union_databases'
)
}}
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
{{ config(enabled=var('rag__using_hubspot', True)) }}

{{
fivetran_utils.union_data(
table_identifier='engagement_deal',
database_variable='rag_hubspot_database',
schema_variable='rag_hubspot_schema',
default_database=target.database,
default_schema='rag_hubspot',
default_variable='hubspot_engagement_deal',
union_schema_variable='rag_hubspot_union_schemas',
union_database_variable='rag_hubspot_union_databases'
)
}}
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
{{ config(enabled=var('rag__using_hubspot', True)) }}

{{
fivetran_utils.union_data(
table_identifier='engagement_email',
database_variable='rag_hubspot_database',
schema_variable='rag_hubspot_schema',
default_database=target.database,
default_schema='rag_hubspot',
default_variable='hubspot_engagement_email',
union_schema_variable='rag_hubspot_union_schemas',
union_database_variable='rag_hubspot_union_databases'
)
}}
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
{{ config(enabled=var('rag__using_hubspot', True)) }}

{{
fivetran_utils.union_data(
table_identifier='engagement_note',
database_variable='rag_hubspot_database',
schema_variable='rag_hubspot_schema',
default_database=target.database,
default_schema='rag_hubspot',
default_variable='hubspot_engagement_note',
union_schema_variable='rag_hubspot_union_schemas',
union_database_variable='rag_hubspot_union_databases'
)
}}
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
{{ config(enabled=var('rag__using_hubspot', True)) }}

{{
fivetran_utils.union_data(
table_identifier='owner',
database_variable='rag_hubspot_database',
schema_variable='rag_hubspot_schema',
default_database=target.database,
default_schema='rag_hubspot',
default_variable='hubspot_owner',
union_schema_variable='rag_hubspot_union_schemas',
union_database_variable='rag_hubspot_union_databases'
)
}}
16 changes: 3 additions & 13 deletions models/staging/hubspot_staging/stg_rag_hubspot__company.sql
Original file line number Diff line number Diff line change
Expand Up @@ -2,26 +2,16 @@

with base as (

{{
fivetran_utils.union_data(
table_identifier='company',
database_variable='rag_hubspot_database',
schema_variable='rag_hubspot_schema',
default_database=target.database,
default_schema='rag_hubspot',
default_variable='hubspot_company',
union_schema_variable='rag_hubspot_union_schemas',
union_database_variable='rag_hubspot_union_databases'
)
}}
select *
from {{ ref('stg_rag_hubspot__company_base') }}
),

fields as (

select
{{
fivetran_utils.fill_staging_columns(
source_columns=adapter.get_columns_in_relation(source('rag_hubspot','company')),
source_columns=adapter.get_columns_in_relation(ref('stg_rag_hubspot__company_base')),
staging_columns=get_hubspot_company_columns()
)
}}
Expand Down
18 changes: 4 additions & 14 deletions models/staging/hubspot_staging/stg_rag_hubspot__contact.sql
Original file line number Diff line number Diff line change
@@ -1,27 +1,17 @@
{{ config(enabled=var('rag__using_hubspot', True)) }}

with base as (

{{
fivetran_utils.union_data(
table_identifier='contact',
database_variable='rag_hubspot_database',
schema_variable='rag_hubspot_schema',
default_database=target.database,
default_schema='rag_hubspot',
default_variable='hubspot_contact',
union_schema_variable='rag_hubspot_union_schemas',
union_database_variable='rag_hubspot_union_databases'
)
}}

select *
from {{ ref('stg_rag_hubspot__contact_base') }}
),

fields as (

select
{{
fivetran_utils.fill_staging_columns(
source_columns=adapter.get_columns_in_relation(source('rag_hubspot','contact')),
source_columns=adapter.get_columns_in_relation(ref('stg_rag_hubspot__contact_base')),
staging_columns=get_hubspot_contact_columns()
)
}}
Expand Down
16 changes: 3 additions & 13 deletions models/staging/hubspot_staging/stg_rag_hubspot__deal.sql
Original file line number Diff line number Diff line change
Expand Up @@ -2,26 +2,16 @@

with base as (

{{
fivetran_utils.union_data(
table_identifier='deal',
database_variable='rag_hubspot_database',
schema_variable='rag_hubspot_schema',
default_database=target.database,
default_schema='rag_hubspot',
default_variable='hubspot_deal',
union_schema_variable='rag_hubspot_union_schemas',
union_database_variable='rag_hubspot_union_databases'
)
}}
select *
from {{ ref('stg_rag_hubspot__deal_base') }}
),

fields as (

select
{{
fivetran_utils.fill_staging_columns(
source_columns=adapter.get_columns_in_relation(source('rag_hubspot','deal')),
source_columns=adapter.get_columns_in_relation(ref('stg_rag_hubspot__deal_base')),
staging_columns=get_hubspot_deal_columns()
)
}}
Expand Down