[Bug] Source not working as expected when using union feature and default single schema not present #22

fivetran-joemarkiewicz · 2025-02-06T15:55:35Z

Is there an existing issue for this?

I have searched the existing issues

Describe the issue

This Issue

When testing the latest version of the package, we uncovered an issue where the union data feature does not work as expected if the rag_*_schema variable is not defined and the default schema set within the source file is not one of the schemas listed in the rag_*_union_schemas variable. The result is not a failure, but an empty end model and a staging model with rows that are entirely null. There is also no "Table is missing so we're creating an empty model" message, which adds to the confusion.

Recreating the Issue

You can recreate this issue with the following steps:

Ensure you do not have a hubspot schema (default schema defined by rag_hubspot_schema) in your destination.
Do not set a variable for rag_hubspot_schema
Instead, define the rag_hubspot_union_schemas variable with your single (or multiple) hubspot schemas, as in the example below:

vars:
  rag_hubspot_union_schemas: ['hubspot_testing'] ## In this scenario the `hubspot` schema (default) also doesn't exist. 
  rag__using_zendesk: false
  rag__using_jira: false
  rag__using_hubspot: true

Run the models and note that there are no warnings indicating the schemas weren't found, nor any other errors.
Review the staging and end models and note that they contain either no records or records where every field except source_relation is null.

Digging into this, I realized that the union data macro is working as expected, which is why we don't see the CLI warning that an empty table will be created. Instead, the problem surfaces in the fields cte, specifically in the source_columns parameter. This is an issue because the source referenced there doesn't actually exist (it still points to the default schema). Therefore, that parameter resolves to no fields ([]). As a result, when the staging_columns are matched against the source_columns, nothing matches and every staging column is filled with null. This is why the staging model has records, but they're all null.
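To make the matching step concrete, here is a simplified, hypothetical Python model of the fields cte. It assumes fill_staging_columns keeps a staging column when the source has a column of the same name and otherwise emits a typed null placeholder; the column names and types are made up for illustration and are not the package's actual staging columns. source_relation is added separately by the union logic, which is why it stays populated.

```python
# Simplified, hypothetical model of the staging "fields" cte.
# Assumption: a staging column is selected if the source has it,
# otherwise it is replaced with a typed null placeholder.

def fill_staging_columns(source_columns, staging_columns):
    """Return one select expression per expected staging column.

    source_columns: column names found in the source relation.
    staging_columns: (name, datatype) pairs the staging model expects.
    """
    available = {col.lower() for col in source_columns}
    expressions = []
    for name, datatype in staging_columns:
        if name.lower() in available:
            expressions.append(name)
        else:
            # No matching source column: fill the field with a typed null.
            expressions.append(f"cast(null as {datatype}) as {name}")
    return expressions


# Hypothetical staging columns, for illustration only.
staging_columns = [("contact_id", "integer"), ("email", "varchar")]

# Source points at a schema that exists: every column matches.
print(fill_staging_columns(["CONTACT_ID", "EMAIL"], staging_columns))
# ['contact_id', 'email']

# Source points at the missing default schema: source_columns is [],
# so every field becomes null.
print(fill_staging_columns([], staging_columns))
# ['cast(null as integer) as contact_id', 'cast(null as varchar) as email']
```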

Based on this, my theory is that if we can ensure the source points to one of the schemas in the union variable, this should work as expected.

Possible Solution

Ensuring that the source_columns reference returns the expected fields should address the issue. We can explore a few avenues to do this:

If the union variable is defined, ensure the default schema is one of the union schemas.
- I'm unsure of this approach as it's a bit of a workaround, but it will still allow proper execution.
Support the latest version of the union data macro (such as what we have implemented in Zendesk). I'm unsure if this will solve the issue directly, but it's worth exploring.
Remove the fields cte and the fill_staging_columns macro.
- This could result in more missing-field errors, but it is still a possible solution, albeit with downsides that must be weighed and discussed.
Any other ideas that ensure this works as expected. The issue would be avoided if the customer defined individual sources for each union data schema, but it isn't realistic to expect customers to do that, and the feature should work regardless.
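The first option could be sketched as a schema-resolution rule. This is a hypothetical Python illustration, not the package's implementation: the variable names mirror the issue, while the fallback to the first union schema is an assumption about how "ensure the default schema is one of the union schemas" might be applied.

```python
# Hypothetical sketch of option 1: pick the schema the source (and therefore
# source_columns) should point to. The fallback rule is an assumption.

DEFAULT_HUBSPOT_SCHEMA = "hubspot"


def proposed_source_schema(project_vars: dict) -> str:
    """Resolve the schema for the source under the proposed fallback."""
    # An explicitly configured schema always wins.
    if "rag_hubspot_schema" in project_vars:
        return project_vars["rag_hubspot_schema"]
    # Otherwise, if union schemas are defined, use the first one so that
    # source_columns is read from a schema that exists.
    union_schemas = project_vars.get("rag_hubspot_union_schemas") or []
    if union_schemas:
        return union_schemas[0]
    # No overrides at all: keep the package default.
    return DEFAULT_HUBSPOT_SCHEMA


print(proposed_source_schema({"rag_hubspot_union_schemas": ["hubspot_testing"]}))
# hubspot_testing
```

Choosing the first union schema is arbitrary; it only needs to be a schema that exists and shares the expected columns.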

Relevant error log or model output

There is actually no error, which is concerning. Instead, the models succeed without issue, but the end models contain no data. This is because the fields cte in the staging models populates only null values.

Expected behavior

The union schemas feature works as expected without also requiring the rag_*_schema variable to be defined as a placeholder source.

Possible solution

See the notes above; we should explore updating the union feature to the version most recently used in other packages.

dbt Project configurations

The configuration below results in successful model runs but completely empty end models:

vars:
  rag_hubspot_union_schemas: ['hubspot_testing'] ## In this scenario the `hubspot` schema (default) also doesn't exist. 
  rag__using_zendesk: false
  rag__using_jira: false
  rag__using_hubspot: true

However, the configuration below results in successful model runs and fully populated end models with the correct results:

vars:
  rag_hubspot_schema: hubspot_testing
  rag_hubspot_union_schemas: ['hubspot_testing'] ## In this scenario the `hubspot` schema (default) also doesn't exist. 
  rag__using_zendesk: false
  rag__using_jira: false
  rag__using_hubspot: true

Together, these two configurations show that the misconfigured source is causing the issue.
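The difference between the two configurations can be modeled in a few lines of Python. This hypothetical sketch assumes, as the issue describes, that the source schema is taken from rag_hubspot_schema when set and otherwise falls back to the default `hubspot`, while rag_hubspot_union_schemas is not consulted when resolving the source.

```python
# Hypothetical model of current source resolution, based on the behavior
# described in this issue (not the package's actual code).

DEFAULT_HUBSPOT_SCHEMA = "hubspot"


def resolve_source_schema(project_vars: dict) -> str:
    # Only rag_hubspot_schema overrides the default; union schemas are ignored.
    return project_vars.get("rag_hubspot_schema", DEFAULT_HUBSPOT_SCHEMA)


failing_vars = {
    "rag_hubspot_union_schemas": ["hubspot_testing"],
    "rag__using_zendesk": False,
    "rag__using_jira": False,
    "rag__using_hubspot": True,
}
working_vars = {**failing_vars, "rag_hubspot_schema": "hubspot_testing"}

# In this scenario the default `hubspot` schema does not exist.
existing_schemas = {"hubspot_testing"}

for label, project_vars in [("failing", failing_vars), ("working", working_vars)]:
    schema = resolve_source_schema(project_vars)
    print(label, schema, schema in existing_schemas)
# failing hubspot False
# working hubspot_testing True
```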

Package versions

Latest

What database are you using dbt with?

snowflake

How are you running this dbt package?

dbt Core™

dbt Version

Latest

Additional Context

No response

Are you willing to open a PR to help address this issue?

Yes.
Yes, but I will need assistance.
No.


fivetran-jamie self-assigned this Feb 19, 2025

fivetran-jamie mentioned this issue Feb 19, 2025

Employ base tables to resolve unioning-null issue #25

Merged


fivetran-jamie linked a pull request Feb 20, 2025 that will close this issue


fivetran-jamie closed this as completed in #25 Mar 3, 2025
