You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
When testing the latest version of the package we uncovered an issue where the union data feature does not work as expected if the rag_*_schema variable is not also defined and the default schema set within the source file is not the name of one of the schemas within the rag_*_union_schemas variable. The end result is not a failure but simply an empty end model and a staging model with rows, but they're all null records. Also, there is no "Table is missing so we're creating an empty model" error message which causes extra confusion.
Recreating the Issue
You can recreate this error by following the below steps
Ensure you do not have a hubspot schema (default schema defined by rag_hubspot_schema) in your destination.
Do not set a variable for rag_hubspot_schema
Instead define the rag_hubspot_union_schemas variable as your single (or multiple) hubspot schemas. Such as the example below
vars:
rag_hubspot_union_schemas: ['hubspot_testing'] ## In this scenario the `hubspot` schema (default) also doesn't exist. rag__using_zendesk: falserag__using_jira: falserag__using_hubspot: true
Run the models and see there are no warnings indicating the schemas weren't found or any other errors.
Review the staging and end models and see that there are either no records or records populated with all null values except the source_relation field.
I realized when digging into this that the union data macro is working as expected, which is why we don't see the CLI warning that an empty table will be populated. Instead the error is presented in the fields cte, specifically the source_columns parameter. The reason this is an issue is because the source defined here doesn't actually exist. Therefore, the result of that parameter is simply no fields ([]). As a result, when the match between the staging_columns and the source_columns is initiated, there are essentially no matches. This is why we see records in the staging model, but they're all null.
Based off this, my theory is that if we can ensure the source properly points to one of the schemas in the union variable then this should work as expected.
Possible Solution
If we can ensure the source_columns reference returns rows as expected then this will address the issue. As such, we can explore a few avenues to address this issue:
Ensure the default schema is at least one of the union schemas if defined.
I'm unsure of this approach as it's a bit of a workaround, but it will still allow proper execution.
Support the latest version of the union data macro (such as what we have implemented in Zendesk). I'm unsure if this will solve this issue directly, but worth the exploration.
Remove the fields cte and the fill_staging_columns macro.
This could result in more missing field errors, but still a possible solution. Albeit with downsides that must be weighed and discussed.
Any other ideas to ensure this works as expected. Obviously this will be addressed if the customer defines the individual sources for each union data schema. However, that's not realistic to expect a customer to do. Also, we should ensure this works regardless of that effort.
Relevant error log or model output
There actually is no error which is concerning. Instead the models succeed without issue, but the end models have no data within them. This is due to the fields cte in the staging models simply populating null values.
Expected behavior
The union schemas feature works as expected without also needing the rag_*_schema variable defined to create a placeholder source.
Possible solution
See above notes, we should explore updating the union feature model to the one most recently used in other packages.
dbt Project configurations
The below results in successful model runs, but completely empty end models.
vars:
rag_hubspot_union_schemas: ['hubspot_testing'] ## In this scenario the `hubspot` schema (default) also doesn't exist. rag__using_zendesk: falserag__using_jira: falserag__using_hubspot: true
However, the below results in a successful model runs and the fully populated end model with the proper results.
vars:
rag_hubspot_schema: hubspot_testingrag_hubspot_union_schemas: ['hubspot_testing'] ## In this scenario the `hubspot` schema (default) also doesn't exist. rag__using_zendesk: falserag__using_jira: falserag__using_hubspot: true
The above prove that it's the misconfiguration of the source that's resulting in the error.
Package versions
Latest
What database are you using dbt with?
snowflake
How are you running this dbt package?
dbt Core™
dbt Version
Latest
Additional Context
No response
Are you willing to open a PR to help address this issue?
Yes.
Yes, but I will need assistance.
No.
The text was updated successfully, but these errors were encountered:
Is there an existing issue for this?
Describe the issue
This Issue
When testing the latest version of the package we uncovered an issue where the union data feature does not work as expected if the
rag_*_schema
variable is not also defined and the default schema set within the source file is not the name of one of the schemas within therag_*_union_schemas
variable. The end result is not a failure but simply an empty end model and a staging model with rows, but they're allnull
records. Also, there is no "Table is missing so we're creating an empty model" error message which causes extra confusion.Recreating the Issue
You can recreate this error by following the below steps
hubspot
schema (default schema defined byrag_hubspot_schema
) in your destination.rag_hubspot_schema
rag_hubspot_union_schemas
variable as your single (or multiple) hubspot schemas. Such as the example belowI realized when digging into this that the union data macro is working as expected, which is why we don't see the CLI warning that an empty table will be populated. Instead the error is presented in the fields cte, specifically the source_columns parameter. The reason this is an issue is because the source defined here doesn't actually exist. Therefore, the result of that parameter is simply no fields (
[]
). As a result, when the match between the staging_columns and the source_columns is initiated, there are essentially no matches. This is why we see records in the staging model, but they're all null.Based off this, my theory is that if we can ensure the source properly points to one of the schemas in the union variable then this should work as expected.
Possible Solution
If we can ensure the source_columns reference returns rows as expected then this will address the issue. As such, we can explore a few avenues to address this issue:
fields
cte and thefill_staging_columns
macro.Relevant error log or model output
There actually is no error which is concerning. Instead the models succeed without issue, but the end models have no data within them. This is due to the fields cte in the staging models simply populating null values.
Expected behavior
The union schemas feature works as expected without also needing the
rag_*_schema
variable defined to create a placeholder source.Possible solution
See above notes, we should explore updating the union feature model to the one most recently used in other packages.
dbt Project configurations
The below results in successful model runs, but completely empty end models.
However, the below results in a successful model runs and the fully populated end model with the proper results.
The above prove that it's the misconfiguration of the source that's resulting in the error.
Package versions
Latest
What database are you using dbt with?
snowflake
How are you running this dbt package?
dbt Core™
dbt Version
Latest
Additional Context
No response
Are you willing to open a PR to help address this issue?
The text was updated successfully, but these errors were encountered: