Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] Source not working as expected when using union feature and default single schema not present #22

Closed
1 of 4 tasks
fivetran-joemarkiewicz opened this issue Feb 6, 2025 · 0 comments · Fixed by #25
Assignees

Comments

@fivetran-joemarkiewicz
Copy link
Collaborator

Is there an existing issue for this?

  • I have searched the existing issues

Describe the issue

This Issue

When testing the latest version of the package we uncovered an issue where the union data feature does not work as expected if the rag_*_schema variable is not also defined and the default schema set within the source file is not the name of one of the schemas within the rag_*_union_schemas variable. The end result is not a failure but simply an empty end model and a staging model with rows, but they're all null records. Also, there is no "Table is missing so we're creating an empty model" error message which causes extra confusion.

Recreating the Issue

You can recreate this error by following the below steps

  • Ensure you do not have a hubspot schema (default schema defined by rag_hubspot_schema) in your destination.
  • Do not set a variable for rag_hubspot_schema
  • Instead define the rag_hubspot_union_schemas variable as your single (or multiple) hubspot schemas. Such as the example below
vars:
  rag_hubspot_union_schemas: ['hubspot_testing'] ## In this scenario the `hubspot` schema (default) also doesn't exist. 
  rag__using_zendesk: false
  rag__using_jira: false
  rag__using_hubspot: true
  • Run the models and see there are no warnings indicating the schemas weren't found or any other errors.
  • Review the staging and end models and see that there are either no records or records populated with all null values except the source_relation field.

I realized when digging into this that the union data macro is working as expected, which is why we don't see the CLI warning that an empty table will be populated. Instead the error is presented in the fields cte, specifically the source_columns parameter. The reason this is an issue is because the source defined here doesn't actually exist. Therefore, the result of that parameter is simply no fields ([]). As a result, when the match between the staging_columns and the source_columns is initiated, there are essentially no matches. This is why we see records in the staging model, but they're all null.

Based off this, my theory is that if we can ensure the source properly points to one of the schemas in the union variable then this should work as expected.

Possible Solution

If we can ensure the source_columns reference returns rows as expected then this will address the issue. As such, we can explore a few avenues to address this issue:

  • Ensure the default schema is at least one of the union schemas if defined.
    • I'm unsure of this approach as it's a bit of a workaround, but it will still allow proper execution.
  • Support the latest version of the union data macro (such as what we have implemented in Zendesk). I'm unsure if this will solve this issue directly, but worth the exploration.
  • Remove the fields cte and the fill_staging_columns macro.
    • This could result in more missing field errors, but still a possible solution. Albeit with downsides that must be weighed and discussed.
  • Any other ideas to ensure this works as expected. Obviously this will be addressed if the customer defines the individual sources for each union data schema. However, that's not realistic to expect a customer to do. Also, we should ensure this works regardless of that effort.

Relevant error log or model output

There actually is no error which is concerning. Instead the models succeed without issue, but the end models have no data within them. This is due to the fields cte in the staging models simply populating null values.

Expected behavior

The union schemas feature works as expected without also needing the rag_*_schema variable defined to create a placeholder source.

Possible solution

See above notes, we should explore updating the union feature model to the one most recently used in other packages.

dbt Project configurations

The below results in successful model runs, but completely empty end models.

vars:
  rag_hubspot_union_schemas: ['hubspot_testing'] ## In this scenario the `hubspot` schema (default) also doesn't exist. 
  rag__using_zendesk: false
  rag__using_jira: false
  rag__using_hubspot: true

However, the below results in a successful model runs and the fully populated end model with the proper results.

vars:
  rag_hubspot_schema: hubspot_testing
  rag_hubspot_union_schemas: ['hubspot_testing'] ## In this scenario the `hubspot` schema (default) also doesn't exist. 
  rag__using_zendesk: false
  rag__using_jira: false
  rag__using_hubspot: true

The above prove that it's the misconfiguration of the source that's resulting in the error.

Package versions

Latest

What database are you using dbt with?

snowflake

How are you running this dbt package?

dbt Core™

dbt Version

Latest

Additional Context

No response

Are you willing to open a PR to help address this issue?

  • Yes.
  • Yes, but I will need assistance.
  • No.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants