Rework sources to be dbt models rather than manually created #188

Merged: jaypeedevlin merged 19 commits into main from JD/rework_sources on Sep 13, 2022

@jaypeedevlin jaypeedevlin commented Aug 31, 2022

Closes #153, #190 and unblocks #175, #177

This PR replaces our manual schema generation macros with incremental models that use the on_schema_change configuration, which should make schema evolution much less painful.

Of note, it bumps require-dbt-version to >= 1.2.0.

Additionally, this removes the dependency on dbt_utils by combining the new cross-database macros in dbt Core >= 1.2.0 with a copied-in version of the surrogate_key macro.
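
For illustration, a model following the approach described above might look roughly like the sketch below. This is an assumption about the general shape of such a model, not the code merged in this PR: the column list is abbreviated, `append_new_columns` is one of dbt's documented `on_schema_change` options picked here as an example, and the `dummy_cte` / `where 1 = 0` pattern is a hypothetical way to let dbt manage the table's schema without the model inserting rows itself.

```sql
-- Hypothetical models/sources/sources.sql (abbreviated sketch)
{{
    config(
        materialized='incremental',
        on_schema_change='append_new_columns'
    )
}}

-- dbt creates the table on the first run; on later runs, columns that are
-- new in this select list are appended to the existing table.
with dummy_cte as (
    select 1 as foo
)

select
    cast(null as {{ type_string() }}) as command_invocation_id,
    cast(null as {{ type_string() }}) as node_id,
    cast(null as {{ type_timestamp() }}) as run_started_at,
    cast(null as {{ type_string() }}) as name
from dummy_cte
-- Assumption: rows are written by a separate upload step,
-- so the model itself selects nothing.
where 1 = 0
```

With a setup like this, adding a column would mean editing the select list rather than writing a per-adapter `create table` macro.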

@jaypeedevlin jaypeedevlin requested a review from NiallRees August 31, 2022 02:24
cast(null as {{ type_string() }}) as name,
cast(null as {{ type_string() }}) as identifier,
cast(null as {{ type_string() }}) as loaded_at_field,
cast(null as {{ type_json() }}) as freshness
Contributor

This was an ARRAY type before for Snowflake, which currently leads to an error when I install this branch into a project that is already using dbt_artifacts and run:
dbt run -m dbt_artifacts

The error:

10:11:48  Database Error in model sources (models/sources/sources.sql)
10:11:48    002023 (22000): SQL compilation error:
10:11:48    Expression type does not match column data type, expecting ARRAY but got OBJECT for column FRESHNESS
10:11:48    compiled SQL at target/run/dbt_artifacts/models/sources/sources.sql

Contributor Author

Hmm, this will be tricky to solve in a way that's compatible with both Snowflake and BigQuery, I think, given that the existing Snowflake macro uses ARRAY for this column and the BigQuery one uses JSON:

{% macro snowflake__get_create_sources_table_if_not_exists_statement(database_name, schema_name, table_name) -%}
create table {{database_name}}.{{schema_name}}.{{table_name}} (
command_invocation_id STRING,
node_id STRING,
run_started_at TIMESTAMP_TZ,
database STRING,
schema STRING,
source_name STRING,
loader STRING,
name STRING,
identifier STRING,
loaded_at_field STRING,
freshness ARRAY
)
{%- endmacro %}
{% macro bigquery__get_create_sources_table_if_not_exists_statement(database_name, schema_name, table_name) -%}
create table {{database_name}}.{{schema_name}}.{{table_name}} (
command_invocation_id STRING,
node_id STRING,
run_started_at TIMESTAMP,
database STRING,
schema STRING,
source_name STRING,
loader STRING,
name STRING,
identifier STRING,
loaded_at_field STRING,
freshness JSON
)
{%- endmacro %}

One option would be to create a specific type helper for just this column, but that seems a bit suboptimal IMO. Do you have any ideas @NiallRees?

Contributor Author

@jaypeedevlin jaypeedevlin Sep 6, 2022

Maybe we could offer a one-time migration macro for the source tables and cut a new major version with this new method?

Contributor

Think I'd be inclined to use a specific type helper (just doing an if snowflake else in the model vs a macro) rather than adding more complexity to the migration process.
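
As a rough sketch of what the inline "if snowflake else" suggestion above could look like in the model, assuming dbt's `target.type` variable is used to detect the adapter. This is illustrative only and not necessarily what was committed; `type_string()` and `type_json()` are the helpers already shown in the diff, and the select list is cut down to two columns.

```sql
-- Hypothetical excerpt of the sources model's select list
select
    cast(null as {{ type_string() }}) as loaded_at_field,
    {% if target.type == 'snowflake' %}
    -- Existing Snowflake tables were created with FRESHNESS as ARRAY
    cast(null as array) as freshness
    {% else %}
    -- Other adapters keep the JSON helper used in this PR
    cast(null as {{ type_json() }}) as freshness
    {% endif %}
```

Keeping the branch in the model keeps the special case next to the one column it affects, at the cost of adapter-specific logic living outside the type helper macros.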

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@NiallRees done and ready for another test.

Contributor

@NiallRees NiallRees left a comment

Tested successfully on Snowflake, BigQuery and Databricks when tables already existed.

@jaypeedevlin jaypeedevlin merged commit f9fe8ec into main Sep 13, 2022
@jaypeedevlin jaypeedevlin deleted the JD/rework_sources branch September 13, 2022 03:02
Development

Successfully merging this pull request may close these issues.

Add support for dynamic schema name generation for artifacts