
dbt Constraints / model contracts #574

Merged 43 commits on Feb 17, 2023.

Commits
186df86
Add support for constraints in Spark
b-per Dec 22, 2022
9aeb7cb
Add tests for constraints
b-per Dec 22, 2022
59ef1bd
Update requirements for CI to pass
b-per Dec 22, 2022
c22d0f4
Update dispatched macro with argument
b-per Dec 22, 2022
e3f77c1
Use spark decorator for tests
b-per Dec 23, 2022
e1a9a54
Update test to remove unsupported constraints
b-per Dec 23, 2022
bf314dd
Allow multiple queries to be sent
b-per Dec 23, 2022
bc47cb6
Merge branch 'main' of github.com:dbt-labs/dbt-spark into bper/add-su…
b-per Dec 23, 2022
8d392c2
Revert change on splitting satements in `execute`
b-per Dec 23, 2022
e633f52
Add `call statement` for table with constraints
b-per Dec 23, 2022
4c961da
Add checks when the split by `;` is empty
b-per Dec 23, 2022
89a740d
Fix typo in JInja variable name
b-per Dec 23, 2022
7b92d71
Rename `constraints` to `constraints_check`
b-per Jan 13, 2023
7c141af
Merge branch 'main' of github.com:dbt-labs/dbt-spark into bper/add-su…
b-per Jan 13, 2023
4df795d
Support constraints with `alter` statements
b-per Jan 30, 2023
d1d3940
Changie entry
b-per Jan 30, 2023
d88cee7
Fix missing `endif`
b-per Jan 30, 2023
f9bec28
Remove get_columns_spec_ddl as we use alter
b-per Jan 30, 2023
e99a27f
Remove unused dispatch macro
b-per Jan 30, 2023
0595d22
Update dispatched macro
b-per Jan 30, 2023
eccdbf3
Update tests to work with `alter` approach
b-per Jan 30, 2023
f2ac49d
Make tests valid for databricks only for delta
b-per Jan 30, 2023
586ad97
Try other way to call tests
b-per Jan 30, 2023
b9b5d34
Add schema info
b-per Jan 30, 2023
c1d3d2c
Remove wrong argument to test
b-per Jan 30, 2023
2de8296
Merge branch 'main' into bper/add-support-for-constraints
sungchun12 Jan 31, 2023
794570f
Use new testing framework
b-per Feb 1, 2023
1e48bfc
Merge branch 'bper/add-support-for-constraints' of github.com:dbt-lab…
b-per Feb 1, 2023
8650a20
Add check on column names and order
b-per Feb 1, 2023
83427e7
Check only when constraints enabled
b-per Feb 1, 2023
42791be
Remove config nesting
b-per Feb 1, 2023
3460091
constraint_check is not a list
b-per Feb 1, 2023
262fe69
Fix CICD
b-per Feb 1, 2023
c9b20e4
Typo
b-per Feb 1, 2023
32e8769
Only allow not null
b-per Feb 1, 2023
5eea874
Update expected SQL to the Spark one
b-per Feb 1, 2023
e7d2280
Make file_format delta
b-per Feb 1, 2023
ceb5b02
Try this
jtcohen6 Feb 2, 2023
deb5677
Check for earlier part of error message
jtcohen6 Feb 2, 2023
269e15e
Check for any rather than all error messages
jtcohen6 Feb 2, 2023
f8fd7dd
Reset to dbt-core main
jtcohen6 Feb 15, 2023
1a291f0
Merge branch 'main' into bper/add-support-for-constraints
MichelleArk Feb 15, 2023
9762b70
Merge branch 'main' into bper/add-support-for-constraints
jtcohen6 Feb 16, 2023
8 changes: 8 additions & 0 deletions .changes/unreleased/Features-20230130-125855.yaml
@@ -0,0 +1,8 @@
kind: Features
body: 'Support for data types constraints in Spark following the dbt Core feature
#6271'
time: 2023-01-30T12:58:55.972992+01:00
custom:
Author: b-per
Issue: "558"
PR: "574"
52 changes: 52 additions & 0 deletions dbt/include/spark/macros/adapters.sql
@@ -138,6 +138,9 @@
{% else %}
create table {{ relation }}
{% endif %}
{% if config.get('constraints_enabled', False) %}
{{ get_assert_columns_equivalent(sql) }}
{% endif %}
{{ file_format_clause() }}
{{ options_clause() }}
{{ partition_cols(label="partitioned by") }}
@@ -160,6 +163,55 @@
{%- endmacro -%}


{% macro persist_constraints(relation, model) %}
{{ return(adapter.dispatch('persist_constraints', 'dbt')(relation, model)) }}
{% endmacro %}

{% macro spark__persist_constraints(relation, model) %}
{% if config.get('constraints_enabled', False) and config.get('file_format', 'delta') == 'delta' %}
{% do alter_table_add_constraints(relation, model.columns) %}
{% do alter_column_set_constraints(relation, model.columns) %}
{% endif %}
{% endmacro %}
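
The `spark__persist_constraints` macro above gates constraint persistence on two config values: the model must opt in via `constraints_enabled`, and the file format must be `delta` (the only format here that supports `ALTER TABLE` constraints). A minimal Python sketch of that predicate — the function name and config-dict shape are illustrative, not part of the PR:

```python
def should_persist_constraints(config):
    """Mirror the Jinja gate: persist constraints only when the model
    explicitly enables them AND uses the delta file format."""
    return (
        config.get("constraints_enabled", False)
        and config.get("file_format", "delta") == "delta"
    )
```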

{% macro alter_table_add_constraints(relation, constraints) %}
{{ return(adapter.dispatch('alter_table_add_constraints', 'dbt')(relation, constraints)) }}
{% endmacro %}

{% macro spark__alter_table_add_constraints(relation, column_dict) %}

{% for column_name in column_dict %}
{% set constraints_check = column_dict[column_name]['constraints_check'] %}
{% if constraints_check and not is_incremental() %}
{%- set constraint_hash = local_md5(column_name ~ ";" ~ constraints_check) -%}
Review comment (contributor): For later: Constraint hash is a sensible default, since we need a unique identifier. We may also want to let users define their own custom name for the constraint.

{% call statement() %}
alter table {{ relation }} add constraint {{ constraint_hash }} check {{ constraints_check }};
{% endcall %}
{% endif %}
{% endfor %}
{% endmacro %}
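
`spark__alter_table_add_constraints` names each CHECK constraint with an MD5 hash of `"<column>;<check expression>"`, since `ALTER TABLE ... ADD CONSTRAINT` needs a unique identifier. A rough Python equivalent of the statements it emits — the helper name and the column-dict shape are assumptions for illustration, not dbt APIs:

```python
import hashlib

def render_check_constraints(relation, column_dict):
    """Emit one ALTER TABLE ... ADD CONSTRAINT statement per column
    that defines a `constraints_check` expression, naming the
    constraint with an MD5 of "<column>;<check>" as the macro does."""
    statements = []
    for column_name, spec in column_dict.items():
        check = spec.get("constraints_check")
        if not check:
            continue
        constraint_hash = hashlib.md5(f"{column_name};{check}".encode()).hexdigest()
        statements.append(
            f"alter table {relation} add constraint {constraint_hash} check {check};"
        )
    return statements
```

The hash makes constraint names deterministic across runs, so re-applying the same model config produces the same identifier.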

{% macro alter_column_set_constraints(relation, column_dict) %}
{{ return(adapter.dispatch('alter_column_set_constraints', 'dbt')(relation, column_dict)) }}
{% endmacro %}

{% macro spark__alter_column_set_constraints(relation, column_dict) %}
{% for column_name in column_dict %}
{% set constraints = column_dict[column_name]['constraints'] %}
{% for constraint in constraints %}
{% if constraint != 'not null' %}
{{ exceptions.warn('Invalid constraint for column ' ~ column_name ~ '. Only `not null` is supported.') }}
{% else %}
{% set quoted_name = adapter.quote(column_name) if column_dict[column_name]['quote'] else column_name %}
{% call statement() %}
alter table {{ relation }} change column {{ quoted_name }} set {{ constraint }};
{% endcall %}
{% endif %}
{% endfor %}
{% endfor %}
{% endmacro %}
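
The column-level path above supports only `not null`; any other constraint produces a warning and is skipped. A hedged Python sketch of the statements this generates — function name and dict shape are assumed, and backtick quoting stands in for `adapter.quote`:

```python
import warnings

def render_not_null_constraints(relation, column_dict):
    """Emit ALTER TABLE ... CHANGE COLUMN ... SET NOT NULL for each
    supported column-level constraint; warn and skip anything else,
    matching the macro's behavior."""
    statements = []
    for column_name, spec in column_dict.items():
        for constraint in spec.get("constraints", []):
            if constraint != "not null":
                warnings.warn(
                    f"Invalid constraint for column {column_name}. "
                    "Only `not null` is supported."
                )
                continue
            quoted = f"`{column_name}`" if spec.get("quote") else column_name
            statements.append(
                f"alter table {relation} change column {quoted} set {constraint};"
            )
    return statements
```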


{% macro spark__create_view_as(relation, sql) -%}
create or replace view {{ relation }}
{{ comment_clause() }}
2 changes: 2 additions & 0 deletions dbt/include/spark/macros/materializations/table.sql
@@ -29,6 +29,8 @@

{% do persist_docs(target_relation, model) %}

{% do persist_constraints(target_relation, model) %}

{{ run_hooks(post_hooks) }}

{{ return({'relations': [target_relation]})}}
58 changes: 58 additions & 0 deletions tests/functional/adapter/test_constraints.py
@@ -0,0 +1,58 @@
import pytest
from dbt.tests.util import relation_from_name
from dbt.tests.adapter.constraints.test_constraints import (
BaseConstraintsColumnsEqual,
BaseConstraintsRuntimeEnforcement
)

# constraints are enforced via 'alter' statements that run after table creation
_expected_sql_spark = """
create or replace table {0}
using delta
as

select
1 as id,
'blue' as color,
cast('2019-01-01' as date) as date_day
"""

@pytest.mark.skip_profile('spark_session', 'apache_spark')
class TestSparkConstraintsColumnsEqual(BaseConstraintsColumnsEqual):
pass

@pytest.mark.skip_profile('spark_session', 'apache_spark')
class TestSparkConstraintsRuntimeEnforcement(BaseConstraintsRuntimeEnforcement):
@pytest.fixture(scope="class")
def project_config_update(self):
return {
"models": {
"+file_format": "delta",
}
}

@pytest.fixture(scope="class")
def expected_sql(self, project):
relation = relation_from_name(project.adapter, "my_model")
return _expected_sql_spark.format(relation)

# On Spark/Databricks, constraints are applied *after* the table is replaced.
# We don't have any way to "rollback" the table to its previous happy state.
# So the 'color' column will be updated to 'red', instead of 'blue'.
@pytest.fixture(scope="class")
def expected_color(self):
return "red"

@pytest.fixture(scope="class")
def expected_error_messages(self):
return [
"violate the new CHECK constraint",
"DELTA_NEW_CHECK_CONSTRAINT_VIOLATION",
"violate the new NOT NULL constraint",
]

def assert_expected_error_messages(self, error_message, expected_error_messages):
# This needs to be ANY instead of ALL
# The CHECK constraint is added before the NOT NULL constraint
# and different connection types display/truncate the error message in different ways...
assert any(msg in error_message for msg in expected_error_messages)