Support PK and FK declarations for ER diagram and modelling purposes #3295
@JC-Lightfold Thanks so much for opening this issue, and for the really thoughtful comments in the original Slack thread. You raise a really good question: Even though table constraints don't do anything in most modern data warehouses (or worse—sometimes they're used but not validated!), should dbt support table constraints as a way to backport support for older schema diagramming software? I do think the answer here could take one of two forms: native support in dbt itself, or a pattern users can implement today with macros.
I think you can have it both ways! Consider an approach where I define a macro:

```jinja
{% macro add_constraints() %}
    {% set constraint_sql_list = [] %}
    {% set column_dict = model.columns %}
    {% for column_name in column_dict %}
        {# Respect the column's `quote` property when building the DDL #}
        {% set quoted_name = adapter.quote(column_name) if column_dict[column_name]['quote'] else column_name %}
        {% set constraint = column_dict[column_name]['meta']['constraint'] %}
        {% if constraint %}
            {# e.g. my_model_primary_key_column_a #}
            {% set constraint_name = this.identifier + '_' + constraint|replace(' ','_') + '_' + column_name %}
            {% set constraint_sql %}
                alter table {{ this }} add constraint {{ constraint_name }}
                    {{ constraint }} ({{ quoted_name }})
            {% endset %}
            {% do constraint_sql_list.append(constraint_sql) %}
        {% endif %}
    {% endfor %}
    {% do return(constraint_sql_list | join(';\n')) %}
{% endmacro %}
```

Then I have a model:

```sql
{{ config(
    materialized = 'table',
    post_hook = "{{ add_constraints() }}"
) }}

select 1 as column_a
```

I define properties for it in a `.yml` file:

```yml
version: 2
models:
  - name: my_model
    columns:
      - name: column_a
        meta:
          constraint: primary key
```

And voila, dbt runs:

```sql
create table "jerco"."dbt_jcohen"."my_model__dbt_tmp" as (select 1 as column_a);
alter table "jerco"."dbt_jcohen"."my_model" rename to "my_model__dbt_backup"
alter table "jerco"."dbt_jcohen"."my_model__dbt_tmp" rename to "my_model"
alter table "jerco"."dbt_jcohen"."my_model" add constraint my_model_primary_key_column_a
    primary key (column_a);
commit;
drop table if exists "jerco"."dbt_jcohen"."my_model__dbt_backup" cascade;
```

The downside of this approach is that you still need to specify everything by hand: the `post_hook` on each model, and the constraint for each column in its `meta`.
I'm curious to hear what you think! If any of these approaches feels especially promising, we could think about adding it into dbt proper.
Of course, there's another question at play here: Should dbt be able to create its own ERDs? It's totally possible from a conceptual perspective, and there's been a lot of discussion on it (dbt-labs/dbt-docs#84, discourse). These could be generated from metadata dbt already has, such as `relationships` tests defined in schema YAML.
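For reference, dbt's existing `relationships` test already captures foreign-key-style links between models in schema YAML, so an ERD generator could plausibly read them from project metadata. A minimal sketch, where the `orders` and `customers` model names are made up:

```yml
version: 2
models:
  - name: orders
    columns:
      - name: customer_id
        tests:
          - relationships:
              to: ref('customers')
              field: id
```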
@jtcohen6 Totally agree that having one ERD with hundreds of models wouldn't do much. In our case we keep the most important models in one folder, but it might be difficult to know what most people would be doing.
@JC-Lightfold, just curious if there has been any more traction on this? I'm super interested in this functionality. Thank you @jtcohen6 for the solution above! Will be using that shortly on some customer projects.
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.
I definitely still think this issue is relevant for documenting data architecture.
Not being able to set primary and foreign keys is basically what's keeping us from using dbt. We have requirements to produce this kind of documentation and keep it up to date, and no one wants to maintain it by hand as a separate artifact that isn't automatically updated.
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.
Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.
DBT needs to be able to play nice with data model diagrams!
If you point any diagramming tool at a schema that DBT produces, it won't be able to draw connections between tables, because those tools all rely on the CONSTRAINT DDL syntax, regardless of whether or not it's enforced (which is why Snowflake supports the syntax but doesn't enforce it). Wouldn't it be great to simply be able to declare PK and FK constraints in the model YML, which seems like the EXACT place to do it?!
You can currently attempt to declare the necessary DDL with post-hooks and macros, but it's a lot of work for each and every column and table, as the sketch below illustrates.
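Here is a rough sketch of what that hand-written version looks like for a single model. All model, column, and constraint names are invented, and the DDL is the standard unenforced-constraint syntax that warehouses like Snowflake accept:

```sql
-- models/orders.sql: every table needs its own copy of hooks like these,
-- one statement per constraint, maintained by hand.
{{ config(
    materialized = 'table',
    post_hook = [
        "alter table {{ this }} add constraint pk_orders primary key (order_id)",
        "alter table {{ this }} add constraint fk_orders_customer foreign key (customer_id) references {{ ref('customers') }} (customer_id)"
    ]
) }}

select 1 as order_id, 1 as customer_id
```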
Being able to diagram structures created and maintained by DBT with more than just the lineage DAG is critical to meeting many enterprise governance requirements, where teams are used to communicating logical models to stakeholders with ERDs.
Likewise, given the HUGE potential for DBT to basically become the "Universal Data Transformer" for MPP systems like Snowflake, it seems overly limited to expect that it will only ever define transient/ephemeral tables that are slices of a DAG on their way to a final analytics transformation. Don't we want to use DBT to build data vaults? Star schemas? 3NF models of any kind?
If the answer is yes, then people will expect to be able to understand the relationship model of the schemas DBT is persisting. That means diagrams using protocols like IDEF1X or crow's feet.
Recommended implementation patterns (both sketched below):
- One option is to simply annotate PK/FK status and linkages in the COMMENT field that all databases support (especially Snowflake). This would still require a custom parsing solution to turn the comments into reliable diagrams, but it's a start.
- The preferred implementation is to let the column declarations in the model YAML declare PK and FK CONSTRAINTS and pass them through to the compiled DDL.
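To make both patterns concrete, here are rough sketches. All names are invented, and the `constraint:` key is proposed syntax, not something dbt supports today. Pattern one, annotating linkage in a comment that a parser would later decode:

```sql
comment on column analytics.orders.customer_id is 'fk -> customers.id';
```

Pattern two, declaring constraints directly in the model YAML:

```yml
version: 2
models:
  - name: orders
    columns:
      - name: order_id
        constraint: primary key                            # proposed syntax
      - name: customer_id
        constraint: foreign key references customers (id)  # proposed syntax
```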
This issue is raised specifically with regard to supporting standard ER model diagrams, but it connects well to the following issues:
#2936
#2191
#1570