fix(dbt): add use_identifiers option and avoid duplicate descriptions #3179

remisalmon · 2021-09-01T20:22:22Z

Checklist

The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
Links to related issues (if applicable)
Tests for the changes have been added/updated (if applicable)
Docs related to the changes have been added/updated (if applicable)

This PR adds a use_identifiers option to the dbt configuration. When it is enabled, node identifiers are used instead of names wherever an identifier is defined, falling back to names otherwise.

Previously this was achieved by setting load_schemas=False. However, there are cases where we want to load schemas while still using node identifiers instead of names. For example, dbt does not write column descriptions to Snowflake on views*, so those need to be ingested with load_schemas=True, while identifiers are used to preserve the lineage.

*see https://github.com/dbt-labs/dbt/issues/3291
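The fallback described above can be sketched roughly as follows. This is an illustration only, not the code from this PR: resolve_node_name is a hypothetical helper, and the node is assumed to be a plain dict with a "name" key and an optional "identifier" key.

```python
from typing import Any, Dict


def resolve_node_name(node: Dict[str, Any], use_identifiers: bool) -> str:
    """Pick the name used for a dbt node (hypothetical helper).

    Assumes `node` has a "name" key and may have an "identifier" key.
    """
    # Use the identifier only when the option is on and one is defined.
    if use_identifiers and node.get("identifier"):
        return node["identifier"]
    # Otherwise fall back to the node name.
    return node["name"]
```

With use_identifiers=False, or on a node without an identifier, this returns the name, which matches the fallback behaviour described in this PR.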

Also:

renamed load_catalog to load_schemas in dbt.py for consistency with the config file
fixed a case-sensitivity issue when matching columns between catalog and manifest files (names are upper case in the catalog but lower case in the manifest)
added a check that the node/column description and comment differ before displaying both
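As a rough sketch of the last two points, assuming columns are held in dicts keyed by column name (the function names, the dict shapes, and the way the two texts are joined are all hypothetical, not taken from dbt.py):

```python
from typing import Any, Dict, Optional


def find_manifest_column(
    manifest_columns: Dict[str, Dict[str, Any]], catalog_name: str
) -> Optional[Dict[str, Any]]:
    """Look up a manifest column by a catalog column name, ignoring case."""
    # Normalise both sides to lower case before comparing.
    lowered = {name.lower(): col for name, col in manifest_columns.items()}
    return lowered.get(catalog_name.lower())


def combine_description(description: Optional[str], comment: Optional[str]) -> str:
    """Return the text to display, showing both values only if they differ."""
    description = (description or "").strip()
    comment = (comment or "").strip()
    if description and comment and description != comment:
        # Joining with a blank line is an assumption made for this sketch.
        return f"{description}\n\n{comment}"
    # Identical or single values are shown once.
    return description or comment
```

For instance, a catalog column named "ID" would match a manifest column named "id", and a description that equals the comment would be displayed only once.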

shirshanka

LGTM!

fix(dbt): use_identifiers option and avoid duplicate descriptions

3616c35

remisalmon changed the title ~~fix(dbt): use_identifiers option and avoid duplicate descriptions~~ fix(dbt): add use_identifiers option and avoid duplicate descriptions Sep 1, 2021

shirshanka approved these changes Sep 2, 2021

shirshanka merged commit 75d9969 into datahub-project:master Sep 2, 2021
