fix(dbt): add use_identifiers option and avoid duplicate descriptions #3179

Merged

remisalmon
Contributor

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable)

This PR adds a use_identifiers option to the dbt configuration. When it is enabled, node identifiers are used instead of names wherever they are defined, falling back to names otherwise.

Previously this was achieved by setting load_schemas=False. However, there are cases where we want to load schemas while still using node identifiers instead of names. For example, dbt does not write column descriptions to Snowflake on views*, so those need to be ingested with load_schemas=True while using identifiers to preserve the lineage.

*see https://github.com/dbt-labs/dbt/issues/3291
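The fallback described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the function name `resolve_node_name` and the plain-dict node shape (with `name` and an optional `identifier` key) are assumptions made for the example.

```python
from typing import Any, Dict


def resolve_node_name(node: Dict[str, Any], use_identifiers: bool) -> str:
    """Pick the name used for a dbt node (hypothetical helper).

    Assumes `node` is a dict with a required "name" key and an optional
    "identifier" key; the real manifest parsing may look different.
    """
    identifier = node.get("identifier")
    # Use the identifier only when the option is on AND one is defined.
    if use_identifiers and identifier:
        return identifier
    # Otherwise default back to the node name.
    return node["name"]
```

With this shape, a node without an identifier resolves to its name regardless of the option, so enabling use_identifiers does not break nodes that only define a name.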

Also:

  • renamed load_catalog to load_schemas in dbt.py for consistency with the config file
  • fixed a case mismatch when reading columns from the catalog vs. manifest files (column names are upper case in the catalog but lower case in the manifest)
  • added a check so that a node/column description and comment are only both displayed when they differ
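The last two items can be illustrated with a small sketch. It is not the code from this PR: the helper names, the dict-based inputs, and the way a description and comment are joined (a blank line between them) are all assumptions for illustration.

```python
from typing import Dict, Iterable, Optional


def match_manifest_descriptions(
    catalog_columns: Iterable[str],
    manifest_descriptions: Dict[str, str],
) -> Dict[str, str]:
    """Look up manifest descriptions for catalog columns, ignoring case.

    Hypothetical helper: catalog names may be upper case while manifest
    names are lower case, so both sides are compared in lower case.
    """
    by_lower = {name.lower(): desc for name, desc in manifest_descriptions.items()}
    return {name: by_lower.get(name.lower(), "") for name in catalog_columns}


def combine_description(description: Optional[str], comment: Optional[str]) -> str:
    """Return both texts only when they differ; otherwise a single copy.

    The blank-line separator is an assumption, not the PR's format.
    """
    description = (description or "").strip()
    comment = (comment or "").strip()
    if description and comment and description != comment:
        return f"{description}\n\n{comment}"
    # Identical or one-sided: show whichever is non-empty, once.
    return description or comment
```

Comparing on a normalized key while keeping the catalog's original spelling in the result avoids rewriting column names as a side effect of the lookup.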

@remisalmon remisalmon changed the title fix(dbt): use_identifiers option and avoid duplicate descriptions fix(dbt): add use_identifiers option and avoid duplicate descriptions Sep 1, 2021
Contributor

@shirshanka shirshanka left a comment

LGTM!

@shirshanka shirshanka merged commit 75d9969 into datahub-project:master Sep 2, 2021
2 participants