update external table columns #252
Conversation
@jeremyyeo do you have any idea why CI fails on Snowflake and Redshift?
@jeremyyeo CI is broken; see the PR I created without any changes:
@dataders I saw you merged another PR, could you also take a look at this one?
first, from-the-hip thoughts
- this builds off of a niche BigQuery feature which allows persisting column-level metadata beyond the standard `description` field, such as policy tags.
- I'm fuzzy on (and skeptical of) Core's implementation (Added support for setting policy tags for columns in BigQuery dbt-core#2589)
- Hard for me to judge this PR's implementation because it is so closely tied to that of `dbt-bigquery`.

questions for @thomas-vl
- what does this PR do? I cannot tell what problem it is solving. please open an issue to give context, then link it to this PR.
- additionally, can you describe exactly how this PR works? Am I correct that the operation happens not within the creation of a single external table, but after all of them have already been staged?
- would another warehouse ever want/need to make use of your proposed `update_external_table_columns`? I still don't understand the use case, as `adapter.update_columns` is not used by other adapters. should it be?
- is there info missing from docs.getdbt.com's BigQuery configuration page section on Tags about how/when these tags get persisted?
- extra credit: rebase this PR to be a single commit
```jinja
{% set update_columns = dbt_external_tables.update_external_table_columns(node) %}
{{ update_columns }}
```
- is my understanding correct that this operation should happen after (and separate from) the actual staging of the tables?
- if `update_external_table_columns` returns an empty string (as is the case with the `default__` version), then it is a no-op?
- Yes, it just updates the schema with operations that can only be done via the API, see:
https://github.com/dbt-labs/dbt-bigquery/blob/6c0afe4cfb69761dada5d16150fe632b8f72bf39/dbt/adapters/bigquery/impl.py#L609
It adds descriptions and policyTags, effectively creating the same behaviour as when you create a normal model.
- Yes, I added the `default__update_external_table_columns` because I thought that this was how you should implement a macro that is BigQuery-specific.
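The convention described here, a no-op default implementation plus a warehouse-specific override, follows dbt's `adapter.dispatch` pattern. A minimal sketch of what that could look like for this macro (the macro and method names come from this PR and conversation; the bodies are illustrative, not the PR's actual implementation):

```jinja
{% macro update_external_table_columns(node) %}
    {{ return(adapter.dispatch('update_external_table_columns', 'dbt_external_tables')(node)) }}
{% endmacro %}

{# default: other adapters have nothing to do, so the macro renders an empty string (a no-op) #}
{% macro default__update_external_table_columns(node) %}
{% endmacro %}

{# BigQuery: persist column descriptions and policyTags via the adapter's Python method #}
{% macro bigquery__update_external_table_columns(node) %}
    {% do adapter.update_columns(node, node.columns) %}
{% endmacro %}
```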
perhaps i was looking for something more conventional (for this repo/ dbt macros) like
- fetching config,
- if config is not empty, then do the thingy
```diff
  {% set update_columns = dbt_external_tables.update_external_table_columns(node) %}
- {{ update_columns }}
+ {%- if update_columns -%}
+ {{ update_columns }}
+ {%- endif -%}
```
hey @thomas-vl, did some more thinking and wrote up an answer on #263. The feature is hard for me to rationalize at this point. Fortunately, your implementation eventually clued me into the bigger picture. For that reason, I'm going to close this PR and instead welcome more discussion on the issue itself and the larger discussion about the future of external tables.
@thomas-vl thinking about this again. Ultimately, when external tables are in Core, implementing your proposed change will be both easier and more robust. However, that is still a ways away. I'm comfortable accepting this PR given that:
Description & motivation
resolves: #263
Descriptions and policyTags on columns were not propagated to the table schema of an external table on BigQuery.
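To illustrate the problem being fixed, here is a hypothetical sources YAML for an external table whose columns carry a `description` and `policy_tags` (the source name, location, and taxonomy path are placeholders; the column-level `policy_tags` key follows dbt-bigquery's policy tag support):

```yaml
sources:
  - name: events
    tables:
      - name: raw_users
        external:
          location: "gs://my-bucket/raw_users/*"
          options:
            format: parquet
        columns:
          - name: email
            description: "User email address"
            policy_tags:
              - "projects/my-project/locations/us/taxonomies/123/policyTags/456"
```

Before this change, staging the external table would leave the `description` and `policy_tags` above unapplied in the BigQuery table schema.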
Checklist