Support field additions when updating partitioned BigQuery table #2699

Closed
MartinNowak opened this issue Aug 13, 2020 · 2 comments
Labels
enhancement New feature or request

Comments

@MartinNowak

Describe the feature

Date-partitioned tables require an identical schema across all partitions. dbt should allow configuring the BigQuery-specific schema_update_options so that fields can be added when a partitioned table is updated.
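For illustration, here is a minimal sketch of what such an option amounts to at the BigQuery REST API level. It only builds the job body as a plain dict and does not call the API. The function name `build_query_job_config` and its parameters are hypothetical and not part of dbt; the field names (`schemaUpdateOptions`, `writeDisposition`, `destinationTable`) and the `ALLOW_FIELD_ADDITION` value come from the BigQuery `jobs.insert` query configuration.

```python
from typing import Any, Dict, Optional


def build_query_job_config(
    sql: str,
    project: str,
    dataset: str,
    table: str,
    partition: Optional[str] = None,
    allow_field_addition: bool = True,
) -> Dict[str, Any]:
    """Build a request body for a query job that appends to a table.

    Hypothetical helper for illustration only; it is not dbt code.
    """
    # A partition decorator ("table$YYYYMMDD") targets a single partition.
    table_id = f"{table}${partition}" if partition else table
    query_config: Dict[str, Any] = {
        "query": sql,
        "useLegacySql": False,
        "destinationTable": {
            "projectId": project,
            "datasetId": dataset,
            "tableId": table_id,
        },
        "writeDisposition": "WRITE_APPEND",
    }
    if allow_field_addition:
        # Lets the job add new nullable fields to the destination schema.
        query_config["schemaUpdateOptions"] = ["ALLOW_FIELD_ADDITION"]
    return {"configuration": {"query": query_config}}
```

Calling `build_query_job_config("select 1 as id", "my-project", "analytics", "events", partition="20200813")` produces a body whose destination is `events$20200813`. The open question in this issue is how a dbt model config would reach the place where that job configuration is assembled.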

Describe alternatives you've considered

Automatic handling of field addition and removal might be an alternative, but requires careful handling of the existing data.

Additional context

BigQuery-specific

Who will this benefit?

This would be useful for evolving the schema of regularly updated tables.

Are you interested in contributing this feature?

The relevant code is quite deeply nested, so I'm not sure how best to get config options there.

https://github.com/fishtown-analytics/dbt/blob/21a3462798fce4aea0530df595a8faa4828782dc/plugins/bigquery/dbt/adapters/bigquery/impl.py#L474-L479
https://github.com/fishtown-analytics/dbt/blob/1bd82d4914fd80fcc6fe17140e46554ad677eab0/plugins/bigquery/dbt/adapters/bigquery/connections.py#L348-L353
https://github.com/fishtown-analytics/dbt/blob/1bd82d4914fd80fcc6fe17140e46554ad677eab0/plugins/bigquery/dbt/adapters/bigquery/connections.py#L323
https://github.com/fishtown-analytics/dbt/blob/1bd82d4914fd80fcc6fe17140e46554ad677eab0/plugins/bigquery/dbt/adapters/bigquery/connections.py#L338-L339

MartinNowak added the enhancement and triage labels Aug 13, 2020
jtcohen6 removed the triage label Aug 13, 2020
@jtcohen6
Contributor

Thanks for the writeup @MartinNowak!

The underlying issue you're getting at is a lively, long-lived one: incremental models being able to detect, handle, or simply warn on changes to columns (#1132). That issue hasn't seen a lot of buzz recently, but I know it's always on people's minds.

It sounds from the docs you linked like BQ has mechanisms for doing this automatically, though those mechanisms are limited to LoadJob, i.e. ingestion-time partitioned tables. I don't think we want to add more specific functionality around ingestion-time partitioning; we're more likely to remove dbt support for them entirely (https://github.com/fishtown-analytics/dbt/issues/2332). As we understand it, it's an older feature. By comparison, there's more exciting development around column-based partitioning (date/timestamp and integer range). The syntax is a bit nicer, and they're more conducive to idempotent data modeling.

The docs you link are quite compelling. What would really excite me is BigQuery adding support for schema evolution within merge operations, along the lines of this relatively new Delta feature.

At that point, if there's a database config we can allow users to flip on, I'd be happy to do it. Until then, I think we'll need a generic solution that gives incremental models greater capabilities around schema change detection. I'm going to close this in acknowledgment of our eventual plans to deprecate support for ingestion-partitioned tables. That said, I'm still open to hearing your thoughts here.

@MartinNowak
Author

The options for DDL/DML basically seem to be ALTER TABLE ADD COLUMN or calling the API.
https://cloud.google.com/bigquery/docs/managing-table-schemas

Related change #2547
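As a rough sketch of the ALTER TABLE route, the snippet below compares an existing column set with the desired one and emits one ADD COLUMN statement per missing column. The helper name `add_column_statements` and the plain name-to-type dicts are assumptions made for illustration; this is not how dbt represents columns. It only builds SQL strings, ignores removed or retyped columns, and does no identifier quoting beyond the table name.

```python
from typing import Dict, List


def add_column_statements(
    table: str,
    existing: Dict[str, str],
    desired: Dict[str, str],
) -> List[str]:
    """Return ALTER TABLE statements for columns in `desired` but not in `existing`.

    Hypothetical helper: both dicts map column name to BigQuery type.
    """
    statements = []
    for name, data_type in desired.items():
        if name not in existing:
            # Columns added this way are nullable; existing rows read as NULL.
            statements.append(
                f"ALTER TABLE `{table}` ADD COLUMN IF NOT EXISTS {name} {data_type}"
            )
    return statements
```

For example, with `existing={"id": "INT64"}` and `desired={"id": "INT64", "country": "STRING"}`, the helper returns a single statement that adds `country STRING`. Such statements would have to run before the incremental insert or merge so that the destination schema already contains the new columns.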
