Add support for altering BigQuery column types #2547

azhard · 2020-06-15T16:55:50Z

resolves #2546

Description

Implements functionality to alter BigQuery column types using dbt.

Checklist

I have signed the CLA
I have run this code in development and it appears to resolve the stated issue
This PR includes tests, or tests are not required/relevant for this PR
I have updated the CHANGELOG.md and added information about my change to the "dbt next" section.

jtcohen6 · 2020-06-15T19:21:43Z

I just want to clarify my understanding of the implications: dbt only uses the alter_column_type method within the expand_column_types method, which is a noop on BigQuery. Out of curiosity, do you have a specific use case for this?

In any case, I agree with your rationale (in #2546) that:

the default implementation doesn't work on BQ, so anything (even a compilation error) is better than the status quo
without BQ having a great way of implementing this, best to implement it in a SQL-first way

I'm a little uneasy that it requires replacing the entire table, but I guess that's the best of what they recommend. I'm still hopeful for support someday of alter table ... remove column (adding new type + removing old type is how we do this on other databases), or alter table ... alter column set data type ... (though this frequently has limitations when implemented).

azhard · 2020-06-15T19:27:58Z

Hey @jtcohen6, the reason I'm implementing this is because my team is planning to use dbt to handle some of our schema management and we have probably 90% of what we need already with the macros that currently exist. So considering there was an alter_column_type macro, figured it made sense to just implement it to actually work with BigQuery.

In terms of specific schema management use case, our dbt source tables get loaded through various mechanisms and schema changes get propagated through. These schema changes can then cause breaks in our downstream tables that are based on incremental loads. This doesn't mitigate that, but I've built a mini schema management tool that leverages dbt macros to allow us to fix / update schemas as needed. So for this particular macro, in the rare case where a data type got changed, we can use this macro to fix the downstream table's schema.

jtcohen6 · 2020-06-15T19:41:12Z

Got it, that sounds reasonable—and cool that you're doing so much of it with dbt macros!

If someone's going to be calling this method from an external tool, I think it's fair to say they've read through the codebase around adapter methods. I'd be in favor of adding an inline code comment, caveat usor, that quotes or paraphrases from the BQ docs:

Changing a column's data type using a query requires you to scan the entire table. The query charges can be significant if the table is very large.

azhard · 2020-06-16T00:28:53Z

Just to make sure we're on the same page, is adding that as a jinja comment to the top of the macro what you had in mind?

jtcohen6 · 2020-06-16T22:00:35Z

Yes, that's exactly what I had in mind—a quick note and link to that BQ docs page, within a Jinja comment, at the top of the bigquery__alter_column_type macro.

azhard · 2020-06-20T15:30:34Z

Hey @jtcohen6 any other changes needed?

jtcohen6

@azhard This looks great to me, thanks for adding the disclaimer. I played around with the run-operation locally. It's very cool (if a bit frightening) to be replacing the entire table in an atomic create or replace statement just to change one column's type.

@beckjake I defer to you on any code style. Also, some unexpected test errors (Redshift on Circle, Postgres on Azure).

beckjake · 2020-06-22T17:46:21Z

Looks like the redshift tests triggered a redshift bug, I'm just re-running that from failed. The postgres tests look like an issue I fixed a while ago, but this PR is pretty old so I assume it just hasn't pulled in the fix.

I'm perfectly comfortable merging this in as it is once the redshift test passes. Nothing this PR changes should care about the OS.

beckjake

The code all looks good to me

Add support for altering BigQuery column types

a7a73df

cla-bot bot added the cla:yes label Jun 15, 2020

Added warning comment to alter macro

769e4d5

jtcohen6 reviewed Jun 22, 2020

View reviewed changes

beckjake approved these changes Jun 22, 2020

View reviewed changes

jtcohen6 approved these changes Jun 22, 2020

View reviewed changes

beckjake merged commit 8f649f5 into dbt-labs:dev/marian-anderson Jun 22, 2020

azhard deleted the bigquery-alter-column-type branch June 22, 2020 19:25

MartinNowak mentioned this pull request Nov 6, 2020

Support field additions when updating partitioned BigQuery table #2699

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add support for altering BigQuery column types #2547

Add support for altering BigQuery column types #2547

azhard commented Jun 15, 2020

jtcohen6 commented Jun 15, 2020

azhard commented Jun 15, 2020

jtcohen6 commented Jun 15, 2020

azhard commented Jun 16, 2020

jtcohen6 commented Jun 16, 2020

azhard commented Jun 20, 2020

jtcohen6 left a comment

beckjake commented Jun 22, 2020

beckjake left a comment

Add support for altering BigQuery column types #2547

Add support for altering BigQuery column types #2547

Conversation

azhard commented Jun 15, 2020

Description

Checklist

jtcohen6 commented Jun 15, 2020

azhard commented Jun 15, 2020

jtcohen6 commented Jun 15, 2020

azhard commented Jun 16, 2020

jtcohen6 commented Jun 16, 2020

azhard commented Jun 20, 2020

jtcohen6 left a comment

Choose a reason for hiding this comment

beckjake commented Jun 22, 2020

beckjake left a comment

Choose a reason for hiding this comment