Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for altering BigQuery column types #2547

Merged
merged 2 commits into from
Jun 22, 2020
Merged

Add support for altering BigQuery column types #2547

merged 2 commits into from
Jun 22, 2020

Conversation

azhard
Copy link
Contributor

@azhard azhard commented Jun 15, 2020

resolves #2546

Description

Implements functionality to alter BigQuery column types using dbt.

Checklist

  • I have signed the CLA
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • I have updated the CHANGELOG.md and added information about my change to the "dbt next" section.

@cla-bot cla-bot bot added the cla:yes label Jun 15, 2020
@jtcohen6
Copy link
Contributor

I just want to clarify my understanding of the implications: dbt only uses the alter_column_type method within the expand_column_types method, which is a noop on BigQuery. Out of curiosity, do you have a specific use case for this?

In any case, I agree with your rationale (in #2546) that:

  • the default implementation doesn't work on BQ, so anything (even a compilation error) is better than the status quo
  • without BQ having a great way of implementing this, best to implement it in a SQL-first way

I'm a little uneasy that it requires replacing the entire table, but I guess that's the best of what they recommend. I'm still hopeful for support someday of alter table ... remove column (adding new type + removing old type is how we do this on other databases), or alter table ... alter column set data type ... (though this frequently has limitations when implemented).

@azhard
Copy link
Contributor Author

azhard commented Jun 15, 2020

Hey @jtcohen6, the reason I'm implementing this is because my team is planning to use dbt to handle some of our schema management and we have probably 90% of what we need already with the macros that currently exist. So considering there was an alter_column_type macro, figured it made sense to just implement it to actually work with BigQuery.

In terms of specific schema management use case, our dbt source tables get loaded through various mechanisms and schema changes get propagated through. These schema changes can then cause breaks in our downstream tables that are based on incremental loads. This doesn't mitigate that, but I've built a mini schema management tool that leverages dbt macros to allow us to fix / update schemas as needed. So for this particular macro, in the rare case where a data type got changed, we can use this macro to fix the downstream table's schema.

@jtcohen6
Copy link
Contributor

Got it, that sounds reasonable—and cool that you're doing so much of it with dbt macros!

If someone's going to be calling this method from an external tool, I think it's fair to say they've read through the codebase around adapter methods. I'd be in favor of adding an inline code comment, caveat usor, that quotes or paraphrases from the BQ docs:

Changing a column's data type using a query requires you to scan the entire table. The query charges can be significant if the table is very large.

@azhard
Copy link
Contributor Author

azhard commented Jun 16, 2020

Just to make sure we're on the same page, is adding that as a jinja comment to the top of the macro what you had in mind?

@jtcohen6
Copy link
Contributor

Yes, that's exactly what I had in mind—a quick note and link to that BQ docs page, within a Jinja comment, at the top of the bigquery__alter_column_type macro.

@azhard
Copy link
Contributor Author

azhard commented Jun 20, 2020

Hey @jtcohen6 any other changes needed?

Copy link
Contributor

@jtcohen6 jtcohen6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@azhard This looks great to me, thanks for adding the disclaimer. I played around with the run-operation locally. It's very cool (if a bit frightening) to be replacing the entire table in an atomic create or replace statement just to change one column's type.

@beckjake I defer to you on any code style. Also, some unexpected test errors (Redshift on Circle, Postgres on Azure).

@beckjake
Copy link
Contributor

Looks like the redshift tests triggered a redshift bug, I'm just re-running that from failed. The postgres tests look like an issue I fixed a while ago, but this PR is pretty old so I assume it just hasn't pulled in the fix.

I'm perfectly comfortable merging this in as it is once the redshift test passes. Nothing this PR changes should care about the OS.

Copy link
Contributor

@beckjake beckjake left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code all looks good to me

@beckjake beckjake merged commit 8f649f5 into dbt-labs:dev/marian-anderson Jun 22, 2020
@azhard azhard deleted the bigquery-alter-column-type branch June 22, 2020 19:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement BigQuery alter column type functionality
3 participants