Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support KMS encryption for BigQuery #1829

Closed
kconvey opened this issue Oct 11, 2019 · 3 comments · Fixed by #1851
Closed

Support KMS encryption for BigQuery #1829

kconvey opened this issue Oct 11, 2019 · 3 comments · Fixed by #1851
Labels
bigquery enhancement New feature or request

Comments

@kconvey
Copy link
Contributor

kconvey commented Oct 11, 2019

Describe the feature

Allow for KMS encryption to be specified in project configuration. Ideally this would be done in dbt_projects.yml like:


models:
  fishtown_analytics:
  kms_key_name: 'some_kms_key'
  
    events:
      kms_key_name: 'some_other_key' 

but may also exist in a model's {{ config() }}

Describe alternatives you've considered

It may be possible to do this as a post-processing step outside of dbt, or in a post-hook (although I am not aware of any way to specify KMS key in sql).

Additional context

This feature is specific to BigQuery as it currently stands but may generalize to other databases.

Who will this benefit?

Anyone who uses BigQuery and may need to encrypt their data with cloud KMS keys (as opposed to the default encryption BigQuery already performs).

@kconvey kconvey added enhancement New feature or request triage labels Oct 11, 2019
@drewbanin drewbanin added bigquery and removed triage labels Oct 14, 2019
@drewbanin
Copy link
Contributor

Really neat idea @kconvey!

It looks to me like a kms_key_name option can be provided to a create table as statement in BigQuery. That would be a good mechanism for dbt to use here I think.

It appears to me that the kms_key_name will also need to be supplied when querying this data too. I think that might make it challenging for dbt to use different KMS keys for different models. What happens if you write a query like:

select *
from my_dataset.my_first_model
left join my_dataset.my_second_model using (id)

If my_first_model and my_second_model use different KMS keys, which key do you use?

I haven't use KMS on BQ before - if you have insight into how this works in practice, I'd love to hear it!

@heisencoder
Copy link
Contributor

As a quick note on querying, the KMS key name is not needed when reading tables. See https://cloud.google.com/bigquery/docs/customer-managed-encryption:

No special arrangements are required to query a table protected by Cloud KMS. BigQuery stores the name of the key used to encrypt the table content and will use that key when a table protected by Cloud KMS is queried.

All existing tools, the BigQuery console, and the bq command-line interface run the same way as with default-encrypted tables, as long as BigQuery has access to the Cloud KMS key used to encrypt the table content.

@kconvey
Copy link
Contributor Author

kconvey commented Nov 6, 2019

#1851

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bigquery enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants