Allow impersonation of Google Cloud service accounts with BigQuery #2672
Labels
bigquery
enhancement
New feature or request
good_first_issue
Straightforward + self-contained changes, good for new contributors!
Describe the feature
I recently read this article about using the new IAM Credentials API to impersonate a service account by generating an OAuth2 bearer token that identifies as that account. Adding this ability to dbt would improve security and allow for additional abstraction when running BigQuery queries on sensitive data. This way, we don't need to run dbt with a service account key, or worry about starting a node identified by a particular service account.
This would allow us to invert control: service accounts control access to different types of data, and users are granted access to the service accounts (the "abstraction") instead of every user having IAM access to the datasets directly. This way you get the benefits of federated identity with the flexibility of service accounts, but without relying on their keys, which are typically required for this workflow but have many downsides as described in the article.
From an end-user perspective, I propose that dbt adds an impersonate_service_account option to the profiles file for the BigQuery adapter. Using this functionality would work like this:
In this case, the oauth/service account/service account file types would each be compatible. Instead of using the existing Credentials() directly, as is the current functionality, when the impersonate_service_account option is present, Credentials would be generated using the new IAM Credentials API (example).
It seems like this would be pretty easy to send a patch for, since the Google Auth library already has support for this method. There is one big difference that we may want to handle: these tokens are very short-lived compared to ADC or service account credentials. By default, these OAuth2 tokens expire after 1 hour. It is possible to request a longer lifetime, but 1 hour is a soft limit that can only be raised by adding the service account to an organization policy. This is usually only possible for GCP Org admins, so not your target audience.
I think for an MVP it would be alright for this to be the fallback, but ideally there would be a way to detect the expired credentials and request a new set while dbt is running. This may already be possible; it's not clear how often new clients are requested. If a new client is created often, then this may not even be a concern, since it would get a fresh token each time a job was created. In this case, we'd want to set the expiration much shorter than the default.
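As a rough illustration of the idea above, the following Python sketch wraps existing credentials only when the option is set, and shows one way to decide that a short-lived token is due for renewal. The function names maybe_impersonate and needs_refresh, the default scope list, and the five-minute skew are assumptions made for illustration, not dbt code. The only library call used is google.auth.impersonated_credentials.Credentials from the Google Auth library, whose lifetime argument is given in seconds.

```python
from datetime import datetime, timedelta
from typing import Any, Mapping, Optional, Sequence

# Assumed scope for illustration; a real adapter may request more.
DEFAULT_SCOPES = ("https://www.googleapis.com/auth/bigquery",)


def maybe_impersonate(
    source_credentials: Any,
    profile: Mapping[str, Any],
    scopes: Sequence[str] = DEFAULT_SCOPES,
    lifetime: int = 3600,
) -> Any:
    """Return impersonated credentials if the profile asks for them.

    `profile` stands in for the parsed BigQuery profile (hypothetical shape).
    When `impersonate_service_account` is absent, the source credentials
    are returned unchanged.
    """
    target = profile.get("impersonate_service_account")
    if not target:
        return source_credentials

    # Imported lazily so the pass-through path has no extra dependency.
    from google.auth import impersonated_credentials

    return impersonated_credentials.Credentials(
        source_credentials=source_credentials,
        target_principal=target,
        target_scopes=list(scopes),
        lifetime=lifetime,  # seconds; 3600 is the default limit
    )


def needs_refresh(
    expiry: Optional[datetime],
    now: datetime,
    skew: timedelta = timedelta(minutes=5),
) -> bool:
    """Return True when a token should be renewed.

    A missing expiry is treated as "no token yet". The skew (assumed value)
    renews slightly early so a long query does not start with a nearly
    expired token.
    """
    return expiry is None or now >= expiry - skew
```

With this shape, profiles that do not set the option would behave as before, since the source credentials pass through untouched.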
Describe alternatives you've considered
The only other option is to have GCE or Cloud Run instances run as service accounts. Cloud Run currently times out after 15 minutes, so that is a non-starter, and GCE isn't really designed for this purpose. There are a bunch of other downsides as well that I'm happy to go into if the value isn't obvious.
Additional context
This would be just for BigQuery users.
Who will this benefit?
It will benefit BigQuery users with many datasets, each needing to be and remain isolated. Any operation concerned with increased security would prefer to use this, and most of us are.
Are you interested in contributing this feature?
Yes, I am interested.