BigQuery impersonate service account #15726

Open
joshk0 opened this issue Aug 17, 2022 · 10 comments

@joshk0

joshk0 commented Aug 17, 2022

Tell us about the problem you're trying to solve

In the BigQuery connector, I would like to use Application Default Credentials (ADC; I think this is already supported) and then use those credentials to impersonate a different service account.

Example: https://github.com/salrashid123/gcp_impersonated_credentials/blob/main/java/src/main/java/com/test/TestApp.java#L28-L30

Describe the solution you’d like

The BigQuery connector should continue to accept JSON credentials for authentication, falling back to ADC when none are provided. Then, if the "Account to impersonate" configuration field is set, the connector should attempt to impersonate that account before accessing the table.

If this feature works, the ADC credentials will not need any access to the target BigQuery table; only the impersonated account will.
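
To make the requested order concrete, here is a minimal Java sketch of the decision logic only. It is a model, not connector code: CredentialPlan, Source, resolve, and effectiveIdentity are hypothetical names, and the service account address is a placeholder. The comments name the google-auth-library-java calls each step would plausibly map to. Note that impersonation generally requires the source principal to hold the Service Account Token Creator role on the target account.

```java
import java.util.Optional;

/**
 * Hypothetical model of the requested resolution order; none of these names
 * come from the Airbyte code base.
 */
final class CredentialPlan {
    /** Where the source (caller) credentials come from. */
    enum Source { JSON_KEY, APPLICATION_DEFAULT }

    final Source source;
    final Optional<String> impersonatedAccount;

    private CredentialPlan(Source source, Optional<String> impersonatedAccount) {
        this.source = source;
        this.impersonatedAccount = impersonatedAccount;
    }

    /**
     * @param credentialsJson      contents of the JSON credentials field, or null/blank if unset
     * @param accountToImpersonate value of the "Account to impersonate" field, or null/blank if unset
     */
    static CredentialPlan resolve(String credentialsJson, String accountToImpersonate) {
        // Explicit JSON wins; otherwise fall back to ADC.
        // With google-auth-library-java these would map to
        // GoogleCredentials.fromStream(...) and GoogleCredentials.getApplicationDefault().
        Source source = isBlank(credentialsJson) ? Source.APPLICATION_DEFAULT : Source.JSON_KEY;

        // Impersonation is an optional second step layered on the source credentials, e.g. via
        // ImpersonatedCredentials.create(source, target, delegates, scopes, lifetimeSeconds).
        Optional<String> target = isBlank(accountToImpersonate)
                ? Optional.empty()
                : Optional.of(accountToImpersonate.trim());
        return new CredentialPlan(source, target);
    }

    private static boolean isBlank(String s) {
        return s == null || s.trim().isEmpty();
    }

    /** The identity BigQuery would actually see for table access. */
    String effectiveIdentity() {
        return impersonatedAccount.orElse("<" + source + " principal>");
    }

    public static void main(String[] args) {
        // No JSON configured, impersonation target set (placeholder address).
        CredentialPlan plan = resolve(null, "loader@tenant-project.iam.gserviceaccount.com");
        System.out.println(plan.source + " -> " + plan.effectiveIdentity());
    }
}
```

Keeping the source choice and the impersonation target as two independent fields reflects the request: JSON stays valid, ADC is the fallback, and impersonation applies on top of either.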

This is a crucial component for using Airbyte in a multi-tenant customer environment.

Describe the alternative you’ve considered or used

We have no alternative; our security posture as a business is strengthened by using as few hardcoded credentials as possible. Airbyte's BigQuery connector is not usable for us until we can use our existing service accounts in a credentialless fashion.

Additional context

https://cloud.google.com/iam/docs/impersonating-service-accounts

Are you willing to submit a PR?

It'll probably be a very long time before I'd have time to, but theoretically I am willing to.

@marcelopio
Contributor

marcelopio commented Aug 20, 2022

Just to link, I added ADC, but not as a fallback, because that would require the worker to have access to the host environment. So the way to use ADC now is to pass the JSON generated by gcloud auth application-default login.

#14784

@marcelopio
Contributor

I think adding impersonation would just mean adding a field with the account to be impersonated, and that is basically it, if you are OK with using the application-default login JSON. I can make a PR with that idea.

@marcosmarxm
Member

Thanks @marcelopio, the team will review your proposal this week.

@joshk0
Author

joshk0 commented Aug 22, 2022

I would rather not have JSON anywhere in my authentication pipeline.

The ability for ADC to be attached implicitly by the provider to workloads, cloud function invocations, or compute instances, rather than supplied through a data file that can escape the system, is crucial to my company's security posture.

@marcelopio
Contributor

ADC will work on Compute Engine instances without any JSON provided, just not on developers' machines.

@joshk0
Author

joshk0 commented Aug 22, 2022

Sure, that's fine, and JSON should be a valid option for dev machines. But I'd like to use better practices in prod.

@marcelopio
Contributor

Nice, then the draft solution should be fine!

@joshk0
Author

joshk0 commented Aug 22, 2022

Got it - I misunderstood your sentence back there then. Thanks!

@marcelopio
Contributor

marcelopio commented Aug 22, 2022

My fault, I didn't explain the whole problem with ADC.

Google's ADC implementation has a lot of steps, and what everyone generally assumes is the use of GOOGLE_APPLICATION_CREDENTIALS. Unfortunately, that won't work with Airbyte, but if you are on a Google environment that exposes the credentials via an API, it should work.

This service account might be a default service account provided by Compute Engine, Google Kubernetes Engine, App Engine, Cloud Run, or Cloud Functions.

It will also work on Cloud Shell.
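
For readers unfamiliar with the "lot of steps", here is a rough Java sketch of the usual ADC search order: the environment variable first, then the file written by gcloud auth application-default login, then the metadata server of the hosting platform. This is a simplified model under my own assumptions, not the real library code. AdcLookup, Found, and locate are hypothetical names, the file path is only an example, and platform-specific checks are left out.

```java
import java.util.Map;
import java.util.function.Predicate;

/** Simplified, hypothetical model of the ADC search order; not the real library code. */
final class AdcLookup {
    enum Found { ENV_VAR_FILE, GCLOUD_USER_FILE, METADATA_SERVER, NONE }

    static Found locate(Map<String, String> env,
                        Predicate<String> fileExists,
                        String wellKnownPath,
                        boolean metadataServerReachable) {
        // 1. Explicit key file named by the environment variable.
        //    (To my understanding, the real library reports an error if this
        //    file is unreadable rather than falling through to the next step.)
        String fromEnv = env.get("GOOGLE_APPLICATION_CREDENTIALS");
        if (fromEnv != null && !fromEnv.isEmpty()) {
            return Found.ENV_VAR_FILE;
        }
        // 2. File written by `gcloud auth application-default login`.
        if (fileExists.test(wellKnownPath)) {
            return Found.GCLOUD_USER_FILE;
        }
        // 3. Service account attached to the runtime, served by the metadata server.
        if (metadataServerReachable) {
            return Found.METADATA_SERVER;
        }
        return Found.NONE;
    }

    public static void main(String[] args) {
        // A connector container that sees neither the host environment nor host files,
        // running on a platform with an attached service account.
        Found found = locate(Map.of(), path -> false,
                "/root/.config/gcloud/application_default_credentials.json", true);
        System.out.println(found);
    }
}
```

In this model, steps 1 and 2 depend on the host's environment and filesystem, which the worker does not see, while step 3 only needs network access to the platform's metadata endpoint. That is why the attached service account case is the one expected to work.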

grishick added the team/destinations (Destinations team's backlog) label on Sep 27, 2022
@grishick
Contributor

@marcelopio I took your PR, added a test for impersonation and pushed to this PR: #20788

sherifnada added the team/db-dw-sources (Backlog for Database and Data Warehouse Sources team) label and removed the team/extensibility label on Jan 8, 2023
bleonard removed the team/db-dw-sources (Backlog for Database and Data Warehouse Sources team) label on Jan 13, 2023
bleonard added the frozen (Not being actively worked on) label on Mar 22, 2024