Make GCP OAuth scopes configurable via pipeline options. #23644

Merged
merged 3 commits into from
Oct 15, 2022

lukecwik
Member

This allows users to limit scopes dependent on their pipeline.

fixes #23290



@codecov

codecov bot commented Oct 14, 2022

Codecov Report

Merging #23644 (1d6a0e0) into master (45cc085) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master   #23644   +/-   ##
=======================================
  Coverage   73.33%   73.33%           
=======================================
  Files         719      719           
  Lines       95795    95794    -1     
=======================================
+ Hits        70248    70252    +4     
+ Misses      24236    24231    -5     
  Partials     1311     1311           
Flag      Coverage Δ
python    83.05% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files                                           Coverage Δ
sdks/python/apache_beam/internal/gcp/auth.py             82.85% <100.00%> (+4.19%) ⬆️
sdks/python/apache_beam/io/gcp/bigquery_tools.py         73.34% <100.00%> (+0.03%) ⬆️
...dks/python/apache_beam/options/pipeline_options.py    94.39% <100.00%> (+0.02%) ⬆️
...ks/python/apache_beam/runners/interactive/utils.py    95.08% <100.00%> (+0.02%) ⬆️
.../python/apache_beam/transforms/periodicsequence.py    98.38% <0.00%> (-1.62%) ⬇️
...hon/apache_beam/runners/worker/bundle_processor.py    93.54% <0.00%> (+0.12%) ⬆️
...ks/python/apache_beam/runners/worker/sdk_worker.py    89.24% <0.00%> (+0.16%) ⬆️


@github-actions
Contributor

Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control

@@ -169,7 +161,7 @@ def _get_service_credentials(pipeline_options):
 return None

 @staticmethod
-def _add_impersonation_credentials(credentials, pipeline_options):
+def _add_impersonation_credentials(credentials, scopes, pipeline_options):
Contributor

Is there a reason we add a parameter instead of retrieving the scopes from pipeline_options?

Member Author

Fixed. I also dropped the logic here since PipelineOptions will now always be passed in.

Drop the non pipelineoptions routes in _add_impersonation_...
@tvalentyn
Contributor

Thank you

@lukecwik
Member Author

Run Java PreCommit

@ricardograca-scratch

@lukecwik Can you explain how this works? I upgraded Beam to version 2.43.0 and added the following:

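import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Pass the extra OAuth scope as a flag; the other arguments are elided.
pipeline_args = [
  ...,
  '--gcp_oauth_scope',
  'https://www.googleapis.com/auth/drive'
]
pipeline_options = PipelineOptions(flags=pipeline_args)
pipeline = beam.Pipeline(options=pipeline_options)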

But I still get:

Access Denied: BigQuery BigQuery: Permission denied while getting Drive credentials.

@lukecwik
Member Author

lukecwik commented Jan 5, 2023

It looks like the error you're getting is from BigQuery trying to access Drive on your behalf, not from Dataflow trying to access Drive. How are you getting BigQuery to access Drive from Dataflow (is it via your own code, via a cross-language IO connector, or ...)?

@ricardograca-scratch

ricardograca-scratch commented Jan 6, 2023

I'm not using Dataflow. This job is running as a cronjob in Kubernetes and the credentials come from a service account. I can confirm the service account has the required role to access the BigQuery table, and the underlying Google Drive file is shared with the service account email. This same setup used to work just fine with my patched version of Beam that added the needed Auth scope.

Also worth noting that the error message I mentioned is what I get when the Drive Auth scope is missing as reported in the previous issue.

Update: I forked the latest 2.43.0 release, hard-coded the needed auth scope in the OAUTH_SCOPES variable and I can now access the mentioned table just fine. It seems this feature isn't working as intended, or I don't know how to pass arguments to the pipeline. I'll try to debug the issue to see what could be wrong.

@tvalentyn
Contributor

Note that when setting --gcp_oauth_scope you need to list all the scopes, not just the additional Drive scope.

@ricardograca-scratch

According to the official documentation, that is not the case when using the append option. It should just append the passed-in scope to the default list, or am I missing something?

@tvalentyn
Contributor

I stand corrected, you don't need to repeat the default options.

> I'll try to debug the issue to see what could be wrong.

I am not sure what is going on then. I would print the content of the defined scopes and A/B test both cases.
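
For the A/B comparison suggested above, the append behaviour can be reproduced with the standard library alone. This is a minimal sketch, not Beam's implementation: it assumes the option is declared with argparse's `action='append'` and a non-empty default, which is what the "append" wording in the documentation mentioned earlier suggests. The `DEFAULT_SCOPES` values and the `effective_scopes` helper are hypothetical placeholders, not Beam's real defaults or API.

```python
import argparse

# Placeholder defaults for illustration only; NOT Beam's real default scopes.
DEFAULT_SCOPES = ["placeholder-default-scope-a", "placeholder-default-scope-b"]

DRIVE_SCOPE = "https://www.googleapis.com/auth/drive"


def effective_scopes(argv):
    """Parse argv and return the scope list that results (hypothetical helper)."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--gcp_oauth_scope",
        dest="gcp_oauth_scopes",
        action="append",
        # With action="append", values from the command line are appended
        # after the elements of a non-empty default.
        default=list(DEFAULT_SCOPES),
    )
    # parse_known_args tolerates unrelated pipeline flags in argv.
    known, _unknown = parser.parse_known_args(argv)
    return known.gcp_oauth_scopes


# Case A: flag not supplied. Case B: flag supplied once.
baseline = effective_scopes([])
with_drive = effective_scopes(["--gcp_oauth_scope", DRIVE_SCOPE])

print("without flag:", baseline)
print("with flag:   ", with_drive)
print("added:       ", [s for s in with_drive if s not in baseline])
```

Running the same comparison against the actual pipeline options would show whether the flag reaches the scope list at all, which helps separate an argument-parsing problem from a credentials problem.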

@tvalentyn
Contributor

I tried supplying the option on the command line and can see that it is being populated correctly, but don't have a readily-available pipeline that I can use to meaningfully test a different scope.

Successfully merging this pull request may close these issues.

[Feature Request]: Add Google Drive scope to GCP Auth