Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bypass unnecessary GCS storage.buckets.get permission #516

Merged
merged 2 commits into from
Jul 3, 2020

Conversation

gelioz
Copy link
Contributor

@gelioz gelioz commented Jul 2, 2020

Motivation

smart_open.gcs.Reader class uses google.cloud.storage.Client.get_bucket method which not only returns instance of google.cloud.storage.Blob, but also performs GET request to retrieve bucket metadata.
This request requires Cloud IAM storage.buckets.get permission which isn't a part of predefined "Storage Object Viewer" IAM role and this cause unjustifiable troubles with roles management:

Traceback (most recent call last):
  File "example.py", line 3, in <module>
    smart_open.open("gs://bucket_name/file.txt")
  File "/usr/local/lib/python3.7/site-packages/smart_open/smart_open_lib.py", line 225, in open
    binary = _open_binary_stream(uri, binary_mode, transport_params)
  File "/usr/local/lib/python3.7/site-packages/smart_open/smart_open_lib.py", line 401, in _open_binary_stream
    fobj = submodule.open_uri(uri, mode, transport_params)
  File "/usr/local/lib/python3.7/site-packages/smart_open/gcs.py", line 102, in open_uri
    return open(parsed_uri['bucket_id'], parsed_uri['blob_id'], mode, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/smart_open/gcs.py", line 137, in open
    client=client,
  File "/usr/local/lib/python3.7/site-packages/smart_open/gcs.py", line 214, in __init__
    self._blob = client.get_bucket(bucket).get_blob(key)  # type: google.cloud.storage.Blob
  File "/usr/local/lib/python3.7/site-packages/google/cloud/storage/client.py", line 351, in get_bucket
    if_metageneration_not_match=if_metageneration_not_match,
  File "/usr/local/lib/python3.7/site-packages/google/cloud/storage/bucket.py", line 869, in reload
    if_metageneration_not_match=if_metageneration_not_match,
  File "/usr/local/lib/python3.7/site-packages/google/cloud/storage/_helpers.py", line 207, in reload
    timeout=timeout,
  File "/usr/local/lib/python3.7/site-packages/google/cloud/_http.py", line 423, in api_request
    raise exceptions.from_http_response(response)
google.api_core.exceptions.Forbidden: 403 GET https://storage.googleapis.com/storage/v1/b/bucket_name?projection=noAcl: account_name@proj_name.iam.gserviceaccount.com does not have storage.buckets.get access to the Google Cloud Storage bucket.

Since retrieved bucket metadata is not used at all, google.cloud.storage.Client.get_bucket method can be safely replaced with google.cloud.storage.Client.bucket method which not performs API call to bucket endpoint and returns the same google.cloud.storage.Blob instance.

Checklist

Before you create the PR, please make sure you have:

  • Picked a concise, informative and complete title
  • Clearly explained the motivation behind the PR
  • Linked to any existing issues that your PR will be solving
  • Included tests for any new functionality
  • Checked that all unit tests pass

@gelioz gelioz changed the title Avoid requirement of unnecessary GCS storage.buckets.get permission Avoid requirement of unnecessary GCS storage.buckets.get permission Jul 2, 2020
@mpenkov
Copy link
Collaborator

mpenkov commented Jul 3, 2020

Thank you for pointing this out @gelioz. @petedannemann What do you think?

@petedannemann
Copy link
Contributor

Looks great to me. Nice work @gelioz

@petedannemann
Copy link
Contributor

petedannemann commented Jul 3, 2020

It would be worth looking into if google.cloud.storage.Bucket.get_blob() has the same behavior. Maybe we should switch them both

EDIT: Looks like it is

@petedannemann
Copy link
Contributor

Never mind, I think we need the blob metadata to check if the blob exists right below that.

@mpenkov mpenkov changed the title Avoid requirement of unnecessary GCS storage.buckets.get permission Bypass unnecessary GCS storage.buckets.get permission Jul 3, 2020
@mpenkov mpenkov merged commit d08f1ca into piskvorky:develop Jul 3, 2020
@mpenkov
Copy link
Collaborator

mpenkov commented Jul 3, 2020

Merged. Congrats on your first contribution to smart_open, @gelioz 👍

@pquentin
Copy link

pquentin commented Jul 3, 2020

I was about to submit the exact same pull request, thanks @gelioz!

@mpenkov I realise you've just released 2.1.0, but do you know when the next release is planned? Thanks for your work on smart_open!

@mpenkov
Copy link
Collaborator

mpenkov commented Jul 3, 2020

We don't really have a schedule. Perhaps in a month or so, once we've accumulated some changes to release. Why do you ask @pquentin ?

@pquentin
Copy link

pquentin commented Jul 3, 2020

Thanks! I ask, because since it's in a month, that means I'll upload that master version to our private devpi server. Otherwise, I would just have skipped our failing test for a few days. As an open source maintainer myself, I don't want to put more pressure on you. :)

@shields-fn
Copy link

Having this in a release would help us as well. It's been well over a month, so maybe this is a good time to put out a 2.1.1?

@mpenkov
Copy link
Collaborator

mpenkov commented Aug 27, 2020

OK, https://pypi.org/project/smart-open/2.1.1/

Thank you for your patience.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants