Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor Google Cloud Storage to use blob.open #744

Merged
merged 11 commits into from
Dec 9, 2022

Conversation

ddelange
Copy link
Contributor

@ddelange ddelange commented Dec 5, 2022

Title

Refactor Google Cloud Storage to use blob.open

Motivation

Supercedes and closes #729, fixes #599

Adds one commit on top, bumps minimum gcs version to avoid breaking changes as discussed:

diff --git a/setup.py b/setup.py
index 4b906a4..6b679e0 100644
--- a/setup.py
+++ b/setup.py
@@ -37,7 +37,7 @@ def read(fname):


 aws_deps = ['boto3']
-gcs_deps = ['google-cloud-storage>=1.37.0']
+gcs_deps = ['google-cloud-storage>=2.6.0']
 azure_deps = ['azure-storage-blob', 'azure-common', 'azure-core']
 http_deps = ['requests']

diff --git a/smart_open/gcs.py b/smart_open/gcs.py
index 260067e..5e28fb6 100644
--- a/smart_open/gcs.py
+++ b/smart_open/gcs.py
@@ -165,18 +165,11 @@ def Writer(bucket,
     )

     for k, v in blob_properties.items():
-        try:
-            setattr(g_blob, k, v)
-        except AttributeError:
-            logger.warn(f'Unable to set property {k} on blob')
+        setattr(g_blob, k, v)

     _blob = g_blob.open('wb', **blob_open_kwargs)

-    if hasattr(_blob, 'terminate'):
-        raise RuntimeWarning(
-            'Unexpected incompatibility between dependency and google-cloud-storage dependency.'
-            'Things may not work as expected'
-            )
+    # backwards-compatiblity, was deprecated upstream https://cloud.google.com/storage/docs/resumable-uploads
     _blob.terminate = lambda: None

     return _blob

Tests

Tests discussed in linked tickets

Work in progress

Checklist

Before you create the PR, please make sure you have:

  • Picked a concise, informative and complete title
  • Clearly explained the motivation behind the PR
  • Linked to any existing issues that your PR will be solving
  • Included tests for any new functionality
  • Checked that all unit tests pass

Workflow

Please avoid rebasing and force-pushing to the branch of the PR once a review is in progress.
Rebasing can make your commits look a bit cleaner, but it also makes life more difficult from the reviewer, because they are no longer able to distinguish between code that has already been reviewed, and unreviewed code.

cadnce and others added 8 commits October 1, 2022 12:43
Swap to using GCS native blob open under the hood.
Reduces code maintenence overhead.
Breaking changes:
* Removed gcs.Reader/gcs.Writer classes
* No Reader/Writer.terminate()
* The buffer size can no-longer be controlled independently of chunk_size
* calling close twice on a gcs file object will now throw an exception
Signed-off-by: ddelange <14880945+ddelange@users.noreply.github.com>
@ddelange ddelange mentioned this pull request Dec 5, 2022
Signed-off-by: ddelange <14880945+ddelange@users.noreply.github.com>
@ddelange
Copy link
Contributor Author

ddelange commented Dec 5, 2022

full diff vs #729, mainly fixed new lint warnings, removed no-op test, and amended docstring: 12e22b7...ddelange:use-gcs-open-fixes

@mpenkov mpenkov added this to the 6.3.0 milestone Dec 9, 2022
@mpenkov
Copy link
Collaborator

mpenkov commented Dec 9, 2022

Thank you @ddelange and @cadnce.

@petedannemann Can you please give this a once-over? If all looks good, we can merge.

@mpenkov mpenkov merged commit 50bbbea into piskvorky:develop Dec 9, 2022
@mpenkov
Copy link
Collaborator

mpenkov commented Dec 9, 2022

Great job everyone, thank you for efforts with this PR @cadnce @ddelange @petedannemann !

@petedannemann
Copy link
Contributor

Thanks everyone for their work, this PR should substantially reduce the amount of maintenance required for this smart-open plugin!

Comment on lines -508 to -513
def terminate(self):
"""Cancel the underlying resumable upload."""
#
# https://cloud.google.com/storage/docs/xml-api/resumable-upload#example_cancelling_an_upload
#
self._session.delete(self._resumable_upload_url)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cadnce fyi, starting with smart_open 6.3.0 (which includes this PR), multipart uploads will no longer be cancelled when an exception occurs. I've opened an upstream issue: googleapis/python-storage#1228

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update: the functionality will be back in their next major release (3.0)

@@ -37,7 +37,7 @@ def read(fname):


aws_deps = ['boto3']
gcs_deps = ['google-cloud-storage>=1.31.0']
gcs_deps = ['google-cloud-storage>=2.6.0']
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for future reference, this version bump relates to googleapis/python-storage#878 and the general maturing of the open interface in v2

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Refactor Google Cloud Storage to use blob.open
4 participants