Integrated Google Cloud Storage #1017
Co-authored-by: korusuke <karan.sheth@somaiya.edu>
Codecov Report
@@ Coverage Diff @@
## master #1017 +/- ##
==========================================
- Coverage 63.58% 62.92% -0.67%
==========================================
Files 125 130 +5
Lines 8262 8123 -139
==========================================
- Hits 5253 5111 -142
- Misses 3009 3012 +3
Hi @Korusuke, I talked with @PrabhanshuAttri. Let's add unit tests and an e2e test for Google Cloud Storage. This is a very critical piece of the Yatai service and a lot of people depend on it, so I want to be more deliberate with this.
bentoml/cli/bento_service.py
@@ -120,7 +121,7 @@ def resolve_bundle_path(bento, pip_installed_bundle_path):
         ), "pip installed BentoService commands should not have Bento argument"
         return pip_installed_bundle_path

-    if os.path.isdir(bento) or is_s3_url(bento):
+    if os.path.isdir(bento) or is_s3_url(bento) or is_gcs_url(bento):
         # saved_bundle already support loading local and s3 path
Update the comment to reflect this change
if response.status_code != 200:
    raise BentoMLException(
        f"Error retrieving BentoService bundle. "
        f"{response.status_code}: {response.text}"
    )
👍
bentoml/utils/gcs.py
def is_gcs_url(url):
    """
    Check if url is an gs url
Check if the URL is a GCS URL.
    Check if url is an gs url
    """
    try:
        return urlparse(url).scheme in ["gs"]
Is gs:// the standard scheme for Google Cloud Storage URIs? Could we document this, if you have a reference?
@yubozhao Should we put the documentation link in as a comment?
Here is the relevant documentation: https://cloud.google.com/storage/docs/gsutil
OK, great. Would you mind including that link as a comment on this function?
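One way to act on this request is to put the reference link directly in the helper's docstring. The following is a minimal sketch, assuming the helper keeps the `is_gcs_url` name and the `urlparse`-based check shown in the diff above; the docstring wording and the exception types caught are assumptions, not the merged code.

```python
from urllib.parse import urlparse


def is_gcs_url(url):
    """
    Check if the URL is a GCS URL, i.e. it uses the gs:// scheme.

    Reference for gs:// URLs (the link given in this review thread):
    https://cloud.google.com/storage/docs/gsutil
    """
    try:
        return urlparse(url).scheme == "gs"
    except (AttributeError, TypeError, ValueError):
        # Assumption: input that cannot be parsed is treated as "not GCS".
        return False
```

With this check, `gs://my-bucket/bento` is accepted while `s3://...` URLs and local paths are rejected, which matches how `resolve_bundle_path` uses it alongside `is_s3_url`.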
elif response.uri.type == BentoUri.GCS:
    self._update_bento_upload_progress(
        bento_service_metadata, UploadStatus.UPLOADING, 0
    )

    fileobj = io.BytesIO()
    with tarfile.open(mode="w:gz", fileobj=fileobj) as tar:
        tar.add(saved_bento_path, arcname=bento_service_metadata.name)
    fileobj.seek(0, 0)

    http_response = requests.put(response.uri.cloud_presigned_url, data=fileobj)

    if http_response.status_code != 200:
        self._update_bento_upload_progress(
            bento_service_metadata, UploadStatus.ERROR
        )
        raise BentoMLException(
            f"Error saving BentoService bundle to GCS. "
            f"{http_response.status_code}: {http_response.text}"
        )

    self._update_bento_upload_progress(bento_service_metadata)

    logger.info(
        "Successfully saved BentoService bundle '%s:%s' to GCS: %s",
        bento_service_metadata.name,
        bento_service_metadata.version,
        response.uri.uri,
    )

    return response.uri.uri
This seems to use the same code path as the S3 one. Let's refactor this.
@yubozhao Would you like us to merge this if condition with the S3 if condition?
Yes. The only difference between those two branches is the exception they raise. We should refactor them.
Makes sense. I have merged them, will push the changes in a while.
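For readers following the thread, the refactor discussed here could look roughly like the sketch below: the tarball creation and the presigned-URL upload are pulled into helpers that take the storage name as a parameter, so the S3 and GCS branches differ only in the error message. The names `make_bundle_tarball`, `upload_bundle`, and `BundleUploadError` are hypothetical, and `http_put` is an injected callable with the shape of `requests.put(url, data=...)`; none of this is taken from the merged code.

```python
import io
import tarfile


class BundleUploadError(Exception):
    """Hypothetical stand-in for BentoMLException, to keep this self-contained."""


def make_bundle_tarball(saved_bento_path, arcname):
    """Pack the saved bundle directory into an in-memory .tar.gz."""
    fileobj = io.BytesIO()
    with tarfile.open(mode="w:gz", fileobj=fileobj) as tar:
        tar.add(saved_bento_path, arcname=arcname)
    fileobj.seek(0, 0)  # rewind so the upload reads from the start
    return fileobj


def upload_bundle(presigned_url, fileobj, storage_name, http_put):
    """
    PUT the tarball to a presigned URL; usable for both S3 and GCS.

    http_put must return an object exposing .status_code and .text.
    """
    http_response = http_put(presigned_url, data=fileobj)
    if http_response.status_code != 200:
        # Only the storage name varies between the two original branches.
        raise BundleUploadError(
            f"Error saving BentoService bundle to {storage_name}. "
            f"{http_response.status_code}: {http_response.text}"
        )
    return http_response
```

In the client, each branch would then call `upload_bundle(..., "S3", requests.put)` or `upload_bundle(..., "GCS", requests.put)`, and keep the `_update_bento_upload_progress` calls around it as in the diff.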
bentoml/yatai/yatai_service_impl.py
 if bento_pb.uri.type == BentoUri.S3:
-    bento_pb.uri.s3_presigned_url = self.repo.get(
+    bento_pb.uri.cloud_presigned_url = self.repo.get(
         bento_pb.name, bento_pb.version
     )
+if bento_pb.uri.type == BentoUri.GCS:
+    bento_pb.uri.cloud_presigned_url = self.repo.get(
+        bento_pb.name, bento_pb.version
+    )
Let's refactor these two.
@yubozhao As far as I can understand, we should merge these 2 if conditions. Please confirm.
Yes, we should refactor these two if statements.
Got it.
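As an illustration of the merge agreed on in this thread, the two identical branches can collapse into a single membership test. This is a sketch only: `BentoUriType` stands in for the `BentoUri.*` protobuf enum values, `attach_presigned_url` and `PRESIGNED_URL_TYPES` are hypothetical names, and the field name `cloud_presigned_url` follows the diff under review (the commit list further down indicates the field naming was revisited later).

```python
from enum import Enum
from types import SimpleNamespace


class BentoUriType(Enum):
    """Hypothetical stand-in for the BentoUri.* protobuf enum values."""
    LOCAL = 1
    S3 = 2
    GCS = 3


# Storage types whose bundles are accessed through a presigned URL.
PRESIGNED_URL_TYPES = (BentoUriType.S3, BentoUriType.GCS)


def attach_presigned_url(bento_pb, repo):
    """Set cloud_presigned_url once for every cloud-backed URI type."""
    if bento_pb.uri.type in PRESIGNED_URL_TYPES:
        bento_pb.uri.cloud_presigned_url = repo.get(
            bento_pb.name, bento_pb.version
        )
    return bento_pb
```

Adding another cloud backend later would then mean extending `PRESIGNED_URL_TYPES` rather than copying the branch again.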
Hello @Korusuke, thanks for updating this PR. There are currently no PEP 8 issues detected in this PR. Cheers! 🍻 Comment last updated at 2020-09-06 19:19:53 UTC
* Integrated Google Cloud Storage
  Co-authored-by: korusuke <karan.sheth@somaiya.edu>
* e2e tests
* Addressed PR review comments
* formatting
* update setup file
* remove aws-sam-cli from test requirements
* restore s3_prsigned_url and add gcs_presigned_url
Co-authored-by: PrabhanshuAttri <contact@prabhanshu.com>
Co-authored-by: yubozhao <yubz86@gmail.com>
Co-authored-by: korusuke karan.sheth@somaiya.edu
Description
Adds Google Cloud Storage (GCS) support for storing BentoService bundles, as an alternative to S3/MinIO.
This internally renames s3_presigned_url to cloud_presigned_url, which improves readability as more cloud storage backends can be supported in the future. No existing S3 functionality is affected or changed by this PR.
Motivation and Context
#661
How Has This Been Tested?
Manually tested
End-to-end tests were added, but they have not been run because of storage issues 😅
Types of changes
Component(s) if applicable
Checklist:
./dev/format.sh and ./dev/lint.sh scripts have passed (instructions).