Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integrated Google Cloud Storage #1017

Merged
merged 7 commits into from
Sep 6, 2020

Conversation

Korusuke
Copy link
Contributor

Co-authored-by: korusuke karan.sheth@somaiya.edu

Description

Adds Google Cloud Storage (GCS) support for storing BentoML service, this is an alternative to S3/MiniIO.

This internally changes s3_presigned_url to cloud_presigned_url, this will allow for better readability as more cloud storage can be supported in future. No existing S3 functionality is affected/changed due to this PR.

Motivation and Context

#661

How Has This Been Tested?

Manually tested
End-to-end tests are added, but they were not tested cause of storage issues😅

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature and improvements (non-breaking change which adds/improves functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Code Refactoring (internal change which is not user facing)
  • Documentation
  • Test, CI, or build

Component(s) if applicable

  • BentoService (service definition, dependency management, API input/output adapters)
  • Model Artifact (model serialization, multi-framework support)
  • Model Server (mico-batching, dockerisation, logging, OpenAPI, instruments)
  • YataiService gRPC server (model registry, cloud deployment automation)
  • YataiService web server (nodejs HTTP server and web UI)
  • Internal (BentoML's own configuration, logging, utility, exception handling)
  • BentoML CLI

Checklist:

  • My code follows the bentoml code style, both ./dev/format.sh and
    ./dev/lint.sh script have passed
    (instructions).
  • My change reduces project test coverage and requires unit tests to be added
  • I have added unit tests covering my code change
  • My change requires a change to the documentation
  • I have updated the documentation accordingly

Co-authored-by: korusuke <karan.sheth@somaiya.edu>
@Korusuke Korusuke marked this pull request as ready for review August 24, 2020 04:07
@codecov
Copy link

codecov bot commented Aug 24, 2020

Codecov Report

Merging #1017 into master will decrease coverage by 0.66%.
The diff coverage is 33.69%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1017      +/-   ##
==========================================
- Coverage   63.58%   62.92%   -0.67%     
==========================================
  Files         125      130       +5     
  Lines        8262     8123     -139     
==========================================
- Hits         5253     5111     -142     
- Misses       3009     3012       +3     
Impacted Files Coverage Δ
bentoml/cli/bento_management.py 27.11% <0.00%> (-3.92%) ⬇️
bentoml/yatai/client/bento_repository_api.py 69.56% <0.00%> (-3.17%) ⬇️
bentoml/yatai/proto/repository_pb2.py 100.00% <ø> (ø)
bentoml/saved_bundle/loader.py 52.67% <20.00%> (-6.21%) ⬇️
bentoml/yatai/repository/gcs_repository.py 28.84% <28.84%> (ø)
bentoml/yatai/yatai_service_impl.py 44.54% <50.00%> (+0.05%) ⬆️
bentoml/cli/bento_service.py 69.81% <75.00%> (-1.62%) ⬇️
bentoml/utils/gcs.py 75.00% <75.00%> (ø)
bentoml/yatai/repository/repository.py 81.81% <75.00%> (-1.52%) ⬇️
bentoml/adapters/annotated_image_input.py 84.90% <0.00%> (-15.10%) ⬇️
... and 45 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7253155...606f22b. Read the comment docs.

@parano parano requested a review from yubozhao August 25, 2020 02:09
@yubozhao
Copy link
Contributor

Hi, @Korusuke I talked with @PrabhanshuAttri Let's add unit tests and e2e test for Google cloud storage. This is a very critical piece of Yatai service and a lot of people are depending on this. Want to be more deliberate with this.

@@ -120,7 +121,7 @@ def resolve_bundle_path(bento, pip_installed_bundle_path):
), "pip installed BentoService commands should not have Bento argument"
return pip_installed_bundle_path

if os.path.isdir(bento) or is_s3_url(bento):
if os.path.isdir(bento) or is_s3_url(bento) or is_gcs_url(bento):
# saved_bundle already support loading local and s3 path
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update the comment to reflect this change

Comment on lines +72 to +76
if response.status_code != 200:
raise BentoMLException(
f"Error retrieving BentoService bundle. "
f"{response.status_code}: {response.text}"
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍


def is_gcs_url(url):
"""
Check if url is an gs url
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check if the URL is a GCS URL.

Check if url is an gs url
"""
try:
return urlparse(url).scheme in ["gs"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is gs:// the standard way for Google cloud URI? Could we document this, if you have a reference?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yubozhao Should we put the documentation link as a comment?

Here is the documentation for the same https://cloud.google.com/storage/docs/gsutil

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok great. Would you mind include that link in the form of comments for this funciton?

Comment on lines 149 to 180
elif response.uri.type == BentoUri.GCS:
self._update_bento_upload_progress(
bento_service_metadata, UploadStatus.UPLOADING, 0
)

fileobj = io.BytesIO()
with tarfile.open(mode="w:gz", fileobj=fileobj) as tar:
tar.add(saved_bento_path, arcname=bento_service_metadata.name)
fileobj.seek(0, 0)

http_response = requests.put(response.uri.cloud_presigned_url, data=fileobj)

if http_response.status_code != 200:
self._update_bento_upload_progress(
bento_service_metadata, UploadStatus.ERROR
)
raise BentoMLException(
f"Error saving BentoService bundle to GCS. "
f"{http_response.status_code}: {http_response.text}"
)

self._update_bento_upload_progress(bento_service_metadata)

logger.info(
"Successfully saved BentoService bundle '%s:%s' to GCS: %s",
bento_service_metadata.name,
bento_service_metadata.version,
response.uri.uri,
)

return response.uri.uri

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like using the same code path as the S3 one. Let's refactor this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yubozhao Would you like us to merge this if condition with S3 if condition?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. The only difference between those two statements is raising exceptions. We should be refactoring them

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. I have merged them, will push the changes in a while.

Comment on lines 398 to 405
if bento_pb.uri.type == BentoUri.S3:
bento_pb.uri.s3_presigned_url = self.repo.get(
bento_pb.uri.cloud_presigned_url = self.repo.get(
bento_pb.name, bento_pb.version
)
if bento_pb.uri.type == BentoUri.GCS:
bento_pb.uri.cloud_presigned_url = self.repo.get(
bento_pb.name, bento_pb.version
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's refactor these two.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yubozhao As far as I can understand, we should merge these 2 if conditions. Please confirm.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes we should refactor these two if statement.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it.

@pep8speaks
Copy link

pep8speaks commented Aug 28, 2020

Hello @Korusuke, Thanks for updating this PR.

There are currently no PEP 8 issues detected in this PR. Cheers! 🍻

Comment last updated at 2020-09-06 19:19:53 UTC

@Korusuke Korusuke requested a review from yubozhao August 28, 2020 08:44
yubozhao
yubozhao previously approved these changes Sep 5, 2020
@parano parano merged commit e064ce1 into bentoml:master Sep 6, 2020
aarnphm pushed a commit to aarnphm/BentoML that referenced this pull request Jul 29, 2022
* Integrated Google Cloud Storage

Co-authored-by: korusuke <karan.sheth@somaiya.edu>

* e2e tests

* Addressed PR review comments

* formatting

* update setup file

* remove aws-sam-cli from test requirements

* restore s3_prsigned_url and add gcs_presigned_url

Co-authored-by: PrabhanshuAttri <contact@prabhanshu.com>
Co-authored-by: yubozhao <yubz86@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants