Multi gpu support #127

Open. Wants to merge 12 commits into base: master.
Conversation

@waytrue17 commented Aug 4, 2022

Description of changes:
Enabling multi-GPU support. The change passes context information to handler functions so that the model and data can be assigned to specific GPU devices.

  • To enable this feature, customers need to add the context argument to their custom handler declarations, e.g. changing input_fn(input_data, content_type) to input_fn(input_data, content_type, context). See the sketch after the process table below.
  • For backward compatibility, this implementation does not break existing use cases where no context is passed: input_fn(input_data, content_type) still works.
  • Tested the implementation on an 8-GPU instance with SAGEMAKER_MODEL_SERVER_WORKERS=9. The workers were assigned to different GPUs as expected:
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A    396466      C   /opt/conda/bin/python3.8         1729MiB |
|    1   N/A  N/A    396465      C   /opt/conda/bin/python3.8         1729MiB |
|    1   N/A  N/A    396467      C   /opt/conda/bin/python3.8         1729MiB |
|    2   N/A  N/A    396470      C   /opt/conda/bin/python3.8         1729MiB |
|    3   N/A  N/A    396462      C   /opt/conda/bin/python3.8         1729MiB |
|    4   N/A  N/A    396469      C   /opt/conda/bin/python3.8         1729MiB |
|    5   N/A  N/A    396463      C   /opt/conda/bin/python3.8         1729MiB |
|    6   N/A  N/A    396468      C   /opt/conda/bin/python3.8         1729MiB |
|    7   N/A  N/A    396464      C   /opt/conda/bin/python3.8         1729MiB |
+-----------------------------------------------------------------------------+
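A minimal sketch of what a context-aware handler could look like. The context attribute names below (system_properties, "gpu_id") are assumptions based on the MMS-style context object, not code taken from this PR, and model.pt is a placeholder artifact name:

```python
import os
import torch

def model_fn(model_dir, context=None):
    # Hypothetical handler: each model server worker is handed a context whose
    # system_properties carry the GPU ordinal assigned to that worker
    # (assumed MMS-style API).
    if context is not None and torch.cuda.is_available():
        gpu_id = context.system_properties.get("gpu_id")
        device = torch.device(f"cuda:{gpu_id}")
    else:
        device = torch.device("cpu")
    model = torch.jit.load(os.path.join(model_dir, "model.pt"), map_location=device)
    return model.to(device)

def input_fn(input_data, content_type, context=None):
    # The trailing, optional context parameter is the new third argument.
    ...
```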

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@sagemaker-bot (Collaborator)

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-pytorch-inference-toolkit-pr
  • Commit ID: 47d6f21
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot (Collaborator)

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-pytorch-inference-toolkit-pr
  • Commit ID: 2a0dbda
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot (Collaborator)

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-pytorch-inference-toolkit-pr
  • Commit ID: d869551
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot (Collaborator)

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-pytorch-inference-toolkit-pr
  • Commit ID: c774dd0
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot (Collaborator)

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-pytorch-inference-toolkit-pr
  • Commit ID: b5c5037
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot (Collaborator)

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-pytorch-inference-toolkit-pr
  • Commit ID: 1ab8249
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot (Collaborator)

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-pytorch-inference-toolkit-pr
  • Commit ID: 7fad13e
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot (Collaborator)

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-pytorch-inference-toolkit-pr
  • Commit ID: 2d6ce94
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot (Collaborator)

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-pytorch-inference-toolkit-pr
  • Commit ID: 0d293f4
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@ashishgupta023

Have you tested the below inference use cases with the DLC container?

  1. customer provides an inference script with context (new way)
  2. customer provides an inference script without the context (old way)

Could you please attach the test details to the description?

@ashishgupta023 commented Aug 10, 2022

I think this change will also be required for the MXNet DLC containers with MMS. Instead of adding a new transformer and adapting the handler service in the PyTorch toolkit, could we consider adapting the transformer and handler service in the inference toolkit to work with the context, so that the change applies to both? It would also be less error prone in the future, since the change would live in a single place.
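
For reference, one backward-compatible way a shared transformer could dispatch to both handler signatures is to inspect the user function before calling it. This is an illustrative sketch of the technique, not the PR's actual implementation; call_handler is a hypothetical helper name:

```python
import inspect

def call_handler(fn, *args, context=None):
    # Pass the context only when the user-supplied handler declares the
    # extra parameter; legacy handlers keep working unchanged.
    if len(inspect.signature(fn).parameters) == len(args) + 1:
        return fn(*args, context)
    return fn(*args)

# Usage:
#   call_handler(input_fn, data, content_type, context=ctx)
# calls input_fn(data, content_type, context) when the handler accepts three
# arguments, and input_fn(data, content_type) otherwise.
```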

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-pytorch-inference-toolkit-pr
  • Commit ID: ef19cb7
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@waytrue17 (Author)

> I think this change will also be required for the MXNet DLC containers with MMS. Instead of adding a new transformer and adapting the handler service in the PyTorch toolkit, could we consider adapting the transformer and handler service in the inference toolkit to work with the context, so that the change applies to both? It would also be less error prone in the future, since the change would live in a single place.

Makes sense. I will split the code and re-run some tests, and will post the test results afterward.

