This fork shows how to modify the SageMaker TensorFlow Serving Container to accept images using the reserved 'b64'
key when making requests to TensorFlow Serving's REST interface. Models also need to be modified to decode the images received as input. See container/sagemaker/tensorflow-serving.js, test/resources/models/create_cifar_image.py, and scripts/curl.sh for details.
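For context, TensorFlow Serving's REST API base64-decodes values sent under the reserved 'b64' key for string tensors, so a request body typically looks like {"instances": [{"b64": "<base64-encoded image bytes>"}]}. The model then has to turn those raw bytes into an image tensor. The following is only a minimal sketch of a TensorFlow 1.x serving input function that does this for PNG bytes; the tensor names, image size, and model details are illustrative, not the actual code in test/resources/models/create_cifar_image.py:

import tensorflow as tf

def serving_input_fn():
    # Raw image bytes; the "b64" JSON value is decoded to bytes by TensorFlow Serving.
    image_bytes = tf.placeholder(dtype=tf.string, shape=[None], name='image_bytes')

    def decode(png_bytes):
        # Assumes 32x32 RGB PNG input (CIFAR-sized); adjust for your model.
        image = tf.image.decode_png(png_bytes, channels=3)
        image = tf.image.convert_image_dtype(image, tf.float32)
        return tf.reshape(image, [32, 32, 3])

    images = tf.map_fn(decode, image_bytes, dtype=tf.float32)
    return tf.estimator.export.ServingInputReceiver(
        features={'images': images},
        receiver_tensors={'image_bytes': image_bytes})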
SageMaker TensorFlow Serving Container is an open source project that builds Docker images for running TensorFlow Serving on Amazon SageMaker.
This documentation covers building and testing these docker images.
For information about using TensorFlow Serving on SageMaker, see: Deploying to TensorFlow Serving Endpoints in the SageMaker Python SDK documentation.
For notebook examples, see: Amazon SageMaker Examples.
Make sure you have installed all of the following prerequisites on your development machine:
For testing, you will also need:
To test GPU images locally, you will also need:
Note: Some of the build and test scripts interact with resources in your AWS account. Be sure to set your default AWS credentials and region using aws configure before using these scripts.
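For example, you can set a default region non-interactively (the region shown is illustrative):

aws configure set region us-west-2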
Amazon SageMaker uses Docker containers to run all training jobs and inference endpoints.
The Docker images are built from the Dockerfiles in docker/.
The Dockerfiles are grouped based on the version of TensorFlow Serving they support. Each supported processor type (e.g. "cpu", "gpu") has a different Dockerfile in each group.
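As an illustration only (check the docker/ directory for the actual layout), the grouping amounts to one directory per TensorFlow Serving version, each containing a Dockerfile per processor type:

docker/
  1.11/
    Dockerfile.cpu
    Dockerfile.gpu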
To build an image, run the ./scripts/build.sh script:
./scripts/build.sh --version 1.11 --arch cpu
./scripts/build.sh --version 1.11 --arch gpu
If you are testing locally, building the image is enough. But if you want to use your updated image in SageMaker, you need to publish it to an ECR repository in your account. The ./scripts/publish.sh script makes that easy:
./scripts/publish.sh --version 1.11 --arch cpu
./scripts/publish.sh --version 1.11 --arch gpu
Note: this will publish to ECR in your default region. Use the --region argument to specify a different region.
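For example (the region shown is illustrative):

./scripts/publish.sh --version 1.11 --arch cpu --region us-west-2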
You can also run your container locally in Docker to test different models and input inference requests by hand. Standard docker run commands (or nvidia-docker run for GPU images) will work for this, or you can use the provided start.sh and stop.sh scripts:
./scripts/start.sh [--version x.xx] [--arch cpu|gpu|...]
./scripts/stop.sh [--version x.xx] [--arch cpu|gpu|...]
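For example, to start and later stop a local CPU container for the 1.11 image:

./scripts/start.sh --version 1.11 --arch cpu
./scripts/stop.sh --version 1.11 --arch cpu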
When the container is running, you can send test requests to it using any HTTP client. Here's an example using the curl command:
curl -X POST --data-binary @test/resources/inputs/test.json \
-H 'Content-Type: application/json' \
-H 'X-Amzn-SageMaker-Custom-Attributes: tfs-model-name=half_plus_three' \
http://localhost:8080/invocations
Additional curl examples can be found in ./scripts/curl.sh.
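Building on the fork's b64 support, here is one illustrative way to send a base64-encoded image; the image path and model name are hypothetical, and the exact request format expected by the modified container is shown in ./scripts/curl.sh:

# Illustrative only. On macOS, omit -w0; its base64 does not wrap lines.
printf '{"instances": [{"b64": "%s"}]}' "$(base64 -w0 image.png)" > image_request.json

curl -X POST --data-binary @image_request.json \
-H 'Content-Type: application/json' \
-H 'X-Amzn-SageMaker-Custom-Attributes: tfs-model-name=cifar' \
http://localhost:8080/invocations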
The package includes automated tests and code checks. The tests use Docker to run the container image locally, and do not access resources in AWS. You can run the tests and static code checkers using tox:
tox
Please read CONTRIBUTING.md for details on our code of conduct, and the process for submitting pull requests to us.
This library is licensed under the Apache 2.0 License.