This tutorial shows how to use TensorFlow Serving components running in Docker containers to serve the TensorFlow ResNet model and how to deploy the serving cluster with Kubernetes.
To learn more about TensorFlow Serving, we recommend the TensorFlow Serving basic tutorial and the TensorFlow Serving advanced tutorial.
To learn more about the TensorFlow ResNet model, we recommend reading ResNet in TensorFlow.
- Part 1 gets your environment set up.
- Part 2 shows how to run the local Docker serving image.
- Part 3 shows how to deploy in Kubernetes.
Before getting started, first install Docker.
Let's clear our local models directory in case we already have one:
rm -rf /tmp/resnet
Deep residual networks, or ResNets for short, provided the breakthrough idea of identity mappings in order to enable training of very deep convolutional neural networks. For our example, we will download a TensorFlow SavedModel of ResNet for the ImageNet dataset.
mkdir /tmp/resnet
curl -s | \
tar --strip-components=2 -C /tmp/resnet -xvz
We can verify we have the SavedModel:
$ ls /tmp/resnet/*
saved_model.pb variables
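The same check can be scripted. The following is a minimal sketch, assuming the archive unpacks into one subdirectory per model version under /tmp/resnet, each holding a saved_model.pb file and a variables folder, as the listing above suggests. The helper names are hypothetical and are not part of TensorFlow Serving.

```python
import os


def looks_like_saved_model(version_dir):
    """Return True if version_dir holds saved_model.pb and a variables/ folder."""
    has_graph = os.path.isfile(os.path.join(version_dir, "saved_model.pb"))
    has_vars = os.path.isdir(os.path.join(version_dir, "variables"))
    return has_graph and has_vars


def find_saved_models(model_root):
    """List the subdirectories of model_root that look like SavedModel versions."""
    if not os.path.isdir(model_root):
        return []
    return sorted(
        name
        for name in os.listdir(model_root)
        if looks_like_saved_model(os.path.join(model_root, name))
    )
```

Calling find_saved_models("/tmp/resnet") should return a non-empty list once the download has succeeded. An empty list means the archive was not unpacked where expected.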
Now we want to take a serving image and commit all changes to a new image $USER/resnet_serving for Kubernetes deployment.
First we run a serving image as a daemon:
docker run -d --name serving_base tensorflow/serving
Next, we copy the ResNet model data to the container's model folder:
docker cp /tmp/resnet serving_base:/models/resnet
Finally, we commit the container so that it serves the ResNet model:
docker commit --change "ENV MODEL_NAME resnet" serving_base \
  $USER/resnet_serving
Now let's stop and remove the serving base container:
docker kill serving_base
docker rm serving_base
Now let's start the container with the ResNet model so it's ready for serving, exposing the gRPC port 8500:
docker run -p 8500:8500 -t $USER/resnet_serving &
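Because the container starts in the background, it can help to wait until the port accepts connections before running a client. The sketch below is a rough readiness check using only the Python standard library. The function name and its defaults are assumptions made for illustration. A successful TCP connection shows only that something is listening on the port, not that the model has finished loading.

```python
import socket
import time


def wait_for_port(host, port, timeout_s=30.0, interval_s=0.5):
    """Poll host:port until a TCP connection succeeds or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            # create_connection raises OSError if nothing is listening yet.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            time.sleep(interval_s)
    return False
```

For the container started above, wait_for_port("localhost", 8500) would be the natural call.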
For the client, we will need to clone the TensorFlow Serving GitHub repo:
git clone
cd serving
Query the server with the example gRPC client. The client downloads an image and sends it over gRPC for classification into ImageNet categories.
tools/ python tensorflow_serving/example/
This should result in output like:
outputs {
  key: "classes"
  value {
    dtype: DT_INT64
    tensor_shape {
      dim {
        size: 1
      }
    }
    int64_val: 286
  }
}
outputs {
  key: "probabilities"
  value {
    dtype: DT_FLOAT
    tensor_shape {
      dim {
        size: 1
      }
      dim {
        size: 1001
      }
    }
    float_val: 2.41628322328e-06
    float_val: 1.90121829746e-06
    float_val: 2.72477100225e-05
    float_val: 4.42638565801e-07
    float_val: 8.98362372936e-07
    float_val: 6.84421956976e-06
    float_val: 1.66555237229e-05
    float_val: 1.59407863976e-06
    float_val: 1.2315689446e-06
    float_val: 1.17812135159e-06
    float_val: 1.46365800902e-05
    float_val: 5.81210713335e-07
    float_val: 6.59980651108e-05
    float_val: 0.00129527016543
  }
}
model_spec {
  name: "resnet"
  version {
    value: 1538687457
  }
  signature_name: "serving_default"
}
It works! The server successfully classifies a cat image!
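To read more than the single predicted class, the probabilities output can be ranked. The following sketch works on a plain Python list. It assumes the 1001 probabilities have been copied out of the response (for example from the float_val field of the "probabilities" tensor) and that the "classes" output corresponds to the index of the largest probability. The top_k helper is hypothetical, and the numbers in the usage note are toy values, not model output.

```python
def top_k(probabilities, k=5):
    """Return (class_index, probability) pairs for the k largest probabilities."""
    ranked = sorted(enumerate(probabilities), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]
```

For a toy input, top_k([0.1, 0.7, 0.2], k=2) returns [(1, 0.7), (2, 0.2)]. Mapping an index such as 286 to a human-readable ImageNet label requires a separate label file, which this tutorial does not cover.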
In this section we use the container image built in Part 2 to deploy a serving cluster with Kubernetes on Google Cloud Platform.
Here we assume you have created and logged in to a gcloud project named tensorflow-serving:
gcloud auth login --project tensorflow-serving
First we create a Google Kubernetes Engine cluster for service deployment.
$ gcloud container clusters create resnet-serving-cluster --num-nodes 5
This should output something like:
Creating cluster resnet-serving-cluster...done.
Created [].
kubeconfig entry generated for resnet-serving-cluster.
resnet-serving-cluster us-central1-f 1.1.8 n1-standard-1 1.1.8 5 RUNNING
Set the default cluster for the gcloud container command and pass the cluster credentials to kubectl:
gcloud config set container/cluster resnet-serving-cluster
gcloud container clusters get-credentials resnet-serving-cluster
This should result in:
Fetching cluster endpoint and auth data.
kubeconfig entry generated for resnet-serving-cluster.
Let's now push our image to the Google Container Registry so that we can run it on Google Cloud Platform.
First we tag the $USER/resnet_serving image using the Container Registry format and our project name:
docker tag $USER/resnet_serving
Next we push the image to the Registry:
gcloud docker -- push
The deployment consists of 3 replicas of the resnet_inference server, controlled by a Kubernetes Deployment. The replicas are exposed externally by a Kubernetes Service backed by an external load balancer.
We create them using the example Kubernetes config resnet_k8s.yaml.
kubectl create -f tensorflow_serving/example/resnet_k8s.yaml
With output:
deployment "resnet-deployment" created
service "resnet-service" created
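To make the shape of such a config concrete, the sketch below builds an equivalent pair of objects as Python dictionaries that can be dumped to JSON, which kubectl also accepts. It is not the contents of resnet_k8s.yaml. The label, the container name, and the image argument are assumptions chosen for illustration. Only the object names, the replica count, the port, and the LoadBalancer type come from this tutorial.

```python
import json


def build_manifests(image, replicas=3, port=8500):
    """Build a Deployment and a LoadBalancer Service as one Kubernetes List."""
    labels = {"app": "resnet-server"}  # hypothetical label linking Service to pods
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "resnet-deployment"},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": "resnet-container",  # hypothetical name
                            "image": image,
                            "ports": [{"containerPort": port}],
                        }
                    ]
                },
            },
        },
    }
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": "resnet-service"},
        "spec": {
            "type": "LoadBalancer",
            "selector": labels,
            "ports": [{"port": port, "targetPort": port}],
        },
    }
    return {"apiVersion": "v1", "kind": "List", "items": [deployment, service]}


def to_json(manifests):
    """Serialize the manifests so they can be saved and passed to kubectl -f."""
    return json.dumps(manifests, indent=2)
```

The key design point is that the Service selector must match the pod template labels. If they differ, the Service has no endpoints and the load balancer has nothing to route to.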
To view the status of the deployment and pods:
$ kubectl get deployments
resnet-deployment 3 3 3 3 5s
$ kubectl get pods
resnet-deployment-bbcbc 1/1 Running 0 10s
resnet-deployment-cj6l2 1/1 Running 0 10s
resnet-deployment-t1uep 1/1 Running 0 10s
To view the status of the service:
$ kubectl get services
resnet-service 8500/TCP 1m
It can take a while for everything to be up and running.
$ kubectl describe service resnet-service
Name: resnet-service
Namespace: default
Labels: run=resnet-service
Selector: run=resnet-service
Type: LoadBalancer
LoadBalancer Ingress:
Port: <unset> 8500/TCP
NodePort: <unset> 30334/TCP
Endpoints: <none>
Session Affinity: None
FirstSeen LastSeen Count From SubobjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
1m 1m 1 {service-controller } Normal CreatingLoadBalancer Creating load balancer
1m 1m 1 {service-controller } Normal CreatedLoadBalancer Created load balancer
The service external IP address is listed next to LoadBalancer Ingress.
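When scripting the deployment, the address can also be read from the JSON form of the service, as printed by kubectl get service resnet-service -o json. The sketch below assumes that output has been captured as a string. The external_ip helper is hypothetical. It returns None while the load balancer is still being provisioned, which matches the empty LoadBalancer Ingress field shown above.

```python
import json


def external_ip(service_json):
    """Return the first load balancer IP in a Service JSON document, or None."""
    service = json.loads(service_json)
    # status.loadBalancer.ingress is absent or empty until provisioning finishes.
    ingress = service.get("status", {}).get("loadBalancer", {}).get("ingress") or []
    for entry in ingress:
        if entry.get("ip"):
            return entry["ip"]
    return None
```

A caller would typically poll this in a loop with a short sleep until it returns an address.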
We can now query the service at its external address from our local host.
$ tools/ python \
tensorflow_serving/example/ \
You have successfully deployed the ResNet model serving as a service in Kubernetes!