
Istio and KServe

Overview

In this repository, we deploy the SDXL (Stable Diffusion XL) model on Kubernetes using KServe. The application is exposed via a load balancer, and the resulting deployment can be monitored and visualized with tools such as Kiali, Prometheus, and Grafana.

Create SDXL MAR

We create the model archive (MAR) file to be served by TorchServe/KServe.

python download_model.py 
bash zip_model.sh
docker pull pytorch/torchserve:0.8.1-gpu
docker run -it --rm --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 --gpus all -v `pwd`:/opt/src pytorch/torchserve:0.8.1-gpu bash
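
The download_model.py script above is not reproduced in this README; a minimal sketch of what it plausibly contains, assuming the SDXL base weights are pulled from the Hugging Face Hub via diffusers (the model ID, dtype, and output directory here are assumptions):

# download_model.py (sketch) -- assumes diffusers and torch are installed.
import torch
from diffusers import StableDiffusionXLPipeline

# Download the fp16 variant of the SDXL base pipeline from the Hugging Face Hub.
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True,
)

# Save all pipeline components locally; zip_model.sh is then assumed to zip
# this directory into sdxl-1.0-model.zip for torch-model-archiver.
pipe.save_pretrained("sdxl-1.0-model")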

Inside the container, run the following commands:

cd /opt/src
torch-model-archiver --model-name sdxl --version 1.0 --handler sdxl_handler.py --extra-files sdxl-1.0-model.zip -r requirements.txt
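
sdxl_handler.py, passed via --handler above, is a custom TorchServe handler. A rough skeleton of the shape such a handler typically takes (the load and generate logic below is an illustrative assumption, not the repository's exact code):

# sdxl_handler.py (sketch) -- a typical custom TorchServe handler layout.
import io
import zipfile

import torch
from diffusers import StableDiffusionXLPipeline
from ts.torch_handler.base_handler import BaseHandler


class SDXLHandler(BaseHandler):
    def initialize(self, context):
        # model_dir holds the extracted MAR contents, including the weights
        # zip shipped via --extra-files.
        model_dir = context.system_properties.get("model_dir")
        with zipfile.ZipFile(f"{model_dir}/sdxl-1.0-model.zip") as zf:
            zf.extractall(model_dir)
        self.pipe = StableDiffusionXLPipeline.from_pretrained(
            model_dir, torch_dtype=torch.float16
        ).to("cuda")
        self.initialized = True

    def preprocess(self, requests):
        # Each request is assumed to carry the text prompt in "data" or "body".
        prompts = []
        for req in requests:
            data = req.get("data") or req.get("body")
            if isinstance(data, (bytes, bytearray)):
                data = data.decode("utf-8")
            prompts.append(str(data))
        return prompts

    def inference(self, prompts):
        return [self.pipe(prompt=p).images[0] for p in prompts]

    def postprocess(self, images):
        # Return raw PNG bytes, one entry per request.
        outputs = []
        for img in images:
            buf = io.BytesIO()
            img.save(buf, format="PNG")
            outputs.append(buf.getvalue())
        return outputs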

Test Locally

We will test the MAR file locally using TorchServe.

docker build -t emlo:s19 . --no-cache 
docker compose up

# in a separate terminal
curl http://localhost:8081/models
python test.py
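
test.py is not shown here; a minimal sketch, assuming it posts a prompt to TorchServe's predictions endpoint on port 8080 and writes the returned image to disk (the prompt and output filename are assumptions):

# test.py (sketch) -- TorchServe exposes registered models on port 8080
# at /predictions/<model-name>.
import requests

resp = requests.post(
    "http://localhost:8080/predictions/sdxl",
    data="a cat programming on a beach",
    timeout=600,
)
resp.raise_for_status()

# The handler is assumed to return raw PNG bytes.
with open("output.png", "wb") as f:
    f.write(resp.content)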

Upload to S3

aws s3 cp config.properties s3://emlo-s19-pt/config/
aws s3 cp sdxl.mar s3://emlo-s19-pt/model-store/

Setup

First, we deploy the CloudFormation stack that provisions the infrastructure Karpenter needs:

export REGION=...
export ACCOUNT_ID=...
export CLUSTER_NAME=emlo-s19-cluster
export KARPENTER_VERSION=...
export TEMPOUT=$(mktemp)
curl -fsSL https://raw.githubusercontent.com/aws/karpenter/"${KARPENTER_VERSION}"/website/content/en/preview/getting-started/getting-started-with-karpenter/cloudformation.yaml > "${TEMPOUT}" \
&& aws cloudformation deploy \
  --stack-name "Karpenter-${CLUSTER_NAME}" \
  --template-file "${TEMPOUT}" \
  --capabilities CAPABILITY_NAMED_IAM \
  --parameter-overrides "ClusterName=${CLUSTER_NAME}"

Create Cluster

Note: For this deployment, a VPC (with public subnets only) is created manually and then used for the cluster.

envsubst < 00_cluster.yaml | eksctl create cluster -f -

We can see that the desired nodes are up and running via:

kubectl get nodes -L node.kubernetes.io/instance-type

For the remaining setup, we

  • install components such as metrics-server, the NVIDIA Data Center GPU Manager (DCGM), Prometheus, Grafana, and Kiali
  • create service accounts and policies for read-only access to S3
  • install Istio and KServe

All of these steps are detailed in the setup guide.

Deploy SDXL

kubectl apply -f 06_sdxl.yaml

Make sure the pod is up and running.

kubectl get pods,isvc
kubectl logs torchserve-predictor-****

Running Inference

export INGRESS_HOST=$(kubectl -n istio-ingress get service istio-ingress -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
export INGRESS_PORT=$(kubectl -n istio-ingress get service istio-ingress -o jsonpath='{.spec.ports[?(@.name=="http2")].port}')

export MODEL_NAME=sdxl
export SERVICE_HOSTNAME=$(kubectl get inferenceservice torchserve -o jsonpath='{.status.url}' | cut -d "/" -f 3)
python 08_test_kserve.py
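
08_test_kserve.py consumes the environment variables set above; a minimal sketch, assuming the KServe v1 predict route through the Istio ingress and a base64-encoded image in the response (the exact payload shape depends on the handler):

# 08_test_kserve.py (sketch) -- routes through the Istio ingress gateway,
# setting the Host header so Istio can match the InferenceService's host.
import base64
import json
import os

import requests

host = os.environ["INGRESS_HOST"]
port = os.environ["INGRESS_PORT"]
model = os.environ["MODEL_NAME"]
service_host = os.environ["SERVICE_HOSTNAME"]

resp = requests.post(
    f"http://{host}:{port}/v1/models/{model}:predict",
    headers={"Host": service_host, "Content-Type": "application/json"},
    data=json.dumps({"instances": [{"data": "a cat programming on a beach"}]}),
    timeout=600,
)
resp.raise_for_status()

# Assumes the image comes back base64-encoded in the v1 response envelope.
image_bytes = base64.b64decode(resp.json()["predictions"][0])
with open("kserve_output.png", "wb") as f:
    f.write(image_bytes)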

Generated Images

Prompt: a cat programming on a beach

Prompt: a cat playing football on a beach

Prompt: a cat looking at its reflection in water

Prompt: a cat talking on a mobile phone

Prompt: a cat staring at a chicken

Kiali Graph for the Predictor

GPU Usage

Prometheus

Grafana

Kiali Logs

Inference Demo

To watch the logs in real time during inference, k9s is used.

Logs

The full state of the cluster can be dumped via:

kubectl get all -A -o yaml

TorchServe Predictor
