Make the KServe wrapper configuration loading more resilient #2995
The many defaults defined currently cannot be used: if the matching keys are not present in the properties file, its loading fails. This patch fixes that and also ignores lines starting with #, as it should.
@sgaist Can you please run the KServe nightly test job on your branch and attach the logs?
LGTM. Thanks @sgaist
@agunapal sure thing:

Nightly run with image built from branch

MNIST KServe V2 test begin
Deploying the cluster
inferenceservice.serving.kserve.io/torchserve-mnist-v2 created
Waiting for pod to come up...
Check status of the pod
NAME                                                              READY   STATUS    RESTARTS   AGE
torchserve-mnist-v2-predictor-00001-deployment-86786467c8-jf6fl   1/2     Running   0          80s
Name:             torchserve-mnist-v2-predictor-00001-deployment-86786467c8-jf6fl
Namespace:        default
Priority:         0
Service Account:  default
Node:             minikube/192.168.49.2
Start Time:       Wed, 06 Mar 2024 11:02:26 +0100
Labels:           app=torchserve-mnist-v2-predictor-00001
                  component=predictor
                  pod-template-hash=86786467c8
                  service.istio.io/canonical-name=torchserve-mnist-v2-predictor
                  service.istio.io/canonical-revision=torchserve-mnist-v2-predictor-00001
                  serviceEnvelope=kservev2
                  serving.knative.dev/configuration=torchserve-mnist-v2-predictor
                  serving.knative.dev/configurationGeneration=1
                  serving.knative.dev/configurationUID=567b233d-e417-44d5-ae8c-3b058306e4a7
                  serving.knative.dev/revision=torchserve-mnist-v2-predictor-00001
                  serving.knative.dev/revisionUID=c9e5d855-3f6d-43e3-93e6-ddb3310aa32c
                  serving.knative.dev/service=torchserve-mnist-v2-predictor
                  serving.knative.dev/serviceUID=d80115fc-cf2e-445e-9655-a30cc4b0e29f
                  serving.kserve.io/inferenceservice=torchserve-mnist-v2
Annotations:      autoscaling.knative.dev/class: kpa.autoscaling.knative.dev
                  autoscaling.knative.dev/min-scale: 1
                  internal.serving.kserve.io/storage-initializer-sourceuri: gs://kfserving-examples/models/torchserve/image_classifier/v2
                  prometheus.kserve.io/path: /metrics
                  prometheus.kserve.io/port: 8082
                  serving.knative.dev/creator: system:serviceaccount:kserve:kserve-controller-manager
                  serving.kserve.io/enable-metric-aggregation: false
                  serving.kserve.io/enable-prometheus-scraping: false
Status:           Running
IP:               10.244.0.20
IPs:
  IP:           10.244.0.20
Controlled By:  ReplicaSet/torchserve-mnist-v2-predictor-00001-deployment-86786467c8
Init Containers:
  storage-initializer:
    Container ID:  docker://a863740a574128de498d670aed4301fa1a55ead66a1dc36cac968daab6eb7186
    Image:         kserve/storage-initializer:v0.11.0
    Image ID:      docker-pullable://kserve/storage-initializer@sha256:962682077dc30c21113822f50b63400d759ce6c49518d0c9caa638b6f77c7fed
    Port:
    Host Port:
    Args:
      gs://kfserving-examples/models/torchserve/image_classifier/v2
      /mnt/models
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Wed, 06 Mar 2024 11:02:32 +0100
      Finished:     Wed, 06 Mar 2024 11:03:12 +0100
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     1
      memory:  1Gi
    Requests:
      cpu:     100m
      memory:  100Mi
    Environment:
    Mounts:
      /mnt/models from kserve-provision-location (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-bqn8l (ro)
Containers:
  kserve-container:
    Container ID:  docker://9295f04563a315969891f45c593cf5db871783b79141874ea547a84e250f8bd1
    Image:         dev.local/pytorch/torchserve-kfs:latest-cpu
    Image ID:      docker://sha256:2a08d2a37e9e1d8c4195c640b6be69511a163924947060fc05ff810e92b3928f
    Port:          8080/TCP
    Host Port:     0/TCP
    Args:
      torchserve
      --start
      --model-store=/mnt/models/model-store
      --ts-config=/mnt/models/config/config.properties
    State:          Running
      Started:      Wed, 06 Mar 2024 11:03:20 +0100
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     1
      memory:  1Gi
    Requests:
      cpu:     100m
      memory:  256Mi
    Environment:
      TS_SERVICE_ENVELOPE:  kservev2
      PORT:                 8080
      K_REVISION:           torchserve-mnist-v2-predictor-00001
      K_CONFIGURATION:      torchserve-mnist-v2-predictor
      K_SERVICE:            torchserve-mnist-v2-predictor
    Mounts:
      /mnt/models from kserve-provision-location (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-bqn8l (ro)
  queue-proxy:
    Container ID:   docker://448fb992060c5844233620ab05ce88137aede96fd9c1b037d81a82c00237467c
    Image:          gcr.io/knative-releases/knative.dev/serving/cmd/queue@sha256:65c427aaab3be9cea1afea32cdef26d5855c69403077d2dc3439f75c26a1e83f
    Image ID:       docker-pullable://gcr.io/knative-releases/knative.dev/serving/cmd/queue@sha256:65c427aaab3be9cea1afea32cdef26d5855c69403077d2dc3439f75c26a1e83f
    Ports:          8022/TCP, 9090/TCP, 9091/TCP, 8012/TCP, 8112/TCP
    Host Ports:     0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/TCP
    State:          Running
      Started:      Wed, 06 Mar 2024 11:03:42 +0100
    Ready:          False
    Restart Count:  0
    Requests:
      cpu:      25m
    Readiness:  http-get http://:8012/ delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:
      SERVING_NAMESPACE:                        default
      SERVING_SERVICE:                          torchserve-mnist-v2-predictor
      SERVING_CONFIGURATION:                    torchserve-mnist-v2-predictor
      SERVING_REVISION:                         torchserve-mnist-v2-predictor-00001
      QUEUE_SERVING_PORT:                       8012
      QUEUE_SERVING_TLS_PORT:                   8112
      CONTAINER_CONCURRENCY:                    0
      REVISION_TIMEOUT_SECONDS:                 300
      REVISION_RESPONSE_START_TIMEOUT_SECONDS:  0
      REVISION_IDLE_TIMEOUT_SECONDS:            0
      SERVING_POD:                              torchserve-mnist-v2-predictor-00001-deployment-86786467c8-jf6fl (v1:metadata.name)
      SERVING_POD_IP:                           (v1:status.podIP)
      SERVING_LOGGING_CONFIG:
      SERVING_LOGGING_LEVEL:
      SERVING_REQUEST_LOG_TEMPLATE:             {"httpRequest": {"requestMethod": "{{.Request.Method}}", "requestUrl": "{{js .Request.RequestURI}}", "requestSize": "{{.Request.ContentLength}}", "status": {{.Response.Code}}, "responseSize": "{{.Response.Size}}", "userAgent": "{{js .Request.UserAgent}}", "remoteIp": "{{js .Request.RemoteAddr}}", "serverIp": "{{.Revision.PodIP}}", "referer": "{{js .Request.Referer}}", "latency": "{{.Response.Latency}}s", "protocol": "{{.Request.Proto}}"}, "traceId": "{{index .Request.Header "X-B3-Traceid"}}"}
      SERVING_ENABLE_REQUEST_LOG:               false
      SERVING_REQUEST_METRICS_BACKEND:          prometheus
      TRACING_CONFIG_BACKEND:                   none
      TRACING_CONFIG_ZIPKIN_ENDPOINT:
      TRACING_CONFIG_DEBUG:                     false
      TRACING_CONFIG_SAMPLE_RATE:               0.1
      USER_PORT:                                8080
      SYSTEM_NAMESPACE:                         knative-serving
      METRICS_DOMAIN:                           knative.dev/internal/serving
      SERVING_READINESS_PROBE:                  {"tcpSocket":{"port":8080,"host":"127.0.0.1"},"successThreshold":1}
      ENABLE_PROFILING:                         false
      SERVING_ENABLE_PROBE_REQUEST_LOG:         false
      METRICS_COLLECTOR_ADDRESS:
      HOST_IP:                                  (v1:status.hostIP)
      ENABLE_HTTP2_AUTO_DETECTION:              false
      ROOT_CA:
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-bqn8l (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  kube-api-access-bqn8l:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:
    DownwardAPI:             true
  kserve-provision-location:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:
QoS Class:       Burstable
Node-Selectors:
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age   From               Message
  ----     ------     ----  ----               -------
  Normal   Scheduled  79s   default-scheduler  Successfully assigned default/torchserve-mnist-v2-predictor-00001-deployment-86786467c8-jf6fl to minikube
  Normal   Pulled     75s   kubelet            Container image "kserve/storage-initializer:v0.11.0" already present on machine
  Normal   Created    73s   kubelet            Created container storage-initializer
  Normal   Started    73s   kubelet            Started container storage-initializer
  Normal   Pulled     29s   kubelet            Container image "dev.local/pytorch/torchserve-kfs:latest-cpu" already present on machine
  Normal   Created    26s   kubelet            Created container kserve-container
  Normal   Started    25s   kubelet            Started container kserve-container
  Normal   Pulled     25s   kubelet            Container image "gcr.io/knative-releases/knative.dev/serving/cmd/queue@sha256:65c427aaab3be9cea1afea32cdef26d5855c69403077d2dc3439f75c26a1e83f" already present on machine
  Normal   Created    4s    kubelet            Created container queue-proxy
  Normal   Started    3s    kubelet            Started container queue-proxy
  Warning  Unhealthy  1s    kubelet            Readiness probe failed: Get "http://10.244.0.20:8012/": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Wait for inference service to be ready
  Normal   Scheduled  46s   default-scheduler  Successfully assigned default/torchserve-predictor-00001-deployment-76686c46d9-qb95x to minikube
Wait for inference service to be ready

I have only posted the first two experiments, which explicitly use the nightly image. In this run the images are custom built from my branch (tag: dev.local/pytorch/torchserve-kfs:latest-cpu) and not pulled.

In addition, here is a snippet of the service log. You can notice the model names are now "dict_keys" rather than just a list.
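For readers unfamiliar with the "dict_keys" wording in that log, here is a minimal Python sketch of the difference. The `models` dictionary and its contents are hypothetical and are not taken from the wrapper's code; the sketch only shows how a dictionary's key view prints compared with a plain list.

```python
# Hypothetical registry of loaded models; the real wrapper's data
# structure is not shown in this PR page and may differ.
models = {"mnist": {"workers": 1}}

names_view = models.keys()   # a dynamic view object over the keys
names_list = list(models)    # a plain list snapshot of the keys

print(names_view)  # dict_keys(['mnist'])
print(names_list)  # ['mnist']
```

A key view tracks later changes to the dictionary, whereas the list is a copy, so either form carries the same names at the time of logging.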
@lxning You're welcome!
LGTM
Description
This PR refactors the properties loading in the KServe wrapper so that the defined defaults are used when some configuration elements are missing. Currently the wrapper fails to start in that case.
This patch also makes the loader ignore lines starting with #, as defined in the properties file format.
Fixes #2994
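The behaviour described above can be sketched roughly as follows. This is a minimal illustration and not the code from this PR: the function name `load_properties`, the `DEFAULTS` dictionary, and all the values in it are assumptions made for the example.

```python
from typing import Dict, Iterable

# Illustrative defaults only; the wrapper's actual keys and values
# may differ from these assumed ones.
DEFAULTS: Dict[str, str] = {
    "inference_address": "http://127.0.0.1:8085",
    "management_address": "http://127.0.0.1:8085",
    "model_store": "/mnt/models/model-store",
}


def load_properties(lines: Iterable[str], defaults: Dict[str, str]) -> Dict[str, str]:
    """Parse key=value lines, falling back to defaults for missing keys."""
    props = dict(defaults)  # start from a copy so the defaults stay untouched
    for raw in lines:
        line = raw.strip()
        # Skip blank lines and comment lines starting with '#'.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            # No '=' on the line: ignore it rather than fail.
            continue
        props[key.strip()] = value.strip()
    return props
```

With this shape, a file that only sets `model_store` still yields a complete configuration, because every key absent from the file keeps its default value.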
Type of change
Please delete options that are not relevant.
Feature/Issue validation/testing
Please describe the Unit or Integration tests that you ran to verify your changes and relevant result summary. Provide instructions so it can be reproduced.
Please also list any relevant details for your test configuration.
model rather than mnist
/mnt/models/model-store
config.properties in the /mnt/models/config.properties
Checklist: