For live reload, retry if model is not ready #213

pankajroark · 2023-02-08T14:01:28Z

We already retried when we couldn't connect to the inference server, but not when we could connect and the model wasn't ready yet. Now we retry in both cases. Previously, if the model took a long time to load, requests made before it finished loading would fail. This led to a bad user experience when calling predict right after applying a patch to a slow-loading model. The new retry behavior is consistent with non-reloadable models: if the model is deployed via kserve or knative, the request waits until the model is ready.
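To illustrate the behavior described above, here is a minimal, dependency-free sketch. It is not the code from this PR, which uses tenacity's retry decorators. `call_with_retry`, `FakeInferenceServer`, and the `ModelNotReady` class defined here are hypothetical stand-ins for illustration only.

```python
import time


class ModelNotReady(Exception):
    """Hypothetical stand-in: the server is reachable but the model is still loading."""


def call_with_retry(send, max_attempts=5, wait_secs=0.0):
    """Call send(), retrying on ConnectionError or ModelNotReady.

    Any other exception propagates immediately. After the last attempt
    the retryable exception is re-raised to the caller.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return send()
        except (ConnectionError, ModelNotReady):
            if attempt == max_attempts:
                raise
            time.sleep(wait_secs)


class FakeInferenceServer:
    """Test double that raises the queued exceptions, then succeeds."""

    def __init__(self, failures):
        self._failures = list(failures)
        self.calls = 0

    def predict(self):
        self.calls += 1
        if self._failures:
            raise self._failures.pop(0)
        return {"prediction": "ok"}
```

With a server that first refuses the connection and then reports that the model is not ready, `call_with_retry(server.predict)` succeeds on the third call. Before this change, only the first kind of failure would have been retried.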

Please note that all files under truss_container_fs are auto-generated and can be ignored for this review. Added a test for this case as well.

bolasim · 2023-02-08T14:07:07Z

truss/templates/control/control/endpoints.py

+        retry=(
+            retry_if_exception_type(ConnectionError)
+            | retry_if_exception_type(ModelNotReady)
+        ),
        stop=stop_after_attempt(INFERENCE_SERVER_START_WAIT_SECS),


Should we increase this timeout, given that we're now also waiting for the model to be ready?

Good point. I think the tradeoff is that for some error cases it may increase the time it takes to return an error. Let's start with this and tune later. This will not eliminate the issue completely, since models can take any amount of time to load; in those cases it's fine to return the 503 error. The idea is that we retry a bit for the common, simple cases.
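The tradeoff can be made concrete with a rough sketch, assuming a fixed wait between attempts (the actual wait strategy and the value of `INFERENCE_SERVER_START_WAIT_SECS` are not shown in this thread). `respond`, `worst_case_delay_secs`, and the error body are hypothetical names for illustration, not part of the control server.

```python
import time


class ModelNotReady(Exception):
    """Hypothetical stand-in: the server is reachable but the model is still loading."""


def worst_case_delay_secs(max_attempts, wait_secs):
    """Time spent sleeping if every attempt fails (excludes the calls themselves)."""
    return (max_attempts - 1) * wait_secs


def respond(send, max_attempts, wait_secs=0.0, sleep=time.sleep):
    """Return (status, body): 200 on success, 503 once the retry budget is spent."""
    for attempt in range(max_attempts):
        try:
            return 200, send()
        except (ConnectionError, ModelNotReady):
            # Sleep only between attempts, not after the final one.
            if attempt + 1 < max_attempts:
                sleep(wait_secs)
    return 503, {"error": "model not ready"}  # assumed body, for illustration
```

Raising the attempt limit helps slow-loading models but also lengthens the wait before a genuinely failed request gets its 503, which is why starting with the existing limit and tuning later is a reasonable default.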

For live reload, retry if model is not ready

88d7fa4

pankajroark requested review from zero1zero and bolasim February 8, 2023 14:01

bolasim approved these changes Feb 8, 2023

pankajroark merged commit 7177c1e into main Feb 8, 2023

pankajroark deleted the pg/control-retry-model-load branch February 8, 2023 16:29
