Deploy model but pod is evicted many times before running #515
I think the pertinent line is:
Can you try with a larger cluster? Are you using minikube? If so, maybe increase the memory, e.g.,
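something along these lines (the exact figures are just placeholders, adjust for your models):

```sh
# Recreate the minikube VM with more memory and CPU
minikube delete
minikube start --memory=8192 --cpus=4
```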
@cliveseldon Thanks for your reply. I did notice that line, but last time I deployed a model on the same node and it eventually ended up running.
I'm also just curious what the curl to 127.0.0.1:8000/pause is doing and why the pod needs to be killed.
The curl pause is a preStop handler telling the svc-orchestrator to do a graceful shutdown. It will only be called when Kubernetes has sent a termination signal to the pod, so I don't think this is connected with the issue. It sounds more like the resource requirements are invalid - maybe add more memory to the resource requests?
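For reference, a minimal sketch of where memory requests/limits go on the model container in a SeldonDeployment (the names, image and sizes here are just placeholders):

```yaml
apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: mymodel
spec:
  predictors:
  - name: default
    replicas: 1
    componentSpecs:
    - spec:
        containers:
        - name: classifier
          image: mymodel-image:0.1
          resources:
            requests:
              memory: 1Gi   # reserve enough memory so the scheduler picks a node with room
            limits:
              memory: 2Gi   # cap usage so the container is OOM-killed instead of pressuring the node
    graph:
      name: classifier
      type: MODEL
      endpoint:
        type: REST
      children: []
```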
Ah, I see. Thanks for your help.
But I get 404 Not Found or 500 Internal Server Error randomly.
From the model pod's logs, I found that it hit some internal error, leading to:
Looks like you are sending a Tensor of the wrong size.
But would a wrong request result in a 404?
The request resulted in a 500 because the Python code failed when predicting with that input. A 404 would be an incorrect URL to a path that didn't exist.
But I'm requesting the same URL and getting a 500 or 404 randomly.
Yes, a 404 is strange for the same path. The 404 could only happen if the model is removed so the Ambassador path no longer exists.
Yeah, it is. So it's really strange to get a 500 sometimes. Do you have any idea how to locate this problem?
I would solve the 500 first so you are always sending the correct payloads.
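One way to make those 500s easier to track down is to validate the input shape inside the model's predict method and fail with an explicit message. A minimal sketch for the Python wrapper (the class name and expected feature count are just assumptions):

```python
import numpy as np


class MyModel:
    # Assumption: the model was trained on 4 features per row.
    EXPECTED_FEATURES = 4

    def __init__(self):
        # Load the real model here, e.g. self.model = joblib.load(...)
        self.model = None

    def predict(self, X, features_names=None):
        X = np.asarray(X)
        if X.ndim != 2 or X.shape[1] != self.EXPECTED_FEATURES:
            # Raising with a clear message makes the 500 self-explanatory in the logs
            raise ValueError(
                "Expected input of shape (n, %d), got %s"
                % (self.EXPECTED_FEATURES, X.shape)
            )
        # return self.model.predict(X)
        return X
```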
Assuming fixed. Please reopen if this is still an issue on the latest Seldon 0.4.0.
Sometimes the pod will be evicted many times before running.
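The eviction reason can usually be checked from the pod status and cluster events, e.g. with something like this (the pod name is a placeholder):

```sh
# Evicted pods stay around with status "Evicted"
kubectl get pods

# The status message and events explain the reason,
# typically something like "The node was low on resource: memory"
kubectl describe pod <evicted-pod-name>

# Node-level events also show memory pressure and evictions
kubectl get events --sort-by=.metadata.creationTimestamp
```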
The logs from the evicted pod are as follows: