Issue in "seldon-container-engine" with MLFLOW_SERVER #1922
Hello, I figured it out: I had an extra quote in the secret I created, my bad. The initContainer classifier-model-initializer and the classifier executed successfully. Now I have a new issue in the "seldon-container-engine".
I followed the guide exactly for MLFLOW_SERVER. Can anybody guide me on what this issue is, please? Regards, |
@Nithinbs18 those logs seem to suggest that the Have you checked if there is anything in the Something worth mentioning is that if the environment is too large, creating it from scratch may take longer than the readiness / liveness probes allow. This is a problem with how the |
Hi @adriangonz , Thank you for your response. Thank you very much in advance. |
You can use the following manifest:

```yaml
apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: model-a
spec:
  name: model-a
  predictors:
  - componentSpecs:
    - spec:
        containers:
        - env:
          - name: FOO
            value: bar
          # Note that this name matches your graph node's name
          name: classifier
    graph:
      children: []
      implementation: MLFLOW_SERVER
      modelUri: s3://seldon/
      envSecretRefName: seldon-init-container-secret
      name: classifier
    name: default
    replicas: 1
```
|
Hi @adriangonz , Greetings!! BR, |
Hey @Nithinbs18 , is the problem that you aren’t able to set the environment variable? Or is it that the proxy is still blocking access to Conda after setting it? |
Hey @adriangonz ,
Another issue I currently face is that while the Conda environment is being created, the classifier container keeps crashing and the download starts all over again and again. Regards, |
@Nithinbs18 I believe the problem may be related to what's described here: https://docs.seldon.io/projects/seldon-core/en/latest/servers/mlflow.html#conda-environment-creation The pre-packaged This slows down the start-up time very aggressively, which can be a blocker when you take into account that Kubernetes has a timeout limit for pods to come up. If the pod exceeds this timeout (i.e. because it's creating the Conda environment), Kubernetes will kill it.

The immediate solution is to build your own inference server, specifying your Conda environment at image build time. In other words, create your own reusable inference server with your particular dependencies pre-installed. You can find more info on that here: https://docs.seldon.io/projects/seldon-core/en/latest/servers/custom.html

Alternatively, you can also increase the timeouts for Kubernetes' liveness and readiness probes, thus giving your pod more time to create the environment. You can find an example here on how those can be specified on your |
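As a sketch of the probe-timeout option, the timings could be overridden on the model container in the SeldonDeployment's componentSpecs. The manifest shape follows the example earlier in this thread; the probe values are illustrative assumptions, not recommendations — tune them to how long your Conda environment actually takes to build.

```yaml
apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: model-a
spec:
  name: model-a
  predictors:
  - name: default
    replicas: 1
    graph:
      children: []
      implementation: MLFLOW_SERVER
      modelUri: s3://seldon/
      envSecretRefName: seldon-init-container-secret
      name: classifier
    componentSpecs:
    - spec:
        containers:
        - name: classifier           # must match the graph node's name
          livenessProbe:
            initialDelaySeconds: 300 # give Conda time to build the env
            periodSeconds: 10
            failureThreshold: 10
          readinessProbe:
            initialDelaySeconds: 300
            periodSeconds: 10
            failureThreshold: 10
```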
Hi @adriangonz, As suggested, I tried to increase the timeout and it works absolutely fine now. I could also set the ENV variables using the below format.
The deployment works fine now, but I have issues accessing it: I cannot make any predictions against my deployments. I have installed everything (i.e. seldon-core-operator, Ambassador, and the model deployment) in the default namespace. I get errors when I try to access the model; below are the errors:
Can you please advise? I have been stuck with this for some time now. |
Hey @Nithinbs18 , The inference URL format is something like:
Therefore, if your model is named
Could you try that one and see if it works? You can read more details on how to test your inference endpoints in the docs: https://docs.seldon.io/projects/seldon-core/en/latest/workflow/serving.html |
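Based on the serving docs linked above, the inference URL can be assembled from the ingress host, the namespace, and the SeldonDeployment's name. A minimal sketch — the host, namespace, and model name below are hypothetical placeholders, not values from this thread's cluster:

```shell
# Hypothetical values -- substitute your own ingress host, namespace,
# and SeldonDeployment name (metadata.name from the manifest).
INGRESS=localhost:8003   # e.g. an Ambassador service port-forwarded locally
NAMESPACE=seldon
MODEL_NAME=mlflow

# Assemble the prediction endpoint in the documented format.
URL="http://${INGRESS}/seldon/${NAMESPACE}/${MODEL_NAME}/api/v1.0/predictions"
echo "$URL"

# A request could then be sent with curl, for example:
# curl -s -X POST "$URL" \
#   -H 'Content-Type: application/json' \
#   -d '{"data": {"ndarray": [[1.0, 2.0, 3.0, 4.0]]}}'
```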
Hi @adriangonz , Thank you very much for your response, but I have no luck yet :(
Curl request and response:
Ambassador logs:
I followed the guide exactly, but no luck |
Hi @adriangonz, I also tried using the Seldon client, i.e.
I got the below error:
|
Your Ambassador ingress seems to be redirecting the request to an SSL endpoint. You can see it in the logs from
I'm not sure what could be causing this. Is there any chance you've installed Ambassador in a different way, or that you've tweaked any settings? It may also have to do with how your environment is set up. |
@Nithinbs18 have you been able to identify why your ingress layer is redirecting to an SSL endpoint? I will close this for now, since the original issue seems to have been resolved. Please re-open if this is still a problem. |
Hi @adriangonz , Thank you very much for your time, it means a lot. |
That's amazing! I'm really glad to hear that @Nithinbs18! It would actually be really useful if you could share your learnings with regards to Ambassador in #2007 , where we are exploring adding support for Ambassador's Edge stack. |
Hi @adriangonz @Nithinbs18, I'm facing the same issue, but it looks like the conda environment is being set up properly. Here's the issue.
Here are the logs for the classifier container
These logs suggest that the conda env is being created properly, right? When I use the Python client to send a request, I get this in the ambassador logs.
Python client
returns
All pods are in the running state and I had port-forwarded 8003->8080 as well. Can you guys throw some light on what I could be doing wrong? |
Hi @Utkagr Try with ; it should work. If it still does not work, please share the k8s manifest you used to create the sdep. |
@Nithinbs18 Thanks for helping me out. It worked! But I'm not sure why.
Deployment is named mlflow-default-0-classifier above.
Is it because the metadata in the manifest is named |
Hi @Utkagr, it's simple: Deployment != SeldonDeployment |
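To illustrate that distinction: the inference URL uses the SeldonDeployment's name, while Seldon Core generates an ordinary Kubernetes Deployment with a longer, derived name. The names below are taken from this thread; the exact derivation pattern is an assumption inferred from the name seen above:

```shell
# Hypothetical names from this thread.
SDEP_NAME=mlflow     # metadata.name of the SeldonDeployment (used in the URL)
PREDICTOR=default    # predictor name in the manifest
CONTAINER=classifier # graph node / container name

# The underlying Deployment's name appears to be derived as
# <sdep-name>-<predictor>-<componentSpec index>-<container>:
DEPLOY_NAME="${SDEP_NAME}-${PREDICTOR}-0-${CONTAINER}"
echo "$DEPLOY_NAME"

# On a live cluster you would see the two resources separately:
# kubectl get sdep     # shows "mlflow"
# kubectl get deploy   # shows "mlflow-default-0-classifier"
```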
Got it! Thanks a lot for helping out. |
Hi @adriangonz @Nithinbs18 , I am also facing the same issue with Seldon (on an AWS EKS cluster) + Ambassador when deploying a model using the MLFLOW server.
My pods are running fine:
The tail of my mlflow pod logs is as follows:
I installed the Ambassador API gateway following the link: I got the service URL / load-balancer URL by running the command
I am able to access the Ambassador diagnostic dashboard at <load-balancer-external-ip/ambassador/v0/diag/> and the Seldon Swagger UI at <load-balancer-external-ip/seldon/seldon/mlflow/api/v1.0/doc/>. But the prediction API sends, Any help would be welcome, |
Hi @akshay2490 Can you increase the "initialDelaySeconds" value to 450, and try once again? |
Hi @Nithinbs18, No luck. |
Dear team,
Greetings!!
I have been trying to deploy a model locally on my laptop using the MLFLOW server. I have the appropriate credentials created as mentioned in Prepackaged Model Servers --> Handling credentials --> Create a secret containing the environment variables (https://docs.seldon.io/projects/seldon-core/en/v1.1.0/servers/overview.html#create-a-secret-containing-the-environment-variables)
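For reference, that secret step can be sketched like this. The key names follow the S3-style example in the linked docs, but every value here is a placeholder assumption; using `stringData` avoids manual base64 encoding, where a stray quote can easily sneak in:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: seldon-init-container-secret  # referenced via envSecretRefName
type: Opaque
stringData:                           # plain text; no base64 needed
  AWS_ACCESS_KEY_ID: my-access-key        # placeholder
  AWS_SECRET_ACCESS_KEY: my-secret-key    # placeholder
  AWS_ENDPOINT_URL: http://minio:9000     # placeholder object-store endpoint
  USE_SSL: "false"
```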
I have my yaml:
It always ends up with an error in the initContainer, i.e. "classifier-model-initializer", with the below error:
Could you please suggest anything that might help me with this issue?
Thank you very much in advance.
Regards,
Nithin Bhardwaj