Pipeline install timeout #414

Closed
gyliu513 opened this issue Nov 29, 2018 · 12 comments

@gyliu513
Member

Following the steps at https://www.kubeflow.org/docs/guides/pipelines/deploy-pipelines-service/ , the pipeline failed to start.

root@gyliu-c11:~# kubectl get nodes
NAME             STATUS    ROLES                          AGE       VERSION
172.16.250.138   Ready     etcd,management,master,proxy   6d        v1.11.1+icp-ee
172.16.250.140   Ready     worker                         6d        v1.11.1+icp-ee
root@gyliu-c11:~# kubectl version
Client Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.1+icp-ee", GitCommit:"5803c3b1f9422c43a963e0610b3a4cad565e127e", GitTreeState:"clean", BuildDate:"2018-09-04T09:29:02Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.1+icp-ee", GitCommit:"5803c3b1f9422c43a963e0610b3a4cad565e127e", GitTreeState:"clean", BuildDate:"2018-09-04T09:29:02Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}
PIPELINE_VERSION=0.1.2
kubectl create -f https://storage.googleapis.com/ml-pipeline/release/$PIPELINE_VERSION/bootstrapper.yaml

Checking the job and the pod log afterwards showed a timeout.

root@gyliu-c11:~/test# kubectl get jobs
NAME                       DESIRED   SUCCESSFUL   AGE
deploy-ml-pipeline-rnqzc   1         0            10m
root@gyliu-c11:~/test# kubectl get pods
NAME                             READY     STATUS    RESTARTS   AGE
deploy-ml-pipeline-rnqzc-2pn4f   0/1       Error     0          10m
deploy-ml-pipeline-rnqzc-p4sfr   0/1       Error     0          5m

Job pod log:

+ set -e
+ '[' 1 -eq 0 ']'
+ ks env add default --namespace kubeflow
level=info msg="Using context \"default\" from kubeconfig file \"/root/.kube/config\""
level=info msg="Creating environment \"default\" with namespace \"kubeflow\", pointing to cluster at address \"https://10.0.0.1:443\""
+ ks apply default -c ambassador
level=error msg="find objects: Received status code '404' when trying to retrieve OpenAPI schema for cluster version 'v1.11.1+icp' from URL 'https://raw.githubusercontent.com/kubernetes/kubernetes/v1.11.1+icp/api/openapi-spec/swagger.json'"
level=error msg="find objects: Received status code '404' when trying to retrieve OpenAPI schema for cluster version 'v1.11.1+icp' from URL 'https://raw.githubusercontent.com/kubernetes/kubernetes/v1.11.1+icp/api/openapi-spec/swagger.json'"
Waiting for ML pipeline to be ready...
............................................................ML Pipeline not start successfully after 4 minutes. Timeout...
@IronPan
Member

IronPan commented Dec 3, 2018

Thanks @gyliu513.
Are you sure the ambassador failure was caused by the ks version?

@gyliu513
Member Author

gyliu513 commented Dec 3, 2018

Yes, the new ksonnet works. @IronPan

@IronPan
Member

IronPan commented Dec 3, 2018

I mean, is the test failure caused by the old version of ks rather than by flakiness?
FYI, there are e2e tests that cover the deployment. I would be surprised if that's the case.

@IronPan
Member

IronPan commented Dec 3, 2018

@jlewi Has the minimum ks version requirement for Kubeflow changed?

@gyliu513
Member Author

gyliu513 commented Dec 3, 2018

@IronPan @jlewi Please refer to ksonnet/ksonnet#427 for the ksonnet PR. The failure was caused by the k8s version string here, v1.11.1+icp, and it has been fixed in ksonnet 0.12.0. I was trying to bump the version to 0.13.1.

If we do not upgrade, the pipeline will fail to install on every k8s distribution except native Kubernetes. For example, it will fail on AzureStack, OpenShift, IBM Cloud Private, etc.
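
A minimal sketch of the failure mode described above, assuming the schema URL has the shape shown in the job log. The helper names `schema_url` and `strip_build_metadata` are hypothetical and are not part of ksonnet or kubectl; this only illustrates why a vendor-suffixed version string yields a URL that has no matching upstream tag, and it does not claim to reproduce what ksonnet/ksonnet#427 actually changed.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only.

# Build the OpenAPI schema URL in the shape that appears in the job log.
schema_url() {
  printf 'https://raw.githubusercontent.com/kubernetes/kubernetes/%s/api/openapi-spec/swagger.json\n' "$1"
}

# Drop everything from the first '+' onward (the vendor build suffix),
# so that "v1.11.1+icp-ee" becomes "v1.11.1".
strip_build_metadata() {
  printf '%s\n' "${1%%+*}"
}

# URL built from the plain version tag instead of the vendor-suffixed one.
schema_url "$(strip_build_metadata 'v1.11.1+icp-ee')"
```

Only the `+...` suffix is stripped here; a `-` suffix is left alone because upstream pre-release tags also use `-`.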

@jlewi
Contributor

jlewi commented Dec 7, 2018

The instructions at
https://www.kubeflow.org/docs/guides/pipelines/deploy-pipelines-service/
are still using the pipelines bootstrapper to deploy pipelines. This runs a pod on the cluster to deploy pipelines. That pod has ksonnet installed, and the issue looks like that version is not new enough (0.13) to pick up ksonnet/ksonnet#427.

I think this is a pipelines issue, not a generic Kubeflow issue.

If kubeflow users install via kfctl.sh then they will use whatever ksonnet version they have installed. So I think users would just need to pick a newer version of ksonnet.
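
A minimal sketch of checking whether a ksonnet version is new enough, assuming 0.12.0 (the release named earlier in the thread as containing the fix) as the minimum. `version_ge` is a hypothetical helper, and it relies on `sort -V`, which is available in GNU coreutils but is not guaranteed on every system.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when dotted version $1 is >= $2.
# Relies on `sort -V` (version sort), an assumption about the local toolchain.
version_ge() {
  # After a version sort, the smaller of the two comes first;
  # $1 >= $2 exactly when that smallest entry equals $2.
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

MIN_KS_VERSION=0.12.0   # assumed minimum, taken from the thread
CANDIDATE=0.13.1        # the version the reporter was bumping to

if version_ge "$CANDIDATE" "$MIN_KS_VERSION"; then
  echo "ksonnet $CANDIDATE is new enough"
else
  echo "ksonnet $CANDIDATE is older than $MIN_KS_VERSION"
fi
```

To use this against a local install, pass in the version reported by `ks version`; the exact output format of that command is not shown in this thread, so the parsing is left out.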

@jlewi
Contributor

jlewi commented Dec 7, 2018

@IronPan Where do things stand in terms of getting rid of the pipelines-specific bootstrapper? I believe that would fix the issue, since pipelines would then be installed by kfctl.sh, which would use whichever version of ks the user has installed.

@IronPan
Member

IronPan commented Dec 10, 2018

This is fixed. Please refer to https://www.kubeflow.org/docs/guides/pipelines/deploy-pipelines-service/ to deploy the latest pipeline.

IronPan closed this as completed Dec 10, 2018
@jinchihe
Member

@IronPan From your reference, the latest pipeline is only supported on GKE. Personally, I suggest keeping the bootstrapper way of installing the pipeline for ICP, OpenShift, etc. Do you agree? If yes, I can try a PR to fix the issue. Thanks.

@gyliu513
Member Author

+1 to @jinchihe. We should support all Kubernetes distributions; some users are not on GKE but on other Kubernetes distributions.

@IronPan comments? Thanks.

@jlewi
Contributor

jlewi commented Jan 2, 2019

@jinchihe @gyliu513 What does the bootstrapper have to do with supporting non-GKE distributions?

We should definitely support all Kubernetes distributions but this should be orthogonal to bootstrapper.

The idea of the bootstrapper was that it fired off a K8s job that ran on the cluster and performed commands (e.g. ks) that a user would normally perform on their local client.

In Kubeflow we found this approach of doing things server side to be problematic and moved away from it. Some more info here:
https://github.com/kubeflow/kubeflow/blob/master/docs_dev/kubeflow_deployment.md#yaml-manifests

In particular, it limited the ability of the user to customize things.
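
As a rough illustration of the client-side alternative, the two ks commands that appear in the bootstrapper job log earlier in this thread can be wrapped in a small function and run from the user's own machine. This is only a sketch: `deploy_ambassador` and the `KS` override are hypothetical names, and the real commands are assumed to be run from inside an existing ksonnet app directory.

```shell
#!/bin/sh
# Client-side sketch of the two ks commands seen in the bootstrapper job log.
# KS is a hypothetical override: it defaults to the real `ks` binary, and
# setting KS=echo gives a dry run that only prints the arguments.
KS=${KS:-ks}

deploy_ambassador() {
  # Register an environment named "default" that targets the kubeflow namespace.
  "$KS" env add default --namespace kubeflow || return 1
  # Apply only the ambassador component to that environment.
  "$KS" apply default -c ambassador
}
```

Running the commands locally means they use whichever ks version is installed on the client, and the user can edit the app before applying it.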

@jinchihe
Member

jinchihe commented Jan 3, 2019

@jlewi Thanks, I understand the reason for moving away from the bootstrapper. I noticed the latest Kubeflow installer can install the pipeline component as well, which is great. So I think we should enhance the Kubeflow installation to support ICP if needed.

Linchin pushed a commit to Linchin/pipelines that referenced this issue Apr 11, 2023
HumairAK pushed a commit to red-hat-data-services/data-science-pipelines that referenced this issue Mar 11, 2024