
Can't access Kubeflow dashboard after using "kubectl port-forward svc/istio-ingressgateway -n istio-system --address 0.0.0.0 8085:80" #332

Closed
TranThanh96 opened this issue Aug 23, 2022 · 27 comments
Labels
bug Something isn't working

Comments

@TranThanh96

I tried to install Kubeflow on AWS with S3 storage by following the tutorial at https://awslabs.github.io/kubeflow-manifests/docs/deployment/
Everything works well except the last step, accessing the Kubeflow dashboard: kubectl port-forward svc/istio-ingressgateway -n istio-system --address 0.0.0.0 8085:80

After port-forwarding, I can't access http://localhost:8080/
The page gives me a 403 error: You don't have authorization to view this page!
How can I fix this?

TranThanh96 added the bug label on Aug 23, 2022
@AlexandreBrown
Contributor

@TranThanh96 Can you make sure your command is the same as in the doc https://awslabs.github.io/kubeflow-manifests/docs/deployment/vanilla/guide/#port-forward ?

kubectl port-forward svc/istio-ingressgateway -n istio-system 8080:80
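For reference, a short sketch of how the local port in the port-forward command maps to the dashboard URL (the 8085 variant is the one from the issue title):

# forwards local port 8080 to port 80 of the ingress gateway;
# the dashboard is then reachable at http://localhost:8080
kubectl port-forward svc/istio-ingressgateway -n istio-system 8080:80

# with --address 0.0.0.0 and 8085:80, the dashboard would instead be at
# http://localhost:8085 (or http://<host-ip>:8085 from another machine)
kubectl port-forward svc/istio-ingressgateway -n istio-system --address 0.0.0.0 8085:80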

@ryansteakley
Contributor

Hey @TranThanh96, I responded to you on Slack. Can you additionally specify which deployment option you ran? Was it rds-s3?

@TranThanh96
Author

@TranThanh96 Can you make sure your command is the same as in the doc https://awslabs.github.io/kubeflow-manifests/docs/deployment/vanilla/guide/#port-forward ?

kubectl port-forward svc/istio-ingressgateway -n istio-system 8080:80

Yes, I tried both with and without --address 0.0.0.0, but I still can't access it from the browser.

@TranThanh96
Author

Hey @TranThanh96, I responded to you on Slack. Can you additionally specify which deployment option you ran? Was it rds-s3?

I used S3 only.

@TranThanh96
Author

After re-installing everything, I can now reach the login page.

@ryansteakley
Contributor

Sounds good. Verify that you are able to log in and run any samples you wish.

@TranThanh96
Author

TranThanh96 commented Aug 24, 2022

Sounds good. Verify that you are able to log in and run any samples you wish.

@ryansteakley I can't see any example pipelines in the dashboard.
(screenshot)

I also can't create a new notebook server; the error is: 0/1 nodes are available: 1 Too many pods.
log.txt

@ryansteakley
Contributor

It looks like you have several pods in CrashLoopBackOff. Is your instance the same size as, or similar to, the one described in https://awslabs.github.io/kubeflow-manifests/docs/deployment/prerequisites/ ? Did you follow the auto-setup Python script?

@ryansteakley
Contributor

Run kubectl describe pod <pod-name> -n kubeflow and similarly kubectl logs <pod-name> -n kubeflow on the pods in a failed state, and share anything you find there as well.
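A short sketch of those debugging commands, with placeholder pod names (the actual names come from kubectl get pods):

# list all pods in the kubeflow namespace and note the ones not Running
kubectl get pods -n kubeflow

# inspect events and recent logs of a failing pod (placeholder name)
kubectl describe pod <failing-pod-name> -n kubeflow
kubectl logs <failing-pod-name> -n kubeflow
# for a pod in CrashLoopBackOff, --previous shows the logs of the last crashed container
kubectl logs <failing-pod-name> -n kubeflow --previous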

@TranThanh96
Author

@ryansteakley
Contributor

In the ml-pipeline logs I see: Warning Failed 34m (x5 over 34m) kubelet Error: secret "mlpipeline-minio-artifact" not found. Can you check whether this secret exists? Run kubectl get secrets -n kubeflow.
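To check for that specific secret rather than scanning the whole list, something like this should work:

# exits non-zero if the secret is missing
kubectl get secret mlpipeline-minio-artifact -n kubeflow
# inspect which keys it contains (values are base64-encoded)
kubectl get secret mlpipeline-minio-artifact -n kubeflow -o yaml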

@TranThanh96
Author

secrets_kf_log.txt
It seems like it exists.
(screenshot)

@ryansteakley
Contributor

Can you verify that you are using v3.2.0 of kustomize? Run kubectl delete pods -n kubeflow --all and see if the pods come up normally.
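A quick sketch of both checks (the exact output format of kustomize version varies by build):

# the output should show version 3.2.0
kustomize version

# recreate all pods in the kubeflow namespace and watch them come back up
kubectl delete pods -n kubeflow --all
kubectl get pods -n kubeflow -w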

@TranThanh96
Author

Yes, I am using kustomize v3.2.0.
(screenshot)

I tried kubectl delete pods -n kubeflow --all, but the pod metadata-grpc-deployment-f8d68f687-mqs82 keeps going into CrashLoopBackOff.

(screenshot)

@ryansteakley
Contributor

What do you see when you log in? Are any other pods still failing?

@TranThanh96
Author

Everything is good except that those 3 pods keep going into CrashLoopBackOff.
(screenshot)

I also get some errors on Pipelines and Runs; any suggestions, please?

(screenshot)
(screenshot)

Errors on Runs:
(screenshot)
(screenshot)

And these pods:
(screenshot)

@ryansteakley
Contributor

Can you verify that the s3-secret you created follows this requirement: configure a Secret (e.g. s3-secret) with your AWS credentials. These need to be long-term credentials from an IAM user, not temporary ones.
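A minimal sketch of creating such a secret from long-term IAM user credentials; the key names (accesskey/secretkey) and the kubeflow namespace are assumptions here, so check the deployment guide for the exact names it expects:

# placeholders: <AWS_ACCESS_KEY_ID> and <AWS_SECRET_ACCESS_KEY> are the IAM user's
# long-term credentials, not temporary STS credentials
kubectl create secret generic s3-secret -n kubeflow \
  --from-literal=accesskey=<AWS_ACCESS_KEY_ID> \
  --from-literal=secretkey=<AWS_SECRET_ACCESS_KEY>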

@TranThanh96
Author

TranThanh96 commented Aug 24, 2022

Yes, I can confirm that. How can I give you evidence?

@ryansteakley
Contributor

There is no way to prove it. Can you describe the ml-pipeline pod one more time? I would suggest restarting from a fresh cluster and following the cluster prerequisites listed above.

@TranThanh96
Author

Yes, this is the third time I have re-installed Kubeflow on AWS EKS from a fresh cluster, and this error keeps occurring.

@ryansteakley
Contributor

Sorry you are running into these problems. If you can, please share the logs from the latest CrashLoopBackOff ml-pipeline pod. Which version of Kubeflow on AWS are you running? I will try to reproduce your issue on my end and see if there is an underlying issue.

@TranThanh96
Author

Sorry you are running into these problems. If you can, please share the logs from the latest CrashLoopBackOff ml-pipeline pod. Which version of Kubeflow on AWS are you running? I will try to reproduce your issue on my end and see if there is an underlying issue.

How can I get these logs? I can provide them to you.
I am using this version:
KUBEFLOW_RELEASE_VERSION=v1.5.1
AWS_RELEASE_VERSION=v1.5.1-aws-b1.0.1
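For context, a rough sketch of how these version variables are typically used when checking out the manifests; the exact commands are in the deployment guide, and the assumption here is that the AWS release version exists as a git tag in awslabs/kubeflow-manifests:

export KUBEFLOW_RELEASE_VERSION=v1.5.1
export AWS_RELEASE_VERSION=v1.5.1-aws-b1.0.1
# clone the AWS manifests repo and check out the release tag (assumed tag name)
git clone https://github.com/awslabs/kubeflow-manifests.git && cd kubeflow-manifests
git checkout ${AWS_RELEASE_VERSION}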

@ryansteakley
Contributor

kubectl logs <ml-pipeline-pod> -n kubeflow

I also see you are running 2 t3.xlarge nodes; we recommend a minimum of 5 nodes of m5.xlarge, as stated here: https://awslabs.github.io/kubeflow-manifests/docs/deployment/prerequisites/. If you have time, try to re-create the cluster following the suggested cluster create command.
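A minimal sketch of a cluster matching that recommendation; the cluster name, region, and Kubernetes version below are placeholders or assumptions, so prefer the exact eksctl command from the prerequisites page:

# <cluster-name>, <aws-region>, and the --version value are placeholders
eksctl create cluster \
  --name <cluster-name> \
  --region <aws-region> \
  --version 1.21 \
  --node-type m5.xlarge \
  --nodes 5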

@TranThanh96
Author

kubectl logs <ml-pipeline-pod> -n kubeflow

(screenshot)
This is the log from ml-pipeline.

@surajkota
Contributor

surajkota commented Aug 25, 2022

@ryansteakley @TranThanh96 I think this is because of a bug related to the missing MySQL deployment in the S3-only deployment option. It was fixed in the main branch recently but not backported to the release branch: #310

@TranThanh96 Can you comment out the line - disable-mysql-pv-claim.yaml in awsconfigs/apps/pipeline/s3/kustomization.yaml and run

kustomize build awsconfigs/apps/pipeline/s3 | kubectl apply -f -

Please delete the pods that are in CrashLoopBackOff after doing this so that new pods get created.
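A sketch of the workaround as a sequence of commands; the commented-out line shows the intended change to kustomization.yaml, and the pod name in the delete step is a placeholder:

# in awsconfigs/apps/pipeline/s3/kustomization.yaml, comment out the patch, e.g.:
#   # - disable-mysql-pv-claim.yaml

# rebuild and apply the S3 pipeline manifests
kustomize build awsconfigs/apps/pipeline/s3 | kubectl apply -f -

# delete the crashing pods so new ones get created with the fix
kubectl delete pod <crashing-pod-name> -n kubeflow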

@TranThanh96
Author

@ryansteakley @TranThanh96 I think this is because of a bug related to the missing MySQL deployment in the S3-only deployment option. It was fixed in the main branch recently but not backported to the release branch: #310

@TranThanh96 Can you comment out the line - disable-mysql-pv-claim.yaml in awsconfigs/apps/pipeline/s3/kustomization.yaml and run

kustomize build awsconfigs/apps/pipeline/s3 | kubectl apply -f -

Please delete the pods that are in CrashLoopBackOff after doing this so that new pods get created.

Yes, I tried the RDS + S3 deployment and everything works, so the problem is related to MySQL.

@surajkota
Contributor

Thanks for reporting this issue. We have released a patch version (v1.5.1-aws-b1.0.2) to fix it.
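A hedged sketch of moving to the patch release, assuming it is published as a git tag in awslabs/kubeflow-manifests:

cd kubeflow-manifests
git fetch --tags
git checkout v1.5.1-aws-b1.0.2
# then rebuild and apply the manifests following the deployment guide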
