Failed to enable kubeflow #958

Closed
titsuki opened this issue Feb 11, 2020 · 72 comments

@titsuki

titsuki commented Feb 11, 2020

inspection-report-20200211_171636.tar.gz

  • Error Message:
$ microk8s.enable kubeflow
Enabling dns...
Enabling storage...
Enabling dashboard...
Enabling ingress...
Enabling rbac...
Enabling juju...
Kubeflow could not be enabled:
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:--  0:00:01 --:--:--     0
 22  116M   22 25.8M    0     0  51231      0  0:39:39  0:08:48  0:30:51 71301
curl: (18) transfer closed with 94828385 bytes remaining to read

Command '('microk8s-enable.wrapper', 'juju')' returned non-zero exit status 1
Failed to enable kubeflow
  • microk8s version:
$ sudo snap find microk8s
Name      Version  Publisher   Notes    Summary
microk8s  v1.17.2  canonical✓  classic  Kubernetes for workstations and appliances

This may be the same issue as #943, but I'm new to microk8s and can't tell for sure.

@mattk08

mattk08 commented Feb 11, 2020

I'm having a similar problem enabling kubeflow, with a slightly different error:

microk8s.enable kubeflow
Enabling dns...
Enabling storage...
Enabling dashboard...
Enabling ingress...
Enabling rbac...
Enabling juju...
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
curl: (77) Problem with the SSL CA cert (path? access rights?)

Command '('microk8s-enable.wrapper', 'juju')' returned non-zero exit status 1
snap find microk8s
Name      Version  Publisher   Notes    Summary
microk8s  v1.17.2  canonical✓  classic  Kubernetes for workstations and appliances

@mattk08

mattk08 commented Feb 11, 2020

#943 (comment) suggested:

microk8s.disable kubeflow
microk8s.enable kubeflow

but that did not help.

@andrewcheny

My error message looks like this:

Command '('microk8s-juju.wrapper', 'bootstrap', 'microk8s', 'uk8s')' returned non-zero exit status 2
Failed to enable kubeflow

@andrewcheny

I think this is the same as #943

@ktsakalozos
Member

@titsuki I think the problem you are facing is a slow internet connection. If I read the curl output correctly, you are downloading at about 71 kB/s and the juju client is 116 MB, so curl gave up at some point.

@mattk08 are you behind a firewall/proxy? For some reason curl failed to initiate the ssl handshake and download the juju client.

@andrewcheny in your case the juju client is available but the bootstrap failed. Could you share the microk8s.inspect tarball and the output of microk8s.juju controllers? Thank you.
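
The two curl failures reported so far carry different exit codes, which is what separates the two diagnoses above. A minimal sketch of a lookup helper, assuming the meanings listed in the curl manual for codes 18 and 77; the function name explain_curl_exit is made up for illustration and is not part of microk8s:

```shell
# Hypothetical helper (not part of microk8s): describe the curl exit codes
# that appear in this thread.
explain_curl_exit() {
    case "$1" in
        0)  echo "success" ;;
        18) echo "partial file: the transfer ended before all data arrived" ;;
        77) echo "problem reading the SSL CA cert (path or access rights)" ;;
        *)  echo "other curl error, see the EXIT CODES section of 'man curl'" ;;
    esac
}

explain_curl_exit 18
explain_curl_exit 77
```

Code 18 points at the transfer itself being cut short, while code 77 is raised before any data is downloaded, so the two reports likely need different fixes.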

@mattk08

mattk08 commented Feb 12, 2020

> are you behind a firewall/proxy? For some reason curl failed to initiate the ssl handshake and download the juju client.

@ktsakalozos I am behind a firewall (no proxy), however, this is the first issue I've had downloading any packages with any package manager.

I was able to find the URL curl was using in enable.juju.sh
run_with_sudo "${SNAP}/usr/bin/curl" -L https://launchpad.net/juju/$JUJU_SERIES/$JUJU_VERSION/+download/juju-$JUJU_VERSION-centos7.tar.gz -o "$SNAP_DATA/tmp/juju.tar.gz"

I was able to download the tar file fine outside of the script. I then ran each command in the enable.juju.sh script by hand to enable juju, after which microk8s.enable kubeflow worked, I think. I might have another problem, but it is unrelated to this issue.
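
For anyone who wants to repeat the manual download, here is a minimal sketch of how the URL in that curl line expands. The default values 2.7 and 2.7.1 are assumptions taken from a curl command posted later in this thread, and the function name juju_download_url is hypothetical:

```shell
# Assumed defaults for illustration; enable.juju.sh sets its own values.
JUJU_SERIES="${JUJU_SERIES:-2.7}"
JUJU_VERSION="${JUJU_VERSION:-2.7.1}"

# Hypothetical helper: build the Launchpad URL from series and version,
# following the pattern used in enable.juju.sh.
juju_download_url() {
    echo "https://launchpad.net/juju/$1/$2/+download/juju-$2-centos7.tar.gz"
}

juju_download_url "$JUJU_SERIES" "$JUJU_VERSION"

# To fetch the archive by hand (needs network access):
#   curl -L "$(juju_download_url "$JUJU_SERIES" "$JUJU_VERSION")" -o juju.tar.gz
```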

@grebennikov
Contributor

@ktsakalozos I seem to be seeing the same problem, and for whatever reason it only affects Ubuntu desktop.
A clean installation on 18.04 server, as well as one inside a multipass machine on the desktop, works fine.
On the desktop itself the containers start dying immediately when trying to bootstrap juju, and the node turns "NotReady".
I tried on 2 different machines with the same result.

$ microk8s.kubectl get nodes
NAME        STATUS     ROLES    AGE   VERSION
andreygx1   NotReady   <none>   95m   v1.17.2

$ microk8s.kubectl get po -A
NAMESPACE     NAME                                              READY   STATUS              RESTARTS   AGE
default       test-db5495c67-gthfl                              0/1     ContainerCreating   0          59m
ingress       nginx-ingress-microk8s-controller-bmqq8           0/1     Unknown             0          85m
kube-system   coredns-7b67f9f8c-88htl                           0/1     Unknown             0          90m
kube-system   dashboard-metrics-scraper-687667bb6c-nnvw5        0/1     Unknown             0          85m
kube-system   heapster-v1.5.2-5c58f64f8b-hvl5t                  0/4     Unknown             0          85m
kube-system   hostpath-provisioner-7b9cb5cdb4-mt7fb             0/1     Unknown             0          85m
kube-system   kubernetes-dashboard-5c848cc544-c5l9r             0/1     Unknown             0          85m
kube-system   monitoring-influxdb-grafana-v4-6d599df6bf-65dkc   0/2     Unknown             0          85m

$ microk8s.inspect
[sudo] password for agrebennikov:
Sorry, try again.
[sudo] password for agrebennikov:
Inspecting services
Service snap.microk8s.daemon-cluster-agent is running
Service snap.microk8s.daemon-flanneld is running
Service snap.microk8s.daemon-containerd is running
Service snap.microk8s.daemon-apiserver is running
Service snap.microk8s.daemon-apiserver-kicker is running
Service snap.microk8s.daemon-proxy is running
Service snap.microk8s.daemon-kubelet is running
Service snap.microk8s.daemon-scheduler is running
Service snap.microk8s.daemon-controller-manager is running
Service snap.microk8s.daemon-etcd is running
Copy service arguments to the final report tarball
Inspecting AppArmor configuration
Gathering system information
Copy processes list to the final report tarball
Copy snap list to the final report tarball
Copy VM name (or none) to the final report tarball
Copy disk usage information to the final report tarball
Copy memory usage information to the final report tarball
Copy server uptime to the final report tarball
Copy current linux distribution to the final report tarball
Copy openSSL information to the final report tarball
Copy network configuration to the final report tarball
Inspecting kubernetes cluster
Inspect kubernetes cluster

WARNING: Docker is installed.
File "/etc/docker/daemon.json" does not exist.
You should create it and add the following lines:
{
"insecure-registries" : ["localhost:32000"]
}
and then restart docker with: sudo systemctl restart docker
Building the report tarball
Report tarball is at /var/snap/microk8s/1173/inspection-report-20200213_220150.tar.gz

@grebennikov
Contributor

The issue starts happening when rbac is enabled. In that case I can't spin up pods; they fail with the error "mountvolume.setup failed for volume "default-token-XXX": failed to sync secret cache: timed out waiting for the condition".
Once rbac is disabled, pods can run again.

@ktsakalozos
Member

@grebennikov could you help me reproduce this? When you say "it only affects ubuntu desktop" what do you mean?

If you sudo snap remove microk8s and reinstall it with sudo snap install microk8s --classic does the problem persist? Does the microk8s.reset have any effect? Thanks.

@titsuki
Author

titsuki commented Feb 16, 2020

@ktsakalozos Thanks for your advice!
AFAIK, curl and wget are robust enough to download a large file over a slow connection (and I don't think my connection is much slower than the average in Japan).
I tried curl's resume option (i.e., -C -) to check whether a retry-and-resume approach could work, but it doesn't. The juju package on Launchpad ( https://launchpad.net/juju/ ) does not seem to support resuming.
Isn't there any other better repository that hosts the juju package?

@ktsakalozos
Member

> Isn't there any other better repository that hosts the juju package?

Would you be able to manually run the commands found in the microk8s.enable juju script at https://github.com/ubuntu/microk8s/blob/8ec87c8de7258e5c8f13f22e0cccbfd427a096fa/microk8s-resources/actions/enable.juju.sh ?

@titsuki
Author

titsuki commented Feb 17, 2020

@ktsakalozos
I tried but I couldn't.

$ microk8s.enable juju
Installing Juju...
[sudo] password for itoyota:
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:--  0:00:02 --:--:--     0
 14  116M   14 16.6M    0     0  80574      0  0:25:12  0:03:36  0:21:36   99k
curl: (18) transfer closed with 104486649 bytes remaining to read
Failed to enable juju

@titsuki
Author

titsuki commented Feb 17, 2020

@ktsakalozos I see the same error as with microk8s.enable juju when I run curl directly:

$ curl -L "https://launchpad.net/juju/2.7/2.7.1/+download/juju-2.7.1-centos7.tar.gz" --output juju-2.7.1-centos7.tar.gz 
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:--  0:00:01 --:--:--     0
 13 74.0M   13 10.1M    0     0  31634      0  0:40:54  0:05:35  0:35:19  139k
curl: (18) transfer closed with 67051539 bytes remaining to read

@ktsakalozos
Member

You could snap install juju and then cp /snap/juju/current/bin/juju /var/snap/microk8s/current/bin/. Note this is a hack.
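
The copy step can be sketched with a small guard so that a missing source binary is reported instead of silently leaving nothing behind. This is only an illustration of the hack described above; the function name copy_juju_binary is hypothetical, and the paths are passed in so it can be tried on scratch files first:

```shell
# Hypothetical wrapper around the copy step (not part of microk8s).
copy_juju_binary() {
    src="$1"
    dest_dir="$2"
    if [ ! -x "$src" ]; then
        echo "juju binary not found or not executable: $src" >&2
        return 1
    fi
    mkdir -p "$dest_dir" && cp "$src" "$dest_dir/juju" && chmod 755 "$dest_dir/juju"
}

# Intended use once the juju snap is installed (run as root):
#   copy_juju_binary /snap/juju/current/bin/juju /var/snap/microk8s/current/bin
```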

@titsuki
Author

titsuki commented Feb 17, 2020

@ktsakalozos Your idea works fine for me. Downloading via snap seems to be 30 to 40 times faster than from Launchpad.

@dgrahn

dgrahn commented Feb 18, 2020

@ktsakalozos This also works to enable juju. But when I then try to enable Kubeflow, it gives me the following message.

Enabling dns...
Enabling storage...
Enabling dashboard...
Enabling ingress...
Enabling rbac...
Enabling juju...
Deploying Kubeflow...
Kubeflow could not be enabled:
ERROR cannot load ssh client keys: mkdir /var/snap/microk8s/1173/juju/share: permission denied

Command '('microk8s-juju.wrapper', 'bootstrap', 'microk8s', 'uk8s')' returned non-zero exit status 2
Failed to enable kubeflow

@evertonberz

I am also facing the "Problem with the SSL CA cert" issue. The curl shipped in the snap seems to behave differently from the OS curl. I sniffed the connection: the snap's curl sends a FIN,ACK right after the TCP handshake, while the OS curl sends a TLS 1.2 Client Hello. No idea why.

[root@kubeflow-lab ~]# /var/lib/snapd/snap/microk8s/1173/usr/bin/curl  -L https://launchpad.net/juju/$JUJU_SERIES/$JUJU_VERSION/+download/juju-$JUJU_VERSION-centos7.tar.gz -o juju.tar.gz2
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
curl: (77) Problem with the SSL CA cert (path? access rights?)
[root@kubeflow-lab ~]# /var/lib/snapd/snap/microk8s/1173/usr/bin/curl --version
curl 7.47.0 (x86_64-pc-linux-gnu) libcurl/7.47.0 GnuTLS/3.4.10 zlib/1.2.8 libidn/1.32 librtmp/2.3
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtmp rtsp smb smbs smtp smtps telnet tftp
Features: AsynchDNS IDN IPv6 Largefile GSS-API Kerberos SPNEGO NTLM NTLM_WB SSL libz TLS-SRP UnixSockets

[root@kubeflow-lab ~]# curl -L https://launchpad.net/juju/$JUJU_SERIES/$JUJU_VERSION/+download/juju-$JUJU_VERSION-centos7.tar.gz -o juju.tar.gz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:--  0:00:01 --:--:--     0
100  116M  100  116M    0     0  1670k      0  0:01:11  0:01:11 --:--:-- 2816k
[root@kubeflow-lab ~]# curl --version
curl 7.29.0 (x86_64-redhat-linux-gnu) libcurl/7.29.0 NSS/3.44 zlib/1.2.7 libidn/1.28 libssh2/1.8.0
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtsp scp sftp smtp smtps telnet tftp
Features: AsynchDNS GSS-Negotiate IDN IPv6 Largefile NTLM NTLM_WB SSL libz unix-sockets
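
One visible difference between the two version strings is the TLS library: the snap's curl is built against GnuTLS, the OS curl against NSS. Whether that explains the CA cert error is an assumption, not something confirmed in this thread, but different TLS libraries can look for CA certificates in different places. A minimal sketch that pulls the TLS library out of a version line (the helper name curl_tls_backend is hypothetical):

```shell
# Hypothetical helper: print the TLS library token from the first line of
# 'curl --version' output. Only GnuTLS, NSS and OpenSSL are recognised;
# anything else prints "unknown".
curl_tls_backend() {
    for token in $1; do
        case "$token" in
            GnuTLS/*|NSS/*|OpenSSL/*) echo "$token"; return 0 ;;
        esac
    done
    echo "unknown"
}

snap_curl='curl 7.47.0 (x86_64-pc-linux-gnu) libcurl/7.47.0 GnuTLS/3.4.10 zlib/1.2.8 libidn/1.32 librtmp/2.3'
os_curl='curl 7.29.0 (x86_64-redhat-linux-gnu) libcurl/7.29.0 NSS/3.44 zlib/1.2.7 libidn/1.28 libssh2/1.8.0'

curl_tls_backend "$snap_curl"   # prints GnuTLS/3.4.10
curl_tls_backend "$os_curl"     # prints NSS/3.44
```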

@titsuki
Author

titsuki commented Feb 24, 2020

@dgrahn I faced the same error, but that error itself can be solved by changing the permissions.
There are some other problems:

  • microk8s cannot tell whether juju is already installed if you use the copied juju.
  • microk8s doesn't seem to have an option to skip modules that are already installed or enabled.

@ktsakalozos
Member

@titsuki we recently addressed these issues in PR #927. You could try the candidate channel to verify the fix: sudo snap install microk8s --classic --channel=candidate

@titsuki
Author

titsuki commented Feb 24, 2020

@ktsakalozos Thanks for your advice. Fortunately, I managed to enable juju (by running microk8s.enable juju many times), but microk8s.enable kubeflow then failed with another error.
The error says https://api.jujucharms.com/charmstore/v5/~kubeflow-charmers/ambassador-54/archive?channel=stable is unavailable, but when I checked the link manually it seemed to work fine:

Deploying Kubeflow...
Kubeflow could not be enabled:
Located bundle "cs:bundle/kubeflow-170"
Resolving charm: cs:~kubeflow-charmers/ambassador-54
Resolving charm: cs:~kubeflow-charmers/argo-controller-53
Resolving charm: cs:~kubeflow-charmers/argo-ui-54
Resolving charm: cs:~kubeflow-charmers/jupyter-controller-54
Resolving charm: cs:~kubeflow-charmers/jupyter-web-56
Resolving charm: cs:~kubeflow-charmers/katib-controller-52
Resolving charm: cs:~charmed-osm/mariadb-k8s
Resolving charm: cs:~kubeflow-charmers/katib-manager-50
Resolving charm: cs:~kubeflow-charmers/katib-ui-46
Resolving charm: cs:~kubeflow-charmers/kubeflow-dashboard-11
Resolving charm: cs:~kubeflow-charmers/kubeflow-gatekeeper-14
Resolving charm: cs:~kubeflow-charmers/kubeflow-login-13
Resolving charm: cs:~kubeflow-charmers/kubeflow-profiles-16
Resolving charm: cs:~kubeflow-charmers/metacontroller-43
Resolving charm: cs:~kubeflow-charmers/metadata-api-6
Resolving charm: cs:~charmed-osm/mariadb-k8s
Resolving charm: cs:~kubeflow-charmers/metadata-ui-10
Resolving charm: cs:~kubeflow-charmers/minio-54
Resolving charm: cs:~kubeflow-charmers/modeldb-backend-49
Resolving charm: cs:~charmed-osm/mariadb-k8s
Resolving charm: cs:~kubeflow-charmers/modeldb-store-45
Resolving charm: cs:~kubeflow-charmers/modeldb-ui-44
Resolving charm: cs:~kubeflow-charmers/pipelines-api-58
Resolving charm: cs:~charmed-osm/mariadb-k8s
Resolving charm: cs:~kubeflow-charmers/pipelines-persistence-53
Resolving charm: cs:~kubeflow-charmers/pipelines-scheduledworkflow-55
Resolving charm: cs:~kubeflow-charmers/pipelines-ui-53
Resolving charm: cs:~kubeflow-charmers/pipelines-viewer-55
Resolving charm: cs:~kubeflow-charmers/pytorch-operator-55
Resolving charm: cs:~kubeflow-charmers/tf-job-dashboard-55
Resolving charm: cs:~kubeflow-charmers/tf-job-operator-53
ERROR cannot deploy bundle: cannot add charm "cs:~kubeflow-charmers/ambassador-54": cannot retrieve charm "cs:~kubeflow-charmers/ambassador-54": cannot get archive: Get https://api.jujucharms.com/charmstore/v5/~kubeflow-charmers/ambassador-54/archive?channel=stable: dial tcp: i/o timeout

Command '('microk8s-juju.wrapper', 'deploy', 'cs:kubeflow', '--channel', 'stable', '--overlay', '/tmp/tmp81lxmw43')' returned non-zero exit status 1

@JIElite

JIElite commented Feb 25, 2020

I followed the Kubeflow documentation to install Kubeflow with MicroK8s:

sudo snap install microk8s --classic --channel=candidate
microk8s.status --wait-ready
microk8s.enable dns dashboard storage
microk8s.enable gpu
microk8s.enable kubeflow

But I got this error message:

Enabling dns...
Enabling storage...
Enabling dashboard...
Enabling ingress...
Enabling rbac...
[sudo] password for eli: 
Enabling juju...
Deploying Kubeflow...
Kubeflow could not be enabled:
Creating Juju controller "uk8s" on microk8s/localhost
Creating k8s resources for controller "controller-uk8s"
ERROR failed to bootstrap model: creating controller stack for controller: creating statefulset for controller: timed out waiting for controller pod: pending:  - 

Command '('microk8s-juju.wrapper', 'bootstrap', 'microk8s', 'uk8s')' returned non-zero exit status 1
Failed to enable kubeflow

@knkski
Contributor

knkski commented Feb 28, 2020

@dgrahn: Running these commands should fix the issue you're facing with the permission denied error:

https://github.com/ubuntu/microk8s/blob/050e98b/snap/hooks/configure#L380-L382
https://github.com/ubuntu/microk8s/blob/050e98b/snap/hooks/configure#L386

Where $SNAP_DATA is /var/snap/microk8s/current

Additionally, #989 will fix this issue generally, as it just includes the juju binary in the snap, instead of downloading it after snap install.

@knkski
Contributor

knkski commented Feb 28, 2020

@JIElite: I haven't seen that error before. Can you create a new issue with the output from KUBEFLOW_DEBUG=true microk8s.enable kubeflow?

@egkiastas

Same problem here on CentOS 7.7. Installed via snap:

sudo snap install microk8s --classic --channel=candidate
microk8s.enable dns dashboard storage
microk8s.enable kubeflow

Getting the error:

Enabling dns...
Enabling storage...
Enabling dashboard...
Enabling ingress...
Enabling rbac...
Enabling juju...
Kubeflow could not be enabled:
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
curl: (77) Problem with the SSL CA cert (path? access rights?)

Command '('microk8s-enable.wrapper', 'juju')' returned non-zero exit status 1
Failed to enable kubeflow

@knkski
Contributor

knkski commented Mar 4, 2020

@egkiastas: Can you download https://launchpad.net/juju/2.7/2.7.3/+download/juju-2.7.3-k8s.tar.xz, and place the unzipped juju binary at /var/snap/microk8s/current/bin/juju, then run these commands, and try again?

https://github.com/ubuntu/microk8s/blob/0ebbaed/snap/hooks/configure#L378-L385

#989 will fix the issue you're running into by including the juju binary in the snap instead of downloading it. If you'd like to try that instead of the above, you should be able to by switching microk8s to the latest/edge channel (sudo snap switch microk8s --channel latest/edge && sudo snap refresh).

@amitmsingh

amitmsingh commented Mar 4, 2020

> @egkiastas: Can you download https://launchpad.net/juju/2.7/2.7.3/+download/juju-2.7.3-k8s.tar.xz, and place the unzipped juju binary at /var/snap/microk8s/current/bin/juju, then run these commands, and try again?
>
> https://github.com/ubuntu/microk8s/blob/0ebbaed/snap/hooks/configure#L378-L385
>
> #989 will fix the issue you're running into by including the juju binary in the snap instead of downloading it. If you'd like to try that instead of the above, you should be able to by switching microk8s to the latest/edge channel (sudo snap switch microk8s --channel latest/edge && sudo snap refresh).

I'm getting the same error as OP and I followed this, but it didn't help.

Here's my error:

sudo microk8s.enable kubeflow
Enabling dns...
Enabling storage...
Enabling dashboard...
Enabling ingress...
Enabling rbac...
Enabling juju...
Deploying Kubeflow...
Kubeflow could not be enabled:
ERROR microk8s:
running: false

Command '('microk8s-juju.wrapper', 'bootstrap', 'microk8s', 'uk8s')' returned non-zero exit status 1
Failed to enable kubeflow

@zshenkle

zshenkle commented Apr 13, 2020

Hi. I'm very new to microk8s and kubernetes, but I'm afraid I'm getting a similar problem when trying to enable kubeflow, except with a slightly different error printout at the bottom. This happens when I try to enable:

$ microk8s.enable kubeflow
Enabling dns...
Enabling storage...
Enabling dashboard...
Enabling ingress...
Enabling metallb:10.64.140.43-10.64.140.49...
Deploying Kubeflow...
Kubeflow could not be enabled:
Located bundle "cs:bundle/kubeflow-185"
Resolving charm: cs:~kubeflow-charmers/ambassador-78
Resolving charm: cs:~kubeflow-charmers/argo-controller-162
Resolving charm: cs:~kubeflow-charmers/argo-ui-78
Resolving charm: cs:~kubeflow-charmers/cert-manager-controller-9
Resolving charm: cs:~kubeflow-charmers/cert-manager-webhook-9
Resolving charm: cs:~kubeflow-charmers/dex-auth-20
Resolving charm: cs:~kubeflow-charmers/jupyter-controller-176
Resolving charm: cs:~kubeflow-charmers/jupyter-web-81
Resolving charm: cs:~kubeflow-charmers/katib-controller-76
Resolving charm: cs:~charmed-osm/mariadb-k8s
Resolving charm: cs:~kubeflow-charmers/katib-manager-75
Resolving charm: cs:~kubeflow-charmers/katib-ui-71
Resolving charm: cs:~kubeflow-charmers/kubeflow-dashboard-36
Resolving charm: cs:~kubeflow-charmers/kubeflow-profiles-41
Resolving charm: cs:~kubeflow-charmers/metacontroller-68
Resolving charm: cs:~kubeflow-charmers/metadata-api-32
Resolving charm: cs:~charmed-osm/mariadb-k8s
Resolving charm: cs:~kubeflow-charmers/metadata-envoy-14
Resolving charm: cs:~kubeflow-charmers/metadata-grpc-13
Resolving charm: cs:~kubeflow-charmers/metadata-ui-34
Resolving charm: cs:~kubeflow-charmers/minio-78
Resolving charm: cs:~kubeflow-charmers/modeldb-backend-76
Resolving charm: cs:~charmed-osm/mariadb-k8s
Resolving charm: cs:~kubeflow-charmers/modeldb-store-69
Resolving charm: cs:~kubeflow-charmers/modeldb-ui-68
Resolving charm: cs:~kubeflow-charmers/oidc-gatekeeper-19
Resolving charm: cs:~kubeflow-charmers/pipelines-api-82
Resolving charm: cs:~charmed-osm/mariadb-k8s
Resolving charm: cs:~kubeflow-charmers/pipelines-persistence-167
Resolving charm: cs:~kubeflow-charmers/pipelines-scheduledworkflow-163
Resolving charm: cs:~kubeflow-charmers/pipelines-ui-78
Resolving charm: cs:~kubeflow-charmers/pipelines-viewer-102
Resolving charm: cs:~kubeflow-charmers/pipelines-visualization-13
Resolving charm: cs:~kubeflow-charmers/pytorch-operator-163
Resolving charm: cs:~kubeflow-charmers/seldon-core-15
Resolving charm: cs:~kubeflow-charmers/tf-job-operator-159
added resource oci-image
added resource oci-image
ERROR cannot deploy bundle: cannot add charm "cs:~kubeflow-charmers/argo-ui-78": cannot read entity archive: read tcp 10.1.71.99:36898->162.213.33.78:443: read: connection reset by peer
Command '('microk8s-juju.wrapper', 'deploy', 'cs:kubeflow', '--channel', 'stable', '--overlay', '/tmp/tmpw2vspkr3')' returned non-zero exit status 1
Failed to enable kubeflow

I've made several attempts, and the charm that it could not add is different every time. For example, one time I tried to enable, the charm was "cs:~kubeflow-charmers/argo-controller-162". Furthermore, the 'Deploying Kubeflow' part takes about 10 minutes.

Is there a verbose mode while running a microk8s enable action, so I can see what 'Deploying Kubeflow...' gets stuck on? Are there any commands in the enable.kubeflow action file I can run from the terminal to help diagnose this problem? Thanks for any help you can give, I'll also attach my Inspector file, which I created after this failure.
inspection-report-20200413_163326.tar.gz

EDIT:
Well, it appears this was fixed by ensuring the following nameservers were in resolv.conf (using the resolvconf service):
nameserver 8.8.8.8
nameserver 8.8.4.4
Is this requirement mentioned in any microk8s documentation that you're aware of? Apologies if it was and I couldn't find it.
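
A quick way to check whether a given nameserver is already configured, sketched under the assumption that the file uses the usual "nameserver ADDRESS" lines. The function name has_nameserver is hypothetical:

```shell
# Hypothetical check: succeed if the given resolv.conf-style file contains
# a 'nameserver <address>' line for the given address.
has_nameserver() {
    file="$1"
    ip="$2"
    awk -v ip="$ip" '$1 == "nameserver" && $2 == ip { found = 1 } END { exit !found }' "$file"
}

# Example against the real file:
#   has_nameserver /etc/resolv.conf 8.8.8.8 && echo present || echo missing
```

Note that on systems where resolv.conf is generated (for example by resolvconf, as here), the file should be changed through that service rather than edited directly.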

@knkski
Contributor

knkski commented Apr 13, 2020

@zshenkle: For future reference, you can enable more verbose logging like this:

KUBEFLOW_DEBUG=true microk8s.enable kubeflow
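
Since the debug output is long, it can help to keep a copy in a file for attaching to an issue. A minimal sketch, assuming nothing beyond the KUBEFLOW_DEBUG variable shown above; the wrapper name run_with_debug_log and the log file name are made up for illustration:

```shell
# Hypothetical wrapper: run a command with KUBEFLOW_DEBUG=true, save stdout
# and stderr to a log file, then print the log. The command's own exit
# status is returned, which a plain pipe into tee would hide.
run_with_debug_log() {
    log="$1"
    shift
    KUBEFLOW_DEBUG=true "$@" > "$log" 2>&1
    status=$?
    cat "$log"
    return "$status"
}

# Intended use:
#   run_with_debug_log kubeflow-debug.log microk8s.enable kubeflow
```

The output only appears once the command finishes; running tail -f on the log file in a second terminal shows progress while it runs.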

As far as the nameserver issue you ran into, I'm not sure why you would have to edit those in. Did you put those in /etc/resolv.conf on the system that you installed microk8s onto? I haven't encountered that issue, but we've got some relevant documentation here that might need to call that out if it's broadly necessary:

https://microk8s.io/docs/addon-dns

cc @ktsakalozos ^^

@rickyking

> @zshenkle: For future reference, you can enable more verbose logging like this:
>
> KUBEFLOW_DEBUG=true microk8s.enable kubeflow
>
> As far as the nameserver issue you ran into, I'm not sure why you would have to edit those in. Did you put those in /etc/resolv.conf on the system that you installed microk8s onto? I haven't encountered that issue, but we've got some relevant documentation here that might need to call that out if it's broadly necessary:
>
> https://microk8s.io/docs/addon-dns
>
> cc @ktsakalozos ^^

Well, if I'm behind a proxy, what should I set as the DNS server?

@zshenkle

> @zshenkle: For future reference, you can enable more verbose logging like this:
>
> KUBEFLOW_DEBUG=true microk8s.enable kubeflow
>
> As far as the nameserver issue you ran into, I'm not sure why you would have to edit those in. Did you put those in /etc/resolv.conf on the system that you installed microk8s onto? I haven't encountered that issue, but we've got some relevant documentation here that might need to call that out if it's broadly necessary:
>
> https://microk8s.io/docs/addon-dns
>
> cc @ktsakalozos ^^

@knkski thanks so much for that verbose logging tip! The machine I'm setting up microk8s on is relatively new, and we apparently hadn't gotten around to adding all the necessary nameservers (i.e., 8.8.8.8 and 8.8.4.4). And yes, we effectively put those in /etc/resolv.conf via the resolvconf systemd service.

@kawa23

kawa23 commented Apr 26, 2020

I also ran into this issue:

(base) root@iZwz91n3l4i42m2cawbmccZ:~# sudo microk8s.enable kubeflow
Enabling dns...
Enabling storage...
Enabling dashboard...
Enabling ingress...
Enabling metallb:10.64.140.43-10.64.140.49...
Waiting for DNS and storage plugins to finish setting up
Kubeflow could not be enabled:
timed out waiting for the condition on deployments/coredns
timed out waiting for the condition on deployments/hostpath-provisioner

Command '('microk8s-kubectl.wrapper', 'wait', '--for=condition=available', '-nkube-system', 'deployment/coredns', 'deployment/hostpath-provisioner', '--timeout=10m')' returned non-zero exit status 1
Failed to enable kubeflow

Here is my snap list:

(base) root@iZwz91n3l4i42m2cawbmccZ:~# snap list
Name      Version    Rev    Tracking          Publisher   Notes
core      16-2.44.3  9066   latest/stable     canonical✓  core
core18    20200311   1705   latest/stable     canonical✓  base
juju      2.7.6      11454  latest/stable     canonical✓  classic
lxd       4.0.1      14804  latest/stable     canonical✓  -
microk8s  v1.18.2    1357   latest/candidate  canonical✓  classic

And the microk8s.inspect output:

(base) root@iZwz91n3l4i42m2cawbmccZ:~# sudo microk8s.inspect
Inspecting services
  Service snap.microk8s.daemon-cluster-agent is running
  Service snap.microk8s.daemon-flanneld is running
  Service snap.microk8s.daemon-containerd is running
  Service snap.microk8s.daemon-apiserver is running
  Service snap.microk8s.daemon-apiserver-kicker is running
  Service snap.microk8s.daemon-proxy is running
  Service snap.microk8s.daemon-kubelet is running
  Service snap.microk8s.daemon-scheduler is running
  Service snap.microk8s.daemon-controller-manager is running
  Service snap.microk8s.daemon-etcd is running
  Copy service arguments to the final report tarball
Inspecting AppArmor configuration
Gathering system information
  Copy processes list to the final report tarball
  Copy snap list to the final report tarball
  Copy VM name (or none) to the final report tarball
  Copy disk usage information to the final report tarball
  Copy memory usage information to the final report tarball
  Copy server uptime to the final report tarball
  Copy current linux distribution to the final report tarball
  Copy openSSL information to the final report tarball
  Copy network configuration to the final report tarball
Inspecting kubernetes cluster
  Inspect kubernetes cluster

Building the report tarball
  Report tarball is at /var/snap/microk8s/1357/inspection-report-20200426_183743.tar.gz

Any help would be greatly appreciated.

@knkski
Contributor

knkski commented May 4, 2020

@kawa23, can you attach the /var/snap/microk8s/1357/inspection-report-20200426_183743.tar.gz file generated by microk8s.inspect?

@MathewStylianidis

MathewStylianidis commented May 11, 2020

@kawa23 I have the exact same problem.

Could it be that my Kubernetes version is not supported yet? It is 1.18, while Kubeflow on MicroK8s is in alpha status, and Kubernetes 1.18 is not even mentioned at https://www.kubeflow.org/docs/started/k8s/overview/#minimum-system-requirements.

@knkski
Contributor

knkski commented May 11, 2020

@MathewStylianidis: It shouldn't be an issue with 1.18. Would you be able to run microk8s.inspect and attach the tarball that it generates here?

@kawa23

kawa23 commented Jun 5, 2020

@knkski @markshuttle Sorry for the late reply. I have since installed kubeflow on a k8s cluster with multipass and removed microk8s, so I can't get /var/snap/microk8s/1357/inspection-report-20200426_183743.tar.gz.

@atamahjoubfar

I tried to install kubeflow and got the same error:

snap install microk8s --classic
microk8s.status --wait-ready
microk8s.enable dns dashboard storage
microk8s.enable gpu

Everything works up to here, but

microk8s.enable kubeflow

fails with this error:

Resolving charm: cs:~kubeflow-charmers/pytorch-operator-163
Resolving charm: cs:~kubeflow-charmers/seldon-core-15
Resolving charm: cs:~kubeflow-charmers/tf-job-operator-159
ERROR cannot deploy bundle: cannot add charm "cs:~kubeflow-charmers/ambassador-78": cannot retrieve charm "cs:~kubeflow-charmers/ambassador-78": cannot get archive: Get https://api.jujucharms.com/charmstore/v5/~kubeflow-charmers/ambassador-78/archive?channel=stable: dial tcp: lookup api.jujucharms.com on 10.152.183.10:53: server misbehaving
Command '('microk8s-juju.wrapper', 'deploy', 'cs:kubeflow', '--channel', 'stable', '--overlay', '/tmp/tmpeuizwk4k')' returned non-zero exit status 1
Failed to enable kubeflow

Any idea what is causing it?

@markshuttle
Contributor

Looks like a networking issue - it needs to be able to retrieve the charms from jujucharms.com. I think we should do some connectivity tests before we launch into the orchestration of kubeflow, so we catch and warn more appropriately if that is going to fail as it has here.
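
Such a pre-flight check could look roughly like the sketch below. It is an illustration of the idea only, not what microk8s does; the names probe_https and preflight are hypothetical, and the probe is passed in by name so that a different test (or a stub) can be substituted:

```shell
# One possible probe: an HTTPS HEAD request with a 10 second limit.
probe_https() {
    curl --silent --show-error --head --max-time 10 --output /dev/null "https://$1/"
}

# Hypothetical pre-flight check: run the probe against each host and report
# the ones that fail. Returns non-zero if any host is unreachable.
preflight() {
    probe="$1"
    shift
    failed=0
    for host in "$@"; do
        if "$probe" "$host"; then
            echo "ok: $host"
        else
            echo "UNREACHABLE: $host"
            failed=1
        fi
    done
    return "$failed"
}

# Intended use before 'microk8s.enable kubeflow':
#   preflight probe_https api.jujucharms.com || echo "fix networking first"
```

A probe run from the host only shows that the host can reach the charm store. The Juju controller runs inside the cluster and may resolve names differently, so a passing check here does not rule out an in-cluster DNS problem.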

@atamahjoubfar

> Looks like a networking issue - it needs to be able to retrieve the charms from jujucharms.com. I think we should do some connectivity tests before we launch into the orchestration of kubeflow, so we catch and warn more appropriately if that is going to fail as it has here.

But I can ping jujucharms.com from the same server.

@ktsakalozos
Member

@atamahjoubfar could you share the output of:

KUBEFLOW_DEBUG=true microk8s.enable kubeflow

@atamahjoubfar

> @atamahjoubfar could you share the output of:
>
> KUBEFLOW_DEBUG=true microk8s.enable kubeflow

It fails at

07:54:17 DEBUG httpbakery client.go:245 } -> error <nil>
07:54:17 INFO  cmd bundle.go:364 Resolving charm: cs:~kubeflow-charmers/tf-job-operator-159
07:54:17 DEBUG httpbakery client.go:243 client do GET https://api.jujucharms.com/charmstore/v5/~kubeflow-charmers/tf-job-operator-159/meta/any?include=id&include=supported-series&include=published {
07:54:17 DEBUG httpbakery client.go:245 } -> error <nil>
07:54:17 DEBUG httpbakery client.go:243 client do GET https://api.jujucharms.com/charmstore/v5/~kubeflow-charmers/ambassador-78/meta/any?include=id&include=supported-series&include=published {
07:54:17 DEBUG httpbakery client.go:245 } -> error <nil>
07:54:29 DEBUG juju.api monitor.go:35 RPC connection died
ERROR cannot deploy bundle: cannot add charm "cs:~kubeflow-charmers/ambassador-78": cannot retrieve charm "cs:~kubeflow-charmers/ambassador-78": cannot get archive: Get https://api.jujucharms.com/charmstore/v5/~kubeflow-charmers/ambassador-78/archive?channel=stable: dial tcp: lookup api.jujucharms.com on 10.152.183.10:53: server misbehaving
07:54:29 DEBUG cmd supercommand.go:519 error stack: 
cannot retrieve charm "cs:~kubeflow-charmers/ambassador-78": cannot get archive: Get https://api.jujucharms.com/charmstore/v5/~kubeflow-charmers/ambassador-78/archive?channel=stable: dial tcp: lookup api.jujucharms.com on 10.152.183.10:53: server misbehaving
/workspace/_build/src/github.com/juju/juju/rpc/client.go:178: 
/workspace/_build/src/github.com/juju/juju/api/apiclient.go:1187: 
/workspace/_build/src/github.com/juju/juju/api/client.go:459: 
/workspace/_build/src/github.com/juju/juju/cmd/juju/application/store.go:68: 
/workspace/_build/src/github.com/juju/juju/cmd/juju/application/bundle.go:543: cannot add charm "cs:~kubeflow-charmers/ambassador-78"
/workspace/_build/src/github.com/juju/juju/cmd/juju/application/bundle.go:475: 
/workspace/_build/src/github.com/juju/juju/cmd/juju/application/bundle.go:164: 
/workspace/_build/src/github.com/juju/juju/cmd/juju/application/deploy.go:969: cannot deploy bundle
/workspace/_build/src/github.com/juju/juju/cmd/juju/application/deploy.go:1540: 
Command '('microk8s-juju.wrapper', '--debug', 'deploy', 'cs:kubeflow', '--channel', 'stable', '--overlay', '/tmp/tmp6uznal1p')' returned non-zero exit status 1
Failed to enable kubeflow

@knkski
Contributor

knkski commented Jun 10, 2020

@atamahjoubfar: That looks like transient network issues. Does it work if you try it again?

@atamahjoubfar

> @atamahjoubfar: That looks like transient network issues. Does it work if you try it again?

I tried again, and unfortunately, the problem persists. Is there any way to manually download the charms and enable kubeflow from disk?

@knkski
Contributor

knkski commented Jun 11, 2020

@atamahjoubfar, @markshuttle: I've opened up #1296 to do a basic network connectivity check before enabling Kubeflow

@atamahjoubfar: If you follow the installation instructions in the README here for deploying the Kubeflow bundle instead of via microk8s.enable kubeflow, does it work for you?

https://github.com/juju-solutions/bundle-kubeflow/

@knkski
Contributor

knkski commented Jun 11, 2020

@kawa23, @MathewStylianidis: I believe the issue that you're facing is due to your hostname having capital letters, see this error message:

mai 11 15:06:58 SESTVSDV0002 microk8s.daemon-kubelet[24507]: E0511 15:06:58.330800 24507 csi_plugin.go:271] Failed to initialize CSINodeInfo: error updating CSINode annotation: timed out waiting for the condition; caused by: csinodes.storage.k8s.io "sestvsdv0002" is forbidden: User "system:node:SESTVSDV0002" cannot get resource "csinodes" in API group "storage.k8s.io" at the cluster scope: can only access CSINode with the same name as the requesting node

If you change your hostname to only have lowercase letters, does it work for you?
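
A quick way to test for this condition is sketched below in Python. The helper name `hostname_has_uppercase` is hypothetical, and the check only looks at the name reported by the operating system; it does not inspect the cluster itself.

```python
import socket


def hostname_has_uppercase(name=None):
    """Return True if the hostname contains any uppercase letter.

    When `name` is omitted, the local machine's hostname is used.
    """
    name = socket.gethostname() if name is None else name
    return any(ch.isupper() for ch in name)
```

For the node in the log line above, `hostname_has_uppercase("SESTVSDV0002")` is true, while the lowercase form that the API server expects, `sestvsdv0002`, passes the check.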

@atamahjoubfar

atamahjoubfar commented Jun 16, 2020

@knkski @markshuttle: I tried to follow these instructions:
https://github.com/juju-solutions/bundle-kubeflow/#setup-microk8s
But I got the same error:

+ juju deploy -m kubeflow kubeflow --channel stable --overlay=/tmp/tmp1rzpf5m9
Located bundle "cs:bundle/kubeflow-185"
Resolving charm: cs:~kubeflow-charmers/ambassador-78
Resolving charm: cs:~kubeflow-charmers/argo-controller-162
...
Resolving charm: cs:~kubeflow-charmers/seldon-core-15
Resolving charm: cs:~kubeflow-charmers/tf-job-operator-159
Executing changes:
- upload charm cs:~kubeflow-charmers/ambassador-78 for series kubernetes
ERROR cannot deploy bundle: cannot add charm "cs:~kubeflow-charmers/ambassador-78": cannot retrieve charm "cs:~kubeflow-charmers/ambassador-78": cannot get archive: Get "https://api.jujucharms.com/charmstore/v5/~kubeflow-charmers/ambassador-78/archive?channel=stable": dial tcp: i/o timeout
Command '('juju', 'deploy', '-m', 'kubeflow', 'kubeflow', '--channel', 'stable', '--overlay=/tmp/tmp1rzpf5m9')' returned non-zero exit status 1.

I installed microk8s from the edge channel, and now I see the error message that you added in #1296:

Couldn't contact api.jujucharms.com
Please check your network connectivity before enabling Kubeflow.
Failed to enable kubeflow

However, from the same server, the following wget works:

wget https://api.jujucharms.com/charmstore/v5/~kubeflow-charmers/ambassador-88/icon.svg

I can ping jujucharms.com, but not api.jujucharms.com. Any ideas?

@knkski
Contributor

knkski commented Jun 23, 2020

@atamahjoubfar: Can you paste the output from this command?

microk8s.kubectl get configmap -n kube-system coredns -oyaml

@atamahjoubfar

atamahjoubfar commented Jun 24, 2020

> @atamahjoubfar: Can you paste the output from this command?
>
> microk8s.kubectl get configmap -n kube-system coredns -oyaml

apiVersion: v1
data:
  Corefile: |
    .:53 {
        errors
        health {
          lameduck 5s
        }
        ready
        log . {
          class error
        }
        kubernetes cluster.local in-addr.arpa ip6.arpa {
          pods insecure
          fallthrough in-addr.arpa ip6.arpa
        }
        prometheus :9153
        forward . 8.8.8.8 8.8.4.4
        cache 30
        loop
        reload
        loadbalance
    }
kind: ConfigMap
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","data":{"Corefile":".:53 {\n    errors\n    health {\n      lameduck 5s\n    }\n    ready\n    log . {\n      class error\n    }\n    kubernetes cluster.local in-addr.arpa ip6.arpa {\n      pods insecure\n      fallthrough in-addr.arpa ip6.arpa\n    }\n    prometheus :9153\n    forward . 8.8.8.8 8.8.4.4\n    cache 30\n    loop\n    reload\n    loadbalance\n}\n"},"kind":"ConfigMap","metadata":{"annotations":{},"labels":{"addonmanager.kubernetes.io/mode":"EnsureExists","k8s-app":"kube-dns"},"name":"coredns","namespace":"kube-system"}}
  creationTimestamp: "2020-06-23T22:20:52Z"
  labels:
    addonmanager.kubernetes.io/mode: EnsureExists
    k8s-app: kube-dns
  managedFields:
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:data:
        .: {}
        f:Corefile: {}
      f:metadata:
        f:annotations:
          .: {}
          f:kubectl.kubernetes.io/last-applied-configuration: {}
        f:labels:
          .: {}
          f:addonmanager.kubernetes.io/mode: {}
          f:k8s-app: {}
    manager: kubectl
    operation: Update
    time: "2020-06-23T22:20:52Z"
  name: coredns
  namespace: kube-system
  resourceVersion: "1124"
  selfLink: /api/v1/namespaces/kube-system/configmaps/coredns
  uid: 588deb7d-bf61-4096-a553-da7696eea724

Does this help to identify the issue? Thank you.
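
The line that matters most in this output is `forward . 8.8.8.8 8.8.4.4`, which tells CoreDNS to send names it cannot answer itself to those two upstream resolvers. As a rough illustration, the Python sketch below pulls the upstream addresses out of a Corefile. It is a simple whitespace-based reader, not a full Corefile parser, and the function name `upstream_resolvers` is an assumption.

```python
COREFILE = """\
.:53 {
    errors
    kubernetes cluster.local in-addr.arpa ip6.arpa {
      pods insecure
      fallthrough in-addr.arpa ip6.arpa
    }
    forward . 8.8.8.8 8.8.4.4
    cache 30
}
"""


def upstream_resolvers(corefile):
    """Return the upstream targets listed on `forward` lines of a Corefile."""
    upstreams = []
    for line in corefile.splitlines():
        tokens = line.split()
        # A forward line has the shape: forward FROM TO...
        if len(tokens) >= 3 and tokens[0] == "forward":
            # Skip an opening brace if the directive starts an option block.
            upstreams.extend(t for t in tokens[2:] if t != "{")
    return upstreams


print(upstream_resolvers(COREFILE))
```

If the listed upstreams are not reachable from the pod network, lookups for external names such as api.jujucharms.com would be expected to time out even though the host itself resolves them.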

@bipinm

bipinm commented Jul 21, 2020

I got the same error on Ubuntu 20.04 Desktop. I am not behind any firewall or proxy, and I can manually fetch https://api.jujucharms.com/charmstore/v5/~kubeflow-charmers/ambassador-104/archive?channel=stable, which downloads ambassador.zip (2.6 MB). I do not see this error on Ubuntu 20.04 Server (in AWS).

Tried with both
microk8s v1.18.4
microk8s v1.18.6 1560 latest/edge

microk8s.kubectl get configmap -n kube-system coredns -oyaml gives almost the same output as above; only creationTimestamp, time, uid, and resourceVersion ("390") differ.

Output of microk8s.enable kubeflow

...
Deploying Kubeflow...
Kubeflow could not be enabled:
Located bundle "cs:bundle/kubeflow-206"
Resolving charm: cs:~kubeflow-charmers/ambassador-104
Resolving charm: cs:~kubeflow-charmers/argo-controller-190
...
Resolving charm: cs:~kubeflow-charmers/seldon-core-44
Resolving charm: cs:~kubeflow-charmers/tf-job-operator-188
ERROR cannot deploy bundle: cannot add charm "cs:~kubeflow-charmers/ambassador-104": cannot retrieve charm "cs:~kubeflow-charmers/ambassador-104": cannot get archive: Get https://api.jujucharms.com/charmstore/v5/~kubeflow-charmers/ambassador-104/archive?channel=stable: dial tcp: lookup api.jujucharms.com on 10.152.183.10:53: server misbehaving

Command '('microk8s-juju.wrapper', 'deploy', 'cs:kubeflow-206', '--channel', 'stable', '--overlay', '/tmp/tmpf6ehtx10')' returned non-zero exit status 1
Failed to enable kubeflow

Errors from coredns pod

.:53
[INFO] plugin/reload: Running configuration MD5 = be0f52d3c13480652e0c73672f2fa263
CoreDNS-1.6.6
linux/amd64, go1.13.5, 6a7a75e
[INFO] 127.0.0.1:59495 - 39173 "HINFO IN 2577415620770853004.8780143945028532492. udp 57 false 512" NOERROR - 0 6.011622149s
[ERROR] plugin/errors: 2 2577415620770853004.8780143945028532492. HINFO: read udp 10.1.21.36:38427->8.8.8.8:53: i/o timeout
[INFO] 127.0.0.1:37728 - 42831 "HINFO IN 2577415620770853004.8780143945028532492. udp 57 false 512" NOERROR - 0 2.000464057s
...
[ERROR] plugin/errors: 2 2577415620770853004.8780143945028532492. HINFO: read udp 10.1.21.36:44631->8.8.4.4:53: i/o timeout
[INFO] 10.1.21.38:33385 - 46047 "AAAA IN api.jujucharms.com.localdomain. udp 48 false 512" NOERROR - 0 2.000755655s
[ERROR] plugin/errors: 2 api.jujucharms.com.localdomain. AAAA: read udp 10.1.21.36:57766->8.8.8.8:53: i/o timeout
[INFO] 10.1.21.38:36178 - 22767 "A IN api.jujucharms.com.localdomain. udp 48 false 512" NOERROR - 0 2.000677982s
[ERROR] plugin/errors: 2 api.jujucharms.com.localdomain. A: read udp 10.1.21.36:44211->8.8.8.8:53: i/o timeout
[INFO] 10.1.21.38:60455 - 56232 "AAAA IN api.jujucharms.com.localdomain. udp 48 false 512" NOERROR - 0 2.000352953s
[ERROR] plugin/errors: 2 api.jujucharms.com.localdomain. AAAA: read udp 10.1.21.36:45297->8.8.8.8:53: i/o timeout
[INFO] 10.1.21.38:50299 - 52151 "A IN api.jujucharms.com.localdomain. udp 48 false 512" NOERROR - 0 2.000257257s
[ERROR] plugin/errors: 2 api.jujucharms.com.localdomain. A: read udp 10.1.21.36:40710->8.8.4.4:53: i/o timeout
[INFO] 10.1.21.38:39253 - 3642 "AAAA IN api.jujucharms.com. udp 36 false 512" NOERROR - 0 2.000834815s
[ERROR] plugin/errors: 2 api.jujucharms.com. AAAA: read udp 10.1.21.36:38538->8.8.8.8:53: i/o timeout
[INFO] 10.1.21.38:47372 - 10457 "A IN api.jujucharms.com. udp 36 false 512" NOERROR - 0 2.000768237s
[ERROR] plugin/errors: 2 api.jujucharms.com. A: read udp 10.1.21.36:33745->8.8.4.4:53: i/o timeout
[INFO] 10.1.21.38:47852 - 7227 "AAAA IN api.jujucharms.com. udp 36 false 512" NOERROR - 0 2.000442768s
[ERROR] plugin/errors: 2 api.jujucharms.com. AAAA: read udp 10.1.21.36:42672->8.8.4.4:53: i/o timeout
[INFO] 10.1.21.38:60290 - 23521 "A IN api.jujucharms.com. udp 36 false 512" NOERROR - 0 2.000334072s
[ERROR] plugin/errors: 2 api.jujucharms.com. A: read udp 10.1.21.36:52312->8.8.8.8:53: i/o timeout
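
To summarise logs like these, the Python sketch below counts `i/o timeout` errors per upstream resolver. The regular expression is an assumption based only on the lines shown in this comment, so other CoreDNS versions or error types may need a different pattern.

```python
import re
from collections import Counter

# Matches lines such as:
# [ERROR] plugin/errors: 2 api.jujucharms.com. A: read udp 10.1.21.36:33745->8.8.4.4:53: i/o timeout
TIMEOUT_RE = re.compile(
    r"\[ERROR\] plugin/errors: \d+ (?P<name>\S+) (?P<qtype>[A-Z]+): "
    r"read udp \S+->(?P<upstream>[\d.]+):\d+: i/o timeout"
)

SAMPLE_LOG = """\
[ERROR] plugin/errors: 2 api.jujucharms.com. AAAA: read udp 10.1.21.36:38538->8.8.8.8:53: i/o timeout
[INFO] 10.1.21.38:47372 - 10457 "A IN api.jujucharms.com. udp 36 false 512" NOERROR - 0 2.000768237s
[ERROR] plugin/errors: 2 api.jujucharms.com. A: read udp 10.1.21.36:33745->8.8.4.4:53: i/o timeout
[ERROR] plugin/errors: 2 api.jujucharms.com. A: read udp 10.1.21.36:52312->8.8.8.8:53: i/o timeout
"""


def timeouts_by_upstream(log_text):
    """Count forwarded queries that timed out, keyed by upstream resolver IP."""
    counts = Counter()
    for match in TIMEOUT_RE.finditer(log_text):
        counts[match.group("upstream")] += 1
    return counts


print(timeouts_by_upstream(SAMPLE_LOG))
```

In the excerpt above every forwarded query times out, against both 8.8.8.8 and 8.8.4.4, which points at the pod's path to the upstream resolvers rather than at one particular resolver.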

@knkski
Contributor

knkski commented Jul 22, 2020

@bipinm: That output looks like the crux of the problem, and it is unrelated to the original issue in this thread, so I'm going to open a new issue for it:

#1427

@bipinm

bipinm commented Jul 23, 2020

@knkski Thanks. I was actually referring to the post from @atamahjoubfar on Jun 10th; I got the same error.
I will keep investigating when I get time and post any updates in the new issue you created.

@knkski
Contributor

knkski commented Jul 27, 2020

I think all of the issues in this thread have been resolved. If not, feel free to open a new issue.
