
Windows Terraform - SSH authentication failed #43

Closed
tinohager opened this issue Feb 9, 2021 · 11 comments
Assignees
Labels
kind/documentation Improvements or additions to documentation

Comments

@tinohager

tinohager commented Feb 9, 2021

Hi, I am trying to create a k3s cluster on Hetzner Cloud with this Terraform script, but the script runs into a timeout when connecting to the machine over SSH. I manually created a server and assigned the key, and the key works fine there. But when I start the script it unfortunately does not work. Where does it look for the private key?

Error: timeout - last error: SSH authentication failed (root@XXX.XXX.XXX.XXX:22): ssh: handshake failed: ssh: unable to authenticate, attempted methods [none], no supported methods remain

module.k3s.null_resource.k8s_ca_certificates_install[0] (remote-exec): Connecting to remote host via SSH...
module.k3s.null_resource.k8s_ca_certificates_install[0] (remote-exec): Host: XXX.XXX.XXX.XXX
module.k3s.null_resource.k8s_ca_certificates_install[0] (remote-exec): User: root
module.k3s.null_resource.k8s_ca_certificates_install[0] (remote-exec): Password: false
module.k3s.null_resource.k8s_ca_certificates_install[0] (remote-exec): Private key: false
module.k3s.null_resource.k8s_ca_certificates_install[0] (remote-exec): Certificate: false
module.k3s.null_resource.k8s_ca_certificates_install[0] (remote-exec): SSH Agent: false
module.k3s.null_resource.k8s_ca_certificates_install[0] (remote-exec): Checking Host Key: false

My Windows commands to generate a key:

# Change the Windows Service config for "OpenSSH Authentication Agent"
sc config "ssh-agent" start=delayed-auto
sc start "ssh-agent"

# Create a private/public key pair
ssh-keygen -t ecdsa -b 521 -f myKey

ssh-add myKey
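
A quick way to sanity-check this setup before running terraform apply is to confirm that the agent actually holds the key and that the server accepts it. A minimal sketch, assuming the placeholder address stands in for one of the cluster nodes:

# List the identities held by the agent; "The agent has no identities."
# means the ssh-add above never reached the running agent.
ssh-add -l

# Confirm that the key itself is accepted by the server
ssh -i myKey root@XXX.XXX.XXX.XXX "echo key auth works"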
@tinohager tinohager changed the title Error: timeout - last error: SSH authentication failed (root@XXX.XXX.XXX.XXX:22): ssh: handshake failed: ssh: unable to authenticate, attempted methods [none], no supported methods remain Windows Terraform - SSH authentication failed Feb 9, 2021
@xunleii xunleii self-assigned this Feb 10, 2021
@xunleii xunleii added the bug label Feb 10, 2021
@xunleii
Owner

xunleii commented Feb 10, 2021

I don't know why Terraform doesn't use your SSH agent 🤔 ... Just to be sure, are your k3s instances created with your public key?

Also, if I remember correctly, SSH agent authentication is only available through Pageant on Windows (cf. https://www.terraform.io/docs/language/resources/provisioners/connection.html#agent), so I don't understand why it works for your other instances.

I'm sorry, I have never used Terraform on Windows directly (only in WSL), so I don't know how to resolve this issue :(

@tinohager
Author

tinohager commented Feb 10, 2021

I have now tried with Pageant. I can connect with PuTTY via Pageant (without a password), but it does not work with Terraform.

I am not sure whether this should be true here:

module.k3s.null_resource.k8s_ca_certificates_install[4] (remote-exec): SSH Agent: false

https://stackoverflow.com/a/58781305/6097503

@tinohager
Author

tinohager commented Feb 10, 2021

I have installed Ubuntu on my Windows machine with WSL2, but I get the same error...
I can connect via SSH with a key file from this Linux machine to the Linux server in the Hetzner Cloud.

Error: timeout - last error: SSH authentication failed (root@XXX.XXX.XXX.XXX:22): ssh: handshake failed: ssh: unable to authenticate, attempted methods [none], no supported methods remain

ssh-keygen
more /root/.ssh/id_rsa.pub

curl -fsSL https://apt.releases.hashicorp.com/gpg | sudo apt-key add -
sudo apt-add-repository "deb [arch=$(dpkg --print-architecture)] https://apt.releases.hashicorp.com $(lsb_release -cs) main"
apt install terraform
git clone https://github.com/xunleii/terraform-module-k3s.git
cd terraform-module-k3s/examples/hcloud-k3s/
terraform init
terraform apply

@xunleii
Owner

xunleii commented Feb 12, 2021

I don't see why it is not working. As far as I can see, you are using the given example.

Do you get something like this when you use WSL?

module.k3s.null_resource.k8s_ca_certificates_install[0] (remote-exec): Connecting to remote host via SSH...
module.k3s.null_resource.k8s_ca_certificates_install[0] (remote-exec): Host: XXX.XXX.XXX.XXX
module.k3s.null_resource.k8s_ca_certificates_install[0] (remote-exec): User: root
module.k3s.null_resource.k8s_ca_certificates_install[0] (remote-exec): Password: false
module.k3s.null_resource.k8s_ca_certificates_install[0] (remote-exec): Private key: false
module.k3s.null_resource.k8s_ca_certificates_install[0] (remote-exec): Certificate: false
module.k3s.null_resource.k8s_ca_certificates_install[0] (remote-exec): SSH Agent: true
module.k3s.null_resource.k8s_ca_certificates_install[0] (remote-exec): Checking Host Key: false

If yes, does running the following commands solve your problem?

git clone https://github.com/xunleii/terraform-module-k3s.git
cd terraform-module-k3s/examples/hcloud-k3s/

ssh-keygen -f hcloud.id_rsa
ssh-add hcloud.id_rsa
terraform init
terraform apply --var ssh_key="$(cat hcloud.id_rsa.pub)"

Sorry if it takes some time to solve this problem; I have never encountered this issue 😞

@tinohager
Author

Thanks for your support. I won't be back in the office until Monday; then I will review your suggestion.

@tinohager
Author

tinohager commented Feb 18, 2021

First I got this error: Could not open a connection to your authentication agent.
I fixed it by executing eval "$(ssh-agent)".
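
For completeness, the whole agent setup in a fresh WSL shell looks roughly like this (a sketch, using the key path from the suggested commands above):

# Start an ssh-agent for this shell and export its environment variables
eval "$(ssh-agent -s)"

# Load the private key generated for the hcloud example
ssh-add hcloud.id_rsa

# Verify the key is now visible to agent clients such as Terraform
ssh-add -l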

After that I have problems with the certificates:

module.k3s.null_resource.agents_label["k3s-agent-0_node|node.kubernetes.io/pool"]: Creation complete after 13s [id=4521186966452460074]
module.k3s.null_resource.agents_label["k3s-agent-1_node|node.kubernetes.io/pool"] (remote-exec): node/k3s-agent-1 labeled
module.k3s.null_resource.agents_label["k3s-agent-2_node|node.kubernetes.io/pool"] (remote-exec): node/k3s-agent-2 labeled
module.k3s.null_resource.agents_label["k3s-agent-1_node|node.kubernetes.io/pool"]: Creation complete after 14s [id=495787829675262765]
module.k3s.null_resource.agents_label["k3s-agent-2_node|node.kubernetes.io/pool"]: Creation complete after 14s [id=8595809587793038789]
module.k3s.null_resource.kubernetes_ready: Creating...
module.k3s.null_resource.kubernetes_ready: Creation complete after 0s [id=8453258558289144403]
kubernetes_service_account.bootstrap: Creating...
kubernetes_cluster_role_binding.boostrap: Creating...

Error: Post "https://XX.XX.XX.XX:6443/apis/rbac.authorization.k8s.io/v1/clusterrolebindings": x509: certificate signed by unknown authority

Error: Post "https://XX.XX.XX.XX:6443/api/v1/namespaces/default/serviceaccounts": x509: certificate signed by unknown authority

@xunleii
Owner

xunleii commented Feb 18, 2021

Thanks for your help. First, good news: your cluster is provisioned. But this certificate error is really weird and should not occur. Have you changed anything in example/hcloud-k3s?

Can you try this inside the example/hcloud-k3s directory?

# add output with the generated kubeconfig
cat <<EOF >> outputs.tf

output "kubeconfig" {
  value = module.k3s.kube_config
}
EOF

# generate the kubeconfig
terraform output -raw kubeconfig > kubeconfig

# test if you can access on your side
KUBECONFIG=./kubeconfig kubectl version

I think you will have the same certificate problem. If it doesn't work, can you compare the certificates inside the generated kubeconfig with the ones present in /etc/rancher/k3s/k3s.yaml on a control-plane node? They probably differ.
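
One way to run that comparison, assuming the kubeconfig generated above and SSH access to the first control-plane node (the address is a placeholder):

# CA embedded in the kubeconfig generated by Terraform
grep certificate-authority-data kubeconfig

# CA embedded in the kubeconfig written by k3s on the control plane
ssh root@XXX.XXX.XXX.XXX "grep certificate-authority-data /etc/rancher/k3s/k3s.yaml"

# If the two base64 values differ, the local kubeconfig trusts the wrong CA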

@tinohager
Author

tinohager commented Feb 19, 2021

Hi, I have reset my WSL Ubuntu container and also created a new clean project in Hetzner Cloud. Now the Terraform setup completes without a failure.

Now I have tried to add the new cluster to my Rancher, but it always stays pending.

kubectl get events

root@k3s-control-plane-0:~# kubectl get events
LAST SEEN   TYPE      REASON                    OBJECT                     MESSAGE
25m         Normal    Starting                  node/k3s-agent-0           Starting kubelet.
25m         Warning   InvalidDiskCapacity       node/k3s-agent-0           invalid capacity 0 on image filesystem
25m         Normal    NodeHasSufficientMemory   node/k3s-agent-0           Node k3s-agent-0 status is now: NodeHasSufficientMemory
25m         Normal    NodeHasNoDiskPressure     node/k3s-agent-0           Node k3s-agent-0 status is now: NodeHasNoDiskPressure
25m         Normal    NodeHasSufficientPID      node/k3s-agent-0           Node k3s-agent-0 status is now: NodeHasSufficientPID
25m         Normal    NodeAllocatableEnforced   node/k3s-agent-0           Updated Node Allocatable limit across pods
25m         Normal    Starting                  node/k3s-agent-0           Starting kube-proxy.
25m         Normal    NodeReady                 node/k3s-agent-0           Node k3s-agent-0 status is now: NodeReady
25m         Normal    RegisteredNode            node/k3s-agent-0           Node k3s-agent-0 event: Registered Node k3s-agent-0 in Controller
25m         Normal    Starting                  node/k3s-agent-1           Starting kubelet.
25m         Warning   InvalidDiskCapacity       node/k3s-agent-1           invalid capacity 0 on image filesystem
25m         Normal    NodeHasSufficientMemory   node/k3s-agent-1           Node k3s-agent-1 status is now: NodeHasSufficientMemory
25m         Normal    NodeHasNoDiskPressure     node/k3s-agent-1           Node k3s-agent-1 status is now: NodeHasNoDiskPressure
25m         Normal    NodeHasSufficientPID      node/k3s-agent-1           Node k3s-agent-1 status is now: NodeHasSufficientPID
25m         Normal    NodeAllocatableEnforced   node/k3s-agent-1           Updated Node Allocatable limit across pods
25m         Normal    Starting                  node/k3s-agent-1           Starting kube-proxy.
25m         Normal    NodeReady                 node/k3s-agent-1           Node k3s-agent-1 status is now: NodeReady
25m         Normal    RegisteredNode            node/k3s-agent-1           Node k3s-agent-1 event: Registered Node k3s-agent-1 in Controller
25m         Normal    Starting                  node/k3s-agent-2           Starting kubelet.
25m         Warning   InvalidDiskCapacity       node/k3s-agent-2           invalid capacity 0 on image filesystem
25m         Normal    NodeHasSufficientMemory   node/k3s-agent-2           Node k3s-agent-2 status is now: NodeHasSufficientMemory
25m         Normal    NodeHasNoDiskPressure     node/k3s-agent-2           Node k3s-agent-2 status is now: NodeHasNoDiskPressure
25m         Normal    NodeHasSufficientPID      node/k3s-agent-2           Node k3s-agent-2 status is now: NodeHasSufficientPID
25m         Normal    NodeAllocatableEnforced   node/k3s-agent-2           Updated Node Allocatable limit across pods
25m         Normal    Starting                  node/k3s-agent-2           Starting kube-proxy.
25m         Normal    NodeReady                 node/k3s-agent-2           Node k3s-agent-2 status is now: NodeReady
25m         Normal    RegisteredNode            node/k3s-agent-2           Node k3s-agent-2 event: Registered Node k3s-agent-2 in Controller
26m         Normal    Starting                  node/k3s-control-plane-0   Starting kubelet.
26m         Warning   InvalidDiskCapacity       node/k3s-control-plane-0   invalid capacity 0 on image filesystem
26m         Normal    NodeHasSufficientMemory   node/k3s-control-plane-0   Node k3s-control-plane-0 status is now: NodeHasSufficientMemory
26m         Normal    NodeHasNoDiskPressure     node/k3s-control-plane-0   Node k3s-control-plane-0 status is now: NodeHasNoDiskPressure
26m         Normal    NodeHasSufficientPID      node/k3s-control-plane-0   Node k3s-control-plane-0 status is now: NodeHasSufficientPID
26m         Normal    NodeAllocatableEnforced   node/k3s-control-plane-0   Updated Node Allocatable limit across pods
26m         Normal    Starting                  node/k3s-control-plane-0   Starting kube-proxy.
25m         Normal    NodeReady                 node/k3s-control-plane-0   Node k3s-control-plane-0 status is now: NodeReady
25m         Normal    RegisteredNode            node/k3s-control-plane-0   Node k3s-control-plane-0 event: Registered Node k3s-control-plane-0 in Controller
25m         Normal    Starting                  node/k3s-control-plane-1   Starting kubelet.
25m         Warning   InvalidDiskCapacity       node/k3s-control-plane-1   invalid capacity 0 on image filesystem
25m         Normal    NodeHasSufficientMemory   node/k3s-control-plane-1   Node k3s-control-plane-1 status is now: NodeHasSufficientMemory
25m         Normal    NodeHasNoDiskPressure     node/k3s-control-plane-1   Node k3s-control-plane-1 status is now: NodeHasNoDiskPressure
25m         Normal    NodeHasSufficientPID      node/k3s-control-plane-1   Node k3s-control-plane-1 status is now: NodeHasSufficientPID
25m         Normal    NodeAllocatableEnforced   node/k3s-control-plane-1   Updated Node Allocatable limit across pods
25m         Normal    Starting                  node/k3s-control-plane-1   Starting kube-proxy.
25m         Normal    NodeReady                 node/k3s-control-plane-1   Node k3s-control-plane-1 status is now: NodeReady
25m         Normal    RegisteredNode            node/k3s-control-plane-1   Node k3s-control-plane-1 event: Registered Node k3s-control-plane-1 in Controller
25m         Normal    Starting                  node/k3s-control-plane-2   Starting kubelet.
25m         Warning   InvalidDiskCapacity       node/k3s-control-plane-2   invalid capacity 0 on image filesystem
25m         Normal    NodeHasSufficientMemory   node/k3s-control-plane-2   Node k3s-control-plane-2 status is now: NodeHasSufficientMemory
25m         Normal    NodeHasNoDiskPressure     node/k3s-control-plane-2   Node k3s-control-plane-2 status is now: NodeHasNoDiskPressure
25m         Normal    NodeHasSufficientPID      node/k3s-control-plane-2   Node k3s-control-plane-2 status is now: NodeHasSufficientPID
25m         Normal    NodeAllocatableEnforced   node/k3s-control-plane-2   Updated Node Allocatable limit across pods
25m         Normal    Starting                  node/k3s-control-plane-2   Starting kube-proxy.
25m         Normal    NodeReady                 node/k3s-control-plane-2   Node k3s-control-plane-2 status is now: NodeReady
25m         Normal    RegisteredNode            node/k3s-control-plane-2   Node k3s-control-plane-2 event: Registered Node k3s-control-plane-2 in Controller

kubectl get pods --show-labels --all-namespaces

NAMESPACE       NAME                                      READY   STATUS    RESTARTS   AGE   LABELS
cattle-system   cattle-cluster-agent-867b645bf4-852ll     0/1     Pending   0          17m   app=cattle-cluster-agent,pod-template-hash=867b645bf4
kube-system     coredns-854c77959c-8jmp2                  0/1     Pending   0          19m   k8s-app=kube-dns,pod-template-hash=854c77959c
kube-system     helm-install-traefik-qwl78                0/1     Pending   0          19m   controller-uid=d4d0cd35-6752-4089-9385-5f192a34d47c,helmcharts.helm.cattle.io/chart=traefik,job-name=helm-install-traefik
kube-system     local-path-provisioner-7c458769fb-l2g5h   0/1     Pending   0          19m   app=local-path-provisioner,pod-template-hash=7c458769fb
kube-system     metrics-server-86cbb8457f-cc9jk           0/1     Pending   0          19m   k8s-app=metrics-server,pod-template-hash=86cbb8457f

kubectl --namespace=kube-system describe pod helm-install-traefik-qwl78

root@k3s-control-plane-0:~# kubectl --namespace=kube-system describe pod helm-install-traefik-qwl78
Name:           helm-install-traefik-qwl78
Namespace:      kube-system
Priority:       0
Node:           <none>
Labels:         controller-uid=d4d0cd35-6752-4089-9385-5f192a34d47c
                helmcharts.helm.cattle.io/chart=traefik
                job-name=helm-install-traefik
Annotations:    helmcharts.helm.cattle.io/configHash: SHA256=1155364EEC7C9D81A413F9E187ED8628CD250E20343E081F0FB08A8BB4E101CD
Status:         Pending
IP:
IPs:            <none>
Controlled By:  Job/helm-install-traefik
Containers:
  helm:
    Image:      rancher/klipper-helm:v0.4.3
    Port:       <none>
    Host Port:  <none>
    Args:
      install
    Environment:
      NAME:              traefik
      VERSION:
      REPO:
      HELM_DRIVER:       secret
      CHART_NAMESPACE:   kube-system
      CHART:             https://%{KUBERNETES_API}%/static/charts/traefik-1.81.0.tgz
      HELM_VERSION:
      TARGET_NAMESPACE:  kube-system
      NO_PROXY:          .svc,.cluster.local,10.42.0.0/16,10.43.0.0/16
    Mounts:
      /chart from content (rw)
      /config from values (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from helm-traefik-token-hv62r (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  values:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      chart-values-traefik
    Optional:  false
  content:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      chart-content-traefik
    Optional:  false
  helm-traefik-token-hv62r:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  helm-traefik-token-hv62r
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  18m   default-scheduler  0/2 nodes are available: 2 node(s) had taint {node.cloudprovider.kubernetes.io/uninitialized: true}, that the pod didn't tolerate.
  Warning  FailedScheduling  18m   default-scheduler  0/2 nodes are available: 2 node(s) had taint {node.cloudprovider.kubernetes.io/uninitialized: true}, that the pod didn't tolerate.
  Warning  FailedScheduling  17m   default-scheduler  0/3 nodes are available: 3 node(s) had taint {node.cloudprovider.kubernetes.io/uninitialized: true}, that the pod didn't tolerate.
  Warning  FailedScheduling  17m   default-scheduler  0/6 nodes are available: 1 node(s) had taint {dedicated: gpu}, that the pod didn't tolerate, 5 node(s) had taint {node.cloudprovider.kubernetes.io/uninitialized: true}, that the pod didn't tolerate.

@xunleii
Owner

xunleii commented Feb 19, 2021

I'm sorry for that, this behavior is "expected" but not documented.
All your nodes are currently uninitialized (which is described by the taint node.cloudprovider.kubernetes.io/uninitialized: true), because they need the hcloud-cloud-controller-manager (see https://kubernetes.io/docs/tasks/administer-cluster/running-cloud-controller/ for more documentation on this subject).

This is due to the k3s flag --kubelet-arg cloud-provider=external (cf. https://github.com/xunleii/terraform-module-k3s/blob/master/examples/hcloud-k3s/k3s.tf#L16). In order to fix that, you have two choices:

  • using the cloud-controller-manager from Hetzner (I recommend it, because it gives you the ability to annotate your nodes with some useful labels like failure-domain.beta.kubernetes.io/region or failure-domain.beta.kubernetes.io/zone, which can be used for high-availability applications across several availability zones; a rough install sketch follows below). Be careful: this controller must only be used if your nodes are hosted on Hetzner Cloud. If you use another cloud provider or something else (vSphere for example), you must use the matching cloud-controller-manager.
  • removing the flags from the list.

EDIT: also, hcloud-k3s is only an example, and I do not recommend using it as is; for instance, one node has the taint dedicated: gpu
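
For the first option, the installation documented by the hcloud-cloud-controller-manager project looks roughly like the sketch below; the secret name and manifest URL are taken from that project's README, so double-check them against the current documentation before applying:

# Give the controller access to the Hetzner Cloud API
kubectl -n kube-system create secret generic hcloud --from-literal=token=<your hcloud API token>

# Deploy the cloud-controller-manager; once it initializes the nodes, the
# node.cloudprovider.kubernetes.io/uninitialized taint is removed
kubectl apply -f https://github.com/hetznercloud/hcloud-cloud-controller-manager/releases/latest/download/ccm.yaml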

@xunleii xunleii added documentation and removed bug labels Feb 19, 2021
@tinohager
Author

Okay I see. I'm probably a born tester for this project. 🤪
Thanks for your quick feedback and great support.

Maybe you could publish another example that generates a normal cluster that can be used in Rancher without further steps - that would be really great. Of course I would be happy to test it again.

@xunleii
Owner

xunleii commented Feb 19, 2021

All testers are welcome 😉
Thanks too for your responses and for this issue; I need to write more documentation in order to make this project easier to use and to debug.

Also, I will try to add an example with a very simple cluster on a less "exotic" provider (like GCP or AWS).

@xunleii xunleii removed the question label Feb 19, 2021
@xunleii xunleii added the kind/documentation Improvements or additions to documentation label Oct 5, 2021
@xunleii xunleii closed this as completed Nov 11, 2023