Workspace fails to start with DNS configuration issue #14050

Closed
8 of 23 tasks
SDAdham opened this issue Jul 28, 2019 · 18 comments
Labels
area/plugin-broker kind/bug Outline of a bug - must adhere to the bug report template. status/analyzing An issue has been proposed and it is currently being analyzed for effort and implementation approach

Comments

@SDAdham

SDAdham commented Jul 28, 2019

Describe the bug

When starting up any workspace, I get the following message:

Starting Init Plugin Broker
Cleaning /plugins dir
Unified Che Plugin Broker
List of plugins and editors to install
- che-incubator/typescript/latest - Typescript language features
- eclipse/che-machine-exec-plugin/0.0.1 - Che Plug-in with che-machine-exec service to provide creation terminal or tasks for Eclipse CHE workspace machines.
- eclipse/che-theia/7.0.0-next - Eclipse Theia, get the latest release each day.
Starting Che plugins and editor processing
Starting VS Code and Theia plugins processing
Downloading VS Code extension for plugin 'che-incubator/typescript/latest'
Get https://github.com/che-incubator/ms-code.typescript/releases/download/v1.35.1/che-typescript-language-1.35.1.vsix: x509: certificate is valid for <myDomain>, not github.com
Error: Failed to run the workspace: "Plugins installation process failed. Error: Plugin broking process for workspace workspacetmujzgxq3e83ylyx failed with error: Get https://github.com/che-incubator/ms-code.typescript/releases/download/v1.35.1/che-typescript-language-1.35.1.vsix: x509: certificate is valid for <myDomain>, not github.com"

Che version

  • latest
  • nightly
  • other: please specify

7.0.0-rc-5.0-SNAPSHOT

Steps to reproduce

  1. Deploy Che using any of the following:
  • chectl server:start --installer=helm --domain=<myDomain> --multiuser --platform=k8s --tls --self-signed-cert
  • chectl server:start --installer=helm --domain=<myDomain> --multiuser --platform=k8s --tls
  • chectl server:start --installer=helm --domain=<myDomain> --multiuser --platform=k8s
  2. Create an organization
  3. Create and start any workspace

Expected behavior

Runtime

  • kubernetes (include output of kubectl version)
  • Openshift (include output of oc version)
  • minikube (include output of minikube version and kubectl version)
  • minishift (include output of minishift version and oc version)
  • docker-desktop + K8S (include output of docker version and kubectl version)
  • other: (please specify)

Screenshots

image

Installation method

  • chectl
  • che-operator
  • minishift-addon
  • I don't know

chectl/0.0.2-a74ad81 linux-x64 node-v10.4.1

Environment

  • my computer
    • Windows
    • Linux
    • macOS
  • Cloud
    • Amazon
    • Azure
    • GCE
    • other (please specify)
  • other: please specify

private cloud hosted by ubuntu server

Additional context

Looks relevant to #12999 & #13685 maybe?

@SDAdham
Author

SDAdham commented Jul 29, 2019

My gut tells me this is rooted in https://github.com/eclipse/che/blob/caae00e21b2fd316560a406f2e1d8ad98edf1dad/workspace-loader/devfile.yaml. However, I still suspect it might be related to my environment, i.e. running Kubernetes behind a load balancer. But I'm not sure how that could cause a certificate issue with third parties.

If it is related to that devfile.yaml, then it would make sense to also check https://github.com/eclipse/che/blob/caae00e21b2fd316560a406f2e1d8ad98edf1dad/dashboard/devfile.yaml ?

@SDAdham
Author

SDAdham commented Jul 29, 2019

Hi @metlos and @AndrienkoAleksandr, I see that you contributed to the YAML file mentioned above. Do you have any thoughts that could help?

@benoitf
Contributor

benoitf commented Jul 29, 2019

It might be related to #14035 if you're using self-signed certificates?

@sleshchenko
Member

@SDAdham Please provide more info about your Che installation, including the command you used to deploy.

@SDAdham
Author

SDAdham commented Jul 29, 2019

@sleshchenko: please see Steps to reproduce in the description; I tried all three variants.
@benoitf: it doesn't work with or without a self-signed certificate.

I reached out to the Kubernetes community and, based on their input, it has something to do with the image itself, namely whether the ca-certificates package is installed. Since the image is Alpine, I'm not sure how to check; I'll look at https://www.cyberciti.biz/faq/10-alpine-linux-apk-command-examples/#8 and see if I can find ca-certificates.

@sunix sunix added kind/bug Outline of a bug - must adhere to the bug report template. area/plugin-broker status/need-triage An issue that needs to be prioritized by the curator responsible for the triage. See https://github. labels Jul 29, 2019
@skabashnyuk
Contributor

CC @davidfestal @ibuziuk

@sleshchenko
Member

Looks like a DNS issue.

Get https://github.com/che-incubator/...age-1.35.1.vsix: x509: certificate is valid for <myDomain>, not github.com"

indicates that the github.com host might be resolving to the Che Server IP.
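
One way to test this hypothesis from inside the pod is to compare the addresses that two hostnames resolve to. The following is a minimal sketch in Python, not part of Che: the function names are hypothetical, `che.example.test` stands in for <myDomain>, and it assumes a Python interpreter is available wherever it runs. It uses only `socket.getaddrinfo` from the standard library.

```python
import socket


def resolved_addresses(host, resolver=socket.getaddrinfo):
    """Return the set of IP addresses that `host` resolves to.

    `resolver` defaults to socket.getaddrinfo; it is a parameter only so
    the lookup can be replaced in tests.
    """
    # Each getaddrinfo entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address for both IPv4 and IPv6.
    return {info[4][0] for info in resolver(host, 443)}


def shares_address(host_a, host_b, resolver=socket.getaddrinfo):
    """True if the two hosts resolve to at least one common IP address."""
    return bool(
        resolved_addresses(host_a, resolver)
        & resolved_addresses(host_b, resolver)
    )


if __name__ == "__main__":
    # Hypothetical hostname: replace che.example.test with the Che domain.
    print(shares_address("github.com", "che.example.test"))
```

If this prints `True` from within the pod, github.com is being resolved to the same address as the Che host, which would match the certificate mismatch in the error above.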

@skryzhny

I doubt it's related to Che; it looks more like an infrastructure or DNS problem.
Can you run a curl such as curl -v https://github.com/ from the workspace pod?

@SDAdham
Author

SDAdham commented Jul 29, 2019

Yes, it's working from the Kubernetes worker that hosts Che's pod.

Not working from Che's pod:
image

@skryzhny

From the Che workspace pod, please run: cat /etc/resolv.conf
If the output contains nameserver 127......, then stop/disable the dnsmasq service on the host and restart the Kubernetes cluster.
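
The check described above can be sketched as a small parser. This is an illustrative Python sketch under stated assumptions, not a Che tool: the function names are hypothetical, and it handles only the `nameserver`, `search`, and `options` keywords of resolv.conf. The loopback test uses the standard `ipaddress` module.

```python
import ipaddress


def parse_resolv_conf(text):
    """Collect nameserver, search, and options entries from resolv.conf text."""
    config = {"nameserver": [], "search": [], "options": []}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines (starting with '#' or ';').
        if not line or line.startswith(("#", ";")):
            continue
        keyword, *values = line.split()
        if keyword == "nameserver" and values:
            config["nameserver"].append(values[0])
        elif keyword == "search":
            # If several search lines are present, the last one wins.
            config["search"] = values
        elif keyword == "options":
            config["options"].extend(values)
    return config


def loopback_nameservers(config):
    """Return the configured nameservers that are loopback addresses (127.x or ::1)."""
    found = []
    for server in config["nameserver"]:
        try:
            if ipaddress.ip_address(server).is_loopback:
                found.append(server)
        except ValueError:
            # Not a plain IP address; ignore it in this sketch.
            pass
    return found


if __name__ == "__main__":
    with open("/etc/resolv.conf") as handle:
        parsed = parse_resolv_conf(handle.read())
    print(parsed)
    print("loopback nameservers:", loopback_nameservers(parsed))
```

A non-empty list from `loopback_nameservers` corresponds to the "nameserver 127......" case mentioned above.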

@l0rd l0rd added status/analyzing An issue has been proposed and it is currently being analyzed for effort and implementation approach and removed status/need-triage An issue that needs to be prioritized by the curator responsible for the triage. See https://github. labels Jul 29, 2019
@l0rd
Contributor

l0rd commented Jul 29, 2019

@SDAdham please let us know how it goes with @skryzhny's suggestion. Is this a new Kubernetes cluster you are trying out?

@l0rd
Contributor

l0rd commented Jul 29, 2019

@skabashnyuk is there any particular reason why you have labelled it team/osio? It looks to me like a certificate issue.

@skabashnyuk
Contributor

Wrong decision. team label removed.

@SDAdham
Author

SDAdham commented Jul 31, 2019

Hello everyone, I'm sorry for being inactive. I've been extremely busy these past couple of days and I might not be available in the coming days.

@skryzhny / @sleshchenko you're right, it's an issue with the certificates, and apparently with DNS, based on cert-manager/cert-manager#641. I have no idea how to solve it, but I did some digging and can confirm you're correct.

I'll see what I can do about:

from che workspace pod please do the command: cat /etc/resolv.conf
If output contains nameserver 127...... then stop/disable dnsmasq service on host and restart kubernetes cluster.

I'll see if I can do this. The thing is, I'm using juju/conjure-up for my Kubernetes cluster and I don't know exactly how to achieve this. The problem is happening not only on Che's pod but also on cert-manager's pod, and I'd rather solve it for everything, not only Che.

Since this is not related to Che, it's fine with me if you want to close this issue. But if you keep it open as a courtesy (to provide advice), I'd much appreciate it, since this issue might be happening to others as well. If I manage to solve it, I'll write a full summary here of what I did.

Thanks @skabashnyuk, @l0rd, @skryzhny and @sleshchenko!

@SDAdham
Author

SDAdham commented Jul 31, 2019

I may be mistaken, but I think this issue is best moved to Che's documentation space, https://github.com/eclipse/che-docs/, as it might be handy to add to the installation troubleshooting doc?

@slemeur slemeur changed the title Workspace fails to start Workspace fails to start with DNS configuration issue Jul 31, 2019
@SDAdham
Author

SDAdham commented Aug 3, 2019

Determined the issue: the bind9 service was down and causing this. Once I fixed it, my cert-manager was able to work normally. I expect the same for the Che pods; it's all 100% DNS issues.

@SDAdham SDAdham closed this as completed Aug 3, 2019
@SDAdham SDAdham reopened this Aug 4, 2019
@SDAdham
Author

SDAdham commented Aug 4, 2019

@skryzhny : running cat /etc/resolv.conf from che's pod gives the following

nameserver 10.152.183.209
options ndots:5

cert-manager is working with no more certificate issues; it's only Che's pod, afaik.

@SDAdham
Author

SDAdham commented Aug 25, 2019

This issue has been resolved. The whole issue was actually produced by MAAS: when it composes a new machine, it automatically adds search entries to the DNS settings. Removing those search values from the DNS configuration resolved the issue immediately, and the DNS is now stable.
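
The thread does not spell out why extra search values would break the lookup of github.com. One plausible mechanism, offered here as an assumption rather than a confirmed diagnosis: with `options ndots:5` in the pod's resolv.conf, a name with fewer than five dots is tried with each search domain appended before it is tried as written. If a search domain has a wildcard DNS record, `github.com.<searchdomain>` can resolve to an unrelated host. The Python sketch below models only the candidate ordering described in resolv.conf(5); the function name and the `example.test` search domain are hypothetical.

```python
def candidate_names(name, search, ndots=1):
    """Return the fully qualified names a stub resolver would try, in order.

    Models the ordering described in resolv.conf(5): a name with fewer than
    `ndots` dots is tried with each search domain first and as written last;
    otherwise it is tried as written first.
    """
    if name.endswith("."):
        # A trailing dot marks the name as absolute: no search list is applied.
        return [name]
    with_search = [f"{name}.{domain.rstrip('.')}." for domain in search]
    absolute = name + "."
    if name.count(".") >= ndots:
        return [absolute] + with_search
    return with_search + [absolute]


if __name__ == "__main__":
    # example.test is a hypothetical search domain added by the provisioner.
    for candidate in candidate_names("github.com", ["example.test"], ndots=5):
        print(candidate)
```

With these inputs, `github.com.example.test.` is listed before `github.com.`, so a wildcard record under the search domain would answer first. Removing the search entries, as described above, leaves only the absolute name.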

Bug has been reported to the MAAS team https://bugs.launchpad.net/maas/+bug/1841334

Thanks everyone!

@SDAdham SDAdham closed this as completed Aug 25, 2019

7 participants