vSphere/VMC Support #147
Hi, @akutz!
Thanks so much for taking the time to add vSphere/VMC to the CNCF cross cloud dashboard. I hope this note has been helpful.
Hi @lixuna, Thank you so much for the response! Some quick notes:
Are the environment variables available to the process that runs provision.sh?
I wasn't asking about state with respect to the credentials but rather if we were creating dynamic, public IPs and needed to know which ones were created during the stand-up so we could remove them during the tear down.
I actually tried viewing several presentations and slide decks, but I was told I needed to request permission. I requested it, but have yet to be granted said permission. Perhaps we're referring to different slides? Do you perhaps have a link? Thank you again for your help. If you'd like to join our Zoom meeting, please feel free to join at REDACTED. It's currently in progress. Thanks again!
@taylor can you please help with the first 2 questions?
Hi @lixuna, Thank you so much! My colleague @figo found the YouTube link https://www.youtube.com/watch?v=m-WK-pOs5TA. Thank you again! I had crossed my CIs with K8s.
The Cross-cloud CI KubeCon slides are at: The recording for the Intro is at https://youtu.be/wb7aCAk1VFU, the recording for Deep Dive is at https://youtu.be/m-WK-pOs5TA.
Hi @lixuna, Is there some property that we can use to determine if the environment is running against stable or HEAD?
Howdy, @akutz. All of the setup and tear down should be handled by Terraform. General steps for adding a new cloud provider to the cross-cloud provisioner:
There is a vSphere Terraform module available at https://www.terraform.io/docs/providers/vsphere/index.html. Review the Terraform templates in the cross-cloud repo for examples.
After the terraform cloud provisioning is done then the kubernetes deployment itself is all handled by the rest of the cross-cloud code including using stable/HEAD etc. In the Deep Dive video, https://youtu.be/m-WK-pOs5TA?t=156, you can see where the provisioning happens using terraform. If y'all can get the same working for vSphere/VMC then we'll be good to go.
Hi @taylor, Cool beard. Makes me miss mine: Anyway, thank you for the info. We're aware of the vSphere Terraform provider. Oh, lordy, are we aware :) One of the challenging aspects of this project has been navigating around the differences encountered when using Terraform in combination with vSphere on VMC. Some issues include:
The major challenge is the public IPs. With vSphere on VMC you cannot directly assign public IPs to VMs. You can provide public access to VMs on a private network by requesting a public IP and connecting it to the private IP via NAT, or by using AWS Elastic Load Balancing (ELB). We're likely pursuing the latter option. Essentially we're handling the challenges created by vSphere on VMC one step at a time. Part of this process has been coming to better understand just what is required of a provider for the Cross-Cloud project. The existing examples are helpful, but please be aware that we need to understand the reasons behind certain actions, since we cannot simply connect things to an analogous property. Thank you again!
Hi @taylor, I'm watching the YouTube deep dive now. It's very helpful. Quick question: it looks like you're mapping the K8s public IPs to DNS names using a shared etcd/SkyDNS server? Thanks again!
After reviewing all of the provided and linked materials, I still have the following outstanding questions:
Thank you again for your assistance!
Hi @taylor @lixuna @denverwilliams, could you allow me and my colleagues to test CNCF cross-cloud vSphere support against this default DNS/etcd server? Initially I was able to curl it, but I got the following error later on:
Thanks very much!
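For reference, a minimal sketch of probing that shared etcd/DNS endpoint; the address 147.75.69.23 is taken from the dig output later in this thread, and the etcd client port 2379 is an assumption:

# Hypothetical health check of the shared etcd endpoint (address/port assumed).
$ curl -s http://147.75.69.23:2379/health
# A healthy etcd cluster replies with: {"health": "true"}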
One other question I had was: what version of Docker do you use to build the image at the root of this project? I ask because, if we need to add anything to the image, such as a custom Terraform provider, one way to do so would be to leverage Docker's multi-stage builds. I've used them before, and they're pretty nice. However, they're only supported in Docker 17.05 and higher. If you use a compatible version then we'd be able to edit the Dockerfile to look like:

FROM golang:latest
WORKDIR /go/src/github.com/vmware/vmc-terraform-provider/
COPY vsphere/vmc-terraform-provider/* .
RUN go get -d -v && go build
# Switch to the second stage of the build, nothing in the first stage
# is preserved except for what is explicitly copied to the following
# stage.
FROM crosscloudci/debian-go:latest
MAINTAINER "Denver Williams <denver@debian.nz>"
...
# Copy the build artifact from the first stage of the build
COPY --from=0 /go/src/github.com/vmware/vmc-terraform-provider/vmc-terraform-provider /cncf/vsphere/vmc-terraform-provider/
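As a hedged usage note, here's how one might check the daemon version and build the image; the image name provisioning matches the docker run commands later in this thread, everything else is standard Docker:

# Multi-stage builds require Docker 17.05 or higher; check the daemon first.
$ docker version --format '{{.Server.Version}}'

# Build the provisioner image from the repo root.
$ docker build -t provisioning .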
Could you please provide an e-mail address of someone we could contact to discuss the aforementioned topics? We appreciate the help you've already provided, but perhaps y'all are not the correct people to whom we should be speaking? Thank you again. cc @clintkitson
@figo, using the default etcd cluster we have running is fine. Any connection reset by peer should be temporary. Are you getting that every time?
@akutz, right folks. Just busy ;) I'm pulling @denverwilliams into the loop. Re: other methods of contact
Hi @taylor, Thanks. I wasn't sure if you were the correct people or just busy. After 12 days since the last response, I just figured we were bugging the wrong group of misfits :)
@figo, I just tested it successfully.
Hi @taylor, I've asked @figo to please hold off on his question above so we can first focus on the general process. The way DNS is managed is unclear. Are your answers an implicit affirmation that what @figo found is correct? Or is it one of the possibilities I outlined above? We are not necessarily going to be the people on our team (or off of it) that maintain this in the future, so I'd like to document things clearly for those future folk, be they ourselves or another group of crazed outlanders. Thanks!
What version of Docker do you use to build the image at the root of this project?
How is the same kubeconfig file created for use by the deployed environment also used by provision.sh to communicate with the remote K8s endpoint? The host in the kubeconfig file would have a .local suffix. Are you modifying the running container's /etc/hosts file, running a local DNS server, or modifying some static DNS server referenced by the container?
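A sketch of how one might probe which of those three mechanisms is in play; the container name is illustrative, and master.cluster.local stands in for the real .local host:

$ docker exec -it provisioning-container cat /etc/hosts        # static host entries?
$ docker exec -it provisioning-container cat /etc/resolv.conf  # which nameserver is used?
$ docker exec -it provisioning-container getent hosts master.cluster.local  # what actually resolves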
Hi @denverwilliams, Thank you very much for the response. That does answer the DNS question. We still have the following outstanding inquiries:
If you could possibly help us with the above questions as well, I would be immensely grateful. Finally, is there any possibility that you've looked at generating a Container Linux Config or Ignition Config in addition to the Cloud-Init config? Thanks again!
@taylor, thanks for the reply.
@figo, maybe your corporate network has a proxy that is dropping those connections?
@taylor, not sure, but thanks for your reply. We're fine continuing the work on a non-corporate network; we can come back to investigate later.
@taylor @denverwilliams our script is able to add a record successfully for an address like:
The server confirmed with:
but dig on the address returns SERVFAIL:
I am new to SkyDNS, and googling the error does not help much.
@figo, in regard to the SkyDNS issue: I forgot to add "vsphere.local" as an accepted domain to our CoreDNS config. It has been added now, so hopefully you will be able to resolve names for *.vsphere.local.
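For anyone hitting this later, a sketch of the SkyDNS-style record convention that CoreDNS's etcd backend uses: the domain is reversed into the etcd key path, so master.akutz.vsphere.local lives under /skydns/local/vsphere/akutz/. The address and endpoint below are illustrative:

# Write the record (assumes the etcd v2 API and SkyDNS key layout).
$ etcdctl --endpoints http://147.75.69.23:2379 \
    set /skydns/local/vsphere/akutz/master '{"host":"10.0.0.4"}'

# With vsphere.local accepted by the CoreDNS config, this should now
# return an A record instead of SERVFAIL.
$ dig master.akutz.vsphere.local @147.75.69.23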
@denverwilliams, a million thanks, it magically works now. This is something we would never have been able to figure out by ourselves; it would be great to document this somewhere.
Hi @denverwilliams, Thank you for the reply. Per your suggestion regarding determining the environment type, you're basically recommending the same approach I outlined, correct? Use one of the existing property values and test for the prefix?
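A minimal sketch of the prefix test being discussed; the variable name KUBERNETES_VERSION is hypothetical, not an actual cross-cloud property:

# Stable releases look like v1.10.3; HEAD/CI builds do not carry the v prefix.
case "$KUBERNETES_VERSION" in
  v[0-9]*) echo "stable" ;;
  *)       echo "head"   ;;
esac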
It is supported by the VMC backend (https://cloud.vmware.com/vmc-aws). Both master and worker VMs are based on a CoreOS template. The config data passed to the VMs is based on CoreOS Ignition. The load balancer module has not been implemented yet. Notes: there is an SSH key included for debugging purposes; it will be removed later. Both the public master IP address and the master IPs issue will be handled later. Resolves: crosscloudci#147
This patch provides the vSphere Terraform necessary to complete support for the vSphere provider. Currently the design is dependent on a VMware Cloud on AWS (VMC) (https://cloud.vmware.com/vmc-aws) backend, but the intent is to work on vanilla vSphere as well. New config templates have been provided to sit alongside the existing Cloud-Init config templates. The new templates are based on the CoreOS Container Linux Config / Ignition format (https://coreos.com/os/docs/latest/provisioning.html). Notes: * The load balancer module has not yet been implemented. * There is an SSH key included for debugging purposes. It will be removed prior to submitting this work as a PR. * The public master IP address(es) and the master IP address(es) will be handled prior to submitting this work as a PR. Fixes crosscloudci#147
Hi @denverwilliams / @taylor / @lixuna, FYI, there's a branch that mostly supports deploying to vSphere via the Docker image:

$ docker run \
> -v $(pwd)/data:/cncf/data \
> -e VSPHERE_SERVER=$VSPHERE_SERVER \
> -e VSPHERE_USER=$VSPHERE_USER \
> -e VSPHERE_PASSWORD=$VSPHERE_PASSWORD \
> -e VSPHERE_AWS_ACCESS_KEY_ID=$VSPHERE_AWS_ACCESS_KEY_ID \
> -e VSPHERE_AWS_SECRET_ACCESS_KEY=$VSPHERE_AWS_SECRET_ACCESS_KEY \
> -e VSPHERE_AWS_REGION=$VSPHERE_AWS_REGION \
> -e CLOUD=vsphere \
> -e COMMAND=deploy \
> -e NAME=cross-cloud \
> -e BACKEND=file \
> -ti provisioning

The full log from the above command is here. However, it hangs here:

Apply complete! Resources: 38 added, 0 changed, 0 destroyed.
The state of your infrastructure has been saved to the path
below. This state is required to modify and destroy your
infrastructure, so keep it safe. To inspect the complete state
use the `terraform show` command.
State path: /cncf/data/cross-cloud/terraform.tfstate
real 2m46.212s
user 0m5.490s
sys 0m1.820s
❤ Trying to connect to cluster with kubectl......

Apparently it cannot communicate with the cluster? If y'all would like, I can provide you credentials in a secure manner that will let you test as well. Please e-mail me for access, thanks!
Hi @taylor / @denverwilliams, Your DNS server disallows recursion, and as such the value we're providing to the public IP list for master nodes fails to resolve beyond the CNAME.

Query Public DNS Entry

$ dig master.akutz.vsphere.local @147.75.69.23
; <<>> DiG 9.10.6 <<>> master.akutz.vsphere.local @147.75.69.23
;; global options: +cmd
;; Got answer:
;; WARNING: .local is reserved for Multicast DNS
;; You are currently testing what happens when an mDNS query is leaked to DNS
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 38507
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;master.akutz.vsphere.local. IN A
;; ANSWER SECTION:
master.akutz.vsphere.local. 300 IN CNAME xapi-20180614203519432300000002-c6821688ccc48489.elb.us-west-2.amazonaws.com.
;; Query time: 96 msec
;; SERVER: 147.75.69.23#53(147.75.69.23)
;; WHEN: Thu Jun 14 15:38:07 CDT 2018
;; MSG SIZE rcvd: 145

Query CNAME

$ dig xapi-20180614201228551500000001-04a63f1f82619497.elb.us-west-2.amazonaws.com @147.75.69.23
; <<>> DiG 9.10.6 <<>> xapi-20180614201228551500000001-04a63f1f82619497.elb.us-west-2.amazonaws.com @147.75.69.23
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 25304
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;xapi-20180614201228551500000001-04a63f1f82619497.elb.us-west-2.amazonaws.com. IN A
;; Query time: 76 msec
;; SERVER: 147.75.69.23#53(147.75.69.23)
;; WHEN: Thu Jun 14 15:38:55 CDT 2018
;; MSG SIZE rcvd: 105

Query Public DNS with
Hi @figo @akutz
This will give you a kubeconfig file that uses the lb_host_name as the cluster API endpoint, which means you won't need to worry about using the DNS server for external Client ---> Server communications. You will also need to add *.elb.us-west-2.amazonaws.com to the API cert so that the API server will accept the LB name being used directly; this involves adding it to this line https://github.com/akutz/cross-cloud/blob/feature/vsphere/vsphere/modules.tf#L167 so it looks something like this
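The exact SAN snippet didn't survive formatting, but once the wildcard is added, one way to confirm the served certificate actually carries it (the LB host name below is illustrative):

$ echo | openssl s_client -connect xapi-example.elb.us-west-2.amazonaws.com:443 2>/dev/null \
    | openssl x509 -noout -text | grep -A1 'Subject Alternative Name'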
Hi @denverwilliams, Thanks for the reply. We just spoke about this, and I decided to use an elastic IP with the LB so I could pass the IP instead of the FQDN. That should work with the existing config, right?
FYI, the reasons I think we should stick with IPs are:
@akutz Unfortunately you might run into a nasty caveat when using IPs directly: because cloud-init data needs to be injected early, it is generated before cloud resources are created (including the certificates, which need to have the IP injected into them). So to use the IP directly you would need to be able to reserve/predict what this IP is so that it is available when the cloud-init/cert data is created. In the past, when we were using load balancers, we built the certificate DNS name from the region by doing something like *.elb.${ var.aws.region }.amazonaws.com. As for the etcd/DNS server, there shouldn't need to be any changes for this to work, as the LB's FQDN should be publicly resolvable on the internet. So if a client is using the etcd/DNS server and a request comes in to resolve the LB's FQDN, it will just be forwarded on to 8.8.8.8 and get a response.
Hi @denverwilliams, Well, with AWS's elastic IP service we could grab the IP super early and then inform the LB which IP will be used. But even so, I'm not sure how using an IP for
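For reference, grabbing the address early is straightforward with the AWS CLI; the allocation can happen before any other resource exists, so the IP is known when cert/cloud-init data is generated (output abridged):

$ aws ec2 allocate-address --domain vpc
# => { "PublicIp": "…", "AllocationId": "eipalloc-…", "Domain": "vpc" }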
Hi @denverwilliams, Okay, my branch now requests an AWS elastic IP and supplies that to the LB. The elastic IP is also assigned to the public master IP list:

$ kubectl --kubeconfig data/kubeconfig get all
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/kubernetes ClusterIP 100.64.0.1 <none> 443/TCP 15m

However, the deploy still hangs here:

Apply complete! Resources: 39 added, 0 changed, 0 destroyed.
The state of your infrastructure has been saved to the path
below. This state is required to modify and destroy your
infrastructure, so keep it safe. To inspect the complete state
use the `terraform show` command.
State path: /cncf/data/akutz/terraform.tfstate
real 3m8.952s
user 0m7.200s
sys 0m6.970s
❤ Trying to connect to cluster with kubectl......

FWIW, here's what dig returns:

$ dig master.akutz.vsphere.local @147.75.69.23
; <<>> DiG 9.10.6 <<>> master.akutz.vsphere.local @147.75.69.23
;; global options: +cmd
;; Got answer:
;; WARNING: .local is reserved for Multicast DNS
;; You are currently testing what happens when an mDNS query is leaked to DNS
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 38507
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;master.akutz.vsphere.local. IN A
;; ANSWER SECTION:
master.akutz.vsphere.local. 300 IN CNAME xapi-20180614203519432300000002-c6821688ccc48489.elb.us-west-2.amazonaws.com.
;; Query time: 96 msec
;; SERVER: 147.75.69.23#53(147.75.69.23)
;; WHEN: Thu Jun 14 15:38:07 CDT 2018
;; MSG SIZE rcvd: 145
Hi @denverwilliams, Maybe I'm not waiting long enough for the provision to complete? How long does it generally take once it hits the step that verifies the cluster is up and running? I just ran this manually, and it returns data!

$ kubectl --kubeconfig data/kubeconfig get cs
NAME STATUS MESSAGE ERROR
scheduler Healthy ok
controller-manager Healthy ok
etcd-0 Healthy {"health": "true"}

That's the same command that is run from provision.sh.
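Assuming the check is a simple retry loop, a minimal sketch of what the provisioner is presumably doing (the exact command, kubeconfig path, and interval are assumptions):

until kubectl --kubeconfig "${TF_VAR_data_dir}/kubeconfig" get cs >/dev/null 2>&1; do
  printf '.'   # one dot per failed attempt, matching the output above
  sleep 10
done
echo '✓'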
Hi @denverwilliams, FYI, on a new deploy it's stuck waiting:

State path: /cncf/data/akutz/terraform.tfstate
real 2m57.824s
user 0m6.690s
sys 0m6.220s
❤ Trying to connect to cluster with kubectl.................

However, I can query the cluster locally:

$ kubectl --kubeconfig data/kubeconfig get cs
NAME STATUS MESSAGE ERROR
controller-manager Healthy ok
scheduler Healthy ok
etcd-0 Healthy {"health": "true"}

At this point I'm just lost as to why the query from my local system works and the one in the container is still waiting. I mean, the
Hi @denverwilliams, By the way...
I'm not sure this is accurate based on the evidence I've presented above. Your shared DNS server disallows recursion and refuses to provide an answer when queried with the LB's FQDN. Now, maybe the caller will look up against your DNS server, fail, and then look up against Google (8.8.8.8), but it certainly doesn't seem to be happening server-side on your DNS server. You can easily test this:
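The test steps were lost in formatting; a plausible reconstruction, using the names from the dig output above (xapi-example stands in for the real LB host):

# 1. A name the server is authoritative for resolves fine.
$ dig master.akutz.vsphere.local @147.75.69.23

# 2. An external name needing recursion fails with SERVFAIL and
#    "recursion requested but not available".
$ dig xapi-example.elb.us-west-2.amazonaws.com @147.75.69.23

# 3. The same external name against a public recursive resolver works.
$ dig xapi-example.elb.us-west-2.amazonaws.com @8.8.8.8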
Hi @denverwilliams, So I added this line in provision.sh:

export KUBECONFIG=${TF_VAR_data_dir}/kubeconfig
dig $(cat "$KUBECONFIG" | grep server: | awk '{print $2}' | cut -c9-)

Here's the output:

; <<>> DiG 9.10.3-P4-Debian <<>> master.akutz.vsphere.local
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 48652
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;master.akutz.vsphere.local. IN A
;; Query time: 7 msec
;; SERVER: 192.168.65.1#53(192.168.65.1)
;; WHEN: Fri Jun 15 02:15:12 UTC 2018
;; MSG SIZE rcvd: 44
❤ Trying to connect to cluster with kubectl...................

It appears the Docker container running the provisioner cannot resolve the master's name.
Hi @denverwilliams, I've updated the docker run command to drop into a shell:

docker run \
-v $(pwd)/data:/cncf/data \
-e VSPHERE_SERVER=$VSPHERE_SERVER \
-e VSPHERE_USER=$VSPHERE_USER \
-e VSPHERE_PASSWORD=$VSPHERE_PASSWORD \
-e VSPHERE_AWS_ACCESS_KEY_ID=$VSPHERE_AWS_ACCESS_KEY_ID \
-e VSPHERE_AWS_SECRET_ACCESS_KEY=$VSPHERE_AWS_SECRET_ACCESS_KEY \
-e VSPHERE_AWS_REGION=$VSPHERE_AWS_REGION \
-e CLOUD=vsphere \
-e COMMAND=deploy \
-e NAME=cross-cloud \
-e BACKEND=file \
-ti provisioning \
shell
root@1e15d0d266d1:/cncf#

However, when I examined /etc/resolv.conf:

# cat /etc/resolv.conf
# Generated by dhcpcd from eth0.dhcp
# /etc/resolv.conf.head can replace this line
domain kutz
nameserver 192.168.65.1
# /etc/resolv.conf.tail can replace this line

So I'm curious how DNS is actually configured for this container. cc @figo
Hi @denverwilliams, So I forgot that Docker containers inherit the DNS of the Docker daemon process, and that this can be overridden per container with the --dns flag:

$ docker run \
--rm \
--dns 147.75.69.23 --dns 8.8.8.8 \
-v $(pwd)/data:/cncf/data \
-e VSPHERE_SERVER=$VSPHERE_SERVER \
-e VSPHERE_USER=$VSPHERE_USER \
-e VSPHERE_PASSWORD=$VSPHERE_PASSWORD \
-e VSPHERE_AWS_ACCESS_KEY_ID=$VSPHERE_AWS_ACCESS_KEY_ID \
-e VSPHERE_AWS_SECRET_ACCESS_KEY=$VSPHERE_AWS_SECRET_ACCESS_KEY \
-e VSPHERE_AWS_REGION=$VSPHERE_AWS_REGION \
-e CLOUD=vsphere \
-e COMMAND=deploy \
-e NAME=akutz \
-e BACKEND=file \
-ti provisioning

And boom goes the cloud-o-mite:

State path: /cncf/data/akutz/terraform.tfstate
real 2m50.607s
user 0m5.800s
sys 0m6.090s
❤ Trying to connect to cluster with kubectl...✓
❤ Ensure that the kube-system namespaces exists.✓
❤ Ensure that ClusterRoles are available.✓
❤ Ensure that ClusterRoleBindings are available.✓

It works! Might I recommend to @lixuna and others that the existing examples be updated to note that the --dns flag may be required? cc @figo
The FAQ indicates the number of cores for each node, but that doesn't really give me an idea of what kind of compute power is expected of a core. A Xeon 2GHz? An i5 1.2GHz? An ARM? I ask because I'm working out resource shares for these nodes, and those are specified in CPU speed, not count.
Hi @taylor / @lixuna / @denverwilliams, It's working :) And with cloud-init as well! Using @taylor's idea, we created an Ignition bootstrap file to run cloud-init. May I suggest considering the addition of direct Ignition support sometime in the near future? As of now, CoreOS has just deprecated support for cloud-init, but it's only a matter of time before support is removed altogether. Here's the patch that uses Ignition to bootstrap cloud-init. Perhaps others will find it useful as well.
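For the curious, a sketch of what such a bootstrap can look like as a Container Linux Config transpiled to Ignition; the unit name and user-data path are illustrative, not the actual patch:

$ cat > bootstrap.clc.yaml <<'EOF'
systemd:
  units:
    - name: cloudinit-bootstrap.service
      enabled: true
      contents: |
        [Unit]
        Description=Run legacy cloud-init user data via coreos-cloudinit
        After=network-online.target

        [Service]
        Type=oneshot
        ExecStart=/usr/bin/coreos-cloudinit --from-file=/var/lib/coreos-install/user_data

        [Install]
        WantedBy=multi-user.target
EOF

# Transpile the Container Linux Config to an Ignition config with ct.
$ ct < bootstrap.clc.yaml > bootstrap.ign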
FWIW, re: Ignition support. Maybe it's not such a priority. I honestly didn't look to see what OS the other providers are using. I just assumed CoreOS is popular for this sort of thing.
Good afternoon, @akutz. We'll be on the lookout for the PR! I'll send a followup email with a few options for sharing the credentials to the team.
To reproduce the failures for HEAD on cncf.ci, you can deploy with the following variables set.
Added in v1.4.0.
Hi,
My team at VMware is working on adding support for vSphere/VMC to the CNCF cross cloud dashboard. We have a few questions that we hope you can answer:
Are the environment variables available to the process that runs provision.sh? We need to know if such information is available to provision.sh as well as Terraform.
Does provision.sh write information to disk during a deploy in order to reuse said information during the destroy step?

Thank you for your time. We've found the existing code and PRs to be useful in getting our work up and running. However, the following would be incredibly useful for us and people in the future:
Thank you again!