This repository has been archived by the owner on Sep 4, 2021. It is now read-only.

Updating Calico to hosted install #768

Merged
merged 1 commit into from Dec 13, 2016

Conversation

@heschlie (Contributor) commented Dec 1, 2016

In order to get the hosted install to work we had to make some changes to the deployment, the most disruptive being that we need to write to hyperkube's /opt/cni/bin directory, so when Calico is enabled we mount that directory from the host into the hyperkube pod. Our calico/cni:v1.5.2 image deploys the flannel, host-local, and loopback CNI binaries alongside the calico binaries.

Some other minor changes were:

  • Added the pod-cidr to the proxy pods to get around some issues with Vagrant networking
  • Use docker to run hyperkube with kubectl to deploy the calico.yaml manifest
  • Removed all systemd Calico configuration
  • Use one CNI conf for all nodes
  • Touched up the Vagrantfiles to give the VMs a bit more CPU/memory

We will want to update the remaining docs to use the same changes, but I wanted this reviewed before going through that process.
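For context, the key change boils down to the conditional shown in the diff further down; a minimal sketch of it follows (the empty `else` branch here is an assumption added for illustration):

```sh
# When Calico is enabled, mount the host's /opt/cni/bin into the hyperkube pod
# so the install-cni container can drop the Calico CNI binaries there.
if [ "${USE_CALICO}" = "true" ]; then
    export CALICO_OPTS="--volume cni-bin,kind=host,source=/opt/cni/bin \
    --mount volume=cni-bin,target=/opt/cni/bin"
else
    export CALICO_OPTS=""
fi
```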

@@ -0,0 +1,185 @@
# This ConfigMap is used to configure a self-hosted Calico installation.
Contributor

Can we combine this into a single manifest for both single / multi nodes? I think the only difference should be the etcd_endpoints, correct? Would really like to see one manifest.
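To illustrate the point: the node-specific part of the manifest is the etcd_endpoints value in the Calico ConfigMap, so a single manifest templated on that value could cover both installs. A minimal sketch, assuming the ConfigMap name and keys of the stock self-hosted Calico manifest and an illustrative file path:

```sh
# Hypothetical sketch: only etcd_endpoints differs between single- and multi-node,
# so one manifest parameterized on ETCD_ENDPOINTS could serve both.
cat << EOF > /srv/kubernetes/manifests/calico-config.yaml
kind: ConfigMap
apiVersion: v1
metadata:
  name: calico-config
  namespace: kube-system
data:
  etcd_endpoints: "${ETCD_ENDPOINTS}"
EOF
```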

Contributor Author

I was thinking the same, but my concern is does it matter that the file would be in the multi-node install? I'm thinking no?

Contributor Author

Pointed the single node to the same file and removed the extra, spun up the single node to make sure it went ok, looks good.

@@ -149,7 +159,7 @@ EOF
cat << EOF > $TEMPLATE
[Unit]
Requires=network-online.target
After=network-online.target
After=network-online.targetETCDENDPOINTS
Contributor

typo?

Contributor Author

Indeed...

# We need to overwrite this for a hosted Calico install
if [ "${USE_CALICO}" = "true" ]; then
export CALICO_OPTS="--volume cni-bin,kind=host,source=/opt/cni/bin \
--mount volume=cni-bin,target=/opt/cni/bin"
Contributor

kube-flannel uses /etc/cni, I think we should do the same thing here: https://github.com/coreos/flannel/blob/master/Documentation/kube-flannel.yml#L67

Contributor Author

That looks to be for the CNI conf files, not the CNI binaries, correct? This is needed so our install-cni container can drop the calico binaries into hyperkube.

Contributor

ack, right.

v.gui = false
end
end

config.vm.provider :virtualbox do |vb|
vb.cpus = 1
vb.cpus = 2
Contributor

Is this necessary / a safe assumption?

Contributor Author

I had run into some stability issues with one CPU, but that may have been while I was trying to run the policy controller on the master. I'll retest and revert back if possible.

@@ -161,6 +162,9 @@ Vagrant.configure("2") do |config|

controller.vm.provision :file, :source => CONTROLLER_CLOUD_CONFIG_PATH, :destination => "/tmp/vagrantfile-user-data"
controller.vm.provision :shell, :inline => "mv /tmp/vagrantfile-user-data /var/lib/coreos-vagrant/", :privileged => true

controller.vm.provision :file, :source => CALICO_MANIFEST, :destination => "/tmp/calico.yaml"
controller.vm.provision :shell, :inline => "mkdir -p /srv/kubernetes/manifests && mv /tmp/calico.yaml /srv/kubernetes/manifests/", :privileged => true
Contributor

Ideally we're not relying on vagrant for anything beyond machine provisioning (this isn't completely true even now, but hopefully not adding more external configuration). What do you think about in-lining the calico manifest in the script (gated on `USE_CALICO`)?

Contributor Author

I was initially trying to avoid putting it into the script, but if that would be preferred I don't see a problem with that.

curl --silent -H "Content-Type: application/json" -XPOST -d"$(cat /srv/kubernetes/manifests/calico-system.json)" "http://127.0.0.1:8080/api/v1/namespaces/" > /dev/null

# Deploy Calico
docker run --rm --net=host -v /srv/kubernetes/manifests:/host/manifests $HYPERKUBE_IMAGE_REPO:$K8S_VER /hyperkube kubectl apply -f /host/manifests/calico.yaml
Contributor

This looks like it will get run on every boot - the kubectl create should fail "already exists", which is fine - but if the exit code propagates to the docker run, then this could cause the script to exit (I'm just not sure about the actual behavior -- ignore if this has been tested / is safe).
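For what it's worth, docker run does propagate the container's exit code, so if the apply step ever returned non-zero on a re-run, one possible guard (not part of the PR, just a hedged sketch) would be:

```sh
# Hypothetical guard: tolerate a non-zero exit from the apply step so a re-run
# on boot does not abort the rest of the script.
docker run --rm --net=host -v /srv/kubernetes/manifests:/host/manifests \
    $HYPERKUBE_IMAGE_REPO:$K8S_VER /hyperkube kubectl apply -f /host/manifests/calico.yaml \
    || echo "calico.yaml apply returned non-zero (possibly already applied); continuing"
```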

@heschlie (Contributor Author), Dec 7, 2016

This appears to fail on reboot long before it gets to this point. Specifically it fails here:

local REQUIRED=('ADVERTISE_IP' 'POD_NETWORK' 'ETCD_ENDPOINTS' 'SERVICE_IP_RANGE' 'K8S_SERVICE_IP' 'DNS_SERVICE_IP' 'K8S_VER' 'HYPERKUBE_IMAGE_REPO' 'USE_CALICO')

As ETCD_ENDPOINTS seems to be unset on reboot.

Contributor Author

But on further review, it should not report a failure if it did get to this point.

Contributor

Opened #775 to track this

@aaronlevy (Contributor)

Couple minor comments/questions

@heschlie changed the title from "WIP: Updating Calico to hosted install" to "Updating Calico to hosted install" on Dec 9, 2016
@heschlie (Contributor Author) commented Dec 9, 2016

I believe I hit all the necessary places. The kube-aws docs aren't pulled from here anymore, correct?

@@ -12,6 +12,7 @@ $update_channel = "alpha"

CLUSTER_IP="10.3.0.1"
NODE_IP = "172.17.4.99"
NODE_VCPUS = 2
Contributor

Did this turn out to be a requirement? It's probably generally fine to do (it wouldn't work on a single-core machine -- not sure how common that would be). If we are keeping this, can we make it configurable in the multi-node installation too?

echo "Waiting for Kubernetes API..."
until curl --silent "http://127.0.0.1:8080/version"
do
sleep 5
done
echo
echo "K8S: Calico Policy"
curl --silent -H "Content-Type: application/json" -XPOST -d"$(cat /srv/kubernetes/manifests/calico-system.json)" "http://127.0.0.1:8080/api/v1/namespaces/" > /dev/null
docker run --rm --net=host -v /srv/kubernetes/manifests:/host/manifests $HYPERKUBE_IMAGE_REPO:$K8S_VER /hyperkube kubectl apply -f /host/manifests/calico.yaml
Contributor

If possible, can we switch this to rkt? We require rkt already due to running the kubelet via kubelet-wrapper, but if you want to use rkt as your runtime, we do not require the use of docker.

It should be mostly equivalent, but to implement --rm takes a little more coordination.

When executing rkt run, add:
--uuid-file-save=/var/run/calico-install.uuid

Then after execution
/usr/bin/rkt rm --uuid-file=/var/run/calico-install.uuid

Contributor Author

I'm getting `device or resource busy` when running rm against that pod; the full set of commands:

/usr/bin/rkt run --uuid-file-save=/var/run/calico-install.uuid \
                        --net=host \
                        --volume manifests,kind=host,source=/etc/kubernetes/manifests \
                        --mount volume=manifests,target=/host/manifests \
                        $HYPERKUBE_IMAGE_REPO:$K8S_VER --exec=/hyperkube -- kubectl apply -f /host/manifests/calico.yaml
/usr/bin/rkt rm --uuid-file=/var/run/calico-install.uuid

Any suggestions? Should I just leave the pod in an exited state?

Contributor

The one other trick that might be worth trying is rkt status --wait $uuid between the run and rm

If a wait doesn't help, I'd be happy to dig a bit deeper
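Put together, the suggested sequence would look roughly like this (a sketch only, mirroring the commands above; as the next comment notes, the approach was set aside because of the bind-mount issue):

```sh
# Suggested rkt sequence: run the one-shot apply pod, wait for it to exit, then remove it.
/usr/bin/rkt run --uuid-file-save=/var/run/calico-install.uuid \
    --net=host \
    --volume manifests,kind=host,source=/etc/kubernetes/manifests \
    --mount volume=manifests,target=/host/manifests \
    $HYPERKUBE_IMAGE_REPO:$K8S_VER --exec=/hyperkube -- kubectl apply -f /host/manifests/calico.yaml
/usr/bin/rkt status --wait "$(cat /var/run/calico-install.uuid)"
/usr/bin/rkt rm --uuid-file=/var/run/calico-install.uuid
```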

Contributor

Disregard my last comment, I remembered the actual issue here. This is caused by a shared bind mount of /var/lib/rkt (created by the rkt fly kubelet) interacting poorly with rkt rm: rkt/rkt#3181

There's unfortunately not a good fix yet 😦

@aaronlevy (Contributor)

Couple more minor comments.

/cc @robszumski and @joshix for docs changes

@aaronlevy (Contributor)

Based on @euank's feedback (#768 (comment)) -- sorry to ask this, but you can probably just switch back to using docker for now and add a TODO for switching to rkt once the issue is resolved.

@heschlie (Contributor Author)

No problem, thanks for the fast feedback :)

@aaronlevy (Contributor)

@heschlie can you squash the cleanup commits where appropriate and we can get this merged?

--volume cni-bin,kind=host,source=/opt/cni/bin \
--mount volume=cni-bin,target=/opt/cni/bin
```
- Add `ExecStartPre=/usr/bin/mkdir -p /opt/cni/bin`
Member

I wish there was a cleaner way to describe all of this, but I'm at a loss
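For readers following along, here is one way the pieces in the excerpt above could be wired together. This is a hedged sketch only: the drop-in path, file name, and the RKT_OPTS environment variable are assumptions based on kubelet-wrapper conventions, not text from the docs being reviewed.

```sh
# Hypothetical kubelet drop-in combining the extra rkt volume/mount with the
# ExecStartPre that creates /opt/cni/bin before the kubelet starts.
mkdir -p /etc/systemd/system/kubelet.service.d
cat << EOF > /etc/systemd/system/kubelet.service.d/10-calico-cni.conf
[Service]
Environment="RKT_OPTS=--volume cni-bin,kind=host,source=/opt/cni/bin --mount volume=cni-bin,target=/opt/cni/bin"
ExecStartPre=/usr/bin/mkdir -p /opt/cni/bin
EOF
systemctl daemon-reload
```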

* Connects containers to the flannel overlay network, which enables the "one IP per pod" concept.
* Enforces network policy created through the Kubernetes policy API, ensuring pods talk to authorized resources only.

When creating `/etc/systemd/system/calico-node.service`:
Finally policy agent is the last major piece of the calico.yaml. It monitors the API for changes related to network policy and configures Calico to implement that policy.
Member

I think you can drop "finally" here.

The policy agent is the last major piece of the Calico system.

Contributor

It should also read "The policy controller"

@heschlie (Contributor Author)

commits squashed!

@aaronlevy (Contributor) left a comment

lgtm

@aaronlevy (Contributor)

@heschlie looks like there are some conflicts now. Can you rebase your changes?

@aaronlevy (Contributor) left a comment

removing approval until conflicts resolved

@heschlie (Contributor Author)

@aaronlevy and done!

@@ -31,11 +31,19 @@ export K8S_SERVICE_IP=10.3.0.1
export DNS_SERVICE_IP=10.3.0.10

# Whether to use Calico for Kubernetes network policy.
export USE_CALICO=false
export USE_CALICO=true
Contributor

Can we leave this as default off for now? It also defaults to off in multi-node installations, and I'd like to not also change default behavior in this large of a PR.

Contributor Author

Yep, sorry about that -- it should have been set to false; I must have missed it. Amended the last commit.

@aaronlevy (Contributor)

Thanks. One last minor comment to address - then squash and we should be good.

@aaronlevy (Contributor)

Perfect. Can you squash that last commit?


```sh
$ curl -H "Content-Type: application/json" -XPOST -d'{"apiVersion":"v1","kind":"Namespace","metadata":{"name":"kube-system"}}' "http://127.0.0.1:8080/api/v1/namespaces"
>>>>>>> 625c7d7... Updating Calico to hosted install
Contributor

Some rebase cruft left here.

In order to get the hosted install to work we had to make some changes to
the deployment, the most disruptive being that we need to write to
hyperkube's `/opt/cni/bin` directory, so when Calico is enabled we mount
that directory to the host.

Some other minor changes were: adding the pod-cidr to the proxy pods to
get around some issues with Vagrant networking.
We use docker to run hyperkube with kubectl to deploy the calico.yaml
manifest; this should be changed to rkt once the noted bug is fixed.
Removed all systemd Calico configuration.
Using one CNI conf for all nodes.

Updated bare metal docs to self-hosted Calico install

Updated upgrade doc; noted this new install is not compatible with the old
systemd install method.
@aaronlevy (Contributor) left a comment

lgtm
