
Proposal for refactoring cloud provider API #128

Merged (1 commit) on Feb 28, 2017

Conversation

@wlan0 (Member) commented Dec 1, 2016

@thockin @luxas @roberthbailey @chrislovecnm @liggitt @saad-ali @erictune

This PR is a copy of the rectified contents of kubernetes/kubernetes#37037.

PTAL

@k8s-ci-robot added the cncf-cla: yes label (indicates the PR's author has signed the CNCF CLA) on Dec 1, 2016
@philips (Contributor) commented Dec 2, 2016

@wlan0 I would love to have language on our expected upgrade/downgrade path as that is becoming a common theme as Kubernetes matures.

@wlan0 (Member, Author) commented Dec 3, 2016

@philips Thanks for bringing this up. This is an important concern and I totally missed it. I'll add it to the document as well, but for now this is the upgrade path -

Upgrading kubelet and proxy

The kubelet and proxy run on every node in the kubernetes cluster. Based on your setup (systemd or other), you can follow the normal upgrade steps for them. This change does not affect the kubelet and proxy upgrade steps for your setup.

Upgrading plugins

Plugins such as CNI and flex volumes can be upgraded just as you normally would. This change does not affect the plugin upgrade steps for your setup.

Upgrading kubernetes core services

The master node components (kube-controller-manager, kube-scheduler, kube-apiserver, etc.) can be upgraded just as you normally upgrade them. This change does not affect the upgrade steps for these core services in your setup.

Applying the cloud-controller-manager

This is the only new step in the upgrade process. To complete the upgrade, you need to apply the cloud-controller-manager deployment to the setup. A deployment descriptor file will be provided with this change. You apply it using

kubectl apply -f cloud-controller-manager.yml

This will start the cloud-specific controller manager in your kubernetes setup.

The downgrade steps are also the same as before for all of the components except the cloud-controller-manager. In the case of the cloud-controller-manager, the deployment should be deleted using

kubectl delete -f cloud-controller-manager.yml

As always, any comments/thoughts/improvements/suggestions are welcome :)


Finally, Node initialization needs to be addressed. This is the trickiest part. Pods will be scheduled even on uninitialized nodes. This can lead to scheduling pods on incompatible zones, and other weird errors. Therefore, an approach is needed where kubelet can create a Node, but mark it as "NotReady". Then, some asynchronous process can update it and mark it as ready. This is now possible because of the concept of ExternalAdmissionControllers introduced by @justinsb in these PRs - (https://github.com/kubernetes/kubernetes/pull/36210, https://github.com/kubernetes/kubernetes/pull/36209)

Admission controllers are meant to perform some aspects of initialization of objects. In this case, an ExternalAdmissionController will be configured to first mark the Node object as unusable by setting the NodeStatus as {"CloudProviderInitialized": "False"} (This exact keyword has not been decided yet), and then an external controller (residing in the cloud controller manager) will update it by removing the NodeStatus. In order for this to work, Nodes should be unusable until these Statuses are removed. This is possible because of the above mentioned PRs by @justinsb.
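For illustration only, here is a minimal Go sketch of what such a status marking might look like on the Node object, assuming the placeholder condition name CloudProviderInitialized mentioned above (the discussion below ultimately settles on taints rather than conditions):

```go
package nodecondition

import (
	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// markCloudProviderUninitialized records the hypothetical condition described
// above on a Node. An admission controller would add it at create time; the
// cloud controller manager would later update or remove it once the node has
// been initialized against the cloud provider.
func markCloudProviderUninitialized(node *v1.Node) {
	node.Status.Conditions = append(node.Status.Conditions, v1.NodeCondition{
		Type:               "CloudProviderInitialized", // placeholder name, not final
		Status:             v1.ConditionFalse,
		Reason:             "CloudProviderNotInitialized",
		Message:            "waiting for the cloud controller manager to initialize this node",
		LastTransitionTime: metav1.Now(),
	})
}
```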
Contributor

Flexible extension of admission control by third parties (out of tree) will be handled in proposal #132. In the meantime, an admission controller can remain in tree for this proposal.

Member Author

Thanks for sharing this @smarterclayton

Member

@liggitt didn't like the admission control approach last I heard, perhaps he can elaborate here.

@liggitt (Member) commented Dec 19, 2016

What is CloudProviderInitialized? If it's a condition, it would need to be paired with scheduler changes to recognize conditions; that could be accomplished by requiring the presence of a CloudProviderInitialized=True condition, which wouldn't need another admission plugin. If it's a taint, kubelets can already register themselves with taints, which also doesn't need an admission plugin.

@wlan0 (Member, Author) commented Jan 4, 2017

@liggitt I'm also convinced now that taints are the better approach. I will update the doc shortly. While I was drafting this proposal, taints weren't ready yet.


### 6. Deployment

This change will introduce new binaries to the list of binaries required to run kubernetes. The change will be designed such that these binaries can be installed via `kubectl apply -f` and the appropriate instances of the binaries will be running. For example, one instance of the external cloud controller manager should be run.
Member

Is kubectl apply -f required? In kube-mesos, I'm thinking of using one scheduler daemon to communicate with Mesos to reduce the number of binaries.

Member Author

@k82cn It is definitely not required. It is the recommended way, but as a power user, you are free to run it in your own way.

Member

@wlan0 , great, good to know.

@thockin (Member) left a comment

This looks great overall. The only things I'd like to see more detail on are transition plan and code/lib evolution. Something like:

release X: add a `--cloud=external` flag, disabled by default.  Users can set this and manually run a cloud controller to test or start the migration.  Announce deprecation of built-in cloud providers.

release X+1: convert all supported turnups to use cloud controller by default, still allow opt-out.  Run a monolithic cloud  controller binary with existing lib.

release X+2: still allow opt-out, but warn loudly.  Break cloud controller into N individual binaries.

release X+3: remove legacy cloud providers

and

step1) break controller-manager into 2 binaries with no other changes.  new cloud-controller takes a flag for which cloud-provider to use

step2) factor out all new controller loops from kubelet etc

step3) convert cloud controller to a lib, new main() for each cloud provider, still using cloud provider API

step4) move cloud providers out of tree

etc. Just more detail on the steps along the way - this is going to be VERY complex, and I want to think up-front about how to make a series of dozens of small, stable changes, rather than a few multi-thousand-line changes.


The volumeController uses the cloudprovider to create, delete, attach and detach volumes to nodes. For instance, the logic for provisioning, attaching, and detaching an EBS volume resides in the AWS cloudprovider. The volumeController uses this code to perform its operations.

The routeController configures routes for hosts in Google Cloud.
Member

not just google cloud, also AWS, I think?

Contributor

At least GCP, AWS and Azure implement the Routes interface in cloudprovider.

Member

OpenStack also implements it now that kubernetes/kubernetes#32663 is merged

Member

ping


Scrubbing DNS, after discussing with @thockin, was found to be redundant. So, it can be disregarded. It is being removed.

Finally, Node initialization needs to be addressed. This is the trickiest part. Pods will be scheduled even on uninitialized nodes. This can lead to scheduling pods on incompatible zones, and other weird errors. Therefore, an approach is needed where kubelet can create a Node, but mark it as "NotReady". Then, some asynchronous process can update it and mark it as ready. This is now possible because of the concept of ExternalAdmissionControllers introduced by @justinsb in these PRs - (https://github.com/kubernetes/kubernetes/pull/36210, https://github.com/kubernetes/kubernetes/pull/36209)
Member

@smarterclayton ETA on a first cut for initializers?

Member

@liggitt had some opinions on how the node readiness should work using the scheduler rather than admission control. Jordan, can you elaborate a bit here?

@davidopp (Member) commented Dec 11, 2016

I think this problem should be addressed using kubernetes/kubernetes#31647 (node registers with a taint to prevent pods from scheduling), not kubernetes/kubernetes#36209 (node creation admission controller).

Member

using taints (kubernetes/kubernetes#31647):

  • requires no changes in the scheduler (good)
  • requires no changes for a node with that taint to be explicitly tolerated by pods (good, I think)
  • requires nodes to be started with known taints (not sure how big a deal this is... how do we envision cloud providers informing the initial set of taints a kubelet should register with?)

Allowing the scheduler to require the presence of a particular node condition (kubernetes/kubernetes#37224):

  • requires a support for a condition predicate in the scheduler
  • requires changing the scheduler config file if a new condition is desired
  • doesn't require modifying kubelet startup... addition/removal of the conditions can be done by an external client

I can see scenarios where either approach would be preferable

Member

I think taints are preferable. I believe they are equally expressive as the condition approach, and the rules are more obvious (not hidden in the scheduler config). Note that addition/removal of taints can be done by an external client -- they are just like labels. (Arguably we need some ACLs to restrict this, but that's another story.) The only benefit I can see for using node conditions is that you avoid maintaining redundant node conditions and taints, and don't have to worry about setting both. But I think the benefits of taints outweigh the downsides.

@liggitt (Member) commented Dec 19, 2016

Note that addition/removal of taints can be done by an external client -- they are just like labels

Except that addition by an external client could not be done synchronously with node creation time... it would lag, allowing for the possibility of the node being briefly schedulable. Node initializers could solve that, though I'm not sure how close those are

Member

Agreed; I was just trying to point out that one of the properties you listed for the node condition approach ("doesn't require modifying kubelet startup... addition/removal of the conditions can be done by an external client") when comparing to taints, is actually a property of taints as well. I agree having kubelet apply the taints when it registers is better.

Member Author

Also, looks like the node condition PR is in a state of limbo. Unless that gets revived we only have one option - taints.

@wlan0 (Member, Author) commented Dec 9, 2016

@thockin @roberthbailey Addressed all of your comments. The doc feels like it's in a good position to get merged.

@derekwaynecarr (Member)

In the future, I would like a strategy for controlling consumption of cloud api call quota.

3. To instantiate a reference to the current node object
- Find the InstanceID, ProviderID, ExternalID, Zone Info of the node object while initializing it
- Periodically poll the cloud provider to figure out if the node has any new IP addresses associated with it
- In case of Google cloud, it allows the provider to configure routes for the node
Member

This is unclear. Kubelet doesn't have anything to do with the routes itself, it leaves all that to the route provider. What kubelet does (wrongly, see kubernetes/kubernetes#34398) is set the "NetworkUnavailable" condition whenever a cloud provider is specified. That's incorrect because the only thing that clears this condition is the route controller, and the route controller is only required if the networking setup uses direct routing between nodes for the pod network, which many plugins do not (including those with overlays).

In any case, I'd clarify to say something like "Assumes cloud providers require routes for network plugins (not always true) and blocks readiness until cloud routes are configured"

Member Author

That's right. Setting that condition is the Kubelet task. I'll rectify the doc.

@davidopp (Member)

It would be great to add a diagram to this proposal showing the "final state" of the components after this is fully implemented.

Just to make sure I understand the plan for controller manager (the main component I'm interested in): the basic idea is that the cloud-provider-specific bits will be factored into a separate binary, which will be different for each cloud provider and will run as a regular Kubernetes pod?

1. Cloud dependent controller-manager binaries
2. Cloud independent controller-manager binaries - This is the existing `controller-manager` that's shipped with kubernetes releases presently.

The cloud dependent binaries will run those loops that rely on cloudprovider as a kubernetes system service (started using `kubectl apply -f`). The rest of the controllers will be run in the cloud independent controller manager. The decision to run entire controller loops, rather than only the very minute parts that rely on the cloud provider, makes the implementation simple. If drilled down further into the controller loops, it would be significantly harder to disentangle, due to tight coupling of the cloud provider specific code with the rest of the data structures/functions within a controller. There would be a lot of duplicated code, which needs to be kept in sync, across the "dependent" loop implementations. For example, StatefulSet would not have gotten to beta in 1.5 if we had to change 7 node controller loops in various cloud provider binaries.
Contributor

please remove the "started using" parenthetical statement since that presumes a specific implementation, but in reality we may implement it differently for each cloud provider.


### 6. Deployment, Upgrades and Downgrades

This change will introduce new binaries to the list of binaries required to run kubernetes. The change will be designed such that these binaries can be installed via `kubectl apply -f` and the appropriate instances of the binaries will be running. For example, one instance of the external cloud controller manager should be run.
Contributor

"For instance, once instance of external cloud controller manager should be run." <-- please remove this statement, as it presumes a specific implementation of the external controller manager. I would rather be able to run multiple instances for higher availability than design in a SPOF.


##### 7.1 Transition plan

Release 1.6: Add a `--cloud-provider=external` flag for the controller-manager. This flag will disable the cloud specific controller loops. This flag will be disabled by default. Users can set this and manually run a cloud controller to test or start the migration. An Announce of deprecation of built-in cloud providers will also be done.
Contributor

lowercase "Announce"

Member

We can't really announce a deprecation until we have a replacement.

Member

Can we shoot for 1.6 breaking at least some of the cloud-specific control loops into a new process? It's operationally more complex but proves it can be done.

thoughts?

Member

@thockin In my understanding, that's exactly what's going on in kubernetes/kubernetes#34273

We're gonna make it possible to use the now in-tree cloudproviders using the new flow with this new, temporary binary cloud-controller-manager, so I think it is ok to announce that the long-term goal is using the external flow and therefore deprecating the current solution (while still supporting it quite a long time)

Contributor

An Announce of deprecation of built-in cloud providers will also be done.

The grammar could actually be a bit cleaner :(

@thockin (Member) left a comment

There are still a few comments unresolved here.


The volumeController uses the cloudprovider to create, delete, attach and detach volumes to nodes. For instance, the logic for provisioning, attaching, and detaching an EBS volume resides in the AWS cloudprovider. The volumeController uses this code to perform its operations.

The routeController configures routes for hosts in Google Cloud.
Member

ping




The cloud dependent binaries will run those loops that rely on cloudprovider as a kubernetes system service (started using `kubectl apply -f`). The rest of the controllers will be run in the cloud independent controller manager. The decision to run entire controller loops, rather than only the very minute parts that rely on the cloud provider, makes the implementation simple. If drilled down further into the controller loops, it would be significantly harder to disentangle, due to tight coupling of the cloud provider specific code with the rest of the data structures/functions within a controller. There would be a lot of duplicated code, which needs to be kept in sync, across the "dependent" loop implementations. For example, StatefulSet would not have gotten to beta in 1.5 if we had to change 7 node controller loops in various cloud provider binaries.

Note that the controller loop implementation will continue to reside in the core repository. It takes in cloudprovider.Interface as an input in its constructor. Vendor maintained cloud-controller-manager binary will link these controllers in. It will also initialize the cloudprovider object and pass it into the controller loops.


Does this dependency have the potential to make the external cloud-controller-manager much larger in terms of code size than it would be otherwise?

Member

I see it as our controller-manager will shrink notably in size when all cloud provider deps have moved out.

No, I think those libraries are lightweight. Anyway, some logic has to exist, so vendoring in that logic from core is absolutely the best thing to do.

@mfischer-zd commented Dec 18, 2016

Next question: what would be the communication protocol between the controllers? Too tight a binding, as opposed to some stable RPC-like protocol, could force us to rebuild the cloud-dependent controller every time kube-controller-manager is upgraded. We'd like to avoid that if possible.

Member

That won't be the case. No communication will take place between the controllers.

After all, they are controllers. The official controller-manager will modify k8s objects normally and won't take anything related to a cloud into account, while your custom-deployed cloud-controller-manager binary will only react to cloud events.

@mfischer-zd commented Dec 18, 2016

Can you point me to some documentation that explains it? I'm at the edge of my understanding now - it's becoming clear that the "plugin" model I'm thinking of is not what's being discussed here. I also tried to search online for a generic description of how K8S controllers are supposed to work, without success.

Things I'm interested in include:

  • "reacting to cloud events" - how? What's the API, if any?
  • How to ensure this custom cloud-controller-manager would be a singleton to avoid duplicate and possibly conflicting event consumption/reaction

Sorry to pollute the discussion with this meta-questioning, but I appreciate being pointed in the right direction.

Member Author

@mfischer-zd It might help to look up how the informer framework works in kubernetes. This is my favourite document about the informer framework - http://jayunit100.blogspot.com/2015/10/kubernetes-informers-controllers.html

Things I'm interested in include:

"reacting to cloud events" - how? What's the API, if any?

Reading up on the informer framework is essential to understanding this. There is no cloud API per se. This is an example flow: when a kubelet registers a node with the apiserver, the controller manager reacts to this creation by talking to the cloudprovider and fetching information about the node. The information about the creation is brought to the controller manager via the informers (see the sketch below).

How to ensure this custom cloud-controller-manager would be a singleton to avoid duplicate and possibly conflicting event consumption/reaction

First of all, multiple custom controller managers can be run simultaneously with leader election. Secondly, Kubernetes objects are idempotent by design. Two objects will not step on each other because of this.
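To make the informer-driven flow above concrete, here is a minimal client-go sketch, assuming an in-cluster config and a hypothetical lookupNodeInCloud helper; a real cloud-controller-manager would also wrap this in client-go leader election so that only one instance is active at a time:

```go
package main

import (
	"time"

	v1 "k8s.io/api/core/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/cache"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	factory := informers.NewSharedInformerFactory(client, 30*time.Second)
	nodeInformer := factory.Core().V1().Nodes().Informer()

	// React to Node creation events: this is where a cloud controller manager
	// would call out to its cloud provider and fill in addresses, zone,
	// instance ID, and so on.
	nodeInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) {
			node := obj.(*v1.Node)
			lookupNodeInCloud(node) // hypothetical provider-specific logic
		},
	})

	stop := make(chan struct{})
	factory.Start(stop)
	<-stop
}

// lookupNodeInCloud is a stand-in for vendor-specific code.
func lookupNodeInCloud(node *v1.Node) {}
```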


I also found the following, in case someone runs into this thread someday:

@wlan0 (Member, Author) commented Jan 5, 2017

ping @thockin. I've updated the doc after considering all the comments here. PTAL

@thockin (Member) left a comment

I don't want to flog this proposal too much more. Just expand and clarify the expectations through transition, and think about the partially-upgraded case, which is a huge reality in things like GKE...


The decision to run entire controller loops, rather than only the very minute parts that rely on the cloud provider, was made because it makes the implementation simple. Otherwise, the shared data structures and utility functions would have to be disentangled and carefully separated to avoid any concurrency issues. This approach, among other things, prevents code duplication and improves development velocity.

Note that the controller loop implementation will continue to reside in the core repository. It takes in cloudprovider.Interface as an input in its constructor. Vendor maintained cloud-controller-manager binary will link these controllers in. It will also initialize the cloudprovider object and pass it into the controller loops.
Member

"for now" or "for the duration of the transition" or "A reference form of" ... ? Basically, anyone who wants to write a cloud controller COULD. They don't HAVE TO use this lib, but it's a nice starting point.


Volumes need cloud providers, but they only need SPECIFIC cloud providers. The real problem is that cloud providers can take params from cloud config. The majority of the logic for volume management currently resides in the controller-manager.

There is an ongoing effort to move all of the volume logic from the controller-manager into plugins called Flex Volumes. In the Flex volumes world, all of the vendor specific code will be packaged in a separate binary as a plugin. After discussing with @thockin, this was deemed the best approach to move all cloud provider dependencies for volumes out of the kubernetes core.
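Since flex volumes come up here, a minimal sketch of the call-out model such a plugin uses may help: the driver is an executable invoked with a verb and arguments that prints a JSON status. The verbs and argument lists below are simplified; the exact contract is defined by the flex volume spec and has varied across releases:

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

type driverStatus struct {
	Status  string `json:"status"` // "Success", "Failure" or "Not supported"
	Message string `json:"message,omitempty"`
}

func reply(s driverStatus) {
	out, _ := json.Marshal(s)
	fmt.Println(string(out))
}

func main() {
	if len(os.Args) < 2 {
		reply(driverStatus{Status: "Failure", Message: "no operation given"})
		os.Exit(1)
	}
	switch os.Args[1] {
	case "init":
		// One-time driver initialization.
		reply(driverStatus{Status: "Success"})
	default:
		// Operations this sketch does not implement (attach, mount, unmount, ...).
		reply(driverStatus{Status: "Not supported"})
	}
}
```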
Member

I am a little anxious about the timeline for flex volumes, still. We may need to revisit this. Any ideas for plan B?

This is the only new step in the upgrade process. To complete the upgrade, you need to apply the cloud-controller-manager deployment to the setup. A deployment descriptor file will be provided with this change. You apply it using

```
kubectl apply -f cloud-controller-manager.yml
```
Member

What happens when we update the master and get the new controller, but the nodes have not been updated? How does the taints/initializers stuff work in the interim? Do we need an interlock like we have done before? e.g. one release has both old and new master logic and switches only when all nodes report updated kubelets, and then the subsequent release removes old logic?

Member Author

@thockin, we can prevent the scenario where the master has been updated but the nodes haven't by adding the taint logic into the master itself. If the master had a node admission controller that added these taints, then we would only need to update the master: any nodes running the old logic would have talked to the cloudprovider themselves, and the nodes with the new logic wouldn't need any new params.

There is a flip side, where there will be updated nodes but older masters that do not have the admission controller for adding taints. To fix this, we'll need the interlock approach in the cloud-controller-manager: the CCM will retroactively process nodes and update cloud-specific details in the first release, and then this logic will be removed in the subsequent release.

I thought of this approach of using admission controllers instead of setting the taint in kubelet because of an issue faced by @luxas. It was prohibitively difficult for kubeadm to update every kubelet with new params, and he suggested that we use a different approach instead. After discussing, we found that the admission controller approach is preferable.

Member

Yes, using an admission controller that taints every node before the node has initialized is preferable.
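For concreteness, a sketch of the core of what such an admission step would do to every Node at create time; the surrounding admission-plugin plumbing is omitted and the taint key is only illustrative:

```go
package nodetaint

import v1 "k8s.io/api/core/v1"

// cloudTaintKey is an illustrative key; the real key would be settled during
// implementation.
const cloudTaintKey = "cloudprovider.kubernetes.io/uninitialized"

// addCloudTaint is what a node admission controller would apply to every Node
// on create, before the scheduler ever sees it. The cloud controller manager
// removes the taint once it has initialized the node against the cloud.
func addCloudTaint(node *v1.Node) {
	for _, t := range node.Spec.Taints {
		if t.Key == cloudTaintKey {
			return // already tainted
		}
	}
	node.Spec.Taints = append(node.Spec.Taints, v1.Taint{
		Key:    cloudTaintKey,
		Value:  "true",
		Effect: v1.TaintEffectNoSchedule,
	})
}
```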


##### 7.1 Transition plan

Release 1.6: Add the first implementation of the cloud-controller-manager binary. This binary's purpose is to let users run two controller managers and address any issues that they uncover, that we might have missed. It also doubles as a reference implementation to the external cloud controller manager for the future. Since the cloud-controller-manager runs cloud specific controller loops, it is important to ensure that the kube-controller-manager does not run these loops as well. This is done by leaving the `--cloud-provider` flag unset in the kube-controller-manager.
Member

Can you flesh this out just a bit more? Specifically - the cloud-controller-manager binary is the same control loops that were in the main controller-manager, but re-factored as described here. Still uses the exact same cloudprovider libs. Yes? In other words, is it EXPECTED that everyone switches to CCM, or is it just optional and "beta" ?


Release 1.6: Add the first implementation of the cloud-controller-manager binary. This binary's purpose is to let users run two controller managers and address any issues that they uncover, that we might have missed. It also doubles as a reference implementation to the external cloud controller manager for the future. Since the cloud-controller-manager runs cloud specific controller loops, it is important to ensure that the kube-controller-manager does not run these loops as well. This is done by leaving the `--cloud-provider` flag unset in the kube-controller-manager.

Release 1.7: In this release, all of the supported turnups will be converted to use the cloud controller by default. At this point users will still be allowed to opt out. Users will have the option to run a monolithic cloud controller binary. The cloud controller will still continue to use the existing cloud provider library. A deprecation announcement will be made.
Member

Code will be factored to reduce literal duplication between CM and CCM as much as possible?


Release 1.8: The main change aimed for this release is to break up the various cloud providers into individual binaries. Users will still be allowed to opt-out. There will be a second warning to inform users about the deprecation of the `--cloud-provider` option in the controller-manager.

Release 1.9: All of the legacy cloud providers will be completely removed in this version
Member

This will be December - it sucks to plan a year ahead, but this is a huge and important changeset.

@wlan0 (Member, Author) commented Jan 18, 2017

ping @thockin. I've updated the doc. PTAL

Moving on to the kubelet, the following cloud provider dependencies exist in kubelet.

- Find the cloud nodename of the host that kubelet is running on for the following reasons :
1. To obtain the config map for the kubelet, if one already exists
Contributor

Note that we will very very likely move away from this model to one where ConfigMaps can be shared between nodes, which will obviate the ConfigMap dependency on the nodename. For more info see kubernetes/kubernetes#29459


Scrubbing DNS, after discussing with @thockin, was found to be redundant. So, it can be disregarded. It is being removed.

Finally, Node initialization needs to be addressed. This is the trickiest part. Pods will be scheduled even on uninitialized nodes. This can lead to scheduling pods on incompatible zones, and other weird errors. Therefore, an approach is needed where kubelet can create a Node, but mark it as "NotReady". Then, some asynchronous process can update it and mark it as ready. This is now possible because of the concept of Taints.
Contributor

@mml this could be a solution to the problem we talked about in our meeting on Wed. (it is possible for pods to be scheduled to nodes that aren't ready (properly configured) yet).

Member

@wlan0, @justinsb, and I discussed today and fleshed the idea out some more. I assume that will get added soon.


Finally, Node initialization needs to be addressed. This is the trickiest part. Pods will be scheduled even on uninitialized nodes. This can lead to scheduling pods on incompatible zones, and other weird errors. Therefore, an approach is needed where kubelet can create a Node, but mark it as "NotReady". Then, some asynchronous process can update it and mark it as ready. This is now possible because of the concept of Taints.

This approach requires kubelet to be started with known taints. This will make the node unschedulable until these taints are removed. The external cloud controller manager will asynchronously update the node objects and remove the taints.
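A minimal sketch of the removal side, assuming present-day client-go, the illustrative taint key from above, and a hypothetical initializeFromCloud helper: once the cloud controller manager has filled in the node's cloud-specific fields, it strips the startup taint so the node becomes schedulable.

```go
package nodeinit

import (
	"context"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

const cloudTaintKey = "cloudprovider.kubernetes.io/uninitialized" // illustrative key

// finishNodeInit fills in cloud-specific details for a node and then removes
// the startup taint so pods can be scheduled onto it.
func finishNodeInit(ctx context.Context, client kubernetes.Interface, node *v1.Node) error {
	initializeFromCloud(node) // hypothetical: set addresses, zone, provider ID, ...

	kept := node.Spec.Taints[:0]
	for _, t := range node.Spec.Taints {
		if t.Key != cloudTaintKey {
			kept = append(kept, t)
		}
	}
	node.Spec.Taints = kept

	_, err := client.CoreV1().Nodes().Update(ctx, node, metav1.UpdateOptions{})
	return err
}

// initializeFromCloud is a stand-in for vendor-specific initialization.
func initializeFromCloud(node *v1.Node) {}
```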
Contributor

How does this interact with static pods e.g. KubeProxy?

Member

AFAIK taints are not comprehended by kubelet, so static pods would start.

Member

I wouldn't count on that behavior. Any pods that want to run when a NoExecute taint is present will need a toleration.

@thockin (Member) left a comment

I'm pretty happy with this. There's always more detail that can be added, and we should revisit this as we understand each step more.

@mml to have a look, but I am LGTM on this.

@wlan0 if no new comments or changes by Thursday, let's merge it.




RouteController and serviceController are entirely cloud specific. Therefore, it is really simple to move these two controller loops out of the cloud-independent binary and into the cloud dependent binary.

NodeController does a lot more than just talk to the cloud. It does the following operations -
Member

Do you plan to break up the NodeController?

Kube-apiserver uses the cloud provider for two purposes

1. Distribute SSH Keys - This can be moved to the cloud dependent controller manager
2. Admission Controller for PV - This can be refactored using the taints approach used in Kubelet
Member

See also the admission extension proposal: #132


### 6. Deployment, Upgrades and Downgrades

This change will introduce new binaries to the list of binaries required to run kubernetes. The change will be designed such that these binaries can be installed via `kubectl apply -f` and the appropriate instances of the binaries will be running.
Member

The addon manager isn't currently required, but we need an equivalent that is.

1. To obtain the config map for the kubelet, if one already exists
2. To uniquely identify current node using nodeInformer
3. To instantiate a reference to the current node object
- Find the InstanceID, ProviderID, ExternalID, Zone Info of the node object while initializing it
Member

It's not clear how this information is going to be discovered in the proposal below. The zone and id are generally needed in order to call the cloud provider to get more information about the node.

Contributor

Isn't this information currently retrieved from the local metadata server, like EC2 and GCE do?

Member

Anything the node needs to bootstrap could be fed in as flags/config locally. Example: hostname override. There is a class of variables that need to be known to get things going and are exceedingly unlikely to change, but are also difficult to express generically.

Node setup could get the region/zone from metadata or API or config volume (openstack) or SMBIOS or wherever, and pass it to kubelet. We don't need an internal API for this.
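As a concrete illustration of the metadata route, here is a minimal Go sketch that reads the zone from the GCE metadata server; other clouds expose equivalent local services (EC2 metadata, OpenStack config drive, SMBIOS), and the value would simply be handed to kubelet via a flag or config file:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
)

// fetchZone reads this instance's zone from the GCE metadata server.
func fetchZone() (string, error) {
	req, err := http.NewRequest("GET",
		"http://metadata.google.internal/computeMetadata/v1/instance/zone", nil)
	if err != nil {
		return "", err
	}
	req.Header.Set("Metadata-Flavor", "Google") // required by the metadata server
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	// The response looks like "projects/<num>/zones/us-central1-a".
	return string(body), nil
}

func main() {
	zone, err := fetchZone()
	if err != nil {
		fmt.Println("could not read zone from metadata:", err)
		return
	}
	fmt.Println("zone:", zone)
}
```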

@errordeveloper (Member)

@wlan0 great work so far! Cannot wait for this to land, and would be keen to help. I can see there are just a few threads and @thockin is already quite happy with it overall. Please let me know if I can help.

@keontang

I have read this proposal and the PR "start breaking up controller manager into two pieces" (kubernetes/kubernetes#34273).

But I am still confused about how to use it.

```
|____cloud-controller-manager
| |____app
| | |____BUILD
| | |____controllermanager.go
| | |____options
| | | |____BUILD
| | | |____options.go
| |____BUILD
| |____controller-manager.go (main function)
```

Does it mean that for each cloudprovider, we should add the cloud-specific code into the cloud-controller-manager directory, and update the main function just like https://github.com/wlan0/kubernetes/tree/to_show/cmd/external-controller-manager?

k8s-github-robot pushed a commit to kubernetes/kubernetes that referenced this pull request Feb 23, 2017
Automatic merge from submit-queue (batch tested with PRs 41540, 41808, 41710, 41838, 41840)

kubeadm: Remove the --cloud-provider flag for beta init UX

**What this PR does / why we need it**:

We decided the `--cloud-provider` flag promises way too much compared to what it really does. There is a lot you have to do as a user in order to make the current cloud provider integrations work. And since we're promising to support the `kubeadm init` UX on a beta level in v1.6, we can't have this flag in the UX. A lot is gonna change here... see proposal: kubernetes/community#128

Once we find a cloudprovider solution we can support (probably using the new flow), we'll just add it.
For now, we'll just document how to do cloudprovider integrations by hand instead. 

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #

**Special notes for your reviewer**:

**Release note**:

```release-note
kubeadm: Remove the --cloud-provider flag for beta init UX
```
@jbeda @dmmcquay @mikedanese @roberthbailey @pires @errordeveloper
@keontang

@wlan0

@thockin (Member) commented Feb 28, 2017

@wlan0 - anything blocking this PROPOSAL from being merged, in your opinion?

@wlan0 (Member, Author) commented Feb 28, 2017

Nope

@thockin thockin merged commit 28e993f into kubernetes:master Feb 28, 2017
@errordeveloper (Member)

Really glad to see progress on this! 🥇 🎉

@thockin (Member) commented Mar 1, 2017 via email

@rrati commented Apr 7, 2017

@thockin @luxas @roberthbailey @chrislovecnm @liggitt @saad-ali @erictune @wlan0 @bgrant0607 I've got some available bandwidth and would like to help out with this effort. Is there a current status reflected somewhere?

k8s-github-robot pushed a commit to kubernetes/kubernetes that referenced this pull request Apr 8, 2017
Automatic merge from submit-queue

add rancher credential provider

This adds rancher as a credential provider in kubernetes.

@erictune This might be a good opportunity to discuss adding a provision for people to have their own credential providers that is similar to the new cloud provider changes (kubernetes/community#128). WDYT?

```release-note
Added Rancher Credential Provider to use Rancher Registry credentials when running in a Rancher cluster
```
ruebenramirez pushed a commit to ruebenramirez/community that referenced this pull request Apr 22, 2017
calebamiles added a commit to calebamiles/community that referenced this pull request Jan 10, 2018
This KEP is a rewrite of an existing [design proposal][] which had been
given prior [approval][]. Having done a bit of due diligence, mark
@thockin as the approver. It is expected that WG Cloud Provider will
find additional approvers once PRs which implement this KEP begin
landing.

[design proposal]: https://github.com/kubernetes/community/blob/master/contributors/design-proposals/cloud-provider/cloud-provider-refactoring.md
[approval]: kubernetes#128 (review)
justaugustus pushed a commit to justaugustus/enhancements that referenced this pull request Sep 3, 2018
MadhavJivrajani pushed a commit to MadhavJivrajani/community that referenced this pull request Nov 30, 2021
rmohr pushed a commit to rmohr/community that referenced this pull request May 4, 2022
Signed-off-by: Fabian Deutsch <fdeutsch@redhat.com>
danehans pushed a commit to danehans/community that referenced this pull request Jul 18, 2023