
Proposal for refactoring cloud provider API #128

Merged (1 commit) on Feb 28, 2017

Conversation

@wlan0 (Member) commented Dec 1, 2016

@thockin @luxas @roberthbailey @chrislovecnm @liggitt @saad-ali @erictune

This PR is a copy of the rectified contents of kubernetes/kubernetes#37037.

PTAL

@k8s-ci-robot added the cncf-cla: yes label (indicates the PR's author has signed the CNCF CLA) on Dec 1, 2016
@philips (Contributor) commented Dec 2, 2016

@wlan0 I would love to have language on our expected upgrade/downgrade path as that is becoming a common theme as Kubernetes matures.

@wlan0 (Member, Author) commented Dec 3, 2016

@philips Thanks for bringing this up. This is an important concern and I totally missed it. I'll add it to the document as well, but for now this is the upgrade path -

Upgrading kubelet and proxy

The kubelet and proxy run on every node in the kubernetes cluster. Based on your setup (systemd or other), you can follow the normal upgrade steps for them. This change does not affect the kubelet and proxy upgrade steps for your setup.

Upgrading plugins

Plugins such as CNI and flex volumes can be upgraded just as you normally would. This change does not affect the plugin upgrade steps for your setup.

Upgrading kubernetes core services

The master node components (kube-controller-manager, kube-scheduler, kube-apiserver, etc.) can be upgraded just as you normally upgrade them. This change does not affect the upgrade steps for these core services in your setup.

Applying the cloud-controller-manager

This is the only new step in the upgrade process. To complete the upgrade, you need to apply the cloud-controller-manager deployment to the setup. A deployment descriptor file will be provided with this change. You apply it using

kubectl apply -f cloud-controller-manager.yml

This will start the cloud-specific controller manager in your kubernetes setup.

The downgrade steps are also the same as before for all of the components except the cloud-controller-manager. In the case of the cloud-controller-manager, the deployment should be deleted using

kubectl delete -f cloud-controller-manager.yml

As always, any comments/thoughts/improvements/suggestions are welcome :)


Finally, Node initialization needs to be addressed. This is the trickiest part. Pods will be scheduled even on uninitialized nodes. This can lead to scheduling pods on incompatible zones, and other weird errors. Therefore, an approach is needed where kubelet can create a Node, but mark it as "NotReady". Then, some asynchronous process can update it and mark it as ready. This is now possible because of the concept of ExternalAdmissionControllers introduced by @justinsb in these PRs - (https://github.com/kubernetes/kubernetes/pull/36210, https://github.com/kubernetes/kubernetes/pull/36209)

Admission controllers are meant to perform some aspects of initialization of objects. In this case, an ExternalAdmissionController will be configured to first mark the Node object as unusable by setting the NodeStatus as {"CloudProviderInitialized": "False"} (This exact keyword has not been decided yet), and then an external controller (residing in the cloud controller manager) will update it by removing the NodeStatus. In order for this to work, Nodes should be unusable until these Statuses are removed. This is possible because of the above mentioned PRs by @justinsb.
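For illustration only, here is a minimal Go sketch of what such a status marking might look like on the Node object, assuming the placeholder condition name CloudProviderInitialized mentioned above (the discussion below ultimately settles on taints rather than conditions):

```go
package nodecondition

import (
	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// markCloudProviderUninitialized records the hypothetical condition described
// above on a Node. An admission controller would add it at create time; the
// cloud controller manager would later update or remove it once the node has
// been initialized against the cloud provider.
func markCloudProviderUninitialized(node *v1.Node) {
	node.Status.Conditions = append(node.Status.Conditions, v1.NodeCondition{
		Type:               "CloudProviderInitialized", // placeholder name, not final
		Status:             v1.ConditionFalse,
		Reason:             "CloudProviderNotInitialized",
		Message:            "waiting for the cloud controller manager to initialize this node",
		LastTransitionTime: metav1.Now(),
	})
}
```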
Contributor

Flexible extension of admission control by third parties (out of tree) will be handled in proposal #132. In the meantime, an admission controller can remain in tree for this proposal.

Member Author

Thanks for sharing this @smarterclayton

Member

@liggitt didn't like the admission control approach last I heard, perhaps he can elaborate here.

@liggitt (Member) commented Dec 19, 2016

What is CloudProviderInitialized? If it's a condition, it would need to be paired with scheduler changes to recognize conditions; that could be accomplished by requiring the presence of a CloudProviderInitialized=True condition, which wouldn't need another admission plugin. If it's a taint, kubelets can already register themselves with taints, which also doesn't need an admission plugin.

@wlan0 (Member, Author) commented Jan 4, 2017

@liggitt I'm also convinced now that taints are the better approach. I will update the doc shortly. While I was drafting this proposal, taints weren't ready yet.


### 6. Deployment

This change will introduce new binaries to the list of binaries required to run kubernetes. The change will be designed such that these binaries can be installed via `kubectl apply -f` and the appropriate instances of the binaries will be running. For example, one instance of the external cloud controller manager should be run.
Member

Is kubectl apply -f required? In kube-mesos, I'm thinking of using one scheduler daemon to communicate with Mesos to reduce the number of binaries.

Member Author

@k82cn It is definitely not required. It is the recommended way, but as a power user, you are free to run it in your own way.

Member

@wlan0 , great, good to know.

@thockin (Member) left a comment

This looks great overall. The only things I'd like to see more detail on are transition plan and code/lib evolution. Something like:

release X: add a `--cloud=external` flag, disabled by default.  Users can set this and manually run a cloud controller to test or start the migration.  Announce deprecation of built-in cloud providers.

release X+1: convert all supported turnups to use cloud controller by default, still allow opt-out.  Run a monolithic cloud  controller binary with existing lib.

release X+2: still allow opt-out, but warn loudly.  Break cloud controller into N individual binaries.

release X+3: remove legacy cloud providers

and

step1) break controller-manager into 2 binaries with no other changes.  new cloud-controller takes a flag for which cloud-provider to use

step2) factor out all new controller loops from kubelet etc

step3) convert cloud controller to a lib, new main() for each cloud provider, still using cloud provider API

step4) move cloud providers out of tree

etc. Just more detail on the steps along the way - this is going to be VERY complex, and I want to think up-front about how to make a series of dozens of small, stable changes, rather than a few multi-thousand-line changes.


The volumeController uses the cloudprovider to create, delete, attach and detach volumes to nodes. For instance, the logic for provisioning, attaching, and detaching an EBS volume resides in the AWS cloudprovider. The volumeController uses this code to perform its operations.

The routeController configures routes for hosts in Google Cloud.
Member

not just google cloud, also AWS, I think?

Contributor

At least GCP, AWS and Azure implement the Routes interface in cloudprovider.

Member

OpenStack also implements it now that kubernetes/kubernetes#32663 is merged

Member

ping


Scrubbing DNS, after discussing with @thockin, was found to be redundant. So, it can be disregarded. It is being removed.

Finally, Node initialization needs to be addressed. This is the trickiest part. Pods will be scheduled even on uninitialized nodes. This can lead to scheduling pods on incompatible zones, and other weird errors. Therefore, an approach is needed where kubelet can create a Node, but mark it as "NotReady". Then, some asynchronous process can update it and mark it as ready. This is now possible because of the concept of ExternalAdmissionControllers introduced by @justinsb in these PRs - (https://github.com/kubernetes/kubernetes/pull/36210, https://github.com/kubernetes/kubernetes/pull/36209)
Member

@smarterclayton ETA on a first cut for initializers?

Member

@liggitt had some opinions on how the node readiness should work using the scheduler rather than admission control. Jordan, can you elaborate a bit here?

@davidopp (Member) commented Dec 11, 2016

I think this problem should be addressed using kubernetes/kubernetes#31647 (node registers with a taint to prevent pods from scheduling), not kubernetes/kubernetes#36209 (node creation admission controller).

Member

using taints (kubernetes/kubernetes#31647):

  • requires no changes in the scheduler (good)
  • requires no changes for a node with that taint to be explicitly tolerated by pods (good, I think)
  • requires nodes to be started with known taints (not sure how big a deal this is... how do we envision cloud providers informing the initial set of taints a kubelet should register with?)

Allowing the scheduler to require the presence of a particular node condition (kubernetes/kubernetes#37224):

  • requires a support for a condition predicate in the scheduler
  • requires changing the scheduler config file if a new condition is desired
  • doesn't require modifying kubelet startup... addition/removal of the conditions can be done by an external client

I can see scenarios where either approach would be preferable

Member

I think taints are preferable. I believe they are equally expressive as the condition approach, and the rules are more obvious (not hidden in the scheduler config). Note that addition/removal of taints can be done by an external client -- they are just like labels. (Arguably we need some ACLs to restrict this, but that's another story.) The only benefit I can see for using node conditions is that you avoid maintaining redundant node conditions and taints, and don't have to worry about setting both. But I think the benefits of taints outweigh the downsides.

@liggitt (Member) commented Dec 19, 2016

Note that addition/removal of taints can be done by an external client -- they are just like labels

Except that addition by an external client could not be done synchronously with node creation time... it would lag, allowing for the possibility of the node being briefly schedulable. Node initializers could solve that, though I'm not sure how close those are

Member

Agreed; I was just trying to point out that one of the properties you listed for the node condition approach ("doesn't require modifying kubelet startup... addition/removal of the conditions can be done by an external client") when comparing to taints, is actually a property of taints as well. I agree having kubelet apply the taints when it registers is better.

Member Author

Also, looks like the node condition PR is in a state of limbo. Unless that gets revived we only have one option - taints.

@wlan0 (Member, Author) commented Dec 9, 2016

@thockin @roberthbailey Addressed all of your comments. The doc feels like it's in a good position to get merged.

@derekwaynecarr (Member)

In the future, I would like a strategy for controlling consumption of cloud api call quota.

3. To instantiate a reference to the current node object
- Find the InstanceID, ProviderID, ExternalID, Zone Info of the node object while initializing it
- Periodically poll the cloud provider to figure out if the node has any new IP addresses associated with it
- In case of Google cloud, it allows the provider to configure routes for the node
Member

This is unclear. Kubelet doesn't have anything to do with the routes itself, it leaves all that to the route provider. What kubelet does (wrongly, see kubernetes/kubernetes#34398) is set the "NetworkUnavailable" condition whenever a cloud provider is specified. That's incorrect because the only thing that clears this condition is the route controller, and the route controller is only required if the networking setup uses direct routing between nodes for the pod network, which many plugins do not (including those with overlays).

In any case, I'd clarify to say something like "Assumes cloud providers require routes for network plugins (not always true) and blocks readiness until cloud routes are configured"

Member Author

That's right. Setting that condition is the Kubelet task. I'll rectify the doc.

@davidopp (Member)

It would be great to add a diagram to this proposal showing the "final state" of the components after this is fully implemented.

Just to make sure I understand the plan for controller manager (the main component I'm interested in): the basic idea is that the cloud-provider-specific bits will be factored into a separate binary, which will be different for each cloud provider and will run as a regular Kubernetes pod?

1. Cloud dependent controller-manager binaries
2. Cloud independent controller-manager binaries - This is the existing `controller-manager` that's shipped with kubernetes releases presently.

The cloud dependent binaries will run those loops that rely on cloudprovider as a kubernetes system service (started using `kubectl apply -f`). The rest of the controllers will be run in the cloud independent controller manager. The decision to run entire controller loops, rather than only the very minute parts that rely on the cloud provider, makes the implementation simple. If drilled down further into the controller loops, it would be significantly harder to disentangle, due to tight coupling of the cloud provider specific code with the rest of the data structures/functions within a controller. There would be a lot of duplicated code, which needs to be kept in sync, across the "dependent" loop implementations. For example, StatefulSet would not have gotten to beta in 1.5 if we had to change 7 node controller loops in various cloud provider binaries.
Contributor

please remove the "started using" parenthetical statement since that presumes a specific implementation, but in reality we may implement it differently for each cloud provider.


### 6. Deployment, Upgrades and Downgrades

This change will introduce new binaries to the list of binaries required to run kubernetes. The change will be designed such that these binaries can be installed via `kubectl apply -f` and the appropriate instances of the binaries will be running. For example, one instance of the external cloud controller manager should be run.
Contributor

"For instance, once instance of external cloud controller manager should be run." <-- please remove this statement, as it presumes a specific implementation of the external controller manager. I would rather be able to run multiple instances for higher availability than design in a SPOF.


##### 7.1 Transition plan

Release 1.6: Add a `--cloud-provider=external` flag for the controller-manager. This flag will disable the cloud specific controller loops. This flag will be disabled by default. Users can set this and manually run a cloud controller to test or start the migration. An Announce of deprecation of built-in cloud providers will also be done.
Contributor

lowercase "Announce"

Member

We can't really announce a deprecation until we have a replacement.

Member

Can we shoot for 1.6 breaking at least some of the cloud-specific control loops into a new process? It's operationally more complex but proves it can be done.

thoughts?

Member

@thockin In my understanding, that's exactly what's going on in kubernetes/kubernetes#34273

We're gonna make it possible to use the now in-tree cloudproviders using the new flow with this new, temporary binary cloud-controller-manager, so I think it is ok to announce that the long-term goal is using the external flow and therefore deprecating the current solution (while still supporting it quite a long time)

Contributor

An Announce of deprecation of built-in cloud providers will also be done.

The grammar could actually be a bit cleaner :(

@thockin (Member) left a comment

There are still a few comments unresolved here.


The volumeController uses the cloudprovider to create, delete, attach and detach volumes to nodes. For instance, the logic for provisioning, attaching, and detaching an EBS volume resides in the AWS cloudprovider. The volumeController uses this code to perform its operations.

The routeController configures routes for hosts in Google Cloud.
Member

ping




The cloud dependent binaries will run those loops that rely on cloudprovider as a kubernetes system service (started using `kubectl apply -f`). The rest of the controllers will be run in the cloud independent controller manager. The decision to run entire controller loops, rather than only the very minute parts that rely on the cloud provider, makes the implementation simple. If drilled down further into the controller loops, it would be significantly harder to disentangle, due to tight coupling of the cloud provider specific code with the rest of the data structures/functions within a controller. There would be a lot of duplicated code, which needs to be kept in sync, across the "dependent" loop implementations. For example, StatefulSet would not have gotten to beta in 1.5 if we had to change 7 node controller loops in various cloud provider binaries.

Note that the controller loop implementation will continue to reside in the core repository. It takes in cloudprovider.Interface as an input in its constructor. Vendor maintained cloud-controller-manager binary will link these controllers in. It will also initialize the cloudprovider object and pass it into the controller loops.


Does this dependency have the potential to make the external cloud-controller-manager much larger in terms of code size than it would be otherwise?

Member

I see it as our controller-manager will shrink notably in size when all cloud provider deps have moved out.

No, I think those libraries are lightweight. Anyway, some logic has to exist, so vendoring in that logic from core is absolutely the best thing to do.

@mfischer-zd commented Dec 18, 2016

Next question: what would be the communication protocol between the controllers? Too tight a binding, as opposed to some stable RPC-like protocol, could force us to rebuild the cloud-dependent controller every time kube-controller-manager is upgraded. We'd like to avoid that if possible.

Member

That won't be the case. No communication will take place between the controllers.

After all, they are controllers. The official controller-manager will modify k8s objects normally and won't take anything related to a cloud into account, while your custom-deployed cloud-controller-manager binary will only react to cloud events.

@mfischer-zd commented Dec 18, 2016

Can you point me to some documentation that explains it? I'm at the edge of my understanding now - it's becoming clear that the "plugin" model I'm thinking of is not what's being discussed here. I also tried to search online for a generic description of how K8S controllers are supposed to work, without success.

Things I'm interested in include:

  • "reacting to cloud events" - how? What's the API, if any?
  • How to ensure this custom cloud-controller-manager would be a singleton to avoid duplicate and possibly conflicting event consumption/reaction

Sorry to pollute the discussion with this meta-questioning, but I appreciate being pointed in the right direction.

Member Author

@mfischer-zd It might help to look up how the informer framework works in kubernetes. This is my favourite document about the informer framework - http://jayunit100.blogspot.com/2015/10/kubernetes-informers-controllers.html

Things I'm interested in include:

"reacting to cloud events" - how? What's the API, if any?

Reading up on the informer framework is essential to understanding this. There is no cloud API per se. This is an example flow: when a kubelet registers a node with the apiserver, the controller manager reacts to this creation by talking to the cloudprovider and fetching information about the node. The information about the creation is brought to the controller manager via the informers (see the sketch below).

How to ensure this custom cloud-controller-manager would be a singleton to avoid duplicate and possibly conflicting event consumption/reaction

First of all, multiple custom controller managers can be run simultaneously with leader election. Secondly, Kubernetes objects are idempotent by design. Two objects will not step on each other because of this.
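To make the informer-driven flow above concrete, here is a minimal client-go sketch, assuming an in-cluster config and a hypothetical lookupNodeInCloud helper; a real cloud-controller-manager would also wrap this in client-go leader election so that only one instance is active at a time:

```go
package main

import (
	"time"

	v1 "k8s.io/api/core/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/cache"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	factory := informers.NewSharedInformerFactory(client, 30*time.Second)
	nodeInformer := factory.Core().V1().Nodes().Informer()

	// React to Node creation events: this is where a cloud controller manager
	// would call out to its cloud provider and fill in addresses, zone,
	// instance ID, and so on.
	nodeInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) {
			node := obj.(*v1.Node)
			lookupNodeInCloud(node) // hypothetical provider-specific logic
		},
	})

	stop := make(chan struct{})
	factory.Start(stop)
	<-stop
}

// lookupNodeInCloud is a stand-in for vendor-specific code.
func lookupNodeInCloud(node *v1.Node) {}
```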


I also found the following, in case someone runs into this thread someday:

@wlan0 (Member, Author) commented Jan 5, 2017

ping @thockin. I've updated the doc after considering all the comments here. PTAL

@thockin (Member) left a comment

I don't want to flog this proposal too much more. Just expand and clarify the expectations through transition, and think about the partially-upgraded case, which is a huge reality in things like GKE...


The decision to run entire controller loops, rather than only the very minute parts that rely on the cloud provider, was made because it makes the implementation simple. Otherwise, the shared data structures and utility functions would have to be disentangled and carefully separated to avoid any concurrency issues. This approach, among other things, prevents code duplication and improves development velocity.

Note that the controller loop implementation will continue to reside in the core repository. It takes in cloudprovider.Interface as an input in its constructor. Vendor maintained cloud-controller-manager binary will link these controllers in. It will also initialize the cloudprovider object and pass it into the controller loops.
Member

"for now" or "for the duration of the transition" or "A reference form of" ... ? Basically, anyone who wants to write a cloud controller COULD. They don't HAVE TO use this lib, but it's a nice starting point.


Volumes need cloud providers, but they only need SPECIFIC cloud providers. The real problem is that cloud providers can take params from cloud config. The majority of the logic for volume management currently resides in the controller-manager.

There is an ongoing effort to move all of the volume logic from the controller-manager into plugins called Flex Volumes. In the Flex volumes world, all of the vendor specific code will be packaged in a separate binary as a plugin. After discussing with @thockin, this was deemed the best approach to move all cloud provider dependencies for volumes out of the kubernetes core.
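Since flex volumes come up here, a minimal sketch of the call-out model such a plugin uses may help: the driver is an executable invoked with a verb and arguments that prints a JSON status. The verbs and argument lists below are simplified; the exact contract is defined by the flex volume spec and has varied across releases:

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

type driverStatus struct {
	Status  string `json:"status"` // "Success", "Failure" or "Not supported"
	Message string `json:"message,omitempty"`
}

func reply(s driverStatus) {
	out, _ := json.Marshal(s)
	fmt.Println(string(out))
}

func main() {
	if len(os.Args) < 2 {
		reply(driverStatus{Status: "Failure", Message: "no operation given"})
		os.Exit(1)
	}
	switch os.Args[1] {
	case "init":
		// One-time driver initialization.
		reply(driverStatus{Status: "Success"})
	default:
		// Operations this sketch does not implement (attach, mount, unmount, ...).
		reply(driverStatus{Status: "Not supported"})
	}
}
```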
Member

I am a little anxious about the timeline for flex volumes, still. We may need to revisit this. Any ideas for plan B?

This is the only new step in the upgrade process. To complete the upgrade, you need to apply the cloud-controller-manager deployment to the setup. A deployment descriptor file will be provided with this change. You apply it using

```
kubectl apply -f cloud-controller-manager.yml
```
Member

What happens when we update the master and get the new controller, but the nodes have not been updated? How does the taints/initializers stuff work in the interim? Do we need an interlock like we have done before? e.g. one release has both old and new master logic and switches only when all nodes report updated kubelets, and then the subsequent release removes old logic?

Member Author

@thockin, we can prevent the scenario where the master has been updated but the nodes haven't by adding the taint logic into the master itself. If the master had a node admission controller that added these taints, then we would only need to update the master: any nodes running the old logic would have talked to the cloudprovider themselves, and the nodes with the new logic wouldn't need any new params.

There is a flip side, where there will be updated nodes but older masters that do not have the admission controller for adding taints. To fix this, we'll need the interlock approach in the cloud-controller-manager: the CCM will retroactively process nodes and update cloud-specific details in the first release, and then this logic will be removed in the subsequent release.

I thought of this approach of using admission controllers instead of setting the taint in kubelet because of an issue faced by @luxas. It was prohibitively difficult for kubeadm to update every kubelet with new params, and he suggested that we use a different approach instead. After discussing, we found that the admission controller approach is preferable.

Member

Yes, using an admission controller that taints every node before the node has initialized is preferable.
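For concreteness, a sketch of the core of what such an admission step would do to every Node at create time; the surrounding admission-plugin plumbing is omitted and the taint key is only illustrative:

```go
package nodetaint

import v1 "k8s.io/api/core/v1"

// cloudTaintKey is an illustrative key; the real key would be settled during
// implementation.
const cloudTaintKey = "cloudprovider.kubernetes.io/uninitialized"

// addCloudTaint is what a node admission controller would apply to every Node
// on create, before the scheduler ever sees it. The cloud controller manager
// removes the taint once it has initialized the node against the cloud.
func addCloudTaint(node *v1.Node) {
	for _, t := range node.Spec.Taints {
		if t.Key == cloudTaintKey {
			return // already tainted
		}
	}
	node.Spec.Taints = append(node.Spec.Taints, v1.Taint{
		Key:    cloudTaintKey,
		Value:  "true",
		Effect: v1.TaintEffectNoSchedule,
	})
}
```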


##### 7.1 Transition plan

Release 1.6: Add the first implementation of the cloud-controller-manager binary. This binary's purpose is to let users run two controller managers and address any issues that they uncover, that we might have missed. It also doubles as a reference implementation to the external cloud controller manager for the future. Since the cloud-controller-manager runs cloud specific controller loops, it is important to ensure that the kube-controller-manager does not run these loops as well. This is done by leaving the `--cloud-provider` flag unset in the kube-controller-manager.
Member

Can you flesh this out just a bit more? Specifically - the cloud-controller-manager binary is the same control loops that were in the main controller-manager, but re-factored as described here. Still uses the exact same cloudprovider libs. Yes? In other words, is it EXPECTED that everyone switches to CCM, or is it just optional and "beta" ?


Release 1.6: Add the first implementation of the cloud-controller-manager binary. This binary's purpose is to let users run two controller managers and address any issues that they uncover, that we might have missed. It also doubles as a reference implementation to the external cloud controller manager for the future. Since the cloud-controller-manager runs cloud specific controller loops, it is important to ensure that the kube-controller-manager does not run these loops as well. This is done by leaving the `--cloud-provider` flag unset in the kube-controller-manager.

Release 1.7: In this release, all of the supported turnups will be converted to use the cloud controller by default. At this point users will still be allowed to opt out. Users will have the option to run a monolithic cloud controller binary. The cloud controller will still continue to use the existing cloud provider library. A deprecation announcement will be made.
Member

Code will be factored to reduce literal duplication between CM and CCM as much as possible?


Release 1.8: The main change aimed for this release is to break up the various cloud providers into individual binaries. Users will still be allowed to opt-out. There will be a second warning to inform users about the deprecation of the `--cloud-provider` option in the controller-manager.

Release 1.9: All of the legacy cloud providers will be completely removed in this version
Member

This will be December - it sucks to plan a year ahead, but this is a huge and important changeset.

@wlan0 (Member, Author) commented Jan 18, 2017

ping @thockin. I've updated the doc. PTAL

Moving on to the kubelet, the following cloud provider dependencies exist in kubelet.

- Find the cloud nodename of the host that kubelet is running on for the following reasons :
1. To obtain the config map for the kubelet, if one already exists
Contributor

Note that we will very very likely move away from this model to one where ConfigMaps can be shared between nodes, which will obviate the ConfigMap dependency on the nodename. For more info see kubernetes/kubernetes#29459


Scrubbing DNS, after discussing with @thockin, was found to be redundant. So, it can be disregarded. It is being removed.

Finally, Node initialization needs to be addressed. This is the trickiest part. Pods will be scheduled even on uninitialized nodes. This can lead to scheduling pods on incompatible zones, and other weird errors. Therefore, an approach is needed where kubelet can create a Node, but mark it as "NotReady". Then, some asynchronous process can update it and mark it as ready. This is now possible because of the concept of Taints.
Contributor

@mml this could be a solution to the problem we talked about in our meeting on Wed. (it is possible for pods to be scheduled to nodes that aren't ready (properly configured) yet).

Member

@wlan0, @justinsb, and I discussed today and fleshed the idea out some more. I assume that will get added soon.


Finally, Node initialization needs to be addressed. This is the trickiest part. Pods will be scheduled even on uninitialized nodes. This can lead to scheduling pods on incompatible zones, and other weird errors. Therefore, an approach is needed where kubelet can create a Node, but mark it as "NotReady". Then, some asynchronous process can update it and mark it as ready. This is now possible because of the concept of Taints.

This approach requires kubelet to be started with known taints. This will make the node unschedulable until these taints are removed. The external cloud controller manager will asynchronously update the node objects and remove the taints.
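A minimal sketch of the removal side, assuming present-day client-go, the illustrative taint key from above, and a hypothetical initializeFromCloud helper: once the cloud controller manager has filled in the node's cloud-specific fields, it strips the startup taint so the node becomes schedulable.

```go
package nodeinit

import (
	"context"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

const cloudTaintKey = "cloudprovider.kubernetes.io/uninitialized" // illustrative key

// finishNodeInit fills in cloud-specific details for a node and then removes
// the startup taint so pods can be scheduled onto it.
func finishNodeInit(ctx context.Context, client kubernetes.Interface, node *v1.Node) error {
	initializeFromCloud(node) // hypothetical: set addresses, zone, provider ID, ...

	kept := node.Spec.Taints[:0]
	for _, t := range node.Spec.Taints {
		if t.Key != cloudTaintKey {
			kept = append(kept, t)
		}
	}
	node.Spec.Taints = kept

	_, err := client.CoreV1().Nodes().Update(ctx, node, metav1.UpdateOptions{})
	return err
}

// initializeFromCloud is a stand-in for vendor-specific initialization.
func initializeFromCloud(node *v1.Node) {}
```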
Contributor

How does this interact with static pods e.g. KubeProxy?

Member

AFAIK taints are not comprehended by kubelet, so static pods would start.

Member

I wouldn't count on that behavior. Any pods that want to run when a NoExecute taint is present will need a toleration.

@thockin (Member) left a comment

I'm pretty happy with this. There's always more detail that can be added, and we should revisit this as we understand each step more.

@mml to have a look, but I am LGTM on this.

@wlan0 if no new comments or changes by Thursday, let's merge it.




RouteController and serviceController are entirely cloud specific. Therefore, it is really simple to move these two controller loops out of the cloud-independent binary and into the cloud dependent binary.

NodeController does a lot more than just talk to the cloud. It does the following operations -
Member

Do you plan to break up the NodeController?

Kube-apiserver uses the cloud provider for two purposes

1. Distribute SSH Keys - This can be moved to the cloud dependent controller manager
2. Admission Controller for PV - This can be refactored using the taints approach used in Kubelet
Member

See also the admission extension proposal: #132


### 6. Deployment, Upgrades and Downgrades

This change will introduce new binaries to the list of binaries required to run kubernetes. The change will be designed such that these binaries can be installed via `kubectl apply -f` and the appropriate instances of the binaries will be running.
Member

The addon manager isn't currently required, but we need an equivalent that is.

1. To obtain the config map for the kubelet, if one already exists
2. To uniquely identify current node using nodeInformer
3. To instantiate a reference to the current node object
- Find the InstanceID, ProviderID, ExternalID, Zone Info of the node object while initializing it
Member

It's not clear how this information is going to be discovered in the proposal below. The zone and id are generally needed in order to call the cloud provider to get more information about the node.

Contributor

Isn't this information currently retrieved from the local metadata server, like EC2 and GCE do?

Member

Anything the node needs to bootstrap could be fed in as flags/config locally. Example: hostname override. There is a class of variables that need to be known to get things going and are exceedingly unlikely to change, but are also difficult to express generically.

Node setup could get the region/zone from metadata or API or config volume (openstack) or SMBIOS or wherever, and pass it to kubelet. We don't need an internal API for this.
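As a concrete illustration of the metadata route, here is a minimal Go sketch that reads the zone from the GCE metadata server; other clouds expose equivalent local services (EC2 metadata, OpenStack config drive, SMBIOS), and the value would simply be handed to kubelet via a flag or config file:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
)

// fetchZone reads this instance's zone from the GCE metadata server.
func fetchZone() (string, error) {
	req, err := http.NewRequest("GET",
		"http://metadata.google.internal/computeMetadata/v1/instance/zone", nil)
	if err != nil {
		return "", err
	}
	req.Header.Set("Metadata-Flavor", "Google") // required by the metadata server
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	// The response looks like "projects/<num>/zones/us-central1-a".
	return string(body), nil
}

func main() {
	zone, err := fetchZone()
	if err != nil {
		fmt.Println("could not read zone from metadata:", err)
		return
	}
	fmt.Println("zone:", zone)
}
```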

@errordeveloper (Member)

@wlan0 great work so far! Cannot wait for this to land, and would be keen to help. I can see there are just a few threads and @thockin is already quite happy with it overall. Please let me know if I can help.

@keontang

I have read this proposal and the PR "start breaking up controller manager into two pieces" (kubernetes/kubernetes#34273).

But I am still confused about how to use it.

```
|____cloud-controller-manager
| |____app
| | |____BUILD
| | |____controllermanager.go
| | |____options
| | | |____BUILD
| | | |____options.go
| |____BUILD
| |____controller-manager.go (main function)
```

Does it mean that for each cloudprovider, we should add the cloud-specific code into the cloud-controller-manager directory, and update the main function just like https://github.com/wlan0/kubernetes/tree/to_show/cmd/external-controller-manager?

k8s-github-robot pushed a commit to kubernetes/kubernetes that referenced this pull request Feb 23, 2017
Automatic merge from submit-queue (batch tested with PRs 41540, 41808, 41710, 41838, 41840)

kubeadm: Remove the --cloud-provider flag for beta init UX

**What this PR does / why we need it**:

We decided the `--cloud-provider` flag promises way too much compared to what it really does. There is a lot you have to do as a user in order to make the current cloud provider integrations work. And since we're promising to support the `kubeadm init` UX on a beta level in v1.6, we can't have this flag in the UX. A lot is gonna change here... see proposal: kubernetes/community#128

Once we find a cloudprovider solution we can support (probably using the new flow), we'll just add it.
For now, we'll just document how to do cloudprovider integrations by hand instead. 

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #

**Special notes for your reviewer**:

**Release note**:

```release-note
kubeadm: Remove the --cloud-provider flag for beta init UX
```
@jbeda @dmmcquay @mikedanese @roberthbailey @pires @errordeveloper
@keontang

@wlan0

@thockin (Member) commented Feb 28, 2017

@wlan0 - anything blocking this PROPOSAL from being merged, in your opinion?

@wlan0 (Member, Author) commented Feb 28, 2017

Nope

@thockin thockin merged commit 28e993f into kubernetes:master Feb 28, 2017
@errordeveloper (Member)

Really glad to see progress on this! 🥇 🎉

@thockin (Member) commented Mar 1, 2017 via email

@rrati commented Apr 7, 2017

@thockin @luxas @roberthbailey @chrislovecnm @liggitt @saad-ali @erictune @wlan0 @bgrant0607 I've got some available bandwidth and would like to help out with this effort. Is there a current status reflected somewhere?

k8s-github-robot pushed a commit to kubernetes/kubernetes that referenced this pull request Apr 8, 2017
Automatic merge from submit-queue

add rancher credential provider

This adds rancher as a credential provider in kubernetes.

@erictune This might be a good opportunity to discuss adding a provision for people to have their own credential providers that is similar to the new cloud provider changes (kubernetes/community#128). WDYT?

```release-note
Added Rancher Credential Provider to use Rancher Registry credentials when running in a Rancher cluster
```
ruebenramirez pushed a commit to ruebenramirez/community that referenced this pull request Apr 22, 2017
calebamiles added a commit to calebamiles/community that referenced this pull request Jan 10, 2018
This KEP is a rewrite of an existing [design proposal][] which had been
given prior [approval][]. Having done a bit of due diligence, mark
@thockin as the approver. It is expected that WG Cloud Provider will
find additional approvers once PRs which implement this KEP begin
landing.

[design proposal]: https://github.com/kubernetes/community/blob/master/contributors/design-proposals/cloud-provider/cloud-provider-refactoring.md
[approval]: kubernetes#128 (review)
justaugustus pushed a commit to justaugustus/enhancements that referenced this pull request Sep 3, 2018
MadhavJivrajani pushed a commit to MadhavJivrajani/community that referenced this pull request Nov 30, 2021
rmohr pushed a commit to rmohr/community that referenced this pull request May 4, 2022
Signed-off-by: Fabian Deutsch <fdeutsch@redhat.com>
danehans pushed a commit to danehans/community that referenced this pull request Jul 18, 2023