Proposal for refactoring cloud provider API #128
Conversation
@wlan0 I would love to have language on our expected upgrade/downgrade path as that is becoming a common theme as Kubernetes matures.
@philips Thanks for bringing this up. This is an important concern and I totally missed it. I'll add it to the document as well, but for now this is the upgrade path:

**Upgrading kubelet and proxy** — The kubelet and proxy run on every node in the kubernetes cluster. Based on your setup (systemd/other), you can follow the normal upgrade steps for them. This change does not affect the kubelet and proxy upgrade steps for your setup.

**Upgrading plugins** — Plugins such as CNI and flex volumes can be upgraded just as you normally upgrade them. This change does not affect the plugin upgrade steps for your setup.

**Upgrading kubernetes core services** — The master node components (kube-controller-manager, kube-scheduler, kube-apiserver, etc.) can be upgraded just as you normally upgrade them. This change does not affect the upgrade steps for these components.

**Applying the cloud-controller-manager** — This is the only step that is different in the upgrade process. In order to complete the upgrade process, you need to apply the cloud-controller-manager deployment to the setup. A deployment descriptor file will be provided with this change. You need to apply this change using
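```
kubectl apply -f cloud-controller-manager.yml
```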
This will start the cloud specific controller manager in your kubernetes setup. The downgrade steps are also the same as before for all the components except the cloud-controller-manager. In case of the cloud-controller-manager, the deployment should be deleted using
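```
# assuming the same deployment descriptor used for the apply step above
kubectl delete -f cloud-controller-manager.yml
```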
As always, any comments/thoughts/improvements/suggestions are welcome :)
Finally, Node initialization needs to be addressed. This is the trickiest part. Pods will be scheduled even on uninitialized nodes. This can lead to scheduling pods on incompatible zones, and other weird errors. Therefore, an approach is needed where kubelet can create a Node, but mark it as "NotReady". Then, some asynchronous process can update it and mark it as ready. This is now possible because of the concept of ExternalAdmissionControllers introduced by @justinsb in these PRs - (https://github.com/kubernetes/kubernetes/pull/36210, https://github.com/kubernetes/kubernetes/pull/36209)
Admission controllers are meant to perform some aspects of initialization of objects. In this case, an ExternalAdmissionController will be configured to first mark the Node object as unusable by setting the NodeStatus as {"CloudProviderInitialized": "False"} (This exact keyword has not been decided yet), and then an external controller (residing in the cloud controller manager) will update it by removing the NodeStatus. In order for this to work, Nodes should be unusable until these Statuses are removed. This is possible because of the above mentioned PRs by @justinsb.
Flexible extension of admission control by third parties (out of tree) will be handled in proposal #132. In the meantime, an admission controller can remain in tree for this proposal.
Thanks for sharing this @smarterclayton
@liggitt didn't like the admission control approach last I heard, perhaps he can elaborate here.
what is CloudProviderInitialized? if it's a condition, it would need to be paired with scheduler changes to recognize conditions, and that could be accomplished by requiring the presence of a CloudProviderInitialized=True condition, which wouldn't need another admission plugin. If it's a taint, kubelets can register themselves with taints already, which also doesn't need an admission plugin
@liggitt I'm also convinced now that taints are the better approach. Will update the doc shortly. While I was drafting this proposal, taints weren't ready yet.
### 6. Deployment
This change will introduce new binaries to the list of binaries required to run kubernetes. The change will be designed such that these binaries can be installed via `kubectl apply -f` and the appropriate instances of the binaries will be running. For instance, one instance of the external cloud controller manager should be run.
Is `kubectl apply -f` required? In kube-mesos, I'm thinking of using one scheduler daemon to communicate with Mesos to reduce the number of binaries.
@k82cn It is definitely not required. It is the recommended way, but as a power user, you are free to run it in your own way.
@wlan0 , great, good to know.
This looks great overall. The only things I'd like to see more detail on are transition plan and code/lib evolution. Something like:
release X: add a `--cloud=external` flag, disabled by default. Users can set this and manually run a cloud controller to test or start the migration. Announce deprecation of built-in cloud providers.
release X+1: convert all supported turnups to use cloud controller by default, still allow opt-out. Run a monolithic cloud controller binary with existing lib.
release X+2: still allow opt-out, but warn loudly. Break cloud controller into N individual binaries.
release X+3: remove legacy cloud providers
and
step1) break controller-manager into 2 binaries with no other changes. new cloud-controller takes a flag for which cloud-provider to use
step2) factor out all new controller loops from kubelet etc
step3) convert cloud controller to a lib, new main() for each cloud provider, still using cloud provider API
step4) move cloud providers out of tree
etc. Just more detail on the steps along the way - this is going to be VERY complex, and I want to think up-front about how to make a series of dozens of small, stable changes, rather than a few multi-thousand-line changes.
The volumeController uses the cloudprovider to create, delete, attach and detach volumes to nodes. For instance, the logic for provisioning, attaching, and detaching an EBS volume resides in the AWS cloudprovider. The volumeController uses this code to perform its operations.
The routeController configures routes for hosts in Google Cloud.
not just google cloud, also AWS, I think?
At least GCP, AWS and Azure implement the `Routes` interface in `cloudprovider`.
OpenStack also implements it now that kubernetes/kubernetes#32663 is merged
ping
Scrubbing DNS, after discussing with @thockin, was found to be redundant. So, it can be disregarded. It is being removed.
Finally, Node initialization needs to be addressed. This is the trickiest part. Pods will be scheduled even on uninitialized nodes. This can lead to scheduling pods on incompatible zones, and other weird errors. Therefore, an approach is needed where kubelet can create a Node, but mark it as "NotReady". Then, some asynchronous process can update it and mark it as ready. This is now possible because of the concept of ExternalAdmissionControllers introduced by @justinsb in these PRs - (https://github.com/kubernetes/kubernetes/pull/36210, https://github.com/kubernetes/kubernetes/pull/36209)
@smarterclayton ETA on a first cut for initializers?
@liggitt had some opinions on how the node readiness should work using the scheduler rather than admission control. Jordan, can you elaborate a bit here?
I think this problem should be addressed using kubernetes/kubernetes#31647 (node registers with a taint to prevent pods from scheduling), not kubernetes/kubernetes#36209 (node creation admission controller).
using taints (kubernetes/kubernetes#31647):
- requires no changes in the scheduler (good)
- requires no changes for a node with that taint to be explicitly tolerated by pods (good, I think)
- requires nodes to be started with known taints (not sure how big a deal this is... how do we envision cloud providers informing the initial set of taints a kubelet should register with?)
Allowing the scheduler to require the presence of a particular node condition (kubernetes/kubernetes#37224):
- requires support for a condition predicate in the scheduler
- requires changing the scheduler config file if a new condition is desired
- doesn't require modifying kubelet startup... addition/removal of the conditions can be done by an external client
I can see scenarios where either approach would be preferable
I think taints are preferable. I believe they are equally expressive as the condition approach, and the rules are more obvious (not hidden in the scheduler config). Note that addition/removal of taints can be done by an external client -- they are just like labels. (Arguably we need some ACLs to restrict this, but that's another story). The only benefit I can see for using node conditions is that you avoid redundant node conditions and taints, so you don't have to worry about setting both. But I think the benefits of taints outweigh this downside.
Note that addition/removal of taints can be done by an external client -- they are just like labels
Except that addition by an external client could not be done synchronously with node creation time... it would lag, allowing for the possibility of the node being briefly schedulable. Node initializers could solve that, though I'm not sure how close those are
Agreed; I was just trying to point out that one of the properties you listed for the node condition approach ("doesn't require modifying kubelet startup... addition/removal of the conditions can be done by an external client") when comparing to taints, is actually a property of taints as well. I agree having kubelet apply the taints when it registers is better.
Also, looks like the node condition PR is in a state of limbo. Unless that gets revived we only have one option - taints.
@thockin @roberthbailey Addressed all of your comments. The doc feels like it's in a good position to get merged.
In the future, I would like a strategy for controlling consumption of cloud API call quota.
3. To instantiate a reference to the current node object
- Find the InstanceID, ProviderID, ExternalID, Zone Info of the node object while initializing it
- Periodically poll the cloud provider to figure out if the node has any new IP addresses associated with it
- In case of Google cloud, it allows the provider to configure routes for the node
This is unclear. Kubelet doesn't have anything to do with the routes itself, it leaves all that to the route provider. What kubelet does (wrongly, see kubernetes/kubernetes#34398) is set the "NetworkUnavailable" condition whenever a cloud provider is specified. That's incorrect because the only thing that clears this condition is the route controller, and the route controller is only required if the networking setup uses direct routing between nodes for the pod network, which many plugins do not (including those with overlays).
In any case, I'd clarify to say something like "Assumes cloud providers require routes for network plugins (not always true) and blocks readiness until cloud routes are configured"
That's right. Setting that condition is the kubelet's task. I'll rectify the doc.
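For concreteness, here is a short sketch (not the actual route controller code) of what clearing the `NetworkUnavailable` condition looks like against the API; the reason and message strings are illustrative:

```go
// Sketch (not the actual route controller code) of clearing the
// NetworkUnavailable condition once routes exist; the reason/message
// strings are illustrative.
package sketch

import (
	"context"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

func markRoutesCreated(ctx context.Context, client kubernetes.Interface, nodeName string) error {
	node, err := client.CoreV1().Nodes().Get(ctx, nodeName, metav1.GetOptions{})
	if err != nil {
		return err
	}
	cond := v1.NodeCondition{
		Type:               v1.NodeNetworkUnavailable,
		Status:             v1.ConditionFalse, // ConditionFalse means the network IS available
		Reason:             "RouteCreated",
		Message:            "cloud route controller configured a route for this node",
		LastTransitionTime: metav1.Now(),
	}
	// Replace an existing NetworkUnavailable condition or append a new one.
	replaced := false
	for i, c := range node.Status.Conditions {
		if c.Type == v1.NodeNetworkUnavailable {
			node.Status.Conditions[i] = cond
			replaced = true
		}
	}
	if !replaced {
		node.Status.Conditions = append(node.Status.Conditions, cond)
	}
	_, err = client.CoreV1().Nodes().UpdateStatus(ctx, node, metav1.UpdateOptions{})
	return err
}
```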
It would be great to add a diagram to this proposal showing the "final state" of the components after this is fully implemented. Just to make sure I understand the plan for controller manager (the main component I'm interested in): the basic idea is that the cloud-provider-specific bits will be factored into a separate binary, which will be different for each cloud provider and will run as a regular Kubernetes pod?
1. Cloud dependent controller-manager binaries
2. Cloud independent controller-manager binaries - This is the existing `controller-manager` that's shipped with kubernetes releases presently.
The cloud dependent binaries will run those loops that rely on the cloudprovider as a kubernetes system service (started using `kubectl apply -f`). The rest of the controllers will be run in the cloud independent controller manager. The decision to run entire controller loops, rather than only the very minute parts that rely on the cloud provider, makes the implementation simple. Drilling down further into the controller loops would make them significantly harder to disentangle, due to the tight coupling of the cloud provider specific code with the rest of the data structures/functions within a controller. There would be a lot of duplicated code, which would need to be kept in sync across the "dependent" loop implementations. For example, StatefulSet would not have gotten to beta in 1.5 if we had to change 7 node controller loops in various cloud provider binaries.
please remove the "started using" parenthetical statement since that presumes a specific implementation, but in reality we may implement it differently for each cloud provider.
### 6. Deployment, Upgrades and Downgrades
This change will introduce new binaries to the list of binaries required to run kubernetes. The change will be designed such that these binaries can be installed via `kubectl apply -f` and the appropriate instances of the binaries will be running. For instance, one instance of the external cloud controller manager should be run.
"For instance, once instance of external cloud controller manager should be run." <-- please remove this statement, as it presumes a specific implementation of the external controller manager. I would rather be able to run multiple instances for higher availability than design in a SPOF.
##### 7.1 Transition plan
Release 1.6: Add a `--cloud-provider=external` flag for the controller-manager. This flag will disable the cloud specific controller loops. This flag will be disabled by default. Users can set this and manually run a cloud controller to test or start the migration. An Announce of deprecation of built-in cloud providers will also be done.
lowercase "Announce"
We can't really announce a deprecation until we have a replacement.
Can we shoot for 1.6 breaking at least some of the cloud-specific control loops into a new process? It's operationally more complex but proves it can be done.
thoughts?
@thockin In my understanding, that's exactly what's going on in kubernetes/kubernetes#34273
We're gonna make it possible to use the now in-tree cloudproviders using the new flow with this new, temporary binary cloud-controller-manager, so I think it is ok to announce that the long-term goal is using the external flow and therefore deprecating the current solution (while still supporting it quite a long time)
An Announce of deprecation of built-in cloud providers will also be done.
The grammar could actually be a bit cleaner :(
There are still a few comments unresolved here.
The volumeController uses the cloudprovider to create, delete, attach and detach volumes to nodes. For instance, the logic for provisioning, attaching, and detaching an EBS volume resides in the AWS cloudprovider. The volumeController uses this code to perform its operations.
The routeController configures routes for hosts in Google Cloud.
ping
##### 7.1 Transition plan
Release 1.6: Add a `--cloud-provider=external` flag for the controller-manager. This flag will disable the cloud specific controller loops. This flag will be disabled by default. Users can set this and manually run a cloud controller to test or start the migration. An Announce of deprecation of built-in cloud providers will also be done.
We can't really announce a deprecation until we have a replacement.
##### 7.1 Transition plan
Release 1.6: Add a `--cloud-provider=external` flag for the controller-manager. This flag will disable the cloud specific controller loops. This flag will be disabled by default. Users can set this and manually run a cloud controller to test or start the migration. An Announce of deprecation of built-in cloud providers will also be done.
Can we shoot for 1.6 breaking at least some of the cloud-specific control loops into a new process? It's operationally more complex but proves it can be done.
thoughts?
The cloud dependent binaries will run those loops that rely on the cloudprovider as a kubernetes system service (started using `kubectl apply -f`). The rest of the controllers will be run in the cloud independent controller manager. The decision to run entire controller loops, rather than only the very minute parts that rely on the cloud provider, makes the implementation simple. Drilling down further into the controller loops would make them significantly harder to disentangle, due to the tight coupling of the cloud provider specific code with the rest of the data structures/functions within a controller. There would be a lot of duplicated code, which would need to be kept in sync across the "dependent" loop implementations. For example, StatefulSet would not have gotten to beta in 1.5 if we had to change 7 node controller loops in various cloud provider binaries.
Note that the controller loop implementation will continue to reside in the core repository. It takes in cloudprovider.Interface as an input in its constructor. The vendor-maintained cloud-controller-manager binary will link these controllers in. It will also initialize the cloudprovider object and pass it into the controller loops.
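To make the quoted wiring concrete, here is a minimal, self-contained Go sketch of the pattern it describes; the interface, controller, and names below are illustrative stand-ins, not the real `cloudprovider.Interface` or the actual controller code.

```go
// Self-contained sketch of the wiring described above: the controller loop is
// written against an interface (a stand-in for cloudprovider.Interface here),
// and the vendor-maintained binary constructs its provider and passes it in.
// All names are illustrative, not the real Kubernetes APIs.
package main

import (
	"fmt"
	"time"
)

// CloudProvider stands in for the relevant slice of cloudprovider.Interface.
type CloudProvider interface {
	InstanceID(nodeName string) (string, error)
}

// CloudNodeController is a controller loop that would live in the core
// repository and only knows about the interface, never a concrete cloud.
type CloudNodeController struct {
	cloud CloudProvider
}

func NewCloudNodeController(cloud CloudProvider) *CloudNodeController {
	return &CloudNodeController{cloud: cloud}
}

func (c *CloudNodeController) Run(stop <-chan struct{}) {
	ticker := time.NewTicker(10 * time.Second)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			id, err := c.cloud.InstanceID("node-1") // hypothetical node name
			if err != nil {
				continue
			}
			fmt.Println("reconciled node with instance ID", id)
		case <-stop:
			return
		}
	}
}

// exampleCloud plays the role of the vendor's cloudprovider implementation.
type exampleCloud struct{}

func (exampleCloud) InstanceID(nodeName string) (string, error) { return "i-" + nodeName, nil }

// main is what a vendor-maintained cloud-controller-manager would own: it
// initializes the cloud object and passes it into the linked-in controllers.
func main() {
	stop := make(chan struct{})
	NewCloudNodeController(exampleCloud{}).Run(stop)
}
```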
Does this dependency have the potential to make the external cloud-controller-manager much larger in terms of code size than it would be otherwise?
I see it as our controller-manager will shrink notably in size when all cloud provider deps have moved out.
No, I think those libraries are lightweight. Anyway, some logic has to exist, so vendoring in that logic from core is absolutely the best thing to do
Next question: what would be the communication protocol between the controllers? Too tight a binding, as opposed to some stable RPC-like protocol, could force us to rebuild the cloud-dependent controller every time kube-controller-manager is upgraded. We'd like to avoid that if possible.
That won't be the case. No communication will take place between the controllers.
After all, they are controllers. The official controller-manager will modify k8s objects normally and won't take anything related to a cloud into account, while your custom-deployed cloud-controller-manager binary will only react to cloud events
Can you point me to some documentation that explains it? I'm at the edge of my understanding now - it's becoming clear that the "plugin" model I'm thinking of is not what's being discussed here. I also tried to search online for a generic description of how K8S controllers are supposed to work, without success.
Things I'm interested in include:
- "reacting to cloud events" - how? What's the API, if any?
- How to ensure this custom cloud-controller-manager would be a singleton to avoid duplicate and possibly conflicting event consumption/reaction
Sorry to pollute the discussion with this meta-questioning, but I appreciate being pointed in the right direction.
@mfischer-zd It might help to look up how the informer framework works in kubernetes. This is my favourite document about the informer framework - http://jayunit100.blogspot.com/2015/10/kubernetes-informers-controllers.html
Things I'm interested in include:
"reacting to cloud events" - how? What's the API, if any?
Reading up on the informer framework is essential to understanding this. There is no cloud API per se. This is an example flow - when a kubelet registers a node with the apiserver, the controller manager reacts to this creation by talking to the cloudprovider and fetching information about the node. The information about the creation is brought to the controller manager via the informers.
How to ensure this custom cloud-controller-manager would be a singleton to avoid duplicate and possibly conflicting event consumption/reaction
First of all, multiple custom controller managers can run simultaneously with leader election. Secondly, Kubernetes objects are idempotent by design, so two controllers will not step on each other.
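For illustration, a minimal sketch of that flow using client-go's informer machinery (modern package layout; the APIs available at the time of this proposal differed slightly, and the kubeconfig path is just an example):

```go
// Minimal sketch of the informer-based flow described above. The kubeconfig
// path is an example; a real controller manager also handles signals,
// leader election, and work queues.
package main

import (
	"fmt"
	"time"

	v1 "k8s.io/api/core/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", "/etc/kubernetes/cloud-controller-manager.kubeconfig")
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)

	factory := informers.NewSharedInformerFactory(client, 30*time.Second)
	nodeInformer := factory.Core().V1().Nodes().Informer()

	// React to node registration events delivered by the informer; this is
	// where a cloud controller would call the cloudprovider to fetch the
	// instance ID, zone, addresses, etc. for the new node.
	nodeInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) {
			node := obj.(*v1.Node)
			fmt.Printf("node %s registered; look up cloud metadata here\n", node.Name)
		},
	})

	stop := make(chan struct{})
	factory.Start(stop)
	select {}
}
```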
I also found the following, in case someone runs into this thread someday:
ping @thockin. I've updated the doc after considering all the comments here. PTAL
I don't want to flog this proposal too much more. Just expand and clarify the expectations through transition, and think about the partially-upgraded case, which is a huge reality in things like GKE...
The decision to run entire controller loops, rather than only the very minute parts that rely on the cloud provider, was made because it keeps the implementation simple. Otherwise, the shared data structures and utility functions would have to be disentangled and carefully separated to avoid any concurrency issues. This approach, among other things, prevents code duplication and improves development velocity.
Note that the controller loop implementation will continue to reside in the core repository. It takes in cloudprovider.Interface as an input in its constructor. The vendor-maintained cloud-controller-manager binary will link these controllers in. It will also initialize the cloudprovider object and pass it into the controller loops.
"for now" or "for the duration of the transition" or "A reference form of" ... ? Basically, anyone who wants to write a cloud controller COULD. They don't HAVE TO use this lib, but it's a nice starting point.
Volumes need cloud providers, but they only need SPECIFIC cloud providers. The real problem is that cloud providers can take params from cloud config. The majority of the logic for volume management currently resides in the controller-manager.
There is an ongoing effort to move all of the volume logic from the controller-manager into plugins called Flex Volumes. In the Flex Volumes world, all of the vendor specific code will be packaged in a separate binary as a plugin. After discussion with @thockin, this was decided to be the best approach to remove all cloud provider dependencies for volumes from kubernetes core.
I am a little anxious about the timeline for flex volumes, still. We may need to revisit this. Any ideas for plan B?
This is the only step that is different in the upgrade process. In order to complete the upgrade process, you need to apply the cloud-controller-manager deployment to the setup. A deployment descriptor file will be provided with this change. You need to apply this change using
```
kubectl apply -f cloud-controller-manager.yml
```
What happens when we update the master and get the new controller, but the nodes have not been updated? How does the taints/initializers stuff work in the interim? Do we need an interlock like we have done before? e.g. one release has both old and new master logic and switches only when all nodes report updated kubelets, and then the subsequent release removes old logic?
@thockin, we can prevent this scenario where we've updated the master but the nodes haven't been updated by adding the taint logic into the master itself. If the master had a node admission controller that added these taints, then we'd only need to update the master; any nodes with the old logic would have talked to the cloudprovider themselves, and the nodes with the new logic wouldn't need to add any new params.
There is a flip side, where there will be updated nodes, but older masters will not have the admission controller for adding taints. To fix this, we'll need the interlock approach in the cloud-controller-manager. The CCM will retroactively process nodes and update cloud specific details in the first release, and then this logic will be removed in the subsequent release.
I thought of this approach of using admission controllers instead of setting the taint in kubelet because of an issue faced by @luxas. It was restrictively difficult for kubeadm to update every kubelet with new params, and he suggested that we use a different approach instead. After discussing, we found that the admission controller approach is preferable.
Yes, using an admission controller that taints every node before the node has initialized is preferable.
##### 7.1 Transition plan
Release 1.6: Add the first implementation of the cloud-controller-manager binary. This binary's purpose is to let users run two controller managers and surface any issues that they uncover that we might have missed. It also doubles as a reference implementation of the external cloud controller manager for the future. Since the cloud-controller-manager runs cloud specific controller loops, it is important to ensure that the kube-controller-manager does not run these loops as well. This is done by leaving the `--cloud-provider` flag unset in the kube-controller-manager.
Can you flesh this out just a bit more? Specifically - the cloud-controller-manager binary is the same control loops that were in the main controller-manager, but re-factored as described here. Still uses the exact same cloudprovider libs. Yes? In other words, is it EXPECTED that everyone switches to CCM, or is it just optional and "beta" ?
Release 1.6: Add the first implementation of the cloud-controller-manager binary. This binary's purpose is to let users run two controller managers and surface any issues that they uncover that we might have missed. It also doubles as a reference implementation of the external cloud controller manager for the future. Since the cloud-controller-manager runs cloud specific controller loops, it is important to ensure that the kube-controller-manager does not run these loops as well. This is done by leaving the `--cloud-provider` flag unset in the kube-controller-manager.
Release 1.7: In this release, all of the supported turnups will be converted to use the cloud controller by default. At this point users will still be allowed to opt out. Users will have the option to run a monolithic cloud controller binary. The cloud controller will still continue to use the existing library. A deprecation announcement will be made.
Code will be factored to reduce literal duplication between CM and CCM as much as possible?
Release 1.8: The main change aimed for this release is to break up the various cloud providers into individual binaries. Users will still be allowed to opt out. There will be a second warning to inform users about the deprecation of the `--cloud-provider` option in the controller-manager.
Release 1.9: All of the legacy cloud providers will be completely removed in this version.
This will be December - it sucks to plan a year ahead, but this is a huge and important changeset.
ping @thockin. I've updated the doc. PTAL
Moving on to the kubelet, the following cloud provider dependencies exist in kubelet.
- Find the cloud nodename of the host that kubelet is running on for the following reasons:
1. To obtain the config map for the kubelet, if one already exists
Note that we will very very likely move away from this model to one where ConfigMaps can be shared between nodes, which will obviate the ConfigMap dependency on the nodename. For more info see kubernetes/kubernetes#29459
Scrubbing DNS, after discussing with @thockin, was found to be redundant. So, it can be disregarded. It is being removed.
Finally, Node initialization needs to be addressed. This is the trickiest part. Pods will be scheduled even on uninitialized nodes. This can lead to scheduling pods on incompatible zones, and other weird errors. Therefore, an approach is needed where kubelet can create a Node, but mark it as "NotReady". Then, some asynchronous process can update it and mark it as ready. This is now possible because of the concept of Taints.
@mml this could be a solution to the problem we talked about in our meeting on Wed. (possible for pods to be scheduled to nodes that aren't ready (properly configured) yet).
Finally, Node initialization needs to be addressed. This is the trickiest part. Pods will be scheduled even on uninitialized nodes. This can lead to scheduling pods on incompatible zones, and other weird errors. Therefore, an approach is needed where kubelet can create a Node, but mark it as "NotReady". Then, some asynchronous process can update it and mark it as ready. This is now possible because of the concept of Taints.
This approach requires kubelet to be started with known taints. This will make the node unschedulable until these taints are removed. The external cloud controller manager will asynchronously update the node objects and remove the taints.
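As a rough sketch of the cloud-controller-manager side of this (the exact taint key had not been decided at this point, so the key below is hypothetical), removing the startup taint once the node is initialized could look like:

```go
// Sketch of the cloud-controller-manager side: strip the startup taint once
// the node has been initialized. The taint key is hypothetical; the real key
// had not been decided at the time of this discussion.
package sketch

import (
	"context"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

const uninitializedTaintKey = "cloudprovider.kubernetes.io/uninitialized" // hypothetical

func removeStartupTaint(ctx context.Context, client kubernetes.Interface, nodeName string) error {
	node, err := client.CoreV1().Nodes().Get(ctx, nodeName, metav1.GetOptions{})
	if err != nil {
		return err
	}
	kept := make([]v1.Taint, 0, len(node.Spec.Taints))
	for _, t := range node.Spec.Taints {
		if t.Key != uninitializedTaintKey {
			kept = append(kept, t)
		}
	}
	node.Spec.Taints = kept
	// A real controller would patch and retry on conflicts.
	_, err = client.CoreV1().Nodes().Update(ctx, node, metav1.UpdateOptions{})
	return err
}
```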
How does this interact with static pods e.g. KubeProxy?
AFAIK taints are not comprehended by kubelet, so static pods would start.
I wouldn't count on that behavior. Any pods that want to run when a NoExecute taint is present will need a toleration.
Finally, Node initialization needs to be addressed. This is the trickiest part. Pods will be scheduled even on uninitialized nodes. This can lead to scheduling pods on incompatible zones, and other weird errors. Therefore, an approach is needed where kubelet can create a Node, but mark it as "NotReady". Then, some asynchronous process can update it and mark it as ready. This is now possible because of the concept of Taints.
This approach requires kubelet to be started with known taints. This will make the node unschedulable until these taints are removed. The external cloud controller manager will asynchronously update the node objects and remove the taints.
AFAIK taints are not comprehended by kubelet, so static pods would start.
Scrubbing DNS, after discussing with @thockin, was found to be redundant. So, it can be disregarded. It is being removed.
Finally, Node initialization needs to be addressed. This is the trickiest part. Pods will be scheduled even on uninitialized nodes. This can lead to scheduling pods on incompatible zones, and other weird errors. Therefore, an approach is needed where kubelet can create a Node, but mark it as "NotReady". Then, some asynchronous process can update it and mark it as ready. This is now possible because of the concept of Taints.
RouteController and serviceController are entirely cloud specific. Therefore, it is really simple to move these two controller loops out of the cloud-independent binary and into the cloud dependent binary.
NodeController does a lot more than just talk to the cloud. It does the following operations:
Do you plan to break up the NodeController?
Kube-apiserver uses the cloud provider for two purposes:
1. Distribute SSH Keys - This can be moved to the cloud dependent controller manager
2. Admission Controller for PV - This can be refactored using the taints approach used in Kubelet
See also the admission extension proposal: #132
### 6. Deployment, Upgrades and Downgrades
This change will introduce new binaries to the list of binaries required to run kubernetes. The change will be designed such that these binaries can be installed via `kubectl apply -f` and the appropriate instances of the binaries will be running.
The addon manager isn't currently required, but we need an equivalent that is.
1. To obtain the config map for the kubelet, if one already exists
2. To uniquely identify the current node using nodeInformer
3. To instantiate a reference to the current node object
- Find the InstanceID, ProviderID, ExternalID, Zone Info of the node object while initializing it
It's not clear how this information is going to be discovered in the proposal below. The zone and id are generally needed in order to call the cloud provider to get more information about the node.
Anything the node needs to bootstrap could be fed in as flags/config locally. Example: hostname override. There is a class of variables that need to be known to get things going and are exceedingly unlikely to change, but are also difficult to express generically.
Node setup could get the region/zone from metadata or API or config volume (openstack) or SMBIOS or wherever, and pass it to kubelet. We don't need an internal API for this.
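A small illustration of that suggestion, assuming GCE's metadata service: bootstrap tooling (not kubelet itself) reads the zone and would pass it to kubelet via a flag or config file; the kubelet wiring is left to the deployment tooling.

```go
// Illustration of reading the zone from GCE's metadata service during node
// bootstrap; a provisioning script would hand the result to kubelet as a
// flag or config value.
package main

import (
	"fmt"
	"io"
	"net/http"
)

func gceZone() (string, error) {
	req, err := http.NewRequest("GET",
		"http://metadata.google.internal/computeMetadata/v1/instance/zone", nil)
	if err != nil {
		return "", err
	}
	req.Header.Set("Metadata-Flavor", "Google")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return string(body), nil // e.g. "projects/<project-number>/zones/us-central1-a"
}

func main() {
	zone, err := gceZone()
	if err != nil {
		fmt.Println("could not read zone from metadata:", err)
		return
	}
	fmt.Println("zone:", zone) // a bootstrap script would pass this to kubelet
}
```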
I have read this proposal and the PR "start breaking up controller manager into two pieces" kubernetes/kubernetes#34273, but I am still confused about how to use it.
Does it mean that for each cloudprovider, we should add the cloud specific code into the cloud-controller-manager directory, and update the main function just like https://github.com/wlan0/kubernetes/tree/to_show/cmd/external-controller-manager?
Automatic merge from submit-queue (batch tested with PRs 41540, 41808, 41710, 41838, 41840)

kubeadm: Remove the `--cloud-provider` flag for beta init UX

**What this PR does / why we need it**: We decided the `--cloud-provider` flag promises way too much compared to what it really does. There is a lot you have to do as a user in order to make the current cloud provider integrations work. And since we're promising to support the `kubeadm init` UX at a beta level in v1.6, we can't have this flag in the UX. A lot is going to change here... see proposal: kubernetes/community#128. Once we find a cloudprovider solution we can support (probably using the new flow), we'll just add it. For now, we'll just document how to do cloudprovider integrations by hand instead.

**Release note**:

```release-note
kubeadm: Remove the --cloud-provider flag for beta init UX
```

@jbeda @dmmcquay @mikedanese @roberthbailey @pires @errordeveloper
@wlan0 - anything blocking this PROPOSAL from being merged, in your opinion?
Nope
Really glad to see progress on this! 🥇 🎉
Yeah, I nominate this overall changeset as yak-shave of the year :)
@thockin @luxas @roberthbailey @chrislovecnm @liggitt @saad-ali @erictune @wlan0 @bgrant0607 I've got some available bandwidth and would like to help out with this effort. Is there a current status reflected somewhere?
Automatic merge from submit-queue

add rancher credential provider

This adds rancher as a credential provider in kubernetes. @erictune This might be a good opportunity to discuss adding a provision for people to have their own credential providers, similar to the new cloud provider changes (kubernetes/community#128). WDYT?

```release-note
Added Rancher Credential Provider to use Rancher Registry credentials when running in a Rancher cluster
```
This KEP is a rewrite of an existing [design proposal][] which had been given prior [approval][]. Having done a bit of due diligence, mark @thockin as the approver. It is expected that WG Cloud Provider will find additional approvers once PRs which implement this KEP begin landing.

[design proposal]: https://github.com/kubernetes/community/blob/master/contributors/design-proposals/cloud-provider/cloud-provider-refactoring.md
[approval]: kubernetes/community#128 (review)
@thockin @luxas @roberthbailey @chrislovecnm @liggitt @saad-ali @erictune
This PR is a copy of the rectified contents of this PR - kubernetes/kubernetes#37037
PTAL