Documentation: Basic internals docs #542
[APPROVALNOTIFIER] This PR is NOT APPROVED.
This pull-request has been approved by: phyber.
This has helped us internally get started with Cluster API, hence PRing here. /lgtm
/ok-to-test
This is a good start to a hard problem. We've discussed the need for documentation of this sort for a while now, and probably should again, especially now that there are more mature providers, in particular, cluster-api-provider-aws.
Some ideas for moving this forward are to 0) start with the original design, and/or 1) submit an outline so that others can fill in the details. It's unclear if the latter idea will work since writing by committee doesn't lend itself to great (or even tolerable) prose. I'd like to see if I can help here. I've been out of pocket for the last few weeks but should be mostly free for the rest of the month.
docs/internals/controllers.md
Outdated
is that the machine controller has to take into account that a resource in
a request may not exist, as such, it must account for things like having to
create them to begin with.
The cluster controller has the privilege of knowing that the cluster already
This may not be true. There are providers for which the `Cluster` and `Machine` objects exist in a cluster which is different from the cluster they represent (cf. Gardener, cluster-api-provider-ssh, etc.)
Some differences between the cluster and machine controllers are that 0) they manage different cluster-api resources, 1) `Cluster` resources may contain configuration and status which are shared between `Machine` resources, etc.
I was wondering that while writing this. I've been working in the AWS provider, so the above made sense there. I'll try to reword this based on what you've said here.
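For readers following this thread, the point in the quoted hunk (that the machine controller cannot assume the resource behind a request already exists) can be sketched roughly as below. This is only an illustration: `Machine`, `Actuator`, `fakeActuator`, and `reconcileMachine` are hypothetical, simplified names, and the real cluster-api actuator methods take more arguments than shown here.

```go
package main

import "fmt"

// Machine is a hypothetical stand-in for the cluster-api Machine resource.
type Machine struct {
	Name string
}

// Actuator is an assumed, simplified provider interface used only for
// this sketch; it is not the real cluster-api signature.
type Actuator interface {
	Exists(m Machine) (bool, error)
	Create(m Machine) error
	Update(m Machine) error
}

// fakeActuator records machines in memory instead of calling a cloud API.
type fakeActuator struct {
	instances map[string]bool
}

func (a *fakeActuator) Exists(m Machine) (bool, error) { return a.instances[m.Name], nil }
func (a *fakeActuator) Create(m Machine) error         { a.instances[m.Name] = true; return nil }
func (a *fakeActuator) Update(m Machine) error         { return nil }

// reconcileMachine creates the backing resource when it is missing and
// updates it otherwise, reporting which action was taken.
func reconcileMachine(a Actuator, m Machine) (string, error) {
	exists, err := a.Exists(m)
	if err != nil {
		return "", err
	}
	if !exists {
		return "create", a.Create(m)
	}
	return "update", a.Update(m)
}

func main() {
	a := &fakeActuator{instances: map[string]bool{}}
	m := Machine{Name: "node-0"}
	// The first pass finds nothing and creates; the second pass updates.
	for i := 0; i < 2; i++ {
		action, err := reconcileMachine(a, m)
		fmt.Println(action, err)
	}
}
```

Whether a cluster controller can skip the existence check depends on the provider, as noted in the review comment above.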
docs/internals/machine_actuator.md
Outdated
resource request and calls the appropriate machine actuator methods in order to
realise a machine state.

## Basic Actuator Flow
This is great!
Note that, modulo my comments above, I am not necessarily opposed to merging this.
I've realised that the flow I've put under
OK, I've changed the language here and (hopefully) improved things a little. It is now noted that the definition of a cluster and machine is up to the provider implementation, and that in the simple case they may be things like networks, Linux instances, etc., but not necessarily. Basic flow for the cluster controller was added and retryable errors were noted in both cluster and machine controllers. The main controllers doc was expanded a little and links to the cluster and machine controller docs.
docs/internals/controllers.md
Outdated
The main controller implementations ([cluster] and [machine]) are located
within the `cluster-api` library, and each perform almost identical basic
steps within their `Reconcile` methods. The controllers have the responsibility
of receiving incoming requests and dispatching them to the appropriate actuator
It helps me to think of this in different terms than "a request is received from Kubernetes." Namely, the controller watches for changes to the resources. For example, the machine controller watches for changes to the machine resource. If a Machine object is created, updated, or deleted, the controller receives that object.
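The watch-based framing above can be sketched with a plain Go channel standing in for the watch. Everything here is an assumption made for illustration: `Event`, `EventType`, `Machine`, and `runController` are hypothetical names, and a real controller gets its events from the Kubernetes client machinery rather than a hand-filled channel.

```go
package main

import "fmt"

// EventType and Event are hypothetical stand-ins for watch notifications.
type EventType string

const (
	Added   EventType = "ADDED"
	Updated EventType = "UPDATED"
	Deleted EventType = "DELETED"
)

// Machine is a simplified placeholder for the Machine resource.
type Machine struct {
	Name string
}

type Event struct {
	Type   EventType
	Object Machine
}

// runController drains the watch channel and hands each changed object to
// reconcile, returning the names that were processed without error.
func runController(events <-chan Event, reconcile func(Event) error) []string {
	var seen []string
	for ev := range events {
		if err := reconcile(ev); err != nil {
			continue // a real controller would requeue the object here
		}
		seen = append(seen, ev.Object.Name)
	}
	return seen
}

func main() {
	events := make(chan Event, 3)
	events <- Event{Type: Added, Object: Machine{Name: "node-0"}}
	events <- Event{Type: Updated, Object: Machine{Name: "node-0"}}
	events <- Event{Type: Deleted, Object: Machine{Name: "node-0"}}
	close(events)

	runController(events, func(ev Event) error {
		fmt.Println(ev.Type, ev.Object.Name)
		return nil
	})
}
```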
Fair. I've fixed some language in this area to mention the watches, and fixed up the basic flow to mention that after a watch is observed, a reconcile request is received before the controller fetches the cluster/machine object from Kubernetes.
Thanks a lot!
This is ready for another review and merge if there are no more issues.
1e43943 to 5df5a8e
/lgtm
machine controller has been waiting for the cluster to be ready before it
starts working on creating the machine resources.

## Basic Controller Flow
I assume that this is documenting the current flow, which is subject to change.
@@ -0,0 +1,34 @@
# Controllers

The main controller implementations ([cluster] and [machine]) are located
These are the main controllers that providers care about; for users, there are also the machine deployment and machine set controllers.
- Controller watches for changes on a resource type
- A change on a watched resource type is observed and the controller
  receives a reconcile request
- An attempt is made to fetch the appropriate object from Kubernetes; and if
Is this still true after switching to CRDs? I thought the framework took care of this part now.
This does still appear to be the case in the machine/controller.go and cluster/controller.go files.
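The fetch step discussed in this thread might look roughly like the sketch below. The quoted bullet is cut off before it says what happens when the fetch fails, so treating a missing object as "nothing left to do" is an assumption of this example. `Request`, `Cluster`, `store`, `ErrNotFound`, and `reconcile` are all hypothetical names, with an in-memory map standing in for the API server.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNotFound is a hypothetical sentinel for a missing object.
var ErrNotFound = errors.New("not found")

// Request identifies the object a reconcile request refers to.
type Request struct {
	Namespace, Name string
}

// Cluster is a simplified placeholder for the Cluster resource.
type Cluster struct {
	Name string
}

// store is an in-memory replacement for the API server in this sketch.
type store map[Request]Cluster

func (s store) Get(req Request) (Cluster, error) {
	c, ok := s[req]
	if !ok {
		return Cluster{}, ErrNotFound
	}
	return c, nil
}

// reconcile fetches the object named by the request. It returns true when
// the object was found and actuator work would follow.
func reconcile(s store, req Request) (bool, error) {
	_, err := s.Get(req)
	if errors.Is(err, ErrNotFound) {
		// Assumption: the object was deleted after the event was queued,
		// so there is nothing to do and no error to report.
		return false, nil
	}
	if err != nil {
		return false, err
	}
	return true, nil
}

func main() {
	s := store{Request{Namespace: "default", Name: "c1"}: Cluster{Name: "c1"}}
	for _, name := range []string{"c1", "gone"} {
		proceed, err := reconcile(s, Request{Namespace: "default", Name: name})
		fmt.Println(name, proceed, err)
	}
}
```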
New changes are detected. LGTM label has been removed.
5df5a8e to cbf87de
@phyber: The following test failed.
We should have docs for the current flow, because that's what implementers have to deal with right now. Either let's get this PR merged in an acceptable state or get @davidewatson's gitbook in, but please don't let perfect be the enemy of having some documentation that's helpful to people who are not the original developers of Cluster API. I want to onboard more engineers from our side onto provider implementations, but we're fumbling around, trying to figure out original intent, looking at the GCP implementation, instead of having anything resembling a clear guide.
I totally agree. We discussed during the call today that we want the gitbook, but since that will be at least a week out I'm ok merging this in the meantime if you'd prefer.
From Slack, by @phyber:
Closing in favor of #566. /close
@roberthbailey: Closing this PR.
What this PR does / why we need it: Adds an `internals` section to the `docs/` directory and adds basic internals documentation for actuators and controllers. Also adds a few `.gitignore`s for `vim` related files.
Which issue(s) this PR fixes (optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged): Related to #505
Special notes for your reviewer: This is by no means complete, but hopefully provides a small base for people to start from.
Release note: