Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Documentation: Basic internals docs #542

Closed
wants to merge 1 commit into from

Conversation

phyber
Copy link

@phyber phyber commented Oct 15, 2018

What this PR does / why we need it: Adds an internals section to the docs/ directory and adds basic internals documentation for actuators and controllers. Also adds a few .gitignores for vim related files.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Related to #505

Special notes for your reviewer: This is by no means complete, but hopefully provides a small base for people to start from.

Release note:

NONE

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Oct 15, 2018
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: phyber
To fully approve this pull request, please assign additional approvers.
We suggest the following additional approver: jessicaochen

If they are not already assigned, you can assign the PR to them by writing /assign @jessicaochen in a comment when ready.

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Oct 15, 2018
@randomvariable
Copy link
Member

This has helped us internally get started with Cluster API, hence PRing here.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 15, 2018
@randomvariable
Copy link
Member

/ok-to-test

@k8s-ci-robot k8s-ci-robot removed the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Oct 15, 2018
Copy link
Contributor

@davidewatson davidewatson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good start to a hard problem. We've discussed the need for documentation of this sort for a while now, and probably should again, especially now that there are more mature providers, in particular, cluster-api-provider-aws.

Some ideas for moving this forward are to 0) start with the original design, and/or 1) submit an outline so that others can fill in the details. It's unclear if the latter idea will work since writing by committee doesn't lend itself to great (or even tolerable) prose. I'd like to see if I can help here. I've been out of pocket for the last few weeks but should be mostly free for the rest of the month.

is that the machine controller has to take into account that a resource in
a request may not exist, as such, it must account for things like having to
create them to begin with.
The cluster controller has the privilege of knowing that the cluster already
Copy link
Contributor

@davidewatson davidewatson Oct 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may not be true. There are providers for which the Cluster and Machine objects exist in a cluster which is different than the cluster they represent (cf. Gardner, cluster-api-provider-ssh, etc.)

Some differences between the cluster and machine controllers are that they 0) manage different cluster-api resources, 1) Cluster resources may contain configuration and status which is shared between Machine resources, etc.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering that while writing this. I've been working in the AWS provider, so the above made sense there. I'll try to reword this based on what you've said here.

docs/internals/machine_actuator.md Outdated Show resolved Hide resolved
resource request and calls the appropriate machine actuator methods in order to
realise a machine state.

## Basic Actuator Flow
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great!

@davidewatson
Copy link
Contributor

Note that, modulo my comments above, I am not necessarily opposed to merging this.

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 16, 2018
@phyber
Copy link
Author

phyber commented Oct 16, 2018

I've realised that the flow I've put under Basic Actuator Flow is actually the Basic Controller Flow. I'll be taking care of this in some upcoming pushes.

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Oct 16, 2018
@phyber
Copy link
Author

phyber commented Oct 16, 2018

OK, I've changed the language here and (hopefully) improved things a little. It is now noted that the definition of a cluster and machine is up to the provider implementation, and that in the simple case they may be things like networks, linux instances, etc, but not necessarily.

Basic flow for the cluster controller was added and retryable errors were noted in both cluster and machine controllers. The main controllers doc was expanded a little and links to the cluster and machine controller docs.

The main controller implementations ([cluster] and [machine]) are located
within the `cluster-api` library, and each perform almost identical basic
steps within their `Reconcile` methods. The controllers have the responsibility
of receiving incoming requests and dispatching them to the appropriate actuator
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It helps me to think of this in different terms than "a request is received from Kubernetes." Namel,y the controller watches for changes to the resources. For example, the machine controller watches for changes to the machine resource. If a Machine object is created, updated, or deleted, the controller receives that object.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair. I've fixed some language in this area to mention the watches, and fixed up the basic flow to mention that after a watch is observed, a reconcile request is received before the controller fetches the cluster/machine object from Kubernetes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot!

@phyber
Copy link
Author

phyber commented Oct 19, 2018

This is ready for another review and merge if there are no more issues.

@randomvariable
Copy link
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 22, 2018
docs/internals/cluster_controller.md Outdated Show resolved Hide resolved
machine controller has been waiting for the cluster to be ready before it
starts working on creating the machine resources.

## Basic Controller Flow
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume that this is documenting the current flow, which is subject to change.

@@ -0,0 +1,34 @@
# Controllers

The main controller implementations ([cluster] and [machine]) are located
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are the main controllers that providers care about; for users, there are also the machine deployment and machine set controllers.

- Controller watches for changes on a resource type
- A change on a watched resource type is observed and the controller
receives a reconcile request
- An attempt is made to fetch the appropriate object from Kubernetes; and if
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this still true after switching to CRDs? I thought the framework took care of this part now.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does still appear to be the case in the machine/controller.go and cluster/controller.go files.

@k8s-ci-robot
Copy link
Contributor

New changes are detected. LGTM label has been removed.

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 24, 2018
@k8s-ci-robot
Copy link
Contributor

@phyber: The following test failed, say /retest to rerun them all:

Test name Commit Details Rerun command
pull-cluster-api-test cbf87de link /test pull-cluster-api-test

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@randomvariable
Copy link
Member

I assume that this is documenting the current flow, which is subject to change.

We should have docs for the current flow, because that's what implementers have to deal with right now.

Either let's get this PR merged in an acceptable state or get @davidewatson 's gitbook in, but please don't let perfect be the enemy of having some documentation that's helpful to people who are not the original developers of Cluster API.

I want to onboard more engineers from our side onto provider implementations, but we're fumbling around, trying to figure out original intent, looking at the GCP implementation, instead of having anything resembling a clear guide.

@roberthbailey
Copy link
Contributor

Either let's get this PR merged in an acceptable state or get @davidewatson 's gitbook in, but please don't let perfect be the enemy of having some documentation that's helpful to people who are not the original developers of Cluster API.

I totally agree. We discussed during the call today that we want the githbook, but since that will be at least a week out I'm ok merging this in the mean time if you'd prefer.

@randomvariable
Copy link
Member

From Slack, by @phyber :

@dwat I hear that the controller documentation from my PR is making its way into the Gitbook in an upcoming PR of yours. That's fine, and I'm happy to close my PR. No point merging it to have it be removed again when the book shows up shortly 🙂 (Will leave the PR open until an ACK later)

@roberthbailey
Copy link
Contributor

Closing in favor of #566.

/close

@k8s-ci-robot
Copy link
Contributor

@roberthbailey: Closing this PR.

In response to this:

Closing in favor of #566.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

jayunit100 pushed a commit to jayunit100/cluster-api that referenced this pull request Jan 31, 2020
…er-config

Add missing cloud provider config
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants