[Discussion] Operators vs. controller pattern #300

Closed
jlewi opened this issue Jan 12, 2018 · 10 comments

Comments

@jlewi
Contributor

jlewi commented Jan 12, 2018

Are operators and controllers actually two different patterns or just different terminology?

When I originally created the TfJob controller I based it on the CoreOS etcd operator.

We are refactoring the code to be a controller (#206) and to use more of the K8s infrastructure that supports controllers.

However, as far as I can tell operators aren't fundamentally architected differently than controllers. Am I missing something?

/cc @gaocegege @wackxu @enisoc

@gaocegege
Member

gaocegege commented Jan 12, 2018

I think they are two patterns: the operator was introduced by CoreOS, and the controller is what Kubernetes itself uses: https://github.com/kubernetes/kubernetes/tree/master/pkg/controller.

/cc @ScorpioCPH @DjangoPeng @ddysher

@enisoc

enisoc commented Jan 12, 2018

Since CoreOS coined the term "Operator", their article is the authority on what they mean by that:

An Operator is an application-specific controller that extends the Kubernetes API to create, configure and manage instances of complex stateful applications on behalf of a Kubernetes user. It builds upon the basic Kubernetes resource and controller concepts, but also includes domain or application-specific knowledge to automate common tasks better managed by computers.

We use Operators because managing stateful applications, like databases, caches and monitoring systems, is a big challenge, especially at massive scale. These systems require human operational knowledge to correctly scale, upgrade and reconfigure while at the same time protecting against data loss and unavailability.

To paraphrase: All Operators use the controller pattern, but not all controllers are Operators. It's only an Operator if it's got: controller pattern + API extension + single-app focus.

TfJob is a good example of an Operator, because it's a custom controller + CRD that's focused only on running one particular app (TensorFlow). Things like BlueGreenDeployment or IndexedJob implement general patterns that apply abstractly to "whatever it is you're running", so although they are also custom controllers + CRDs, they are not Operators.

@resouer

resouer commented Jan 12, 2018

My 2 cents.

An Operator is a customized controller implemented with a CRD. It follows the same pattern as the built-in controllers (i.e. watch, diff, act).
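As a rough illustration of the watch, diff, act cycle mentioned above, here is a minimal sketch in plain Go. It does not use client-go; `State`, `Action`, `Diff`, and `Apply` are hypothetical names invented for this example, and a map of replica counts stands in for real Kubernetes objects.

```go
package main

import (
	"fmt"
	"sort"
)

// State maps a resource name to its replica count. It is a hypothetical
// stand-in for real Kubernetes objects, used only for illustration.
type State map[string]int

// Action describes one step needed to move observed state toward desired state.
type Action struct {
	Verb     string // "create", "update", or "delete"
	Name     string
	Replicas int
}

// Diff compares desired and observed state and returns the actions needed
// to converge, sorted by name so the result is deterministic.
func Diff(desired, observed State) []Action {
	var actions []Action
	for name, want := range desired {
		have, ok := observed[name]
		switch {
		case !ok:
			actions = append(actions, Action{Verb: "create", Name: name, Replicas: want})
		case have != want:
			actions = append(actions, Action{Verb: "update", Name: name, Replicas: want})
		}
	}
	for name := range observed {
		if _, ok := desired[name]; !ok {
			actions = append(actions, Action{Verb: "delete", Name: name})
		}
	}
	sort.Slice(actions, func(i, j int) bool { return actions[i].Name < actions[j].Name })
	return actions
}

// Apply performs the actions against the observed state (the "act" step).
func Apply(observed State, actions []Action) {
	for _, a := range actions {
		switch a.Verb {
		case "create", "update":
			observed[a.Name] = a.Replicas
		case "delete":
			delete(observed, a.Name)
		}
	}
}

func main() {
	desired := State{"worker": 3, "ps": 1}
	observed := State{"worker": 2, "stale": 1} // what a watch would report

	actions := Diff(desired, observed)
	for _, a := range actions {
		fmt.Printf("%s %s\n", a.Verb, a.Name)
	}
	Apply(observed, actions)

	// A second pass finds nothing left to do, so repeating the loop is safe.
	fmt.Println("remaining actions:", len(Diff(desired, observed)))
}
```

The point of the sketch is that the loop compares whole states rather than reacting to individual changes, so running it again after a missed event still converges.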

The key idea of an Operator is to give you a framework for performing extra operations while installing or scaling instances, e.g. registering a new instance with the master in its onAdd() handler. These operations can include alerting and acting on failure, backup, reconfiguration, etc.

But the app itself is still deployed with a Deployment, ReplicaSet, or even StatefulSet; the Operator just gives you a way to automatically "operate" them by following the controller pattern.

Based on the above, I guess you are now actually writing your own version of an Operator. 😃 You may want to consider using it directly.

My friend @hongchaodeng from CoreOS would be the best person to settle this discussion; please correct my random words if anything is wrong.

@jlewi
Contributor Author

jlewi commented Jan 12, 2018

This is very helpful.

My key takeaway is that, in the context of the TfJob CRD, "operator vs. controller" is mostly semantics and doesn't really refer to a different design for the TfJob controller.

To be more specific, the TfJob controller was created by copying the CoreOS etcd operator, which wasn't using Informer and Controller classes the way https://github.com/kubernetes/sample-controller does. My working assumption is that the etcd-operator preceded the existence of these libraries and that's why it didn't use them.

@gaocegege
Member

gaocegege commented Jan 13, 2018

This is helpful, thanks :-)

I was actually thinking of etcd-operator when we talked about operators. After looking through the Prometheus Operator and the kong operator, I found that they differ in implementation even though they are all operators. So I now agree with @enisoc:

All Operators use the controller pattern, but not all controllers are Operators. It's only an Operator if it's got: controller pattern + API extension + single-app focus.

Operator is just a concept, not a pattern; I stand corrected.

We copied the code from etcd-operator, and I have a question about the implementation:

  • etcd-operator seems to maintain an in-memory map in the controller. What happens if the operator is restarted? (The map will be re-initialized.)
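One common way to make a re-initialized map harmless is to treat it purely as a cache and rebuild it from the persisted custom resources at startup. The sketch below illustrates that idea only; it is not taken from etcd-operator, and `Cluster`, `Lister`, `fakeAPI`, and `NewController` are hypothetical names assumed for this example.

```go
package main

import "fmt"

// Cluster is a hypothetical custom resource, reduced to the fields this sketch needs.
type Cluster struct {
	Name string
	Size int
}

// Lister abstracts "list every custom resource currently stored in the API server".
// It is an assumption of this sketch, not an etcd-operator or client-go interface.
type Lister interface {
	List() []Cluster
}

// fakeAPI stands in for the API server, whose contents survive controller restarts.
type fakeAPI struct {
	stored []Cluster
}

func (f *fakeAPI) List() []Cluster { return f.stored }

// Controller keeps an in-memory cache keyed by resource name.
type Controller struct {
	clusters map[string]Cluster
}

// NewController rebuilds the cache from the persisted resources, so nothing
// that matters lives only in the process memory of a previous run.
func NewController(l Lister) *Controller {
	c := &Controller{clusters: make(map[string]Cluster)}
	for _, cl := range l.List() {
		c.clusters[cl.Name] = cl
	}
	return c
}

func main() {
	api := &fakeAPI{stored: []Cluster{{Name: "a", Size: 3}, {Name: "b", Size: 5}}}

	first := NewController(api)
	fmt.Println("clusters known before restart:", len(first.clusters))

	// Simulate a restart: the old controller and its map are discarded,
	// and a fresh one is constructed from the same persisted resources.
	restarted := NewController(api)
	fmt.Println("clusters known after restart:", len(restarted.clusters))
}
```

Under this assumption the map holds nothing that cannot be recomputed, so a restart costs one list call rather than lost state.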

As @resouer said, @hongchaodeng could give us more helpful information about this :-) I am not sure I fully understand the code, so if I missed something, please correct me :-)

@alexellis

alexellis commented Jan 13, 2018

I was sent a link here; very interesting conversation. We're starting to embrace CRDs in OpenFaaS, and there are other distinctions at play here, e.g. the difference between an event-driven controller with "owner references" for its CRDs and a controller that simply polls for state and remediates that way.

@jlewi
Contributor Author

jlewi commented Jan 13, 2018

@alexellis what are power references?

@alexellis

Typo - owner references - claims about CRDs made by a controller to receive relevant events.
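To make the owner-reference idea concrete, here is a minimal sketch of how an event on a child object can be routed back to the custom resource that owns it. `OwnerRef` and `Object` are simplified, hypothetical types for this example; the real Kubernetes owner reference carries more fields (such as UID and API version), and `ownerKey` and `handleEvent` are names invented here.

```go
package main

import "fmt"

// OwnerRef is a simplified, hypothetical version of a Kubernetes owner reference.
type OwnerRef struct {
	Kind       string
	Name       string
	Controller bool // true if this owner is the managing controller
}

// Object is a simplified, hypothetical Kubernetes object with only metadata.
type Object struct {
	Kind      string
	Namespace string
	Name      string
	Owners    []OwnerRef
}

// ownerKey returns the "namespace/name" key of the controlling owner of the
// given kind, or false if the object is not controlled by such an owner.
func ownerKey(obj Object, ownerKind string) (string, bool) {
	for _, ref := range obj.Owners {
		if ref.Controller && ref.Kind == ownerKind {
			return obj.Namespace + "/" + ref.Name, true
		}
	}
	return "", false
}

// handleEvent appends the owner's key to the work queue when the changed
// object belongs to an owner of the given kind, and ignores it otherwise.
func handleEvent(queue []string, obj Object, ownerKind string) []string {
	if key, ok := ownerKey(obj, ownerKind); ok {
		return append(queue, key)
	}
	return queue
}

func main() {
	pod := Object{
		Kind:      "Pod",
		Namespace: "default",
		Name:      "train-worker-0",
		Owners:    []OwnerRef{{Kind: "TfJob", Name: "train", Controller: true}},
	}
	unrelated := Object{Kind: "Pod", Namespace: "default", Name: "other"}

	var queue []string
	queue = handleEvent(queue, pod, "TfJob")
	queue = handleEvent(queue, unrelated, "TfJob")
	fmt.Println(queue)
}
```

Only the owned Pod results in a queued key, which is how an event-driven controller can ignore objects it does not manage instead of polling everything.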

@jzelinskie

jzelinskie commented Jan 14, 2018

An operator is a Kubernetes controller that understands 2 domains: Kubernetes and something else. By combining knowledge of both domains, it can automate tasks that usually require a human operator that understands both domains.

Operator is just a concept, not a pattern

This is correct. Within CoreOS, internally and externally, there are a variety of operators each with different designs depending on what the operator needs to do.

There is an extremely common design where you use a Custom Resource to represent a group of Kubernetes resources that your operator is managing. The etcd Operator is an example of this: it uses the EtcdCluster resource to represent all of the Kubernetes resources required for a single cluster and it handles any extra logic for those resources (e.g. how to properly replace a node in the etcd quorum).
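The design described above, one Custom Resource standing for a group of Kubernetes resources, can be sketched as a function that expands the custom resource into the children it implies. This is only an illustration: `ClusterSpec` is a hypothetical type and not the real EtcdCluster schema, and the choice of one Service plus one Pod per member is an assumption made for the example.

```go
package main

import "fmt"

// ClusterSpec is a hypothetical custom resource spec; it is not the real
// EtcdCluster schema.
type ClusterSpec struct {
	Name string
	Size int
}

// Child identifies one Kubernetes resource the operator would own for a cluster.
type Child struct {
	Kind string
	Name string
}

// DesiredChildren expands one custom resource into the resources that back it.
// The layout (one Service plus one Pod per member) is assumed for illustration.
func DesiredChildren(spec ClusterSpec) []Child {
	children := []Child{{Kind: "Service", Name: spec.Name}}
	for i := 0; i < spec.Size; i++ {
		children = append(children, Child{Kind: "Pod", Name: fmt.Sprintf("%s-%d", spec.Name, i)})
	}
	return children
}

func main() {
	for _, c := range DesiredChildren(ClusterSpec{Name: "example", Size: 3}) {
		fmt.Printf("%s/%s\n", c.Kind, c.Name)
	}
}
```

The application-specific logic, such as how to replace a member safely, would sit on top of this expansion and is deliberately left out of the sketch.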

Within reason, I highly recommend you structure your code like the sample-controller repository. The etcd operator is the original code that inspired the idea of an operator and, as such, is not necessarily using the best practices.

@DjangoPeng
Member

@gaocegege @jlewi Sorry for the late reply.

Operator is just a concept, not a pattern

I strongly agree with the viewpoint above.

I also think https://github.com/kubernetes/sample-controller is a good example of best practice for operators, and that is what we followed in https://github.com/caicloud/kubeflow-controller.

We are now planning to merge our implementation upstream. I believe @gaocegege will open a PR to do the merging work this week.

7 participants