
Prometheus Integration #222

Closed
bradbeam opened this issue May 3, 2018 · 16 comments · Fixed by #786
Labels
design kind/feature Categorizes issue or PR as related to a new feature.

Comments

@bradbeam
Contributor

bradbeam commented May 3, 2018

We should consider adding some basic Prometheus support to the generated operator. Just including the client library and exposing the metrics path would be a good start; perhaps define and increment a counter in the handler function to set the pattern up.

@spahl
Contributor

spahl commented May 4, 2018

Agreed. We have been thinking about similar things but have not gotten to those yet. (https://github.com/operator-framework/operator-sdk/blob/master/doc/design/long-term/simple-extensions.md)

We would welcome any proposal if you want to give it a shot.

@etiennecoutaud
Contributor

@spahl I'm working on adding basic Prometheus metrics to the operator. Should we generate a Service and ServiceMonitor to expose the metrics port?

@robszumski
Contributor

I think that makes sense @etiennecoutaud. The main consideration is whether there will be a cluster-wide Prometheus watching all ServiceMonitors or one per namespace. We're leaning toward the per-namespace approach for most use cases, as it gives more control over metrics retention and access control to the data.

@spahl
Contributor

spahl commented May 11, 2018

Sorry I missed that. It does make a lot of sense.

@etiennecoutaud
Contributor

@spahl @robszumski Who is working on the Prometheus integration? I will add the Service and ServiceMonitor when the Kubernetes manifests are generated.

@spahl
Contributor

spahl commented May 14, 2018

No one at this point. It is in the top list of things we want to add though. Any help is welcome.

@hasbro17
Contributor

We need to consider how the SDK should help an operator expose its metrics to Prometheus via a Service and ServiceMonitor. One way is to generate the manifests at build time and expect the user to configure and create them. This is the approach in #241

There are a couple of issues with this approach:

  • The user has to change the configuration in multiple places if the default port or label selectors for the Service or ServiceMonitor need to be changed.
  • Generating the manifests at build time leaves the user with one more thing to do to enable a feature that could work out of the box.

A preferred approach would be to have the operator expose the metrics on a desired port and create only the Service at runtime by default. The user can pass in some configuration to the operator to override the default values for the port and Service.

The following fields can be made configurable via flags passed to the operator:

  • --metrics-port: The port number on which to serve metrics. This can have a default, e.g. 9090, that can be configured if needed.
  • --service-selector: The Service selector label spec.selector that’s used to select the operator pod. The default can be the same label generated for the operator deployment manifest, e.g. name=memcached-operator. This will need to be changed if the user ever updates the default labels of the operator deployment.
  • --disable-metrics: By default this is false. It can be set to true so the operator will not expose the metrics or create the Service.
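
As a rough sketch (not actual operator-sdk code), the three proposed flags could be declared with Go's standard flag package. The flag names and defaults mirror the proposal above; the `metricsConfig` struct and `parseMetricsFlags` helper are made up for this example:

```go
package main

import (
	"flag"
	"fmt"
)

// metricsConfig holds the three proposed settings.
type metricsConfig struct {
	Port            int
	ServiceSelector string
	Disabled        bool
}

// parseMetricsFlags parses the proposed flags from an argument list.
// Defaults are illustrative, matching the examples in the thread.
func parseMetricsFlags(args []string) (metricsConfig, error) {
	fs := flag.NewFlagSet("operator", flag.ContinueOnError)
	var cfg metricsConfig
	fs.IntVar(&cfg.Port, "metrics-port", 9090, "port on which to serve Prometheus metrics")
	fs.StringVar(&cfg.ServiceSelector, "service-selector", "name=memcached-operator", "label selector for the metrics Service")
	fs.BoolVar(&cfg.Disabled, "disable-metrics", false, "disable the metrics endpoint and Service creation")
	err := fs.Parse(args)
	return cfg, err
}

func main() {
	cfg, _ := parseMetricsFlags([]string{"--metrics-port=8080", "--service-selector=app=memcached-operator"})
	fmt.Printf("%+v\n", cfg)
}
```

Parsing into a small struct keeps the configuration testable and easy to pass to whatever code creates the Service.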

With the above flags the operator can be configured to set up the Prometheus integration in 3 ways:

  1. Expose the port and create the Service with default values:

     labels:
       name: memcached-operator
     spec:
       containers:
       - name: memcached-operator
         command:
         - memcached-operator
         ports:
         - containerPort: 9090

  2. Configure the port or service selector if needed:

     labels:
       app: memcached-operator
     spec:
       containers:
       - name: memcached-operator
         command:
         - memcached-operator
         - --metrics-port=8080
         - --service-selector="app=memcached-operator"
         ports:
         - containerPort: 8080

  3. Disable the Prometheus integration completely:

     spec:
       containers:
       - name: memcached-operator
         command:
         - memcached-operator
         - --disable-metrics=true

Creating the ServiceMonitor:
Requiring the operator to create the ServiceMonitor at runtime by default can be a problem when the ServiceMonitor CRD has not been registered with the API server. In that case the operator could simply log the error, skip creating the ServiceMonitor, and move on, or it could keep retrying. Either way, it’s debatable how the operator should handle creating the ServiceMonitor.

So for a start I propose we only have the operator expose the port and create a Service at runtime.

@etiennecoutaud Let me know what you think. I think it's important to decide whether we want to generate manifests or let the operator create the resources before we move forward with #241

@chancez
Contributor

chancez commented May 18, 2018

Regarding creating the ServiceMonitor: the operator could probe the available CRDs and, if the ServiceMonitor CRD exists, attempt to create one. This auto-creation could default to on, with a flag to turn it off (since RBAC will probably be a factor too).
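
The probing idea could look roughly like the sketch below. To keep it self-contained it hides the discovery call behind a small hand-rolled interface; in a real operator this would be backed by client-go's discovery client querying the `monitoring.coreos.com/v1` group. The `resourceLister` interface, `serviceMonitorAvailable` function, and stub are all hypothetical names for this example:

```go
package main

import "fmt"

// resourceLister abstracts the piece of the Kubernetes discovery API we
// need; a real operator would back this with client-go's DiscoveryClient.
type resourceLister interface {
	// ServerResourcesForGroupVersion returns the resource kinds served
	// under a given group/version, e.g. "monitoring.coreos.com/v1".
	ServerResourcesForGroupVersion(groupVersion string) ([]string, error)
}

// serviceMonitorAvailable reports whether the ServiceMonitor CRD is
// registered, so the operator can create one or silently skip it.
func serviceMonitorAvailable(d resourceLister) bool {
	kinds, err := d.ServerResourcesForGroupVersion("monitoring.coreos.com/v1")
	if err != nil {
		return false // group not registered at all
	}
	for _, k := range kinds {
		if k == "ServiceMonitor" {
			return true
		}
	}
	return false
}

// stubLister is a test double standing in for a live API server.
type stubLister struct{ kinds []string }

func (s stubLister) ServerResourcesForGroupVersion(string) ([]string, error) {
	if s.kinds == nil {
		return nil, fmt.Errorf("the server could not find the requested resource")
	}
	return s.kinds, nil
}

func main() {
	with := stubLister{kinds: []string{"Prometheus", "ServiceMonitor"}}
	without := stubLister{}
	fmt.Println(serviceMonitorAvailable(with), serviceMonitorAvailable(without))
}
```

The decision then becomes a single boolean check before attempting the create, which also makes the off-switch flag trivial to wire in.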

@eparis
Contributor

eparis commented May 19, 2018

Might I pretty strongly suggest we not add flags until a real user asks? And even then, only if they have what we agree to be an extremely compelling use case. Even if we think someone 'will want this', just do the right thing, always, for them. Don't let them customize. Pick a random container port with an extremely low chance of conflict (something like 41783) and just always turn it on. Don't let it be turned off. Don't let it be configured. Just make all operators the same.

I'm good with the operator creating its own service definition at run time instead of build time, but if the service is just static, that could come later...

I'm big on no config options. Just do the right thing for people. Our users should follow the best practice because it is the only path we give them.

@spahl
Contributor

spahl commented May 21, 2018

You are right that limiting flags is a wiser approach.

What we could easily do is drop --disable-metrics=true, and likewise --metrics-port=8080, by choosing a fixed port like you described.

I don't know how to get rid of --service-selector, though. Maybe we could come up with a naming convention for all operators, @hasbro17?

@etiennecoutaud
Contributor

I agree with @spahl and @eparis on limiting flags. Today we're discussing the Prometheus integration, but tomorrow it will be something else, and we will end up with a ton of options and a ton of possibilities.

We are generating code, and we consider it normal for users to add code, but is it normal for users to have to remove generated parts they don't need?
In the #241 approach, a Prometheus server handler is added and a metrics port is exposed. If users don't want monitoring, they can remove the Prometheus code and not apply the ServiceMonitor.

My opinion matches @eparis's: we should generate the operator in the state-of-the-art way and provide everything needed to run it well. Users are free to remove parts and change what we propose. We cannot handle every possibility; we have to choose the smartest path and propose it.

@eparis
Contributor

eparis commented May 22, 2018

For serviceSelector how about generating a label on the pod which we know to be unique and use that?
watchedresource.operatorsdk.operatorframework = ${kind}.${api-version}
We should know the kind and api-version we plan to watch, and nothing else should be watching that...

@hasbro17
Contributor

@eparis @etiennecoutaud Thanks for the responses. I agree that the number of flags could easily pile up if we add more and more configuration options, so perhaps it's best to just pick one approach and stick with it.
So if we have consensus on not adding too much configuration, that means the Service to expose the metrics is just static, with a fixed port and label selector.

For serviceSelector how about generating a label on the pod which we know to be unique and use that?
watchedresource.operatorsdk.operatorframework = ${kind}.${api-version}

Right now we generate the label name=<operator-name> for the operator pods, which should be unique enough to select them since you won't ever have two operator deployments in the same namespace. So we can use that as the label selector, as in #241:
https://github.com/operator-framework/operator-sdk/pull/241/files#diff-52e0a0ade3435bbd8d2db3ee74b4ede1R142

I'm good with the operator creating its own service definition at run time instead of build time, but if the service is just static, it could come later

With the Service being static, should we still have the operator create it at runtime? The benefit is that the Service is then always created by default. If not, we simply go back to generating the manifests as in #241 and let the user create it as needed.

@hasbro17
Contributor

@etiennecoutaud @eparis I'm hoping we can move forward based on the discussion so far.

If we're all agreed on not having configuration flags, then for #241 can we start with the following:

  • The operator deployment manifest is generated with an exposed static container port.
  • The operator creates the Service at runtime. The label selector can be fixed to name=<operator-name>. The operator name can be passed in as an environment variable to the pod just like the namespace.

So, for example, the generated deployment manifest would look like:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: memcached-operator
  labels:
    name: memcached-operator
spec:
  replicas: 1
  selector:
    matchLabels:
      name: memcached-operator
  template:
    metadata:
      labels:
        name: memcached-operator
    spec:
      containers:
        - name: memcached-operator
          image: quay.io/example/memcached-operator:v0.0.1
          ports:
          - containerPort: 9090
            name: metrics
          command:
          - memcached-operator
          imagePullPolicy: Always
          env:
            - name: WATCH_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            - name: OPERATOR_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name

And the Service created by the operator would in turn be:

apiVersion: v1
kind: Service
metadata:
  name: memcached-operator
  labels:
    name: memcached-operator
spec:
  selector:
    name: memcached-operator
  ports:
  - protocol: TCP
    targetPort: metrics
    port: 9090
    name: metrics
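
The Service above could be built at runtime roughly as in this sketch. To stay self-contained it uses hand-rolled stand-in types; a real operator would use the k8s.io/api/core/v1 types and create the object through client-go. The `metricsService` helper and its struct names are made up for this example:

```go
package main

import (
	"fmt"
	"os"
)

// Minimal stand-ins for the corev1 Service types used in the manifest above.
type servicePort struct {
	Name       string
	Protocol   string
	Port       int32
	TargetPort string
}

type service struct {
	Name     string
	Labels   map[string]string
	Selector map[string]string
	Ports    []servicePort
}

// metricsService builds the proposed metrics Service from the operator
// name, which the pod receives via the OPERATOR_NAME environment variable.
// The name=<operator-name> label doubles as the selector.
func metricsService(operatorName string) service {
	labels := map[string]string{"name": operatorName}
	return service{
		Name:     operatorName,
		Labels:   labels,
		Selector: labels,
		Ports: []servicePort{
			{Name: "metrics", Protocol: "TCP", Port: 9090, TargetPort: "metrics"},
		},
	}
}

func main() {
	name := os.Getenv("OPERATOR_NAME")
	if name == "" {
		name = "memcached-operator" // fallback for illustration only
	}
	fmt.Printf("%+v\n", metricsService(name))
}
```

Deriving both the Service name and the selector from the single OPERATOR_NAME value keeps the manifest and the runtime-created object from drifting apart.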

Let's leave out the ServiceMonitor from #241 for now.

@etiennecoutaud
Contributor

Sounds good to me, I will update #241 according to this change.

@spahl spahl added kind/feature Categorizes issue or PR as related to a new feature. design labels Jun 5, 2018
@joelanford
Member

@lilic just making you aware of this issue (if you aren't already). Can this be closed once your metrics work is merged?
