This repository has been archived by the owner on Feb 27, 2023. It is now read-only.

Additional metrics #143

Merged: 1 commit, Jun 18, 2018

Conversation

stevesloka (Member)

Add additional metrics to the Discoverers:

  • upstream-services
  • replicated-services
  • invalid-services
  • upstream-endpoints
  • replicated-endpoints
  • invalid-endpoints

// Fixes #108
Signed-off-by: Steve Sloka steves@heptio.com

@rosskukulinski rosskukulinski requested review from rosskukulinski and alexbrand and removed request for rosskukulinski June 6, 2018 16:20

@rosskukulinski (Contributor) commented Jun 6, 2018

In testing this I've found a few issues:

  • gimbal_discoverer_upstream_endpoints_total, gimbal_discoverer_replicated_endpoints_total, and invalid_endpoints should also be broken out by service with another label
  • On my cluster I'm seeing upstream_endpoints_total come out lower than replicated_endpoints_total, which doesn't make sense

In addition:

  • Add a gimbal_discoverer_info metric which always has a value of 1, with the following labels: backendname, discoverer_version, discoverer_type (see the sketch after this list)
  • Please add a discoverer_type label to all metrics -- this would be openstack or kubernetes
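
For illustration, a minimal sketch of what such an info-style metric could look like with the Prometheus Go client. The metric and label names come from the request above; the package name, function name, and registration flow are assumptions for the sketch, not code from this PR.

package localmetrics // placement assumed for the sketch

import "github.com/prometheus/client_golang/prometheus"

// discovererInfo is an "info" gauge: it always reports 1 and exists only to
// carry identity labels about the running discoverer.
var discovererInfo = prometheus.NewGaugeVec(
	prometheus.GaugeOpts{
		Name: "gimbal_discoverer_info",
		Help: "Information about the discoverer; the value is always 1.",
	},
	[]string{"backendname", "discoverer_version", "discoverer_type"},
)

// RegisterDiscovererInfo registers the gauge and pins its single series to 1.
func RegisterDiscovererInfo(backendName, version, discovererType string) {
	prometheus.MustRegister(discovererInfo)
	discovererInfo.WithLabelValues(backendName, version, discovererType).Set(1)
}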

@rosskukulinski (Contributor) commented Jun 6, 2018

Additional detail: I believe gimbal_discoverer_upstream_endpoints_total is currently displaying the number of Endpoints objects, not the sum of the endpoints inside those objects.
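
For reference, a sketch of that distinction; the helper name and placement are hypothetical, not the PR's code. Counting the objects gives len(items), whereas the upstream total should sum the addresses inside each object's subsets.

package k8s // placement assumed for the sketch

import v1 "k8s.io/api/core/v1"

// sumEndpointAddresses adds up the individual addresses carried by a list of
// Endpoints objects, rather than counting the objects themselves.
// (NotReadyAddresses are ignored in this sketch.)
func sumEndpointAddresses(items []v1.Endpoints) int {
	total := 0
	for _, ep := range items {
		for _, subset := range ep.Subsets {
			total += len(subset.Addresses)
		}
	}
	return total
}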

@rosskukulinski (Contributor)

gimbal_discoverer_replicated_endpoints_total doesn't seem to handle endpoints that have been removed

@alexbrand (Contributor) left a comment

Did an initial review and added some comments. I have not tested manually yet. I wonder if there are unit tests we could build.

I am also wondering if it makes sense to keep some (or most) of these out of the sync loop, and instead start a separate goroutine that is responsible for publishing the majority of these metrics on a specific cadence (update metrics every X seconds).

It seems like they don't need to be tied to discovery events (in the k8s case) or the sync loop (in the OpenStack case).

Although now that I have written it out, I think changing to this approach could result in metrics lagging behind?
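
For context, a rough sketch of the ticker-based alternative being described; the function name, arguments, and interval handling are illustrative assumptions, not part of this PR.

package discoverer // placement assumed for the sketch

import "time"

// startMetricsPublisher recomputes and publishes metrics on a fixed cadence,
// independent of discovery events or the sync loop.
func startMetricsPublisher(interval time.Duration, publish func(), stop <-chan struct{}) {
	go func() {
		ticker := time.NewTicker(interval)
		defer ticker.Stop()
		for {
			select {
			case <-ticker.C:
				publish() // e.g. recompute gauges from the informers' caches
			case <-stop:
				return
			}
		}
	}()
}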

@@ -154,7 +154,22 @@ func (r *Reconciler) reconcile() {
r.reconcileSvcs(desiredSvcs, currentServices.Items)

desiredEndpoints := kubeEndpoints(r.BackendName, projectName, loadbalancers, pools)
for _, ep := range desiredEndpoints {
desiredEndpoints = append(desiredEndpoints, ep)

Contributor:

Is this duplicating the list?

if err != nil {
r.Logger.Error(err)
}
r.Metrics.DiscovererReplicatedEndpointsMetric(r.BackendName, projectName, r.sumEndpoints(endpoints.Items))

Contributor:

Is this competing with the metric being set in sync.Action? It's unclear to me why we need them in both places. Also seems like we are missing the equivalent for DiscovererReplicatedServicesMetric?


// Log upstream services prometheus
r.Metrics.DiscovererUpstreamServicesMetric(r.BackendName, projectName, totalUpstreamServices)
r.Metrics.DiscovererInvalidServicesMetric(r.BackendName, projectName, totalUpstreamServices-len(loadbalancers))

Contributor:

nit: took me some time to figure out how this subtraction results in the number of invalid services. Can we create a variable invalidServicesCount or something and move it up closer to totalUpstreamServices and loadbalancers?
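
One possible shape for that rename, sketched against the snippet above (variable placement in the real reconciler may differ):

// Naming the count next to its inputs makes the subtraction self-explanatory.
invalidServicesCount := totalUpstreamServices - len(loadbalancers)

r.Metrics.DiscovererUpstreamServicesMetric(r.BackendName, projectName, totalUpstreamServices)
r.Metrics.DiscovererInvalidServicesMetric(r.BackendName, projectName, invalidServicesCount)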

endpoints, err := r.GimbalKubeClient.CoreV1().Endpoints(projectName).List(metav1.ListOptions{})
if err != nil {
r.Logger.Error(err)
}

Contributor:

Do we need an else here to avoid using endpoints when err != nil?

}

// // Log Total Services Metric
// totalUpstreamServices, err := getTotalServicesCount(kubeClient, action.ObjectMeta().GetNamespace(), backendName)

Contributor:

remove?

@stevesloka (Member Author)

@alexbrand I 100% believe we could redo how metrics are implemented; this was my comment the last time we were on the phone. I have them tied to events since that's the trigger for when something changes.

Looking into the other comments/issues now.

@stevesloka (Member Author)

@rosskukulinski, for the OpenStack upstream endpoints metrics: should the servicename be the modified version we use in Gimbal or the actual value in OpenStack? Right now I have it as the modified version since it matches the other metrics, but I wanted to clarify.

@rosskukulinski (Contributor)

I think the modified value, @stevesloka

@rosskukulinski (Contributor)

@stevesloka I lied; the endpoint metrics should have a label that is the original service/LB name. The reason is that it can be helpful to count the number of endpoints available for a given Service across multiple backends.

@@ -71,9 +71,12 @@ func main() {
log.Info("Gimbal Kubernetes Discoverer Starting up...")

// Init prometheus metrics
discovererMetrics = localmetrics.NewMetrics()
discovererMetrics = localmetrics.NewMetrics("kubernetes", backendName)

Contributor:

❤️

func (c *Controller) writeServiceMetrics(svc *v1.Service) error {
upstreamServices, err := c.serviceLister.Services(svc.GetNamespace()).List(labels.Everything())
if err != nil {
return err

Contributor:

Should we log instead of return here? We are not handling it at the call site

DiscovererUpstreamServicesGauge: prometheus.NewGaugeVec(
prometheus.GaugeOpts{
Name: DiscovererUpstreamServicesGauge,
Help: "Total number of services in the upstream backend cluster",

Contributor:

"upstream backend" sounds redundant to me. Thoughts?

}

// DiscovererInvalidEndpointsMetric records the total replicated endpoints
func (d *DiscovererMetrics) DiscovererInvalidEndpointsMetric(namespace, serviceName string, totalEp int) {

Contributor:

Should we get rid of this one?

DiscovererInvalidEndpointsGauge: prometheus.NewGaugeVec(
prometheus.GaugeOpts{
Name: DiscovererInvalidEndpointsGauge,
Help: "Total number of endpoints invalid endpoints that could not be replicaed from the backend cluster",

Contributor:

typo: "endpoints invalid endpoints"
but not sure if we need this metric anyways.

@@ -210,3 +219,19 @@ func (r *Reconciler) reconcileEndpoints(desired, current []v1.Endpoints) {
r.syncqueue.Enqueue(sync.DeleteEndpointsAction(&ep))
}
}

func (r *Reconciler) sumEndpoints(eps v1.Endpoints) int {

Contributor:

nit: can be func instead of method

return total
}

func (r *Reconciler) getListFromMap(mp map[string]v1.Endpoints) []v1.Endpoints {

Contributor:

nit: can be func instead of method

metrics.EndpointsEventTimestampMetric(action.endpoints.GetNamespace(), action.endpoints.GetName(), time.Now().Unix())

// TODO: Move to lister()
totalEps, err := metrics.GetTotalEndpointsCount(kubeClient, action.endpoints.GetNamespace(), action.endpoints.GetName())

Contributor:

I am not following why we need to GET the endpoints object again if we already have it in action.endpoints.

Member Author:

Good point; before, I had the metric sum all endpoints for a namespace, which was different.

}

// GetTotalServicesCount returns the number of services in a namespace for the particular backend
func (d *DiscovererMetrics) GetTotalServicesCount(kubeclient kubernetes.Interface, namespace string) (int, error) {

Contributor:

These funcs seem somewhat out of place in that they are unrelated to the metrics package. I'm wondering if it makes sense to move these calls to the k8s API into the sync pkg, which is where they are called from and which already depends on the k8s packages.

Member Author:

I needed a good home for this but couldn't find one. Also, I need access to backendName to filter on labels. If it's moved out of metrics, we'll just need to make those params public.

Contributor:

Aaaaah I see. I think another option might be to add a backendName field to endpointsAction and serviceAction, and do AddEndpointsAction(backendName, endpoints). The downside is that we have to plumb backendName again through a couple of layers. Not sure if it's worth it at this time.
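
A sketch of what that alternative might look like, assuming the endpointsAction fields that appear elsewhere in this diff; the exact field types and constructor shape are assumptions, not the PR's code.

package sync // placement assumed for the sketch

import v1 "k8s.io/api/core/v1"

// endpointsAction would carry the backend name alongside the Endpoints object.
type endpointsAction struct {
	kind        string
	backendName string
	endpoints   *v1.Endpoints
}

// AddEndpointsAction builds an "add" action with the backend name plumbed in.
func AddEndpointsAction(backendName string, endpoints *v1.Endpoints) endpointsAction {
	return endpointsAction{kind: "add", backendName: backendName, endpoints: endpoints}
}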

Member Author:

I moved stuff around and pulled the backend name out of metrics. PTAL

@alexbrand (Contributor) left a comment

LGTM! Is it worth it for me to test it out as well?

The only other thought that just came to me is that we are adding an extra API call for every service add/update/delete we do, right?

DiscovererInvalidEndpointsGauge: prometheus.NewGaugeVec(
prometheus.GaugeOpts{
Name: DiscovererInvalidEndpointsGauge,
Help: "Total number of endpoints invalid endpoints that could not be replicaed from the backend",

Contributor:

typo: endpoints invalid endpoints

Member Author:

👍 ...also fixed replicated

@stevesloka (Member Author)

Yes, right now there's a new API call for each service CRUD operation; let me see if I can pipe through the lister and save us the call.

@stevesloka (Member Author)

OK, I've got the lister piped in, so no additional API requests are required for metrics.
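
For illustration, a sketch of what a lister-backed count can look like; the function name and placement are assumptions, though the gimbal.heptio.com/backend label does appear in the service listing later in this thread.

package sync // placement assumed for the sketch

import (
	"k8s.io/apimachinery/pkg/labels"
	listerv1 "k8s.io/client-go/listers/core/v1"
)

// countBackendServices counts replicated services for one backend from the
// lister's local cache instead of querying the API server on every sync.
func countBackendServices(lister listerv1.ServiceLister, namespace, backendName string) (int, error) {
	selector := labels.Set{"gimbal.heptio.com/backend": backendName}.AsSelector()
	svcs, err := lister.Services(namespace).List(selector)
	if err != nil {
		return 0, err
	}
	return len(svcs), nil
}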

@alexbrand (Contributor)

Nice! Should I test this out on my end or can we merge?

@rosskukulinski (Contributor)

Something still isn't quite right, but now it's with the service metrics.

To test, I stopped both discoverers, deleted all services in the kuard namespace in the Gimbal cluster, then launched my newly-built discoverers from your last commit.

The following has been discovered:

NAME           TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)    AGE       LABELS
blue-httpbin   ClusterIP   None         <none>        80/TCP     34s       app=httpbin,gimbal.heptio.com/backend=blue,gimbal.heptio.com/service=httpbin
blue-kuard     ClusterIP   None         <none>        8080/TCP   34s       gimbal.heptio.com/backend=blue,gimbal.heptio.com/service=kuard
green-kuard    ClusterIP   None         <none>        8080/TCP   34s       gimbal.heptio.com/backend=green,gimbal.heptio.com/service=kuard

Yet here are the metrics gathered:


gimbal_discoverer_replicated_services_total{app="kubernetes-discoverer",backendname="blue",backendtype="kubernetes",cluster="k8s",instance="172.20.36.129:8080",job="kubernetes-pods",kubernetes_namespace="gimbal-discovery",kubernetes_pod_name="blue-kubernetes-discoverer-7b9999b95-rvfsc",namespace="kuard",pod_template_hash="365555651"} | 1
gimbal_discoverer_replicated_services_total{app="kubernetes-discoverer",backendname="blue",backendtype="kubernetes",cluster="k8s",instance="172.20.36.129:8080",job="kubernetes-pods",kubernetes_namespace="gimbal-discovery",kubernetes_pod_name="blue-kubernetes-discoverer-7b9999b95-rvfsc",namespace="nginx",pod_template_hash="365555651"} | 0
gimbal_discoverer_replicated_services_total{app="kubernetes-discoverer",backendname="green",backendtype="kubernetes",cluster="k8s",instance="172.20.41.62:8080",job="kubernetes-pods",kubernetes_namespace="gimbal-discovery",kubernetes_pod_name="green-kubernetes-discoverer-8496f87599-gz64x",namespace="kuard",pod_template_hash="4052943155"} | 0

@rosskukulinski (Contributor)

And gimbal_discoverer_*_endpoints_total still has the combined servicename (which has the backendname in it) vs. the original upstream service name.

@stevesloka (Member Author)

I'm having trouble reproducing the service error where the counts are off. Could you port-forward to the discoverer on 8080 and double check the /metrics endpoint?

@rosskukulinski (Contributor)

gimbal_discoverer_replicated_services_total{backendname="green",backendtype="kubernetes",namespace="kuard"} 0
gimbal_discoverer_upstream_services_total{backendname="green",backendtype="kubernetes",namespace="kuard"} 1

logs:

e="2018-06-08T22:12:05Z" level=info msg="Started workers"
time="2018-06-08T22:12:05Z" level=info msg="Successfully handled: add endpoints 'kuard/green-kuard'"
time="2018-06-08T22:12:05Z" level=info msg="Successfully handled: add service 'kuard/green-kuard'"
time="2018-06-08T22:42:05Z" level=info msg="Successfully handled: update endpoints 'kuard/green-kuard'"
time="2018-06-08T22:42:05Z" level=info msg="Successfully handled: update service 'kuard/green-kuard'"
time="2018-06-08T23:12:05Z" level=info msg="Successfully handled: update endpoints 'kuard/green-kuard'"
time="2018-06-08T23:12:05Z" level=info msg="Successfully handled: update service 'kuard/green-kuard'"
time="2018-06-08T23:42:05Z" level=info msg="Successfully handled: update endpoints 'kuard/green-kuard'"
time="2018-06-08T23:42:05Z" level=info msg="Successfully handled: update service 'kuard/green-kuard'"
time="2018-06-09T00:12:05Z" level=info msg="Successfully handled: update endpoints 'kuard/green-kuard'"
time="2018-06-09T00:12:05Z" level=info msg="Successfully handled: update service 'kuard/green-kuard'"
time="2018-06-09T00:42:05Z" level=info msg="Successfully handled: update endpoints 'kuard/green-kuard'"
time="2018-06-09T00:42:05Z" level=info msg="Successfully handled: update service 'kuard/green-kuard'"
time="2018-06-09T01:12:05Z" level=info msg="Successfully handled: update endpoints 'kuard/green-kuard'"
time="2018-06-09T01:12:05Z" level=info msg="Successfully handled: update service 'kuard/green-kuard'"
time="2018-06-09T01:42:05Z" level=info msg="Successfully handled: update endpoints 'kuard/green-kuard'"
time="2018-06-09T01:42:05Z" level=info msg="Successfully handled: update service 'kuard/green-kuard'"

@rosskukulinski (Contributor)

@stevesloka it's possible my local container build is screwed up. If you push a build from your environment I can test that.

@stevesloka (Member Author)

Try this image: stevesloka/gimbal-discoverer:master

@stevesloka (Member Author) commented Jun 9, 2018

And gimbal_discoverer_*_endpoints_total still has the combined servicename which has the backendname in it vs the original upstream service name

I thought this was only for upstream services, not all of them. I can work on updating it. I thought once you'd synced, you wanted to see the Gimbal service name, not the upstream name.

@stevesloka (Member Author) commented Jun 12, 2018

@alexbrand @rosskukulinski I think this is ready to review again. The only thing that I think could be improved is writing more tests around the Prom metrics.

Additionally, there may be cases where we need to write out "0" for some error conditions. For example, should we always write a zero for gimbal_discoverer_invalid_services_total?

I have a prebuilt image here, if that helps: stevesloka/gimbal-discoverer:master

Some notable changes:

  • The OpenStack discoverer now uses a custom struct (Endpoints), which allows it to store the v1.Endpoints along with the upstream service name.
  • The sync package separates the processing from the metrics piece to make that a bit cleaner.
  • The synced services metric uses a query to the API server to accomplish this. I think this is reasonable, as services shouldn't update as frequently as endpoints. I have some ideas on ways to streamline this, but it changes the structure quite a bit and I think we should discuss it first.

@rosskukulinski (Contributor)

@stevesloka This is now working for me! I don't think we need to write out a zero for metrics that don't have a value.

@@ -46,7 +46,7 @@ var (

func init() {
flag.BoolVar(&printVersion, "version", false, "Show version and quit")
flag.IntVar(&numProcessThreads, "num-threads", 2, "Specify number of threads to use when processing queue items.")
flag.IntVar(&numProcessThreads, "num-threads", 1, "Specify number of threads to use when processing queue items.")

Contributor:

Any reason for changing this?

Member Author:

nah, just local debugging, I'll restore

@@ -50,6 +50,7 @@ var (
prometheusListenPort int
discovererMetrics localmetrics.DiscovererMetrics
log *logrus.Logger
resyncInterval time.Duration

Contributor:

Doesn't look like this is used


gathering, err := gatherers.Gather()
if err != nil {
fmt.Println(err)

Contributor:

t.Fatal here?


gathering, err := gatherers.Gather()
if err != nil {
fmt.Println(err)

Contributor:

t.Fatal here?

// Convert the k8s list to type []Endpoints so make comparison easier
currentEndpoints := []Endpoints{}
for _, v := range currentk8sEndpoints.Items {
currentEndpoints = append(currentEndpoints, Endpoints{endpoints: v, upstreamName: ""})

Contributor:

Do we need to set the upstreamName here?

Member Author:

Those are the current endpoints which come from the k8s query. The upstream name is not used, but having two lists of the same type makes the comparison logic work as-is.


for _, tc := range tests {
t.Run(tc.name, func(t *testing.T) {
nowFunc = func() time.Time {

Contributor:

How/where is this used?

Member Author:

The Timestamp metric used time.Now() to get the current time. In order to unit test that metric, we need a way to set it in the test. If it's not set in the test, then it defaults to now(). Both the service & endpoint in package sync use this.
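
For illustration, the usual shape of this hook, with a time.Now default so it is never nil outside of tests (the names here are assumptions based on the snippets in this thread):

package sync // placement assumed for the sketch

import "time"

// nowFunc supplies "now" for the timestamp metrics; tests can override it to
// get deterministic values, while production code keeps the time.Now default.
var nowFunc = time.Now

// In a test:
//   nowFunc = func() time.Time { return time.Unix(1528848000, 0) }
// and the code under test records nowFunc().Unix() instead of time.Now().Unix().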

expectErr bool
expectedCount float64
expectedTimestamp float64
expectedLabels map[string]string

Contributor:

Doesn't look like there are assertions on the labels

Member Author:

No, I didn't write tests for those; they are simple pass-through values. But initially I had thought to implement them.

Contributor:

Got it! Should we remove them from this struct?

err := action.Sync(sq.KubeClient, sq.Metrics, sq.BackendName)
err := action.Sync(sq.KubeClient, sq.Logger)
if err != nil {
sq.Metrics.ServiceMetricError(action.ObjectMeta().GetNamespace(), action.ObjectMeta().GetName(), action.GetActionType())

Contributor:

What if we get an error while handling an Endpoints resource?

Member Author:

yup, meant to refactor that, good catch.

return nil
}

func (action endpointsAction) String() string {
return fmt.Sprintf(`%s endpoints '%s/%s'`, action.kind, action.endpoints.Namespace, action.endpoints.Name)
}

func addEndpoints(kubeClient kubernetes.Interface, endpoints *v1.Endpoints, lm localmetrics.DiscovererMetrics, backendName string) error {
func (action endpointsAction) LogMetrics(gimbalKubeClient kubernetes.Interface, metrics localmetrics.DiscovererMetrics,

Contributor:

nit: Can we call this SetMetrics or UpdateMetrics? Log makes me think of log messages

Member Author:

How about WriteMetrics? Set & Update sound like they are values that are getting set.

Member Author:

eh we can do SetMetrics I think


var nowFunc nowFuncT

func init() {

Contributor:

Can we add some docs on why we need this and what it is doing?

Member Author:

Just inline comments? We don't have any dev docs at the moment.

Contributor:

Yep, sorry! That is what I meant :)

Member Author:

I cleaned it up, removed unused methods, added the missing header, and added some comments.

// a way to override the default values.
type nowFuncT func() time.Time

var nowFunc nowFuncT

Contributor:

I think this will result in a nil-pointer deref outside of tests

Member Author:

yeah my bad, I got overzealous, I restored the old impl.

@@ -114,6 +114,15 @@ func serviceName(lb loadbalancers.LoadBalancer) string {
return strings.ToLower(lbName)
}

// get the lb Name or ID if name is empty
func serviceNameOriginal(lb loadbalancers.LoadBalancer) string {

Contributor:

Will this be a problem if there are two load balancers with the same name? Seems like it'll affect the metrics, but not 100% sure.

Member Author:

Yes, it will. @rosskukulinski, do you want to allow duplicates here? It should only affect OpenStack at the moment.

Contributor:

The alternative is to use the name + ID, but not sure if that is what we want.

Contributor:

TBH I'm really not sure. I don't know how often folks name their load balancers the same thing in the same cluster. Let's leave it as-is for now (at risk of duplicates).

If the LB doesn't have a name, do we put in the id for the label?

Member Author:

Yes, the name is tried first, then the ID is used.
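
A sketch of the fallback being described, assuming the gophercloud LoadBalancer type's Name and ID fields; this is illustrative, not the exact body in the PR.

// serviceNameOriginal prefers the load balancer's name and falls back to its
// ID when the name is empty.
func serviceNameOriginal(lb loadbalancers.LoadBalancer) string {
	if lb.Name != "" {
		return lb.Name
	}
	return lb.ID
}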

@stevesloka (Member Author)

Testing tonight, I lost the Golang metrics; looking to add those back in.

@stevesloka (Member Author)

OK, the latest commit adds those metrics back in.

…ervices, upstream-endpoints, replicated-endpoints, invalid-endpoints)

Signed-off-by: Steve Sloka <steves@heptio.com>

@alexbrand (Contributor) left a comment

There are a couple of comments that might require some follow-up work, but I still want to merge this to start testing. I will open up issues for us to follow up.

@@ -71,6 +72,7 @@ func init() {
flag.IntVar(&prometheusListenPort, "prometheus-listen-address", 8080, "The address to listen on for Prometheus HTTP requests")
flag.Float64Var(&gimbalKubeClientQPS, "gimbal-client-qps", 5, "The maximum queries per second (QPS) that can be performed on the Gimbal Kubernetes API server")
flag.IntVar(&gimbalKubeClientBurst, "gimbal-client-burst", 10, "The maximum number of queries that can be performed on the Gimbal Kubernetes API server during a burst")
flag.DurationVar(&resyncInterval, "resync-interval", time.Minute*30, "Default resync period for watcher to refresh")

Contributor:

Looks like this isn't used. I want to merge this PR to start testing, so I will open a new issue to discuss this flag. It seems like we can remove it, but wanted to confirm with Steve.

if err != nil {
action.SetMetricError(sq.Metrics)
}
action.SetMetrics(sq.KubeClient, sq.Metrics, sq.Logger)

Contributor:

I am wondering how these metrics will behave if we need to retry. Is it OK to set these metrics multiple times for the same service because of retries? Will open a new issue to discuss/look into this.
