Extract metrics to own package and refactor implementations #1968

Merged 9 commits on Aug 23, 2017

@m3co-code (Contributor) commented Aug 17, 2017

This PR is intended as the first step towards extending the metrics implementations. In the Traefik fork that we run in my project, we have extended the Prometheus implementation considerably, but some refactoring and alignment has to be done first to get there. I already opened a PR for the extension of the Prometheus implementation once, but it diverged after the Datadog and StatsD implementations were added to Traefik at the same time.

Therefore this PR is only about refactoring the structure and extracting the metrics implementations into their own package. Once the structure is aligned, I will create follow-up PRs extending the functionality. I think this approach is better, as it should make reviewing considerably easier.

Motivation

This PR changes the design, and I want to elaborate a bit on what it changes and why:

  • extraction into a new metrics package: This is important in order to be able to track metrics outside of the middleware package. Configuration reloads, for example, could be a meaningful metric. They happen in the server package, so an implementation inside of middleware cannot be used (cyclic dependency). The extraction also gives a better separation of concerns and a clearer project structure.
  • initialisation of the metrics implementations: They are now all initialised exactly once, in NewServer when Traefik boots up, which is the desired behaviour. With the initialisation aligned, the logic is more straightforward and easier to understand.
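
The "initialise once, reuse on reload" idea from the list above could be sketched roughly as follows. This is only an illustration, not the PR's code: the Counter interface, memCounter, memRegistry, and the Reload method are hypothetical names, and the real Registry has more methods.

```go
package main

import "fmt"

// Counter is a hypothetical minimal counter abstraction.
type Counter interface {
	Add(delta float64)
}

// Registry is a reduced, hypothetical stand-in for the extracted metrics.Registry.
type Registry interface {
	ReqsCounter() Counter
}

// memCounter keeps its value in memory so the example can inspect it.
type memCounter struct{ value float64 }

func (c *memCounter) Add(delta float64) { c.value += delta }

// memRegistry is a hypothetical in-memory registry.
type memRegistry struct{ reqs *memCounter }

func (r *memRegistry) ReqsCounter() Counter { return r.reqs }

// Server holds the registry that was created at boot.
type Server struct {
	metricsRegistry Registry
	reloads         int
}

// NewServer initialises the metrics registry exactly once.
func NewServer() *Server {
	return &Server{metricsRegistry: &memRegistry{reqs: &memCounter{}}}
}

// Reload simulates a configuration reload. It deliberately leaves the
// registry untouched, so counters keep accumulating across reloads.
func (s *Server) Reload() {
	s.reloads++
}

func main() {
	s := NewServer()
	before := s.metricsRegistry

	s.metricsRegistry.ReqsCounter().Add(1)
	s.Reload()
	s.metricsRegistry.ReqsCounter().Add(1)

	fmt.Println(before == s.metricsRegistry)                 // same registry instance
	fmt.Println(s.metricsRegistry.(*memRegistry).reqs.value) // accumulated value
}
```

Because the registry lives in its own package and is passed around as an interface, code in a server-like package can record metrics without importing the middleware package.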

@m3co-code (Contributor Author)

@aantono I think this is also interesting for you, I would be glad if you can have a look too :)

@ldez (Contributor) left a comment

could you move the code to middleware/metrics/?

package metrics

import (
"net/http"
Contributor

could you reorganize the imports?

Contributor Author

Thanks for the notice, done.

@ldez added this to the 1.5 milestone on Aug 17, 2017
@ldez added the size/L label on Aug 17, 2017

@aantono (Contributor) commented Aug 17, 2017

@marco-jantke I think the name change for the label from serviceName to service should not be a big deal... but out of curiosity, why the change?

P.S. Just looked at the code and didn't actually find the place where the change took place. Inside the original datadog.go the NewDataDog(name string) method uses service as the label name, not the serviceName. Am I missing something?

@m3co-code (Contributor Author)

@aantono my fault, I mixed it up. It was service all along. Sorry for the confusion :/

@ldez modified the milestones: 1.4, 1.5 on Aug 19, 2017
// NewVoidRegistry is a noop implementation of metrics.Registry.
// It is used to avoid nil checking in components that do metric collections.
func NewVoidRegistry() Registry {
return &voidRegistry{
Contributor

You can remove the voidRegistry struct and use the multiRegistry struct here instead:

return &multiRegistry{
	reqDurationHistogram: &voidHistogram{},
	reqsCounter:          &voidCounter{},
	retriesCounter:       &voidCounter{},
}

Contributor Author

Thanks, I am using the standardRegistry now.
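
A rough sketch of what "using the standardRegistry" for the void case might look like is shown below. The field names follow the reviewer's suggestion above; the Counter and Histogram interfaces and their method signatures are assumptions made for this example, not the PR's actual types.

```go
package main

import "fmt"

// Counter and Histogram are hypothetical minimal metric abstractions.
type Counter interface{ Add(delta float64) }
type Histogram interface{ Observe(value float64) }

// Registry is a reduced stand-in for metrics.Registry.
type Registry interface {
	ReqsCounter() Counter
	ReqDurationHistogram() Histogram
	RetriesCounter() Counter
}

// voidCounter and voidHistogram silently discard everything.
type voidCounter struct{}

func (*voidCounter) Add(float64) {}

type voidHistogram struct{}

func (*voidHistogram) Observe(float64) {}

// standardRegistry is the one registry struct; the void registry is just
// a standardRegistry filled with no-op metrics.
type standardRegistry struct {
	reqsCounter          Counter
	reqDurationHistogram Histogram
	retriesCounter       Counter
}

func (r *standardRegistry) ReqsCounter() Counter            { return r.reqsCounter }
func (r *standardRegistry) ReqDurationHistogram() Histogram { return r.reqDurationHistogram }
func (r *standardRegistry) RetriesCounter() Counter         { return r.retriesCounter }

// NewVoidRegistry returns a no-op Registry so callers never need nil checks.
func NewVoidRegistry() Registry {
	return &standardRegistry{
		reqsCounter:          &voidCounter{},
		reqDurationHistogram: &voidHistogram{},
		retriesCounter:       &voidCounter{},
	}
}

func main() {
	r := NewVoidRegistry()
	// All calls are safe even though no metrics backend is configured.
	r.ReqsCounter().Add(1)
	r.ReqDurationHistogram().Observe(0.25)
	r.RetriesCounter().Add(1)
	fmt.Println("recorded against a void registry without nil checks")
}
```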

}
}

type voidRegistry struct {
Contributor

You can remove the voidRegistry struct because it is the same struct (with the same methods) as the multiRegistry struct.

Contributor Author

Thanks, I am using the standardRegistry now.

cRetriesCounter := collectingRegistry.RetriesCounter().(*collectingCounter)

if cReqsCounter.counterValue != 1 {
t.Errorf("Got value %v for ReqsCounter, want %v", cReqsCounter.counterValue, 1)
Contributor

could you replace %v with %f?

Contributor Author

Good idea. I also extracted the 1 into a variable with a proper float64 type, so that I can use %f for both values.
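
The reason the literal needs a float64 type can be shown with a small sketch. The describe helper is hypothetical; it takes an empty interface, which is how fmt receives its variadic arguments.

```go
package main

import "fmt"

// describe formats a value with %f. The parameter is an empty interface,
// mirroring how fmt's variadic arguments are passed.
func describe(v interface{}) string {
	return fmt.Sprintf("%f", v)
}

func main() {
	// The untyped constant 1 defaults to int, which %f cannot format.
	fmt.Println(describe(1)) // %!f(int=1)

	// Giving the expected value an explicit float64 type fixes that.
	expected := float64(1)
	fmt.Println(describe(expected)) // 1.000000
}
```

With %v the int would have printed fine, which is why the original message did not need the typed value.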

t.Errorf("Got value %v for ReqsCounter, want %v", cReqsCounter.counterValue, 1)
}
if cReqDurationHistogram.lastHistogramValue != 2 {
t.Errorf("Got last observation %v for ReqDurationHistogram, want %v", cReqDurationHistogram.lastHistogramValue, 2)
Contributor

could you replace %v with %f?

Contributor Author

Thanks, done.

t.Errorf("Got last observation %v for ReqDurationHistogram, want %v", cReqDurationHistogram.lastHistogramValue, 2)
}
if cRetriesCounter.counterValue != 3 {
t.Errorf("Got value %v for RetriesCounter, want %v", cRetriesCounter.counterValue, 3)
Contributor

could you replace %v with %f?

Contributor Author

Thanks, done.

return r.reqsCounter
}

// ReqsCounter
Contributor

Remove this comment or replace it with a proper one.

Contributor Author

Added it by accident. Removed the comment. I think such comments are not necessary on "getters", and the method's struct is not exported anyway.

for _, label := range family.Metric[0].Label {
val, ok := test.labels[*label.Name]
if !ok {
t.Errorf("'%s' metric contains unexpected label '%s'", test.name, label)
Contributor

could you replace '%s' with %q?

Contributor Author

Good point. Adapted the error message.
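
For string arguments, %q adds the quotes itself and escapes special characters, so the hand-written '%s' becomes unnecessary. A minimal sketch, with a hypothetical helper and made-up metric and label names:

```go
package main

import "fmt"

// unexpectedLabel builds an error message using %q for both names.
// The function and the example names below are illustrative only.
func unexpectedLabel(metric, label string) string {
	return fmt.Sprintf("%q metric contains unexpected label %q", metric, label)
}

func main() {
	fmt.Println(unexpectedLabel("requests_total", "code"))
	// "requests_total" metric contains unexpected label "code"

	// An empty name stays visible as "", and a tab is printed as \t.
	fmt.Println(unexpectedLabel("", "bad\tlabel"))
	// "" metric contains unexpected label "bad\tlabel"
}
```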

if !ok {
t.Errorf("'%s' metric contains unexpected label '%s'", test.name, label)
} else if val != *label.Value {
t.Errorf("label '%s' in metric '%s' has wrong value '%s'", label, test.name, *label.Value)
Contributor

could you replace '%s' with %q?

Contributor Author

Good point. Adapted and improved the error message.

for _, test := range tests {
family := findMetricFamily(test.name, metricsFamilies)
if family == nil {
t.Errorf("gathered metrics do not contain '%s'", test.name)
Contributor

could you replace '%s' with %q?

Contributor Author

Yes, done.

cv := uint(family.Metric[0].Counter.GetValue())
expectedCv := uint(1)
if cv != expectedCv {
t.Errorf("gathered metrics do not contain correct value for total retries, got '%d' expected '%d'", cv, expectedCv)
Contributor

could you replace '%d' with %q?

Contributor Author

Doesn't sound right to me. When I change this, the message goes from:
gathered metrics do not contain correct value for total requests, got 2 expected 23
to
gathered metrics do not contain correct value for total requests, got '\x02' expected '\x17'.

Maybe I should rather use floats for the comparison and print the messages with %f, though. WDYT?
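
The behaviour described above follows from how fmt treats %q on integers: it prints the value as a quoted character literal. A small sketch comparing the three verbs (the helper names are hypothetical):

```go
package main

import "fmt"

// withQ formats unsigned integers with %q, which treats them as runes.
func withQ(got, want uint) string {
	return fmt.Sprintf("got %q expected %q", got, want)
}

// withD formats the same values as plain decimal numbers.
func withD(got, want uint) string {
	return fmt.Sprintf("got %d expected %d", got, want)
}

// withF is the float-based alternative proposed in the comment.
func withF(got, want float64) string {
	return fmt.Sprintf("got %f expected %f", got, want)
}

func main() {
	fmt.Println(withQ(2, 23)) // got '\x02' expected '\x17'
	fmt.Println(withD(2, 23)) // got 2 expected 23
	fmt.Println(withF(2, 23)) // got 2.000000 expected 23.000000
}
```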

Contributor

👍

Contributor Author

Ok, done.

@m3co-code (Contributor Author)

Thanks for the review @ldez. I rebased the branch onto the latest master and addressed all your comments.

@ldez (Contributor) commented Aug 21, 2017

@marco-jantke

These files are not properly gofmt'd:

  • server/server.go

@ldez (Contributor) left a comment

LGTM

@nmengin (Contributor) left a comment

LGTM 👏

@emilevauge (Member) left a comment

@marco-jantke Great job, it really simplifies the metrics :)
I have one comment though.

server/server.go Outdated
@@ -53,6 +54,8 @@ type Server struct {
routinesPool *safe.Pool
leadership *cluster.Leadership
defaultForwardingRoundTripper http.RoundTripper
metricsRegistry metrics.Registry
metricsEnabled bool
Member

I'm not a fan of adding this flag. Can't we remove it and use metricsRegistry instead?

Contributor Author

metricsRegistry now always contains an implementation. When nothing is configured, it holds a VoidRegistry. This means we could write a util function like metricsEnabled() that returns the ok of a type assertion on the metrics implementation. Would you prefer it this way?

Personally I am not sure which option is more explicit, but neither is hard to implement, so you are free to choose :D

Contributor Author

I double-checked the implementation, and the approach I described above won't work. All the registries that we create, e.g. through NewVoidRegistry, RegisterStatsd, RegisterPrometheus, have the internal type standardRegistry, and therefore I can't use a type assertion.

This means a metricsEnabled() utility would have to check the global configuration once more, e.g. if globalConfiguration.Web.Metrics.Prometheus != nil || globalConfiguration.Web.Metrics.Statsd != nil .... Personally I think that having these conditions only in registerMetricClients is the cleaner approach, and I tend to leave it as it is. WDYT?
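
Why the type assertion cannot tell the registries apart can be sketched as follows. The reduced Registry interface, the backend field, and newPrometheusRegistry are hypothetical and exist only for this illustration.

```go
package main

import "fmt"

// Registry is a hypothetical, reduced stand-in for metrics.Registry.
type Registry interface {
	Backend() string
}

// standardRegistry is the single concrete type behind every constructor.
type standardRegistry struct {
	backend string
}

func (r *standardRegistry) Backend() string { return r.backend }

// NewVoidRegistry returns the no-op variant.
func NewVoidRegistry() Registry { return &standardRegistry{backend: "void"} }

// newPrometheusRegistry is a hypothetical constructor for a real backend.
func newPrometheusRegistry() Registry { return &standardRegistry{backend: "prometheus"} }

// looksEnabled tries to detect a real backend with a type assertion.
// It cannot work: both constructors return the same concrete type.
func looksEnabled(r Registry) bool {
	_, ok := r.(*standardRegistry)
	return ok
}

func main() {
	fmt.Println(looksEnabled(NewVoidRegistry()))       // true
	fmt.Println(looksEnabled(newPrometheusRegistry())) // true
}
```

Since the assertion succeeds in both cases, the enabled state has to be carried some other way, for example as data on the registry itself.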

Contributor Author

As discussed on Slack, I have now refactored this towards an IsEnabled() method on the Registry.

}

type standardRegistry struct {
isEnabled bool
@ldez (Contributor) commented Aug 23, 2017

could you rename isEnabled to enabled?
But keep the method name IsEnabled()

Contributor

sorry enabled not enable

Contributor Author

Sure thing, done.
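
The outcome of this thread, an unexported enabled field behind an exported IsEnabled() method, might look roughly like this. The reduced Registry interface, newEnabledRegistry, and middlewareDecision are assumptions for the sketch, as is the void registry reporting false.

```go
package main

import "fmt"

// Registry is a reduced stand-in for metrics.Registry with only IsEnabled.
type Registry interface {
	IsEnabled() bool
}

// standardRegistry keeps the state in an unexported field named enabled.
type standardRegistry struct {
	enabled bool
}

// IsEnabled reports whether a real metrics backend is configured.
func (r *standardRegistry) IsEnabled() bool { return r.enabled }

// NewVoidRegistry is assumed here to report itself as disabled.
func NewVoidRegistry() Registry { return &standardRegistry{enabled: false} }

// newEnabledRegistry is a hypothetical constructor for a configured backend.
func newEnabledRegistry() Registry { return &standardRegistry{enabled: true} }

// middlewareDecision shows how a caller could replace a separate
// metricsEnabled flag by asking the registry directly.
func middlewareDecision(r Registry) string {
	if r.IsEnabled() {
		return "add metrics middleware"
	}
	return "skip metrics middleware"
}

func main() {
	fmt.Println(middlewareDecision(NewVoidRegistry()))    // skip metrics middleware
	fmt.Println(middlewareDecision(newEnabledRegistry())) // add metrics middleware
}
```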

@ldez (Contributor) left a comment

LGTM

@emilevauge (Member) left a comment

Awesome @marco-jantke !
LGTM

in order to avoid a name clash for the soon to come metrics package.
This commit is just refactoring and should provide the same
functionality as before. It extracts the metrics implementation into its
own package and makes metrics therefore extendable to be used outside of
the middleware package in the future.

Another design goal is to simplify and align the initialisation of the Metrics
implementations. Each Metric provider should only be initialised once
and reused when configuration reloads happen. This commit accomplishes
this.
in order to have less state on the server struct.
6 participants