Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cloudwatch/metrics API call errors because of duplicate label #528

Closed
feliksik opened this issue May 15, 2017 · 9 comments
Closed

Cloudwatch/metrics API call errors because of duplicate label #528

feliksik opened this issue May 15, 2017 · 9 comments

Comments

@feliksik
Copy link
Contributor

I'm trying to create a cloudwatch counter using the kit/metrics/cloudwatch package (looks nice). I am using this addsvc line as example, but doing a call on a counter/counter with .With("succes", "true"). However I receive a cloudwatch error: InvalidParameterValue: No metric may specify the same dimension name twice. It seems to be because of this line , which just appends the label, using inernal.lv.LabelValues:With().

This looks like a bug to me, related to the fact that the new label is just appended, without deduplicating.

Is this cloudwatch lib used at all?

@feliksik
Copy link
Contributor Author

@cam-stitt I'd like to hear your take on this; there is some other things I have questions about as well, like the fact that the values are not reset when sending things to Cloudwatch. Other implementations seem to be using a pattern that does .

@cam-stitt
Copy link
Contributor

Yeah, looks like a bug to me. Might have to modify this to replace existing k/v's.

I have not yet used this in production. We aren't ready to update to non-stable go-kit releases.

I based my work off the graphite work, see: https://github.com/go-kit/kit/blob/master/metrics/graphite/graphite.go#L116

@cam-stitt
Copy link
Contributor

Apart from the de-dup, does it seem to be working?

@feliksik
Copy link
Contributor Author

Well it uploads results, but I'm not sure if it's doing what it's supposed to.

The other implementations (e.g. influx) do reset on all metrics, and use a tree walking pattern that seems really powerful but is a bit more complicated. Graphite doesn't use it, and I think it doesn't have to, because With() is a no-op there (i.e. doesn't support labels, I guess).

I guess if we want to support labels properly, it might be necessary to use the lv.Space with the nodes and the Reset(), but those implementations gather all the observed values and only 'flatten' them with the clever Histogram tool in the function passed to Walk(f). So you'd sample all values in-memory. For a 1-minute interval, with 200 rps, this would mean we are sampling 200*60=12000 values, which is about 100KB. I think that's probably fair (if they are reset, of course).

Cloudwatch specific issues (I have only very little experience with Amazon Metrics):

  • It seems like they do not have the types (like counter,derive, gauge) like e.g. collectd does, and that you have to manage this yourself.
  • how can we deal with counters/histograms that are influenced by 3 different replica's? Should we use a collectd-like think in-between, making this code unsuitable for that use case?
  • for a go-kit counter, do we want to send continuously increasing values, or periodic deltas? I think we should use the ValueReset() on the generic.Counter when uploading results (like the graphite implementation does). Or is there an amazon feature to administer an ever-increasing count, but should the derivative in a graph?
  • for a go-kit Histogram, I think stuff should be reset as well, if you want to differentiate the percentiles over time. The Graphite plugin doesn't seem to do this either, which I think is a bug.

@feliksik
Copy link
Contributor Author

feliksik commented May 16, 2017

I'm looking a bit more into this, and have a few observations:

  • The Counter, Gauge and Histogram that use an observeFunc for the lv.Space are duplicated in different places
  • For the Histogram tool, the influx.Histogram uses the Observe() method, which forwards to a vs.Space counter that accumulates values in an array; then, the function passed to Walk() receives that array, and uses generic.Histogram.Observe() to observe them.

My impression is that the lv.Space is generalizing the Counter, Gauge and Histogram tools while providing a flexible means of using Labels; but there is a price paid here in terms of efficiency and code complexity. I think it should be possible to achieve both benefits, but this would definitely change the design -- and I don't yet know how I'd do it, to be honest.

@peterbourgon implemented the lv.Space thing, I believe. What's your take on this?

I'd prefer contributing to a generalized solution, if possible. But I consider hacking up my own non-generic thing here...

@peterbourgon
Copy link
Member

I am definitely convinced that the current implementation is suboptimal. In particular I am convinced of the need for a so-called Provider or Factory interface/type to allow metrics to be created more dynamically (#378, #513). Also while I believe the lv.Space type and metrics/generic package are necessary, they are both flawed: lv.Space for the reasons you identify and others besides; and generic because of the poor With handling (#525) and the underlying VividCortex package needs to be replaced (potentially a float-based reimplementation of CodaHale/metrics). Finally, in the current implementation I've just given up on making With work with systems that don't natively support it, like statsd; that was a mistake, and we should definitely devise some canonical way to shoehorn it in, e.g. sticking the labels in the text string somehow.

If you're serious I'd be happy to chat (Slack?) about what this could look like. Be warned, it is probably more work than it seems :)

@feliksik
Copy link
Contributor Author

I'll contact you on Slack :-) just see what we can do.

@thedevelopnik
Copy link

@peterbourgon @feliksik

I'd be happy to help with this as well since we're entirely in AWS and CloudWatch would be ideal for us for metrics.

@peterbourgon
Copy link
Member

@dsudia There's an active PR at #540, feel free to chime in :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants