Skip to content
This repository was archived by the owner on Aug 23, 2023. It is now read-only.

Add support for summarize #837

Merged
merged 8 commits into from
Mar 8, 2018
Merged

Conversation

Aergonus
Copy link
Contributor

@Aergonus Aergonus commented Jan 29, 2018

There's a question to be answered before this is ready to merge.
Should the input series have a matching QueryPatt and Target?
Is QueryFrom and QueryTo equivalent to series.start and series.end in the python code?

Also if #833 is merged, there'll be a conflict with the docs/graphite and I'll rebase to squash commits


output := models.Series{
Target: newName(serie.Target),
QueryPatt: newName(serie.QueryPatt), // Should this match target?
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if this is right. Asking for grafana people expertise

newEnd = alignedEnd
}

output := models.Series{
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The QA tests fail because I don't use newEnd. The python code sets series.start and series.end to the newStart and newEnd, I'm not sure if it's equivalent to QueryFrom and QueryTo and if we're supposed to modify them.

@Dieterbe
Copy link
Contributor

Dieterbe commented Jan 30, 2018

Should the input series have a matching QueryPatt and Target?

while they represent different things, they may often end up being the same (but not always)
see https://godoc.org/github.com/grafana/metrictank/api/models#Series
maybe @DanCech and @shanson7 also have an opinion on this. this stuff is formed by early graphite-reverse-engineering with some of my sauce sprinkled on. possibly those guys have better / more recent ideas on this.

Is QueryFrom and QueryTo equivalent to series.start and series.end in the python code?

query from are literally the from and to derived from the query. (or they're defaults of now and now-24h if not specified) I'm not sure what the semantics are of those python properties.

Also if #833 is merged, there'll be a conflict with the docs/graphite and I'll rebase to squash commits

merged ! :)

@Aergonus Aergonus force-pushed the feature/summarize branch 5 times, most recently from 833baf1 to 3b00804 Compare January 30, 2018 22:59

numPoints := int(util.Min(uint32(len(serie.Datapoints)), (start-end)/interval))

for ts, i := start, 0; i < numPoints && ts < end; ts += interval {
Copy link
Contributor Author

@Aergonus Aergonus Jan 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to be thorough, this is different from the python code. To mirror the python code you would remove i < numPoints from this line.
The only change is that graphite-mt would add an extra NaN value at the end of the results. Adding this would not remove an extra datapoint (only the extra NaN).

@Aergonus
Copy link
Contributor Author

From what I've seen Datapoints.[0].Ts is equivalent to series.start so we wouldn't need it.

Also since fellow special function perSecond modifies QueryPatt, I'm inclined to think we should be modifying it here.

@Dieterbe
Copy link
Contributor

yes, processing functions should update QueryPatt similar to PerSecond or others. see also
https://github.com/grafana/metrictank/blob/master/expr/NOTES#L94

@shanson7
Copy link
Collaborator

shanson7 commented Jan 31, 2018

No doubt it should be updated, but should they match? Should Target be the resolved series name and QueryPatt be the expression?

@Dieterbe
Copy link
Contributor

Dieterbe commented Jan 31, 2018

TBH i'm hazy on the details / history but AFAIK they should not necessarily match (though sometimes they might) as explained in the struct documentation I linked to (if you want more info you'll have to dive into the code and see how the attributes are used).

can't we just follow the persecond example, where we just wrap around the pre-existing attributes
eg something like:

Target:     fmt.Sprintf("summarize(%s, extra params)", serie.Target),
QueryPatt:  fmt.Sprintf("summarize(%s, extra params)", serie.QueryPatt),

@Aergonus
Copy link
Contributor Author

It is following the persecond example, just that the rename is dynamic based on a parameter. No changes needed so far :)

@Dieterbe
Copy link
Contributor

ok your approach in setting QueryPatt and Target looks good to me. will get back to you later with a more complete review

func (s *FuncSummarize) Signature() ([]Arg, []Arg) {
return []Arg{
ArgSeriesList{val: &s.in},
ArgString{key: "interval", val: &s.intervalString, validator: []Validator{IsIntervalString}},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC the key attribute is only for arguments that should be made available as a keyword argument.
per http://graphite.readthedocs.io/en/latest/functions.html#graphite.render.functions.summarize the interval parameter has no key

@shanson7
Copy link
Collaborator

shanson7 commented Feb 2, 2018

It doesn't look like this will support all of the documented functions that the graphite version does (e.g. median)

@Dieterbe
Copy link
Contributor

Dieterbe commented Feb 2, 2018

yep. either we implement those, or we remove the stable flag.
stable flag is only for functions that we know to be 100% compatible with graphite

output := models.Series{
Target: newName(serie.Target),
QueryPatt: newName(serie.QueryPatt),
Tags: serie.Tags,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the python code also does:

    series.tags['summarize'] = intervalString
    series.tags['summarizeFunction'] = func

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does...which is odd IMO. Tags are something set at ingest and it seems odd to add more tags at query time (non-optionally). I get that this isn't the place to make these sorts of arguments, but it doesn't sit right with me to mess with the tags (especially in the case of name collision).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea was to make the info about how the series was processed available to functions further down the chain, and the rationale for modifying the tags is that when you run series x through a function f you are creating a new series f(x), and its tags should identify it and describe how it's different from x.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get that, but it means a lot of "reserved" tag names that need to be kept track of and new ones for each function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^ Added in

func summarizeValues(serie models.Series, aggFunc batch.AggFunc, interval, start, end uint32) []schema.Point {
out := pointSlicePool.Get().([]schema.Point)

numPoints := int(util.Min(uint32(len(serie.Datapoints)), (start-end)/interval))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm confused here. the only reason to get the util.Min of these two, is just in case the input series had a low number of points wrt to the requested interval right?
e.g.
1 day worth of 2-hourly data
requesting summarize of 1h.
so this becomes numPoints = min(12,24) = 12
then why in the loop below do we increment by interval (1h), but only 12 times? that would cover only a 12h timerange?

Copy link
Contributor Author

@Aergonus Aergonus Feb 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If (serie.Datapoints[i].Ts < ts+interval) is violated, it doesn't increment i. So it would append NaN's for the inbetween. I'll push a test case that shows this.

Copy link
Contributor

@Dieterbe Dieterbe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see comments

Simple check so we don't consolidate (batch/aggFunc) no points
@Aergonus
Copy link
Contributor Author

Aergonus commented Feb 8, 2018

Tags added as requested, other comment explained.
Agg funcs are in PR #847 which would allow summarize to be merged as stable

@shanson7
Copy link
Collaborator

Not sure how important this is, but the python implementation allows 'true'and true for the bool argument. Should the ArgBool type support this?

@shanson7
Copy link
Collaborator

Not sure how important this is, but the python implementation allows 'true'and true for the bool argument. Should the ArgBool type support this?

Turns out this is very important since grafana sometimes uses strings for bools.

@Dieterbe
Copy link
Contributor

Dieterbe commented Feb 19, 2018

Turns out this is very important since grafana sometimes uses strings for bools.

any more details on "sometimes" ? i recently ran into the same problem with sortByName:
bec3e7c
there i observed grafana adds it without quotes

the graphite docs don't seem to mention quotes for the bool
http://graphite.readthedocs.io/en/latest/functions.html#graphite.render.functions.summarize
or in http://graphite.readthedocs.io/en/latest/functions.html#usage

that said, i'd be happy to hear from @DanCech whether we should follow graphite's documentation/spec or what it actually allows...

@shanson7
Copy link
Collaborator

any more details on "sometimes" ?

Yeah, in grafana if I build up the expression it doesn't use quotes. If I go and change one of the tag values afterwards, it adds the quotes in (not visibly to the user though).

@DanCech
Copy link
Contributor

DanCech commented Feb 26, 2018

At least for the moment Graphite doesn't actually enforce any type-checking on parameters, though with the addition of the inline documentation that would now be possible. Because of that what ends up happening is that 'true' is parsed as a string and passed to the function, and when evaluated in the boolean context it is indeed true. 'false' wouldn't work since it would also evaluate to true. The problem with changing it in Graphite is the potential for breaking existing dashboards etc, especially if there are scenarios where Grafana will pass 'true' rather than true. I'd say that it's something we might look at fixing (by checking parameters passed in again the function's definition) but we would likely retain support for 'true' and "true" in any case to avoid breaking existing dashboards.

@Dieterbe
Copy link
Contributor

so @DanCech conclusion: MT should support true, 'true', false and 'false' ( and capitalized versions)

@shanson7
Copy link
Collaborator

Sounds more like non-empty strings are true, empty is false. Probably similar with 0/non-zero?

@DanCech
Copy link
Contributor

DanCech commented Feb 26, 2018

@shanson7 yeah, it appears that's how it would work today, following the Python rules for truthiness https://docs.python.org/2/library/stdtypes.html#truth-value-testing

@Dieterbe Dieterbe merged commit 5240c14 into grafana:master Mar 8, 2018
@shanson7
Copy link
Collaborator

shanson7 commented Mar 8, 2018

We didn't hash out the ArgBool truthiness. I think this change might cause some issues without it.

@Dieterbe
Copy link
Contributor

Dieterbe commented Mar 8, 2018

let's continue the convo in #867

@Aergonus Aergonus deleted the feature/summarize branch March 8, 2018 17:54
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants