This repository was archived by the owner on Aug 23, 2023. It is now read-only.

Add asPercent function #966

Merged
merged 33 commits into from
Aug 14, 2018

Conversation

stivenbb
Contributor

Native implementation of asPercent() Graphite function. (http://graphite.readthedocs.io/en/latest/functions.html#graphite.render.functions.asPercent)

Added a new argument type, ArgIn, that accepts any one of several other argument types. This was necessary for the total argument. Some of the code is borrowed from an abandoned PR: #672

In terms of speed improvement:

---------- Native Implementation ----------
Requests      [total, rate]            900, 5.01
Duration      [total, attack, wait]    3m0.13756s, 2m59.799999s, 337.561ms
Latencies     [mean, 50, 95, 99, max]  72.006704ms, 38.065ms, 342.887ms, 472.467ms, 765.657ms
Bytes In      [total, mean]            130300948, 144778.83
Bytes Out     [total, mean]            0, 0.00
Success       [ratio]                  100.00%
Status Codes  [code:count]             200:900
Error Set:
---------- Graphite (Python) Implementation ----------
Requests      [total, rate]            900, 5.01
Duration      [total, attack, wait]    3m6.337648s, 2m59.799999s, 6.537649s
Latencies     [mean, 50, 95, 99, max]  797.282367ms, 167.489ms, 4.789024s, 6.756318s, 8.006429s
Bytes In      [total, mean]            144407224, 160452.47
Bytes Out     [total, mean]            0, 0.00
Success       [ratio]                  100.00%
Status Codes  [code:count]             200:900
Error Set:

On average, the native implementation was 11x faster, median was 4x faster, p95 was 14x faster, p99 was 14x faster and max was over 10x faster
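
The ratios quoted above can be re-derived from the latency rows of the two runs. A minimal sketch in Go, where the speedup helper is a hypothetical name introduced only for this check and the inputs are the values copied from the output above:

```go
package main

import (
	"fmt"
	"time"
)

// speedup reports how many times smaller the native latency is than the
// Graphite (Python) latency. Both inputs are duration strings as printed
// in the benchmark output.
func speedup(native, graphite string) float64 {
	n, err := time.ParseDuration(native)
	if err != nil {
		panic(err)
	}
	g, err := time.ParseDuration(graphite)
	if err != nil {
		panic(err)
	}
	return float64(g) / float64(n)
}

func main() {
	rows := []struct{ name, native, graphite string }{
		{"mean", "72.006704ms", "797.282367ms"},
		{"p50", "38.065ms", "167.489ms"},
		{"p95", "342.887ms", "4.789024s"},
		{"p99", "472.467ms", "6.756318s"},
		{"max", "765.657ms", "8.006429s"},
	}
	for _, r := range rows {
		fmt.Printf("%-4s %.1fx\n", r.name, speedup(r.native, r.graphite))
	}
}
```

The mean ratio comes out just above 11, which matches the "11x faster" figure.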

@stivenbb stivenbb changed the title Add isPercent function Add asPercent function Jul 26, 2018
@stivenbb stivenbb mentioned this pull request Jul 26, 2018
Collaborator

@shanson7 shanson7 left a comment

Would it be possible to add to the tests that this function doesn't modify the inputs?

if len(totals) == 1 {
totalsSerie = totals[0]
} else if len(totals) == len(series) {
sort.Slice(series, func(i, j int) bool {
Collaborator

Maybe add a comment about why these are sorted.

return nil, err
}

var outSeries []models.Series
Collaborator

Could you do a

defer func() { cache[Req{}] = append(cache[Req{}], outSeries...) }()

here instead of all the individual cache steps?

Contributor Author

very nice, I like it

Contributor

@Dieterbe Dieterbe Aug 7, 2018

defer args are evaluated at call time, so this won't work as expected.

edit: looks like i'm wrong: https://play.golang.org/p/t8RZ1v8fLXT

Contributor Author

You forgot to call foo() in your snippet.
see https://play.golang.org/p/Qh4rp1p2q73
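
The point under discussion can be shown in isolation. This is a sketch, not the playground snippets linked above: arguments passed directly to a deferred call are evaluated when the defer statement runs, while a deferred closure reads the variable only when the enclosing function returns. The names deferredArg, deferredClosure and lenSeen are hypothetical.

```go
package main

import "fmt"

// deferredArg passes out as an argument. The slice header is copied when
// the defer statement executes, so the later append is not observed.
func deferredArg(record func([]int)) {
	var out []int
	defer record(out)
	out = append(out, 1, 2, 3)
}

// deferredClosure wraps the call in a closure. The closure captures the
// variable itself, so it sees the slice as it is at return time.
func deferredClosure(record func([]int)) {
	var out []int
	defer func() { record(out) }()
	out = append(out, 1, 2, 3)
}

// lenSeen runs f and returns the length of the slice the deferred call saw.
func lenSeen(f func(func([]int))) int {
	n := -1
	f(func(s []int) { n = len(s) })
	return n
}

func main() {
	fmt.Println(lenSeen(deferredArg))     // 0
	fmt.Println(lenSeen(deferredClosure)) // 3
}
```

This is why a deferred cache update that should include everything appended to outSeries needs the closure form.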

serie1.Target = fmt.Sprintf("asPercent(%s,%s)", serie1.Target, serie2.Target)
serie1.Tags = map[string]string{"name": serie1.Target}
for i := range serie1.Datapoints {
serie1.Datapoints[i].Val = computeAsPercent(serie1.Datapoints[i].Val, serie2.Datapoints[i].Val)
Collaborator

serie1.Target = fmt.Sprintf("asPercent(%s,%s)", serie1.Target, serie2.Target)
serie1.Tags = map[string]string{"name": serie1.Target}
for i := range serie1.Datapoints {
serie1.Datapoints[i].Val = computeAsPercent(serie1.Datapoints[i].Val, serie2.Datapoints[i].Val)
Collaborator

} else {
totalVal = s.totalFloat
}
serie.Datapoints[i].Val = computeAsPercent(serie.Datapoints[i].Val, totalVal)
Collaborator

Contributor Author

fixed all and added a test

// Test if original series was modified
for i, orig := range originalSeries {
inSerie := in[i]
if orig.Target != inSerie.Target {
Collaborator

Could you use something like

if !reflect.DeepEqual(err, c.expErr) {
here?

Contributor Author

No, because the original series does get modified by metrictank (a "name" tag gets added). Makes sense to compare relevant values only.

Contributor Author

Scratch that. the real problem was that reflect.DeepEqual(math.NaN(), math.NaN()) == false, which is not the case when comparing series
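
A short sketch of the NaN behaviour mentioned here. reflect.DeepEqual compares floats with ==, so two NaN values never compare equal, and neither do two separately allocated slices that contain NaN at the same position. The helpers equalOrBothNaN and sameValues are hypothetical names, not functions from this PR:

```go
package main

import (
	"fmt"
	"math"
	"reflect"
)

// equalOrBothNaN treats two NaN values as equal, unlike ==.
func equalOrBothNaN(a, b float64) bool {
	return a == b || (math.IsNaN(a) && math.IsNaN(b))
}

// sameValues compares two slices element by element, NaN-aware.
func sameValues(a, b []float64) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if !equalOrBothNaN(a[i], b[i]) {
			return false
		}
	}
	return true
}

func main() {
	a := []float64{1, math.NaN()}
	b := []float64{1, math.NaN()}
	fmt.Println(reflect.DeepEqual(math.NaN(), math.NaN())) // false
	fmt.Println(reflect.DeepEqual(a, b))                   // false
	fmt.Println(sameValues(a, b))                          // true
}
```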

originalSeries[i].Interval = serie.Interval
originalSeries[i].QueryPatt = serie.QueryPatt
originalSeries[i].Target = serie.Target
originalSeries[i].Datapoints = getCopy(serie.Datapoints)
Collaborator

maybe just

originalSeries[i] = serie
originalSeries[i].Datapoints = getCopy(serie.Datapoints)

Gets all the things like tags etc.

@@ -130,6 +130,8 @@ func (s *FuncAsPercent) Exec(cache map[Req][]models.Series) ([]models.Series, er
if len(totals) == 1 {
totalsSerie = totals[0]
} else if len(totals) == len(series) {
// Sorted to match the input series with the total series based on Target.
// Mimicks Graphite's implementation
Collaborator

Nit, but 'Mimics' is the common modern spelling.


func copyDatapoints(serie *models.Series) {
out := pointSlicePool.Get().([]schema.Point)
for _, p := range serie.Datapoints {
Collaborator

out = append(out, serie.Datapoints...)

@stivenbb
Contributor Author

Note: this requires Go version 1.10+ because it uses math.Round()

@shanson7 any outstanding comments?

@stivenbb
Contributor Author

stivenbb commented Aug 7, 2018

Any chance this can get merged to master?

@Dieterbe
Contributor

Dieterbe commented Aug 7, 2018

do you plan to make more changes or do you consider this complete?
also, it's kind of hard to see which code came from my old PR. is it possible to clarify the situation by reusing some of those commits? ideally perhaps, this PR could resume where the other branch left off.
but if all of this is too much work, then never mind.

@stivenbb
Contributor Author

stivenbb commented Aug 7, 2018

This is good to go. I just rebased it to make the merge cleaner.

And I didn't use much, just added an additional argument type (that you originally made). It would be a lot of work to reuse the old commits with little to no payoff imo, since I rewrote most of it.

for _, argExp = range argsExp {
if pos >= len(e.args) {
break // no more args specified. we're done.
}
Contributor

the removal of cutoff here: is this safe?

Contributor Author

Yes. Before, it assumed that series arguments could only be non-optional arguments at the beginning, i.e. function(serie, int, string, opt=string).

With my change the following is possible:
function(serie, int, serie, opt=serie)

Like before, this part only consumes series, and just skips over everything else.

Contributor

looks good :) the argument consumption/iterating code is probably my least favorite code of MT. FWIW.
if you can handle this, you can handle anything else 👯‍♂️

totalSeriesLists := groupSeriesByKey(totals, s.nodes, &keys)
totalSeries = getTotalSeries(totalSeriesLists)
} else {
return nil, errors.New("total must be None or a seriesList")
Contributor

seems like it's fairly easy to trigger this by specifying a serieslist pattern that doesn't match any series.
maybe in that case we can provide a better error message? (basically above, at if len(totals) == 0 {, maybe directly return an error?) what does graphite do in this case?

Contributor Author

I'm not sure what you mean here. This error message gets triggered if they pass in a number. In the case where they pass in a nodes argument, only a series or nothing should be passed in for the total argument. The first branch is triggered if neither is passed in. The second branch is triggered if a series is passed in. That leaves the case where a number is passed in (i.e. math.IsNaN(s.totalFloat) == false).

Graphite throws that exact error message in that case.

Contributor

if you specify a serieslist pattern that doesn't match any series, then I believe this will happen:

        if s.totalSeries != nil {
                totals, err = s.totalSeries.Exec(cache)
                if err != nil {
                        return nil, err
                }
                if len(totals) == 0 {
                        totals = nil // <---- this right here
                }
        }

Contributor Author

FWIW, Graphite crashes if you do that.

Let me see if I can handle it in some clean way

Contributor Author

Ok, so this was just unnecessary:

if len(totals) == 0 {
            totals = nil
}

So I removed it. Now if a series returns empty, it does not assume that there were no arguments
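
To illustrate why dropping that reset matters: Go distinguishes a nil slice from an empty, non-nil one, so "no total argument given" and "the total pattern matched nothing" can be told apart as long as the result is not overwritten with nil. A minimal sketch, assuming the total expression returns an empty non-nil slice when nothing matches (that is an assumption, as is the describe helper):

```go
package main

import "fmt"

// describe shows how a caller could tell the two cases apart.
// The string slice stands in for a list of series.
func describe(totals []string) string {
	switch {
	case totals == nil:
		return "no total argument"
	case len(totals) == 0:
		return "total matched no series"
	default:
		return fmt.Sprintf("%d total series", len(totals))
	}
}

func main() {
	var unset []string
	fmt.Println(describe(unset))
	fmt.Println(describe([]string{}))
	fmt.Println(describe([]string{"a"}))
}
```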

return context
}

func (s *FuncAsPercent) Exec(cache map[Req][]models.Series) ([]models.Series, error) {
Contributor

this function is rather complex. can we split it up in smaller functions?

return outSeries, err
}

func (s *FuncAsPercent) execWithNodes(series []models.Series, totals []models.Series) ([]models.Series, error) {
Contributor

series, totals []models.Series. also the other function

}
}

func deepCopySerieElements(serie *models.Series) {
Contributor

this API is a bit strange. would it not make more sense to return a new series that is a deep copy?
even though that is slightly more work, seems worth it.

in fact, maybe add a Copy() method to the serie type in the models package

Contributor

also looking at sumSeries, the copying/newly-allocating seems a bit too eager.
we should only need to allocate point slices:

  1. for the totals slice: when we need to sum up values and we need a place to store the totals (not when totals results in a single series, then we can just read from it)
  2. for each output series (if it is different from the input slice)

Contributor Author

@stivenbb stivenbb Aug 7, 2018

The problem with #1 is that later I modify the total series in some cases, such as this (line 99):

// No series for total series
if _, ok := metaSeries[key]; !ok {
	serie2 := totalSeries[key]
	serie2.QueryPatt = fmt.Sprintf("asPercent(MISSING,%s)", serie2.QueryPatt)
	serie2.Target = fmt.Sprintf("asPercent(MISSING,%s)", serie2.Target)
	serie2.Tags = map[string]string{"name": serie2.Target}
	for i := range serie2.Datapoints {
		serie2.Datapoints[i].Val = math.NaN()
	}
	outSeries = append(outSeries, serie2)
	continue
}

I guess I could make a copy right there, not sure which one would be better.

Contributor

@Dieterbe Dieterbe Aug 7, 2018

I'll be mostly afk until Thursday.
Please read expr/NOTES carefully if you have not already done so. All this complexity is to make sure we never overwrite data in the memory AggMetric, chunk cache etc.
Some of these fields are by value so harmless but the tags map is interesting as well as it might open an avenue to modify the tags in the MemoryIdx. Not sure if we currently take that into account everywhere or maybe we already have a provision for that. On phone so can't check right now

Contributor Author

Ok, well I added a Series.Copy(emptyDatapoints []schema.Point) function (the argument is there so that pointSlicePool can be used if needed). And I only copy the series if I modify values that shouldn't be modified.

Hopefully that's sufficient and what you were looking for 😃
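
A rough sketch of what such a Copy method could look like. Point and Series below are simplified stand-ins with assumed field names, not the real schema.Point and models.Series, and the body is a guess at the approach rather than the code that was merged. It copies both the datapoints and the tags map, since a plain struct assignment would share them with the input:

```go
package main

import "fmt"

// Point and Series are simplified stand-ins; field names are assumptions.
type Point struct {
	Val float64
	Ts  uint32
}

type Series struct {
	Target     string
	Tags       map[string]string
	Datapoints []Point
}

// Copy returns a deep copy of s. emptyDatapoints is reused as backing
// storage for the copied points, so a caller can hand in a pooled slice.
func (s Series) Copy(emptyDatapoints []Point) Series {
	out := s
	out.Datapoints = append(emptyDatapoints[:0], s.Datapoints...)
	out.Tags = make(map[string]string, len(s.Tags))
	for k, v := range s.Tags {
		out.Tags[k] = v
	}
	return out
}

func main() {
	in := Series{
		Target:     "a",
		Tags:       map[string]string{"name": "a"},
		Datapoints: []Point{{Val: 1, Ts: 10}},
	}
	out := in.Copy(nil)
	out.Target = "asPercent(a,b)"
	out.Tags["name"] = out.Target
	out.Datapoints[0].Val = 50

	// The input keeps its original target, tags and values.
	fmt.Println(in.Target, in.Tags["name"], in.Datapoints[0].Val)
}
```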

@Dieterbe
Contributor

Dieterbe commented Aug 7, 2018

some unit tests for ArgIn would be nice.
with some luck you can just cherry pick 47ff80a

{Val: float64(199) * 100, Ts: 30},
{Val: float64(29) / 2 * 100, Ts: 40},
{Val: float64(80) / 3.0 * 100, Ts: 50},
{Val: float64(250) / 4 * 100, Ts: 60},
Contributor

shouldn't out2 be the same as in the previous function? because both cases have asPercent(d,a)

Contributor Author

No, because when totals is a seriesList with len(totals) == len(series), the series first get sorted by tag before getting matched. So, in this test case it would be asPercent(d,c) and asPercent(b,a). This behavior is the same in Graphite.
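
The sort-then-pair behaviour can be sketched with plain strings standing in for series targets. The input order below is hypothetical (the test's actual inputs are not shown in this thread), and pairByTarget is an illustrative helper, not a function from the PR:

```go
package main

import (
	"fmt"
	"sort"
)

// pairByTarget sorts copies of both lists and pairs them by position.
// It assumes len(series) == len(totals).
func pairByTarget(series, totals []string) []string {
	s := append([]string(nil), series...)
	t := append([]string(nil), totals...)
	sort.Strings(s)
	sort.Strings(t)
	out := make([]string, len(s))
	for i := range s {
		out[i] = fmt.Sprintf("asPercent(%s,%s)", s[i], t[i])
	}
	return out
}

func main() {
	// Without sorting, d would be paired with a. After sorting,
	// b is paired with a and d with c.
	fmt.Println(pairByTarget([]string{"d", "b"}, []string{"a", "c"}))
}
```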

totalVal = totalsSerie.Datapoints[i].Val
} else {
totalVal = s.totalFloat
}
Contributor

isn't totalFloat pretty much guaranteed to be NaN here?

Contributor Author

No, because totalFloat is still a valid option here. (the else part of the if statement right before this)

for i := range serie.Datapoints {
var totalVal float64
if len(totalsSerie.Datapoints) > i {
totalVal = totalsSerie.Datapoints[i].Val
Contributor

we may want to always just assume this branch is taken and just write this code line directly without the if/else.
that way if we ever have a bug where len(totalSerie.Datapoints) != len(serie.Datapoints) we can troubleshoot it instead of trying to hide such error case and returning incorrect data.

Contributor Author

this is more of a check of whether there is a totalSeries at all. I'll change it to > 0, that way we get an index out of range exception if there's a bug.

sort.Slice(totals, func(i, j int) bool {
return totals[i].Target < totals[j].Target
})
for i, serie1 := range series {
Contributor

lots of duplication here wrt the similar code block further down.
can't we do all the if/else stuff here to just set up the right totals and series variables,
and then do the same processing at the end irrespective of which scenario it was?

} else if totals != nil {
totalSeriesLists := groupSeriesByKey(totals, s.nodes, &keys)
totalSeries = getTotalSeries(totalSeriesLists)
}
Contributor

  • we compute totals even for keys we won't need (e.g. keys not in the input series)
  • for each missing case we repeatedly get a slice and fill it with NaN's. we could reuse the same slice in this case. my suggestion would be declare a var nones []schema.Point. then whenever you need it, if it's nil, instantiate it. if it's not, just reuse it.

Collaborator

@shanson7 shanson7 Aug 10, 2018

I don't think reusing a nones slice would play nicely with the pointSlicePool, would it?

Contributor

oh right, because at the end the same slice would get added to the pool multiple times, which is bad. so nevermind that then.

Contributor

@Dieterbe Dieterbe Aug 10, 2018

On phone now
best would be to just add the slice to the cache thing once I think (that way we can reuse them)

@Dieterbe
Contributor

hi @stivenbb my first pass of review comments is now done :p let me know when you've addressed everything or if you have any questions.

@stivenbb
Contributor Author

@Dieterbe ok, I think I've addressed all the requested changes. Let me know if I missed something

@Dieterbe
Contributor

what is it that requires the new go version?

@shanson7
Collaborator

#966 (comment)

math.Round

@Dieterbe
Contributor

I think the last things to do here are :

* easier to follow code
* consistency with execWithNodes
* bugfix: pool-obtained sumseries should be recycled later
@stivenbb
Contributor Author

Ok, just cherry picked. Will work on not computing totals when unnecessary.

@Dieterbe
Contributor

oh i realized my code nones := pointSlicePool.Get().([]schema.Point) is wrong, should be = to prevent scoping issues I think.
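
The scoping issue referred to here is ordinary Go variable shadowing: inside a nested block, := declares a new variable instead of assigning to the outer one. A minimal sketch, where getSlice is a hypothetical stand-in for pointSlicePool.Get().([]schema.Point):

```go
package main

import "fmt"

// getSlice stands in for fetching a slice from a pool.
func getSlice() []float64 { return make([]float64, 0, 4) }

// shadowed uses := inside the if block, which declares a new inner
// variable. The outer nones stays nil.
func shadowed() bool {
	var nones []float64
	if nones == nil {
		nones := getSlice() // new variable, scoped to this block only
		_ = nones
	}
	return nones != nil
}

// assigned uses =, which stores the slice in the outer variable.
func assigned() bool {
	var nones []float64
	if nones == nil {
		nones = getSlice()
	}
	return nones != nil
}

func main() {
	fmt.Println(shadowed()) // false
	fmt.Println(assigned()) // true
}
```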

@stivenbb
Contributor Author

cassandra test seems to be failing after I cherry-picked... Do you know what that could be about?

@Dieterbe
Contributor

looks like a flakey test. rerunning tests should fix it. but circleCI is not letting me. can you retrigger on https://circleci.com/workflow-run/ba5e3453-48a0-4731-9c34-22381695eb63 ? if not, your next push will.

@stivenbb
Contributor Author

Ok, should be good to go now. LMK if my change looks good.

@Dieterbe
Contributor

great work @stivenbb, thank you very much for your work on this.

@Dieterbe Dieterbe merged commit 6933255 into grafana:master Aug 14, 2018
@shanson7 shanson7 deleted the asPercent branch August 21, 2018 20:33
3 participants