implement median aggregate #2411

neonstalwart · 2015-04-23T22:19:09Z

this isn't quite working yet but if anyone wants to provide early feedback about the approach and/or help bring it over the line, that would be great!

i've included getSortedRange as a building block for some other aggregations. it partitions the input until we have a range of values that we are interested in and then sorts just that sub-range. the partitioning is an attempt to avoid sorting the whole series. the partitioning to find the range should be O(n) in the average case.
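A minimal sketch of the partition-then-sort idea described above, not the code from this PR. The names `partition`, `selectK`, and `sortedRange` are hypothetical; only the `(data, start, count)` shape is borrowed from the `getSortedRange` signature quoted in the review below. It assumes the input holds no NaN values, and it copies the input so the caller's slice is left untouched.

```go
package main

import "sort"

// partition (hypothetical helper) rearranges s around its middle element
// using the Lomuto scheme and returns the pivot's final index p, so that
// s[:p] < s[p] <= s[p+1:]. It requires len(s) >= 1.
func partition(s []float64) int {
	last := len(s) - 1
	mid := len(s) / 2
	s[mid], s[last] = s[last], s[mid]
	pivot := s[last]
	store := 0
	for i := 0; i < last; i++ {
		if s[i] < pivot {
			s[i], s[store] = s[store], s[i]
			store++
		}
	}
	s[store], s[last] = s[last], s[store]
	return store
}

// selectK (hypothetical helper) rearranges s so that s[k] holds the value it
// would have in sorted order, with no larger value before it and no smaller
// value after it. Average cost is O(n); it discards one side of the
// partition on every pass instead of recursing into both.
func selectK(s []float64, k int) {
	lo, hi := 0, len(s)
	for hi-lo > 1 {
		p := lo + partition(s[lo:hi])
		switch {
		case k < p:
			hi = p
		case k > p:
			lo = p + 1
		default:
			return
		}
	}
}

// sortedRange (hypothetical, modeled on the getSortedRange signature) returns
// the values that would occupy indexes [start, start+count) if data were
// fully sorted, sorting only that sub-range.
func sortedRange(data []float64, start int, count int) []float64 {
	if start < 0 || count <= 0 || start >= len(data) {
		return nil
	}
	end := start + count
	if end > len(data) {
		end = len(data)
	}
	work := make([]float64, len(data))
	copy(work, data)
	selectK(work, start)               // fix the lower boundary of the range
	selectK(work[start:], end-1-start) // fix the upper boundary within the tail
	sort.Float64s(work[start:end])     // sort only the values of interest
	return work[start:end]
}
```

With this shape, the median of an odd-length series would be `sortedRange(data, len(data)/2, 1)[0]`, and the smallest N values would be `sortedRange(data, 0, N)`.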

neonstalwart · 2015-04-24T02:58:17Z

this is ready for a look now.

toddboom · 2015-04-30T23:47:05Z

@neonstalwart would you mind rebasing this?

neonstalwart · 2015-04-30T23:51:26Z

no problem. first thing tomorrow

neonstalwart · 2015-05-01T13:56:51Z

@toddboom rebased. one thing i wondered was if MapStddev should get a more generic name but i couldn't think of something good off the top of my head - MapFloat64 just doesn't seem right.

otoolep · 2015-05-01T21:18:18Z

influxql/functions.go

+	}
+}
+
+func getSortedRange(data []float64, start int, count int) []float64 {

otoolep: While these functions are unexported, they are not trivial. Would doc strings help?

neonstalwart: what's your convention/requirements? i'll try to add something if you like.

i just tried to make the names descriptive since i think that things like

// MapMin collects the values to pass to the reducer

are kind of redundant.

otoolep: A doc string introducing getSortedRange would be useful -- why it exists, and its advantages over the standard library. Basically answering the question I raised.

otoolep · 2015-05-01T21:33:31Z

Thanks @neonstalwart -- tests look good, results make sense.

I would like to know why you didn't just use the standard library to sort the data before selecting the median. I might be missing something about median, so please let me know.
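For reference, the standard-library approach being asked about here would look roughly like the sketch below. `medianBySort` is a hypothetical name, and averaging the two middle values for an even count is an assumption about the desired semantics, not something stated in this thread.

```go
package main

import "sort"

// medianBySort (hypothetical) sorts a copy of the whole series with the
// standard library and then reads the middle. The full sort costs
// O(n log n) even though only one or two values are needed.
// The second result is false when data is empty.
func medianBySort(data []float64) (float64, bool) {
	if len(data) == 0 {
		return 0, false
	}
	work := make([]float64, len(data))
	copy(work, data) // leave the caller's slice in its original order
	sort.Float64s(work)
	mid := len(work) / 2
	if len(work)%2 == 1 {
		return work[mid], true
	}
	// Assumption: an even count averages the two middle values.
	return (work[mid-1] + work[mid]) / 2, true
}
```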

neonstalwart · 2015-05-01T21:36:27Z

the partitioning and discarding is O(N) in the average case compared to O(NlgN) for the library sort.

i tried to make the partitioning and discarding generic enough that they could be used to do things like get the largest/smallest N elements without sorting the whole list. assuming that your data set is large, sorting the whole thing is more work than needed when you just want a small subset of the sorted set.

neonstalwart · 2015-05-01T22:25:40Z

the last piece of feedback remaining to be addressed is to add unit tests for getSortedRange and friends. it will probably be next week before that happens.

otoolep · 2015-05-01T23:36:58Z

Great, thanks @neonstalwart -- looking forward to it.

neonstalwart · 2015-05-04T22:13:42Z

@otoolep i added a few tests for getSortedRange. i'm open to any suggestions for other things to test - sometimes it's easier to see things from the outside.

in the process of adding tests i thought i would add some benchmarks to compare with the built-in sort and found that i had missed the mark by a lot (about 3 times slower) due to poor memory management. i was able to get closer to where it should be by making some tweaks and on my machine i'm now seeing getSortedRange is about 40% faster than using the built-in sort (5961 ns/op vs 10084 ns/op) on those benchmarks.

otoolep · 2015-05-04T22:39:13Z

Nice.

It seems like you used the standard Go benchmark approach to profiling the code, correct? Can you show us the full output?

neonstalwart · 2015-05-05T01:13:51Z

@otoolep here's the full output of another test run - the tests are included in this PR

> go test -v -bench=BenchmarkGetSortedRange -run=XXX ./influxql
PASS
BenchmarkGetSortedRangeByPivot    300000              5822 ns/op
BenchmarkGetSortedRangeBySort     100000             10717 ns/op
ok      github.com/influxdb/influxdb/influxql   3.112s
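The command above selects benchmarks by name with `-bench` and passes `-run=XXX` so that no unit test is expected to match and only the benchmarks run. A hedged sketch of what a sort-based benchmark of this kind might look like follows; the actual benchmarks are part of the PR and are not reproduced here. `benchInput`, `BenchmarkMedianBySort`, the fixed seed, and the input size of 100 are all assumptions for illustration.

```go
package main

import (
	"math/rand"
	"sort"
	"testing"
)

// benchInput (hypothetical) returns n pseudo-random values from a fixed
// seed so that repeated runs measure the same input.
func benchInput(n int) []float64 {
	r := rand.New(rand.NewSource(1))
	data := make([]float64, n)
	for i := range data {
		data[i] = r.Float64()
	}
	return data
}

// BenchmarkMedianBySort (hypothetical) times the baseline: sort everything,
// then read the middle value. The input size is an assumed value.
func BenchmarkMedianBySort(b *testing.B) {
	data := benchInput(100)
	work := make([]float64, len(data))
	b.ResetTimer() // exclude input generation from the measurement
	for i := 0; i < b.N; i++ {
		copy(work, data) // restore unsorted order before each sort
		sort.Float64s(work)
		_ = work[len(work)/2]
	}
}
```

In a real package this would normally live in a `_test.go` file and be run with something like `go test -bench=BenchmarkMedianBySort -run=XXX`. A pivot-based counterpart would have the same loop with the partitioning function in place of `sort.Float64s`.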

otoolep · 2015-05-05T19:04:25Z

Great -- thanks @neonstalwart. I will take 1 final look at this, and then merge.

Thanks again for the thorough job.

otoolep · 2015-05-05T23:00:08Z

@pauldix -- you want to take a quick look?

neonstalwart force-pushed the median-aggregate branch from a42c260 to 386ee70 Compare April 24, 2015 02:54

neonstalwart mentioned this pull request Apr 24, 2015

Wire up MEDIAN aggregate #1824

Closed

neonstalwart force-pushed the median-aggregate branch 2 times, most recently from 90047e4 to 7cf2247 Compare April 24, 2015 22:27

neonstalwart force-pushed the median-aggregate branch from 7cf2247 to 5314d4a Compare May 1, 2015 13:47

otoolep reviewed May 1, 2015

neonstalwart added 4 commits May 4, 2015 11:09

add median aggregation tests

6571f95

implemented median aggregation

40af5fd

use distinct series names to avoid resetting db between each test

6fbae01

add some documentation for getSortedRange

c47f803

neonstalwart force-pushed the median-aggregate branch from bbad2ae to c47f803 Compare May 4, 2015 16:10

add some tests for getSortedRange and improve efficiency of partitioning

e20444f

toddboom added a commit that referenced this pull request May 6, 2015

Merge pull request #2411 from neonstalwart/median-aggregate

710576e

implement median aggregate

toddboom merged commit 710576e into influxdata:master May 6, 2015

neonstalwart deleted the median-aggregate branch May 6, 2015 14:09
