-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Min and max numerical aggregates #3078
feat: Min and max numerical aggregates #3078
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## develop #3078 +/- ##
===========================================
+ Coverage 79.67% 79.85% +0.18%
===========================================
Files 348 351 +3
Lines 27014 27358 +344
===========================================
+ Hits 21522 21845 +323
- Misses 3962 3975 +13
- Partials 1530 1538 +8
Flags with carried forward coverage won't be shown. Click here to find out more.
... and 14 files with indirect coverage changes Continue to review full report in Codecov by Sentry.
|
} | ||
|
||
// Note: this test should follow a different code path to `_avg` on it's own | ||
// utilising the existing `_min` node instead of adding a new one. This test cannot |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: _max
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want too review this but will wait until the tests are passing, please re-request my review when they are and I'll get right on this :)
db85fa3
to
58a5093
Compare
Should be all fixed now. There was an issue with casting max value floats to ints which I think may exist in the sum aggregate where I copied the code from. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, just some minor stuff to sort out first IMO, and the below question.
question: It does not lok like you have added max/min to non-numeric types such as DateTime
, String
, Boolean
, etc, and the gql related code does not look like it will facilitate the introduction of that support - was this an oversight or do you have a plan on how to introduce it later? I am concerned that doing so will be a breaking change to the GQL syntax due to the return type.
internal/planner/aggregate.go
Outdated
return a.Value() < b.Value() | ||
} | ||
|
||
func reverse[T any](original func(T, T) bool) func(T, T) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: inverse
might be a better name? I'm not sure though. Maybe a line of documentation would be handy as reverse
suggests this is order related, but the implementation and signature does not suggest it is limited to that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
internal/planner/max.go
Outdated
if err != nil { | ||
return nil, err | ||
} | ||
// If one source property is a float, the result will be a float - no need to check the rest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: This comment looks copy pasted and doesn't quite make sense for min/max If one source property is a float, the result will be a float
is not really correct, although this is a valid enough way to determine return-type. I suggest tweaking the comment slightly for min/max.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not seeing which part is incorrect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you had:
[]any{
float(3.5),
int(9999),
}
max
would be an int. Summing/averaging the set would always result in a float, but min/maxing the set would not. The comment makes sense for other aggregates, but not for min/max.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay I see what you mean. I've refactored the logic so it actually returns the correct type now.
Correct me if I'm wrong but I don't think we have any operations that would allow us to test mixed type aggregates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct me if I'm wrong but I don't think we have any operations that would allow us to test mixed type aggregates.
No you are correct, I'm only commenting on the comment, the code/tests are all good IMO :)
I've refactored the logic so it actually returns the correct type now.
Nice, thanks!
func (n *maxNode) Kind() string { return "maxNode" } | ||
func (n *maxNode) Init() error { return n.plan.Init() } | ||
func (n *maxNode) Start() error { return n.plan.Start() } | ||
func (n *maxNode) Spans(spans core.Spans) { n.plan.Spans(spans) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: This should probably be covered by tests, I think having a docID
param on the aggregate should hit this, otherwise it could be deleted/broken and it wouldn't break the build.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added a test that should cover it
internal/planner/max.go
Outdated
max := -math.MaxFloat64 | ||
for _, source := range n.aggregateMapping { | ||
child := n.currentValue.Fields[source.Index] | ||
collectionMax := -math.MaxFloat64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
todo: The code in this for loop looks like it will return -math.MaxFloat64
if the target is empty/nil, have you tested this, and is it the behaviour we want (nil
might be better, otherwise an empty set will have the same result as an item with a literal -math.MaxFloat64
?)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
refactored to return nil on empty values
internal/planner/max.go
Outdated
max := -math.MaxFloat64 | ||
for _, source := range n.aggregateMapping { | ||
child := n.currentValue.Fields[source.Index] | ||
collectionMax := -math.MaxFloat64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggstion: You seem to alternate between using ``-math.MaxFloat64and
-math.MinFloat64`, this makes spotting its significance/other-refs harder and adds a easily avoided little confusion - I suggest picking one and using it throughout.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
refactored, this is no longer applicable
plan planNode | ||
|
||
isFloat bool | ||
virtualFieldIndex int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: I'm sorry about the bother, because you are copying my undocumented work in the other aggregates, but I think it would be worth a line of documentation on virtualFieldIndex
on these nodes, its not very clear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work! I like what you did with the aggregate generics and thanks for adding all those tests. Just need to resole Andy's comments before merge.
I'd prefer to add support for non-numerical types in a new PR. I think it will require a good amount of changes to the aggregate types and I'm not worried about a breaking GQL change if it's needed. |
Which aggregate types? The new types added in this PR? Or some existing stuff to? I am concerned that you will just end up re-writing a lot the new stuff added in this PR to handle (IMO) a pretty core sub-feature (DateTime). Note: I am alright with you doing that in another PR is everyone else is, especially given this has already been reviewed now (and so a lot of the extra time has already been spent), but my preference would be to get the long-term syntax (and internal code structure) correct first. |
internal/planner/max.go
Outdated
var max *float64 | ||
for _, source := range n.aggregateMapping { | ||
child := n.currentValue.Fields[source.Index] | ||
var collectionMax *float64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
todo: This is incorrect for large integers, you cannot convert a very big int to a float and then back again and expect it to have the same value.
suggestion: Add failing tests before fixing this (if you can, you might struggle to set an Int to Int64MaxVal atm via our public clients, as GQL is technically only 32bit Ints)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
e.g. the below will result in a negative number:
a := math.MaxInt64 - 1
println(fmt.Sprint(a))
b := float64(a)
c := int64(b)
panic(fmt.Sprint(c))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
refactored to use big.Float
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did you try to test this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added tests for int max value
I haven't thought through all of it yet, but because the aggregates are expected to return a float and can then be used as input into other aggregates, there will be issues with an aggregate that can return a union type of String, DateTime, etc.
The planner code is very generic. I'm not too worried about this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM thanks for doing the suggestions
Yes, that is why I am concerned you are painting the syntax into a corner and we will need to make an avoidable breaking change in the (possibly near) future in order to correct this problem. The breaking change will require us to change our tests, and users to change their code. I do not think this is a good thing to do, but will not block the PR over it. Please resolve the cast problem/test request before merge though. |
The GQL type system won't allow for an aggregate that returns a union to be used as input in another aggregate. I think the best approach will be to add new aggregates: This shouldn't result in any breaking changes to the existing aggregates. |
Only if we dont consider it weird that we have |
623e275
to
4f8b79c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Talked over the long term plan in discord, is fully compatible with this PR - thanks for all of this Keenan, and for the discussion :)
bug bash result: all looks good. |
Relevant issue(s)
Resolves #2978
Description
This PR adds support for two new query aggregates
min
andmax
.Tasks
How has this been tested?
Added integration tests
Specify the platform(s) on which this was tested: