Implement window functions with order_by clause #520
Codecov Report
@@            Coverage Diff             @@
##           master     #520      +/-   ##
==========================================
- Coverage   76.09%   75.99%   -0.10%
==========================================
  Files         156      156
  Lines       27047    27036      -11
==========================================
- Hits        20581    20547      -34
- Misses       6466     6489      +23
This looks very nice @jimexist -- I went over the code and saw only goodness :)
All that this PR needs to be mergeable in my opinion is to reset the Cargo arrow*
references (now that arrow 4.3.0 has been released)
4,
)
.await?;
// result in one batch, although e.g. having 2 batches do not change
👍
if num_rows == 0 {
return Ok(new_empty_array(value.data_type()));
}
let index: usize = match self.kind {
👍
value.len()
)));
}
if num_rows == 0 {
Could this function ever be passed a 0 row input? This check isn't a problem; I am just wondering if my mental model is correct.
This will be changed in a later pull request.
But you are right, this would not be passed a 0 length input. This check is just being pedantic.
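For readers following the zero-row discussion, here is a minimal sketch of the selection logic using plain slices instead of Arrow arrays. `NthValueKind` and `evaluate_nth` are hypothetical names and not the PR's API. The sketch assumes the function is evaluated over a whole partition and ignores any window frame.

```rust
/// Which value of the partition to pick. Hypothetical stand-in for the
/// `kind` field matched on in the diff; the variant names are assumptions.
#[derive(Clone, Copy)]
enum NthValueKind {
    First,
    Last,
    /// 1-based position, as in SQL `NTH_VALUE(expr, n)`.
    Nth(usize),
}

/// Returns the picked value repeated once per input row.
/// Empty input yields empty output; an out-of-range position yields `None`.
fn evaluate_nth(values: &[i32], kind: NthValueKind) -> Vec<Option<i32>> {
    let num_rows = values.len();
    // The "pedantic" guard: without it, `num_rows - 1` below would underflow.
    if num_rows == 0 {
        return Vec::new();
    }
    let index = match kind {
        NthValueKind::First => Some(0),
        NthValueKind::Last => Some(num_rows - 1),
        // `checked_sub` turns the invalid position 0 into `None`.
        NthValueKind::Nth(n) => n.checked_sub(1),
    };
    let picked = index.and_then(|i| values.get(i).copied());
    vec![picked; num_rows]
}
```

In this sketch the guard matters only for the `Last` arm, which is consistent with the point that the check is harmless even if callers never pass an empty input.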
let arr: ArrayRef = Arc::new(Int32Array::from(vec![1, -2, 3, -4, 5, -6, 7, 8]));
let values = vec![arr];
This test change shows the nice refactoring
/// the accumulator expects the same number of arguments as `expressions` and must
/// return states with the same description as `state_fields`
fn create_accumulator(&self) -> Result<Box<dyn WindowAccumulator>>;

/// expressions that are passed to the WindowAccumulator.
The WindowExpr trait is looking 👌
/// A window expression that is a built-in window function.
///
/// Note that unlike aggregation based window functions, built-in window functions normally ignore
/// window frame spec, with th expression of first_value, last_value, and nth_value.
- /// window frame spec, with th expression of first_value, last_value, and nth_value.
+ /// window frame spec, with the exception of first_value, last_value, and nth_value.
/// peer based evaluation based on the fact that batch is pre-sorted given the sort columns
/// and then per partition point we'll evaluate the peer group (e.g. SUM or MAX gives the same
/// results for peers) and concatenate the results.
fn peer_based_evaluate(&self, batch: &RecordBatch) -> Result<ArrayRef> {
I don't understand the naming of peer here (rather than range_based_evaluate for example, to match with WindowFrameUnits::Range)
I will possibly change this naming when implementing #361, but for the moment, range and groups both evaluate with peers, while rows evaluates based on rows on each scan.
Since this is a private function, I guess I can leave the naming part for later changes.
Yes I think it is fine for now
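To make the peer terminology in this thread concrete, here is a rough sketch on plain slices rather than a `RecordBatch`. The function names are hypothetical and this is not the code from the PR. It assumes the input is already sorted by the ORDER BY key and that the frame is the SQL default (unbounded preceding to current row).

```rust
use std::ops::Range;

/// Splits already-sorted ORDER BY keys into ranges of equal keys (peer groups).
fn peer_groups(sorted_keys: &[i32]) -> Vec<Range<usize>> {
    let mut groups = Vec::new();
    let mut start = 0;
    for i in 1..=sorted_keys.len() {
        // Close the current group at the end of input or when the key changes.
        if i == sorted_keys.len() || sorted_keys[i] != sorted_keys[start] {
            groups.push(start..i);
            start = i;
        }
    }
    groups
}

/// Peer-based running SUM: all rows in a peer group receive the same total,
/// and the per-group results are concatenated.
fn running_sum_by_peers(sorted_keys: &[i32], values: &[i64]) -> Vec<i64> {
    assert_eq!(sorted_keys.len(), values.len());
    let mut out = Vec::with_capacity(values.len());
    let mut acc = 0i64;
    for group in peer_groups(sorted_keys) {
        let len = group.end - group.start;
        acc += values[group].iter().sum::<i64>();
        out.extend(std::iter::repeat(acc).take(len));
    }
    out
}

/// Row-based running SUM: each row only sees the rows up to and including itself.
fn running_sum_by_rows(values: &[i64]) -> Vec<i64> {
    values
        .iter()
        .scan(0i64, |acc, v| {
            *acc += v;
            Some(*acc)
        })
        .collect()
}
```

With keys `[1, 1, 2, 3, 3]` and values `[10, 20, 30, 40, 50]`, the peer-based version gives `[30, 30, 60, 150, 150]` while the row-based version gives `[10, 30, 60, 100, 150]`, which is the difference between range/groups and rows described above.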
let len = value_range.end - value_range.start;
let values = values
.iter()
.map(|v| v.slice(value_range.start, len))
👍
-- See the License for the specific language governing permissions and
-- limitations under the License.

SELECT
Very nice 👍
Thank you for taking the time to review. The changes to the arrow references are now reverted.
@alamb this pull request is ready now
Thanks @jimexist
Which issue does this PR close?
Closes #360
For now, this pull request relies on arrow 4.3.0 to merge.
Rationale for this change
What changes are included in this PR?
Are there any user-facing changes?