implement lead and lag built-in window function #429
Codecov Report
@@ Coverage Diff @@
## master #429 +/- ##
==========================================
- Coverage 75.16% 75.15% -0.02%
==========================================
Files 150 152 +2
Lines 25144 25357 +213
==========================================
+ Hits 18899 19056 +157
- Misses 6245 6301 +56
I plan to review this PR tomorrow.
Actually, let's park this pull request for a while - I plan to implement sort and partition first and then window frame, after which the window shift approach might not be relevant.
Now that #520 is implemented, this PR is ready.
Putting this back to draft, as this relies on apache/arrow-rs#388, which is not yet in arrow 4.3.
Oh no! Can we possibly use the API that is in Arrow 4.3 (and then we can upgrade datafusion to use the new API when the next version of Arrow comes out)?
I don't mind parking this one for a while, since there is plenty of other window frame work to do before revisiting it, and by then a newer version will have been released.
Ok, thank you. The plan is to do a 4.4 release in ~2 weeks.
@alamb and @Dandandan this pull request is ready now.
Looks like a great start @jimexist -- I do have a question if this will generate the correct answer with multiple partitions.
Also, I suggest an end-to-end integration test using your great harness, but I suspect you plan to do so in a subsequent PR :) 👍
impl PartitionEvaluator for WindowShiftEvaluator {
    fn evaluate_partition(&self, _partition: Range<usize>) -> Result<ArrayRef> {
        let value = &self.values[0];
        shift(value.as_ref(), self.shift_offset).map_err(DataFusionError::ArrowError)
Do you need to restrict the window to the partition bounds? If the input array had 10 rows in 2 partitions, wouldn't this code produce 2 output partitions of 10 rows each (rather than 2 output partitions of 5 rows each)?
@alamb good catch; this is fixed, and integration tests have been added.
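For readers following this thread, the partition-bounds concern can be illustrated with a minimal sketch in plain Rust. It uses no Arrow types, and `shift_within_partition` is a hypothetical helper written for illustration, not the code in this PR. It assumes the usual convention that a positive offset behaves like `lag` and a negative offset like `lead`, with `None` standing in for SQL NULL.

```rust
use std::ops::Range;

/// Hypothetical helper (not part of this PR): shift the rows of ONE partition
/// by `offset`, padding the vacated positions with `None`.
/// Assumed convention: positive offset = lag, negative offset = lead.
/// `values` is the whole input column; only the rows in `partition` are read,
/// so the output has `partition.len()` rows rather than `values.len()` rows.
fn shift_within_partition<T: Clone>(
    values: &[Option<T>],
    partition: Range<usize>,
    offset: i64,
) -> Vec<Option<T>> {
    // Restrict the input to the partition bounds before shifting.
    let slice = &values[partition];
    let len = slice.len() as i64;
    (0..len)
        .map(|i| match i.checked_sub(offset) {
            // The source row exists inside this partition: copy it.
            Some(src) if (0..len).contains(&src) => slice[src as usize].clone(),
            // The source row falls outside the partition: emit NULL.
            _ => None,
        })
        .collect()
}
```

With a 10-row column split into the partitions `0..5` and `5..10`, each call returns 5 rows, and no value leaks across the partition boundary. Shifting the whole column and ignoring the range would instead return 10 rows per partition, which is the behaviour questioned above.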
Thanks @jimexist -- I ran out of time today but will check this out tomorrow.
Thanks @jimexist !
Which issue does this PR close?
implement lead and lag built-in window function.
based on #520 so review that first
Closes #553
Rationale for this change
What changes are included in this PR?
Are there any user-facing changes?