-
Notifications
You must be signed in to change notification settings - Fork 182
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Fix arrow error when sorting on empty batch #271
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -24,7 +24,7 @@ use std::{ | |
|
||
use futures::{Stream, StreamExt}; | ||
|
||
use arrow_array::{ArrayRef, RecordBatch}; | ||
use arrow_array::{ArrayRef, RecordBatch, RecordBatchOptions}; | ||
use arrow_schema::{DataType, Field, FieldRef, Schema, SchemaRef}; | ||
|
||
use datafusion::{execution::TaskContext, physical_expr::*, physical_plan::*}; | ||
|
@@ -149,7 +149,10 @@ impl CopyStream { | |
.iter() | ||
.map(|v| copy_or_cast_array(v)) | ||
.collect::<Result<Vec<ArrayRef>, _>>()?; | ||
RecordBatch::try_new(self.schema.clone(), vectors).map_err(|e| arrow_datafusion_err!(e)) | ||
|
||
let options = RecordBatchOptions::new().with_row_count(Some(batch.num_rows())); | ||
RecordBatch::try_new_with_options(self.schema.clone(), vectors, &options) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Maybe we should also check is there any other invocation of I just did a code search, maybe we probably need to revise the code in Expand.rs too: https://github.com/apache/arrow-datafusion-comet/blob/main/core/src/execution/datafusion/operators/expand.rs#L172 Of course, it should be done in a follow up PR. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Good catch. Although I'm not sure if it is possible to have |
||
.map_err(|e| arrow_datafusion_err!(e)) | ||
} | ||
} | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, should we wait until the DF PR is merged and use
https://github.com/apache/arrow-datafusion.git
instead of your personal fork?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is pointed to my fork of DataFusion not just for this PR. We uses the fork of arrow-rs to use a workaround for the Java Arrow bug. So we must use the fork too in our DataFusion crate, otherwise rust compiler will complain duplicated structs etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. I missed #239. It is OK as long as we don't stay in this state for too long 😂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, we will change back to official arrow-rs and DataFusion once we get new Java Arrow release.