Merge our internal changes and fix conflicts for DF branch 50 #13
Conversation
* Initial commit
* Fix formatting
* Add across partitions check
* Add new test case
* Fix buggy test

…#13909) (apache#13934)
* Set utf8view as return type when input type is the same
* Verify that the type returned from a call to a scalar function matches the return type specified in the return_type function
* Match return type to utf8view

Co-authored-by: Tim Saucer <timsaucer@gmail.com>

This reverts commit 5383d30.

* fix: fetch is missed in the EnforceSorting
* fix conflict
* resolve comments from alamb
* update

…it disabled by default

…e#14415) (apache#14453)
* chore: Fixed CI
* chore
* chore: Fixed clippy
* chore

Co-authored-by: Alex Huang <huangweijun1001@gmail.com>

* Test for string / numeric coercion
* fix tests
* Update tests
* Add tests to stringview
* add numeric coercion
/// (reading) If true, parquet reader will read columns of `Utf8/Utf8Large` with `Utf8View`,
/// and `Binary/BinaryLarge` with `BinaryView`.
- pub schema_force_view_types: bool, default = true
+ pub schema_force_view_types: bool, default = false
Default to utf8.
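Since this diff flips the default to `false`, sessions that still want view types can opt back in at runtime. A hedged sketch of the session-level knob (the exact key name is assumed from upstream DataFusion's configuration docs, not from this diff):

```sql
-- Assumed config key; verify against the DataFusion settings reference.
SET datafusion.execution.parquet.schema_force_view_types = true;
```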
// Whether to allow truncated rows when parsing.
// By default this is set to false and will error if the CSV rows have different lengths.
// When set to true then it will allow records with less than the expected number of columns
pub truncated_rows: Option<bool>, default = None
Our CSV truncated_rows support will be included in DF 51.0.0, but not in DF 50.0.0.
apache#17465
DecoderDeserializer::new(CsvDecoder::new(decoder))
}

fn csv_deserializer_with_truncated(
Our CSV truncated_rows support will be included in DF 51.0.0, but not in DF 50.0.0.
apache#17465
/// By default this is set to false and will error if the CSV rows have different lengths.
/// When set to true then it will allow records with less than the expected number of columns and fill the missing columns with nulls.
/// If the record’s schema is not nullable, then it will still return an error.
pub truncated_rows: bool,
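The doc comment above describes pad-with-nulls semantics for short rows. A minimal sketch of that behavior, not the DataFusion/arrow implementation (`pad_truncated` and its `Option<String>` cell model are illustrative):

```rust
// Illustrative sketch: pad a truncated CSV record with `None` so it matches
// the expected column count, mirroring the `truncated_rows` semantics above.
fn pad_truncated(
    mut record: Vec<Option<String>>,
    expected: usize,
) -> Result<Vec<Option<String>>, String> {
    if record.len() > expected {
        return Err(format!(
            "row has {} columns, expected {}",
            record.len(),
            expected
        ));
    }
    // Missing trailing columns become NULLs; a non-nullable schema would
    // still have to reject these rows at a later validation step.
    record.resize(expected, None);
    Ok(record)
}

fn main() {
    let row = vec![Some("a".to_string()), Some("b".to_string())];
    let padded = pad_truncated(row, 4).unwrap();
    assert_eq!(padded.len(), 4);
    assert_eq!(padded[2], None);
    println!("{padded:?}");
}
```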
Same as above.
assert_eq!(
    string_truncation_stats.max_value,
-     Precision::Inexact(ScalarValue::Utf8View(Some("b".repeat(63) + "c")))
+     Precision::Inexact(Utf8(Some("b".repeat(63) + "c")))
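For context on why the expected value is `"b".repeat(63) + "c"`: when a long max statistic is truncated to a byte limit, the last kept byte is incremented so the shortened string still upper-bounds every value, and the statistic becomes inexact. A hedged sketch of that rule (`truncate_max` is illustrative, not the parquet-rs API):

```rust
// Illustrative sketch of max-statistic truncation: cut to `limit` bytes and
// bump the last byte so the result remains a valid (inexact) upper bound.
// Overflow handling for 0xFF bytes is ignored for simplicity.
fn truncate_max(max: &str, limit: usize) -> (String, bool) {
    if max.len() <= limit {
        return (max.to_string(), true); // still exact
    }
    let mut bytes = max.as_bytes()[..limit].to_vec();
    *bytes.last_mut().unwrap() += 1; // 'b' (0x62) becomes 'c' (0x63)
    (String::from_utf8(bytes).unwrap(), false) // now inexact
}

fn main() {
    let long_max = "b".repeat(200);
    let (truncated, exact) = truncate_max(&long_max, 64);
    // 64 bytes kept, last one incremented: 63 'b's followed by 'c'.
    assert_eq!(truncated, "b".repeat(63) + "c");
    assert!(!exact);
    println!("{truncated}");
}
```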
We default to utf8.
- .await
+ .await?;
+ let mut id_annotator = NodeIdAnnotator::new();
+ annotate_node_id_for_execution_plan(&physical_plan, &mut id_annotator)
Our internal node_id support.
- let filter = col("date_string_col").eq(lit(ScalarValue::new_utf8view("01/01/09")));
// xudong: use new_utf8, because schema_force_view_types was changed to false now.
// qi: when schema_force_view_types is set to true, we should change back to utf8view
+ let filter = col("date_string_col").eq(lit(ScalarValue::new_utf8("01/01/09")));
Default to utf8.
self.sink.metrics()
}

fn with_node_id(
Internal node_id support.
}
}

fn with_node_id(
Internal node_id support.
/// Infers new predicates by substituting equalities.
/// For example, with predicates `t2.b = 3` and `t1.b > t2.b`,
/// we can infer `t1.b > 3`.
fn infer_predicates_from_equalities(predicates: Vec<Expr>) -> Result<Vec<Expr>> {
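The substitution described in the doc comment can be sketched without DataFusion's `Expr` machinery. A minimal, illustrative model (the `Pred` enum and `infer` function are assumptions for exposition, not the real optimizer code):

```rust
use std::collections::HashMap;

// Tiny predicate model: equality to a constant, column-vs-column `>`,
// and the inferred column-vs-constant `>`.
#[derive(Clone, Debug, PartialEq)]
enum Pred {
    EqConst(String, i64), // col = literal
    GtCol(String, String), // col > col
    GtConst(String, i64),  // col > literal (inferred)
}

// Given `t2.b = 3` and `t1.b > t2.b`, derive `t1.b > 3`.
fn infer(predicates: &[Pred]) -> Vec<Pred> {
    // Collect col -> constant bindings from the equality predicates.
    let consts: HashMap<&String, i64> = predicates
        .iter()
        .filter_map(|p| match p {
            Pred::EqConst(c, v) => Some((c, *v)),
            _ => None,
        })
        .collect();
    // Substitute bound columns on the right side of `>` comparisons.
    predicates
        .iter()
        .filter_map(|p| match p {
            Pred::GtCol(l, r) => consts.get(r).map(|v| Pred::GtConst(l.clone(), *v)),
            _ => None,
        })
        .collect()
}

fn main() {
    let preds = vec![
        Pred::EqConst("t2.b".into(), 3),
        Pred::GtCol("t1.b".into(), "t2.b".into()),
    ];
    assert_eq!(infer(&preds), vec![Pred::GtConst("t1.b".into(), 3)]);
    println!("{:?}", infer(&preds));
}
```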
Related to:
apache#15906
Should we reopen our upstream PR?
Will do later
}

- context.update_plan_from_children()
+ Ok((context.update_plan_from_children()?, fetch))
Related to our internal fetch support for enforce_distribution.
data,
children,
},
mut fetch,
Related to our internal fetch support for enforce_distribution.
// If `fetch` was not consumed, it means that there was `SortPreservingMergeExec` with fetch before
// It was removed by `remove_dist_changing_operators`
// and we need to add it back.
if fetch.is_some() {
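The hand-off in this hunk is essentially an `Option` that is taken from a removed operator and reattached if nothing downstream consumed it. A hedged sketch with plan nodes modeled as strings (`restore_fetch` and the operator names are illustrative only):

```rust
// Illustrative sketch of the fetch restore step: if the plan still contains
// a limit-aware merge operator, `fetch` counts as consumed; otherwise the
// removed operator is re-added with the surviving fetch.
fn restore_fetch(mut plan: Vec<String>, mut fetch: Option<usize>) -> Vec<String> {
    if plan.iter().any(|op| op.starts_with("SortPreservingMergeExec")) {
        fetch = None; // consumed by an existing operator
    }
    if let Some(n) = fetch {
        plan.push(format!("SortPreservingMergeExec(fetch={n})"));
    }
    plan
}

fn main() {
    let plan = vec!["RepartitionExec".to_string()];
    let restored = restore_fetch(plan, Some(10));
    assert_eq!(
        restored.last().unwrap().as_str(),
        "SortPreservingMergeExec(fetch=10)"
    );
    println!("{restored:?}");
}
```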
Related to our internal fetch support for enforce_distribution.
use datafusion_common::DataFusionError;

// Util for traversing ExecutionPlan tree and annotating node_id
pub struct NodeIdAnnotator {
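The annotator pattern amounts to a counter threaded through a pre-order tree walk. A self-contained sketch over a toy tree (the `Node` type is an assumption; the real code walks `ExecutionPlan` children):

```rust
// Toy plan node: real code operates on Arc<dyn ExecutionPlan> instead.
struct Node {
    id: Option<usize>,
    children: Vec<Node>,
}

// Counter-carrying annotator, mirroring the NodeIdAnnotator idea above.
struct NodeIdAnnotator {
    next_id: usize,
}

impl NodeIdAnnotator {
    fn new() -> Self {
        Self { next_id: 0 }
    }

    // Assign ids in pre-order: a parent receives its id before its children.
    fn annotate(&mut self, node: &mut Node) {
        node.id = Some(self.next_id);
        self.next_id += 1;
        for child in &mut node.children {
            self.annotate(child);
        }
    }
}

fn main() {
    let mut plan = Node {
        id: None,
        children: vec![Node { id: None, children: vec![] }],
    };
    let mut annotator = NodeIdAnnotator::new();
    annotator.annotate(&mut plan);
    assert_eq!(plan.id, Some(0));
    assert_eq!(plan.children[0].id, Some(1));
}
```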
Our internal support for node_id.
/// Execution plan for values list based relation (produces constant rows)
#[deprecated(
    since = "45.0.0",
    note = "Use `MemorySourceConfig::try_new_as_values` instead"
It seems we still use the deprecated API, so I can try to upgrade those cases in a follow-up PR.
yes
fn remove_dist_changing_operators(
    mut distribution_context: DistributionContext,
- ) -> Result<DistributionContext> {
+ ) -> Result<(
Internal fetch support.
Merge our internal changes and fix conflicts for DF branch 50.