feat: update arrow to 51, datafusion to 37 #2240

Merged (5 commits) on Apr 23, 2024

westonpace
Contributor

No description provided.

@westonpace westonpace requested review from eddyxu and wjones127 April 22, 2024 21:26
Comment on lines +52 to 58
schema: schema.clone(),
properties: PlanProperties::new(
EquivalenceProperties::new(schema),
Partitioning::RoundRobinBatch(1),
datafusion::physical_plan::ExecutionMode::Bounded,
),
}
Contributor Author

The ExecutionPlan interface changed. output_partitioning and output_ordering have gone away, and properties() now replaces both of them (as well as a new field, "execution mode").
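A std-only sketch of the new shape, using hypothetical simplified stand-ins (Partitioning, ExecutionMode, PlanProperties, ExecutionPlan, TestingExec) rather than DataFusion's real types: the node builds its properties once, as the constructor excerpt above does with PlanProperties::new, and properties() hands out a reference, so partitioning, ordering and execution mode are all read from one struct.

```rust
// Hypothetical, simplified stand-ins for the DataFusion 37 types; not the real API.
#[derive(Debug, Clone, PartialEq)]
pub enum Partitioning {
    RoundRobinBatch(usize),
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ExecutionMode {
    Bounded,
    Unbounded,
}

/// Bundles what used to be reported through separate trait methods.
#[derive(Debug, Clone)]
pub struct PlanProperties {
    pub partitioning: Partitioning,
    /// Columns the output is sorted by (empty means no known ordering).
    pub output_ordering: Vec<String>,
    pub execution_mode: ExecutionMode,
}

pub trait ExecutionPlan {
    /// The single required accessor.
    fn properties(&self) -> &PlanProperties;

    /// In this sketch, the old accessors are simply derived from properties().
    fn output_partitioning(&self) -> &Partitioning {
        &self.properties().partitioning
    }

    fn output_ordering(&self) -> &[String] {
        &self.properties().output_ordering
    }
}

pub struct TestingExec {
    properties: PlanProperties,
}

impl TestingExec {
    pub fn new() -> Self {
        // Compute the properties once at construction and cache them on the node.
        Self {
            properties: PlanProperties {
                partitioning: Partitioning::RoundRobinBatch(1),
                output_ordering: Vec::new(),
                execution_mode: ExecutionMode::Bounded,
            },
        }
    }
}

impl ExecutionPlan for TestingExec {
    fn properties(&self) -> &PlanProperties {
        &self.properties
    }
}
```

Caching the struct in the constructor means properties() is a cheap borrow no matter how often the optimizer asks for it.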

pub trait ExprExt {
// Helper function to replace Expr::field in DF 37 since DF
// confuses itself with the GetStructField returned by Expr::field
fn field_newstyle(&self, name: &str) -> Expr;
Contributor Author

This was the most complicated change. The code comment explains it reasonably well. I expect this helper may eventually go away (e.g. once apache/datafusion#10181 is resolved).
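The helper uses Rust's extension-trait pattern: a local trait (ExprExt) adds a method to Expr, a type owned by another crate. The sketch below is a hypothetical illustration only: Expr is a heavily simplified stand-in, and having field_newstyle emit a generic get_field(expr, name) function call instead of a dedicated GetStructField node is an assumption, not a description of Lance's actual code.

```rust
// Hypothetical, simplified expression tree; not DataFusion's Expr.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Column(String),
    Literal(String),
    // Dedicated field-access node (the kind of node Expr::field produced).
    GetStructField { expr: Box<Expr>, name: String },
    // Generic call to a named scalar function.
    ScalarFunction { name: String, args: Vec<Expr> },
}

pub trait ExprExt {
    /// Access a struct field by building a function call rather than a
    /// GetStructField node (an assumption made for illustration).
    fn field_newstyle(&self, name: &str) -> Expr;
}

impl ExprExt for Expr {
    fn field_newstyle(&self, name: &str) -> Expr {
        Expr::ScalarFunction {
            name: "get_field".to_string(),
            args: vec![self.clone(), Expr::Literal(name.to_string())],
        }
    }
}
```

Because the method lives on a local trait, call sites only need the trait in scope to write expr.field_newstyle("a").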

// TODO: consider making this dispatch more generic, i.e. fun.output_type -> coerce
// instead of hardcoding coerce method for each function
Expr::ScalarFunction(ScalarFunction {
- func_def: ScalarFunctionDefinition::BuiltIn(BuiltinScalarFunction::RegexpMatch),
+ func_def: ScalarFunctionDefinition::UDF(udf),
Contributor Author

regexp_match is no longer a built-in function, so we need to look for it as a UDF instead.
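Recognizing the function now means matching the UDF variant and comparing the UDF's name, rather than matching a built-in enum variant. A minimal sketch with hypothetical simplified stand-ins (ScalarUdf, ScalarFunctionDefinition); DataFusion's real ScalarUDF does expose a name() accessor, but the surrounding types here are not its actual definitions.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a registered scalar UDF.
#[derive(Debug)]
pub struct ScalarUdf {
    name: String,
}

impl ScalarUdf {
    pub fn new(name: &str) -> Self {
        Self { name: name.to_string() }
    }

    pub fn name(&self) -> &str {
        &self.name
    }
}

/// Hypothetical stand-in for how a function call refers to its implementation.
#[derive(Debug)]
pub enum ScalarFunctionDefinition {
    // Placeholder for any remaining built-in functions.
    BuiltIn(String),
    UDF(Arc<ScalarUdf>),
}

/// True if the call refers to regexp_match, which is now looked up as a UDF.
pub fn is_regexp_match(func_def: &ScalarFunctionDefinition) -> bool {
    match func_def {
        ScalarFunctionDefinition::UDF(udf) => udf.name() == "regexp_match",
        _ => false,
    }
}
```

Matching by name is looser than matching an enum variant, so a user-registered UDF with the same name would also match.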

@@ -25,16 +25,18 @@ impl PhysicalOptimizerRule for CoalesceTake {
plan: Arc<dyn ExecutionPlan>,
_config: &ConfigOptions,
) -> DFResult<Arc<dyn ExecutionPlan>> {
plan.transform_down(&|plan| {
Contributor Author

DF made minor changes to the transform_down function, but nothing substantive changes here (e.g. Yes -> yes, and we now return transform_down(...).data()).
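For context on the Yes -> yes change: Transformed went from an enum with Yes/No variants to a struct built with Transformed::yes / Transformed::no. The sketch below mimics that shape with a hypothetical mini Node tree and a simplified std-only Transformed; it is not DataFusion's implementation, it omits recursion control, and the real transform_down returns a Result (which the PR unwraps with .data()).

```rust
/// Simplified stand-in for DataFusion's Transformed struct.
#[derive(Debug)]
pub struct Transformed<T> {
    pub data: T,
    pub transformed: bool,
}

impl<T> Transformed<T> {
    pub fn yes(data: T) -> Self {
        Self { data, transformed: true }
    }

    pub fn no(data: T) -> Self {
        Self { data, transformed: false }
    }
}

/// Hypothetical plan tree: a node name plus children.
#[derive(Debug, Clone, PartialEq)]
pub struct Node {
    pub name: String,
    pub children: Vec<Node>,
}

impl Node {
    /// Pre-order rewrite: apply `f` to this node first, then recurse into the
    /// (possibly rewritten) node's children.
    pub fn transform_down<F>(self, f: &F) -> Transformed<Node>
    where
        F: Fn(Node) -> Transformed<Node>,
    {
        let Transformed { data: mut node, mut transformed } = f(self);
        let mut new_children = Vec::with_capacity(node.children.len());
        for child in std::mem::take(&mut node.children) {
            let t = child.transform_down(f);
            // The whole tree counts as transformed if any node was rewritten.
            transformed |= t.transformed;
            new_children.push(t.data);
        }
        node.children = new_children;
        Transformed { data: node, transformed }
    }
}
```

A rule then returns Transformed::yes(new_node) when it rewrites and Transformed::no(node) otherwise, and the caller keeps only the data.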

@@ -183,25 +203,23 @@ impl ContextProvider for LanceContextProvider {
)))
}

fn get_aggregate_meta(&self, name: &str) -> Option<Arc<AggregateUDF>> {
Contributor Author

SqlToRel now relies on the ContextProvider to resolve functions (e.g. regexp_match). These come from a SessionState instance, so we needed to add one to the LanceContextProvider (the default is fine).
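A minimal sketch of the delegation, assuming a hypothetical FunctionRegistry that plays the role of the SessionState (which, in DataFusion, holds the registered functions): the provider keeps one and forwards name lookups to it. The types are illustrative stand-ins, not DataFusion's ContextProvider API, and unlike a default SessionState this registry starts empty.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical stand-in for a registered scalar UDF.
#[derive(Debug)]
pub struct ScalarUdf {
    pub name: String,
}

/// Plays the role of the SessionState: it owns the registered functions.
#[derive(Default)]
pub struct FunctionRegistry {
    scalar_functions: HashMap<String, Arc<ScalarUdf>>,
}

impl FunctionRegistry {
    pub fn register(&mut self, name: &str) {
        let udf = Arc::new(ScalarUdf { name: name.to_string() });
        self.scalar_functions.insert(name.to_string(), udf);
    }
}

/// Holds a registry and delegates function lookups to it.
pub struct LanceContextProvider {
    state: FunctionRegistry,
}

impl LanceContextProvider {
    pub fn new(state: FunctionRegistry) -> Self {
        Self { state }
    }

    /// Name-based lookup, shaped like a ContextProvider function lookup.
    pub fn get_function_meta(&self, name: &str) -> Option<Arc<ScalarUdf>> {
        self.state.scalar_functions.get(name).cloned()
    }
}
```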

@@ -69,6 +75,10 @@ impl ExecutionPlan for TestingExec {
}

fn statistics(&self) -> datafusion::error::Result<datafusion::physical_plan::Statistics> {
todo!()
Contributor Author

Note: DF now actually calls statistics(), so we can't get away with todo!() here anymore.
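Instead of todo!(), a test node can report that nothing is known about its output. DataFusion offers Statistics::new_unknown for this, which marks every statistic as absent; the std-only sketch below mimics the idea with hypothetical simplified Precision and Statistics types (tracking only null counts per column) so that it runs on its own.

```rust
/// Simplified stand-in for DataFusion's Precision.
#[derive(Debug, Clone, PartialEq)]
pub enum Precision<T> {
    Exact(T),
    Inexact(T),
    Absent,
}

/// Simplified stand-in for DataFusion's Statistics.
#[derive(Debug, Clone, PartialEq)]
pub struct Statistics {
    pub num_rows: Precision<usize>,
    pub total_byte_size: Precision<usize>,
    /// One entry per output column (null counts only, for brevity).
    pub column_null_counts: Vec<Precision<usize>>,
}

impl Statistics {
    /// "Nothing is known": every statistic is Absent, one entry per column.
    pub fn new_unknown(num_columns: usize) -> Self {
        Self {
            num_rows: Precision::Absent,
            total_byte_size: Precision::Absent,
            column_null_counts: vec![Precision::Absent; num_columns],
        }
    }
}

/// Hypothetical test node.
pub struct TestingExec {
    num_columns: usize,
}

impl TestingExec {
    pub fn new(num_columns: usize) -> Self {
        Self { num_columns }
    }

    /// Returns unknown statistics instead of panicking, since the planner
    /// may now call this method.
    pub fn statistics(&self) -> Result<Statistics, String> {
        Ok(Statistics::new_unknown(self.num_columns))
    }
}
```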

Cargo.toml (outdated)
"array_expressions",
"regex_expressions",
] }
datafusion-common = "37.0"
Contributor

DataFusion is on 37.1: https://docs.rs/datafusion/latest/datafusion/

Should we bump to datafusion 37.1?

Contributor Author

Oh yeah, maybe just 37 for simplicity.

Contributor Author

Actually, probably best to do 37.1 just to avoid issues. It doesn't change that often.

Contributor Author

Updated.

Contributor

If we don't have a reason to require >=37.1, I feel like it would be nice to keep it at 37.0. Then users can use any 37.x version.

Contributor Author

Do you mean 37? Or does 37.0 allow 37.1?

Scanning through apache/datafusion#9904, I don't see any defects that would impact us.
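For reference on the 37 vs 37.0 question: Cargo treats a plain requirement like "37.0" as a caret requirement (^37.0), meaning >=37.0.0, <38.0.0, so it does accept 37.1. "37" covers the same range, while "37.1" means >=37.1.0, <38.0.0. The hypothetical CaretReq below encodes only that rule for non-zero major versions (no 0.x rules, pre-releases or other operators); it is an illustration, not Cargo's resolver.

```rust
/// A caret requirement such as "37", "37.0" or "37.1.2".
/// Missing minor/patch components default to 0 for the lower bound.
#[derive(Debug, PartialEq)]
pub struct CaretReq {
    major: u64,
    minor: u64,
    patch: u64,
}

impl CaretReq {
    pub fn parse(req: &str) -> Option<Self> {
        let mut parts = req.split('.').map(|p| p.parse::<u64>().ok());
        // Outer `?`: a component exists; inner `?`: it parsed as a number.
        let major = parts.next()??;
        let minor = parts.next().unwrap_or(Some(0))?;
        let patch = parts.next().unwrap_or(Some(0))?;
        if parts.next().is_some() || major == 0 {
            // More than three components, or 0.x (different caret rules): unsupported.
            return None;
        }
        Some(Self { major, minor, patch })
    }

    /// Caret semantics for major >= 1: same major, and not below the stated version.
    pub fn matches(&self, version: (u64, u64, u64)) -> bool {
        version.0 == self.major && (version.1, version.2) >= (self.minor, self.patch)
    }
}
```

Under this rule, keeping datafusion-common = "37.0" lets downstream users resolve any 37.x release, whereas "37.1" would exclude 37.0.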

Contributor

eddyxu left a comment

Pending CI

codecov-commenter commented Apr 22, 2024

Codecov Report

Attention: Patch coverage is 84.70588% with 52 lines in your changes missing coverage. Please review.

Project coverage is 81.07%. Comparing base (1b8adb5) to head (98b2049).
Report is 1 commit behind head on main.

Files                                       Patch %   Lines
rust/lance/src/io/exec/planner.rs           77.21%    18 Missing ⚠️
rust/lance/src/datafusion/logical_expr.rs   83.33%    3 Missing and 5 partials ⚠️
rust/lance/src/io/exec/scalar_index.rs      80.55%    7 Missing ⚠️
rust/lance-index/src/scalar/expression.rs   0.00%     6 Missing ⚠️
rust/lance/src/io/exec/optimizer.rs         85.71%    0 Missing and 5 partials ⚠️
rust/lance-datafusion/src/exec.rs           81.81%    2 Missing ⚠️
rust/lance/src/io/exec/testing.rs           84.61%    2 Missing ⚠️
rust/lance-index/src/scalar/btree.rs        0.00%     1 Missing ⚠️
rust/lance/src/io/exec/projection.rs        95.00%    0 Missing and 1 partial ⚠️
rust/lance/src/io/exec/pushdown_scan.rs     95.65%    1 Missing ⚠️
... and 1 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2240      +/-   ##
==========================================
+ Coverage   81.01%   81.07%   +0.06%     
==========================================
  Files         186      186              
  Lines       53583    53722     +139     
  Branches    53583    53722     +139     
==========================================
+ Hits        43408    43554     +146     
+ Misses       7707     7690      -17     
- Partials     2468     2478      +10     
Flag        Coverage Δ
unittests   81.07% <84.70%> (+0.06%) ⬆️
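As a sanity check on the diff above: Codecov computes coverage as hits / (hits + misses + partials), so partially covered lines count against the percentage. The short sketch below (a hypothetical Coverage helper, not part of Codecov or Lance) reproduces the base and head figures from the table's own counts.

```rust
/// Line counts as reported in a Codecov coverage diff.
pub struct Coverage {
    pub hits: u64,
    pub misses: u64,
    pub partials: u64,
}

impl Coverage {
    /// Tracked lines: every line is a hit, a miss or a partial.
    pub fn lines(&self) -> u64 {
        self.hits + self.misses + self.partials
    }

    /// Coverage percentage rounded to two decimals.
    pub fn percent(&self) -> f64 {
        let raw = self.hits as f64 / self.lines() as f64 * 100.0;
        (raw * 100.0).round() / 100.0
    }
}
```

With base = 43408/7707/2468 and head = 43554/7690/2478, this yields 53583 and 53722 tracked lines and 81.01% and 81.07% coverage, matching the report.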

@@ -34,6 +38,45 @@ fn resolve_value(expr: &Expr, data_type: &DataType) -> Result<Expr> {
}
}

/// A simple helper function that interprets an Expr as a string scalar
/// or returns None if it is not.
pub fn get_as_string_scalar_opt(expr: &Expr) -> Option<&String> {
Contributor

nit: &String in a signature is pretty much never desirable.

Suggested change
- pub fn get_as_string_scalar_opt(expr: &Expr) -> Option<&String> {
+ pub fn get_as_string_scalar_opt(expr: &Expr) -> Option<&str> {

Contributor Author

Changed to &str.
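Why &str is the better return type: it exposes only borrowed string data, not the fact that the value happens to be stored as a String. When the string sits inside an Option<String>, Option::as_deref produces Option<&str> directly. A minimal sketch with hypothetical simplified ScalarValue and Expr enums (shaped loosely after DataFusion's, but not its real definitions):

```rust
/// Hypothetical, simplified stand-in for a literal scalar value.
pub enum ScalarValue {
    Utf8(Option<String>),
    Int64(Option<i64>),
}

/// Hypothetical, simplified expression type.
pub enum Expr {
    Literal(ScalarValue),
    Column(String),
}

/// Returns the string if `expr` is a non-null string literal, or None otherwise.
pub fn get_as_string_scalar_opt(expr: &Expr) -> Option<&str> {
    match expr {
        // `s` is &Option<String>; as_deref turns it into Option<&str>.
        Expr::Literal(ScalarValue::Utf8(s)) => s.as_deref(),
        _ => None,
    }
}
```

If a function already holds an Option<&String>, .map(String::as_str) performs the same conversion.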

Ok(expr.clone())
}
}))
} else if let Some(left_type) = dbg!(resolve_column_type(left.as_ref(), schema)) {
Contributor

Leftover dbg!?

Suggested change
- } else if let Some(left_type) = dbg!(resolve_column_type(left.as_ref(), schema)) {
+ } else if let Some(left_type) = resolve_column_type(left.as_ref(), schema) {

Contributor Author

Good catch. Removed.
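A leftover dbg! is easy to miss because the macro returns its argument unchanged after printing the file, line, expression and value to stderr, so code behaves the same with or without it. A small illustration with a hypothetical resolve_column_type stand-in that looks up a type name in a tiny schema of (name, type) pairs:

```rust
/// Hypothetical stand-in: returns the type name of `column`, if present.
pub fn resolve_column_type<'a>(column: &str, schema: &'a [(&'a str, &'a str)]) -> Option<&'a str> {
    schema.iter().find(|(name, _)| *name == column).map(|(_, ty)| *ty)
}

/// With dbg!: also prints `[file:line:col] resolve_column_type(...) = ...` to stderr.
pub fn with_dbg(column: &str, schema: &[(&str, &str)]) -> Option<String> {
    if let Some(ty) = dbg!(resolve_column_type(column, schema)) {
        Some(ty.to_string())
    } else {
        None
    }
}

/// Without dbg!: the same result, without the stderr output.
pub fn without_dbg(column: &str, schema: &[(&str, &str)]) -> Option<String> {
    resolve_column_type(column, schema).map(|ty| ty.to_string())
}
```

Since results never differ, only reviews or a lint such as Clippy's dbg_macro tend to catch these.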

@westonpace westonpace merged commit 8ac02bc into lancedb:main Apr 23, 2024
17 checks passed

4 participants