9.0.0 (2022-06-10)
Breaking changes:
- MINOR: Move simplify_expression rule to datafusion-optimizer crate #2686 (andygrove)
- Move physical expression planning to datafusion-physical-expr crate #2682 (andygrove)
- Create new datafusion-optimizer crate for logical optimizer rules #2675 (andygrove)
- Remove ExecutionProps dependency from OptimizerRule #2666 (andygrove)
- Remove ObjectStoreSchemaProvider (#2656) #2665 (tustvold)
- Move LogicalPlanBuilder to datafusion-expr crate #2576 (andygrove)
- LogicalPlanBuilder now uses TableSource instead of TableProvider #2569 (andygrove)
- Remove scan_empty method from LogicalPlanBuilder #2568 (andygrove)
- MINOR: Move expression utils from sql module to expr crate #2553 (andygrove)
- Remove scan_json methods from LogicalPlanBuilder #2541 (andygrove)
- Remove scan_avro methods from LogicalPlanBuilder #2540 (andygrove)
- Remove scan_parquet methods from LogicalPlanBuilder #2539 (andygrove)
- MINOR: Move ExprVisitable and exprlist_to_columns to datafusion-expr crate #2538 (andygrove)
- Remove scan_csv methods from LogicalPlanBuilder #2537 (andygrove)
- Fix Redundant ScalarValue Boxed Collection #2523 (comphead)
- Support for OFFSET in LogicalPlan #2521 (jdye64)
Implemented enhancements:
- [EPIC] JIT support for DataFusion #2703
- Show column names instead of column indices in query plans #2689
- Proposal: remove automated ballista CI checks from DataFusion #2679
- Pass SessionState to TableProvider #2658
- Is ObjectStoreSchemaProvider Still Needed? #2656
- Add logical plan support to datafusion-proto #2630
- Like, NotLike expressions work with literal NULL #2626
- Move JOIN ON predicates push down logic from planner to optimizer #2619
- Remove ExecutionProps from OptimizerRule trait #2614
- Add, Minus, Multiply, divide, Modulo operator work with literal NULL #2609
- Support DESCRIBE <table> to show table schemas #2606
- Support CREATE OR REPLACE TABLE #2605
- filter_push_down tests should not rely on TableProvider and ExecutionPlan #2600
- Move logical optimizer rules out of the core datafusion crate #2599
- Push Limit through outer Join #2579
- datafusion_proto crate should have exhaustive match statements for handling Expr #2565
- String representation of Expr variant #2563
- File URI Scheme Interpretation #2562
- Implement physical plan for OFFSET #2551
- Update limit pushdown rule to support offsets #2550
- Move LogicalPlanBuilder to datafusion-expr crate #2536
- Logical optimizer rule "simplify expressions" should not depend on the core datafusion crate #2535
- Support optional filter in Join #2509
- Improve SQL planner & logical plan support for JOIN conditions #2496
- Numeric, String, Boolean comparisons with literal NULL #2482
- Redundant ScalarValue Boxed Collection #2449
- ObjectStore Directory Semantics #2445
- Add support for OFFSET in SQL query planner + logical plan #2377
- SQL planner should use TableSource not TableProvider #2346
- Move SQL query planning to new crate #2345
- Update LogicalPlan rustdoc code to not use LogicalPlanBuilder #2308
- [Optimizer] Refactor convert join #2256
- [Optimizer] Infer is not null predicate from where clause #2254
- Support ArrayIndex for ScalarValue(List) #2207
- [Ballista] Fill functional gaps between datafusion and ballista #2062
- [Ballista] support datafusion built_in UDAF work in ballista cluster #1985
- Export C API #1113
Fixed bugs:
- Fix Typos in Docs #2695
- Unable to build a docker image #2691
- Optimization pass AggregateStatistics changes type of output from Int64 to UInt64 #2673
- ViewTable Circular Reference #2657
- ScalarValue::to_array_of_size panics computing statistics for nested parquet file #2653
- The result type of count/count_distinct #2635
- limit_push_down is not working properly with OFFSET #2624
- Avro Tests Fail To Compile #2570
- Unused Window functions expression is wrongly removed from LogicalPlan during optimization #2542
- Bug: ObjectStoreRegistry get_by_uri does not return correct path when "scheme" is provided #2525
- There are duplicate and inconsistent copies of datafusion.proto #2514
- Projection pushdown produces incorrect results when column names are reused #2462
- Incorrect Parquet Projection For Nested Types #2453
- LogicalPlanBuilder::scan_csv creates scans with invalid table names #2278
- Inner join incorrectly pushdown predicate with OR operation #2271
- Ignored alias for columns with aggregate function and incorrect results when collecting statistics is enabled #2176
- Join on path partitioned columns fails with error #2145
Documentation updates:
- Fix Ballista link #2654 (dsaxton)
- MINOR: Add Blaze as a project using DataFusion #2618 (yjshen)
- [MINOR] remove datafusion-cli's ballista feature from docs #2612 (Ted-Jiang)
- chore(doc) remove ballista from datafusion-cli readme #2604 (ming535)
Closed issues:
- [Question] Converting TableSource to custom TableProvider #2644
- [Question] Why DataFusion is shipped with arrow version 9.1.0 on crates.io ? #2474
Merged pull requests:
- Test optional features in CI #2708 (tustvold)
- support indexed fields proto #2707 (nl5887)
- Update sqlparser-rs to 0.18.0 #2705 (alamb)
- [MINOR]: Add documentation to datafusion-row modules #2704 (alamb)
- Make sure that the data types are supported in hashjoin before genera… #2702 (AssHero)
- Move remaining code out of legacy core/logical_plan module #2701 (andygrove)
- Move some tests from core to expr #2700 (andygrove)
- MINOR: Improve Docs Readability #2696 (ryanrussell)
- Combine limit and offset to fetch and skip and implement physical plan support #2694 (ming535)
- MINOR: Add datafusion-sql example #2693 (andygrove)
- Remove Ballista related lines from Dockerfile #2692 (mocknen)
- Show column names instead of indices in query plans #2690 (andygrove)
- MINOR: Remove uses of TryClone for Parquet #2681 (tustvold)
- Fix AggregateStatistics optimization so it doesn't change output type #2674 (alamb)
- If statistics of column Max/Min value does not exist in parquet file, set Min/Max to None #2671 (AssHero)
- MINOR: Move more expression code to datafusion-expr crate #2669 (andygrove)
- MINOR: Rewrite imports in optimizer module #2667 (andygrove)
- Update snmalloc-rs requirement from 0.2 to 0.3 #2663 (dependabot[bot])
- Add module doc for RuntimeEnv, SessionContext, TaskContext, etc... #2655 (tustvold)
- Prune unused dependencies from datafusion-proto #2651 (tustvold)
- MINOR: Implement serde for join filter #2649 (andygrove)
- pushdown support for predicates in ON clause of joins #2647 (korowa)
- Move SortKeyCursor and RowIndex into modules, add sort_key_cursor test #2645 (alamb)
- Implement DESCRIBE <table> #2642 (LiuYuHui)
- Implement LogicalPlan serde in datafusion-proto #2639 (andygrove)
- Fix limit + offset pushdown #2638 (ming535)
- change result type of count/count_distinct from uint64 to int64 #2636 (liukun4515)
- if none columns in window expr are needed, remove the window exprs #2634 (AssHero)
- Like, NotLike expressions work with literal NULL #2627 (WinkerDu)
- MINOR: Refactor datafusion-proto dependencies and imports #2623 (andygrove)
- MINOR: add optimizer struct #2616 (jackwener)
- Remove FilterPushDown dependency on physical plan #2615 (andygrove)
- Support CREATE OR REPLACE TABLE #2613 (AssHero)
- Support binary mathematical operators work with NULL literals #2610 (WinkerDu)
- chore: try fix CI coverage #2608 (Ted-Jiang)
- MINOR: Rename benchmark crate #2607 (andygrove)
- chore(dep): bump cranelift to 0.84.0 #2598 (waynexia)
- fix some typos #2597 (ming535)
- Support limit pushdown through left right outer join #2596 (Ted-Jiang)
- Unignore rustdoc code examples in datafusion-expr crate #2590 (andygrove)
- Evaluate JIT'd expression over arrays #2587 (waynexia)
- [minor]Fix ci clippy for unused import #2586 (Ted-Jiang)
- [Doc]add doc for enable SIMD need cargo nightly #2577 (Ted-Jiang)
- Add DataFrame union_distinct and fix documentation for distinct #2574 (andygrove)
- Fix avro tests (#2570) #2571 (tustvold)
- Make datafusion-proto match exhaustive #2567 (andygrove)
- Support limit push down for offset_plan #2566 (Ted-Jiang)
- Introduce Expr.variant_name() function #2564 (jdye64)
- Fix some 404 links in the contribution guide #2561 (hi-rustin)
- Update datafusion-cli readme cli version #2559 (hi-rustin)
- MINOR: Move expr_rewriter.rs to datafusion-expr crate #2552 (andygrove)
- Fix JOINs with complex predicates in ON (split ON expressions only by AND operator) #2534 (korowa)
- Reduce duplication in file scan tests #2533 (tustvold)
- Fix size_of_scalar test #2531 (alamb)
- Update to arrow-rs 14.0.0 #2528 (alamb)
- ObjectStoreRegistry get_by_uri now returns correct path when "scheme" is provided #2526 (timvw)
- MINOR: Add ORDER BY clause to test #2524 (andygrove)
- Remove unused binary_array_op_scalar! in binary.rs #2512 (alamb)
- fix NULL <op> column evaluation, tests for same #2510 (alamb)
- Fix projection pushdown produces incorrect results when column names are reused #2463 (jonmmease)
- Benchmark for sort preserving merge #2431 (alamb)
- Support GetIndexedFieldExpr for ScalarValue #2196 (ovr)