13.0.0 (2022-10-06)
Breaking changes:
- Make ObjectStoreProvider fallible (return `Result` rather than `Option`) #3584 (tustvold)
- Make `OptimizerConfig` a builder style API #3525 (alamb)
Implemented enhancements:
- remove `type coercion` for ScalarUDF in the physical phase #3734
- Allow with statements to specify their columns alongside their expression names #3716
- Support SQLDataType::Timestamp(TimezoneInfo) #3693
- support `type coercion` for case when expr #3673
- Add simplification rules for the `Modulo` operator #3664
- Add TIMESTAMPTZ #3659
- Simplify `A * 0` and `A * null`. #3626
- change rule of `PreCastLitInComparisonExpressions` to unwrap cast rule after #3582 #3622
- Optimize regex_replace with a known pattern / replacement #3613
- Simplify `CONCAT_WS(NULL, ..)` to `NULL` #3607
- Add OctoSQL to list of systems powered by DataFusion #3605
- Prevent over-allocation (and spills) on TopK queries #3596
- Allow ObjectStoreProvider to return None (return Result<Option> rather than Result) #3594
- simplify between expr should consider the data type #3587
- make type coercion simple and remove the evaluate logic #3585
- ReduceOuterJoin optimizer support `cast or try_cast` expr. #3565
- Support type coercion for subquery #3557
- Make `ParquetScanOptions` public and expose a reference to the scan options from `ParquetExec` #3550
- Use `fetch` limit in `get_sorted_iter` #3544
- Push limit to sort #3528
- Execute sorts in parallel when limit is used after sort #3526
- Consolidate optimizer passes in optimizer module for better testing #3524
- Support Top-K query optimization for `ORDER BY <EXPR> [ASC #3515
- support the type coercion for `like`, `unlike`, `istrue`, `isfalse`, `isunknown` #3509
- Automate the pushing of releases to Homebrew #3506
- Add extra DATE_PART units that are already supported in arrow-rs #3502
- Release datafusion-cli 12.0.0 on Homebrew #3501
- Make `from_proto_binary_op` public #3489
- coercion between decimal and other types lacking, compared to other numeric types #3479
- move type coercion for inlist from physical phase to logical phase #3468
- Make `datafusion::physical_plan::file_format::file_stream::FileStream` public #3466
- Support using offset index in `ParquetRecordBatchStream` when pushing down `RowFilter` #3456
- Support timestamp data type in In_list node #3449
- Evaluate expressions after type coercion #3431
- Make a convenience function to register a single `RecordBatch` as a table from SessionContext #3426
- add datafusion-cli support of external table locations that object_store supports #3424
- pruning support cast/try_cast expr #3414
- Add documentation on querying against files in object store such as S3 #3399
- Remove type-coercion from physical planner #3388
- support `Statement::ShowVariable` to show session configs #3364
- Support `RowFilter` in `ParquetExec` #3360
- Apply `TypeCoercion` rule before `FilterPushDown` #3289
- Add support for `get`/`show` timezone #3255
- Consider adding DataFusion to ClickBench benchmarks #2902
- `filter_push_down` panics on semi/anti join with join filters #2888
- Migrate the `cross join -> inner join optimization` from the planner to the optimizer #2859
- ObjectStore write support #2185
- DataFusion should scan Parquet statistics once per query #871
- Extend & generalize constant folding / evaluation in logical optimizer #237
Fixed bugs:
- `projection_push_down` produces invalid aggregate plans in some cases #3738
- `Time With Time Zone` should raise error until `DataType::Time64` support tz #3715
- SQL Planner doesn't distinguish normal CTEs from the recursive ones. #3713
- Fix inconsistency between column name formats #3711
- Optimizer rule 'projection_push_down' failed due to unexpected error: Error during planning: Aggregate schema has wrong number of fields. Expected 3 got 8 #3704
- Optimizer regressions in `unwrap_cast_in_comparison` #3690
- Internal error when evaluating a predicate = "The type of Dictionary(Int16, Utf8) = Int64 of binary physical should be same" #3685
- Specialized regexp_replace should early-abort when the input arrays are empty #3647
- Internal error: Failed to coerce types Decimal128(10, 2) and Boolean in BETWEEN expression #3646
- Internal error: Failed to coerce types Decimal128(10, 2) and Boolean in BETWEEN expression #3645
- Type coercion error: The type of Boolean AND Decimal128(10, 2) of binary physical should be same #3644
- LEFT JOIN not working as expected, error message is confusing #3639
- `INTERSECT` and `EXCEPT` don't return an error when 2 sets have the different number of columns #3632
- The datafusion-cli panics when `union` 2 table with different number of columns. #3630
- The expression `col(a) / null` is not optimized. #3624
- `s3_build_error` test may fail in some environments #3601
- New clippy errors appears to be break the CI on the master #3597
- `StringConcat` gives inconsistent result with `concat` when containing `null` #3569
- simplify_expressions don't support different data type for binary #3556
- Broken logical plan serialization for aggregation queries #3555
- Aggregate filters do not get pushed down to table scan #3546
- `docs.rs` cannot build `datafusion-proto` crate #3538
- DataFusion serialization doesn't handle `ScalarValue::Dictionary, Binary, LargeBinary, Time64, IntervalMonthDayNano, Struct` #3531
- What should be returned when trying to get a config in invalid format? #3505
- Dividing decimal type gives wrong error: "170141183460469231731687303715884105727 is too large to store in a Decimal128 #3498
- Add BitwiseXor in function `from_proto_binary_op` #3495
- comparison operations with a scalar null and decimal array panics #3487
- Union columns with different types #3467
- Can't get the right logical plan after optimizer #3421
- Fix conflict between simplify_expression rule and CAST expressions #3409
- Empty array giving error #2439
- Internal error: Unsupported data type in hasher: FixedSizeBinary(16) #1516
- Predicates on to_timestamp do not work as expected with "naive" timestamp strings #765
- Address performance/execution plan of TPCH query 19 #78
- Bug fix: expr_visitor was not visiting aggregate filter expressions #3548 (andygrove)
Documentation updates:
- Publish 8.0.0 user guide #2558
- MINOR: Add Dask SQL to list of projects powered by DataFusion #3581 (andygrove)
- Add Parseable as Datafusion user #3471 (nitisht)
Closed issues:
- Upgrade to Arrow 24.0.0 #3689
- what's the best practice to get a single value from arrow array? #3497
- The data type of predicate in the row filter should be same in the binary expr #3469
- Extend constant folding and parquet filtering support #188
- Add FORMAT to explain plan and an easy to visualize format #96
Merged pull requests:
- Build aggregate schema in Aggregate::try_new #3739 (andygrove)
- delete type coercion for scalar udf in the physical phase #3735 (liukun4515)
- Consolidate coercion code in `datafusion_expr::type_coercion` and submodules #3728 (alamb)
- Skip filter push down on semi/anti joins #3723 (andygrove)
- Raise `Unsupported SQL type` for `Time(WithTimeZone)` and `Time(Tz)` #3718 [sql] (waitingkuo)
- Support column aliases specified by `WITH` statements #3717 [sql] (isidentical)
- Reject recursive CTEs before processing the sub-expressions #3714 [sql] (isidentical)
- Make column name consistent between Expr::name and Display/Debug #3712 [sql] (andygrove)
- Fix aggregate type coercion bug #3710 (alamb)
- MINOR: Add `Expr::canonical_name` and improve docs on `Expr::name` #3706 (andygrove)
- Remove type coercions from ScalarValue and aggregation function code #3705 (ozankabak)
- `unwrap_cast_in_comparison`: fix bug which can find the field for the schema #3699 (liukun4515)
- bump sql-parser 0.25 #3698 [sql] (xudong963)
- Move optimizer init to optimizer crate #3692 (andygrove)
- Upgrade `arrow`, `parquet` and `arrow-flight` to 24.0.0 #3691 [sql] (alamb)
- Fix bug in dictionary coercion and allow better coercion #3688 (alamb)
- [MINOR] Improve docstrings in binary_rule.rs #3687 (alamb)
- [MINOR] Add `ScalarValue::new_utf8`, clean up creation of literals in casting tests #3680 (alamb)
- Disable code coverage until we figure out why it is broken #3679 (alamb)
- move `type coercion` for case when expr #3676 (liukun4515)
- Update sqlparser to 0.24.0 #3675 [sql] (alamb)
- Fail if field lengths are not same in INTERSECT and EXCEPT #3674 (askoa)
- Simplification Rules for Modulo Operator #3669 (askoa)
- change pre_cast_lit_in_comparison to unwrap_cast_in_comparison #3662 (liukun4515)
- restore optimization for `between` in simplify expression rule #3661 (liukun4515)
- add timestamptz #3660 [sql] (waitingkuo)
- remove the type coercion in the simplify_expressions rule #3657 (liukun4515)
- Cache collected file statistics #3649 (mateuszkj)
- make regexp_replace early abort with empty input #3648 (isidentical)
- Check each query has same number of columns when building the UNION plan #3638 (HaoYang670)
- move the `type coercion` to the beginning of the optimizer rule and support type coercion for subquery #3636 (liukun4515)
- Add documentation for querying S3 data with CLI #3631 (andygrove)
- Simplify multiplication by `0` and by `null` #3627 (HaoYang670)
- Simplify null division. #3625 (HaoYang670)
- support cast/try_cast expr in reduceOuterJoin #3621 (AssHero)
- MINOR: fix TPC-H conversion function to not miss a row of data #3620 (kmitchener)
- Document ObjectStoreProvider #3619 (tustvold)
- [feat] Support using offset index in ParquetRecordBatchStream when pu… #3616 (Ted-Jiang)
- Optimize `regex_replace` for scalar patterns #3614 (isidentical)
- Simplify `concat_ws(null, ..)` to `null` #3608 (HaoYang670)
- MINOR: improve docstrings on SessionContext #3603 (alamb)
- Merge s3_success and s3_build_error tests into one test #3602 (Licht-T)
- add `register_batch` and `read_batch` to `SessionContext` to register a single RecordBatch as a table #3600 (BaymaxHWY)
- [CI] Fix the newly added linting errors to make clippy happy #3598 (isidentical)
- Prevent over-allocations (and spills) on sorts with a fixed limit #3593 (isidentical)
- update datafusion cli deps #3588 (Jimexist)
- Update cranelift* dependencies `0.87` --> `0.88` #3586 (alamb)
- Fix docs.rs #3580 (avantgardnerio)
- Fix build #3576 (alamb)
- Use consistent name for `TimeUnit::Millisecond` #3575 (alamb)
- Fix logical plan serialization #3574 (thinkharderdev)
- Custom window frame logic (support `ROWS`, `RANGE`, `PRECEDING` and `FOLLOWING` for window functions) #3570 [sql] (metesynnada)
- fix comparison of decimal array with null scalar #3567 (kmitchener)
- Reduce dependencies of `datafusion-sql` crate #3566 [sql] (mbrobbel)
- Update pbjson-types requirement from 0.3 to 0.5 #3560 (dependabot[bot])
- Update pbjson requirement from 0.3 to 0.5 #3559 (dependabot[bot])
- Update pbjson-build requirement from 0.3 to 0.5 #3558 (dependabot[bot])
- MINOR: enable q19 in TPCH #3553 (kmitchener)
- MINOR: remove out-of-date is_dictionary checks from binary_rule.rs #3552 (kmitchener)
- Make ParquetScanOptions public and add method to get a reference from… #3551 (thinkharderdev)
- fix coercion of null for decimal math in binary_rules #3549 (kmitchener)
- Use `fetch` limit in get_sorted_iter #3545 (Dandandan)
- feat: allow object store registration from datafusion-cli #3540 (turbo1912)
- Actually test that `ScalarValue`s are the same after round trip serialization #3537 (alamb)
- Add serialization of `ScalarValue::Struct` #3536 (alamb)
- Add serialization of `ScalarValue::IntervalMonthDayNano` #3535 (alamb)
- Add serialization of `ScalarValue::Binary` and `ScalarValue::LargeBinary`, `ScalarValue::Time64` #3534 (alamb)
- MINOR: Impl `Debug` for TableReference and ResolvedTableReference #3533 [sql] (andygrove)
- Add support for serializing `ScalarValue::Dictionary` to datafusion-proto #3532 (alamb)
- Push down limit to sort #3530 (Dandandan)
- Execute sort in parallel when a limit is used after sort #3527 (Dandandan)
- Config support type conversion #3522 (comphead)
- MINOR: Add more execs to list of supported execs #3519 (andygrove)
- fix divide by zero not throwing proper error for decimal #3517 (kmitchener)
- Make FileStream and FileOpener public #3514 (thinkharderdev)
- feat: Union types coercion #3513 [sql] (gandronchik)
- [DataFrame] - Add cache function for DataFrame #3512 (francis-du)
- type coercion: support is/is_not_`bool`/like/unknown expr #3510 (liukun4515)
- MINOR: remove unused dependencies #3508 (waynexia)
- Automate postrelease publishing to Homebrew #3507 (iajoiner)
- Add additional DATE_PART units #3503 (jonmmease)
- Add BitwiseXor in function from_proto_binary_op #3496 (askoa)
- Make the function from_proto_binary_op public #3490 (askoa)
- minor: fix bug in `downcast_value!` macro (`T` --> `$T`) #3486 (alamb)
- add time_zone into ConfigOptions #3485 [sql] (waitingkuo)
- [MINOR] Change `downcast_value!` macro so it does not need to use `use std::any::type_name;` #3484 (alamb)
- Convert more cross joins to inner joins (Address performance/execution plan of TPCH query 19) #3482 (DhamoPS)
- [minor] Remove unused arg in macro in Inlist #3474 (Ted-Jiang)
- inlist: move type coercion to logical phase #3472 (liukun4515)
- Use the column data type as the NULL data type in the row filter #3470 (liukun4515)
- apply type coercion before filter pushdown #3459 (liukun4515)
- add FixedSizeBinary support to create_hashes #3458 (mcassels)
- Support ShowVariable Statement #3455 [sql] (waitingkuo)
- Add additional pruning tests with casts, handle unsupported predicates better #3454 (alamb)
- Add `InList` support for timestamp type. (#3449) #3450 (Ted-Jiang)
- Evaluate expressions after type coercion #3444 (Dandandan)
- remove type coercion in the binary physical expr #3396 (liukun4515)
- Use arrow row format in SortPreservingMerge ~50-70% faster #3386 (tustvold)
- Pushdown `RowFilter` in `ParquetExec` #3380 (thinkharderdev)