13.0.0 (2022-10-06)

Full Changelog

Breaking changes:

  • Make ObjectStoreProvider fallible (return Result rather than Option) #3584 (tustvold)
  • Make OptimizerConfig a builder style API #3525 (alamb)

Implemented enhancements:

  • remove type coercion for ScalarUDF in the physical phase #3734
  • Allow with statements to specify their columns alongside their expression names #3716
  • Support SQLDataType::Timestamp(TimezoneInfo) #3693
  • support type coercion for case when expr #3673
  • Add simplification rules for the Modulo operator #3664
  • Add TIMESTAMPTZ #3659
  • Simplify A * 0 and A * null. #3626
  • change rule of PreCastLitInComparisonExpressions to unwrap cast rule after #3582 #3622
  • Optimize regex_replace with a known pattern / replacement #3613
  • Simplify CONCAT_WS(NULL, ..) to NULL #3607
  • Add OctoSQL to list of systems powered by DataFusion #3605
  • Prevent over-allocation (and spills) on TopK queries #3596
  • Allow ObjectStoreProvider to return None (return Result<Option> rather than Result) #3594
  • simplify between expr should consider the data type #3587
  • make type coercion simple and remove the evaluate logic #3585
  • ReduceOuterJoin optimizer support cast or try_cast expr. #3565
  • Support type coercion for subquery #3557
  • Make ParquetScanOptions public and expose a reference to the scan options from ParquetExec #3550
  • Use fetch limit in get_sorted_iter #3544
  • Push limit to sort #3528
  • Execute sorts in parallel when limit is used after sort #3526
  • Consolidate optimizer passes in optimizer module for better testing #3524
  • Support Top-K query optimization for `ORDER BY <EXPR> [ASC #3515
  • support the type coercion for like unlike istrue isfalse isunknown #3509
  • Automate the pushing of releases to Homebrew #3506
  • Add extra DATE_PART units that are already supported in arrow-rs #3502
  • Release datafusion-cli 12.0.0 on Homebrew #3501
  • Make from_proto_binary_op public #3489
  • coercion between decimal and other types lacking, compared to other numeric types #3479
  • move type coercion for inlist from physical phase to logical phase #3468
  • Make datafusion::physical_plan::file_format::file_stream::FileStream public #3466
  • Support using offset index in ParquetRecordBatchStream when pushing down RowFilter #3456
  • Support timestamp data type in In_list node #3449
  • Evaluate expressions after type coercion #3431
  • Make a convenience function to register a single RecordBatch as a table from SessionContext #3426
  • add datafusion-cli support of external table locations that object_store supports #3424
  • pruning support cast/try_cast expr #3414
  • Add documentation on querying against files in object store such as S3 #3399
  • Remove type-coercion from physical planner #3388
  • support Statement::ShowVariable to show session configs #3364
  • Support RowFilter in ParquetExec #3360
  • Apply TypeCoercion rule before FilterPushDown #3289
  • Add support for get / show timezone #3255
  • Consider adding DataFusion to ClickBench benchmarks #2902
  • filter_push_down panics on semi/anti join with join filters #2888
  • Migrate the cross join -> inner join optimization from the planner to the optimizer #2859
  • ObjectStore write support #2185
  • DataFusion should scan Parquet statistics once per query #871
  • Extend & generalize constant folding / evaluation in logical optimizer #237

Fixed bugs:

  • projection_push_down produces invalid aggregate plans in some cases #3738
  • Time With Time Zone should raise error until DataType::Time64 support tz #3715
  • SQL Planner doesn't distinguish normal CTEs from the recursive ones. #3713
  • Fix inconsistency between column name formats #3711
  • Optimizer rule 'projection_push_down' failed due to unexpected error: Error during planning: Aggregate schema has wrong number of fields. Expected 3 got 8 #3704
  • Optimizer regressions in unwrap_cast_in_comparison #3690
  • Internal error when evaluating a predicate = "The type of Dictionary(Int16, Utf8) = Int64 of binary physical should be same" #3685
  • Specialized regexp_replace should early-abort when the input arrays are empty #3647
  • Internal error: Failed to coerce types Decimal128(10, 2) and Boolean in BETWEEN expression #3646
  • Internal error: Failed to coerce types Decimal128(10, 2) and Boolean in BETWEEN expression #3645
  • Type coercion error: The type of Boolean AND Decimal128(10, 2) of binary physical should be same #3644
  • LEFT JOIN not working as expected, error message is confusing #3639
  • INTERSECT and EXCEPT don't return an error when the 2 sets have different numbers of columns #3632
  • datafusion-cli panics on a union of 2 tables with different numbers of columns. #3630
  • The expression col(a) / null is not optimized. #3624
  • s3_build_error test may fail in some environments #3601
  • New clippy errors appear to break the CI on master #3597
  • StringConcat gives inconsistent result with concat when containing null #3569
  • simplify_expressions doesn't support different data types for binary #3556
  • Broken logical plan serialization for aggregation queries #3555
  • Aggregate filters do not get pushed down to table scan #3546
  • docs.rs cannot build datafusion-proto crate #3538
  • DataFusion serialization doesn't handle ScalarValue::Dictionary, Binary, LargeBinary, Time64, IntervalMonthDayNano, Struct #3531
  • What should be returned when trying to get a config in invalid format? #3505
  • Dividing decimal type gives wrong error: "170141183460469231731687303715884105727 is too large to store in a Decimal128" #3498
  • Add BitwiseXor in function from_proto_binary_op #3495
  • comparison operations with a scalar null and decimal array panics #3487
  • Union columns with different types #3467
  • Can't get the right logical plan after optimizer #3421
  • Fix conflict between simplify_expression rule and CAST expressions #3409
  • Empty array giving error #2439
  • Internal error: Unsupported data type in hasher: FixedSizeBinary(16) #1516
  • Predicates on to_timestamp do not work as expected with "naive" timestamp strings #765
  • Address performance/execution plan of TPCH query 19 #78
  • Bug fix: expr_visitor was not visiting aggregate filter expressions #3548 (andygrove)

Documentation updates:

  • Publish 8.0.0 user guide #2558
  • MINOR: Add Dask SQL to list of projects powered by DataFusion #3581 (andygrove)
  • Add Parseable as Datafusion user #3471 (nitisht)

Closed issues:

  • Upgrade to Arrow 24.0.0 #3689
  • what's the best practice to get a single value from arrow array? #3497
  • The data type of predicate in the row filter should be same in the binary expr #3469
  • Extend constant folding and parquet filtering support #188
  • Add FORMAT to explain plan and an easy to visualize format #96

Merged pull requests: