11.0.0 (2022-08-16)
Breaking changes:
Implemented enhancements:
- Make RowAccumulator public #3138
- docs: proposal for consolidating docs into a Contributor Guide #3127
- feat: support Timestamp +/- Interval #3103
- An `arrow_typeof` like PostgreSQL's `pg_typeof` #3095
- Add DataFrame section to user guide #3066
- Document all scalar SQL functions in user guide #3065
- Simplify implementation of approx_median so that it can be exposed in Python #3063
- Support double quoted literal strings for dialects (such as mysql, bigquery) #3055
- Simplify / speed up implementation of character_length to unicode points #3049
- Follow-up on Clickbench benchmark #3048
- Why is the PhysicalPlanner an async trait? #3032
- Optimize file stream metrics. #3024
- Proposal: Enable typed strings expressions for VALUES clause #3017
- Proposal: Add `date_bin` function #3015
- The upcoming release of Arrow (20?) breaks datafusion #3006
- Can I select some files for query based on the filtering rules in the directory? #2993
- Rename FormatReader to FileOpener #2990
- Derive `Hash` trait for `JoinType` #2971
- CAST from Utf8 to Boolean #2967
- Add baseline_metrics for FileStream to record metrics like elapsed time, record output, etc #2961
- Example to show how to convert query result into rust struct #2959
- simplify not clause #2957
- Implement Debug for ColumnarValue #2950
- Parallel fetching of column chunks when reading parquet files #2949
- Extension mechanism for `SessionConfig` #2939
- Streaming CSV/JSON Object Store Read #2935
- Support CSV Limit Pushdown to Object Storage #2930
- Add support for `pow` scalar function #2926
- Add support for exact `median` aggregate function #2925
- Support `mean` as synonym for `avg` #2922
- Rename a column name #2919
- Move `ScalarValue` tests alongside implementation, move `from_slice` to `core` #2913
- Fail gracefully if optimization rule fails #2908
- Make ObjectStoreRegistry as a trait which can allow Ballista to introduce a self registry ObjectStoreRegistry #2905
- Remove datafusion-data-access crate #2903
- Improve formatting of logical plans containing subquery expressions #2898
- Atan2 added to built-in functions #2897
- The explain statements only print logical plans for debug/other purpose. #2894
- JSON version of `display_indent()` #2889
- It would be nice to have a way to generate unique IDs in optimizer rules #2886
- Add support for `TIME` literal values #2883
- Add h2o benchmark #2879
- Implement `from_unixtime` function #2871
- Add `cast` function for creating logical cast expression #2870
- Release DataFusion 10.0.0 #2862
- Implement `information_schema.views` #2857
- Migrate from avro_rs to apache_avro #2783
- Add optimizer rule to remove `OFFSET 0` #2584
- Preserve Element Name in ScalarValue::List #2450
- Add EXISTS subquery support to Ballista #2338
- Add documentation on supported functions to datafusion website #1487
- documentations for datafusion-cli can be consolidated a bit more #1352
- Optimizer: Predicate Rewrite pass for TPCH Q19 #217
- feat: add optimize rule `rewrite_disjunctive_predicate` #2858 (xudong963)
Fixed bugs:
- Regression in SQL support for `ORDER BY` and aliased expressions #3160
- panic when deal with `@` operator #3137
- Incorrect type coercion rule for date + interval #3093
- Cast string to timestamp crash while we input time before 1970 with floating number second #3082
- INTEGER type doesn't work while importing csv #3059
- Cannot GROUP BY Binary #3050
- incorrect i32 coercion for `to_timestamp` #3046
- Error pruning `IsNull` expressions: Column 'instance_null_count' is declared as non-nullable but contains null values #3042
- I want to query some files in a directory. Is there any way? #3013
- The expression to get an indexed field is only valid for `List` types (`common_sub_expression_eliminate`) #3002
- Double to_timestamp_seconds produces abnormal result #2998
- External parquet table fails when schema contains differing key / value metadata #2982
- SELECT on column with uppercase column name fails with FieldNotFound error #2978
- panic reading AWS-generated parquet file #2963
- Can't filter rowgroup for parquet prune for some data type #2962
- CI test is failing with `final link failed: No space left on device` #2947
- bug: new ObjectStore breaks backward compatibility with contrib plugins #2931
- bug: file types handled wrong #2929
- bug: changing the number of partitions does not increase concurrency #2928
- csv_explain fails on RC verifier #2916
- index out of range error from datafusion_row::write::write_field #2910
- Optimization rule `CommonSubexprEliminate` creates invalid projections #2907
- serde_json requires that either `std` (default) or `alloc` feature is enabled #2896
- Inconsistent type coercion rules with comparison expressions #2890
- Doc Error: the test directory link 404 which is in CONTRIBUTING.md #2880
- Round trips through `ScalarValue`'s sometimes don't preserve types (e.g. change types from `DictionaryArray`) #2874
- Error with CASE and DictionaryArrays: `ArrowError(InvalidArgumentError("arguments need to have the same data type"))` #2873
- window functions not supported in expressions #2869
- Unable to work with month intervals #2796
- Discord invite link in communication page has expired #2743
- Test (path normalization) failures while verifying release candidate 9.0.0 RC1 #2719
- Reading parquet with (pre-release) arrow fails with "out of order projection is not supported" #2543
- Fix SQL planner bug when resolving columns with same name as a relation #3003 [sql] (andygrove)
- fix `RowWriter` index out of bounds error #2968 (comphead)
- fix: support decimal statistic for row group prune #2966 (liukun4515)
- Fix invalid projection in `CommonSubexprEliminate` #2915 (andygrove)
Documentation updates:
- MINOR: Fix broken links in contrib guide #3135 (andygrove)
- MINOR: User Guide: Move expressions to top-level page #3134 (andygrove)
- User Guide: Combine CLI pages #3133 (andygrove)
- User Guide: Add documentation for JOIN syntax #3130 (andygrove)
- separate contributors guide #3128 (kmitchener)
- minor: remove python docs, now they're in another project #3119 (kmitchener)
- minor: doc fixes: fix link to datafusion-python project and add link to slides for rece… #3118 (kmitchener)
- Add all scalar SQL functions to user guide #3090 (andygrove)
- Add DataFrame reference to the user guide #3067 (andygrove)
- MINOR: Add CeresDB to list of products using DataFusion #3060 (andygrove)
- Minor: improve some docstrings about pruning #3041 (alamb)
- doc: add a new video link about datafusion #3025 (xudong963)
- Update README.md to add CnosDB into the Known Uses #2933 (cnoshb)
Performance improvements:
Closed issues:
- Rename `do_data_time_math()` to `do_date_time_math()` #3172
- Automatic version updates for github actions with dependabot #3106
- [EPIC] Proposal for Date/Time enhancement #3100
- Upgrade prost/tonic everywhere #3028
- [Question] interested in helping with documentation #2866
- Introducing a new optimizer framework for datafusion. #2633
- Enable discussion tab? #2350
- Add support for AVG(Timestamp) types #200
- TPC-H Query 22 #175
- TPC-H Query 21 #172
- TPC-H Query 20 #171
- TPC-H Query 17 #168
- TPC-H Query 11 #163
- TPC-H Query 4 #160
- TPC-H Query 2 #159
- [Datafusion] Optimize literal expression evaluation #106
Merged pull requests:
- Rename do_data_time_math() to do_date_time_math() #3173 (JasonLi-cn)
- [Minor] Remove some redundant code #3169 (alamb)
- Support `INTEGER` again in addition to `INT` in `CREATE TABLE` and `CAST` statements #3167 [sql] (alamb)
- Fix regression in SQL parser related to resolution of aliased expressions #3165 [sql] (andygrove)
- update cargo lock #3164 (waitingkuo)
- add test case for cast_timestamp_before_1970 #3163 (waitingkuo)
- Return proper error message for ill formed variable reference #3162 (alamb)
- Remove outdated license text left over from arrow repo #3154 (alamb)
- Expose RowAccumulator in physical_plan #3151 (iajoiner)
- Rename `DateIntervalExpr` to `DateTimeIntervalExpr` #3150 (alamb)
- Bump actions/labeler from 4.0.0 to 4.0.1 #3144 (dependabot[bot])
- User Guide: Add documentation for subquery syntax #3132 (andygrove)
- MINOR: User Guide: Move Data Types and Information Schema to their own pages #3131 (andygrove)
- Minor: Clean up `array` test #3121 (alamb)
- add arrow_typeof #3120 (waitingkuo)
- Bump actions/labeler from 2.2.0 to 4.0.0 #3114 (dependabot[bot])
- Bump actions/checkout from 2 to 3 #3113 (dependabot[bot])
- Bump actions/setup-node from 2 to 3 #3112 (dependabot[bot])
- Bump actions/setup-python from 3 to 4 #3111 (dependabot[bot])
- Feature/support timestamp plus minus interval #3110 (JasonLi-cn)
- docs: fix typo #3109 (dzvon)
- Remove offset if its zero #3102 (turbo1912)
- Hash binary values #3098 [sql] (Dandandan)
- Update to object_store 0.4 #3089 (tustvold)
- Add cast function for creating cast expression #3084 (turbo1912)
- Upgrade to arrow 20.0.0 (but no change to object_store), including `prost`, and `tonic` #3083 [sql] (avantgardnerio)
- impl Debug for ColumnarValue, add some docs #3076 (alamb)
- [Minor] run cargo update in datafusion-cli directory #3075 (alamb)
- update cargo.lock in `datafusion-cli` #3074 (waitingkuo)
- Update sql parser to v0.20.0 #3072 [sql] (waitingkuo)
- Add opening, scanning, processing metrics in file stream #3070 (Ted-Jiang)
- Simplify `approx_median` implementation, expose via `DataFrame` API #3064 [sql] (andygrove)
- docs: fix PruningStatistics example and some typos #3062 (roeap)
- feat: support double quoted literal strings for dialects (such as mysql, bigquery, spark) #3056 [sql] (Rachelint)
- Allow Overriding AsyncFileReader used by ParquetExec #3051 (Cheappie)
- to_timestamp i32 coerced to i64 #3047 (waitingkuo)
- Fix `IsNull` pruning expression generation without null_count statistics #3044 (alamb)
- feat: Support `week`, `decade`, `century` for Interval literal #3038 [sql] (ovr)
- feat: Support Binary bitwise shift operators (<< and >>) #3037 [sql] (ovr)
- Use concat_elements_utf8 from arrow rather than custom kernel #3036 (alamb)
- minor: update minimal rust version to 1.62, matching arrow-rs #3035 [sql] (kmitchener)
- feat: Add `date_bin` built-in function #3034 (stuartcarnie)
- Split `binary_expr.rs` into smaller modules #3026 (alamb)
- feat: Enable typed strings expressions for VALUES clause #3018 [sql] (stuartcarnie)
- fix typo for PR3003 #3011 (waitingkuo)
- feat: Add support for TIME literal values #3010 [sql] (stuartcarnie)
- add TimeUnit::Second as signature for ToTimestampSeconds #3004 (waitingkuo)
- Rename FileReader to FileOpener (#2990) #2991 (tustvold)
- minor: collation the prune test #2986 (liukun4515)
- Optionally skip metadata from schema when merging parquet files #2985 (alamb)
- [Minor] Extract interval parsing logic, add unit tests #2984 [sql] (alamb)
- Update sqlparser to 0.19 #2981 [sql] (alamb)
- test: add file/SQL level test for pruning parquet row group with decimal data type. #2977 (liukun4515)
- Derive Hash for JoinType #2972 (liurenjie1024)
- Example that shows how to convert query result into rust struct #2959 #2969 (thomas-k-cameron)
- Add baseline_metrics for FileStream to record metrics like elapsed ti… #2965 (Ted-Jiang)
- test: add test for decimal and pruning for decimal column #2960 (liukun4515)
- Simplify expressions with `NOT` clause #2958 (AssHero)
- chore: update jit-related dependencies #2956 (xudong963)
- Update to arrow `19.0.0` #2955 [sql] (alamb)
- Remove CI Caching to preserve diskspace #2948 (alamb)
- Add metadata_size_hint for optimistic fetching of parquet metadata #2946 (thinkharderdev)
- Minor: Remove left over debugging statement #2944 (alamb)
- add Atan2 #2942 (waitingkuo)
- Use `Arc<ObjectStoreRegistry>` and remove ObjectStoreRegistry::clone #2941 (tustvold)
- add extension system to `SessionConfig` #2940 (crepererum)
- Update prost-build requirement from 0.7 to 0.10 #2937 (dependabot[bot])
- Add streaming JSON and CSV reading, `NewlineDelimitedStream` (#2935) #2936 (tustvold)
- feat(catalog): Implement information_schema.views #2934 [sql] (BaymaxHWY)
- Support `window` functions in expressions by re-write projection after building window plan #2932 [sql] (AssHero)
- Add pow as synonym for power #2927 (andygrove)
- Add `from_unixtime` function #2924 (waitingkuo)
- fix(aggregate): support mean as synonym avg #2923 (BaymaxHWY)
- Add `DataFrame::with_column_renamed` #2920 (andygrove)
- Run clippy with optional features #2918 (tustvold)
- Fix release verification script by not overriding `ARROW_TEST_DATA` or `PARQUET_TEST_DATA` #2917 (alamb)
- Move `ScalarValue` tests alongside implementation, move `from_slice` to `datafusion_core` #2914 (alamb)
- Optimizer should have option to skip failing rules #2909 (andygrove)
- Introduce ObjectStoreProvider to create an object store based on the url #2906 (yahoNanJing)
- Remove datafusion-data-access crate #2904 (yahoNanJing)
- Combine all comparison coercion rules #2901 (andygrove)
- Add `Projection::try_new` and `Projection::try_new_with_schema` #2900 (andygrove)
- Improve formatting of logical plans containing subqueries #2899 [sql] (andygrove)
- add session option 'datafusion.explain.logical_plan'. when set to true, the explain statement will only print logical plans. #2895 (AssHero)
- Preserve field name in `ScalarValue::List` #2893 [sql] (comphead)
- Adds optional serde support to datafusion-proto #2892 (tustvold)
- Implement `ScalarValue::Dictionary` and preserve type through conversion back/forth to Array #2891 (alamb)
- Add an ID generator in preparation for PR 2885 #2887 (avantgardnerio)
- Add support for correlated subqueries & fix all related TPC-H benchmark issues #2885 (avantgardnerio)
- fix(doc): update test directory link in CONTRIBUTING.md #2882 (BaymaxHWY)
- Add h2o bench groupby queries #2881 (andygrove)
- Add support for month & year intervals #2797 (avantgardnerio)
- Migrate from avro_rs (0.13) to apache_avro (0.14) #2784 (martin-g)