Skip to content

Commit

Permalink
Upgrade DataFusion to latest, to include fixes for aggregation (#216)
Browse files Browse the repository at this point in the history
* Cleanup logical optimizer rules.  (apache#7919)

* Initial commit

* Address todos

* Update comments

* Simplifications

* Minor simplifications

* Address reviews

* Add TableScan constructor

* Minor changes

* make try_new_with_schema method of Aggregate private

* Use projection try_new instead of try_new_schema

* Simplifications, add comment

* Review changes

* Improve comments

* Move get_wider_type to type_coercion module

* Clean up type coercion file

---------

Co-authored-by: berkaysynnada <berkay.sahin@synnada.ai>
Co-authored-by: Mehmet Ozan Kabak <ozankabak@gmail.com>

* Parallelize Serialization of Columns within Parquet RowGroups (apache#7655)

* merge main

* fixes and cmt

* review comments, tuning parameters, updating docs

* cargo fmt

* reduce default buffer size to 2 and update docs

* feat: Use bloom filter when reading parquet to skip row groups  (apache#7821)

* feat: implement read bloom filter support

* test: add unit test for read bloom filter

* Simplify bloom filter application

* test: add unit test for bloom filter with sql `in`

* fix: imrpove bloom filter match express

* fix: add more test for bloom filter

* ci: rollback dependences

* ci: merge main branch

* fix: unit tests for bloom filter

* ci: cargo clippy

* ci: cargo clippy

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* fix: don't push down volatile predicates in projection (apache#7909)

* fix: don't push down volatile predicates in projection

* Update datafusion/optimizer/src/push_down_filter.rs

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* Update datafusion/optimizer/src/push_down_filter.rs

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* Update datafusion/optimizer/src/push_down_filter.rs

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* add suggestions

* fix

* fix doc

* Update datafusion/optimizer/src/push_down_filter.rs

Co-authored-by: Jonah Gao <jonahgaox@gmail.com>

* Update datafusion/optimizer/src/push_down_filter.rs

Co-authored-by: Jonah Gao <jonahgaox@gmail.com>

* Update datafusion/optimizer/src/push_down_filter.rs

Co-authored-by: Jonah Gao <jonahgaox@gmail.com>

* Update datafusion/optimizer/src/push_down_filter.rs

Co-authored-by: Jonah Gao <jonahgaox@gmail.com>

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: Jonah Gao <jonahgaox@gmail.com>

* Add `parquet` feature flag, enabled by default, and make parquet conditional  (apache#7745)

* Make parquet an option by adding multiple cfg attributes without significant code changes.

* Extract parquet logic into submodule from execution::context

* Extract parquet logic into submodule from datafusion_core::dataframe

* Extract more logic into submodule from execution::context

* Move tests from execution::context

* Rename submodules

* [MINOR]: Simplify enforce_distribution, minor changes (apache#7924)

* Initial commit

* Simplifications

* Cleanup imports

* Review

---------

Co-authored-by: Mehmet Ozan Kabak <ozankabak@gmail.com>

* Add simple window query to sqllogictest (apache#7928)

* ci: upgrade node to version 20 (apache#7918)

* Change input for `to_timestamp` function to be seconds rather than nanoseconds, add `to_timestamp_nanos` (apache#7844)

* Change input for `to_timestamp` function

* docs

* fix examples

* output `to_timestamp` signature as ns

* Minor: Document `parquet` crate feature (apache#7927)

* Minor: reduce some #cfg(feature = "parquet") (apache#7929)

* Minor: reduce use of cfg(parquet) in tests (apache#7930)

* Fix CI failures on `to_timestamp()` calls (apache#7941)

* Change input for `to_timestamp` function

* docs

* fix examples

* output `to_timestamp` signature as ns

* Fix CI `to_timestamp()` failed

* Update datafusion/expr/src/built_in_function.rs

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* fix typo

* fix

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* minor: add a datatype casting for the updated value (apache#7922)

* minor: cast the updated value to the data type of target column

* Update datafusion/sqllogictest/test_files/update.slt

Co-authored-by: Alex Huang <huangweijun1001@gmail.com>

* Update datafusion/sqllogictest/test_files/update.slt

Co-authored-by: Alex Huang <huangweijun1001@gmail.com>

* Update datafusion/sqllogictest/test_files/update.slt

Co-authored-by: Alex Huang <huangweijun1001@gmail.com>

* fix tests

---------

Co-authored-by: Alex Huang <huangweijun1001@gmail.com>

* fix (apache#7946)

* Add simple exclude all columns test to sqllogictest (apache#7945)

* Add simple exclude all columns test to sqllogictest

* Add more exclude test cases

* Support Partitioning Data by Dictionary Encoded String Array Types (apache#7896)

* support dictionary encoded string columns for partition cols

* remove debug prints

* cargo fmt

* generic dictionary cast and dict encoded test

* updates from review

* force retry checks

* try checks again

* Minor: Remove array() in array_expression (apache#7961)

* remove array

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

* cleanup others

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

* clippy

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

* cleanup cast

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

* fmt

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

* cleanup cast

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

---------

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

* Minor: simplify update code (apache#7943)

* Add some initial content about creating logical plans (apache#7952)

* Minor: Change from `&mut SessionContext` to `&SessionContext` in substrait (apache#7965)

* Lower &mut SessionContext in substrait

* rm mut ctx in tests

* Fix crate READMEs (apache#7964)

* Minor: Improve `HashJoinExec` documentation (apache#7953)

* Minor: Improve `HashJoinExec` documentation

* Apply suggestions from code review

Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com>

---------

Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com>

* chore: clean useless clone baesd on clippy (apache#7973)

* Add README.md to `core`, `execution` and `physical-plan` crates (apache#7970)

* Add README.md to `core`, `execution` and `physical-plan` crates

* prettier

* Update datafusion/physical-plan/README.md

* Update datafusion/wasmtest/README.md

---------

Co-authored-by: Daniël Heres <danielheres@gmail.com>

* Move source repartitioning into `ExecutionPlan::repartition` (apache#7936)

* Move source repartitioning into ExecutionPlan::repartition

* cleanup

* update test

* update test

* refine docs

* fix merge

* minor: fix broken links in README.md (apache#7986)

* minor: fix broken links in README.md

* fix proto link

* Minor: Upate the `sqllogictest` crate README (apache#7971)

* Minor: Upate the sqllogictest crate README

* prettier

* Apply suggestions from code review

Co-authored-by: Jonah Gao <jonahgaox@gmail.com>
Co-authored-by: jakevin <jakevingoo@gmail.com>

---------

Co-authored-by: Jonah Gao <jonahgaox@gmail.com>
Co-authored-by: jakevin <jakevingoo@gmail.com>

* Improve MemoryCatalogProvider default impl block placement (apache#7975)

* Fix `ScalarValue` handling of NULL values for ListArray (apache#7969)

* Fix try_from_array data type for NULL value in ListArray

* Fix

* Explicitly assert the datatype

* For review

* Refactor of Ordering and Prunability Traversals and States (apache#7985)

* simplify ExprOrdering

* Comment improvements

* Move map/transform comment up

---------

Co-authored-by: Mehmet Ozan Kabak <ozankabak@gmail.com>

* Keep output as scalar for scalar function if all inputs are scalar (apache#7967)

* Keep output as scalar for scalar function if all inputs are scalar

* Add end-to-end tests

* Fix crate READMEs for core, execution, physical-plan (apache#7990)

* Update sqlparser requirement from 0.38.0 to 0.39.0 (apache#7983)

* chore: Update sqlparser requirement from 0.38.0 to 0.39.0

* support FILTER Aggregates

* Fix panic in multiple distinct aggregates by fixing `ScalarValue::new_list` (apache#7989)

* Fix panic in multiple distinct aggregates by fixing ScalarValue::new_list

* Update datafusion/common/src/scalar.rs

Co-authored-by: Daniël Heres <danielheres@gmail.com>

---------

Co-authored-by: Daniël Heres <danielheres@gmail.com>

* MemoryReservation exposes MemoryConsumer (apache#8000)

... as a getter method.

* fix: generate logical plan for `UPDATE SET FROM` statement (apache#7984)

* Create temporary files for reading or writing (apache#8005)

* Create temporary files for reading or writing

* nit

* addr comment

---------

Co-authored-by: zhongjingxiong <zhongjingxiong@bytedance.com>

* doc: minor fix to SortExec::with_fetch comment (apache#8011)

* Fix: dataframe_subquery example Optimizer rule `common_sub_expression_eliminate` failed (apache#8016)

* Fix: Optimizer rule 'common_sub_expression_eliminate' failed

* nit

* nit

* nit

---------

Co-authored-by: zhongjingxiong <zhongjingxiong@bytedance.com>

* Percent Decode URL Paths (apache#8009) (apache#8012)

* Treat ListingTableUrl as URL-encoded (apache#8009)

* Update lockfile

* Review feedback

* Minor: Extract common deps into workspace (apache#7982)

* Improve datafusion-*

* More common crates

* Extract async-trait

* Extract more

* Fix cli

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* minor: change some plan_err to exec_err (apache#7996)

* minor: change some plan_err to exec_err

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* change unreachable code to internal error

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* Minor: error on unsupported RESPECT NULLs syntax (apache#7998)

* Minor: error on unsupported RESPECT NULLs syntax

* fix clippy

* Update datafusion/sql/tests/sql_integration.rs

Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com>

---------

Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com>

* GroupedHashAggregateStream breaks spill batch (apache#8004)

... into smaller chunks to decrease memory required for merging.

* Minor: Add implementation examples to ExecutionPlan::execute (apache#8013)

* Add implementation examples to ExecutionPlan::execute

* Review feedback

* address comment (apache#7993)

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

* GroupedHashAggregateStream should register spillable consumer (apache#8002)

* fix: single_distinct_aggretation_to_group_by fail (apache#7997)

* fix: single_distinct_aggretation_to_group_by faile

* fix

* move test to groupby.slt

* Read only enough bytes to infer Arrow IPC file schema via stream (apache#7962)

* Read only enough bytes to infer Arrow IPC file schema via stream

* Error checking for collect bytes func

* Update datafusion/core/src/datasource/file_format/arrow.rs

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* Minor: remove a strange char (apache#8030)

* Minor: Improve documentation for Filter Pushdown (apache#8023)

* Minor: Improve documentation for Fulter Pushdown

* Update datafusion/optimizer/src/push_down_filter.rs

Co-authored-by: jakevin <jakevingoo@gmail.com>

* Apply suggestions from code review

* Update datafusion/optimizer/src/push_down_filter.rs

Co-authored-by: Alex Huang <huangweijun1001@gmail.com>

---------

Co-authored-by: jakevin <jakevingoo@gmail.com>
Co-authored-by: Alex Huang <huangweijun1001@gmail.com>

* Minor: Improve `ExecutionPlan` documentation (apache#8019)

* Minor: Improve `ExecutionPlan` documentation

* Add link to Partitioning

* fix: clippy warnings from nightly rust 1.75 (apache#8025)

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* Minor: Avoid recomputing compute_array_ndims in align_array_dimensions (apache#7963)

* Refactor align_array_dimensions

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

* address comment

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

* remove unwrap

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

* address comment

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

* fix rebase

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

---------

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

* Minor: fix doc check (apache#8037)

* Minor: remove uncessary #cfg test (apache#8036)

* Minor: remove uncessary #cfg test

* fmt

* Update datafusion/core/src/datasource/file_format/arrow.rs

Co-authored-by: Ruihang Xia <waynestxia@gmail.com>

---------

Co-authored-by: Daniël Heres <danielheres@gmail.com>
Co-authored-by: Ruihang Xia <waynestxia@gmail.com>

* Minor: Improve documentation for  `PartitionStream` and `StreamingTableExec` (apache#8035)

* Minor: Improve documentation for  `PartitionStream` and `StreamingTableExec`

* fmt

* fmt

* Combine Equivalence and Ordering equivalence to simplify state (apache#8006)

* combine equivalence and ordering equivalence

* Remove EquivalenceProperties struct

* Minor changes

* all tests pass

* Refactor oeq

* Simplifications

* Resolve linter errors

* Minor changes

* Minor changes

* Add new tests

* Simplifications window mode selection

* Simplifications

* Use set_satisfy api

* Use utils for aggregate

* Minor changes

* Minor changes

* Minor changes

* All tests pass

* Simplifications

* Simplifications

* Minor changes

* Simplifications

* All tests pass, fix bug

* Remove unnecessary code

* Simplifications

* Minor changes

* Simplifications

* Move oeq join to methods

* Simplifications

* Remove redundant code

* Minor changes

* Minor changes

* Simplifications

* Simplifications

* Simplifications

* Move window to util from method, simplifications

* Simplifications

* Propagate meet in the union

* Simplifications

* Minor changes, rename

* Address berkay reviews

* Simplifications

* Add new buggy test

* Add data test for sort requirement

* Add experimental check

* Add random test

* Minor changes

* Random test gives error

* Fix missing test case

* Minor changes

* Minor changes

* Simplifications

* Minor changes

* Add new test case

* Minor changes

* Address reviews

* Minor changes

* Increase coverage of random tests

* Remove redundant code

* Simplifications

* Simplifications

* Refactor on tests

* Solving clippy errors

* prune_lex improvements

* Fix failing tests

* Update get_finer and get_meet

* Fix window lex ordering implementation

* Buggy state

* Do not use output ordering in the aggregate

* Add union test

* Update comment

* Fix bug, when batch_size is small

* Review Part 1

* Review Part 2

* Change union meet implementation

* Update comments

* Remove redundant check

* Simplify project out_expr function

* Remove Option<Vec<_>> API.

* Do not use project_out_expr

* Simplifications

* Review Part 3

* Review Part 4

* Review Part 5

* Review Part 6

* Review Part 7

* Review Part 8

* Update comments

* Add new unit tests, simplifications

* Resolve linter errors

* Simplify test codes

* Review Part 9

* Add unit tests for remove_redundant entries

* Simplifications

* Review Part 10

* Fix test

* Add new test case, fix implementation

* Review Part 11

* Review Part 12

* Update comments

* Review Part 13

* Review Part 14

* Review Part 15

* Review Part 16

* Review Part 17

* Review Part 18

* Review Part 19

* Review Part 20

* Review Part 21

* Review Part 22

* Review Part 23

* Review Part 24

* Do not construct idx and sort_expr unnecessarily, Update comments, Union meet single entry

* Review Part 25

* Review Part 26

* Name Changes, comment updates

* Review Part 27

* Add issue links

* Address reviews

* Fix failing test

* Update comments

* SortPreservingMerge, SortPreservingRepartition only preserves given expression ordering among input ordering equivalences

---------

Co-authored-by: metesynnada <100111937+metesynnada@users.noreply.github.com>
Co-authored-by: Mehmet Ozan Kabak <ozankabak@gmail.com>

* Encapsulate `ProjectionMapping` as a struct (apache#8033)

* Minor: Fix bugs in docs for `to_timestamp`, `to_timestamp_seconds`, ... (apache#8040)

* Minor: Fix bugs in docs for `to_timestamp`, `to_timestamp_seconds`, etc

* prettier

* Update docs/source/user-guide/sql/scalar_functions.md

Co-authored-by: comphead <comphead@users.noreply.github.com>

* Update docs/source/user-guide/sql/scalar_functions.md

Co-authored-by: comphead <comphead@users.noreply.github.com>

---------

Co-authored-by: comphead <comphead@users.noreply.github.com>

* Improve comments for `PartitionSearchMode` struct (apache#8047)

* Improve comments

* Make comments partition/group agnostic

* General approach for Array replace (apache#8050)

* checkpoint

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

* optimize non-list

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

* replace list ver

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

* cleanup

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

* rename

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

* cleanup

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

---------

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

* Minor: Remove the irrelevant note from the Expression API doc (apache#8053)

* Minor: Add more documentation about Partitioning (apache#8022)

* Minor: Add more documentation about Partitioning

* fix typo

* Apply suggestions from code review

Co-authored-by: comphead <comphead@users.noreply.github.com>

* Add more diagrams, improve text

* undo unintended changes

* undo unintended changes

* fix links

* Try and clarify

---------

Co-authored-by: comphead <comphead@users.noreply.github.com>

* Minor: improve documentation for IsNotNull, DISTINCT, etc (apache#8052)

* Minor: improve documentation for IsNotNull, DISTINCT, etc

* fix

* Prepare 33.0.0 Release (apache#8057)

* changelog

* update version

* update changelog

* Minor: improve error message by adding types to message (apache#8065)

* Minor: improve error message

* add test

* Minor: Remove redundant BuiltinScalarFunction::supports_zero_argument() (apache#8059)

* deprecate BuiltinScalarFunction::supports_zero_argument()

* unify old supports_zero_argument() impl

* Add example to ci (apache#8060)

* feat: add example to ci

* nit

* addr comments

---------

Co-authored-by: zhongjingxiong <zhongjingxiong@bytedance.com>

* Update substrait requirement from 0.18.0 to 0.19.0 (apache#8076)

Updates the requirements on [substrait](https://github.com/substrait-io/substrait-rs) to permit the latest version.
- [Release notes](https://github.com/substrait-io/substrait-rs/releases)
- [Changelog](https://github.com/substrait-io/substrait-rs/blob/main/CHANGELOG.md)
- [Commits](substrait-io/substrait-rs@v0.18.0...v0.19.0)

---
updated-dependencies:
- dependency-name: substrait
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Fix incorrect results in COUNT(*) queries with LIMIT (apache#8049)

Co-authored-by: Mark Sirek <sirek@cockroachlabs.com>

* feat: Support determining extensions from names like `foo.parquet.snappy` as well as `foo.parquet` (apache#7972)

* feat: read files based on the file extention

* fix: some the file extension might be started with . and some not

* fix: rename extention to extension

* chore: use exec_err

* chore: rename extention to extension

* chore: rename extention to extension

* chore: simplify the code

* fix: check table is empty

* ci: fix test

* fix: add err info

* refactor: extract the logic to infer_types

* fix: add tests for different extensions

* fix: ci clippy

* fix: add more tests

* fix: simplify the logic

* fix: ci

* Use FairSpillPool for TaskContext with spillable config (apache#8072)

* Minor: Improve HashJoinStream docstrings (apache#8070)

* Minor: Improve HashJoinStream docstrings

* fix comments

* Update datafusion/physical-plan/src/joins/hash_join.rs

Co-authored-by: comphead <comphead@users.noreply.github.com>

* Update datafusion/physical-plan/src/joins/hash_join.rs

Co-authored-by: comphead <comphead@users.noreply.github.com>

---------

Co-authored-by: Daniël Heres <danielheres@gmail.com>
Co-authored-by: comphead <comphead@users.noreply.github.com>

* Fixing broken link (apache#8085)

* Fixing broken link

* Update docs/source/contributor-guide/index.md

Thanks for spotting this as well

Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com>

---------

Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com>

* fix: DataFusion suggests invalid functions (apache#8083)

* fix: DataFusion suggests invalid functions

* update test

* Add test for BuiltInWindowFunction

* Replace macro with function for  `array_repeat` (apache#8071)

* General array repeat

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

* cleanup

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

* cleanup

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

* cleanup

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

* add test

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

* add test

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

* done

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

* remove test

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

* add comment

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

* fm

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

---------

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

* Minor: remove unnecessary projection in `single_distinct_to_group_by` rule (apache#8061)

* Minor: remove unnecessary projection

* fix ci

* minor: Remove duplicate version numbers for arrow, object_store, and parquet dependencies (apache#8095)

* remove duplicate version numbers for arrow, object_store, and parquet dependencies

* cargo update

* use default features in parquet crate

* disable default parquet features in wasmtest

* fix: add match encode/decode  scalar function type (apache#8089)

* feat: Protobuf serde for Json file sink (apache#8062)

* Protobuf serde for Json file sink

* Fix tests

* Fix test

* Minor: use `Expr::alias` in a few places to make the code more concise (apache#8097)

* Minor: Cleanup BuiltinScalarFunction::return_type() (apache#8088)

* Expose metrics from FileSinkExec impl of ExecutionPlan

---------

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Mustafa Akur <106137913+mustafasrepo@users.noreply.github.com>
Co-authored-by: berkaysynnada <berkay.sahin@synnada.ai>
Co-authored-by: Mehmet Ozan Kabak <ozankabak@gmail.com>
Co-authored-by: Devin D'Angelo <devinjdangelo@gmail.com>
Co-authored-by: Hengfei Yang <hengfei.yang@gmail.com>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: Huaijin <haohuaijin@gmail.com>
Co-authored-by: Jonah Gao <jonahgaox@gmail.com>
Co-authored-by: Chih Wang <ongchi@users.noreply.github.com>
Co-authored-by: Jeffrey <22608443+Jefffrey@users.noreply.github.com>
Co-authored-by: Marco Neumann <marco@crepererum.net>
Co-authored-by: comphead <comphead@users.noreply.github.com>
Co-authored-by: Alex Huang <huangweijun1001@gmail.com>
Co-authored-by: Jay Zhan <jayzhan211@gmail.com>
Co-authored-by: Andy Grove <andygrove73@gmail.com>
Co-authored-by: yi wang <48236141+my-vegetable-has-exploded@users.noreply.github.com>
Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Co-authored-by: jakevin <jakevingoo@gmail.com>
Co-authored-by: 张林伟 <lewiszlw520@gmail.com>
Co-authored-by: Berkay Şahin <124376117+berkaysynnada@users.noreply.github.com>
Co-authored-by: Marko Milenković <milenkovicm@users.noreply.github.com>
Co-authored-by: jokercurry <982458633@qq.com>
Co-authored-by: zhongjingxiong <zhongjingxiong@bytedance.com>
Co-authored-by: Weston Pace <weston.pace@gmail.com>
Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>
Co-authored-by: Ruihang Xia <waynestxia@gmail.com>
Co-authored-by: metesynnada <100111937+metesynnada@users.noreply.github.com>
Co-authored-by: Yongting You <2010youy01@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Mark Sirek <mwsirek@gmail.com>
Co-authored-by: Mark Sirek <sirek@cockroachlabs.com>
Co-authored-by: Edmondo Porcu <edmondo.porcu@gmail.com>
Co-authored-by: Syleechan <38198463+Syleechan@users.noreply.github.com>
Co-authored-by: Dan Harris <dan@coralogix.com>
  • Loading branch information
Show file tree
Hide file tree
Showing 217 changed files with 12,721 additions and 7,719 deletions.
2 changes: 1 addition & 1 deletion .github/pull_request_template.md
Original file line number Diff line number Diff line change
Expand Up @@ -37,4 +37,4 @@ If there are user-facing changes then we may require documentation to be updated

<!--
If there are any breaking changes to public APIs, please add the `api change` label.
-->
-->
2 changes: 1 addition & 1 deletion .github/workflows/dev.yml
Original file line number Diff line number Diff line change
Expand Up @@ -43,7 +43,7 @@ jobs:
- uses: actions/checkout@v4
- uses: actions/setup-node@v4
with:
node-version: "14"
node-version: "20"
- name: Prettier check
run: |
# if you encounter error, rerun the command below and commit the changes
Expand Down
16 changes: 2 additions & 14 deletions .github/workflows/rust.yml
Original file line number Diff line number Diff line change
Expand Up @@ -139,19 +139,7 @@ jobs:
# test datafusion-sql examples
cargo run --example sql
# test datafusion-examples
cargo run --example avro_sql --features=datafusion/avro
cargo run --example csv_sql
cargo run --example custom_datasource
cargo run --example dataframe
cargo run --example dataframe_in_memory
cargo run --example deserialize_to_struct
cargo run --example expr_api
cargo run --example parquet_sql
cargo run --example parquet_sql_multiple_files
cargo run --example memtable
cargo run --example rewrite_expr
cargo run --example simple_udf
cargo run --example simple_udaf
ci/scripts/rust_example.sh
- name: Verify Working Directory Clean
run: git diff --exit-code

Expand Down Expand Up @@ -527,7 +515,7 @@ jobs:
rust-version: stable
- uses: actions/setup-node@v4
with:
node-version: "14"
node-version: "20"
- name: Check if configs.md has been modified
run: |
# If you encounter an error, run './dev/update_config_docs.sh' and commit
Expand Down
41 changes: 38 additions & 3 deletions Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -32,6 +32,7 @@ members = [
"datafusion/substrait",
"datafusion/wasmtest",
"datafusion-examples",
"docs",
"test-utils",
"benchmarks",
]
Expand All @@ -45,17 +46,50 @@ license = "Apache-2.0"
readme = "README.md"
repository = "https://github.com/apache/arrow-datafusion"
rust-version = "1.70"
version = "32.0.0"
version = "33.0.0"

[workspace.dependencies]
arrow = { version = "48.0.0", features = ["prettyprint"] }
arrow-array = { version = "48.0.0", default-features = false, features = ["chrono-tz"] }
arrow-buffer = { version = "48.0.0", default-features = false }
arrow-flight = { version = "48.0.0", features = ["flight-sql-experimental"] }
arrow-ord = { version = "48.0.0", default-features = false }
arrow-schema = { version = "48.0.0", default-features = false }
parquet = { version = "48.0.0", features = ["arrow", "async", "object_store"] }
sqlparser = { version = "0.38.0", features = ["visitor"] }
async-trait = "0.1.73"
bigdecimal = "0.4.1"
bytes = "1.4"
ctor = "0.2.0"
datafusion = { path = "datafusion/core" }
datafusion-common = { path = "datafusion/common" }
datafusion-expr = { path = "datafusion/expr" }
datafusion-sql = { path = "datafusion/sql" }
datafusion-optimizer = { path = "datafusion/optimizer" }
datafusion-physical-expr = { path = "datafusion/physical-expr" }
datafusion-physical-plan = { path = "datafusion/physical-plan" }
datafusion-execution = { path = "datafusion/execution" }
datafusion-proto = { path = "datafusion/proto" }
datafusion-sqllogictest = { path = "datafusion/sqllogictest" }
datafusion-substrait = { path = "datafusion/substrait" }
dashmap = "5.4.0"
doc-comment = "0.3"
env_logger = "0.10"
futures = "0.3"
half = "2.2.1"
indexmap = "2.0.0"
itertools = "0.11"
log = "^0.4"
num_cpus = "1.13.0"
object_store = { version = "0.7.0", default-features = false }
parking_lot = "0.12"
parquet = { version = "48.0.0", default-features = false, features = ["arrow", "async", "object_store"] }
rand = "0.8"
rstest = "0.18.0"
serde_json = "1"
sqlparser = { version = "0.39.0", features = ["visitor"] }
tempfile = "3"
thiserror = "1.0.44"
chrono = { version = "0.4.31", default-features = false }
url = "2.2"

[profile.release]
codegen-units = 1
Expand All @@ -74,3 +108,4 @@ opt-level = 3
overflow-checks = false
panic = 'unwind'
rpath = false

2 changes: 2 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -47,6 +47,7 @@ Default features:
- `compression`: reading files compressed with `xz2`, `bzip2`, `flate2`, and `zstd`
- `crypto_expressions`: cryptographic functions such as `md5` and `sha256`
- `encoding_expressions`: `encode` and `decode` functions
- `parquet`: support for reading the [Apache Parquet] format
- `regex_expressions`: regular expression functions, such as `regexp_match`
- `unicode_expressions`: Include unicode aware functions such as `character_length`

Expand All @@ -59,6 +60,7 @@ Optional features:
- `simd`: enable arrow-rs's manual `SIMD` kernels (requires Rust `nightly`)

[apache avro]: https://avro.apache.org/
[apache parquet]: https://parquet.apache.org/

## Rust Version Compatibility

Expand Down
20 changes: 10 additions & 10 deletions benchmarks/Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -18,7 +18,7 @@
[package]
name = "datafusion-benchmarks"
description = "DataFusion Benchmarks"
version = "32.0.0"
version = "33.0.0"
edition = { workspace = true }
authors = ["Apache Arrow <dev@arrow.apache.org>"]
homepage = "https://github.com/apache/arrow-datafusion"
Expand All @@ -34,20 +34,20 @@ snmalloc = ["snmalloc-rs"]

[dependencies]
arrow = { workspace = true }
datafusion = { path = "../datafusion/core", version = "32.0.0" }
datafusion-common = { path = "../datafusion/common", version = "32.0.0" }
env_logger = "0.10"
futures = "0.3"
log = "^0.4"
datafusion = { path = "../datafusion/core", version = "33.0.0" }
datafusion-common = { path = "../datafusion/common", version = "33.0.0" }
env_logger = { workspace = true }
futures = { workspace = true }
log = { workspace = true }
mimalloc = { version = "0.1", optional = true, default-features = false }
num_cpus = "1.13.0"
parquet = { workspace = true }
num_cpus = { workspace = true }
parquet = { workspace = true, default-features = true }
serde = { version = "1.0.136", features = ["derive"] }
serde_json = "1.0.78"
serde_json = { workspace = true }
snmalloc-rs = { version = "0.3", optional = true }
structopt = { version = "0.3", default-features = false }
test-utils = { path = "../test-utils/", version = "0.1.0" }
tokio = { version = "^1.0", features = ["macros", "rt", "rt-multi-thread", "parking_lot"] }

[dev-dependencies]
datafusion-proto = { path = "../datafusion/proto", version = "32.0.0" }
datafusion-proto = { path = "../datafusion/proto", version = "33.0.0" }
35 changes: 35 additions & 0 deletions ci/scripts/rust_example.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,35 @@
#!/usr/bin/env bash
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

set -ex
cd datafusion-examples/examples/
cargo fmt --all -- --check

files=$(ls .)
for filename in $files
do
example_name=`basename $filename ".rs"`
# Skip tests that rely on external storage and flight
# todo: Currently, catalog.rs is placed in the external-dependence directory because there is a problem parsing
# the parquet file of the external parquet-test that it currently relies on.
# We will wait for this issue[https://github.com/apache/arrow-datafusion/issues/8041] to be resolved.
if [ ! -d $filename ]; then
cargo run --example $example_name
fi
done
Loading

0 comments on commit ca4b6ee

Please sign in to comment.