Incorporate dyn scalar kernels #1685

Merged
alamb merged 6 commits into apache:master from incorporate_dyn_scalar_kernels on Jan 30, 2022

matthewmturner
Contributor

Which issue does this PR close?

Closes #1610

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

@github-actions github-actions bot added the datafusion Changes in the datafusion crate label Jan 26, 2022
@matthewmturner matthewmturner force-pushed the incorporate_dyn_scalar_kernels branch from a1728b8 to 0e412eb Compare January 26, 2022 16:36
@matthewmturner
Contributor Author

@alamb I've started the work on this.

Given that the signature of the dyn scalar kernels was changed to use num::ToPrimitive and DataFusion uses ScalarValue, do you think we need to update ScalarValue to implement num::ToPrimitive?

@matthewmturner matthewmturner changed the title Start incorporating dyn scalar kernels Incorporate dyn scalar kernels Jan 26, 2022
Comment on lines 942 to 963
let is_numeric = DataType::is_numeric($LEFT.data_type());
let is_numeric_dict = match $LEFT.data_type() {
    DataType::Dictionary(_, val_type) => DataType::is_numeric(val_type),
    _ => false
};
let numeric_like = is_numeric | is_numeric_dict;

let is_string = ($LEFT.data_type() == &DataType::Utf8) | ($LEFT.data_type() == &DataType::LargeUtf8);
let is_string_dict = match $LEFT.data_type() {
    DataType::Dictionary(_, val_type) => match **val_type {
        DataType::Utf8 | DataType::LargeUtf8 => true,
        _ => false
    }
};
let string_like = is_string | is_string_dict;

let result: Result<Arc<dyn Array>> = if numeric_like {
    compute_op_dyn_scalar!($LEFT, $RIGHT, $OP)
} else if string_like {
    compute_utf8_op_dyn_scalar!($LEFT, $RIGHT, $OP)
} else {
    let r: Result<Arc<dyn Array>> = match $LEFT.data_type() {
Contributor Author

@alamb FYI, this is how I ended up implementing the pseudocode I previously mentioned.

if primitive_or_dict_vals_primitive {
    call_dyn_cmp_scalar(array, scalar, op)
} else if string_or_dict_vals_string {
    call_dyn_cmp_utf8_scalar(array, scalar, op)
} else {
    // Existing implementation for other types
}
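For readers following along, the array-type dispatch described above can be sketched in plain Rust. This is only an illustration under stated assumptions: `DataType` here is a trimmed-down, hypothetical stand-in for arrow's enum, and `KernelChoice` and `choose_kernel` are names invented for the sketch, not part of arrow-rs or DataFusion.

```rust
/// Hypothetical, trimmed-down stand-in for arrow's `DataType`.
#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Int32,
    Float64,
    Utf8,
    LargeUtf8,
    Boolean,
    /// Dictionary(key type, value type)
    Dictionary(Box<DataType>, Box<DataType>),
}

/// Which comparison path would be taken for a given array type
/// (illustrative names only).
#[derive(Debug, PartialEq)]
pub enum KernelChoice {
    DynScalar,
    DynUtf8Scalar,
    Existing,
}

fn is_numeric(dt: &DataType) -> bool {
    matches!(dt, DataType::Int32 | DataType::Float64)
}

fn is_string(dt: &DataType) -> bool {
    matches!(dt, DataType::Utf8 | DataType::LargeUtf8)
}

/// Picks a path from the array's type, looking one level into a
/// dictionary so that dictionary arrays follow their value type.
pub fn choose_kernel(left: &DataType) -> KernelChoice {
    let value_type = match left {
        DataType::Dictionary(_, value_type) => &**value_type,
        other => other,
    };
    if is_numeric(value_type) {
        KernelChoice::DynScalar
    } else if is_string(value_type) {
        KernelChoice::DynUtf8Scalar
    } else {
        // Everything else keeps the pre-existing per-type implementation.
        KernelChoice::Existing
    }
}
```

With this shape, a dictionary of strings takes the same path as a plain string array, and types such as booleans fall through to the existing implementation.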

@alamb
Contributor

alamb commented Jan 26, 2022

Given that the signature of the dyn scalar kernels was changed to use num::ToPrimitive and DataFusion uses ScalarValue, do you think we need to update ScalarValue to implement num::ToPrimitive?

That might let us clean up a bit of code 🤔

I am not 100% sure about it...

@matthewmturner
Contributor Author

@alamb OK. I'm going to play with it here for now to see how it works. If needed, I can always create a separate issue / PR to discuss and work on that before finishing this.

@matthewmturner
Contributor Author

I'm a bit stumped on this. I'm going to have to come back to it.

I thought I was getting close, but I keep coming back to the problem of how to use ScalarValue with the dyn kernels. Just implementing ToPrimitive on ScalarValue doesn't work because the kernels also require Copy, which I don't think we want to (or can) implement on ScalarValue.

Of note, I tried removing the Copy constraint from the xx_dyn_scalar kernels, and arrow still passed cargo check and cargo test. I'm wondering if we would be able to remove that constraint.

@alamb if you happen to have time and have any thoughts, let me know.

@alamb
Contributor

alamb commented Jan 27, 2022

Update: I think @matthewmturner fixed it upstream in apache/arrow-rs#1243.

In the interim you might be able to work around the issue with some sort of newtype hack like

struct ScalarValueWrapper(ScalarValue)

impl Copy for ScalarValueWrapper {
  fn copy(&self) -> Self { self.0.clone() }
}

or something yucky like that while we are waiting for the next arrow to be released
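One way the newtype idea could be made to compile is sketched below. Note that `Copy` is a marker trait with no methods, and it cannot be implemented for a type that owns a `String`, so this sketch wraps a borrowed reference instead of an owned value. Everything here is an assumption for illustration: `ToPrimitive` is a local one-method stand-in for `num::ToPrimitive`, `ScalarValue` is a trimmed-down stand-in for DataFusion's enum, and `ScalarValueRef` and `eq_scalar` are hypothetical names.

```rust
/// Hypothetical stand-in for `num::ToPrimitive` (only one method shown).
pub trait ToPrimitive {
    fn to_i64(&self) -> Option<i64>;
}

/// Hypothetical, trimmed-down stand-in for DataFusion's `ScalarValue`.
/// The owned `String` payload is what prevents this enum from being `Copy`.
#[derive(Debug, Clone)]
pub enum ScalarValue {
    Int32(Option<i32>),
    Int64(Option<i64>),
    Utf8(Option<String>),
}

impl ToPrimitive for ScalarValue {
    fn to_i64(&self) -> Option<i64> {
        match self {
            ScalarValue::Int32(v) => v.map(i64::from),
            ScalarValue::Int64(v) => *v,
            ScalarValue::Utf8(_) => None,
        }
    }
}

/// Borrowing newtype: a shared reference is always `Copy`,
/// so the wrapper can derive it.
#[derive(Debug, Clone, Copy)]
pub struct ScalarValueRef<'a>(pub &'a ScalarValue);

impl ToPrimitive for ScalarValueRef<'_> {
    fn to_i64(&self) -> Option<i64> {
        self.0.to_i64()
    }
}

/// Stand-in for a kernel that carries a `ToPrimitive + Copy` bound.
pub fn eq_scalar<T: ToPrimitive + Copy>(values: &[i64], right: T) -> Vec<bool> {
    match right.to_i64() {
        Some(r) => values.iter().map(|v| *v == r).collect(),
        // Simplification for this sketch: a scalar with no integer
        // representation matches nothing.
        None => vec![false; values.len()],
    }
}
```

A call such as `eq_scalar(&[1, 2, 3], ScalarValueRef(&scalar))` then satisfies the bound without cloning the scalar.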

Contributor

@alamb alamb left a comment

Thanks @matthewmturner -- this is looking neat.

@@ -333,6 +333,29 @@ impl std::hash::Hash for ScalarValue {
}
}

impl num::ToPrimitive for ScalarValue {
Contributor

👍

Int16(v) => Some(v.unwrap() as i64),
Int32(v) => Some(v.unwrap() as i64),
Int64(v) => Some(v.unwrap() as i64),
_ => None,
Contributor

we can probably implement it for UInt* as well and the timestamp types

fn to_u64(&self) -> Option<u64> {
use ScalarValue::*;
match self {
UInt8(v) => Some(v.unwrap() as u64),
Contributor

It might be ok to implement for Int8 and the other signed types as long as the value is greater than 0
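The suggestion above, accepting signed scalars in `to_u64` when the value is not negative, could look roughly like the following. This is a sketch under assumptions: `ScalarValue` is a hypothetical four-variant stand-in, and `to_u64` is written as a free function rather than as the `num::ToPrimitive` trait method so that the example needs only the standard library.

```rust
use std::convert::TryFrom;

/// Hypothetical, trimmed-down stand-in for DataFusion's `ScalarValue`.
#[derive(Debug, Clone)]
pub enum ScalarValue {
    Int8(Option<i8>),
    Int64(Option<i64>),
    UInt8(Option<u8>),
    UInt64(Option<u64>),
}

/// Converts a scalar to `u64`, returning `None` for nulls and for
/// signed values that are negative.
pub fn to_u64(value: &ScalarValue) -> Option<u64> {
    match value {
        // Unsigned values always fit.
        ScalarValue::UInt8(v) => v.map(u64::from),
        ScalarValue::UInt64(v) => *v,
        // `try_from` fails for negative inputs, which becomes `None`.
        ScalarValue::Int8(v) => v.and_then(|v| u64::try_from(v).ok()),
        ScalarValue::Int64(v) => v.and_then(|v| u64::try_from(v).ok()),
    }
}
```

Using `try_from` rather than an `as` cast avoids silently wrapping a negative number into a very large unsigned one.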

fn to_i64(&self) -> Option<i64> {
use ScalarValue::*;
match self {
Int8(v) => Some(v.unwrap() as i64),
Contributor

I think you can avoid the unwrap using something like:

Suggested change
Int8(v) => Some(v.unwrap() as i64),
Int8(v) => v.map(|v| v.into()),
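To illustrate why the `map` form is safer than `unwrap`: a null scalar (`None`) would panic under `unwrap`, while `map` simply propagates the `None`. The sketch below assumes a hypothetical, trimmed-down `ScalarValue` and writes `to_i64` as a free function instead of the real trait method.

```rust
/// Hypothetical, trimmed-down stand-in for DataFusion's `ScalarValue`.
#[derive(Debug, Clone)]
pub enum ScalarValue {
    Int8(Option<i8>),
    Int16(Option<i16>),
    Int32(Option<i32>),
    Int64(Option<i64>),
    Boolean(Option<bool>),
}

/// Converts a scalar to `i64` without `unwrap`: null scalars yield
/// `None` instead of panicking.
pub fn to_i64(value: &ScalarValue) -> Option<i64> {
    match value {
        // `into()` is a lossless widening conversion for these types.
        ScalarValue::Int8(v) => v.map(|v| v.into()),
        ScalarValue::Int16(v) => v.map(|v| v.into()),
        ScalarValue::Int32(v) => v.map(|v| v.into()),
        ScalarValue::Int64(v) => *v,
        // Non-integer scalars have no `i64` representation in this sketch.
        _ => None,
    }
}
```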

};
let numeric_like = is_numeric | is_numeric_dict;

let is_string = ($LEFT.data_type() == &DataType::Utf8) | ($LEFT.data_type() == &DataType::LargeUtf8);
Contributor

It seems to me that the dyn kernel to pick should be based on the type of RIGHT (the scalar) rather than LEFT (as there is already a pile of dispatch logic in the eq_dyn kernels)

I wonder if we could do something like

match $RIGHT {
  ScalarValue::Utf8(v) => {
    Ok(Arc::new(paste::expr! {[<$OP _dyn_scalar_utf8>]}(
            $LEFT,
            v,
        )?))
  }
  ..
  ScalarValue::Int8(v) => {
        Ok(Arc::new(paste::expr! {[<$OP _dyn_scalar>]}(
            $LEFT,
            v,
        )?))
  }
  ...
}

Though we will probably need some sort of wrapper to handle types not yet supported in arrow-rs 🤔
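A self-contained sketch of this scalar-driven dispatch is shown below. All names are assumptions made for illustration: `Column` stands in for `&dyn Array`, `ScalarValue` is a trimmed-down enum, and `eq_dyn_scalar`, `eq_dyn_utf8_scalar`, and `eq_fallback` only mimic the roles of the arrow-rs dyn kernels and the existing per-type code; their signatures here are not the real ones.

```rust
/// Hypothetical, trimmed-down stand-in for DataFusion's `ScalarValue`.
#[derive(Debug, Clone)]
pub enum ScalarValue {
    Int8(Option<i8>),
    Int64(Option<i64>),
    Utf8(Option<String>),
    Boolean(Option<bool>),
}

/// Stand-in for an arrow array; the real code passes `&dyn Array`.
pub enum Column {
    Ints(Vec<i64>),
    Strs(Vec<String>),
    Bools(Vec<bool>),
}

/// Mimics a numeric dyn kernel: it does its own dispatch on the array.
fn eq_dyn_scalar(left: &Column, right: i64) -> Result<Vec<bool>, String> {
    match left {
        Column::Ints(vals) => Ok(vals.iter().map(|v| *v == right).collect()),
        _ => Err("eq_dyn_scalar: expected a numeric column".to_string()),
    }
}

/// Mimics a string dyn kernel.
fn eq_dyn_utf8_scalar(left: &Column, right: &str) -> Result<Vec<bool>, String> {
    match left {
        Column::Strs(vals) => Ok(vals.iter().map(|v| v.as_str() == right).collect()),
        _ => Err("eq_dyn_utf8_scalar: expected a string column".to_string()),
    }
}

/// Mimics the pre-existing path for types the dyn kernels do not cover.
fn eq_fallback(left: &Column, right: &ScalarValue) -> Result<Vec<bool>, String> {
    match (left, right) {
        (Column::Bools(vals), ScalarValue::Boolean(Some(b))) => {
            Ok(vals.iter().map(|v| v == b).collect())
        }
        _ => Err(format!("unsupported comparison with {:?}", right)),
    }
}

/// Chooses the kernel from the scalar's variant, not from the array type.
pub fn eq_scalar(left: &Column, right: &ScalarValue) -> Result<Vec<bool>, String> {
    match right {
        ScalarValue::Utf8(Some(s)) => eq_dyn_utf8_scalar(left, s),
        ScalarValue::Int8(Some(v)) => eq_dyn_scalar(left, i64::from(*v)),
        ScalarValue::Int64(Some(v)) => eq_dyn_scalar(left, *v),
        other => eq_fallback(left, other),
    }
}
```

The design point is that the outer `match` stays small because each kernel already knows how to handle the array side; the fallback arm plays the role of the wrapper for types not yet supported upstream.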

Contributor Author

@alamb I think this basically solved everything. We'll see how CI comes out.

@matthewmturner
Contributor Author

@alamb CI passed! Thanks, as always, for your fantastic guidance.

Let me know if you see any issues.

I'm just going to fix whatever happened to Cargo.toml.

@matthewmturner matthewmturner marked this pull request as ready for review January 28, 2022 22:03
@matthewmturner
Contributor Author

Actually, it was just reordering the file. Is that ok to keep?

@@ -16,97 +16,98 @@
# under the License.

[package]
name = "datafusion"
authors = ["Apache Arrow <dev@arrow.apache.org>"]
Contributor

😭 Why are there so many meaningless changes?

Contributor Author

I believe I accidentally started auto sorting my Cargo.toml. Or maybe it was related to upgrading cargo / rust? I'm not sure to be honest - trying to figure out what caused the change.

@@ -474,7 +523,7 @@ macro_rules! compute_bool_op {
/// LEFT is array, RIGHT is scalar value
macro_rules! compute_op_scalar {
($LEFT:expr, $RIGHT:expr, $OP:ident, $DT:ident) => {{
use std::convert::TryInto;
// use std::convert::TryInto;
Contributor

It can be removed.

Contributor

@liukun4515 liukun4515 left a comment

LGTM except for the comments above.

@matthewmturner
Contributor Author

matthewmturner commented Jan 29, 2022

@liukun4515 @alamb It turns out an auto-sort setting was turned on in the syntax highlighter I was using for TOML files. Maybe I turned it on accidentally. Apologies for the confusion.

@liukun4515
Contributor

@liukun4515 @alamb It turns out an auto-sort setting was turned on in the syntax highlighter I was using for TOML files. Maybe I turned it on accidentally. Apologies for the confusion.

Don't worry, you have reset it.

Contributor

@liukun4515 liukun4515 left a comment

LGTM

Contributor

@alamb alamb left a comment

Very nice -- thank you @matthewmturner and @liukun4515

ScalarValue::UInt16(v) => compute_op_dyn_scalar!($LEFT, v, $OP),
ScalarValue::UInt32(v) => compute_op_dyn_scalar!($LEFT, v, $OP),
ScalarValue::UInt64(v) => compute_op_dyn_scalar!($LEFT, v, $OP),
ScalarValue::Float32(_) => compute_op_scalar!($LEFT, $RIGHT, $OP, Float32Array),

@alamb alamb merged commit 3494e9c into apache:master Jan 30, 2022
alamb added a commit that referenced this pull request Feb 8, 2022
* feat: add join type for logical plan display (#1674)

* (minor) Reduce memory manager and disk manager logs from `info!` to `debug!` (#1689)

* Move `information_schema` tests out of execution/context.rs to `sql_integration` tests (#1684)

* Move tests from context.rs to information_schema.rs

* Fix up tests to compile

* Move timestamp related tests out of context.rs and into sql integration test (#1696)

* Move some tests out of context.rs and into sql

* Move support test out of context.rs and into sql tests

* Fixup tests and make them compile

* Fix parquet projection

* fix pruning casting

* fix test based on debug strings

* revert read_spill method by getting schema from file

* Add `MemTrackingMetrics` to ease memory tracking for non-limited memory consumers (#1691)

* Memory manager no longer track consumers, update aggregatedMetricsSet

* Easy memory tracking with metrics

* use tracking metrics in SPMS

* tests

* fix

* doc

* Update datafusion/src/physical_plan/sorts/sort.rs

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* make tracker AtomicUsize

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* Implement TableProvider for DataFrameImpl (#1699)

* Add TableProvider impl for DataFrameImpl

* Add physical plan in

* Clean up plan construction and names construction

* Remove duplicate comments

* Remove unused parameter

* Add test

* Remove duplicate limit comment

* Use cloned instead of individual clone

* Reduce the amount of code to get a schema

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* Add comments to test

* Fix plan comparison

* Compare only the results of execution

* Remove println

* Refer to df_impl instead of table in test

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* Fix the register_table test to use the correct result set for comparison

* Consolidate group/agg exprs

* Format

* Remove outdated comment

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* refine test in repartition.rs & coalesce_batches.rs (#1707)

* Fuzz test for spillable sort (#1706)

* Lazy TempDir creation in DiskManager (#1695)

* Incorporate dyn scalar kernels (#1685)

* Rebase

* impl ToNumeric for ScalarValue

* Update macro to be based on

* Add floats

* Cleanup

* Newline

* add annotation for select_to_plan (#1714)

* Support `create_physical_expr` and `ExecutionContextState` or `DefaultPhysicalPlanner` for faster speed (#1700)

* Change physical_expr creation API

* Refactor API usage to avoid creating ExecutionContextState

* Fixup ballista

* clippy!

* Fix can not load parquet table form spark in datafusion-cli. (#1665)

* fix can not load parquet table form spark

* add Invalid file in log.

* fix fmt

* add upper bound for pub fn (#1713)

Signed-off-by: remzi <13716567376yh@gmail.com>

* Create SchemaAdapter trait to map table schema to file schemas (#1709)

* Create SchemaAdapter trait to map table schema to file schemas

* Linting fix

* Remove commented code

* approx_quantile() aggregation function (#1539)

* feat: implement TDigest for approx quantile

Adds a [TDigest] implementation providing approximate quantile
estimations of large inputs using a small amount of (bounded) memory.

A TDigest is most accurate near either "end" of the quantile range (that
is, 0.1, 0.9, 0.95, etc.) due to the use of a scaling function that
increases resolution at the tails. The paper claims single digit part
per million errors for q ≤ 0.001 or q ≥ 0.999 using 100 centroids, and
in practice I have found accuracy to be more than acceptable for an
approximate function across the entire quantile range.

The implementation is a modified copy of
https://github.com/MnO2/t-digest, itself a Rust port of [Facebook's C++
implementation]. Both Facebook's implementation, and Mn02's Rust port
are Apache 2.0 licensed.

[TDigest]: https://arxiv.org/abs/1902.04023
[Facebook's C++ implementation]: https://github.com/facebook/folly/blob/main/folly/stats/TDigest.h

* feat: approx_quantile aggregation

Adds the ApproxQuantile physical expression, plumbing & test cases.

The function signature is:

	approx_quantile(column, quantile)

Where column can be any numeric type (that can be cast to a float64) and
quantile is a float64 literal between 0 and 1.

* feat: approx_quantile dataframe function

Adds the approx_quantile() dataframe function, and exports it in the
prelude.

* refactor: bastilla approx_quantile support

Adds ballista wire encoding for approx_quantile.

Adding support for this required modifying the AggregateExprNode proto
message to support propagating multiple LogicalExprNode aggregate
arguments - all the existing aggregations take a single argument, so
this wasn't needed before.

This commit adds "repeated" to the expr field, which I believe is
backwards compatible as described here:

	https://developers.google.com/protocol-buffers/docs/proto3#updating

Specifically, adding "repeated" to an existing message field:

	"For ... message fields, optional is compatible with repeated"

No existing tests needed fixing, and a new roundtrip test is included
that covers the change to allow multiple expr.

* refactor: use input type as return type

Casts the calculated quantile value to the same type as the input data.

* fixup! refactor: bastilla approx_quantile support

* refactor: rebase onto main

* refactor: validate quantile value

Ensures the quantile value is between 0 and 1, emitting a plan error if
not.

* refactor: rename to approx_percentile_cont

* refactor: clippy lints

* suppport bitwise and as an example (#1653)

* suppport bitwise and as an example

* Use $OP in macro rather than `&`

* fix: change signature to &dyn Array

* fmt

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* fix: substr - correct behaivour with negative start pos (#1660)

* minor: fix cargo run --release error (#1723)

* Convert boolean case expressions to boolean logic (#1719)

* Convert boolean case expressions to boolean logic

* Review feedback

* substitute `parking_lot::Mutex` for `std::sync::Mutex` (#1720)

* Substitute parking_lot::Mutex for std::sync::Mutex

* enable parking_lot feature in tokio

* Add Expression Simplification API (#1717)

* Add Expression Simplification API

* fmt

* use from_slice(&[T]) instead of from_slice(Vec<T>) to prevent future merge conflicts

* fix decimal add because arrow2 doesn't include decimal add in arithmetics::add

* fix decimal scale for cast test

* fix parquet file format adapted projection by providing the proper schema to the RecordBatch

Co-authored-by: xudong.w <wxd963996380@gmail.com>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: Yijie Shen <henry.yijieshen@gmail.com>
Co-authored-by: Phillip Cloud <417981+cpcloud@users.noreply.github.com>
Co-authored-by: Matthew Turner <matthew.m.turner@outlook.com>
Co-authored-by: Yang <37145547+Ted-Jiang@users.noreply.github.com>
Co-authored-by: Remzi Yang <59198230+HaoYang670@users.noreply.github.com>
Co-authored-by: Dan Harris <1327726+thinkharderdev@users.noreply.github.com>
Co-authored-by: Dom <dom@itsallbroken.com>
Co-authored-by: Kun Liu <liukun@apache.org>
Co-authored-by: Dmitry Patsura <talk@dmtry.me>
Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>
alamb added a commit that referenced this pull request Feb 15, 2022
* feat: add join type for logical plan display (#1674)

* (minor) Reduce memory manager and disk manager logs from `info!` to `debug!` (#1689)

* Move `information_schema` tests out of execution/context.rs to `sql_integration` tests (#1684)

* Move tests from context.rs to information_schema.rs

* Fix up tests to compile

* Move timestamp related tests out of context.rs and into sql integration test (#1696)

* Move some tests out of context.rs and into sql

* Move support test out of context.rs and into sql tests

* Fixup tests and make them compile

* Add `MemTrackingMetrics` to ease memory tracking for non-limited memory consumers (#1691)

* Memory manager no longer track consumers, update aggregatedMetricsSet

* Easy memory tracking with metrics

* use tracking metrics in SPMS

* tests

* fix

* doc

* Update datafusion/src/physical_plan/sorts/sort.rs

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* make tracker AtomicUsize

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* Implement TableProvider for DataFrameImpl (#1699)

* Add TableProvider impl for DataFrameImpl

* Add physical plan in

* Clean up plan construction and names construction

* Remove duplicate comments

* Remove unused parameter

* Add test

* Remove duplicate limit comment

* Use cloned instead of individual clone

* Reduce the amount of code to get a schema

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* Add comments to test

* Fix plan comparison

* Compare only the results of execution

* Remove println

* Refer to df_impl instead of table in test

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* Fix the register_table test to use the correct result set for comparison

* Consolidate group/agg exprs

* Format

* Remove outdated comment

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* refine test in repartition.rs & coalesce_batches.rs (#1707)

* Fuzz test for spillable sort (#1706)

* Lazy TempDir creation in DiskManager (#1695)

* Incorporate dyn scalar kernels (#1685)

* Rebase

* impl ToNumeric for ScalarValue

* Update macro to be based on

* Add floats

* Cleanup

* Newline

* add annotation for select_to_plan (#1714)

* Support `create_physical_expr` and `ExecutionContextState` or `DefaultPhysicalPlanner` for faster speed (#1700)

* Change physical_expr creation API

* Refactor API usage to avoid creating ExecutionContextState

* Fixup ballista

* clippy!

* Fix can not load parquet table form spark in datafusion-cli. (#1665)

* fix can not load parquet table form spark

* add Invalid file in log.

* fix fmt

* add upper bound for pub fn (#1713)

Signed-off-by: remzi <13716567376yh@gmail.com>

* Create SchemaAdapter trait to map table schema to file schemas (#1709)

* Create SchemaAdapter trait to map table schema to file schemas

* Linting fix

* Remove commented code

* approx_quantile() aggregation function (#1539)

* feat: implement TDigest for approx quantile

Adds a [TDigest] implementation providing approximate quantile
estimations of large inputs using a small amount of (bounded) memory.

A TDigest is most accurate near either "end" of the quantile range (that
is, 0.1, 0.9, 0.95, etc.) due to the use of a scaling function that
increases resolution at the tails. The paper claims single digit part
per million errors for q ≤ 0.001 or q ≥ 0.999 using 100 centroids, and
in practice I have found accuracy to be more than acceptable for an
approximate function across the entire quantile range.

The implementation is a modified copy of
https://github.com/MnO2/t-digest, itself a Rust port of [Facebook's C++
implementation]. Both Facebook's implementation, and Mn02's Rust port
are Apache 2.0 licensed.

[TDigest]: https://arxiv.org/abs/1902.04023
[Facebook's C++ implementation]: https://github.com/facebook/folly/blob/main/folly/stats/TDigest.h

* feat: approx_quantile aggregation

Adds the ApproxQuantile physical expression, plumbing & test cases.

The function signature is:

	approx_quantile(column, quantile)

Where column can be any numeric type (that can be cast to a float64) and
quantile is a float64 literal between 0 and 1.

* feat: approx_quantile dataframe function

Adds the approx_quantile() dataframe function, and exports it in the
prelude.

* refactor: bastilla approx_quantile support

Adds ballista wire encoding for approx_quantile.

Adding support for this required modifying the AggregateExprNode proto
message to support propagating multiple LogicalExprNode aggregate
arguments - all the existing aggregations take a single argument, so
this wasn't needed before.

This commit adds "repeated" to the expr field, which I believe is
backwards compatible as described here:

	https://developers.google.com/protocol-buffers/docs/proto3#updating

Specifically, adding "repeated" to an existing message field:

	"For ... message fields, optional is compatible with repeated"

No existing tests needed fixing, and a new roundtrip test is included
that covers the change to allow multiple expr.

* refactor: use input type as return type

Casts the calculated quantile value to the same type as the input data.

* fixup! refactor: bastilla approx_quantile support

* refactor: rebase onto main

* refactor: validate quantile value

Ensures the quantile value is between 0 and 1, emitting a plan error if
not.

* refactor: rename to approx_percentile_cont

* refactor: clippy lints

* suppport bitwise and as an example (#1653)

* suppport bitwise and as an example

* Use $OP in macro rather than `&`

* fix: change signature to &dyn Array

* fmt

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* fix: substr - correct behaivour with negative start pos (#1660)

* minor: fix cargo run --release error (#1723)

* Convert boolean case expressions to boolean logic (#1719)

* Convert boolean case expressions to boolean logic

* Review feedback

* substitute `parking_lot::Mutex` for `std::sync::Mutex` (#1720)

* Substitute parking_lot::Mutex for std::sync::Mutex

* enable parking_lot feature in tokio

* Add Expression Simplification API (#1717)

* Add Expression Simplification API

* fmt

* Add tests and CI for optional pyarrow module (#1711)

* Implement other side of conversion

* Add test workflow

* Add (failing) tests

* Get unit tests passing

* Use python -m pip

* Debug LD_LIBRARY_PATH

* Set LIBRARY_PATH

* Update help with better info

* Update parking_lot requirement from 0.11 to 0.12 (#1735)

Updates the requirements on [parking_lot](https://github.com/Amanieu/parking_lot) to permit the latest version.
- [Release notes](https://github.com/Amanieu/parking_lot/releases)
- [Changelog](https://github.com/Amanieu/parking_lot/blob/master/CHANGELOG.md)
- [Commits](Amanieu/parking_lot@0.11.0...0.12.0)

---
updated-dependencies:
- dependency-name: parking_lot
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Prevent repartitioning of certain operator's direct children (#1731) (#1732)

* Prevent repartitioning of certain operator's direct children (#1731)

* Update ballista tests

* Don't repartition children of RepartitionExec

* Revert partition restriction on Repartition and Projection

* Review feedback

* Lint

* API to get Expr's type and nullability without a `DFSchema` (#1726)

* API to get Expr type and nullability without a `DFSchema`

* Add test

* publically export

* Improve docs

* Fix typos in crate documentation (#1739)

* add `cargo check --release` to ci (#1737)

* remote test

* Update .github/workflows/rust.yml

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* Move optimize test out of context.rs (#1742)

* Move optimize test out of context.rs

* Update

* use clap 3 style args parsing for datafusion cli (#1749)

* use clap 3 style args parsing for datafusion cli

* upgrade cli version

* Add partitioned_csv setup code to sql_integration test (#1743)

* use ordered-float 2.10 (#1756)

Signed-off-by: Andy Grove <agrove@apache.org>

* #1768 Support TimeUnit::Second in hasher (#1769)

* Support TimeUnit::Second in hasher

* fix linter

* format (#1745)

* Create built-in scalar functions programmatically (#1734)

* create build-in scalar functions programatically

Signed-off-by: remzi <13716567376yh@gmail.com>

* solve conflict

Signed-off-by: remzi <13716567376yh@gmail.com>

* fix spelling mistake

Signed-off-by: remzi <13716567376yh@gmail.com>

* rename to call_fn

Signed-off-by: remzi <13716567376yh@gmail.com>

* [split/1] split datafusion-common module (#1751)

* split datafusion-common module

* pyarrow

* Update datafusion-common/README.md

Co-authored-by: Andy Grove <agrove@apache.org>

* Update datafusion/Cargo.toml

* include publishing

Co-authored-by: Andy Grove <agrove@apache.org>

* fix: Case insensitive unquoted identifiers (#1747)

* move dfschema and column (#1758)

* add datafusion-expr module (#1759)

* move column, dfschema, etc. to common module (#1760)

* include window frames and operator into datafusion-expr (#1761)

* move signature, type signature, and volatility to split module (#1763)

* [split/10] split up expr for rewriting, visiting, and simplification traits (#1774)

* split up expr for rewriting, visiting, and simplification

* add docs

* move built-in scalar functions (#1764)

* split expr type and null info to be expr-schemable (#1784)

* rewrite predicates before pushing to union inputs (#1781)

* move accumulator and columnar value (#1765)

* move accumulator and columnar value (#1762)

* fix bad data type in test_try_cast_decimal_to_decimal

* added projections for avro columns

Co-authored-by: xudong.w <wxd963996380@gmail.com>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: Yijie Shen <henry.yijieshen@gmail.com>
Co-authored-by: Phillip Cloud <417981+cpcloud@users.noreply.github.com>
Co-authored-by: Matthew Turner <matthew.m.turner@outlook.com>
Co-authored-by: Yang <37145547+Ted-Jiang@users.noreply.github.com>
Co-authored-by: Remzi Yang <59198230+HaoYang670@users.noreply.github.com>
Co-authored-by: Dan Harris <1327726+thinkharderdev@users.noreply.github.com>
Co-authored-by: Dom <dom@itsallbroken.com>
Co-authored-by: Kun Liu <liukun@apache.org>
Co-authored-by: Dmitry Patsura <talk@dmtry.me>
Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>
Co-authored-by: Will Jones <willjones127@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: r.4ntix <r.4ntix@gmail.com>
Co-authored-by: Jiayu Liu <Jimexist@users.noreply.github.com>
Co-authored-by: Andy Grove <agrove@apache.org>
Co-authored-by: Rich <jychen7@users.noreply.github.com>
Co-authored-by: Marko Mikulicic <mmikulicic@gmail.com>
Co-authored-by: Eduard Karacharov <13005055+korowa@users.noreply.github.com>
Labels
datafusion Changes in the datafusion crate
Development

Successfully merging this pull request may close these issues.

Switch datafusion to using eq_dyn_scalar, etc kernels
3 participants