Introduce Signature::String and return error if input of `strpos` is integer #12751

jayzhan211 · 2024-10-04T05:19:25Z

Which issue does this PR close?

Rationale for this change

I think #12712 introduce a regression on integer type, this PR introduce another signature for string, and return error for integer input on strpos function, which has the same result as Postgres

postgres=# select position(1 in 1);
ERROR:  function pg_catalog.position(integer, integer) does not exist
LINE 1: select position(1 in 1);
               ^
HINT:  No function matches the given name and argument types. You might need to add explicit type casts.

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

Others

Not sure is this generalize to all the string related function. Are they all expect converting null to String? Handle dictionary?

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

jayzhan211 · 2024-10-04T05:29:36Z

datafusion/functions/src/math/power.rs


 use arrow::array::{ArrayRef, Float64Array, Int64Array};
-use datafusion_expr::TypeSignature::*;


I prefer not to import wildcard in non-test code area, in this PR it conflicts with rust String and TypeSignature::String

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

jayzhan211 · 2024-10-04T09:26:51Z

datafusion/core/tests/expr_api/mod.rs

@@ -37,14 +37,14 @@ mod simplification;
 fn test_octet_length() {
    #[rustfmt::skip]
    evaluate_expr_test(
-        octet_length(col("list")),
+        octet_length(col("id")),


octet_length does not support list type, so change to id.

octet_length takes a string, but everything is supposed to be coercible to string (for some reason). the original example should probably still work, but it is a good idea to add a test with octet_length directly on a string column

When you say "everything is supposed to be coercable to a string" from where do you mean? Is that a SQL "standard" behavior? I don't think it is necessairly the behavior of DataFusion (or postgres)

For example, postgres doesn't automatically coerce integers to strings in all cases. Here is one:

ERROR: operator does not exist: integer ~~ unknown LINE 1: select 123 LIKE 'foo'; ^ HINT: No operator matches the given name and argument types. You might need to add explicit type casts. postgres=# select 123::varchar LIKE 'foo'; ?column? ---------- f (1 row)

When you say "everything is supposed to be coercable to a string" from where do you mean?

i mean this

datafusion/datafusion/expr/src/type_coercion/functions.rs

Lines 649 to 650 in 8aafa54

// Any type can be coerced into strings

(Utf8 | LargeUtf8, _) => Some(type_into.clone()),

Is that a SQL "standard" behavior?

no, i don't think so.

i am NOT proposing this behavior, or saying that such behavior is warranted.
I am just relying what the code says that DF does.

findepi · 2024-10-04T10:06:23Z

datafusion/expr-common/src/signature.rs

+    /// Fixed number of arguments of string types.
+    /// The precedence of type from high to low is Utf8View, LargeUtf8 and Utf8.
+    /// Null is considerd as Utf8 by default
+    /// Dictionay with string value type is also handled.


nit: typo "Dictionay"

findepi · 2024-10-04T10:07:37Z

datafusion/expr-common/src/signature.rs

@@ -125,6 +125,11 @@ pub enum TypeSignature {
    /// Fixed number of arguments of numeric types.
    /// See <https://docs.rs/arrow/latest/arrow/datatypes/enum.DataType.html#method.is_numeric> to know which type is considered numeric
    Numeric(usize),
+    /// Fixed number of arguments of string types.
+    /// The precedence of type from high to low is Utf8View, LargeUtf8 and Utf8.
+    /// Null is considerd as Utf8 by default


Null is not Utf8, but may be coercible to that.
In fact, Null type should be coercible to any other type, but function signatue doesn't need to define that. The function signature needs to define what types the function accepts, and function resolution / expression analyzer should insert coercions as necessary.

Yes, I agree Null should be coercable to any other type (and is generally).

I agree the function signature should define what the function accepts

If we found a function that reject null to string, we could define it with user_define or add another Signature::StringWithoutNull signature if there are more than one.

findepi · 2024-10-04T10:12:03Z

datafusion/expr-common/src/signature.rs

@@ -125,6 +125,11 @@ pub enum TypeSignature {
    /// Fixed number of arguments of numeric types.
    /// See <https://docs.rs/arrow/latest/arrow/datatypes/enum.DataType.html#method.is_numeric> to know which type is considered numeric
    Numeric(usize),
+    /// Fixed number of arguments of string types.


It's ok for now, but it feels pretty inflexible.
Why have special handling for functions that take only strings or only numbers?
What about functions that accept one (any) string and one (any) number? (eg substr)

The Numeric and String could be "a type class"

// strpos Signature::from_type_classes(vec![ TypeClass:String, TypeClass:String, ]); // hypothetical substr Signature::from_type_classes(vec![ TypeClass:String, TypeClass:Numeric, TypeClass:Numeric, ]);

This would be more flexible. However, aren't we just defining the "logical types" (#11513) this way?
Maybe we could introduce such a concept into the codebase and simply start using it incrementally?

cc @alamb @notfilippo

Yes, I agree that this could be the start of effectively defining a logical String type

Coercion actually seems like a pretty good place to start doing so, when I think about it

I know, that is why I didn't introduce this signature before, but works on #12622 first. This PR is mainly fixing the regression.

findepi · 2024-10-04T10:13:02Z

datafusion/core/tests/expr_api/mod.rs

@@ -37,14 +37,14 @@ mod simplification;
 fn test_octet_length() {
    #[rustfmt::skip]
    evaluate_expr_test(
-        octet_length(col("list")),
+        octet_length(col("id")),


octet_length takes a string, but everything is supposed to be coercible to string (for some reason). the original example should probably still work, but it is a good idea to add a test with octet_length directly on a string column

findepi · 2024-10-04T10:15:29Z

datafusion/functions/src/math/nans.rs

-                vec![Exact(vec![Float32]), Exact(vec![Float64])],
+                vec![
+                    TypeSignature::Exact(vec![Float32]),
+                    TypeSignature::Exact(vec![Float64]),


Thanks for improving the code style!

i see there is a bunch of codestyle improvement like this. Would it be possible to package them separately so that mechanical changes can be reviewed with less scrutiny and more focus put on important changes?

I think it's clear that the change is necessary due to the name conflict with Signature::String, so it would be better to group them together in this case

findepi · 2024-10-04T10:18:17Z

datafusion/functions/src/string/btrim.rs

-                    Exact(vec![Utf8View]),
-                    Exact(vec![Utf8]),
-                ],
+                vec![TypeSignature::String(2), TypeSignature::String(1)],


The function did not declare support for LargeUtf8, so the code below checking arg 0 data type to be LargeUtf8 was probably dead code. Now it no longer is. Should this new capability be test covered?

It would be an excellent use of @goldmedal 's infrastructure for testing in

https://github.com/apache/datafusion/tree/main/datafusion/sqllogictest/test_files/string#readme

findepi · 2024-10-04T10:18:44Z

datafusion/functions/src/string/contains.rs

-                ],
-                Volatility::Immutable,
-            ),
+            signature: Signature::string(2, Volatility::Immutable),


is there dead code now that lhs type needs to be same as rhs type?

findepi · 2024-10-04T10:21:27Z

datafusion/functions/src/string/ends_with.rs

-                ],
-                Volatility::Immutable,
-            ),
+            signature: Signature::string(2, Volatility::Immutable),


I think it's actually incorrect. My understanding of arrow::compute::kernels::comparison::ends_with is that it actually supports exactly only the cases where left type == right type.

As you pointed out above, doesn't Signature::string coerce to a single common type?

that was written before i realized i read the code wrong. resolving

findepi · 2024-10-04T10:25:04Z

datafusion/expr/src/type_coercion/functions.rs

+            for t in current_types.iter().skip(1) {
+                coerced_type = coercion_rule(&coerced_type, t)?;


This is important decision place. Why would we coerce all arguments to one common super type? Why should a function abstain from accepting a utf8view and utf8 argument pair? fwiw this is exactly what strpos was doing?

If we want to do this for some reason, I would prefer the functions to be converted one by one, so that we can eliminate dead code.

For example if signature::string(2) means "any two string-like arguments", then the change in ContainsFunc is trivial improvement.
However, if signature::string(2) means "two arguments of same string-like type", then ContainsFunc now has dead code which should be removed. (code handling cases of arguments of differing types)

cc @alamb

Why would we coerce all arguments to one common super type? Why should a function abstain from accepting a utf8view and utf8 argument pair? fwiw this is exactly what strpos was doing?

I think this is a simplification for the common case of functions that don't handle all possible String types. Many of the built in function implementations in DataFusion take multiple string arguments but typically only handle one common type (not all possible combinations of types.

There is likely a tradeoff in runtime performance with specialized functions for all possible input parameter type combinations and the compiletime / binary size. I think it is reasomable (and common) to coerce to a common super type

this is exactly what strpos was doing?

That is why I think we need function-specific coercion, coercion rule may be different for each function and each downstream project. It is better not to assume the coercion rule for the user, or at least they should be able to define their own coercion rule function-wise if the default one doesn't fit their need.

Signature::String is a general signature that fit most of the string related function, so we should define the most common way, so I choose the rule that has only one single type for simplicity.

findepi · 2024-10-04T10:31:40Z

datafusion/expr-common/src/signature.rs

@@ -125,6 +125,11 @@ pub enum TypeSignature {
    /// Fixed number of arguments of numeric types.
    /// See <https://docs.rs/arrow/latest/arrow/datatypes/enum.DataType.html#method.is_numeric> to know which type is considered numeric
    Numeric(usize),
+    /// Fixed number of arguments of string types.


Fixed number of arguments of string types.

Currently implementation can be better documented as "Fixed number of arguments of all of same string type (all of Utf8 type, or all of LargeUtf8 type or all of Utf8View type)".

If this is indeed semantics we want (see comment in datafusion/expr/src/type_coercion/functions.rs for that), let's make sure the documentation makes it clear.

findepi · 2024-10-04T10:48:12Z

datafusion/expr/src/type_coercion/functions.rs

+                    _ => {
+                        if let Some(coerced_type) = string_coercion(lhs_type, rhs_type) {
+                            Ok(coerced_type)
+                        } else {
+                            plan_err!(
+                                "{} and {} are not coercible to a common string type",


I believe coercion rules should not be defined separately for each function.
The function should declare what it accepts, and coercion rules should be globally applicable.

In PostgreSQL the docs unfortunately don't say what casts are implicitly added, one has to query the system catalog for that

SELECT * FROM pg_cast WHERE castcontext = 'i'; -- 'i' indicates implicit casts

the point is -- the implicit cast / implicit coercion rules are defined on type pairs, and don't called function name as a context. Thus if type A is coercible to B for the sake of foo(B) call, it will also be coercible for bar(B) call.
I as a user, find this property amusingly useful.

I'll consider how to implement a global, function-agnostic coercion rule, while still allowing function-specific coercion for customization

alamb

Thank you @jayzhan211 and @findepi

In my opinion this PR is better than what is on main, though I do think we should consider if now is the time to introduce some higher level version of "logical types" starting in the type signatures that define the coercion rules

alamb · 2024-10-04T19:15:40Z

datafusion/expr-common/src/signature.rs

@@ -125,6 +125,11 @@ pub enum TypeSignature {
    /// Fixed number of arguments of numeric types.
    /// See <https://docs.rs/arrow/latest/arrow/datatypes/enum.DataType.html#method.is_numeric> to know which type is considered numeric
    Numeric(usize),
+    /// Fixed number of arguments of string types.


Yes, I agree that this could be the start of effectively defining a logical String type

Coercion actually seems like a pretty good place to start doing so, when I think about it

alamb · 2024-10-04T19:17:20Z

datafusion/expr-common/src/signature.rs

@@ -125,6 +125,11 @@ pub enum TypeSignature {
    /// Fixed number of arguments of numeric types.
    /// See <https://docs.rs/arrow/latest/arrow/datatypes/enum.DataType.html#method.is_numeric> to know which type is considered numeric
    Numeric(usize),
+    /// Fixed number of arguments of string types.
+    /// The precedence of type from high to low is Utf8View, LargeUtf8 and Utf8.
+    /// Null is considerd as Utf8 by default


Yes, I agree Null should be coercable to any other type (and is generally).

I agree the function signature should define what the function accepts

alamb · 2024-10-04T19:20:34Z

datafusion/expr/src/type_coercion/functions.rs

+            for t in current_types.iter().skip(1) {
+                coerced_type = coercion_rule(&coerced_type, t)?;


Why would we coerce all arguments to one common super type? Why should a function abstain from accepting a utf8view and utf8 argument pair? fwiw this is exactly what strpos was doing?

I think this is a simplification for the common case of functions that don't handle all possible String types. Many of the built in function implementations in DataFusion take multiple string arguments but typically only handle one common type (not all possible combinations of types.

There is likely a tradeoff in runtime performance with specialized functions for all possible input parameter type combinations and the compiletime / binary size. I think it is reasomable (and common) to coerce to a common super type

alamb · 2024-10-04T19:22:06Z

datafusion/functions/src/string/btrim.rs

-                    Exact(vec![Utf8View]),
-                    Exact(vec![Utf8]),
-                ],
+                vec![TypeSignature::String(2), TypeSignature::String(1)],


It would be an excellent use of @goldmedal 's infrastructure for testing in

https://github.com/apache/datafusion/tree/main/datafusion/sqllogictest/test_files/string#readme

alamb · 2024-10-04T19:22:43Z

datafusion/functions/src/string/ends_with.rs

-                ],
-                Volatility::Immutable,
-            ),
+            signature: Signature::string(2, Volatility::Immutable),


As you pointed out above, doesn't Signature::string coerce to a single common type?

alamb · 2024-10-04T19:24:52Z

datafusion/sqllogictest/test_files/string/string_view.slt

-01)Projection: starts_with(__common_expr_1, test.column2_utf8view) AS c1, starts_with(test.column1_utf8, test.column2_utf8) AS c3, starts_with(__common_expr_1, CAST(test.column2_large_utf8 AS Utf8View)) AS c4
-02)--Projection: CAST(test.column1_utf8 AS Utf8View) AS __common_expr_1, test.column1_utf8, test.column2_utf8, test.column2_large_utf8, test.column2_utf8view
-03)----TableScan: test projection=[column1_utf8, column2_utf8, column2_large_utf8, column2_utf8view]
+01)Projection: starts_with(CAST(test.column1_utf8 AS Utf8View), test.column2_utf8view) AS c1, starts_with(test.column1_utf8, test.column2_utf8) AS c3, starts_with(CAST(test.column1_utf8 AS LargeUtf8), test.column2_large_utf8) AS c4


This eems like a reasonable and better plan (fewer casts)

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

jayzhan211 · 2024-10-05T00:41:48Z

Summary for these comments

Based on the Postgres's result, I disagree we should coerce anything (integer) to string. We can discuss it in another issue if we have no consensus on this.
Introduce LogicalType to simplify signature Strengthen TypeSignature and Coercion rule. #10507 (comment)
Determine how to establish a general coercion rule that applies before functions, while still allowing function-specific coercion for greater customization.

…os-signaure

findepi · 2024-10-07T12:04:37Z

Based on the Postgres's result, I disagree we should coerce anything (integer) to string. We can discuss it in another issue if we have no consensus on this.

i think we people are in agreement. The disagreeing party is the code in main.

Determine how to establish a general coercion rule that applies before functions, while still allowing function-specific coercion for greater customization.

why would we need function-specific coercion rules?
do we have an example of two functions that accept same formal parameter types, yet are applicable to different sets of actual argument types?

jayzhan211 · 2024-10-08T00:42:55Z

Based on the Postgres's result, I disagree we should coerce anything (integer) to string. We can discuss it in another issue if we have no consensus on this.

i think we people are in agreement. The disagreeing party is the code in main.

Determine how to establish a general coercion rule that applies before functions, while still allowing function-specific coercion for greater customization.

why would we need function-specific coercion rules? do we have an example of two functions that accept same formal parameter types, yet are applicable to different sets of actual argument types?

Both sum and max take a single argument, but they follow different coercion rules

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

…os-signaure

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

jayzhan211 · 2024-10-08T07:05:37Z

I think the remaining discussion can be discussed elsewhere in the issue. Thanks @alamb @findepi

findepi · 2024-10-08T08:36:15Z

why would we need function-specific coercion rules? do we have an example of two functions that accept same formal parameter types, yet are applicable to different sets of actual argument types?

Both sum and max take a single argument, but they follow different coercion rules

I know sum and max may have different return type, but I didn't know they differ in coercion rules.
Can you please exemplify so that I understand better?

@etseidl

* Add support for external tables with qualified names (#12645) * Make support schemas * Set default name to table * Remove print statements and stale comment * Add tests for create table * Fix typo * Update datafusion/sql/src/statement.rs Co-authored-by: Jonah Gao <jonahgao@msn.com> * convert create_external_table to objectname * Add sqllogic tests * Fix failing tests --------- Co-authored-by: Jonah Gao <jonahgao@msn.com> * Fix Regex signature types (#12690) * Fix Regex signature types * Uncomment the shared tests in string_query.slt.part and removed tests copies everywhere else * Test `LIKE` and `MATCH` with flags; Remove new tests from regexp.slt * Refactor `ByteGroupValueBuilder` to use `MaybeNullBufferBuilder` (#12681) * Fix malformed hex string literal in docs (#12708) * Simplify match patterns in coercion rules (#12711) Remove conditions where unnecessary. Refactor to improve readability. * Remove aggregate functions dependency on frontend (#12715) * Remove aggregate functions dependency on frontend DataFusion is a SQL query engine and also a reusable library for building query engines. The core functionality should not depend on frontend related functionalities like `sqlparser` or `datafusion-sql`. * Remove duplicate license header * Minor: Remove clone in `transform_to_states` (#12707) * rm clone Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * fmt Signed-off-by: jayzhan211 <jayzhan211@gmail.com> --------- Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * Refactor tests for union sorting properties, add tests for unions and constants (#12702) * Refactor tests for union sorting properties * update doc test * Undo import reordering * remove unecessary static lifetimes * Fix: support Qualified Wildcard in count aggregate function (#12673) * Reduce code duplication in `PrimitiveGroupValueBuilder` with const generics (#12703) * Reduce code duplication in `PrimitiveGroupValueBuilder` with const generics * Fix docs * Disallow duplicated qualified field names (#12608) * Disallow duplicated qualified field names * Fix tests * Optimize base64/hex decoding by pre-allocating output buffers (~2x faster) (#12675) * add bench * replace macro with generic function * remove duplicated code * optimize base64/hex decode * Allow DynamicFileCatalog support to query partitioned file (#12683) * support to query partitioned table for dynamic file catalog * cargo clippy * split partitions inferring to another function * Support `LIMIT` Push-down logical plan optimization for `Extension` nodes (#12685) * Update trait `UserDefinedLogicalNodeCore` Signed-off-by: Austin Liu <austin362667@gmail.com> * Update corresponding interface Signed-off-by: Austin Liu <austin362667@gmail.com> Add rewrite rule for `push-down-limit` for `Extension` Signed-off-by: Austin Liu <austin362667@gmail.com> * Add rewrite rule for `push-down-limit` for `Extension` and tests Signed-off-by: Austin Liu <austin362667@gmail.com> * Update corresponding interface Signed-off-by: Austin Liu <austin362667@gmail.com> * Reorganize to match guard Signed-off-by: Austin Liu <austin362667@gmail.com> * Clena up Signed-off-by: Austin Liu <austin362667@gmail.com> Clean up Signed-off-by: Austin Liu <austin362667@gmail.com> --------- Signed-off-by: Austin Liu <austin362667@gmail.com> * Fix AvroReader: Add union resolving for nested struct arrays (#12686) * Add union resolving for nested struct arrays * Add test * Change test * Reproduce index error * fmt --------- Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> * Adds macros for creating `WindowUDF` and `WindowFunction` expression (#12693) * Adds macro for udwf singleton * Adds a doc comment parameter to macro * Add doc comment for `create_udwf` macro * Uses default constructor * Update `Cargo.lock` in `datafusion-cli` * Fixes: expand `$FN_NAME` in doc strings * Adds example for macro usage * Renames macro * Improve doc comments * Rename udwf macro * Minor: doc copy edits * Adds macro for creating fluent-style expression API * Adds support for 1 or more parameters in expression function * Rewrite doc comments * Rename parameters * Minor: formatting * Adds doc comment for `create_udwf_expr` macro * Improve example docs * Hides extraneous code in doc comments * Add a one-line readme * Adds doc test assertions + minor formatting fixes * Adds common macro for defining user-defined window functions * Adds doc comment for `define_udwf_and_expr` * Defines `RowNumber` using common macro * Add usage example for common macro * Adds usage for custom constructor * Add examples for remaining patterns * Improve doc comments for usage examples * Rewrite inner line docs * Rewrite `create_udwf_expr!` doc comments * Minor doc improvements * Fix doc test and usage example * Add inline comments for macro patterns * Minor: change doc comment in example * Support unparsing plans with both Aggregation and Window functions (#12705) * Support unparsing plans with both Aggregation and Window functions (#35) * Fix unparsing for aggregation grouping sets * Add test for grouping set unparsing * Update datafusion/sql/src/unparser/utils.rs Co-authored-by: Jax Liu <liugs963@gmail.com> * Update datafusion/sql/src/unparser/utils.rs Co-authored-by: Jax Liu <liugs963@gmail.com> * Update * More tests --------- Co-authored-by: Jax Liu <liugs963@gmail.com> * Fix strpos invocation with dictionary and null (#12712) In 1b3608d `strpos` signature was modified to indicate it supports dictionary as input argument, but the invoke method doesn't support them. * docs: Update DataFusion introduction to clarify that DataFusion does provide an "out of the box" query engine (#12666) * Update DataFusion introduction to show that DataFusion offers packaged versions for end users * change order * Update README.md Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> * refine wording and update user guide for consistency * prettier --------- Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> * Framework for generating function docs from embedded code documentation (#12668) * Initial work on #12432 to allow for generation of udf docs from embedded documentation in the code * Add missing license header. * Fixed examples. * Fixing a really weird RustRover/wsl ... something. No clue what happened there. * permission change * Cargo fmt update. * Refactored Documentation to allow it to be used in a const. * Add documentation for syntax_example * Refactoring Documentation based on PR feedback. * Cargo fmt update. * Doc update * Fixed copy/paste error. * Minor text updates. --------- Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> * Add IMDB(JOB) Benchmark [2/N] (imdb queries) (#12529) * imdb dataset * cargo fmt * Add 113 queries for IMDB(JOB) Signed-off-by: Austin Liu <austin362667@gmail.com> * Add `get_query_sql` from `query_id` string Signed-off-by: Austin Liu <austin362667@gmail.com> * Fix CSV reader & Remove Parquet partition Signed-off-by: Austin Liu <austin362667@gmail.com> * Add benchmark IMDB runner Signed-off-by: Austin Liu <austin362667@gmail.com> * Add `run_imdb` script Signed-off-by: Austin Liu <austin362667@gmail.com> * Add checker for imdb option Signed-off-by: Austin Liu <austin362667@gmail.com> * Add SLT for IMDB Signed-off-by: Austin Liu <austin362667@gmail.com> * Fix `get_query_sql()` for CI roundtrip test Signed-off-by: Austin Liu <austin362667@gmail.com> Fix `get_query_sql()` for CI roundtrip test Signed-off-by: Austin Liu <austin362667@gmail.com> Fix `get_query_sql()` for CI roundtrip test Signed-off-by: Austin Liu <austin362667@gmail.com> * Clean up Signed-off-by: Austin Liu <austin362667@gmail.com> * Add missing license Signed-off-by: Austin Liu <austin362667@gmail.com> * Add IMDB(JOB) queries `2b` to `5c` Signed-off-by: Austin Liu <austin362667@gmail.com> * Add `INCLUDE_IMDB` in CI verify-benchmark-results Signed-off-by: Austin Liu <austin362667@gmail.com> * Prepare IMDB dataset Signed-off-by: Austin Liu <austin362667@gmail.com> Prepare IMDB dataset Signed-off-by: Austin Liu <austin362667@gmail.com> * use uint as id type * format * Seperate `tpch` and `imdb` benchmarking CI jobs Signed-off-by: Austin Liu <austin362667@gmail.com> Fix path Signed-off-by: Austin Liu <austin362667@gmail.com> Fix path Signed-off-by: Austin Liu <austin362667@gmail.com> Remove `tpch` in `imdb` benchmark Signed-off-by: Austin Liu <austin362667@gmail.com> * Remove IMDB(JOB) slt in CI Signed-off-by: Austin Liu <austin362667@gmail.com> Remove IMDB(JOB) slt in CI Signed-off-by: Austin Liu <austin362667@gmail.com> --------- Signed-off-by: Austin Liu <austin362667@gmail.com> Co-authored-by: DouPache <douenergy@gmail.com> * Minor: avoid clone while calculating union equivalence properties (#12722) * Minor: avoid clone while calculating union equivalence properties * Update datafusion/physical-expr/src/equivalence/properties.rs * fmt * Simplify streaming_merge function parameters (#12719) * simplify streaming_merge function parameters * revert test change * change StreamingMergeConfig into builder pattern * Fix links on docs index page (#12750) * Provide field and schema metadata missing on cross joins, and union with null fields. (#12729) * test: reproducer for missing schema metadata on cross join * fix: pass thru schema metadata on cross join * fix: preserve metadata when transforming to view types * test: reproducer for missing field metadata in left hand NULL field of union * fix: preserve field metadata from right side of union * chore: safe indexing * Minor: Update string tests for strpos (#12739) * Apply `type_union_resolution` to array and values (#12753) * cleanup make array coercion rule Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * change to type union resolution Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * change value too Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * fix tpyo Signed-off-by: jayzhan211 <jayzhan211@gmail.com> --------- Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * Add `DocumentationBuilder::with_standard_argument` to reduce copy/paste (#12747) * Add `DocumentationBuilder::with_standard_expression` to reduce copy/paste * fix doc * fix standard argument * Update docs * Improve documentation to explain what is different * fix `equal_to` in `PrimitiveGroupValueBuilder` (#12758) * fix `equal_to` in `PrimitiveGroupValueBuilder`. * fix typo. * add uts. * reduce calling of `is_null`. * Minor: doc how field name is to be set (#12757) * Fix `equal_to` in `ByteGroupValueBuilder` (#12770) * Fix `equal_to` in `ByteGroupValueBuilder` * refactor null_equal_to * Update datafusion/physical-plan/src/aggregates/group_values/group_column.rs * Allow simplification even when nullable (#12746) The nullable requirement seem to have been added in #1401 but as far as I can tell they are not needed for these 2 cases. I think this can be shown using this truth table: (generated using datafusion-cli without this patch) ``` > CREATE TABLE t (v BOOLEAN) as values (true), (false), (NULL); > select t.v, t2.v, t.v AND (t.v OR t2.v), t.v OR (t.v AND t2.v) from t cross join t as t2; +-------+-------+---------------------+---------------------+ | v | v | t.v AND t.v OR t2.v | t.v OR t.v AND t2.v | +-------+-------+---------------------+---------------------+ | true | true | true | true | | true | false | true | true | | true | | true | true | | false | true | false | false | | false | false | false | false | | false | | false | false | | | true | | | | | false | | | | | | | | +-------+-------+---------------------+---------------------+ ``` And it seems Spark applies both of these and DuckDB applies only the first one. * Fix unnest conjunction with selecting wildcard expression (#12760) * fix unnest statement with wildcard expression * add commnets * Improve `round` scalar function unparsing for Postgres (#12744) * Postgres: enforce required `NUMERIC` type for `round` scalar function (#34) Includes initial support for dialects to override scalar functions unparsing * Document scalar_function_to_sql_overrides fn * Fix stack overflow calculating projected orderings (#12759) * Fix stack overflow calculating projected orderings * fix docs * Port / Add Documentation for `VarianceSample` and `VariancePopulation` (#12742) * Upgrade arrow/parquet to `53.1.0` / fix clippy (#12724) * Update to arrow/parquet 53.1.0 * Update some API * update for changed file sizes * Use non deprecated APIs * Use ParquetMetadataReader from @etseidl * remove upstreamed implementation * Update CSV schema * Use upstream is_null and is_not_null kernels * feat: add support for Substrait ExtendedExpression (#12728) * Add support for serializing and deserializing Substrait ExtendedExpr message * Address clippy reviews * Reuse existing rename method * Transformed::new_transformed: Fix documentation formatting (#12787) Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> * fix: Correct results for grouping sets when columns contain nulls (#12571) * Fix grouping sets behavior when data contains nulls * PR suggestion comment * Update new test case * Add grouping_id to the logical plan * Add doc comment next to INTERNAL_GROUPING_ID * Fix unparsing of Aggregate with grouping sets --------- Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> * Migrate documentation for all string functions from scalar_functions.md to code (#12775) * Added documentation for string and unicode functions. * Fixed issues with aliases. * Cargo fmt. * Minor doc fixes. * Update docs for var_pop/samp --------- Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> * Account for constant equivalence properties in union, tests (#12562) * Minor: clarify comment about empty dependencies (#12786) * Introduce Signature::String and return error if input of `strpos` is integer (#12751) * fix sig Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * fix Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * fix error Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * fix all signature Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * fix all signature Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * change default type Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * clippy Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * fix docs Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * rm deadcode Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * cleanup Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * cleanup Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * rm test Signed-off-by: jayzhan211 <jayzhan211@gmail.com> --------- Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * Minor: improve docs on MovingMin/MovingMax (#12790) * Add slt tests (#12721) --------- Signed-off-by: jayzhan211 <jayzhan211@gmail.com> Signed-off-by: Austin Liu <austin362667@gmail.com> Co-authored-by: OussamaSaoudi <45303303+OussamaSaoudi@users.noreply.github.com> Co-authored-by: Jonah Gao <jonahgao@msn.com> Co-authored-by: Dmitrii Blaginin <dmitrii@blaginin.me> Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> Co-authored-by: Tomoaki Kawada <kawada@kmckk.co.jp> Co-authored-by: Piotr Findeisen <piotr.findeisen@gmail.com> Co-authored-by: Jay Zhan <jayzhan211@gmail.com> Co-authored-by: HuSen <husen.xjtu@gmail.com> Co-authored-by: Emil Ejbyfeldt <emil.ejbyfeldt@gmail.com> Co-authored-by: Simon Vandel Sillesen <simon.vandel@gmail.com> Co-authored-by: Jax Liu <liugs963@gmail.com> Co-authored-by: Austin Liu <austin362667@gmail.com> Co-authored-by: JonasDev1 <jswipp@googlemail.com> Co-authored-by: jcsherin <jacob@protoship.io> Co-authored-by: Sergei Grebnov <sergei.grebnov@gmail.com> Co-authored-by: Andy Grove <agrove@apache.org> Co-authored-by: Bruce Ritchie <bruce.ritchie@veeva.com> Co-authored-by: DouPache <douenergy@gmail.com> Co-authored-by: mertak-synnada <mertak67+synaada@gmail.com> Co-authored-by: Bryce Mecum <petridish@gmail.com> Co-authored-by: wiedld <wiedld@users.noreply.github.com> Co-authored-by: kamille <caoruiqiu.crq@antgroup.com> Co-authored-by: Weston Pace <weston.pace@gmail.com> Co-authored-by: Val Lorentz <vlorentz@softwareheritage.org>

fix sig

b60ddea

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

github-actions bot added logical-expr Logical plan and expressions sqllogictest SQL Logic Tests (.slt) functions labels Oct 4, 2024

jayzhan211 marked this pull request as draft October 4, 2024 05:23

fix

9ead621

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

jayzhan211 commented Oct 4, 2024

View reviewed changes

jayzhan211 added 4 commits October 4, 2024 13:30

fix error

86e005f

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

fix all signature

b1aa23b

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

fix all signature

3b5695a

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

change default type

f38ef3b

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

github-actions bot added the core Core DataFusion crate label Oct 4, 2024

clippy

b002c39

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

jayzhan211 marked this pull request as ready for review October 4, 2024 09:26

jayzhan211 commented Oct 4, 2024

View reviewed changes

jayzhan211 mentioned this pull request Oct 4, 2024

Fix strpos invocation with dictionary and null #12712

Merged

findepi reviewed Oct 4, 2024

View reviewed changes

alamb approved these changes Oct 4, 2024

View reviewed changes

fix docs

b95c66a

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

Merge branch 'main' of https://github.com/apache/datafusion into strp…

10f5638

…os-signaure

jayzhan211 added 4 commits October 8, 2024 08:46

rm deadcode

984b5cc

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

Merge branch 'main' of https://github.com/apache/datafusion into strp…

045a202

…os-signaure

cleanup

1b37176

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

cleanup

f10d43d

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

jayzhan211 mentioned this pull request Oct 8, 2024

[EPIC] Decouple logical from physical types #12622

Open

6 tasks

rm test

6b0f2d8

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

jayzhan211 merged commit 47664df into apache:main Oct 8, 2024
24 checks passed

jayzhan211 deleted the strpos-signaure branch October 8, 2024 07:05


		use arrow::array::{ArrayRef, Float64Array, Int64Array};
		use datafusion_expr::TypeSignature::*;

	// Any type can be coerced into strings
	(Utf8 \| LargeUtf8, _) => Some(type_into.clone()),

		for t in current_types.iter().skip(1) {
		coerced_type = coercion_rule(&coerced_type, t)?;

Introduce Signature::String and return error if input of strpos is integer #12751

Introduce Signature::String and return error if input of strpos is integer #12751

Conversation

jayzhan211 commented Oct 4, 2024 • edited Loading

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

Others

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jayzhan211 Oct 4, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

alamb left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jayzhan211 commented Oct 5, 2024

findepi commented Oct 7, 2024

jayzhan211 commented Oct 8, 2024

jayzhan211 commented Oct 8, 2024

findepi commented Oct 8, 2024

Introduce Signature::String and return error if input of `strpos` is integer #12751

Introduce Signature::String and return error if input of `strpos` is integer #12751

jayzhan211 commented Oct 4, 2024 •

edited

Loading

jayzhan211 Oct 4, 2024 •

edited

Loading