GH-33985: [C++] Add substrait serialization/deserialization for expressions #34834

westonpace · 2023-04-01T01:31:54Z

Rationale for this change

Substrait provides a library-independent way to represent compute expressions. By serializing pyarrow compute expressions to Substrait, and deserializing them back, we can allow interoperability with other libraries.

Originally it was thought this would not be needed because users would be sending entire query plans (which contain expressions) back and forth and so there was no need to work with expressions by themselves.

However, as more and more APIs and integration points emerge, it turns out there are situations where serializing expressions by themselves is useful, for example the proposed datasets protocol or the Java JNI datasets implementation (which uses Arrow-C++'s datasets).

What changes are included in this PR?

In Arrow-C++ we add two new methods to serialize and deserialize a collection of named, bound expressions to Substrait's ExtendedExpression message.

In pyarrow we expose these two methods and also add utility methods to pyarrow.compute.Expression to convert a single expression to/from substrait (these will be encoded as an ExtendedExpression message with one expression named "expression")

In addition, this PR exposed that we do not have very many bindings for arrow-functions to substrait-functions (previous work has mostly focused on the reverse direction). This PR adds many (though not all) new bindings.

In addition, this PR adds ToProto for cast and both FromProto and ToProto support for the SingularOrList expression type (we convert is_in to SingularOrList and convert SingularOrList to an or list).

This should provide support for all the sargable operators except between (there is no Arrow-C++ between function) and like (we still don't have arrow->substrait bindings for the string functions) which should be a sufficient set of expressions for a first release.

Are these changes tested?

Yes.

Are there any user-facing changes?

There are new features, as described above, but no backwards incompatible changes.

Caveats

There are a fair number of minor inconsistencies or surprises, many of which can be smoothed over by follow-up work.

Bound Expressions

Arrow-C++ has long had a distinction between "unbound expressions" (e.g. a + b) and "bound expressions" (e.g. a:i32 + b:i32). A bound expression is an expression that has been bound to a schema of some kind. Field references are resolved and the output type is known for every node of the AST.

Pyarrow has hidden this complexity and most pyarrow compute expressions that the user encounters will be unbound expressions. Substrait is only capable (currently) of representing bound expressions. As a result, in order to serialize expressions, the user will need to provide an input schema. This can be an inconvenience for some workflows. To resolve this, I would like to eventually add support for unbound expressions to Substrait (substrait-io/substrait#515)

Another minor annoyance of bound expressions is that an unbound pyarrow.compute.Expression object will not be equal to a bound pyarrow.compute.Expression object. It would make testing easier if we had a pyarrow.compute.Expression.equals variant that did not examine bound fields.

Named field references

Pyarrow datasets users are used to working with named field references. For example, one can set a filter pc.equal(ds.field("x"), 7). Substrait, since it requires everything to be bound, considers named references to be superfluous and does everything in terms of numeric indices into the base schema. So the above expression, after round tripping, would become something like pc.equal(ds.field(3), 7) (assuming "x" is at index 3 in the schema used for serialization). This is something that can be overcome in the future if Substrait adds support for unbound expressions. Or, if that doesn't happen, it could still be implemented as a Substrait expression hint (this would allow named references to be used even if the user wants to work with bound expressions).
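To make the name-to-index rewrite concrete, here is a small pure-Python sketch. It is a toy model and not how Arrow or Substrait implement this: expressions are modelled as nested tuples, and `names_to_indices` / `indices_to_names` are hypothetical helpers introduced only for illustration.

```python
# Toy model (not the pyarrow API). An expression is one of:
#   ("field", name_or_index), ("lit", value), ("call", function_name, [args])

def names_to_indices(expr, schema_names):
    """Replace named field references with positions in the schema."""
    kind = expr[0]
    if kind == "field":
        ref = expr[1]
        # Names are resolved against the schema; indices are kept as-is.
        return ("field", schema_names.index(ref) if isinstance(ref, str) else ref)
    if kind == "call":
        return ("call", expr[1], [names_to_indices(a, schema_names) for a in expr[2]])
    return expr  # literals are unchanged


def indices_to_names(expr, schema_names):
    """Inverse rewrite; only possible when the same schema is at hand."""
    kind = expr[0]
    if kind == "field":
        ref = expr[1]
        return ("field", schema_names[ref] if isinstance(ref, int) else ref)
    if kind == "call":
        return ("call", expr[1], [indices_to_names(a, schema_names) for a in expr[2]])
    return expr


schema_names = ["a", "b", "c", "x"]  # "x" sits at index 3
named = ("call", "equal", [("field", "x"), ("lit", 7)])
by_index = names_to_indices(named, schema_names)
```

The point of the sketch is that the index form carries no names at all, so recovering `"x"` after a round trip requires the schema that was used for serialization.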

UDFs

UDFs ARE supported by this PR. This covers both "builtin arrow functions that do not exist in substrait (e.g. shift_left)" and "custom UDFs added with register_scalar_function". By default, UDFs will not be allowed when converting to Substrait because the resulting message would not be portable (e.g. you can't expect an external system to know about your custom UDFs). However, you can set the allow_udfs flag to True and these will be allowed. The Substrait representation will have the URI urn:arrow:substrait_simple_extension_function.

Options: Although UDFs are allowed we do not yet support UDFs that take function options. These are trickier to convert to Substrait (though it should be possible in the future if someone is motivated enough).
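The decision described above can be sketched in a few lines of plain Python. This is only an illustration of the rule, not the converter in Arrow-C++: the `ARROW_TO_SUBSTRAIT` table is a made-up two-entry subset, `STANDARD_URI_PLACEHOLDER` stands in for whichever standard Substrait extension file defines the function, and `to_substrait_function` is a hypothetical helper.

```python
ARROW_EXTENSION_URI = "urn:arrow:substrait_simple_extension_function"
STANDARD_URI_PLACEHOLDER = "<standard substrait extension>"  # placeholder, not a real URI

# Illustrative subset only; the real binding table lives in Arrow-C++.
ARROW_TO_SUBSTRAIT = {"add": "add", "equal": "equal"}


def to_substrait_function(arrow_name, allow_udfs=False, has_options=False):
    """Return (uri, function_name) for an Arrow function, or raise."""
    if arrow_name in ARROW_TO_SUBSTRAIT:
        return (STANDARD_URI_PLACEHOLDER, ARROW_TO_SUBSTRAIT[arrow_name])
    # Anything without a binding is treated as a UDF, whether it is an
    # Arrow builtin such as shift_left or a user-registered function.
    if not allow_udfs:
        raise ValueError(
            f"{arrow_name!r} has no Substrait binding and allow_udfs is False"
        )
    if has_options:
        raise NotImplementedError("UDFs that take function options are not supported")
    return (ARROW_EXTENSION_URI, arrow_name)
```

Keeping the default at `False` means a message produced without opting in only refers to functions an external consumer can be expected to know.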

Rough Edges

There are a few corner cases:

The function is_in converts to Substrait's SingularOrList. On conversion back to Arrow this becomes an or list. In other words, the function is_in(5, [1, 2, 5]) converts to 5 == 1 || 5 == 2 || 5 == 5. This is because Substrait's or list is more expressive and allows things like 5 == field_ref(0) || 5 == 7 which cannot be expressed as an is_in function.
Arrow functions can either be converted to Substrait or are considered UDFs. However, there are a small number of functions which can "sometimes" be converted to Substrait depending on the function options. At the moment I think this is only the is_null function. The is_null function has an option nan_is_null which will allow you to consider NaN as a null value. Substrait has no single function that evaluates both NULL and NaN as true. In the meantime you can use is_null || is_nan. In the future, should someone want to, they could add special logic to convert this case.
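The is_in expansion in the first corner case can be illustrated with a toy rewrite in plain Python. Expressions are modelled as nested tuples, and `expand_is_in` and `render` are hypothetical helpers written for this sketch; the handling of an empty option list is an assumption of the sketch, not something stated in this PR.

```python
# Toy model (not the pyarrow API). An expression is one of:
#   ("field", index), ("lit", value), ("call", function_name, [left, right])

def expand_is_in(value, options):
    """Rewrite is_in(value, options) as a chain of equality checks joined by or."""
    if not options:
        return ("lit", False)  # assumption: nothing is a member of an empty set
    terms = [("call", "equal", [value, ("lit", option)]) for option in options]
    result = terms[0]
    for term in terms[1:]:
        result = ("call", "or", [result, term])
    return result


def render(expr):
    """Print an expression in the infix notation used in the description."""
    kind = expr[0]
    if kind == "lit":
        return repr(expr[1])
    if kind == "field":
        return f"field_ref({expr[1]})"
    operator = {"equal": "==", "or": "||"}[expr[1]]
    left, right = expr[2]
    return f"{render(left)} {operator} {render(right)}"


expanded = render(expand_is_in(("lit", 5), [1, 2, 5]))
# expanded == "5 == 1 || 5 == 2 || 5 == 5"
```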
Closes: [C++] Support specifying filters and projections with Substrait expressions #33985

github-actions · 2023-04-01T01:32:18Z

Closes: [C++] Support specifying filters and projections with Substrait expressions #33985

github-actions · 2023-04-01T01:32:21Z

⚠️ GitHub issue #33985 has been automatically assigned in GitHub to PR creator.

westonpace · 2023-04-01T01:32:38Z

Leaving as draft as I need to add more test cases as well as python bindings.

westonpace · 2023-05-16T07:29:12Z

I've added python bindings. Now all that is needed is documentation / examples

westonpace · 2023-07-01T15:57:01Z

I'm marking this ready for review. I still want to add a few unit tests that verify we correctly raise errors when given Substrait expressions that Arrow cannot handle (e.g. MultiOrList) but these should be a minor addition.

westonpace · 2023-07-01T17:00:34Z

The appveyor failure seems valid (though utterly baffling): https://ci.appveyor.com/project/ApacheSoftwareFoundation/arrow/builds/47443264

jorisvandenbossche

Thanks for the nice and comprehensive description!

Did a review of the python bindings, which are looking good, just some minor comments.

python/pyarrow/_compute.pyx

jorisvandenbossche · 2023-07-05T13:05:39Z

python/pyarrow/_substrait.pyx

Is this useful? (it can give multiple expressions with the same name?) Or could also raise an error instead?

I would maybe rather validate that len(exprs) == len(names)

(and in that case you can also do for expr, name in zip(exprs, names): .. to simplify the code a bit)

Ok, I was on the fence here. I don't have any real use case for it. I agree it feels better to just give an error if they don't give the same number of names. I'll fix this.

I've updated this to now raise an error if the length of names and exprs doesn't match (and I use zip now and added a test case).

jorisvandenbossche · 2023-07-06T12:17:48Z

python/pyarrow/_compute.pyx

Suggested change

return _pas().serialize_expressions([self], "expression", schema, allow_udfs=allow_udfs)

return _pas().serialize_expressions([self], ["expression"], schema, allow_udfs=allow_udfs)

This currently causes the bug that a deserialized form of this has "e" as name:

In [16]: expr = pc.field("a") == 1
In [17]: buf = expr.to_substrait(pa.schema([('a', 'int32')]))
In [18]: pyarrow.substrait.deserialize_expressions(buf).expressions
Out[18]: {'e': <pyarrow.compute.Expression (FieldPath(0) == 1)>}

(so might be good to add a test for this)

Oops. This also gets caught now because exprs and names don't have the same length. I've added a test for this as well.
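For readers wondering where the "e" comes from, the following plain-Python sketch reproduces the effect. It assumes the names were looked up by position, which is a guess about the pre-fix code rather than a copy of it; `pair_unchecked` and `pair_checked` are hypothetical helpers.

```python
def pair_unchecked(exprs, names):
    """Look names up by position; a bare string is silently indexed per character."""
    return {names[i]: expr for i, expr in enumerate(exprs)}


def pair_checked(exprs, names):
    """Require exactly one name per expression, then pair them with zip."""
    if len(exprs) != len(names):
        raise ValueError(
            f"expected {len(exprs)} names but got {len(names)}"
        )
    return dict(zip(names, exprs))


buggy = pair_unchecked(["a == 1"], "expression")    # {'e': 'a == 1'}
fixed = pair_checked(["a == 1"], ["expression"])    # {'expression': 'a == 1'}
```

With the length check in place, passing the string `"expression"` alongside a single expression fails because the string has length 10, which is why the same validation catches this bug.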

jorisvandenbossche · 2023-07-06T12:19:04Z

python/pyarrow/_substrait.pyx

Suggested change

the Substrait expression ``a_i32 + b_i32`` is different from the

Substrait expression ``a_i64 + b_i64``. Pyarrow expressions are

the Substrait expression ``a:i32 + b:i32`` is different from the

Substrait expression ``a:i64 + b:i64``. Pyarrow expressions are

? (that might be clearer that the actual field names are still "a" and "b" in both cases)

I've changed to this.

jorisvandenbossche · 2023-07-06T12:22:03Z

python/pyarrow/_substrait.pyx

The "udf" in the keyword name might be a bit confusing, as I think users of pyarrow will think in the form of actual UDFs defined by them, and not functions defined by arrow (but not part of substrait), as for the user, those are "built-in" functions, not UDFs.

I see that in the C++ code you are using allow_arrow_extensions as keyword. We can use that here as well? (or is there a specific reason you went for a different name?)

Ah, I forgot I needed to reconcile this :) Both made sense to me.

Setting this to true enables both "arrow builtin functions that are not substrait functions" and "user registered udfs (or actual UDFs)" (from Substrait's perspective these are the same thing). My thinking was that, as Substrait's function support expands, the first case wouldn't be encountered as much. However, I think both names are still ok. I'm happy to switch to "allow_arrow_extensions"

I switched to allow_arrow_extensions

jorisvandenbossche · 2023-07-06T12:46:37Z

python/pyarrow/tests/test_substrait.py

We also have an equals method on the Expression if you want to avoid comparing string reprs (but I'm not sure what the corner cases are for either option).

As mentioned below this doesn't work because of the kernels binding.

jorisvandenbossche · 2023-07-06T12:57:19Z

python/pyarrow/tests/test_compute.py

This one I don't fully understand: these are expressions with calls but without any field reference, only with scalars which already have a type. So why is bound/unbound relevant in this case? (I would have expected that to be relevant only for references to fields in the schema)

When we bind expressions, in addition to looking up field references, we look up matching kernels (and will actually apply implicit casts) and we store the matching kernel as part of the expression. So "1 + 3" goes from:

function: add
args:
  1: i64
  3: i64
kernel: nullptr

...to...

function: add
args:
  1: i64
  3: i64
kernel: add<i64,i64>

And unfortunately, this means that Expression::Equals does not compare the two as equal. I've added #36427 to hopefully add that option at some point.
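A toy version of this in plain Python shows why the comparison fails and what a binding-insensitive comparison could look like. The `Call` class, `bind`, and `equals_ignoring_binding` are hypothetical names for this sketch and do not mirror the Arrow-C++ data structures; the kernel is modelled as a plain string.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class Call:
    function: str
    args: Tuple[Tuple[int, str], ...]   # (value, type) pairs
    kernel: Optional[str] = None        # filled in when the call is bound


def bind(call):
    """Attach a kernel chosen from the argument types (toy lookup)."""
    signature = ",".join(arg_type for _, arg_type in call.args)
    return replace(call, kernel=f"{call.function}<{signature}>")


def equals_ignoring_binding(left, right):
    """Compare function and arguments only, skipping the bound kernel."""
    return left.function == right.function and left.args == right.args


unbound = Call("add", ((1, "i64"), (3, "i64")))
bound = bind(unbound)   # kernel is now "add<i64,i64>"
```

Here `unbound == bound` is false because the generated dataclass equality includes the `kernel` field, while `equals_ignoring_binding(unbound, bound)` is true.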

I see, thanks for the explanation! I hadn't considered the function vs kernel distinction.

python/pyarrow/tests/test_compute.py

jorisvandenbossche · 2023-07-06T13:26:47Z

python/pyarrow/tests/test_compute.py

It might be nicer to split out the substrait-based serialization to a separate test, but then we need to factor out the expression creation into a helper function? (it's certainly OK to just leave as is)

I went ahead and split them out.

jorisvandenbossche · 2023-07-06T13:31:16Z

The function is_in converts to Substrait's SingularOrList. On conversion back to Arrow this becomes an or list. .... This is because Substrait's or list is more expressive ..

And substrait doesn't have an "is_in" like function? (or are there plans for that?)
(this conversion seems unfortunate, as "is_in" can be more efficient than the equivalent or-list)

westonpace · 2023-07-07T04:49:18Z

And substrait doesn't have an "is_in" like function? (or are there plans for that?)
(this conversion seems unfortunate, as "is_in" can be more efficient than the equivalent or-list)

It's an interesting point. We have things like this outside of expressions too. For example, the "join" node doesn't distinguish between an equality join (which can be done efficiently with a hashmap) and a non-equality join (which cannot). In that case we actually have both representations. The one people typically use is the "JoinRel" which is a logical operator and thus allowed to be more generic without concern for efficiency and the other one is the "HashJoinRel" which is more specific / physical, but typically not created by producers (instead planners or optimizers convert from one to the other).

I think this is interesting because "is_in" vs. "singular-or-list" is basically a logical vs physical distinction for expressions, which I don't think I've really considered before, but I agree with you it's valid.

In any case, it will be easy enough in Acero's converter, to recognize the cases that can collapse to is_in and use it where appropriate. I've created #36535 to track this.
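As a rough idea of what recognizing the collapsible cases might involve, here is a plain-Python sketch over a toy tuple representation. It is not the Acero converter and makes its own assumptions: every term must be an equality with the same left-hand side and a literal on the right, and `flatten_or` and `collapse_to_is_in` are hypothetical helpers.

```python
# Toy model (not the pyarrow API). An expression is one of:
#   ("field", index), ("lit", value), ("call", function_name, [left, right])

def flatten_or(expr):
    """Return the list of terms joined by (possibly nested) or calls."""
    if expr[0] == "call" and expr[1] == "or":
        terms = []
        for arg in expr[2]:
            terms.extend(flatten_or(arg))
        return terms
    return [expr]


def collapse_to_is_in(expr):
    """Rewrite value == l1 || value == l2 || ... as is_in(value, [l1, l2, ...]).

    Returns the expression unchanged when the pattern does not match.
    """
    terms = flatten_or(expr)
    if len(terms) < 2:
        return expr
    value = None
    options = []
    for term in terms:
        if not (term[0] == "call" and term[1] == "equal"):
            return expr
        left, right = term[2]
        if right[0] != "lit":
            return expr  # e.g. 5 == field_ref(0) cannot become an is_in option
        if value is None:
            value = left
        elif left != value:
            return expr  # all terms must test the same value
        options.append(right[1])
    return ("call", "is_in", [value, ("lit", options)])


def eq(left, right):
    return ("call", "equal", [left, right])


x = ("field", 0)
or_list = ("call", "or", [("call", "or", [eq(x, ("lit", 1)), eq(x, ("lit", 2))]), eq(x, ("lit", 5))])
collapsed = collapse_to_is_in(or_list)
```

An or list such as `5 == field_ref(0) || 5 == 7` is returned unchanged by this sketch, matching the expressiveness gap described in the PR description.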

westonpace · 2023-07-07T16:08:02Z

I also created substrait-io/substrait#517 on the substrait side in case anyone wants to chime in.

ianmcook · 2023-08-18T14:24:24Z

@westonpace @jorisvandenbossche is this ready to merge?

westonpace · 2023-08-21T18:56:53Z

Yes. Sorry, I have rebased one last time and will merge as soon as CI is green.

westonpace · 2023-08-22T16:27:16Z

Failures appear unrelated.

conbench-apache-arrow · 2023-08-23T07:15:55Z

After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit 702e9ca.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about possible false positives for unstable benchmarks that are known to sometimes produce them.

kou · 2023-08-26T07:39:32Z

@github-actions crossbow submit example-python-minimal-build-fedora-conda

github-actions · 2023-08-26T07:42:00Z

Revision: 4280c4a

Submitted crossbow builds: ursacomputing/crossbow @ actions-44bfd49cd1

Task	Status
example-python-minimal-build-fedora-conda

python/pyarrow/tests/test_substrait.py

…ilter as a Substrait proto extended expression (#35570) ### Rationale for this change To close #34252 ### What changes are included in this PR? This is a proposal to try to solve: 1. Receive a list of Substrait scalar expressions and use them to Project a Dataset - [x] Draft a Substrait Extended Expression to test (this will be generated by 3rd party project such as Isthmus) - [x] Use C++ draft PR to Serialize/Deserialize Extended Expression proto messages - [x] Create JNI Wrapper for ScannerBuilder::Project - [x] Create JNI API - [x] Testing coverage - [x] Documentation Current problem is: `java.lang.RuntimeException: Inferring column projection from FieldRef FieldRef.FieldPath(0)`. Not able to infer by column position by able to infer by colum name. This problem is solved by #35798 This PR needs/use this PRs/Issues: - #34834 - #34227 - #35579 2. Receive a Boolean-valued Substrait scalar expression and use it to filter a Dataset - [x] Working to identify activities ### Are these changes tested? Initial unit test added. ### Are there any user-facing changes? No * Closes: #34252 Lead-authored-by: david dali susanibar arce <davi.sarces@gmail.com> Co-authored-by: Weston Pace <weston.pace@gmail.com> Co-authored-by: benibus <bpharks@gmx.com> Co-authored-by: David Li <li.davidm96@gmail.com> Co-authored-by: Dane Pitkin <48041712+danepitkin@users.noreply.github.com> Signed-off-by: David Li <li.davidm96@gmail.com>

ianmcook · 2023-09-22T16:34:54Z

In case this helps anyone, there's an example here showing how this can be used through Python: https://gist.github.com/ianmcook/f70fc185d29ae97bdf85ffe0378c68e0

github-actions bot added the Component: C++ label Apr 1, 2023

github-actions bot added the awaiting committer review Awaiting committer review label Apr 1, 2023

davisusanibar mentioned this pull request May 12, 2023

GH-34252: [Java] Support ScannerBuilder::Project or ScannerBuilder::Filter as a Substrait proto extended expression #35570

Merged

7 tasks

westonpace force-pushed the feature/GH-33985--expression-serialization-substrait branch from c7003a1 to f2b0f8a Compare May 16, 2023 07:25

github-actions bot added the Component: Python label May 16, 2023

westonpace mentioned this pull request May 17, 2023

GH-33986: [Python] Add a minimal protocol for datasets #35568

Closed

wjones127 mentioned this pull request May 23, 2023

Improve compatibility with pyarrow compute expression lance-format/lance#849

Closed

westonpace force-pushed the feature/GH-33985--expression-serialization-substrait branch from f2b0f8a to 97c4573 Compare June 23, 2023 19:17

westonpace force-pushed the feature/GH-33985--expression-serialization-substrait branch from 29f9f59 to 44e0023 Compare July 1, 2023 14:51

github-actions bot added the Component: Documentation label Jul 1, 2023

westonpace requested review from jorisvandenbossche and wjones127 July 1, 2023 21:43

westonpace marked this pull request as ready for review July 1, 2023 21:43

westonpace requested a review from AlenkaF as a code owner July 1, 2023 21:43

jorisvandenbossche reviewed Jul 6, 2023

github-actions bot added awaiting changes Awaiting changes and removed awaiting committer review Awaiting committer review labels Jul 6, 2023

github-actions bot added awaiting change review Awaiting change review awaiting changes Awaiting changes and removed awaiting changes Awaiting changes awaiting change review Awaiting change review labels Jul 7, 2023

westonpace and others added 12 commits August 21, 2023 11:56

Added serialization and deserialization for ExtendedExpression. Updat…

305fc3d

…ed substrait version to 0.27

WIP

0e9a02b

Added python bindings to extended expression

ad823a6

Python lint

e41c7fc

cmake-format

849f82c

Added more test cases, added quite a few arrow->substrait mappings, w…

f939d7a

…ired up allow_udfs

Cleanup

b5d6f36

Lint / namespace issues

985b201

Numpydoc lint

1f29362

Update python/pyarrow/tests/test_compute.py

15c58de

Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>

Addressing review comments

ebdb271

Tweaked the encoding of simple functions to ensure the function name …

4280c4a

…is owned by the extension set and will not go out of scope.

westonpace force-pushed the feature/GH-33985--expression-serialization-substrait branch from fb3bd15 to 4280c4a Compare August 21, 2023 18:56

westonpace merged commit 702e9ca into apache:main Aug 22, 2023

westonpace removed the awaiting change review Awaiting change review label Aug 22, 2023

kou reviewed Aug 26, 2023

github-actions bot added the awaiting changes Awaiting changes label Aug 26, 2023

danepitkin mentioned this pull request Sep 11, 2023

ARROW-17351: [C++][Compute] Implement a parser for Expressions #14287

Closed
