Skip to content

Conversation

@westonpace
Copy link
Member

@westonpace westonpace commented Apr 1, 2023

Rationale for this change

Substrait provides a library-independent way to represent compute expressions. By serializing and deserializing pyarrow compute expression to substrait we can allow interoperability with other libraries.

Originally it was thought this would not be needed because users would be sending entire query plans (which contain expressions) back and forth and so there was no need to work with expressions by themselves.

However, as more and more APIs and integration points emerge it turns out there are situations where serializing expressions by themselves is useful. For example, the proposed datasets protocol, or for the Java JNI datasets implementation (which uses Arrow-C++'s datasets)

What changes are included in this PR?

In Arrow-C++ we add two new methods to serialize and deserialize a collection of named, bound expressions to Substrait's ExtendedExpression message.

In pyarrow we expose these two methods and also add utility methods to pyarrow.compute.Expression to convert a single expression to/from substrait (these will be encoded as an ExtendedExpression message with one expression named "expression")

In addition, this PR exposed that we do not have very many bindings for arrow-functions to substrait-functions (previous work has mostly focused on the reverse direction). This PR adds many (though not all) new bindings.

In addition, this PR adds ToProto for cast and both FromProto and ToProto support for the SingularOrList expression type (we convert is_in to SingularOrList and convert SingularOrList to an or list).

This should provide support for all the sargable operators except between (there is no Arrow-C++ between function) and like (we still don't have arrow->substrait bindings for the string functions) which should be a sufficient set of expressions for a first release.

Are these changes tested?

Yes.

Are there any user-facing changes?

There are new features, as described above, but no backwards incompatible changes.

Caveats

There are a fair number of minor inconsistencies or surprises, many of which can be smoothed over by follow-up work.

Bound Expressions

Arrow-C++ has long had a distinction between "unbound expressions" (e.g. a + b) and "bound expressions" (e.g. a:i32 + b:i32). A bound expression is an expression that has been bound to a schema of some kind. Field references are resolved and the output type is known for every node of the AST.

Pyarrow has hidden this complexity and most pyarrow compute expressions that the user encounters will be unbound expressions. Substrait is only capable (currently) of representing bound expressions. As a result, in order to serialize expressions, the user will need to provide an input schema. This can be an inconvenience for some workflows. To resolve this, I would like to eventually add support for unbound expressions to Substrait (substrait-io/substrait#515)

Another minor annoyance of bound expressions is that an unbound pyarrow.compute.Expression object will not be equal to a bound pyarrow.compute.Expression object. It would make testing easier if we had a pyarrow.compute.Expression.equals variant that did not examine bound fields.

Named field references

Pyarrow datasets users are used to working with named field references. For example, one can set a filter pc.equal(ds.field("x"), 7). Substrait, since it requires everything to be bound, considers named references to be superfluous and does everything in terms of numeric indices into the base schema. So the above expression, after round tripping, would become something like pc.equal(ds.field(3), 7) (assuming "x" is at index 3 in the schema used for serialization). This is something that can be overcome in the future if Substrait adds support for unbound expressions. Or, if that doesn't happen, it could still be implemented as a Substrait expression hint (this would allow named references to be used even if the user wants to work with bound expressions).

UDFs

UDFs ARE supported by this PR. This covers both "builtin arrow functions that do not exist in substrait (e.g. shift_left)" and "custom UDFs added with register_scalar_function". By default, UDFs will not be allowed when converting to Substrait because the resulting message would not be portable (e.g. you can't expect an external system to know about your custom UDFs). However, you can set the allow_udfs flag to True and these will be allowed. The Substrait representation will have the URI urn:arrow:substrait_simple_extension_function.

Options: Although UDFs are allowed we do not yet support UDFs that take function options. These are trickier to convert to Substrait (though it should be possible in the future if someone is motivated enough).

Rough Edges

There are a few corner cases:

  • The function is_in converts to Substrait's SingularOrList. On conversion back to Arrow this becomes an or list. In other words, the function is_in(5, [1, 2, 5]) converts to 5 == 1 || 5 == 2 || 5 == 5. This is because Substrait's or list is more expression and allows things like 5 == field_ref(0) || 5 == 7 which cannot be expressed as an is_in function.
  • Arrow functions can either be converted to Substrait or are considered UDFs. However, there are a small number of functions which can "sometimes" be converted to Substrait depending on the function options. At the moment I think this is only the is_null function. The is_null function has an option nan_is_null which will allow you to consider NaN as a null value. Substrait has no single function that evaluates both NULL and NaN as true. In the meantime you can use is_null || is_nan. In the future, should someone want to, they could add special logic to convert this case.
  • Closes: [C++] Support specifying filters and projections with Substrait expressions #33985

@github-actions
Copy link

github-actions bot commented Apr 1, 2023

@github-actions
Copy link

github-actions bot commented Apr 1, 2023

⚠️ GitHub issue #33985 has been automatically assigned in GitHub to PR creator.

@westonpace
Copy link
Member Author

Leaving as draft as I need to add more test cases as well as python bindings.

@westonpace
Copy link
Member Author

I've added python bindings. Now all that is needed is documentation / examples

@westonpace
Copy link
Member Author

I'm marking this ready for review. I still want to add a few unit tests that verify we correctly raise errors when given Substrait expressions that Arrow cannot handle (e.g. MultiOrList) but these should be a minor addition.

@westonpace
Copy link
Member Author

The appveyor failure seems valid (though utterly baffling): https://ci.appveyor.com/project/ApacheSoftwareFoundation/arrow/builds/47443264

@westonpace westonpace marked this pull request as ready for review July 1, 2023 21:43
@westonpace westonpace requested a review from AlenkaF as a code owner July 1, 2023 21:43
Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the nice and comprehensive description!

Did a review of the python bindings, which are looking good, just some minor comments.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this useful? (it can give multiple expressions with the same name?) Or could also raise an error instead?

I would maybe rather validate that len(exprs) == len(names)

(and in that case you can also do for expr, name in zip(exprs, names): .. to simplify the code a bit)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I was on the fence here. I don't have any real use case for it. I agree it feels better to just give an error if they don't give the same number of names. I'll fix this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated this to now raise an error if the length of names and exprs doesn't match (and I use zip now and added a test case).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return _pas().serialize_expressions([self], "expression", schema, allow_udfs=allow_udfs)
return _pas().serialize_expressions([self], ["expression"], schema, allow_udfs=allow_udfs)

This currently causes the bug that a deserialized form of this has "e" as name:

In [16]: expr = pc.field("a") == 1

In [17]: buf = expr.to_substrait(pa.schema([('a', 'int32')]))

In [18]: pyarrow.substrait.deserialize_expressions(buf).expressions
Out[18]: {'e': <pyarrow.compute.Expression (FieldPath(0) == 1)>}

(so might be good to add a test for this)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops. This also gets caught now because exprs and names don't have the same length. I've added a test for this as well.

Comment on lines 195 to 196
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
the Substrait expression ``a_i32 + b_i32`` is different from the
Substrait expression ``a_i64 + b_i64``. Pyarrow expressions are
the Substrait expression ``a:i32 + b:i32`` is different from the
Substrait expression ``a:i64 + b:i64``. Pyarrow expressions are

? (that might be clearer that the actual field names are still "a" and "b" in both cases)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed to this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "udf" in the keyword name might be a bit confusing, as I think users of pyarrow will think in the form of actual UDFs defined by them, and not functions defined by arrow (but not part of substrait), as for the user, those are "built-in" functions, not UDFs.

I see that in the C++ code you are using allow_arrow_extensions as keyword. We can use that here as well? (or is there a specific reason you went for a different name?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I forgot I needed to reconcile this :) Both made sense to me.

Setting this to true enables both "arrow builtin functions that are not substrait functions" and "user registered udfs (or actual UDFs)" (from Substrait's perspective these are the same thing). My thinking was that, as Substrait's function support expands, the first case wouldn't be encountered as much. However, I think both names are still ok. I'm happy to switch to "allow_arrow_extensions"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I switched to allow_arrow_extensions

Comment on lines +960 to +981
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also have an equals method on the Expression if you want to avoid string repr comparison (but not sure what the corner cases for either option)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned below this doesn't work because of the kernels binding.

Comment on lines 3331 to 3338
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one I don't fully understand: this are expressions with calls but without any field reference, only with scalars which already have a type. So why is bound/unbound relevant in this case? (I would have expected that only be relevant for references to fields in the schema)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we bind expressions, in addition to looking up field references, we look up matching kernels (and will actually apply implicit casts) and we store the matching kernel as part of the expression. So "1 + 3" goes from:

function: add
args:
  1: i64
  3: i64
kernel: nullptr

...to...

function: add
args:
  1: i64
  3: i64
kernel: add<i64,i64>

And unfortunately, this means that Expression::Equals does not compare the two as equal. I've added #36427 to hopefully add that option at some point.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, thanks for the explanation! I hadn't considered that the function vs kernel distinction

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be nicer to split out the substrait-based serialization to a separate test, but then we need to factor out the expression creation into a helper function? (it's certainly OK to just leave as is)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went ahead and split them out.

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting committer review Awaiting committer review labels Jul 6, 2023
@jorisvandenbossche
Copy link
Member

The function is_in converts to Substrait's SingularOrList. On conversion back to Arrow this becomes an or list. .... This is because Substrait's or list is more expressive ..

And substrait doesn't have an "is_in" like function? (or are there plans for that?)
(this conversion seems unfortunate, as "is_in" can be more efficient than the equivalent or-list)

@github-actions github-actions bot added awaiting change review Awaiting change review awaiting changes Awaiting changes and removed awaiting changes Awaiting changes awaiting change review Awaiting change review labels Jul 7, 2023
@westonpace
Copy link
Member Author

And substrait doesn't have an "is_in" like function? (or are there plans for that?)
(this conversion seems unfortunate, as "is_in" can be more efficient than the equivalent or-list)

It's an interesting point. We have things like this outside of expressions too. For example, the "join" node doesn't distinguish between an equality join (which can be done efficiently with a hashmap) and a non-equality join (which cannot). In that case we actually have both representations. The one people typically use is the "JoinRel" which is a logical operator and thus allowed to be more generic without concern for efficiency and the other one is the "HashJoinRel" which is more specific / physical, but typically not created by producers (instead planners or optimizers convert from one to the other).

I think this is interesting because "is_in" vs. "singular-or-list" is basically a logical vs physical distinction for expressions which I don't think I've really considered before, but I agree with you its valid.

In any case, it will be easy enough in Acero's converter, to recognize the cases that can collapse to is_in and use it where appropriate. I've created #36535 to track this.

@westonpace
Copy link
Member Author

I also created substrait-io/substrait#517 on the substrait side in case anyone wants to chime in.

@ianmcook
Copy link
Member

@westonpace @jorisvandenbossche is this ready to merge?

@westonpace westonpace force-pushed the feature/GH-33985--expression-serialization-substrait branch from fb3bd15 to 4280c4a Compare August 21, 2023 18:56
@westonpace
Copy link
Member Author

Yes. Sorry, I have rebased one last time and will merge as soon as CI is green.

@westonpace
Copy link
Member Author

Failures appear unrelated.

@westonpace westonpace merged commit 702e9ca into apache:main Aug 22, 2023
@westonpace westonpace removed the awaiting change review Awaiting change review label Aug 22, 2023
@conbench-apache-arrow
Copy link

After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit 702e9ca.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about possible false positives for unstable benchmarks that are known to sometimes produce them.

@kou
Copy link
Member

kou commented Aug 26, 2023

@github-actions crossbow submit example-python-minimal-build-fedora-conda

@github-actions
Copy link

Revision: 4280c4a

Submitted crossbow builds: ursacomputing/crossbow @ actions-44bfd49cd1

Task Status
example-python-minimal-build-fedora-conda Github Actions

@github-actions github-actions bot added the awaiting changes Awaiting changes label Aug 26, 2023
lidavidm added a commit that referenced this pull request Sep 20, 2023
…ilter as a Substrait proto extended expression (#35570)

### Rationale for this change

To close #34252

### What changes are included in this PR?

This is a proposal to try to solve:
1. Receive a list of Substrait scalar expressions and use them to Project a Dataset
- [x] Draft a Substrait Extended Expression to test (this will be generated by 3rd party project such as Isthmus)
- [x] Use C++ draft PR to Serialize/Deserialize Extended Expression proto messages
- [x] Create JNI Wrapper for ScannerBuilder::Project 
- [x] Create JNI API
- [x] Testing coverage
- [x] Documentation

Current problem is: `java.lang.RuntimeException: Inferring column projection from FieldRef FieldRef.FieldPath(0)`. Not able to infer by column position by able to infer by colum name. This problem is solved by #35798

This PR needs/use this PRs/Issues:
- #34834
- #34227
- #35579

2. Receive a Boolean-valued Substrait scalar expression and use it to filter a Dataset
- [x] Working to identify activities

### Are these changes tested?

Initial unit test added.

### Are there any user-facing changes?

No
* Closes: #34252

Lead-authored-by: david dali susanibar arce <davi.sarces@gmail.com>
Co-authored-by: Weston Pace <weston.pace@gmail.com>
Co-authored-by: benibus <bpharks@gmx.com>
Co-authored-by: David Li <li.davidm96@gmail.com>
Co-authored-by: Dane Pitkin <48041712+danepitkin@users.noreply.github.com>
Signed-off-by: David Li <li.davidm96@gmail.com>
@ianmcook
Copy link
Member

In case this helps anyone, there's an example here showing how this can be used through Python: https://gist.github.com/ianmcook/f70fc185d29ae97bdf85ffe0378c68e0

loicalleyne pushed a commit to loicalleyne/arrow that referenced this pull request Nov 13, 2023
… expressions (apache#34834)

### Rationale for this change

Substrait provides a library-independent way to represent compute expressions.  By serializing and deserializing pyarrow compute expression to substrait we can allow interoperability with other libraries.

Originally it was thought this would not be needed because users would be sending entire query plans (which contain expressions) back and forth and so there was no need to work with expressions by themselves.

However, as more and more APIs and integration points emerge it turns out there are situations where serializing expressions by themselves is useful.  For example, the proposed datasets protocol, or for the Java JNI datasets implementation (which uses Arrow-C++'s datasets)

### What changes are included in this PR?

In Arrow-C++ we add two new methods to serialize and deserialize a collection of named, bound expressions to Substrait's ExtendedExpression message.

In pyarrow we expose these two methods and also add utility methods to pyarrow.compute.Expression to convert a single expression to/from substrait (these will be encoded as an ExtendedExpression message with one expression named "expression")

In addition, this PR exposed that we do not have very many bindings for arrow-functions to substrait-functions (previous work has mostly focused on the reverse direction).  This PR adds many (though not all) new bindings.

In addition, this PR adds ToProto for cast and both FromProto and ToProto support for the SingularOrList expression type (we convert is_in to SingularOrList and convert SingularOrList to an or list).

This should provide support for all the sargable operators except between (there is no Arrow-C++ between function) and like (we still don't have arrow->substrait bindings for the string functions) which should be a sufficient set of expressions for a first release.

### Are these changes tested?

Yes.

### Are there any user-facing changes?

There are new features, as described above, but no backwards incompatible changes.

### Caveats

There are a fair number of minor inconsistencies or surprises, many of which can be smoothed over by follow-up work.

#### Bound Expressions

Arrow-C++ has long had a distinction between "unbound expressions" (e.g. `a + b`) and "bound expressions" (e.g. `a:i32 + b:i32`).  A bound expression is an expression that has been bound to a schema of some kind.  Field references are resolved and the output type is known for every node of the AST.

Pyarrow has hidden this complexity and most pyarrow compute expressions that the user encounters will be unbound expressions.  Substrait is only capable (currently) of representing bound expressions.  As a result, in order to serialize expressions, the user will need to provide an input schema.  This can be an inconvenience for some workflows.  To resolve this, I would like to eventually add support for unbound expressions to Substrait (substrait-io/substrait#515)

Another minor annoyance of bound expressions is that an unbound pyarrow.compute.Expression object will not be equal to a bound pyarrow.compute.Expression object.  It would make testing easier if we had a `pyarrow.compute.Expression.equals` variant that did not examine bound fields.

#### Named field references

Pyarrow datasets users are used to working with named field references.  For example, one can set a filter `pc.equal(ds.field("x"), 7)`.  Substrait, since it requires everything to be bound, considers named references to be superfluous and does everything in terms of numeric indices into the base schema.  So the above expression, after round tripping, would become something like `pc.equal(ds.field(3), 7)` (assuming `"x"` is at index `3` in the schema used for serialization).  This is something that can be overcome in the future if Substrait adds support for unbound expressions.  Or, if that doesn't happen, it could still be implemented as a Substrait expression hint (this would allow named references to be used even if the user wants to work with bound expressions).

#### UDFs

UDFs ARE supported by this PR.  This covers both "builtin arrow functions that do not exist in substrait (e.g. shift_left)" and "custom UDFs added with `register_scalar_function`".  By default, UDFs will not be allowed when converting to Substrait because the resulting message would not be portable (e.g. you can't expect an external system to know about your custom UDFs).  However, you can set the `allow_udfs` flag to True and these will be allowed.  The Substrait representation will have the URI `urn:arrow:substrait_simple_extension_function`.

**Options**: Although UDFs are allowed we do not yet support UDFs that take function options.  These are trickier to convert to Substrait (though it should be possible in the future if someone is motivated enough).

#### Rough Edges

There are a few corner cases:

 * The function `is_in` converts to Substrait's `SingularOrList`.  On conversion back to Arrow this becomes an or list.  In other words, the function `is_in(5, [1, 2, 5])` converts to `5 == 1 || 5 == 2 || 5 == 5`.  This is because Substrait's or list is more expression and allows things like `5 == field_ref(0) || 5 == 7` which cannot be expressed as an `is_in` function.
 * Arrow functions can either be converted to Substrait or are considered UDFs.  However, there are a small number of functions which can "sometimes" be converted to Substrait depending on the function options.  At the moment I think this is only the `is_null` function.  The `is_null` function has an option `nan_is_null` which will allow you to consider `NaN` as a null value.  Substrait has no single function that evaluates both `NULL` and `NaN` as true.  In the meantime you can use `is_null || is_nan`.  In the future, should someone want to, they could add special logic to convert this case.
* Closes: apache#33985

Lead-authored-by: Weston Pace <weston.pace@gmail.com>
Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Signed-off-by: Weston Pace <weston.pace@gmail.com>
loicalleyne pushed a commit to loicalleyne/arrow that referenced this pull request Nov 13, 2023
…der::Filter as a Substrait proto extended expression (apache#35570)

### Rationale for this change

To close apache#34252

### What changes are included in this PR?

This is a proposal to try to solve:
1. Receive a list of Substrait scalar expressions and use them to Project a Dataset
- [x] Draft a Substrait Extended Expression to test (this will be generated by 3rd party project such as Isthmus)
- [x] Use C++ draft PR to Serialize/Deserialize Extended Expression proto messages
- [x] Create JNI Wrapper for ScannerBuilder::Project 
- [x] Create JNI API
- [x] Testing coverage
- [x] Documentation

Current problem is: `java.lang.RuntimeException: Inferring column projection from FieldRef FieldRef.FieldPath(0)`. Not able to infer by column position by able to infer by colum name. This problem is solved by apache#35798

This PR needs/use this PRs/Issues:
- apache#34834
- apache#34227
- apache#35579

2. Receive a Boolean-valued Substrait scalar expression and use it to filter a Dataset
- [x] Working to identify activities

### Are these changes tested?

Initial unit test added.

### Are there any user-facing changes?

No
* Closes: apache#34252

Lead-authored-by: david dali susanibar arce <davi.sarces@gmail.com>
Co-authored-by: Weston Pace <weston.pace@gmail.com>
Co-authored-by: benibus <bpharks@gmx.com>
Co-authored-by: David Li <li.davidm96@gmail.com>
Co-authored-by: Dane Pitkin <48041712+danepitkin@users.noreply.github.com>
Signed-off-by: David Li <li.davidm96@gmail.com>
dgreiss pushed a commit to dgreiss/arrow that referenced this pull request Feb 19, 2024
…der::Filter as a Substrait proto extended expression (apache#35570)

### Rationale for this change

To close apache#34252

### What changes are included in this PR?

This is a proposal to try to solve:
1. Receive a list of Substrait scalar expressions and use them to Project a Dataset
- [x] Draft a Substrait Extended Expression to test (this will be generated by 3rd party project such as Isthmus)
- [x] Use C++ draft PR to Serialize/Deserialize Extended Expression proto messages
- [x] Create JNI Wrapper for ScannerBuilder::Project 
- [x] Create JNI API
- [x] Testing coverage
- [x] Documentation

Current problem is: `java.lang.RuntimeException: Inferring column projection from FieldRef FieldRef.FieldPath(0)`. Not able to infer by column position by able to infer by colum name. This problem is solved by apache#35798

This PR needs/use this PRs/Issues:
- apache#34834
- apache#34227
- apache#35579

2. Receive a Boolean-valued Substrait scalar expression and use it to filter a Dataset
- [x] Working to identify activities

### Are these changes tested?

Initial unit test added.

### Are there any user-facing changes?

No
* Closes: apache#34252

Lead-authored-by: david dali susanibar arce <davi.sarces@gmail.com>
Co-authored-by: Weston Pace <weston.pace@gmail.com>
Co-authored-by: benibus <bpharks@gmx.com>
Co-authored-by: David Li <li.davidm96@gmail.com>
Co-authored-by: Dane Pitkin <48041712+danepitkin@users.noreply.github.com>
Signed-off-by: David Li <li.davidm96@gmail.com>
kou pushed a commit to apache/arrow-java that referenced this pull request Nov 25, 2024
…ilter as a Substrait proto extended expression (#35570)

### Rationale for this change

To close apache/arrow#34252

### What changes are included in this PR?

This is a proposal to try to solve:
1. Receive a list of Substrait scalar expressions and use them to Project a Dataset
- [x] Draft a Substrait Extended Expression to test (this will be generated by 3rd party project such as Isthmus)
- [x] Use C++ draft PR to Serialize/Deserialize Extended Expression proto messages
- [x] Create JNI Wrapper for ScannerBuilder::Project 
- [x] Create JNI API
- [x] Testing coverage
- [x] Documentation

Current problem is: `java.lang.RuntimeException: Inferring column projection from FieldRef FieldRef.FieldPath(0)`. Not able to infer by column position by able to infer by colum name. This problem is solved by apache/arrow#35798

This PR needs/use this PRs/Issues:
- apache/arrow#34834
- apache/arrow#34227
- apache/arrow#35579

2. Receive a Boolean-valued Substrait scalar expression and use it to filter a Dataset
- [x] Working to identify activities

### Are these changes tested?

Initial unit test added.

### Are there any user-facing changes?

No
* Closes: #34252

Lead-authored-by: david dali susanibar arce <davi.sarces@gmail.com>
Co-authored-by: Weston Pace <weston.pace@gmail.com>
Co-authored-by: benibus <bpharks@gmx.com>
Co-authored-by: David Li <li.davidm96@gmail.com>
Co-authored-by: Dane Pitkin <48041712+danepitkin@users.noreply.github.com>
Signed-off-by: David Li <li.davidm96@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[C++] Support specifying filters and projections with Substrait expressions

4 participants