[Data] [1/n] Predicate Expression Support #56313
Merged: alexeykudinkin merged 10 commits into `ray-project:master` from `goutamvenkat-anyscale:goutam/filter_expr` on Sep 18, 2025.
Commits (all by goutamvenkat-anyscale):

- 9f67d72 [Data] [1/n] Filter Expression Support
- 81224e6 clean up
- e2c5a15 Remove where
- c806c0c rst fix
- 3cced46 Merge branch 'master' into goutam/filter_expr
- ec18b73 Remove predicateExpr
- 6907774 Remove old doc
- dee95ac Add todo comment
- 2483f60 Change fn name
- 24e1591 Address comments
```diff
@@ -4,7 +4,7 @@
 from abc import ABC, abstractmethod
 from dataclasses import dataclass, field
 from enum import Enum
-from typing import Any, Callable, Dict, List
+from typing import Any, Callable, Dict, List, Union
 
 from ray.data.block import BatchColumn
 from ray.data.datatype import DataType
```
```diff
@@ -23,26 +23,40 @@ class Operation(Enum):
         SUB: Subtraction operation (-)
         MUL: Multiplication operation (*)
         DIV: Division operation (/)
+        FLOORDIV: Floor division operation (//)
         GT: Greater than comparison (>)
         LT: Less than comparison (<)
         GE: Greater than or equal comparison (>=)
         LE: Less than or equal comparison (<=)
         EQ: Equality comparison (==)
+        NE: Not equal comparison (!=)
         AND: Logical AND operation (&)
         OR: Logical OR operation (|)
+        NOT: Logical NOT operation (~)
+        IS_NULL: Check if value is null
+        IS_NOT_NULL: Check if value is not null
+        IN: Check if value is in a list
+        NOT_IN: Check if value is not in a list
     """
 
     ADD = "add"
     SUB = "sub"
     MUL = "mul"
     DIV = "div"
+    FLOORDIV = "floordiv"
     GT = "gt"
     LT = "lt"
     GE = "ge"
     LE = "le"
     EQ = "eq"
+    NE = "ne"
     AND = "and"
     OR = "or"
+    NOT = "not"
+    IS_NULL = "is_null"
+    IS_NOT_NULL = "is_not_null"
+    IN = "in"
+    NOT_IN = "not_in"
 
 
 @DeveloperAPI(stability="alpha")
```

Review thread on the `FLOORDIV` line:

- Contributor: Is there modulo?
- Author: PyArrow doesn't have a native mod kernel. Could be a good follow-up.
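The new `Operation` members only name operations; an evaluator still has to map each one to a concrete function. As a hedged illustration (not Ray's evaluator, which works on Arrow/pandas batches and is not part of this diff), here is a pure-Python sketch that assumes hypothetical `BINARY_OPS` / `UNARY_OPS` dispatch tables built on the standard `operator` module, with `None` standing in for null:

```python
import operator
from enum import Enum


class Operation(Enum):
    # Subset of the members added in this PR; values copied from the diff.
    FLOORDIV = "floordiv"
    NE = "ne"
    NOT = "not"
    IS_NULL = "is_null"
    IS_NOT_NULL = "is_not_null"
    IN = "in"
    NOT_IN = "not_in"


# Hypothetical dispatch table for binary ops: f(left, right).
BINARY_OPS = {
    Operation.FLOORDIV: operator.floordiv,
    Operation.NE: operator.ne,
    Operation.IN: lambda value, values: value in values,
    Operation.NOT_IN: lambda value, values: value not in values,
}

# Hypothetical dispatch table for unary ops: f(operand).
UNARY_OPS = {
    Operation.NOT: operator.not_,
    Operation.IS_NULL: lambda value: value is None,
    Operation.IS_NOT_NULL: lambda value: value is not None,
}


def apply_binary(op: Operation, left, right):
    """Look up the function for a binary op and apply it to two scalars."""
    return BINARY_OPS[op](left, right)


def apply_unary(op: Operation, operand):
    """Look up the function for a unary op and apply it to one scalar."""
    return UNARY_OPS[op](operand)


print(apply_binary(Operation.FLOORDIV, 7, 2))       # 3
print(apply_binary(Operation.IN, "b", ["a", "b"]))  # True
print(apply_unary(Operation.IS_NULL, None))         # True
```

Keying the tables by enum member keeps the set of supported operations in one place, so an operation without a kernel (such as the modulo asked about in the review) simply has no entry.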
```diff
@@ -127,6 +141,14 @@ def __rtruediv__(self, other: Any) -> "Expr":
         """Reverse division operator (for literal / expr)."""
         return LiteralExpr(other)._bin(self, Operation.DIV)
 
+    def __floordiv__(self, other: Any) -> "Expr":
+        """Floor division operator (//)."""
+        return self._bin(other, Operation.FLOORDIV)
+
+    def __rfloordiv__(self, other: Any) -> "Expr":
+        """Reverse floor division operator (for literal // expr)."""
+        return LiteralExpr(other)._bin(self, Operation.FLOORDIV)
+
     # comparison
     def __gt__(self, other: Any) -> "Expr":
         """Greater than operator (>)."""
```
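`__rfloordiv__` is needed because Python falls back to the right operand's reflected method only when the left operand cannot handle the operation, as in `10 // col("x")`. The sketch below uses hypothetical toy classes (`Expr`, `Col`, `Literal`, `Binary`, with string op names instead of the `Operation` enum) to show why the diff writes `LiteralExpr(other)._bin(self, ...)`: wrapping the literal first keeps it on the left, so operand order is preserved.

```python
from dataclasses import dataclass
from typing import Any


class Expr:
    """Toy stand-in for an expression base class (hypothetical, simplified)."""

    def _bin(self, other: Any, op: str) -> "Expr":
        # Wrap plain Python values so both sides are expressions.
        if not isinstance(other, Expr):
            other = Literal(other)
        return Binary(op, self, other)

    def __floordiv__(self, other: Any) -> "Expr":
        # expr // other: self is the left operand.
        return self._bin(other, "floordiv")

    def __rfloordiv__(self, other: Any) -> "Expr":
        # other // expr: called after int.__floordiv__ returns NotImplemented.
        # Wrapping `other` first keeps the literal on the left.
        return Literal(other)._bin(self, "floordiv")


@dataclass(frozen=True, eq=False)
class Col(Expr):
    name: str


@dataclass(frozen=True, eq=False)
class Literal(Expr):
    value: Any


@dataclass(frozen=True, eq=False)
class Binary(Expr):
    op: str
    left: Expr
    right: Expr


forward = Col("x") // 2      # left operand is Col("x"), right is Literal(2)
reflected = 10 // Col("x")   # left operand is Literal(10), right is Col("x")
```

Had `__rfloordiv__` called `self._bin(other, ...)` instead, `10 // col("x")` would silently become `col("x") // 10`.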
```diff
@@ -148,6 +170,10 @@ def __eq__(self, other: Any) -> "Expr":
         """Equality operator (==)."""
         return self._bin(other, Operation.EQ)
 
+    def __ne__(self, other: Any) -> "Expr":
+        """Not equal operator (!=)."""
+        return self._bin(other, Operation.NE)
+
     # boolean
     def __and__(self, other: Any) -> "Expr":
         """Logical AND operator (&)."""
```
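Because `__eq__` and `__ne__` return expressions rather than booleans, the expression dataclasses in this file are declared with `@dataclass(frozen=True, eq=False)`. With the default `eq=True`, `@dataclass` generates a field-by-field `__eq__` in each subclass that shadows the expression-building one inherited from `Expr`; presumably that is also why structural comparison is a separate `structurally_equals` method. A toy sketch with hypothetical classes (not Ray's):

```python
from dataclasses import dataclass
from typing import Any


class Expr:
    """Toy base: == and != build expressions instead of returning bools."""

    def __eq__(self, other: Any) -> "Expr":  # type: ignore[override]
        return Binary("eq", self, _wrap(other))

    def __ne__(self, other: Any) -> "Expr":  # type: ignore[override]
        return Binary("ne", self, _wrap(other))


def _wrap(value: Any) -> Expr:
    # Hypothetical helper: turn plain values into literal expressions.
    return value if isinstance(value, Expr) else Literal(value)


@dataclass(frozen=True, eq=False)  # eq=False: keep Expr.__eq__ / __ne__
class Col(Expr):
    name: str


@dataclass(frozen=True)  # default eq=True: dataclass adds its own __eq__
class ColDefaultEq(Expr):
    name: str


@dataclass(frozen=True, eq=False)
class Literal(Expr):
    value: Any


@dataclass(frozen=True, eq=False)
class Binary(Expr):
    op: str
    left: Expr
    right: Expr


pred = Col("a") != 1  # builds Binary("ne", Col("a"), Literal(1))
plain = ColDefaultEq("a") == ColDefaultEq("a")  # generated __eq__: a plain bool
```

With `eq=False`, `Col("a") != 1` yields an expression node; the default-`eq` variant loses that behavior for `==` and just compares fields.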
```diff
@@ -157,6 +183,31 @@ def __or__(self, other: Any) -> "Expr":
         """Logical OR operator (|)."""
         return self._bin(other, Operation.OR)
 
+    def __invert__(self) -> "Expr":
+        """Logical NOT operator (~)."""
+        return UnaryExpr(Operation.NOT, self)
+
+    # predicate methods
+    def is_null(self) -> "Expr":
+        """Check if the expression value is null."""
+        return UnaryExpr(Operation.IS_NULL, self)
+
+    def is_not_null(self) -> "Expr":
+        """Check if the expression value is not null."""
+        return UnaryExpr(Operation.IS_NOT_NULL, self)
+
+    def is_in(self, values: Union[List[Any], "Expr"]) -> "Expr":
+        """Check if the expression value is in a list of values."""
+        if not isinstance(values, Expr):
+            values = LiteralExpr(values)
+        return self._bin(values, Operation.IN)
+
+    def not_in(self, values: Union[List[Any], "Expr"]) -> "Expr":
+        """Check if the expression value is not in a list of values."""
+        if not isinstance(values, Expr):
+            values = LiteralExpr(values)
+        return self._bin(values, Operation.NOT_IN)
+
 
 @DeveloperAPI(stability="alpha")
 @dataclass(frozen=True, eq=False)
```
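The boolean and predicate methods build on `&`, `|` and `~` because Python does not let `and`, `or` and `not` be overloaded. One practical consequence: `&` binds more tightly than comparison operators, so comparisons must be parenthesized. The sketch below is a hypothetical toy (each `Expr` wraps a per-row function over dicts, `None` stands in for null, and `col` / `lit` are local helpers); it is not how Ray evaluates expressions.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


class Expr:
    """Toy expression that compiles to a per-row function (hypothetical)."""

    def __init__(self, fn: Callable[[Row], Any]):
        self.fn = fn

    def _bin(self, other: Any, op: Callable[[Any, Any], Any]) -> "Expr":
        rhs = other if isinstance(other, Expr) else lit(other)
        return Expr(lambda row: op(self.fn(row), rhs.fn(row)))

    def __gt__(self, other: Any) -> "Expr":
        return self._bin(other, lambda a, b: a > b)

    def __and__(self, other: Any) -> "Expr":
        return self._bin(other, lambda a, b: a and b)

    def __invert__(self) -> "Expr":
        return Expr(lambda row: not self.fn(row))

    def is_null(self) -> "Expr":
        return Expr(lambda row: self.fn(row) is None)

    def is_in(self, values: List[Any]) -> "Expr":
        return Expr(lambda row: self.fn(row) in values)


def col(name: str) -> Expr:
    return Expr(lambda row: row.get(name))


def lit(value: Any) -> Expr:
    return Expr(lambda row: value)


rows = [
    {"name": "a", "age": 25, "city": "SF"},
    {"name": "b", "age": 40, "city": None},
    {"name": "c", "age": 35, "city": "NYC"},
]

# Parentheses are required: `&` binds more tightly than `>`, so without them
# this would group as `col("age") > (30 & ~col("city").is_null())`.
pred = (col("age") > 30) & ~col("city").is_null()
in_pred = col("city").is_in(["SF", "LA"])

print([r["name"] for r in rows if pred.fn(r)])     # ['c']
print([r["name"] for r in rows if in_pred.fn(r)])  # ['a']
```

`~col("city").is_null()` needs no extra parentheses because attribute access and calls bind before unary `~`.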
```diff
@@ -257,6 +308,39 @@ def structurally_equals(self, other: Any) -> bool:
         )
 
 
+@DeveloperAPI(stability="alpha")
+@dataclass(frozen=True, eq=False)
+class UnaryExpr(Expr):
+    """Expression that represents a unary operation on a single expression.
+
+    This expression type represents an operation with one operand.
+    Common unary operations include logical NOT, IS NULL, IS NOT NULL, etc.
+
+    Args:
+        op: The operation to perform (from Operation enum)
+        operand: The operand expression
+
+    Example:
+        >>> from ray.data.expressions import col
+        >>> # Check if a column is null
+        >>> expr = col("age").is_null() # Creates UnaryExpr(IS_NULL, col("age"))
+        >>> # Logical not
+        >>> expr = ~(col("active")) # Creates UnaryExpr(NOT, col("active"))
+    """
+
+    op: Operation
+    operand: Expr
+
+    data_type: DataType = field(init=False)
+
+    def structurally_equals(self, other: Any) -> bool:
+        return (
+            isinstance(other, UnaryExpr)
+            and self.op is other.op
+            and self.operand.structurally_equals(other.operand)
+        )
+
+
 @DeveloperAPI(stability="alpha")
 @dataclass(frozen=True, eq=False)
 class UDFExpr(Expr):
```
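A hedged sketch of how `UnaryExpr.structurally_equals` behaves, using toy stand-ins for `Expr`, `ColumnExpr` and `col` (the real classes also carry a `data_type` field, omitted here). Comparing `op` with `is` works because Enum members are singletons.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any


class Operation(Enum):
    NOT = "not"
    IS_NULL = "is_null"


class Expr:
    """Toy base with the two unary builders from this PR."""

    def structurally_equals(self, other: Any) -> bool:
        raise NotImplementedError

    def __invert__(self) -> "Expr":
        return UnaryExpr(Operation.NOT, self)

    def is_null(self) -> "Expr":
        return UnaryExpr(Operation.IS_NULL, self)


@dataclass(frozen=True, eq=False)
class ColumnExpr(Expr):
    name: str

    def structurally_equals(self, other: Any) -> bool:
        return isinstance(other, ColumnExpr) and self.name == other.name


@dataclass(frozen=True, eq=False)
class UnaryExpr(Expr):
    op: Operation
    operand: Expr

    def structurally_equals(self, other: Any) -> bool:
        # Same node type, same enum member (identity), same operand tree.
        return (
            isinstance(other, UnaryExpr)
            and self.op is other.op
            and self.operand.structurally_equals(other.operand)
        )


def col(name: str) -> ColumnExpr:
    return ColumnExpr(name)


a = ~col("active")
b = ~col("active")
c = col("active").is_null()
```

`a` and `b` are distinct objects with the same shape, so they are structurally equal; `c` wraps the same column with a different op and is not.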
```diff
@@ -517,6 +601,7 @@ def download(uri_column_name: str) -> DownloadExpr:
     "ColumnExpr",
     "LiteralExpr",
     "BinaryExpr",
+    "UnaryExpr",
     "UDFExpr",
     "udf",
     "DownloadExpr",
```
Review thread:

- Reviewer: What do you mean by storing ops in a shared state?
- Reply: In reference to this comment: #56313 (comment). See https://en.wikipedia.org/wiki/Visitor_pattern
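The reply points to the visitor pattern as an alternative to keeping operation logic in shared state. A minimal sketch of that idea, assuming toy node classes that borrow names from this diff (they are not Ray's classes) and an `ast.NodeVisitor`-style dispatch on the node's class name. Each traversal lives in its own visitor class with its own state:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Dict


class Operation(Enum):
    GT = "gt"
    AND = "and"
    NOT = "not"
    IS_NULL = "is_null"


@dataclass(frozen=True, eq=False)
class ColumnExpr:
    name: str


@dataclass(frozen=True, eq=False)
class LiteralExpr:
    value: Any


@dataclass(frozen=True, eq=False)
class BinaryExpr:
    op: Operation
    left: Any
    right: Any


@dataclass(frozen=True, eq=False)
class UnaryExpr:
    op: Operation
    operand: Any


class ExprVisitor:
    """Dispatch to visit_<ClassName>, in the style of ast.NodeVisitor."""

    def visit(self, node: Any) -> Any:
        return getattr(self, "visit_" + type(node).__name__)(node)


class ToString(ExprVisitor):
    SYMBOLS = {Operation.GT: ">", Operation.AND: "AND"}

    def visit_ColumnExpr(self, node):
        return node.name

    def visit_LiteralExpr(self, node):
        return repr(node.value)

    def visit_BinaryExpr(self, node):
        sym = self.SYMBOLS[node.op]
        return f"({self.visit(node.left)} {sym} {self.visit(node.right)})"

    def visit_UnaryExpr(self, node):
        inner = self.visit(node.operand)
        if node.op is Operation.NOT:
            return f"(NOT {inner})"
        return f"({inner} IS NULL)"  # Operation.IS_NULL


class RowEvaluator(ExprVisitor):
    def __init__(self, row: Dict[str, Any]):
        self.row = row  # state owned by this visitor, not stored on the nodes

    def visit_ColumnExpr(self, node):
        return self.row.get(node.name)

    def visit_LiteralExpr(self, node):
        return node.value

    def visit_BinaryExpr(self, node):
        left, right = self.visit(node.left), self.visit(node.right)
        if node.op is Operation.GT:
            return left > right
        return left and right  # Operation.AND

    def visit_UnaryExpr(self, node):
        value = self.visit(node.operand)
        if node.op is Operation.NOT:
            return not value
        return value is None  # Operation.IS_NULL


# (age > 30) AND NOT (city IS NULL)
expr = BinaryExpr(
    Operation.AND,
    BinaryExpr(Operation.GT, ColumnExpr("age"), LiteralExpr(30)),
    UnaryExpr(Operation.NOT, UnaryExpr(Operation.IS_NULL, ColumnExpr("city"))),
)
```

Adding another backend (for example, translating to a different compute engine) would mean writing one more visitor class; the node classes stay unchanged.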