Use a balanced tree instead of unbalanced one to prevent recursion error in create_match_filter #1830
@@ -257,7 +298,7 @@ class Or(BooleanExpression):

    def __new__(cls, left: BooleanExpression, right: BooleanExpression, *rest: BooleanExpression) -> BooleanExpression:  # type: ignore
        if rest:
            return reduce(Or, (left, right, *rest))
I think we also want to apply this in the `And` situation :)
I think this is pretty elegant, thoughts @kevinjqliu? I expressed some concerns in #1783 around performance. I have played around with the
Moving this forward since I think it is a great idea in general :)
Thanks @koenvo
**Use a balanced tree instead of an unbalanced one to prevent recursion error in `create_match_filter`**

## Rationale for this change

In the `create_match_filter` function, the previous implementation used `functools.reduce(operator.or_, filters)` to combine expressions. Because `reduce` folds from the left, this approach constructed a left-deep, unbalanced tree whose depth grows linearly with the number of expressions, which could lead to a `RecursionError` when dealing with a large number of expressions (e.g., over 1,000).

To address this, we've introduced the `_build_balanced_tree` function. This utility constructs a balanced binary tree of expressions, reducing the maximum depth to O(log n) and thereby preventing potential recursion errors. This makes expression construction more stable and scalable, especially when working with large datasets.

```python
# Past behavior
Or(*[A, B, C, D]) == Or(Or(Or(A, B), C), D)

# New behavior
Or(*[A, B, C, D]) == Or(Or(A, B), Or(C, D))
```

## Are these changes tested?

Yes, existing tests cover the functionality of `Or`. Additional testing was done with large expression sets (e.g., 10,000 items) to ensure that balanced tree construction avoids recursion errors.

## Are there any user-facing changes?

No, there are no user-facing changes. This is an internal implementation improvement that does not affect the public API.

Closes #1759
Closes #1785
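To make the depth difference concrete, here is a minimal, self-contained sketch of the idea. It is not the PR's actual `_build_balanced_tree`: the names `build_balanced_tree`, `Leaf`, and `depth` are hypothetical, and the `Or` class below is a plain stand-in for the library's expression class, assumed only to take a left and a right operand.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Leaf:
    """Hypothetical stand-in for a single predicate."""
    name: str


@dataclass(frozen=True)
class Or:
    """Hypothetical stand-in for a binary OR expression node."""
    left: object
    right: object


def build_balanced_tree(combine: Callable[[T, T], T], items: Sequence[T]) -> T:
    """Combine items pairwise by splitting the sequence in half at each level."""
    if not items:
        raise ValueError("at least one item is required")
    if len(items) == 1:
        return items[0]
    mid = len(items) // 2
    # Recursion depth here is O(log n) because each call halves the input.
    return combine(
        build_balanced_tree(combine, items[:mid]),
        build_balanced_tree(combine, items[mid:]),
    )


def depth(node: object) -> int:
    """Count nodes on the longest root-to-leaf path, iteratively."""
    deepest = 0
    stack = [(node, 1)]
    while stack:
        current, level = stack.pop()
        deepest = max(deepest, level)
        if isinstance(current, Or):
            stack.append((current.left, level + 1))
            stack.append((current.right, level + 1))
    return deepest


leaves = [Leaf(str(i)) for i in range(10_000)]
left_deep = reduce(Or, leaves)                # one nesting level per item
balanced = build_balanced_tree(Or, leaves)    # roughly log2(n) levels
print(depth(left_deep), depth(balanced))
```

With 10,000 leaves, the left fold nests one level per item, while the balanced build stays at around log2(n) levels. Any recursive visitor that walks the balanced tree therefore stays well below Python's default recursion limit. The `depth` helper is deliberately iterative so that it can measure the deep tree without hitting that limit itself.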