-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pre-load lazy vectors that are referenced by multiple sub expressions #2089
Conversation
cd8ba50
to
5781e32
Compare
Fix #2073 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@barsondei Thank you for reporting a problem and proposing a fix. I don't quite understand the fix and it appears in a surprising place. I would expect that a proper fix would reconcile existing workarounds in FilterProject::project and Expr::eval and, perhaps, move the logic of loading lazy vectors into ExprSet::eval.
Would you update the PR description to explain the current design, the problems with it and the new design?
Thanks @barsondei for your contribution and thanks @mbasmanova for your valuable review comments. |
@mbasmanova Thank you for your comments.
To deal with multiple reference, current design adopt a relatively simple scheme.
So, there is much code to maintain |
@barsondei Thank you for additional clarifications. I took a closer look into this problem and shared my findings in #2073 (comment)
This is indeed the case, but only for non-null-propagating expressions. array_constructor and row_constructor expressions do not propagate nulls, i.g. a null in one of the inputs doesn't produce a null result. For null-propagating expressions where null in any input produces null result it is sufficient to evaluate inputs for a subset of rows where none of the inputs are null. |
5781e32
to
db090d0
Compare
Thanks @barsondei for posting an update. I will get back to you with a review by end of day today (5 pm PST) |
As discussed in #2073 with @mbasmanova . Also waiting for your comments. @bikramSingh91 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@barsondei Thank you for iterating on this PR. The implementation has changed quite a bit, so PR title is no longer accurate. Would you PR title and also provide a description to explain the original problem and the fix implemented here?
Some additional comments below.
This PR is aimed to fix the bug we reported in 2073.
Current implementation:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@barsondei Thank you for working on this fix. Overall looks great % a couple of small comments and PR title and description need updating.
When code is merged, PR title and description together form a commit message. Hence, it is important that these are written clearly and explain the changes in detail. Here are some guidelines about how to write commit messages:
@mbasmanova I have updated the PR's description. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@barsondei Barson, thank you editing the PR title and writing up a description. It is clear now what the problem is and what solution is implemented here. I editing the title and description to fix typos and wrap at 80 characters.
The code changes look good to me, but, please, address any remaining comments from @bikramSingh91 and @Yuhta .
Thank you for the contribution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, just one small nit
@bikramSingh91 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
This pull request has been reverted by 5e2c7e3. |
1) remove inessential comments 2) rename findMultiRefFields to updateAllAndMultiplyReferencedFields 3) materializing fields only for non-null propagating 4) add a test that exercises the changes at ExprSet::eval() 5) rename moreFields to fieldsToAdd
1) use distinctFields_ instead of allFields 2) rename variables, drop temp variables in ExprTest::lazyVectorAccessTwiceWithDifferentRows
1) revert inessential comments 2) combine updateMultiplyReferencedFields and mergeFields functions 3) fix typo 4) use std::unordered_set instead of std::set
e5d04d8
to
1b217e2
Compare
✅ Deploy Preview for meta-velox canceled.
|
rebased and resolved conflicts |
Patch re-applied in #2501 |
What changes were proposed in this pull request? (Please fill in changes proposed in this fix) (Fixes: facebookincubator#2088) How was this patch tested? (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests) Manually. (If this patch involves UI changes, please attach a screenshot; otherwise, remove this)
Multiply-referenced LazyVector may load for different sets of rows due to
conditional expression with short circuit semantics. e.g.
f(a) AND g(b), h(b)
.If b is a LazyVector and
f(a) AND g(b)
expression is evaluated first,it will load b only for rows where
f(a)
is true. However,h(b)
projectionneeds all rows for "b". Previous solution is to set final selection to false
in
FilterProject::project
method. LazyVector always loads for all rows withfinal selection flag is false.
However, the problem also occurs within a single expression that contains
multiple expressions under a non-null-propagating expression. e.g.
array_constructor(c0 + c1, if(c1 >= 0, c1, 0))
. array_constuctor is thenon-null-propagating expression with two sub-expressions: c0 + c1 and if
(c1 >= 0, c1, 0). First sub-expression is evaluated for a subset of rows when
c0 is not null, but second sub-expression needs to be evaluated for all rows.
Current fix: 1) In Expr::computeMetadata(), identify fields multiply referenced
by input expressions. Load multiply referenced fields at the start of
evaluation in Expr::eval(). 2) Similar to (1), identify multiply referenced
fields references by expressions in ExprSet. Load multiply referenced fields in
ExprSet::eval(). 3) Remove the existing workaround, setting isFinalSelection to
false, from FilterProject::project().