-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Docs: Extend PruningPredicate
with background and implementation info
#9184
Conversation
/// | ||
/// If, for some other container, we knew `y` was between the values `4` and | ||
/// `15`, then the rewritten predicate evaluates to `true` (verifying this is | ||
/// left as an exercise to the reader -- are you still here?), and the container |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤣
/// | ||
/// * Rows that evaluate to `true` are returned in the query results | ||
/// * Rows that evaluate to `false` are not returned (“filtered out” or “pruned” or “skipped”). | ||
/// * Rows that evaluate to `NULL` are **NOT** returned (also “filtered out”) – *this property appears many times in the discussion below* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rows that evaluate to
NULL
are NOT returned (also “filtered out”)
Seems like this is the opposite of what discussed below.
For example, the quote below says -- "containers that evaluate to NULL
are returned (also 'kept')"
/// *
NULL
: there MAY be rows that pass the predicate, KEEPS the container
/// Note that rewritten predicate can evaluate to NULL when some of
/// the min/max values are not known.
Did I misunderstand something here? 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a very good observation.
Yes the NULL semantics in filter expressions is the opposite of what the rewritten pruning predicate does with NULL (which is quite confusing). I added a note trying to clarify this: 74261e8
Co-authored-by: Chunchun Ye <14298407+appletreeisyellow@users.noreply.github.com>
…w-datafusion into alamb/pruning_predicate_essay
PruningPredicate
with background and implementation info
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense to me, thanks for the great explanation. I'm not familiar with the PruningPredicate code much, but from the documentation alone here it looks good (reviewing from phone so it's a little hard to check out the code 😅)
Co-authored-by: Jeffrey Vo <jeffrey.vo.australia@gmail.com>
Draft as it builds on #9183Which issue does this PR close?
related to #9171
Rationale for this change
We use the
PruningPredicate
logic extensively in InfluxDB and we have found it is not an easy concept to understand. I have struggled to explain it face to face concisely so I wrote it down with the help of @appletreeisyellowI would like to add this content to help both our future engineers as well as the broader community understand how this code works (perhaps so they can extend it)
I also harbor an ulterior motive that anyone who works on / with DataFusion can get an accessable master class in practical database implementation techniques by reading our docs
What changes are included in this PR?
Add background and algorithm description to
PruningPredicate
docsAre these changes tested?
By doc test
Are there any user-facing changes?
More docs. No code changes