-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Add predicate pushdown using filter2 api #296
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@isnotinvain, you might want to have a look at this one. I'll take a look next week as well. Is this something you'd like to get into 1.9.0? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the rationale behind not using predicate pushdown by default? Not breaking existing jobs until it is well tested?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but if other who are more familiar with pig pushdown or filter api think it's worth turning on by default, I'm OK with that as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it makes sense to have it on by default once we're confident.
Possibly once it has run in production and works well?
|
@danielcweeks, I found a few things in the filter generation code that looks odd to me so I updated it in my local copy. You might want to have a look at rdblue@a39fdff, which fixes most of the issues I pointed out. It doesn't support BOOLEAN yet, but that wouldn't be hard to add. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rdblue (your comment here got stomped with the updates)
This was a mistake on my part and should have been new ColumnWrapper(booleanColumn(f.alias));
In a sense this introduces a "bug", but I'm not sure you can even represent that in pig latin, so it's more to improve readability and let pig sort out the conflict. Conceptually, I don't like it, but it makes it pretty clean.
|
@rdblue I'll take a look to see how boolean would fit into your branch and we can take a look at the two approaches. |
| } | ||
|
|
||
| private <C extends Comparable<C>> C getValue(Expression expr, Class<C> type) { | ||
| Comparable value = (Comparable) ((Const) expr).getValue(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
type.cast(((Const) expr).getValue())?
|
This looks good to me. |
|
I think the only thing that still needed to be done was to throw exceptions rather than returning null when a filter can't be constructed. There was discussion about this above, but it was on an outdated diff. I made those changes and pushed them to https://github.com/rdblue/parquet-mr/tree/PARQUET-397-pig-predicate-pushdown for review. I also implemented not, !=, in, and between operations because it seems likely that those will be used. In and between build expressions based on comparison operators, or, and and. |
This is based on apache#296 from @danielcweeks and implements a few remaining review items. Closes apache#296. Author: Daniel Weeks <dweeks@netflix.com> Author: Ryan Blue <blue@apache.org> Closes apache#331 from rdblue/PARQUET-397-pig-predicate-pushdown and squashes the following commits: c7a9b02 [Ryan Blue] PARQUET-397: Address review comments. 54e23a6 [Ryan Blue] PARQUET-397: Update Pig PPD to throw for bad expressions. 388099b [Daniel Weeks] Cleaning up imports 6b405b4 [Daniel Weeks] Merge remote-tracking branch 'rdblue/pig-predicate-pushdown' into pig-predicate-pushdown f1ef73e [Daniel Weeks] Fixed binary type and storing filter predicate a39fdff [Ryan Blue] WIP: Handle a few error cases in Pig predicate pushdown. 2666849 [Daniel Weeks] Fixed test to check the actual number of materialized rows from the reader 7b019a6 [Daniel Weeks] update tests and logging f8ca447 [Daniel Weeks] Add predicate pushdown using filter2 api
This is based on apache#296 from @danielcweeks and implements a few remaining review items. Closes apache#296. Author: Daniel Weeks <dweeks@netflix.com> Author: Ryan Blue <blue@apache.org> Closes apache#331 from rdblue/PARQUET-397-pig-predicate-pushdown and squashes the following commits: c7a9b02 [Ryan Blue] PARQUET-397: Address review comments. 54e23a6 [Ryan Blue] PARQUET-397: Update Pig PPD to throw for bad expressions. 388099b [Daniel Weeks] Cleaning up imports 6b405b4 [Daniel Weeks] Merge remote-tracking branch 'rdblue/pig-predicate-pushdown' into pig-predicate-pushdown f1ef73e [Daniel Weeks] Fixed binary type and storing filter predicate a39fdff [Ryan Blue] WIP: Handle a few error cases in Pig predicate pushdown. 2666849 [Daniel Weeks] Fixed test to check the actual number of materialized rows from the reader 7b019a6 [Daniel Weeks] update tests and logging f8ca447 [Daniel Weeks] Add predicate pushdown using filter2 api Conflicts: pom.xml Resolution: Fixed up adjacent changes.
No description provided.