@viirya
Member

@viirya viirya commented Dec 6, 2016

This extends the fix from #354 to UserDefinedPredicate.

@rdblue
Contributor

rdblue commented Dec 6, 2016

Is it valid to call a UserDefinedPredicate with null? If so, then we could run it once with null, cache the return value, and use that instead of BLOCK_MIGHT_MATCH every time.

@viirya
Member Author

viirya commented Dec 7, 2016

@rdblue Thanks for reviewing. In this case (getColumnChunk returns null), I think the overhead is tiny. I am not sure caching would actually be a benefit.

@rdblue
Contributor

rdblue commented Dec 7, 2016

I'm not really talking about the overhead of running the function. Just that the default assumption is that we don't know anything about whether the predicate will accept a block of nulls. I think there are a lot of cases where null will be filtered, so this makes the conservative but wrong decision. If we can actually call the predicate with null, we can use it to eliminate whole row groups and files.

@viirya
Member Author

viirya commented Dec 8, 2016

@rdblue Thanks. I see. That makes sense to me. We can run UserDefinedPredicate.keep with a null value to learn whether the predicate keeps or drops nulls, and use the result to eliminate whole row groups instead of always returning BLOCK_MIGHT_MATCH.
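
A minimal sketch of that idea, not the actual parquet-mr change: `SimplePredicate`, `canDropMissingColumn`, and the boolean values assigned to the two constants are hypothetical stand-ins chosen for illustration. It also assumes the predicate tolerates a null argument, which is the open question raised earlier in this thread.

```java
public class MissingColumnSketch {
    // Hypothetical constants that borrow the names used in this thread;
    // the boolean values are an assumption ("true" means the row group can be dropped).
    static final boolean BLOCK_MIGHT_MATCH = false;
    static final boolean BLOCK_CANNOT_MATCH = true;

    /** Simplified, hypothetical stand-in for a user-defined predicate. */
    interface SimplePredicate<T> {
        boolean keep(T value);
    }

    /**
     * Decides whether a row group that lacks the filtered column can be dropped.
     * A missing column reads as all nulls, so one call with null answers the
     * question for every row in the group.
     */
    static <T> boolean canDropMissingColumn(SimplePredicate<T> udp) {
        return udp.keep(null) ? BLOCK_MIGHT_MATCH : BLOCK_CANNOT_MATCH;
    }

    public static void main(String[] args) {
        // Rejects null, so a row group without the column cannot match.
        SimplePredicate<Integer> positive = v -> v != null && v > 0;
        // Accepts null, so the row group must still be read.
        SimplePredicate<Integer> nullOrSmall = v -> v == null || v < 10;

        System.out.println(canDropMissingColumn(positive));    // prints "true"
        System.out.println(canDropMissingColumn(nullOrSmall)); // prints "false"
    }
}
```

With a predicate that rejects null, the whole row group (and, if every row group lacks the column, the whole file) can be skipped; only predicates that accept null fall back to the conservative answer.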

@viirya
Member Author

viirya commented Dec 8, 2016

@rdblue I've updated this as described above. Please take a look. Thank you.

@asfgit asfgit closed this in 71cff7c Dec 8, 2016
rdblue pushed a commit to rdblue/parquet-mr that referenced this pull request Jan 6, 2017
This extends the fixing apache#354 to UserDefinedPredicate.

Author: Liang-Chi Hsieh <viirya@gmail.com>

Closes apache#389 from viirya/PARQUET-791 and squashes the following commits:

d6be37d [Liang-Chi Hsieh] Address comment.
7e929c3 [Liang-Chi Hsieh] PARQUET-791: Add missing column support for UserDefinedPredicate.
rdblue pushed a commit to rdblue/parquet-mr that referenced this pull request Jan 10, 2017