PARQUET-389: Support predicate push down on missing columns. #354

rdblue · 2016-07-13T20:45:11Z

Predicate push-down currently complains when predicates reference columns that aren't in a file's schema. This makes it difficult to implement predicate push-down in engines where schemas evolve, because each task needs to process the predicates and prune references to columns that are not in that task's file. This PR implements predicate evaluation for missing columns by treating their values as all null. This allows engines to pass predicates as they are written.
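To make the "values are all null" rule concrete, here is a minimal, self-contained sketch of how a row-group filter could decide whether to drop a block when the predicate's column is absent. The class `MissingColumnSketch`, the `Op` enum, and `canDrop` are hypothetical names for illustration and are not the parquet-mr API. The sketch also assumes that ordering comparisons never match a null value.

```java
import java.util.Set;

// Hypothetical model for illustration; not the parquet-mr API.
final class MissingColumnSketch {
  enum Op { EQ, NOT_EQ, LT, GT }

  /**
   * Returns true when a block (row group) can be skipped because no row
   * in it can satisfy "column op value".
   */
  static boolean canDrop(Set<String> fileColumns, String column, Op op, Object value) {
    if (fileColumns.contains(column)) {
      // Column is present: a real filter would consult statistics here.
      // This sketch is conservative and never drops in that case.
      return false;
    }
    // Column is missing from the file, so every value is treated as null.
    switch (op) {
      case EQ:
        return value != null;  // null = non-null never matches; null = null always does
      case NOT_EQ:
        return value == null;  // null != null never matches; null != non-null always does
      default:
        return true;           // assumed: <, > never match a null value
    }
  }
}
```

With this rule, an engine can hand the same predicate to every task regardless of which schema version a given file was written with.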

A future commit should rewrite the predicates to avoid the extra work currently done in record-level filtering, but that isn't included here because it is an optimization.
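One possible shape for such a rewrite, sketched under assumptions: replace each leaf that references a missing column with a constant, then fold the constants through `and`/`or` so that record-level filtering never evaluates them. Everything below (`RewriteSketch`, `Pred`, `EqLeaf`, `rewrite`) is a hypothetical model, not code from this PR or from parquet-mr, and it only handles equality leaves.

```java
import java.util.Set;

// Hypothetical model for illustration; none of these types are parquet-mr classes.
final class RewriteSketch {
  interface Pred {}

  static final class Const implements Pred {
    final boolean value;
    Const(boolean value) { this.value = value; }
  }

  static final class EqLeaf implements Pred {
    final String column;
    final Object value;
    EqLeaf(String column, Object value) { this.column = column; this.value = value; }
  }

  static final class And implements Pred {
    final Pred left, right;
    And(Pred left, Pred right) { this.left = left; this.right = right; }
  }

  static final class Or implements Pred {
    final Pred left, right;
    Or(Pred left, Pred right) { this.left = left; this.right = right; }
  }

  // Shared constants; only these two instances are folded below.
  static final Pred TRUE = new Const(true);
  static final Pred FALSE = new Const(false);

  /** Replaces leaves on missing columns with constants and simplifies the tree. */
  static Pred rewrite(Pred p, Set<String> fileColumns) {
    if (p instanceof EqLeaf) {
      EqLeaf leaf = (EqLeaf) p;
      if (fileColumns.contains(leaf.column)) {
        return p; // column exists: keep the leaf unchanged
      }
      // Missing column is all null: "col = null" always holds, "col = v" never does.
      return leaf.value == null ? TRUE : FALSE;
    }
    if (p instanceof And) {
      Pred l = rewrite(((And) p).left, fileColumns);
      Pred r = rewrite(((And) p).right, fileColumns);
      if (l == FALSE || r == FALSE) return FALSE;
      if (l == TRUE) return r;
      if (r == TRUE) return l;
      return new And(l, r);
    }
    if (p instanceof Or) {
      Pred l = rewrite(((Or) p).left, fileColumns);
      Pred r = rewrite(((Or) p).right, fileColumns);
      if (l == TRUE || r == TRUE) return TRUE;
      if (l == FALSE) return r;
      if (r == FALSE) return l;
      return new Or(l, r);
    }
    return p; // constants pass through
  }
}
```

In this sketch, `id = 1 AND name = 'x'` against a file that only has `id` collapses to `FALSE`, while `id = 1 OR name = 'x'` collapses to the `id = 1` leaf.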

rdblue · 2016-07-13T20:46:13Z

@danielcweeks, @liancheng, this is to avoid the predicate push-down problems in Spark. Can you review? Thanks!

liancheng · 2016-07-14T09:13:54Z

+1

danielcweeks · 2016-07-14T23:19:45Z

+1 as well. Thanks!

liancheng · 2016-07-19T06:36:58Z

Is it possible to have a 1.8.2 release that includes this fix? I just checked, and there doesn't seem to be a dedicated branch for 1.8.x.

viirya · 2016-12-06T03:52:22Z

I'm curious: why doesn't this patch do the same thing for the visit method for UserDefinedPredicate in StatisticsFilter?

Is it intentional?

viirya · 2016-12-06T04:29:24Z

@liancheng @rdblue @danielcweeks I submitted #389 to extend this fix to UserDefinedPredicate. I'm not sure whether that is appropriate, so please let me know if it is not. Thank you.
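For a user-defined predicate, the same all-null reasoning can be sketched as follows: when the column is missing, ask the predicate whether it keeps a null value, and drop the block if it does not (or, for an inverted predicate, if it does). This is a minimal sketch that models the predicate as a plain `java.util.function.Predicate`; `UdpMissingColumnSketch` and its `canDrop` method are hypothetical names, not the parquet-mr classes.

```java
import java.util.function.Predicate;

// Hypothetical model for illustration; not the parquet-mr UserDefinedPredicate API.
final class UdpMissingColumnSketch {
  /**
   * Decides whether a block can be dropped when the predicate's column is
   * missing from the file, so that every value is treated as null.
   *
   * @param keep     the user-defined check; returns true if a value should be kept
   * @param inverted true when the predicate is wrapped in a logical NOT
   */
  static <T> boolean canDrop(Predicate<T> keep, boolean inverted) {
    boolean keepsNull = keep.test(null);
    // Plain predicate: drop when null is rejected.
    // Inverted predicate: drop when null is accepted.
    return inverted ? keepsNull : !keepsNull;
  }
}
```

A predicate such as `v -> v != null` rejects null, so a block with the column missing can be dropped. The user-defined function must tolerate a null argument for this to work.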

This extends the fix in #354 to UserDefinedPredicate.

Author: Liang-Chi Hsieh <viirya@gmail.com>

Closes #389 from viirya/PARQUET-791 and squashes the following commits:

d6be37d [Liang-Chi Hsieh] Address comment.
7e929c3 [Liang-Chi Hsieh] PARQUET-791: Add missing column support for UserDefinedPredicate.

Predicate push-down will complain when predicates reference columns that aren't in a file's schema. This makes it difficult to implement predicate push-down in engines where schemas evolve because each task needs to process the predicates and prune references to columns not in that task's file. This PR implements predicate evaluation for missing columns, where the values are all null. This allows engines to pass predicates as they are written.

A future commit should rewrite the predicates to avoid the extra work currently done in record-level filtering, but that isn't included here because it is an optimization.

Author: Ryan Blue <blue@apache.org>

Closes apache#354 from rdblue/PARQUET-389-predicate-push-down-on-missing-columns and squashes the following commits:

b4d809a [Ryan Blue] PARQUET-389: Support record-level filtering with missing columns.
91b841c [Ryan Blue] PARQUET-389: Add missing column support to StatisticsFilter.
275f950 [Ryan Blue] PARQUET-389: Add missing column support to DictionaryFilter.

rdblue added 3 commits July 12, 2016 14:23

PARQUET-389: Add missing column support to DictionaryFilter.

275f950

PARQUET-389: Add missing column support to StatisticsFilter.

91b841c

PARQUET-389: Support record-level filtering with missing columns.

b4d809a

asfgit closed this in 42662f8 Jul 15, 2016

viirya mentioned this pull request Dec 6, 2016

PARQUET-791: Add missing column support for UserDefinedPredicate #389

Closed

gatorsmile mentioned this pull request May 15, 2017

[SPARK-20364][SQL] Support Parquet predicate pushdown on columns with dots apache/spark#17680

Closed

asfimport mentioned this pull request Apr 21, 2018

Filter predicates should work with missing columns #1900

Closed
