Filter predicates should work with missing columns

This issue originates from SPARK-11103, which contains detailed information about how to reproduce it.

The main problem is that pushed-down filter predicates assert that every column they touch exists in the target physical file. That assumption does not hold when schema merging is enabled.

This assertion is unnecessary: if a column is missing from the file schema, it can be treated as filled with nulls, and every filter should behave accordingly. For example, if we push down `a = 1` but `a` is missing from the underlying physical file, all records in that file should be dropped, since `a` is always null. Conversely, if we push down `a IS NULL`, all records should be preserved.
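The intended semantics can be illustrated with a minimal, library-independent sketch. It does not use the Parquet filter API; the helper names (`read_column`, `eq`, `is_null`) and the sample records are hypothetical and exist only to model "a missing column reads as null".

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]
Predicate = Callable[[Record], bool]


def read_column(record: Record, column: str) -> Any:
    """Hypothetical helper: a column absent from the file reads as null (None)."""
    return record.get(column)


def eq(column: str, value: Any) -> Predicate:
    """Model of `column = value`: a null never equals a non-null value."""
    def predicate(record: Record) -> bool:
        actual = read_column(record, column)
        return actual is not None and actual == value
    return predicate


def is_null(column: str) -> Predicate:
    """Model of `column IS NULL`."""
    return lambda record: read_column(record, column) is None


# A "file" whose schema has only column `b`; column `a` is missing entirely.
records: List[Record] = [{"b": 10}, {"b": 20}]

kept_by_eq = [r for r in records if eq("a", 1)(r)]
kept_by_is_null = [r for r in records if is_null("a")(r)]

print(kept_by_eq)       # no record satisfies `a = 1`
print(kept_by_is_null)  # every record satisfies `a IS NULL`
```

Under this model no exception is raised for the missing column: `a = 1` simply matches nothing, while `a IS NULL` matches every record in the file.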

**Reporter**: [Cheng Lian](https://issues.apache.org/jira/secure/ViewProfile.jspa?name=lian+cheng) / @liancheng
**Assignee**: [Ryan Blue](https://issues.apache.org/jira/secure/ViewProfile.jspa?name=rdblue) / @rdblue
#### Related issues:
- [Parquet filters push-down may cause exception when schema merging is turned on](https://issues.apache.org/jira/browse/SPARK-11103) (relates to)
- [Parquet predicate pushdown on columns with dots return empty results](https://issues.apache.org/jira/browse/SPARK-20364) (relates to)
- [Cannot filter by nonexisting column in parquet file](https://issues.apache.org/jira/browse/SPARK-18539) (is related to)
#### PRs and other links:
- [PR #354](https://github.com/apache/parquet-mr/pull/354)

<sub>**Note**: *This issue was originally created as [PARQUET-389](https://issues.apache.org/jira/browse/PARQUET-389). Please see the [migration documentation](https://issues.apache.org/jira/browse/PARQUET-2502) for further details.*</sub>
