[SPARK-11153] [SQL] Disables Parquet filter push-down for string and binary columns #9152
Due to PARQUET-251, `BINARY` columns in existing Parquet files may be written with corrupted statistics information. This information is used by filter push-down optimization. Since Spark 1.5 turns on Parquet filter push-down by default, we may end up with wrong query results. PARQUET-251 has been fixed in parquet-mr 1.8.1, but Spark 1.5 is still using 1.7.0.

This affects all Spark SQL data types that can be mapped to Parquet `BINARY`, namely:

- `StringType`
- `BinaryType`
- `DecimalType` (but Spark SQL doesn't support pushing down filters involving `DecimalType` columns for now)

To avoid wrong query results, we should disable filter push-down for columns of `StringType` and `BinaryType` until we upgrade to parquet-mr 1.8.
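For illustration, a minimal sketch of how such a guard could look, independent of the actual change in `ParquetFilters`. The object and method names (`ParquetPushDownGuard`, `canPushDown`, `isBinaryBacked`) are hypothetical and not part of Spark's API; the sketch only shows the idea of rejecting push-down for filters that touch `BINARY`-backed columns:

```scala
import org.apache.spark.sql.sources._
import org.apache.spark.sql.types._

// Hypothetical helper: decide per-filter whether push-down is safe,
// given the Spark SQL schema of the Parquet file being scanned.
object ParquetPushDownGuard {

  // Spark SQL types stored as Parquet BINARY, i.e. the ones whose
  // statistics may be corrupted by PARQUET-251.
  private def isBinaryBacked(dt: DataType): Boolean = dt match {
    case StringType | BinaryType => true
    case _                       => false
  }

  // A filter may be pushed down only if none of the columns it
  // references are BINARY-backed.
  def canPushDown(filter: Filter, schema: StructType): Boolean = filter match {
    case EqualTo(attr, _)     => !isBinaryBacked(schema(attr).dataType)
    case LessThan(attr, _)    => !isBinaryBacked(schema(attr).dataType)
    case GreaterThan(attr, _) => !isBinaryBacked(schema(attr).dataType)
    case And(left, right)     => canPushDown(left, schema) && canPushDown(right, schema)
    case Or(left, right)      => canPushDown(left, schema) && canPushDown(right, schema)
    case Not(child)           => canPushDown(child, schema)
    case _                    => false // be conservative for anything else
  }
}
```

With a guard like this, a predicate such as `EqualTo("name", "foo")` against a `StringType` column would simply be evaluated by Spark after the scan rather than handed to the Parquet reader, trading some I/O for correct results until parquet-mr 1.8.x is in place.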