[SPARK-16764][SQL] Recommend disabling vectorized parquet reader on OutOfMemoryError #14387

sameeragarwal · 2016-07-28T06:20:06Z

What changes were proposed in this pull request?

We currently don't bound or manage the size of the data arrays used by column vectors in the vectorized reader (they're bounded only by INT.MAX), which may lead to OOMs while reading data. As a short-term fix, this patch intercepts the OutOfMemoryError and suggests that the user disable the vectorized parquet reader.
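The interception described above could look roughly like the following. This is a minimal, self-contained sketch and not the code from this patch: the class `ReserveSketch`, the `reserve` method, and the injected `allocator` are hypothetical names used only for illustration. The one name taken from Spark itself is the configuration key `spark.sql.parquet.enableVectorizedReader`.

```java
import java.util.function.IntFunction;

public class ReserveSketch {
    // Spark SQL setting that toggles the vectorized Parquet reader.
    static final String VECTORIZED_READER_KEY = "spark.sql.parquet.enableVectorizedReader";

    // Hypothetical helper: tries to allocate a backing array and, on
    // OutOfMemoryError, rethrows with a message that points at a workaround.
    // The allocator is injected so the failure path can be simulated.
    static int[] reserve(int requiredCapacity, IntFunction<int[]> allocator) {
        try {
            return allocator.apply(requiredCapacity);
        } catch (OutOfMemoryError oom) {
            // Keep the original error as the cause so the stack trace survives.
            throw new RuntimeException(
                "Cannot reserve " + requiredCapacity + " elements in the vectorized reader. "
                    + "As a workaround, you can disable the vectorized reader by setting "
                    + VECTORIZED_READER_KEY + " to false.",
                oom);
        }
    }

    public static void main(String[] args) {
        // Normal path: the allocation succeeds.
        int[] data = reserve(4, int[]::new);
        System.out.println("reserved " + data.length + " elements");

        // Failure path: a simulated OutOfMemoryError is turned into a hint.
        try {
            reserve(8, n -> { throw new OutOfMemoryError("simulated"); });
        } catch (RuntimeException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Wrapping the error rather than swallowing it keeps the task failing as before; the only change in behavior is that the message now tells the user which setting to turn off.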

How was this patch tested?

Existing Tests

rxin · 2016-07-28T06:28:23Z

LGTM pending Jenkins.

SparkQA · 2016-07-28T08:13:59Z

Test build #62955 has finished for PR 14387 at commit b62c1d2.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

rxin · 2016-07-28T20:03:49Z

Merging in master/2.0.

…utOfMemoryError

## What changes were proposed in this pull request?

We currently don't bound or manage the size of the data arrays used by column vectors in the vectorized reader (they're bounded only by INT.MAX), which may lead to OOMs while reading data. As a short-term fix, this patch intercepts the OutOfMemoryError and suggests that the user disable the vectorized parquet reader.

## How was this patch tested?

Existing Tests

Author: Sameer Agarwal <sameerag@cs.berkeley.edu>

Closes #14387 from sameeragarwal/oom.

(cherry picked from commit 3fd39b8)
Signed-off-by: Reynold Xin <rxin@databricks.com>

Recommend disabling vectorized parquet reader on OutOfMemoryError

b62c1d2

asfgit closed this in 3fd39b8 Jul 28, 2016
