
Conversation

@viirya
Member

@viirya viirya commented Feb 28, 2016

JIRA: https://issues.apache.org/jira/browse/SPARK-13537

What changes were proposed in this pull request?

In readBytes of VectorizedPlainValuesReader, we use buffer[offset] to access bytes in buffer. This is incorrect because offset has Platform.BYTE_ARRAY_OFFSET added to it at initialization, so it is an absolute address rather than an array index. We should fix it.
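For context, here is a minimal, self-contained sketch of the bug and the fix. The class, field, and method names are paraphrased for illustration and are not copied from the patch; only Platform.getByte and Platform.BYTE_ARRAY_OFFSET are real Spark APIs.

```java
import org.apache.spark.unsafe.Platform;

// Illustrative sketch only: 'offset' already includes Platform.BYTE_ARRAY_OFFSET,
// so it is an absolute unsafe address and must not be used as an array index.
class ReadBytesSketch {
  private byte[] buffer;
  private int offset;

  void initFromPage(byte[] page, int start) {
    this.buffer = page;
    // After this, 'offset' is an absolute address, not an index into 'buffer'.
    this.offset = Platform.BYTE_ARRAY_OFFSET + start;
  }

  byte[] readBytes(int total) {
    byte[] out = new byte[total];
    for (int i = 0; i < total; i++) {
      // Buggy: buffer[offset] indexes past buffer.length because offset carries
      // the BYTE_ARRAY_OFFSET base -> ArrayIndexOutOfBoundsException.
      // out[i] = buffer[offset];

      // Fixed: dereference the absolute address instead.
      out[i] = Platform.getByte(buffer, offset);
      offset += 4;  // plain-encoded byte values are stored as 4-byte ints, hence the stride
    }
    return out;
  }
}
```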

How was this patch tested?

ParquetHadoopFsRelationSuite sometimes fails (depending on the randomly generated data) because of this bug; see the Jenkins failure at https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52136/consoleFull. After applying this patch, the test passes.

I added a test to ParquetHadoopFsRelationSuite with data that fails without this patch.

The error stack trace:

```
[info] ParquetHadoopFsRelationSuite:
[info] - test all data types - StringType (440 milliseconds)
[info] - test all data types - BinaryType (434 milliseconds)
[info] - test all data types - BooleanType (406 milliseconds)
20:59:38.618 ERROR org.apache.spark.executor.Executor: Exception in task 0.0 in stage 2597.0 (TID 67966)
java.lang.ArrayIndexOutOfBoundsException: 46
	at org.apache.spark.sql.execution.datasources.parquet.VectorizedPlainValuesReader.readBytes(VectorizedPlainValuesReader.java:88)
```
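The large index in the trace is consistent with this: on a typical HotSpot JVM, Platform.BYTE_ARRAY_OFFSET is 16, so an absolute offset used as an array index overshoots a small buffer. A hypothetical illustration (the buffer size and offset are made-up example values, not taken from the failing test):

```java
import org.apache.spark.unsafe.Platform;

public class OffsetDemo {
  public static void main(String[] args) {
    byte[] buffer = new byte[32];                          // a small decoded page
    int offset = Platform.BYTE_ARRAY_OFFSET + 30;          // e.g. 16 + 30 = 46 on HotSpot
    System.out.println(Platform.getByte(buffer, offset));  // OK: reads logical element 30
    System.out.println(buffer[offset]);                    // ArrayIndexOutOfBoundsException: 46
  }
}
```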

@viirya
Member Author

viirya commented Feb 28, 2016

cc @nongli @rxin

@SparkQA

SparkQA commented Feb 28, 2016

Test build #52142 has finished for PR 11418 at commit 44f5c41.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 28, 2016

Test build #52143 has finished for PR 11418 at commit 1b09304.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@nongli
Contributor

nongli commented Feb 29, 2016

LGTM

Thanks for fixing this. Just out of curiosity, how did you find this initially?

@viirya
Member Author

viirya commented Feb 29, 2016

I saw the failure in the #11415 Jenkins test report. Then I reran the test locally to find the problematic data and debugged with it.

@rxin
Contributor

rxin commented Feb 29, 2016

Thanks - I've merged this into master.

@asfgit asfgit closed this in 6dfc4a7 Feb 29, 2016
roygao94 pushed a commit to roygao94/spark that referenced this pull request Mar 22, 2016
Author: Liang-Chi Hsieh <viirya@gmail.com>

Closes apache#11418 from viirya/fix-readbytes.
@viirya viirya deleted the fix-readbytes branch December 27, 2023 18:33