[SPARK-4107] Fix incorrect handling of read() and skip() return values #2969

JoshRosen · 2014-10-28T01:45:14Z

read() may return fewer bytes than requested; when this occurred, the old code would silently return less data than requested, which might cause stream corruption errors. skip() faces similar issues, too.

This patch fixes several cases where we mis-handle these methods' return values.
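
For readers unfamiliar with the pitfall, here is a minimal sketch of reading until a buffer is full. It is illustrative only and is not the code from this patch; the class name `ChannelReads` and the method `readFully` are hypothetical names chosen for the example.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;

// Hypothetical helper for illustration; not taken from the Spark code base.
final class ChannelReads {
    private ChannelReads() {}

    /**
     * Calls read() repeatedly until dst has no space left.
     * Throws EOFException if the channel ends first, rather than
     * silently handing back a partially filled buffer.
     */
    static void readFully(ReadableByteChannel channel, ByteBuffer dst) throws IOException {
        while (dst.hasRemaining()) {
            // A single read() may transfer fewer bytes than dst.remaining().
            int n = channel.read(dst);
            if (n == -1) {
                throw new EOFException(
                    "Channel ended with " + dst.remaining() + " bytes still expected");
            }
        }
    }
}
```

The point of the loop is that a single `read()` call is only ever a request; the return value has to be checked before the buffer is treated as complete.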

SparkQA · 2014-10-28T01:47:28Z

Test build #22318 has started for PR 2969 at commit db985ed.

This patch merges cleanly.

pwendell · 2014-10-28T01:59:29Z

Jenkins, test this please.

SparkQA · 2014-10-28T02:04:44Z

Test build #22319 has started for PR 2969 at commit db985ed.

This patch merges cleanly.

SparkQA · 2014-10-28T02:58:29Z

Test build #22318 has finished for PR 2969 at commit db985ed.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2014-10-28T02:58:33Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22318/

JoshRosen · 2014-10-28T03:04:49Z

core/src/main/scala/org/apache/spark/storage/TachyonStore.scala

@haoyuan In addition to the improper use of read(), I think this method could potentially have returned Some(null) when `is == null` (which should never happen, but still...).

Can you verify that these changes are correct?

I checked the source code of every release back to 0.4.0 (the first one Spark supports), and it's true that `is` cannot be null.

SparkQA · 2014-10-28T03:12:29Z

Test build #22324 has started for PR 2969 at commit b9265d2.

This patch merges cleanly.

aarondav · 2014-10-28T03:17:53Z

core/src/main/scala/org/apache/spark/network/ManagedBuffer.scala

getName does not return the full path; we should probably use the path instead.

Good catch; I've updated this to use getAbsolutePath.
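
To illustrate the difference being discussed, here is a small sketch. The class name `ReadErrorMessages`, the method `describe`, and the message wording are hypothetical and are not the log messages changed in this patch; only `File.getName()` and `File.getAbsolutePath()` are real API calls.

```java
import java.io.File;

// Hypothetical helper for illustration; the message format is an assumption.
final class ReadErrorMessages {
    private ReadErrorMessages() {}

    /** Builds an error message that identifies the file by its absolute path. */
    static String describe(File file, long offset, long length) {
        // getName() would give only the last path component, which is
        // ambiguous when several directories hold files with the same name.
        return "Error reading " + file.getAbsolutePath()
            + " (offset " + offset + ", length " + length + ")";
    }
}
```

For a file created as `new File("blocks", "part-0.data")`, `getName()` returns just `part-0.data`, while `getAbsolutePath()` also includes the directory, resolved against the current working directory.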

AmplabJenkins · 2014-10-28T03:22:22Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22323/

JoshRosen · 2014-10-28T03:23:45Z

Jenkins, retest this please.

JoshRosen · 2014-10-28T03:26:31Z

core/src/main/scala/org/apache/spark/TestUtils.scala

Here (and in FileServerSuite), I think the bug is that the loop condition should be nRead >= 0. If nRead is less than the length of the file but greater than 0, then I think this would exit the loop without having copied the whole file.
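
As a sketch of a copy loop that tolerates partial reads (illustrative only, not the code in TestUtils or FileServerSuite; `StreamCopy` and `copy` are hypothetical names), the loop below keeps going until `read()` reports end of stream with -1 and writes only the bytes that were actually read:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper for illustration; not taken from the Spark code base.
final class StreamCopy {
    private StreamCopy() {}

    /** Copies every byte from in to out and returns the number of bytes copied. */
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[8192];
        long total = 0;
        int nRead;
        // read() returns -1 only at end of stream; any non-negative value
        // means "this many bytes arrived, ask again for more".
        while ((nRead = in.read(buf, 0, buf.length)) != -1) {
            out.write(buf, 0, nRead); // write only what was actually read
            total += nRead;
        }
        return total;
    }
}
```

Guava's `ByteStreams.copy` provides the same behavior for projects that already depend on Guava.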

SparkQA · 2014-10-28T03:29:52Z

Test build #22327 has started for PR 2969 at commit cbc03ce.

This patch merges cleanly.

AmplabJenkins · 2014-10-28T03:37:22Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22325/

JoshRosen · 2014-10-28T03:50:05Z

Found more potential problems: we also appear to ignore the return value of skip() in a few places.

SparkQA · 2014-10-28T03:59:57Z

Test build #22330 has started for PR 2969 at commit e724a9f.

This patch merges cleanly.

SparkQA · 2014-10-28T04:04:44Z

Test build #22319 timed out for PR 2969 at commit db985ed after a configured wait of 120m.

AmplabJenkins · 2014-10-28T04:04:47Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22319/

SparkQA · 2014-10-28T04:22:22Z

Test build #22324 has finished for PR 2969 at commit b9265d2.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2014-10-28T04:22:25Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22324/

SparkQA · 2014-10-28T04:40:03Z

Test build #22327 has finished for PR 2969 at commit cbc03ce.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2014-10-28T04:40:07Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22327/

SparkQA · 2014-10-28T05:08:13Z

Test build #22330 has finished for PR 2969 at commit e724a9f.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2014-10-28T05:08:16Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22330/

rxin · 2014-10-28T05:49:43Z

core/src/main/scala/org/apache/spark/network/ManagedBuffer.scala

what is the problem with the old code here?

http://docs.oracle.com/javase/7/docs/api/java/io/FileInputStream.html#skip(long):

Skips over and discards n bytes of data from the input stream.

The skip method may, for a variety of reasons, end up skipping over some smaller number of bytes, possibly 0. If n is negative, an IOException is thrown, even though the skip method of the InputStream superclass does nothing in this case. The actual number of bytes skipped is returned.

This method may skip more bytes than are remaining in the backing file. This produces no exception and the number of bytes skipped may include some number of bytes that were beyond the EOF of the backing file. Attempting to read from the stream after skipping past the end will result in -1 indicating the end of the file.
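
Given that contract, a caller that needs exactly n bytes skipped has to loop on the return value. The following is a minimal sketch, not the code from this patch; `StreamSkips` and `skipFully` are hypothetical names. It assumes a stream whose `skip()` never reports more bytes than actually exist. As the quoted Javadoc notes, `FileInputStream.skip` can report bytes beyond EOF, so for files an additional check against the file length would be needed.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper for illustration; not taken from the Spark code base.
final class StreamSkips {
    private StreamSkips() {}

    /** Skips exactly n bytes, or throws EOFException if the stream ends first. */
    static void skipFully(InputStream in, long n) throws IOException {
        long remaining = n;
        while (remaining > 0) {
            long skipped = in.skip(remaining); // may skip fewer bytes, possibly 0
            if (skipped > 0) {
                remaining -= skipped;
            } else if (in.read() == -1) {
                // skip() made no progress and read() confirms end of stream.
                throw new EOFException(
                    "Stream ended with " + remaining + " bytes left to skip");
            } else {
                remaining--; // the single-byte read consumed one byte
            }
        }
    }
}
```

Falling back to a one-byte `read()` when `skip()` returns 0 distinguishes "no progress this time" from a real end of stream. Guava's `ByteStreams.skipFully` is a ready-made alternative.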

rxin · 2014-10-28T07:03:56Z

BTW this LGTM.

rxin · 2014-10-28T07:04:36Z

I merged it in master.

Can you also create a patch for branch-1.1?

`read()` may return fewer bytes than requested; when this occurred, the old code would silently return less data than requested, which might cause stream corruption errors. `skip()` faces similar issues, too.

This patch fixes several cases where we mis-handle these methods' return values.

Author: Josh Rosen <joshrosen@databricks.com>

Closes apache#2969 from JoshRosen/file-channel-read-fix and squashes the following commits:

e724a9f [Josh Rosen] Fix similar issue of not checking skip() return value.
cbc03ce [Josh Rosen] Update the other log message, too.
01e6015 [Josh Rosen] file.getName -> file.getAbsolutePath
d961d95 [Josh Rosen] Fix another issue in FileServerSuite.
b9265d2 [Josh Rosen] Fix a similar (minor) issue in TestUtils.
cd9d76f [Josh Rosen] Fix a similar error in Tachyon:
3db0008 [Josh Rosen] Fix a similar read() error in Utils.offsetBytes().
db985ed [Josh Rosen] Fix unsafe usage of FileChannel.read():

Conflicts:
core/src/main/scala/org/apache/spark/network/ManagedBuffer.scala
core/src/main/scala/org/apache/spark/shuffle/IndexShuffleBlockManager.scala
core/src/main/scala/org/apache/spark/storage/DiskStore.scala
core/src/test/scala/org/apache/spark/FileServerSuite.scala

JoshRosen · 2014-10-28T07:23:23Z

Thanks! I've opened a new pull request for backporting to branch-1.1.

…s (branch-1.1 backport)

`read()` may return fewer bytes than requested; when this occurred, the old code would silently return less data than requested, which might cause stream corruption errors. `skip()` faces similar issues, too.

This patch fixes several cases where we mis-handle these methods' return values.

This is a backport of #2969 to `branch-1.1`.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #2974 from JoshRosen/spark-4107-branch-1.1-backport and squashes the following commits:

d82c05b [Josh Rosen] [SPARK-4107] Fix incorrect handling of read() and skip() return values

Fix unsafe usage of FileChannel.read():

db985ed

read() may return fewer bytes than requested; when this occurred, the old code would silently return less data than requested, which might cause stream corruption errors.

Fix a similar read() error in Utils.offsetBytes().

3db0008

This is a less critical issue since this code was only called from the log viewer, but it’s still wrong.

Fix a similar error in Tachyon:

cd9d76f

In this case, we might unnecessarily fail to read a block due to a partial read().

JoshRosen reviewed Oct 28, 2014

Fix a similar (minor) issue in TestUtils.

b9265d2

Fix another issue in FileServerSuite.

d961d95

aarondav reviewed Oct 28, 2014

file.getName -> file.getAbsolutePath

01e6015

Update the other log message, too.

cbc03ce

JoshRosen reviewed Oct 28, 2014

JoshRosen changed the title ~~Fix unsafe usage of FileChannel.read()~~ [SPARK-4107] Fix incorrect usage of FileChannel.read() Oct 28, 2014

Fix similar issue of not checking skip() return value.

e724a9f

JoshRosen changed the title ~~[SPARK-4107] Fix incorrect usage of FileChannel.read()~~ [SPARK-4107] Fix incorrect usage of Channel.read() and Stream.skip() Oct 28, 2014

JoshRosen changed the title ~~[SPARK-4107] Fix incorrect usage of Channel.read() and Stream.skip()~~ [SPARK-4107] Fix incorrect handling of read() and skip() return values Oct 28, 2014

rxin reviewed Oct 28, 2014

asfgit closed this in 46c6341 Oct 28, 2014

JoshRosen mentioned this pull request Oct 28, 2014

[SPARK-4107] Fix incorrect handling of read() and skip() return values (branch-1.1 backport) #2974

Closed

JoshRosen deleted the file-channel-read-fix branch October 28, 2014 07:23

7 participants