
Avoid creating null output stream in S3SingleDriverLogStore #317

Closed
wants to merge 1 commit

Conversation

@easel (Contributor) commented on Feb 7, 2020

Fixes #316

@databricks-cla-assistant commented on Feb 7, 2020

CLA assistant check
All committers have signed the CLA.

@easel requested a review from liwensun on February 7, 2020, 19:28
  acquirePathLock(lockedPath)
  try {
    if (exists(fs, resolvedPath) && !overwrite) {
      throw new java.nio.file.FileAlreadyExistsException(resolvedPath.toUri.toString)
    }
-   val countingStream = new CountingOutputStream(stream)
-   stream = fs.create(resolvedPath, overwrite)
+   val stream = new CountingOutputStream(fs.create(resolvedPath, overwrite))
    actions.map(_ + "\n").map(_.getBytes(UTF_8)).foreach(stream.write)
    stream.close()

@easel (Contributor, Author) commented:
I'm not 100% confident that CountingOutputStream will close the underlying stream. This way is just nice because there's no second handle lying around to get used by accident.

@easel (Contributor, Author) commented:
Looking at https://commons.apache.org/proper/commons-io/javadocs/api-2.4/org/apache/commons/io/output/CountingOutputStream.html, it extends ProxyOutputStream, which provides the .close() implementation and delegates it to the wrapped out stream.
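
For what it's worth, a minimal Scala sketch (not part of the PR) that confirms the delegation. It assumes Guava's com.google.common.io.CountingOutputStream, the class the store actually uses per the review below; that class extends java.io.FilterOutputStream, whose close() flushes and then closes the wrapped stream:

import java.io.ByteArrayOutputStream
import java.nio.charset.StandardCharsets.UTF_8

import com.google.common.io.CountingOutputStream

object CloseDelegationCheck extends App {
  var underlyingClosed = false

  // Underlying stream that records whether close() reached it.
  val underlying = new ByteArrayOutputStream() {
    override def close(): Unit = {
      underlyingClosed = true
      super.close()
    }
  }

  val stream = new CountingOutputStream(underlying)
  stream.write("some action\n".getBytes(UTF_8))
  stream.close() // FilterOutputStream.close() flushes, then closes the wrapped stream

  println(s"bytes counted = ${stream.getCount}")    // 12
  println(s"underlying closed = $underlyingClosed") // true
}

Since closing the wrapper is enough, holding only the wrapper (as the fix does) avoids ever having a second handle to misuse.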

@zsxwing (Member) left a comment:
Good catch. LGTM. Just curious, how did you trigger the NPE? Did you use a different Guava version? It looks like in the current Guava version used by Spark, CountingOutputStream doesn't check the input parameter.

@easel (Contributor, Author) commented on Feb 8, 2020

Good question! I was just trying to write a DataFrame to S3, and it kept crashing whenever there were already files in the Delta table. I'm using vanilla Spark 2.4.4, Scala 2.11 with Hadoop 3.2.1, which ends up pulling in guava-27.0-jre.jar.

Prior to upgrading Hadoop, we were using guava-14.0.1.jar or guava-16.0.1.jar (don't ask). I'm not 100% sure, but I don't think we ran into this issue with those older versions.
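
To illustrate the version dependence, a sketch under the assumptions above (not code from the PR): if the constructor null-checks, as the newer Guava apparently does, the NPE is thrown at construction; with an older Guava that skips the check, the wrapper is built around null and the NPE only surfaces on the first write. Either way, the old ordering fails:

import java.io.OutputStream

import com.google.common.io.CountingOutputStream

object NullWrapRepro extends App {
  // Mirrors the buggy ordering: the wrapper was created before
  // fs.create() ever assigned the variable, so it wrapped null.
  var stream: OutputStream = null

  try {
    val counting = new CountingOutputStream(stream) // newer Guava: NPE thrown here
    counting.write('x'.toInt) // older, non-null-checking Guava: NPE lands here instead
  } catch {
    case e: NullPointerException =>
      println(s"NPE, as reported in #316: $e")
  }
}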

@liwensun closed this on Feb 20, 2020
zsxwing pushed a commit that referenced this pull request on Feb 26, 2020

Avoid creating null output stream in S3SingleDriverLogStore

Fixes #316

Closes #317

Author: Erik LaBianca <erik.labianca@gmail.com>

#7992 is resolved by zsxwing/8poe59z8.

GitOrigin-RevId: 4e2306940262b3f942a8c325f494f22693e874b1
tdas pushed a commit to tdas/delta that referenced this pull request on May 31, 2023
Successfully merging this pull request may close these issues.

NullPointerException in S3SingleDriverLogStore.write:166