
Avoid creating null output stream in S3SingleDriverLogStore #317

Closed
wants to merge 1 commit

Conversation

@easel (Contributor) commented on Feb 7, 2020

Fixes #316

@databricks-cla-assistant commented on Feb 7, 2020

CLA assistant check
All committers have signed the CLA.

@easel requested a review from liwensun on February 7, 2020, 19:28
  acquirePathLock(lockedPath)
  try {
    if (exists(fs, resolvedPath) && !overwrite) {
      throw new java.nio.file.FileAlreadyExistsException(resolvedPath.toUri.toString)
    }
-   val countingStream = new CountingOutputStream(stream)
-   stream = fs.create(resolvedPath, overwrite)
+   val stream = new CountingOutputStream(fs.create(resolvedPath, overwrite))
    actions.map(_ + "\n").map(_.getBytes(UTF_8)).foreach(stream.write)
    stream.close()

@easel (Contributor, Author) commented:
I'm not 100% confident that CountingOutputStream will close the underlying stream. This way is just nice because there's no second handle lying around to get used by accident.

@easel (Contributor, Author) commented:
Looking at https://commons.apache.org/proper/commons-io/javadocs/api-2.4/org/apache/commons/io/output/CountingOutputStream.html, it extends ProxyOutputStream, which provides the .close() implementation and delegates it to the wrapped out stream.
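
For what it's worth, a minimal Scala sketch (not part of the PR) that confirms the delegation. It assumes Guava's com.google.common.io.CountingOutputStream, the class the store actually uses per the review below; that class extends java.io.FilterOutputStream, whose close() flushes and then closes the wrapped stream:

import java.io.ByteArrayOutputStream
import java.nio.charset.StandardCharsets.UTF_8

import com.google.common.io.CountingOutputStream

object CloseDelegationCheck extends App {
  var underlyingClosed = false

  // Underlying stream that records whether close() reached it.
  val underlying = new ByteArrayOutputStream() {
    override def close(): Unit = {
      underlyingClosed = true
      super.close()
    }
  }

  val stream = new CountingOutputStream(underlying)
  stream.write("some action\n".getBytes(UTF_8))
  stream.close() // FilterOutputStream.close() flushes, then closes the wrapped stream

  println(s"bytes counted = ${stream.getCount}")    // 12
  println(s"underlying closed = $underlyingClosed") // true
}

Since closing the wrapper is enough, holding only the wrapper (as the fix does) avoids ever having a second handle to misuse.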

@zsxwing (Member) left a comment:
Good catch. LGTM. Just curious, how did you trigger the NPE? Did you use a different Guava version? It looks like in the current Guava version used by Spark, CountingOutputStream doesn't check the input parameter.

@easel (Contributor, Author) commented on Feb 8, 2020

Good question! I was just trying to write a DataFrame to S3, and it kept crashing whenever there were already files in the Delta table. I'm using vanilla Spark 2.4.4, Scala 2.11 with Hadoop 3.2.1, which ends up pulling in guava-27.0-jre.jar.

Prior to upgrading Hadoop, we were using guava-14.0.1.jar or guava-16.0.1.jar (don't ask). I'm not 100% sure, but I don't think we ran into this issue with those older versions.
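
To illustrate the version dependence, a sketch under the assumptions above (not code from the PR): if the constructor null-checks, as the newer Guava apparently does, the NPE is thrown at construction; with an older Guava that skips the check, the wrapper is built around null and the NPE only surfaces on the first write. Either way, the old ordering fails:

import java.io.OutputStream

import com.google.common.io.CountingOutputStream

object NullWrapRepro extends App {
  // Mirrors the buggy ordering: the wrapper was created before
  // fs.create() ever assigned the variable, so it wrapped null.
  var stream: OutputStream = null

  try {
    val counting = new CountingOutputStream(stream) // newer Guava: NPE thrown here
    counting.write('x'.toInt) // older, non-null-checking Guava: NPE lands here instead
  } catch {
    case e: NullPointerException =>
      println(s"NPE, as reported in #316: $e")
  }
}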

@liwensun closed this on Feb 20, 2020
zsxwing pushed a commit that referenced this pull request on Feb 26, 2020

Avoid creating null output stream in S3SingleDriverLogStore

Fixes #316

Closes #317

Author: Erik LaBianca <erik.labianca@gmail.com>

#7992 is resolved by zsxwing/8poe59z8.

GitOrigin-RevId: 4e2306940262b3f942a8c325f494f22693e874b1
tdas pushed a commit to tdas/delta that referenced this pull request on May 31, 2023
Successfully merging this pull request may close these issues.

NullPointerException in S3SingleDriverLogStore.write:166