Avoid creating null output stream in S3SingleDriverLogStore #317
Conversation
```diff
 acquirePathLock(lockedPath)
 try {
   if (exists(fs, resolvedPath) && !overwrite) {
     throw new java.nio.file.FileAlreadyExistsException(resolvedPath.toUri.toString)
   }
-  val countingStream = new CountingOutputStream(stream)
-  stream = fs.create(resolvedPath, overwrite)
+  val stream = new CountingOutputStream(fs.create(resolvedPath, overwrite))
   actions.map(_ + "\n").map(_.getBytes(UTF_8)).foreach(stream.write)
   stream.close()
```
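For context on the fix: the removed lines built the `CountingOutputStream` around a `stream` variable that had not been assigned yet. A minimal sketch of the failure mode, assuming a `var` initialized to null as in the original code:

```scala
import java.io.OutputStream
import com.google.common.io.CountingOutputStream

var stream: OutputStream = null

// Old order: the wrapper is constructed while `stream` is still null.
// With a newer Guava (e.g. the one Hadoop 3.2.1 pulls in), the
// constructor null-checks its argument and throws NullPointerException
// right here; with Spark's older Guava it silently wraps null and
// fails on first write instead.
val countingStream = new CountingOutputStream(stream)
// stream = fs.create(resolvedPath, overwrite)  // too late either way:
// countingStream captured the null reference, not the variable.
```

The added line sidesteps this by creating the file handle first and wrapping it in a single expression, so no null ever reaches the wrapper.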
I'm not 100% confident that CountingOutputStream will close the underlying stream. This way is also nice because there is no second handle lying around to get used by accident.
Looking at https://commons.apache.org/proper/commons-io/javadocs/api-2.4/org/apache/commons/io/output/CountingOutputStream.html, it looks like it extends ProxyOutputStream, which provides the implementation of `.close()` and delegates it to `out`.
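For what it's worth, here is a quick sketch that checks the delegation directly. Guava's CountingOutputStream (which the later comments suggest is the one actually in play here) behaves the same way: it extends java.io.FilterOutputStream, whose close() flushes and then closes the wrapped stream. The RecordingStream sink below is made up just for the check:

```scala
import java.io.OutputStream
import com.google.common.io.CountingOutputStream

// Hypothetical sink that records whether close() reached it.
class RecordingStream extends OutputStream {
  var closed = false
  override def write(b: Int): Unit = ()
  override def close(): Unit = { closed = true }
}

val inner = new RecordingStream
val counting = new CountingOutputStream(inner)
counting.write("hello".getBytes("UTF-8"))
counting.close()

assert(inner.closed)            // close() was delegated to the wrapped stream
assert(counting.getCount == 5L) // and the bytes were counted on the way through
```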
Fixes #316

Force-pushed from d6ee1ef to 4a3d99d
Good catch. LGTM. Just curious, how did you trigger the NPE? Did you use a different Guava version? It looks like in the current Guava version used by Spark, CountingOutputStream doesn't check the input parameter.
Good question! I was just trying to write a DataFrame to S3, and it kept crashing whenever there were already files in the Delta table. I'm using vanilla Spark 2.4.4, Scala 2.11 with Hadoop 3.2.1, which looks like it ends up with a newer Guava version. Prior to upgrading Hadoop, we were using an older one.
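In case it helps anyone else debugging this, a small sketch (plain JDK reflection, nothing Delta-specific) to see which jar actually supplies Guava's CountingOutputStream at runtime:

```scala
import com.google.common.io.CountingOutputStream

// Prints something like .../guava-<version>.jar, showing whether
// Spark's bundled Guava or Hadoop's won on the classpath.
// getCodeSource can be null for some classloaders, hence the Option.
val location = Option(classOf[CountingOutputStream].getProtectionDomain.getCodeSource)
  .map(_.getLocation)
println(location.getOrElse("unknown code source"))
```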
Fixes #316