@tejasapatil
Contributor

What changes were proposed in this pull request?

Currently, if the outstream gets destroyed or closed because of some failure, the later call to outstream.close() throws an IOException. As a result, the stderrBuffer is never logged and users have no way to see why the job failed.

The change is to first display the stderr buffer and then try closing the outstream.
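
The ordering problem can be illustrated with a minimal, self-contained sketch. This is not the Spark code: `BrokenStream`, `closeThenLog`, and `logThenClose` are hypothetical names, and an in-memory buffer stands in for the logger.

```scala
import java.io.{IOException, OutputStream}
import scala.collection.mutable.ArrayBuffer

object StderrOrderingSketch {
  // Hypothetical stream whose close() always fails, standing in for the
  // stdin pipe of a child process that has already gone away.
  final class BrokenStream extends OutputStream {
    override def write(b: Int): Unit = ()
    override def close(): Unit = throw new IOException("Stream closed")
  }

  // Old order: close() throws first, so the stderr text is never recorded.
  def closeThenLog(out: OutputStream, stderr: String, log: ArrayBuffer[String]): Unit = {
    out.close()
    log += stderr
  }

  // Proposed order: record the stderr text, then attempt the close.
  def logThenClose(out: OutputStream, stderr: String, log: ArrayBuffer[String]): Unit = {
    log += stderr
    out.close()
  }
}
```

In both variants the IOException still reaches the caller; only the second one records the stderr text before that happens.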

How was this patch tested?

The correct way to test this fix would be to grep the log to see if the stderrBuffer gets logged, but I don't think having test cases which do that is a good idea.

(If this patch involves UI changes, please attach a screenshot; otherwise, remove this)

@SparkQA

SparkQA commented Jun 22, 2016

Test build #61002 has finished for PR 13834 at commit 04c8637.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Member

srowen commented Jun 22, 2016

SGTM

@srowen
Member

srowen commented Jun 22, 2016

Jenkins retest this please

@SparkQA

SparkQA commented Jun 22, 2016

Test build #61011 has finished for PR 13834 at commit 04c8637.

  • This patch fails from timeout after a configured wait of 250m.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Member

srowen commented Jun 22, 2016

Jenkins retest this please

@SparkQA

SparkQA commented Jun 22, 2016

Test build #61023 has finished for PR 13834 at commit 04c8637.

  • This patch fails from timeout after a configured wait of 250m.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Member

srowen commented Jun 22, 2016

Jenkins retest this please

@SparkQA

SparkQA commented Jun 22, 2016

Test build #61036 has started for PR 13834 at commit 04c8637.

@tejasapatil
Contributor Author

Looks like Jenkins is having issues. Thanks to @srowen for triggering retests. I will try once more.

@tejasapatil
Contributor Author

Jenkins retest this please

@SparkQA

SparkQA commented Jun 23, 2016

Test build #61068 has finished for PR 13834 at commit 04c8637.

  • This patch fails from timeout after a configured wait of 250m.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Member

srowen commented Jun 23, 2016

Jenkins retest this please

@SparkQA

SparkQA commented Jun 23, 2016

Test build #61103 has finished for PR 13834 at commit 04c8637.

  • This patch fails from timeout after a configured wait of 250m.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Member

srowen commented Jun 23, 2016

I think the failure is legit. It's stuck at:

[info] ScriptTransformationSuite:
Attempting to post to Github...
Exception in thread "Thread-ScriptTransformation-Feed" java.io.IOException: Stream closed
    at java.lang.ProcessBuilder$NullOutputStream.write(ProcessBuilder.java:433)
    at java.io.OutputStream.write(OutputStream.java:116)
    at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
    at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
    at java.io.FilterOutputStream.close(FilterOutputStream.java:158)
    at org.apache.spark.sql.hive.execution.ScriptTransformationWriterThread$$anonfun$run$1.apply$mcV$sp(ScriptTransformation.scala:328)
    at org.apache.spark.sql.hive.execution.ScriptTransformationWriterThread$$anonfun$run$1.apply(ScriptTransformation.scala:276)
    at org.apache.spark.sql.hive.execution.ScriptTransformationWriterThread$$anonfun$run$1.apply(ScriptTransformation.scala:276)
    at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1840)
    at org.apache.spark.sql.hive.execution.ScriptTransformationWriterThread.run(ScriptTransformation.scala:276)

So is the problem that the process never returns from waitFor until that stream is closed?
If your goal is just to log the message even when the stream fails to close, I guess just use try-finally.
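
A minimal sketch of the try-finally idea, under the assumption that the only goal is to guarantee the logging step runs. `closeThenAlwaysLog` and the `logStderr` callback are hypothetical names, not code from the patch.

```scala
import java.io.OutputStream

object TryFinallySketch {
  // Attempts the close, then runs the logging callback whether or not
  // close() throws. An exception from close() still propagates to the
  // caller after the callback has run.
  def closeThenAlwaysLog(out: OutputStream, logStderr: () => Unit): Unit =
    try out.close() finally logStderr()
}
```

Note that this guarantees the log line but does not swallow the exception, so the calling thread still terminates with the IOException.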

@srowen
Member

srowen commented Jun 29, 2016

@tejasapatil are you able to follow up on this one? I could do it if you're busy.

@tejasapatil tejasapatil changed the title [TRIVIAL] [CORE] [ScriptTransform] move printing of stderr buffer before closing the outstream [SPARK-16339] ScriptTransform does not print stderr when outstream is lost Jul 1, 2016
@tejasapatil tejasapatil changed the title [SPARK-16339] ScriptTransform does not print stderr when outstream is lost [SPARK-16339] [CORE] ScriptTransform does not print stderr when outstream is lost Jul 1, 2016
@tejasapatil
Contributor Author

ok to test

@srowen
Member

srowen commented Jul 1, 2016

Hm, but now the stream is not closed in the case of an exception. I think we have a Utils method for closing and logging any exception that occurs? That would let it proceed anyway.

@SparkQA

SparkQA commented Jul 1, 2016

Test build #61602 has finished for PR 13834 at commit 4b6adf2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tejasapatil
Contributor Author

tejasapatil commented Jul 1, 2016

@srowen: In case of an exception, we destroy() the proc (line 322 in the PR), which cleans up all the associated streams: http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/8u40-b25/java/lang/UNIXProcess.java#428

@srowen
Member

srowen commented Jul 1, 2016

OK, I guess that's a safe enough assumption. I guess I'm worried just because we've seen that if this doesn't happen then waitFor will hang. But I suppose there's no reason to expect it would block after destroy(). If it needs a force-kill then lots of stuff is probably going wrong.

@srowen
Member

srowen commented Jul 4, 2016

Hm, wait, don't we have a problem where a fatal error is thrown from this block? Now it still waits for the process in the finally block, but will hang because the stream is not closed. If that's right, isn't it just safer to always close the stream and catch the exception? I think we have a Utils method for it.

    throw e
} finally {
  try {
    outputStream.close()
Member

Hm, I mean, before this was safe in that we'd always close the stream, or try to. The problem was just that closing could throw an exception and skip printing the stderr buffer. I don't see the value in moving close() just to have to further change the method behavior to work around the move. Why not just leave it here but prevent an exception from being thrown from close?

Contributor Author

ok

@SparkQA

SparkQA commented Jul 5, 2016

Test build #61755 has finished for PR 13834 at commit e2277ff.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 5, 2016

Test build #61756 has finished for PR 13834 at commit 2fe6e85.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 5, 2016

Test build #61757 has finished for PR 13834 at commit 2cb1669.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 5, 2016

Test build #61758 has finished for PR 13834 at commit 7b58a04.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tejasapatil
Contributor Author

./dev/mima passed on my box.

Jenkins re-test please

@srowen
Member

srowen commented Jul 5, 2016

Not sure what's up with that, but to be clear, isn't this just a one-line change, Utils.tryLogNonFatalError(outputStream.close())? There's no need to modify anything else. This was the whole of the issue.
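
As a rough illustration of the "close and log instead of throwing" pattern: the helper below, `tryLogNonFatal`, is a hypothetical stand-in written for this sketch and is not Spark's implementation of `Utils.tryLogNonFatalError`. It assumes such a helper runs a block and logs, rather than rethrows, any non-fatal exception; the `log` callback replaces a real logger.

```scala
import scala.util.control.NonFatal

object CloseAndLogSketch {
  // Runs `block`; a non-fatal exception is passed to `log` and swallowed.
  // Errors that NonFatal does not match (for example LinkageError or
  // VirtualMachineError) still propagate to the caller.
  def tryLogNonFatal(log: String => Unit)(block: => Unit): Unit =
    try block catch { case NonFatal(t) => log(s"Ignored exception: ${t.getMessage}") }
}
```

With this shape, a call such as `CloseAndLogSketch.tryLogNonFatal(println)(outputStream.close())` would let the code after the close, such as printing the stderr buffer, run even when the close fails.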

@tejasapatil
Contributor Author

@srowen: I also changed the NonFatal to Throwable to account for your comment at #13834 (comment). The proc should always be terminated, or else the finally block might be blocked indefinitely.
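
The difference between matching `NonFatal` and matching `Throwable` can be sketched as follows. The names are hypothetical, and the `cleanup` callback stands in for terminating the process; this is an illustration of the catch semantics only, not the patch itself.

```scala
import scala.util.control.NonFatal

object DestroyOnAnyErrorSketch {
  // Cleanup runs only for errors that NonFatal matches, then rethrows.
  def cleanupOnNonFatal(body: => Unit)(cleanup: () => Unit): Unit =
    try body catch { case NonFatal(e) => cleanup(); throw e }

  // Cleanup runs for every Throwable, fatal errors included, then rethrows.
  def cleanupOnThrowable(body: => Unit)(cleanup: () => Unit): Unit =
    try body catch { case t: Throwable => cleanup(); throw t }
}
```

With a fatal error such as a `LinkageError`, only the second variant runs the cleanup before the error propagates.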

@srowen
Member

srowen commented Jul 5, 2016

I had thought that closing the stream was sufficient (at least, that's the current assumption in the code). But OK, I can see making that change for a slightly different reason: to make sure any fatal exception also terminates the process (and still closes its streams).

@srowen
Member

srowen commented Jul 5, 2016

Jenkins retest this please (last failure was a hard JVM crash, obviously not related)

@SparkQA

SparkQA commented Jul 5, 2016

Test build #61792 has finished for PR 13834 at commit 7b58a04.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Member

srowen commented Jul 6, 2016

Merged to master/2.0

@asfgit asfgit closed this in 5f34204 Jul 6, 2016
asfgit pushed a commit that referenced this pull request Jul 6, 2016
…eam is lost

## What changes were proposed in this pull request?

Currently, if the outstream gets destroyed or closed because of some failure, the later call to `outstream.close()` throws an IOException. As a result, the `stderrBuffer` is never logged and users have no way to see why the job failed.

The change is to first display the stderr buffer and then try closing the outstream.

## How was this patch tested?

The correct way to test this fix would be to grep the log to see if the `stderrBuffer` gets logged, but I don't think having test cases which do that is a good idea.

(If this patch involves UI changes, please attach a screenshot; otherwise, remove this)

…

Author: Tejas Patil <tejasp@fb.com>

Closes #13834 from tejasapatil/script_transform.

(cherry picked from commit 5f34204)
Signed-off-by: Sean Owen <sowen@cloudera.com>
@tejasapatil tejasapatil deleted the script_transform branch July 6, 2016 16:39