[SPARK-29283][SQL] Error message is hidden when query from JDBC, especially enabled adaptive execution #25960
Conversation
…cially enabled adaptive execution

Test build #111555 has finished for PR 25960 at commit

Retest this please.

Test build #111558 has finished for PR 25960 at commit
```diff
 } else {
-  throw new HiveSQLException("Error running query: " + e.toString, e)
+  throw new HiveSQLException("Error running query: " +
+    SparkUtils.findFirstCause(e).toString, e)
```

`SparkUtils.findFirstCause(e)` -> `org.apache.commons.lang3.exception.ExceptionUtils.getRootCause(e)`?
Test build #111603 has finished for PR 25960 at commit
For consistency, should we do that in all Spark*Operation classes?

@juliuszsompolski fixed.
```diff
   setState(OperationState.FINISHED)
 } catch {
-  case e: HiveSQLException =>
+  case e: Throwable =>
```

NonFatal?
Hm. I think we may want to catch a Throwable.
E.g. InterruptedException is not caught by NonFatal, and we want to inform the HiveThriftServer2.listener about the error after an interrupt - this can definitely happen in SparkExecuteStatementOperation, which is async and can be cancelled. After a ThreadDeath or OutOfMemoryError I think we also want to inform the HiveThriftServer2.listener so the query doesn't get stuck hanging in the UI, as I think the server would continue to run (I think it won't bring the whole JVM down?).
If so, should we list InterruptedException here, too? IIUC the main reason we use NonFatal in this case is to avoid catching NonLocalReturnControl. But, yeah, this is not my area, so I think @wangyum could suggest more about this.
+1 for Throwable.

> Extractor of non-fatal Throwables. Will not match fatal errors like VirtualMachineError (for example, OutOfMemoryError and StackOverflowError, subclasses of VirtualMachineError), ThreadDeath, LinkageError, InterruptedException, ControlThrowable.
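For reference, `scala.util.control.NonFatal` behaves exactly as the quoted scaladoc describes; a quick standalone sketch (not part of this PR) illustrating why `case NonFatal(e)` would let an interrupt escape:

```scala
import scala.util.control.NonFatal

// NonFatal is both a predicate and an extractor. It deliberately does NOT
// match VirtualMachineError (e.g. OutOfMemoryError, StackOverflowError),
// ThreadDeath, InterruptedException, LinkageError, or ControlThrowable.
assert(NonFatal(new RuntimeException("ordinary failure"))) // matched
assert(!NonFatal(new InterruptedException))                // NOT matched
assert(!NonFatal(new OutOfMemoryError))                    // NOT matched

// With `case NonFatal(e) =>`, an InterruptedException from a cancelled
// async operation would bypass the handler entirely; `case e: Throwable =>`
// still lets the listener be notified before rethrowing.
def classify(t: Throwable): String = t match {
  case NonFatal(_)  => "non-fatal"
  case _: Throwable => "fatal-or-interrupt"
}
assert(classify(new InterruptedException) == "fatal-or-interrupt")
```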
Test build #111664 has finished for PR 25960 at commit
```diff
   throw e.asInstanceOf[HiveSQLException]
 } else {
-  throw new HiveSQLException("Error running query: " + e.toString, e)
+  val root = ExceptionUtils.getRootCause(e)
```
Could we change it to?

```scala
setState(OperationState.ERROR)
e match {
  case hiveException: HiveSQLException =>
    logError(s"Error executing query with $statementId, currentState $currentState, ", e)
    HiveThriftServer2.listener.onStatementError(
      statementId, hiveException.getMessage, SparkUtils.exceptionString(hiveException))
    throw hiveException
  case _ =>
    val rootCause = Option(ExceptionUtils.getRootCause(e)).getOrElse(e)
    logError(
      s"Error executing query with $statementId, currentState $currentState, ", rootCause)
    HiveThriftServer2.listener.onStatementError(
      statementId, rootCause.getMessage, SparkUtils.exceptionString(rootCause))
    throw new HiveSQLException("Error running query: " + rootCause.toString, rootCause)
}
```
`val rootCause = Option(ExceptionUtils.getRootCause(e)).getOrElse(e)`

It returns null only if the input `e` is null. Do we still need this `Option` wrapper?
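To illustrate the discussion, here is a hand-rolled sketch of walking a cause chain to its root (not Spark or commons-lang3 code; the name `rootCause` is made up for illustration). Unlike some versions of `ExceptionUtils.getRootCause`, this sketch returns the throwable itself when there is no cause, which is exactly the fallback the `Option(...).getOrElse(e)` guard provides:

```scala
// Walk getCause links until the last (root) throwable in the chain.
// Guards against self-referential causes so it always terminates.
@annotation.tailrec
def rootCause(t: Throwable): Throwable =
  if (t.getCause == null || (t.getCause eq t)) t else rootCause(t.getCause)

// A wrapped failure, as in the PR's reproduction scenario.
val wrapped = new RuntimeException("Adaptive execution failed",
  new IllegalArgumentException("HDFS block missing"))
assert(rootCause(wrapped).getMessage == "HDFS block missing")

// A throwable with no cause is its own root; keeping the
// Option(...).getOrElse(e) guard stays safe if the library call
// ever returns null for such inputs.
assert(rootCause(new RuntimeException("no cause")).getMessage == "no cause")
```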
Besides the null check, I've changed the code to the style above. @wangyum
Retest this please.

2 similar comments

Retest this please.

Retest this please.

Test build #111949 has finished for PR 25960 at commit
The UT could pass after #26028 is merged.

retest this please

Test build #111991 has finished for PR 25960 at commit

Retest this please.

Test build #112002 has finished for PR 25960 at commit
wangyum left a comment

LGTM cc @juliuszsompolski @srowen @maropu
srowen left a comment

I guess it's hard to refactor this error handling vs copying it? Seems OK.
LGTM.
…cially enabled adaptive execution
### What changes were proposed in this pull request?
When adaptive execution is enabled, Spark users connected via JDBC always get an adaptive execution error whatever the underlying root cause is. It's very confusing; we have to check the driver log to find out why.
```shell
0: jdbc:hive2://localhost:10000> SELECT * FROM testData join testData2 ON key = v;
SELECT * FROM testData join testData2 ON key = v;
Error: Error running query: org.apache.spark.SparkException: Adaptive execution failed due to stage materialization failures. (state=,code=0)
0: jdbc:hive2://localhost:10000>
```
For example, a job queried from JDBC failed due to a missing HDFS block, but the user still gets the generic error message `Adaptive execution failed due to stage materialization failures`.
The easiest way to reproduce this is to change the code of `AdaptiveSparkPlanExec` so that it throws an exception when it receives a `StageSuccess` event.
```scala
case class AdaptiveSparkPlanExec(
events.drainTo(rem)
(Seq(nextMsg) ++ rem.asScala).foreach {
case StageSuccess(stage, res) =>
// stage.resultOption = Some(res)
val ex = new SparkException("Wrapper Exception",
new IllegalArgumentException("Root cause is IllegalArgumentException for Test"))
errors.append(
new SparkException(s"Failed to materialize query stage: ${stage.treeString}", ex))
case StageFailure(stage, ex) =>
errors.append(
new SparkException(s"Failed to materialize query stage: ${stage.treeString}", ex))
```
### Why are the changes needed?
To make the error message more user-friendly and more useful for queries from JDBC.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Manually test query:
```shell
0: jdbc:hive2://localhost:10000> CREATE TEMPORARY VIEW testData (key, value) AS SELECT explode(array(1, 2, 3, 4)), cast(substring(rand(), 3, 4) as string);
CREATE TEMPORARY VIEW testData (key, value) AS SELECT explode(array(1, 2, 3, 4)), cast(substring(rand(), 3, 4) as string);
+---------+--+
| Result |
+---------+--+
+---------+--+
No rows selected (0.225 seconds)
0: jdbc:hive2://localhost:10000> CREATE TEMPORARY VIEW testData2 (k, v) AS SELECT explode(array(1, 1, 2, 2)), cast(substring(rand(), 3, 4) as int);
CREATE TEMPORARY VIEW testData2 (k, v) AS SELECT explode(array(1, 1, 2, 2)), cast(substring(rand(), 3, 4) as int);
+---------+--+
| Result |
+---------+--+
+---------+--+
No rows selected (0.043 seconds)
```
Before:
```shell
0: jdbc:hive2://localhost:10000> SELECT * FROM testData join testData2 ON key = v;
SELECT * FROM testData join testData2 ON key = v;
Error: Error running query: org.apache.spark.SparkException: Adaptive execution failed due to stage materialization failures. (state=,code=0)
0: jdbc:hive2://localhost:10000>
```
After:
```shell
0: jdbc:hive2://localhost:10000> SELECT * FROM testData join testData2 ON key = v;
SELECT * FROM testData join testData2 ON key = v;
Error: Error running query: java.lang.IllegalArgumentException: Root cause is IllegalArgumentException for Test (state=,code=0)
0: jdbc:hive2://localhost:10000>
```
Closes #25960 from LantaoJin/SPARK-29283.
Authored-by: lajin <lajin@ebay.com>
Signed-off-by: Yuming Wang <wgyumg@gmail.com>
(cherry picked from commit fda4070)
Signed-off-by: Yuming Wang <wgyumg@gmail.com>
Merged to master and branch-3.0-preview.