[SPARK-40453][SPARK-41715][CONNECT] Take super class into account when throwing an exception #39947
Conversation
Otherwise, it complains about the header length (8 KiB limit). It can be configured via NettyChannelBuilder.maxInboundMessageSize, but I didn't change it here; see also https://stackoverflow.com/a/686243
@HyukjinKwon Do I understand correctly that the header limit applies to the HTTP headers? Can we put it in the body instead? Is there still a ~4K limit?
Yeah, I think we could put it in the body.
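For reference, a hedged sketch of how such limits could be raised on a plain Python gRPC client; the endpoint, option values, and the insecure channel are illustrative assumptions, not what Spark Connect actually configures:

```python
import grpc

# Illustrative client-side limits; gRPC exposes both knobs as channel options.
channel = grpc.insecure_channel(
    "localhost:15002",  # assumed Spark Connect endpoint
    options=[
        # Cap on received metadata ("headers"); the default is on the order of 8 KiB.
        ("grpc.max_metadata_size", 16 * 1024),
        # Cap on a received message body, which is what matters once the
        # error details are moved into the body.
        ("grpc.max_receive_message_length", 128 * 1024 * 1024),
    ],
)
```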
Why is this needed? We already have the abbreviated message in the body.
Eh, it was already there before in the metadata, to print out the server-side stack trace on the client. An exception was thrown when the stack trace was too large, so I truncated the stack trace string.
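A minimal sketch of that truncation idea (the helper name and byte budget are made up for illustration, not the PR's actual values):

```python
def truncate_stack_trace(stack_trace: str, max_bytes: int = 2048) -> str:
    """Trim a stack trace string so it fits under a metadata size limit."""
    encoded = stack_trace.encode("utf-8")
    if len(encoded) <= max_bytes:
        return stack_trace
    # Drop the tail and mark the cut; ignore a possibly split multi-byte char.
    return encoded[:max_bytes].decode("utf-8", errors="ignore") + "... (truncated)"
```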
The original AnalysisException.getMessage contains the string representation of the underlying plan.
Example:
spark.range(1).select("a").show()Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/.../spark/python/pyspark/sql/connect/dataframe.py", line 776, in show
print(self._show_string(n, truncate, vertical))
File "/.../spark/python/pyspark/sql/connect/dataframe.py", line 619, in _show_string
pdf = DataFrame.withPlan(
File "/.../spark/python/pyspark/sql/connect/dataframe.py", line 1325, in toPandas
return self._session.client.to_pandas(query)
File "/.../spark/python/pyspark/sql/connect/client.py", line 449, in to_pandas
table, metrics = self._execute_and_fetch(req)
File "/.../spark/python/pyspark/sql/connect/client.py", line 636, in _execute_and_fetch
self._handle_error(rpc_error)
File "/.../spark/python/pyspark/sql/connect/client.py", line 670, in _handle_error
raise convert_exception(info, status.message) from None
pyspark.errors.exceptions.connect.AnalysisException: [UNRESOLVED_COLUMN.WITH_SUGGESTION] A column or function parameter with name `a` cannot be resolved. Did you mean one of the following? [`id`].;
'Project ['a]
+- Range (0, 1, step=1, splits=Some(16))
What's the captured one's stack trace like?
Captured one:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/.../spark/python/pyspark/sql/dataframe.py", line 2987, in select
    jdf = self._jdf.select(self._jcols(*cols))
  File "/.../spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1322, in __call__
  File "/.../spark/python/pyspark/errors/exceptions/captured.py", line 159, in deco
    raise converted from None
pyspark.errors.exceptions.captured.AnalysisException: [UNRESOLVED_COLUMN.WITH_SUGGESTION] A column or function parameter with name `a` cannot be resolved. Did you mean one of the following? [`id`].;
'Project ['a]
+- Range (0, 1, step=1, splits=Some(16))
In that case, should we still show the plan to be consistent?
Eh, yeah. It still shows the plan; that's part of getMessage.
Ah, got it. 👍
Example:
from pyspark.sql.functions import udf

@udf
def aa(a):
    1 / 0

spark.range(1).select(aa("id")).show()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/.../spark/python/pyspark/sql/connect/dataframe.py", line 776, in show
    print(self._show_string(n, truncate, vertical))
  File "/.../spark/python/pyspark/sql/connect/dataframe.py", line 619, in _show_string
    pdf = DataFrame.withPlan(
  File "/.../spark/python/pyspark/sql/connect/dataframe.py", line 1325, in toPandas
    return self._session.client.to_pandas(query)
  File "/.../spark/python/pyspark/sql/connect/client.py", line 449, in to_pandas
    table, metrics = self._execute_and_fetch(req)
  File "/.../spark/python/pyspark/sql/connect/client.py", line 636, in _execute_and_fetch
    self._handle_error(rpc_error)
  File "/.../spark/python/pyspark/sql/connect/client.py", line 670, in _handle_error
    raise convert_exception(info, status.message) from None
pyspark.errors.exceptions.connect.PythonException:
An exception was thrown from the Python worker. Please see the stack trace below.
Traceback (most recent call last):
  File "<stdin>", line 3, in aa
ZeroDivisionError: division by zero
`getLocalizedMessage` is not used in our codebase.
And the doc of `setMessage` mentions that it's fine to send non-localized errors (and expect the client to localize them).
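The server side of this PR is Scala, but the google.rpc shape under discussion can be sketched in Python; the reason, domain, and `classes` metadata values below are assumptions for illustration:

```python
from google.rpc import code_pb2, error_details_pb2, status_pb2

# ErrorInfo can carry the thrown class and its super classes so the client
# can match the most specific known exception first.
error_info = error_details_pb2.ErrorInfo(
    reason="org.apache.spark.sql.catalyst.analysis.NoSuchDatabaseException",
    domain="org.apache.spark",
    metadata={"classes": '["NoSuchDatabaseException", "AnalysisException"]'},
)

# setMessage on the JVM side corresponds to the `message` field here; it is
# fine to keep it non-localized and let the client localize it.
status = status_pb2.Status(code=code_pb2.INTERNAL, message="<non-localized error message>")
status.details.add().Pack(error_info)
```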
The error message is too long, and it gets abbreviated in the Connect case.
ueshin left a comment:
LGTM, pending tests.
Merged to master and branch-3.4.
>>> df = spark.createDataFrame([(1, 11), (1, 11), (3, 10), (4, 8), (4, 8)], ["c1", "c2"])
- >>> df.freqItems(["c1", "c2"]).show()  # doctest: +SKIP
+ >>> df.freqItems(["c1", "c2"]).show()
Hi, @HyukjinKwon and @ueshin. Unfortunately, this broke the Scala 2.13 CI. I made a follow-up.
…octest due to Scala 2.13 failure

### What changes were proposed in this pull request?
This is a follow-up of #39947 to skip the `freqItems` doctest again.

### Why are the changes needed?
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?

Closes #39983 from dongjoon-hyun/SPARK-40453.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
What changes were proposed in this pull request?
This PR proposes to take super classes into account when throwing an exception from the server to the Python side, by adding more metadata about classes, causes, and the traceback in the JVM.
In addition, this PR matches the exceptions being thrown to the regular PySpark exceptions defined:
spark/python/pyspark/errors/exceptions/captured.py, lines 108 to 147 (at 04550ed)
Why are the changes needed?
Right now, many exceptions cannot be handled (e.g., `NoSuchDatabaseException`, which inherits `AnalysisException`) on the Python side.

Does this PR introduce any user-facing change?
No to end users.
Yes, it matches the exceptions to the regular PySpark exceptions.
How was this patch tested?
Unit tests fixed.
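To illustrate the mechanism, here is a minimal sketch (with an assumed mapping and function name, not the exact PySpark code) of how the client can walk the server-provided class names, super classes included, and raise the most specific known exception:

```python
from pyspark.errors.exceptions.connect import (
    AnalysisException,
    ParseException,
    SparkConnectGrpcException,
)

# Hypothetical mapping from JVM class names to PySpark exception classes.
_EXCEPTION_MAPPING = {
    "org.apache.spark.sql.catalyst.parser.ParseException": ParseException,
    "org.apache.spark.sql.AnalysisException": AnalysisException,
}

def convert_exception_sketch(classes: list, message: str) -> Exception:
    # `classes` lists the thrown class first, followed by its super classes,
    # so e.g. NoSuchDatabaseException falls back to AnalysisException here.
    for class_name in classes:
        if class_name in _EXCEPTION_MAPPING:
            return _EXCEPTION_MAPPING[class_name](message)
    # Unknown classes degrade to the generic Connect exception.
    return SparkConnectGrpcException(message)
```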