
[SPARK-50511][PYTHON][FOLLOWUP] Avoid wrapping streaming Python data source error messages #49532

Open · wants to merge 4 commits into master

Conversation

allisonwang-db (Contributor):

What changes were proposed in this pull request?

This PR is a follow-up to #49092. It removes the extra try/except during streaming Python data source execution.

Why are the changes needed?

To make the error message more user-friendly and avoid nested error messages:

```
error1

During handling of the above exception, another exception occurred:

error2
```
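A minimal, self-contained sketch (not the actual Spark runner code; the function names here are hypothetical) of why an extra try/except wrapper produces the nested traceback above, and how letting the original error propagate avoids it:

```python
import traceback


def read_partition():
    # Stand-in for user-defined streaming data source code that fails.
    raise ValueError("error1")


def run_wrapped():
    # Old behavior: re-raising a new error inside the handler chains the
    # exceptions, so the traceback shows "During handling of the above
    # exception, another exception occurred:" between error1 and error2.
    try:
        read_partition()
    except Exception:
        raise RuntimeError("error2")


def run_unwrapped():
    # New behavior: no wrapping, so only the original error is reported.
    read_partition()


try:
    run_wrapped()
except Exception:
    print(traceback.format_exc())  # shows error1 chained into error2
```

Removing the wrapper (or using `raise ... from None` when re-raising is unavoidable) keeps the traceback focused on the user's original error.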

Does this PR introduce any user-facing change?

no

How was this patch tested?

existing tests

Was this patch authored or co-authored using generative AI tooling?

no

allisonwang-db (Contributor, Author):

cc @HyukjinKwon

dongjoon-hyun (Member) commented:

+1, LGTM. Thank you, @allisonwang-db .

dongjoon-hyun (Member) commented:

Oh, the removal of PySparkRuntimeError causes two Python linter errors. Could you fix them, @allisonwang-db?

```
./python/pyspark/sql/streaming/python_streaming_source_runner.py:24:1: F401 'pyspark.errors.PySparkRuntimeError' imported but unused
from pyspark.errors import IllegalArgumentException, PySparkAssertionError, PySparkRuntimeError
^
./python/pyspark/sql/worker/python_streaming_sink_runner.py:23:1: F401 'pyspark.errors.PySparkRuntimeError' imported but unused
from pyspark.errors import PySparkAssertionError, PySparkRuntimeError
^
2     F401 'pyspark.errors.PySparkRuntimeError' imported but unused
```

dongjoon-hyun previously approved these changes on Jan 17, 2025.

dongjoon-hyun (Member) commented:
Thank you for the fix. Pending CIs.

dongjoon-hyun (Member) commented:

To @allisonwang-db, please rebase this PR once more and fix the unit test.

```
[info] *** 4 TESTS FAILED ***
[error] Failed: Total 12090, Failed 4, Errors 0, Passed 12086, Ignored 33, Canceled 1
[error] Failed tests:
[error] 	org.apache.spark.sql.execution.python.PythonStreamingDataSourceSimpleSuite
[error] 	org.apache.spark.sql.execution.python.PythonStreamingDataSourceSuite
[error] (sql / Test / test) sbt.TestsFailedException: Tests unsuccessful
```

@dongjoon-hyun dismissed their stale review on January 24, 2025: "Stale review."

@allisonwang-db force-pushed the spark-50511-streaming-pyds-err branch from f6a66c1 to 2d0a638 on March 5, 2025.
3 participants