Member

@ueshin ueshin commented Feb 4, 2023

What changes were proposed in this pull request?

Introduces a base hierarchy for exceptions.

As a common hierarchy for users, the base exception classes are subclasses of PySparkException.
The concrete classes for both PySpark and Spark Connect inherit from these base classes and should not be exposed to users.
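
A minimal sketch of the layout described above, using stand-in classes rather than the real PySpark modules. The names `CapturedAnalysisException`, `ConnectAnalysisException`, `run_query`, and `handled_by_base` are assumptions made for illustration only; the point is that a handler written against the shared base class catches either concrete class.

```python
class PySparkException(Exception):
    """Stand-in for the common root of the user-facing hierarchy."""


class AnalysisException(PySparkException):
    """Stand-in for a user-facing base class that callers catch."""


class CapturedAnalysisException(AnalysisException):
    """Hypothetical concrete class for classic (JVM-backed) PySpark."""


class ConnectAnalysisException(AnalysisException):
    """Hypothetical concrete class for Spark Connect."""


def run_query(backend: str) -> None:
    # Hypothetical helper: each backend raises its own concrete class.
    if backend == "connect":
        raise ConnectAnalysisException("column not found")
    raise CapturedAnalysisException("column not found")


def handled_by_base(backend: str) -> bool:
    # User code names only the base class, never a concrete one.
    try:
        run_query(backend)
    except AnalysisException:
        return True
    return False


for backend in ("classic", "connect"):
    print(backend, handled_by_base(backend))
```

With this shape, user code does not need to know which backend produced the error.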

Why are the changes needed?

Currently, the exception class hierarchies for PySpark and Spark Connect are separate.

To check an exception's type, users must switch error classes depending on whether they are running on PySpark or Spark Connect, which is not ideal.

Does this PR introduce any user-facing change?

No. Users can still use the existing exception classes to check the exception type.

How was this patch tested?

Updated tests.

@ueshin ueshin changed the title Introduce base hierarchy to exceptions. [SPARK-42342][CONNECT] Introduce base hierarchy to exceptions Feb 4, 2023
@ueshin ueshin marked this pull request as ready for review February 7, 2023 07:39
Member Author

ueshin commented Feb 7, 2023

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-42342][CONNECT] Introduce base hierarchy to exceptions [SPARK-42342][PYTHON][CONNECT] Introduce base hierarchy to exceptions Feb 8, 2023
Member

@dongjoon-hyun dongjoon-hyun left a comment

+1, LGTM from my side. Thank you, @ueshin .

Contributor

@itholic itholic left a comment

Looks pretty good. Just left a nit question for my understanding.

 msg = je.toString().split(": ", 1)[1]  # Drop the Java StreamingQueryException type info
 stackTrace = "\n\t at ".join(map(lambda x: x.toString(), je.getStackTrace()))
-return StreamingQueryException(msg, stackTrace, je.getCause())
+return CapturedStreamingQueryException(msg, stackTrace, je.getCause())
Contributor

Just to be clear, this will still raise StreamingQueryException in user space, right?
I just want to clarify, because we don't use such an alias for the Connect exceptions, for example:

from pyspark.errors.exceptions.connect import (
    AnalysisException as ConnectAnalysisException
)

Member Author

The name StreamingQueryException is already imported for the return type annotation, so we just need to use another name.

from pyspark.errors import StreamingQueryException
from pyspark.errors.exceptions.captured import (
    StreamingQueryException as CapturedStreamingQueryException,
)
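
As a rough illustration of why the alias is needed and what users end up seeing, here is a self-contained sketch that does not import PySpark. It assumes the concrete class carries the same class name as the public base; `_PublicBase`, `_define_captured_class`, and `convert` are hypothetical helpers that simulate the second module and the conversion function.

```python
from typing import Optional


class StreamingQueryException(Exception):
    """Stand-in for the public base class used in type annotations."""


_PublicBase = StreamingQueryException


def _define_captured_class() -> type:
    # Simulates a separate module that defines a concrete class under the
    # same name as the public base class.
    class StreamingQueryException(_PublicBase):
        def __init__(self, desc: str, stack_trace: Optional[str] = None):
            super().__init__(desc)
            self.stack_trace = stack_trace

    return StreamingQueryException


# Plays the role of "import StreamingQueryException as Captured...".
CapturedStreamingQueryException = _define_captured_class()


def convert(desc: str, stack_trace: str) -> StreamingQueryException:
    # The annotation names the public base; the body builds the concrete class.
    return CapturedStreamingQueryException(desc, stack_trace)


err = convert("query failed", "at Foo.bar")
print(type(err).__name__)
print(isinstance(err, StreamingQueryException))
```

The alias only affects the local name in the importing module. The class name reported on the instance is unchanged, and the instance still passes an `isinstance` check against the public base.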

Contributor

Oh, I missed it.
LGTM!


class ColumnParityTests(ColumnTestsMixin, ReusedConnectTestCase):
# TODO(SPARK-42017): Different error type AnalysisException vs SparkConnectAnalysisException
# TODO(SPARK-42017): df["bad_key"] does not raise AnalysisException
Contributor

+1

 finally:
     sq.stop()
-self.assertTrue(type(sq.exception()) is StreamingQueryException)
+self.assertIsInstance(sq.exception(), StreamingQueryException)
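
The switch from an exact type comparison to `assertIsInstance` follows from the new hierarchy: once the runtime object is a concrete subclass, `type(x) is Base` no longer holds. A small sketch with stand-in classes (both class names below are hypothetical placeholders, not the real PySpark classes):

```python
import unittest


class StreamingQueryException(Exception):
    """Stand-in for the user-facing base class."""


class CapturedStreamingQueryException(StreamingQueryException):
    """Hypothetical concrete subclass returned at runtime."""


class ExceptionTypeCheck(unittest.TestCase):
    def test_exact_type_comparison_rejects_subclass(self):
        exc = CapturedStreamingQueryException("boom")
        # The old assertion style would fail for a subclass instance.
        self.assertFalse(type(exc) is StreamingQueryException)

    def test_isinstance_accepts_subclass(self):
        exc = CapturedStreamingQueryException("boom")
        # The new assertion style passes for any subclass of the base.
        self.assertIsInstance(exc, StreamingQueryException)
```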
Contributor

+1

@HyukjinKwon
Member

Merged to master and branch-3.4.

HyukjinKwon pushed a commit that referenced this pull request Feb 8, 2023
Closes #39882 from ueshin/issues/SPARK-42342/exceptions.

Authored-by: Takuya UESHIN <ueshin@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit bd34b16)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
HyukjinKwon pushed a commit that referenced this pull request Feb 15, 2023
…ise_error to call the proper test

### What changes were proposed in this pull request?

This is a follow-up of #39882.

Fixes `FunctionsParityTests.test_raise_error` to call the proper test.

### Why are the changes needed?

`FunctionsParityTests.test_raise_error` should've called `check_raise_error`.
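
For readers unfamiliar with this test layout, the following is a rough sketch of the pattern, not the actual Spark test code. It assumes a shared mixin exposes a `check_*` helper parameterized by the expected error type, and that the parity suite overrides the `test_*` method to pass its own type. `ClassicError`, `ConnectError`, and `raise_error_op` are hypothetical names.

```python
import unittest


class ClassicError(Exception):
    """Hypothetical error type for the classic backend."""


class ConnectError(Exception):
    """Hypothetical error type for the Connect backend."""


class FunctionsTestsMixin:
    def raise_error_op(self):
        # Hypothetical operation under test; backends may override it.
        raise ClassicError("boom")

    def check_raise_error(self, expected_cls):
        # Shared assertion body, parameterized by the expected error type.
        with self.assertRaises(expected_cls):
            self.raise_error_op()

    def test_raise_error(self):
        self.check_raise_error(ClassicError)


class FunctionsTests(FunctionsTestsMixin, unittest.TestCase):
    pass


class FunctionsParityTests(FunctionsTestsMixin, unittest.TestCase):
    def raise_error_op(self):
        raise ConnectError("boom")

    def test_raise_error(self):
        # The override must delegate to the check_* helper that matches
        # its own name; calling a different helper would silently skip
        # the behavior this test is meant to cover.
        self.check_raise_error(ConnectError)
```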

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

The fixed test.

Closes #40021 from ueshin/issues/SPARK-42342/test.

Authored-by: Takuya UESHIN <ueshin@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
HyukjinKwon pushed a commit that referenced this pull request Feb 15, 2023
(cherry picked from commit 3ed1b95)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
snmvaughan pushed a commit to snmvaughan/spark that referenced this pull request Jun 20, 2023
Closes apache#39882 from ueshin/issues/SPARK-42342/exceptions.

Authored-by: Takuya UESHIN <ueshin@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit bd34b16)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
snmvaughan pushed a commit to snmvaughan/spark that referenced this pull request Jun 20, 2023
Closes apache#40021 from ueshin/issues/SPARK-42342/test.

Authored-by: Takuya UESHIN <ueshin@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit 3ed1b95)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>