1011 Disable full stack trace when using spark connect #1012

Conversation

@b1ackout b1ackout commented May 10, 2024

Describe your changes

  1. Updated is_non_sqlalchemy_error to handle pyspark.sql.utils.AnalysisException, so that when short_errors is enabled only the Spark SQL error is shown instead of the full stack trace

Issue number

Closes #1011

Checklist before requesting a review


📚 Documentation preview 📚: https://jupysql--1012.org.readthedocs.build/en/1012/

@b1ackout b1ackout requested a review from edublancas as a code owner May 10, 2024 10:16
raise exceptions.MissingPackageError("pysark not installed")

return SparkResultProxy(dataframe, dataframe.columns, should_cache)
try:

please integrate this with short_errors:

short_errors = Bool(

by default, it should raise the exception; if short_errors is True, then just print it
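A minimal sketch of that pattern, assuming a generic run_query callable and a short_errors flag (the names are illustrative placeholders, not jupysql's actual API):

# Illustrative sketch only, not the PR's implementation.
def execute(run_query, query, short_errors):
    try:
        return run_query(query)
    except Exception as e:
        if short_errors:
            # show only the error message, no stack trace
            print(e)
        else:
            # default: re-raise so the full traceback is shown
            raise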


I see this was marked as resolved but I don't see any references to the short_errors option in your PR, am I missing anything?

return SparkResultProxy(dataframe, dataframe.columns, should_cache)
try:
return SparkResultProxy(dataframe, dataframe.columns, should_cache)
except AnalysisException as e:


this except is redundant; the except Exception as e can catch all exceptions
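As a quick illustration of that point (assuming pyspark 3.4+ is installed, as Spark Connect requires; the message is made up), a broad except Exception already catches AnalysisException because it subclasses Exception:

from pyspark.sql.utils import AnalysisException

try:
    raise AnalysisException("simulated Spark analysis error")
except Exception as e:
    # AnalysisException lands here, so a dedicated
    # `except AnalysisException` branch above this one is redundant
    # unless it needs different handling.
    print(type(e).__name__, e)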

@edublancas edublancas added the feature Adds a new feature label May 21, 2024
@b1ackout b1ackout marked this pull request as draft May 23, 2024 07:34
src/sql/util.py Outdated
@@ -559,6 +559,7 @@ def is_non_sqlalchemy_error(error):
# Pyspark
"UNRESOLVED_ROUTINE",
"PARSE_SYNTAX_ERROR",
"AnalysisException",

After looking through the code I think adding AnalysisException here will solve the issue since PARSE_SYNTAX_ERROR works as expected.
AnalysisException covers all these error conditions.


I just need to test it somehow. I'll try to package jupysql and install it in a Spark environment.


you can install like this:

pip install git+https://github.com/b1ackout/jupysql@running-sql-using-sparkconnect-should-not-print-full-stack-trace

@b1ackout b1ackout May 27, 2024


I tested it but no luck; the error message doesn't contain AnalysisException, only the SQL error conditions listed here, which are covered by AnalysisException. So instead of including all of these in the list (which would also need to be updated regularly), I think checking whether the error is an instance of AnalysisException would be a lot cleaner.

Tested it and it works.
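A small illustration of why matching on the string "AnalysisException" fails while an isinstance check works (assuming pyspark 3.4+; the error message below is made up):

from pyspark.sql.utils import AnalysisException

err = AnalysisException("[TABLE_OR_VIEW_NOT_FOUND] The table `foo` cannot be found.")
# the exception's message carries the error condition, not the class name
print("AnalysisException" in str(err))     # False
print(isinstance(err, AnalysisException))  # True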

@@ -9,9 +9,9 @@


def handle_spark_dataframe(dataframe, should_cache=False):
"""Execute a ResultSet sqlaproxy using pysark module."""
"""Execute a ResultSet sqlaproxy using pyspark module."""

Fix typo

@@ -556,11 +562,14 @@ def is_non_sqlalchemy_error(error):
"pyodbc.ProgrammingError",
# Clickhouse errors
"DB::Exception:",
# Pyspark

Removed these as they are included in AnalysisException

@b1ackout b1ackout requested a review from edublancas May 27, 2024 09:25
@b1ackout b1ackout marked this pull request as ready for review May 27, 2024 09:26
@b1ackout b1ackout changed the title 1011 Running sql using sparkconnect should not print full stack trace 1011 Disable full stack trace when using spark connect Jul 8, 2024
edublancas commented Jul 12, 2024

seems like you're still working on this PR, please review our contribution guidelines: https://ploomber-contributing.readthedocs.io/en/latest/contributing/responding-pr-review.html

specifically the point about addressing comments. Please post links to the code changes so we can review them one by one. I also saw you marked some discussions as resolved; please unmark them so I can review them

@edublancas

closing due to inactivity

@edublancas edublancas closed this Jul 18, 2024
if not DataFrame and not CDataFrame:
raise exceptions.MissingPackageError("pysark not installed")
raise exceptions.MissingPackageError("pyspark not installed")

Fix typo

Comment on lines +11 to +14
try:
from pyspark.sql.utils import AnalysisException
except ModuleNotFoundError:
AnalysisException = None

This is to handle the case where the pyspark module is not installed

Comment on lines +566 to +572
is_pyspark_analysis_exception = (
isinstance(error, AnalysisException) if AnalysisException else False
)
return (
any(msg in str(error) for msg in specific_db_errors)
or is_pyspark_analysis_exception
)

If AnalysisException was imported successfully, this checks whether the error is an instance of pyspark's AnalysisException and handles it accordingly
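A rough usage sketch of the change (assuming pyspark 3.4+ and this branch of jupysql installed; the import path follows src/sql/util.py from the diff, and the error message is made up):

from pyspark.sql.utils import AnalysisException
from sql.util import is_non_sqlalchemy_error

# Spark analysis errors are now flagged as non-SQLAlchemy errors,
# so the short_errors path can show just the message.
err = AnalysisException("[TABLE_OR_VIEW_NOT_FOUND] The table `foo` cannot be found.")
print(is_non_sqlalchemy_error(err))  # True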

b1ackout commented Sep 2, 2024

edublancas sorry, I was away on paternity leave. I opened a new one: #1024
