[SPARK-44983][SQL] Convert binary to string by to_char for the formats: hex, base64, utf-8
#42632
exception = intercept[AnalysisException] {
  df2.select(func(col("input"), col("format"))).collect()
},
errorClass = "_LEGACY_ERROR_TEMP_1100",
TODO: Open a JIRA to assign a proper name to the error class.
Here is the PR #42737
val numArgs = expressions.length
if (expressions.length == 2) {
  val inputExpr = expressions.head
  val (inputExpr, format) = (expressions(0), expressions(1))
This is a bit weird; we can just write two lines:
val inputExpr = expressions(0)
val format = expressions(1)
I don't think it is the right approach to split what is semantically one operation: "assign names".
}

def binaryFormatError(funcName: String, invalidFormat: String): Throwable = {
  new AnalysisException(
My feeling is that, if the error is only thrown in one place, we don't need to add a method here; just throw new AnalysisException... on the caller side.
One of the purposes of gathering exceptions in Query*Errors is that callers do not depend on how exceptions are formed: error classes, quoting, context and so on. The caller only has to provide the valuable info and does not need to worry about the technical details.
I don't think this PR is the right place to start deviating from that.
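To illustrate the point in the reply above, here is a minimal, self-contained sketch of a centralized error factory. It does not use Spark's real classes: `SketchAnalysisException`, `SketchCompilationErrors`, `toSQLId` and the error class name `INVALID_BINARY_FORMAT_SKETCH` are all hypothetical stand-ins invented for this example. Only the method name and signature of `binaryFormatError` come from the diff.

```scala
// Hypothetical stand-in for an analysis exception; NOT Spark's AnalysisException.
final case class SketchAnalysisException(
    errorClass: String,
    messageParameters: Map[String, String])
  extends Exception(
    s"[$errorClass] " + messageParameters.map { case (k, v) => s"$k=$v" }.mkString(", "))

// Hypothetical stand-in for a Query*Errors object: all formatting lives here.
object SketchCompilationErrors {
  // Assumed quoting rule for identifiers, for illustration only.
  private def toSQLId(name: String): String = "`" + name + "`"

  // The caller passes raw values; error class and quoting are decided in one place.
  def binaryFormatError(funcName: String, invalidFormat: String): Throwable =
    SketchAnalysisException(
      errorClass = "INVALID_BINARY_FORMAT_SKETCH", // placeholder name
      messageParameters = Map(
        "funcName" -> toSQLId(funcName),
        "invalidFormat" -> invalidFormat))
}
```

A caller would then write only `throw SketchCompilationErrors.binaryFormatError("to_char", "invalid_format")`, so a later rename of the error class or a change in quoting touches a single method.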
All GAs passed. Merging to master.
…`to_char`/`to_varchar`
### What changes were proposed in this pull request?
In the PR, I propose to document the recent changes related to the `format` of the `to_char`/`to_varchar` functions:
1. binary formats added by #42632
2. datetime formats introduced by #42534
### Why are the changes needed?
To inform users about recent changes.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
By CI.
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #42801 from MaxGekk/doc-to_char-api.
Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
What changes were proposed in this pull request?
In the PR, I propose to re-use the `Hex`, `Base64` and `Decode` expressions in `ToCharacter` (the `to_char`/`to_varchar` functions) when the `format` parameter is one of `hex`, `base64` and `utf-8`.
Why are the changes needed?
To make the migration to Spark SQL easier from the systems like:
Does this PR introduce any user-facing change?
No. This PR extends the existing API. It might be considered a user-facing change only if the user's code depends on errors raised for wrong formats.
How was this patch tested?
By running new examples:
and new tests:
Was this patch authored or co-authored using generative AI tooling?
No.
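As a rough illustration of what the three binary formats in the description mean, the following plain-JVM sketch converts a byte array to a string for each of them. It is not Spark's implementation and does not call the `Hex`, `Base64` or `Decode` expressions. The object `BinaryToStringSketch` and the helper `binaryToString` are hypothetical names, and the uppercase hex output and the exact-match handling of the format string are assumptions of this sketch.

```scala
import java.nio.charset.StandardCharsets
import java.util.Base64

object BinaryToStringSketch {
  // Hypothetical helper (not a Spark API): renders bytes according to `format`.
  def binaryToString(bytes: Array[Byte], format: String): String = format match {
    // Two uppercase hex digits per byte (the case is an assumption of this sketch).
    case "hex" => bytes.map(b => f"${b & 0xff}%02X").mkString
    // Standard Base64 alphabet with padding.
    case "base64" => Base64.getEncoder.encodeToString(bytes)
    // Interpret the bytes as UTF-8 encoded text.
    case "utf-8" => new String(bytes, StandardCharsets.UTF_8)
    // Anything else is rejected, mirroring the "wrong format" error case.
    case other => throw new IllegalArgumentException(s"Unsupported binary format: $other")
  }

  def main(args: Array[String]): Unit = {
    val input = "Spark SQL".getBytes(StandardCharsets.UTF_8)
    println(binaryToString(input, "hex"))    // 537061726B2053514C
    println(binaryToString(input, "base64")) // U3BhcmsgU1FM
    println(binaryToString(input, "utf-8"))  // Spark SQL
  }
}
```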