Skip to content

Conversation

@MaxGekk
Copy link
Member

@MaxGekk MaxGekk commented Aug 23, 2023

What changes were proposed in this pull request?

In the PR, I propose to re-use the Hex, Base64 and Decode expressions in the ToCharacter (the to_char/to_varchar functions) when the format parameter is one of hex, base64 and utf-8.

Why are the changes needed?

To make the migration to Spark SQL easier from the systems like:

Does this PR introduce any user-facing change?

No. This PR extends existing API. It might be considered as an user-facing change only if user's code depends on errors in the case of wrong formats.

How was this patch tested?

By running new examples:

$ build/sbt "sql/test:testOnly org.apache.spark.sql.expressions.ExpressionInfoSuite"

and new tests:

$ build/sbt "test:testOnly *.StringFunctionsSuite"

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions github-actions bot added the SQL label Aug 23, 2023
@github-actions github-actions bot added the DOCS label Aug 26, 2023
@MaxGekk MaxGekk changed the title [WIP][SQL] Convert binary to string by to_char for the formats: hex, base64, utf-8 [WIP][SPARK-44983][SQL] Convert binary to string by to_char for the formats: hex, base64, utf-8 Aug 28, 2023
@MaxGekk MaxGekk changed the title [WIP][SPARK-44983][SQL] Convert binary to string by to_char for the formats: hex, base64, utf-8 [SPARK-44983][SQL] Convert binary to string by to_char for the formats: hex, base64, utf-8 Aug 28, 2023
@MaxGekk MaxGekk marked this pull request as ready for review August 28, 2023 07:51
exception = intercept[AnalysisException] {
df2.select(func(col("input"), col("format"))).collect()
},
errorClass = "_LEGACY_ERROR_TEMP_1100",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: Open an JIRA to assign proper name for the error class.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is the PR #42737

val numArgs = expressions.length
if (expressions.length == 2) {
val inputExpr = expressions.head
val (inputExpr, format) = (expressions(0), expressions(1))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a bit weird, we can just write 2 lines

val inputExpr = expressions(0)
val format = expressions(1)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't think it is the right approach to split semantically one operation: "assign names".

}

def binaryFormatError(funcName: String, invalidFormat: String): Throwable = {
new AnalysisException(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My feeling is that, if the error is only thrown in one place, we don't need to add a method here, just throw new AnalysisException... in the caller side.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of purposes of gathering exceptions to Query*Errors to don't depend on forming exceptions: errors classes, quoting, context and so on. The caller has to provide only valuable info and don't worry about any technical stuff.

Don't think that this PR is right place to begin don't do that.

@MaxGekk
Copy link
Member Author

MaxGekk commented Aug 28, 2023

All GAs passed. Merging to master.
Thank you, @HyukjinKwon @cloud-fan for review.

@MaxGekk MaxGekk closed this in 4946d02 Aug 28, 2023
MaxGekk added a commit that referenced this pull request Sep 6, 2023
…`to_char`/`to_varchar`

### What changes were proposed in this pull request?
In the PR, I propose to document the recent changes related to the `format` of the `to_char`/`to_varchar` functions:
1. binary formats added by #42632
2. datetime formats introduced by #42534

### Why are the changes needed?
To inform users about recent changes.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
By CI.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #42801 from MaxGekk/doc-to_char-api.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants