-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding new function decode_base64_utf8 and expr macro #14943
Conversation
meanwhile review, I am adding more tests. |
|
||
import javax.annotation.Nullable; | ||
|
||
public class DecodeBase64UTFOperatorConversion implements SqlOperatorConversion |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can just use DirectOperatorConversion
which will save you from having to implement toDruidExpression
if (toDecode.value() == null) { | ||
return ExprEval.of(null); | ||
} | ||
return new StringExpr(new String(StringUtils.decodeBase64String(toDecode.asString()), StandardCharsets.UTF_8)).eval(bindings); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could also use StringUtils.fromUtf8(StringUtils.decodeBase64String(...))
@@ -144,4 +145,47 @@ public Object getLiteralValue() | |||
} | |||
} | |||
} | |||
|
|||
public static class StringDecodeBase64UTFExprMacro implements ExprMacroTable.ExprMacro |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be useful to implement getOutputType
, isLiteral
, isNullLiteral
, and getLiteralValue
similar to ComplexDecodeBase64Expression
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
General Q, can this be implemented as an extension of UnivariateFunction
? But then this is an expression and not exactly a function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BaseScalarUnivariateMacroFunctionExpr
is the macro version of right thing to extend here since this is a single arugment function, +1 for re-using it since it would simplify things a bit here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
@@ -44,7 +44,8 @@ | |||
public class ExprMacroTable | |||
{ | |||
private static final List<ExprMacro> BUILT_IN = ImmutableList.of( | |||
new BuiltInExprMacros.ComplexDecodeBase64ExprMacro() | |||
new BuiltInExprMacros.ComplexDecodeBase64ExprMacro(), | |||
new BuiltInExprMacros.StringDecodeBase64UTFExprMacro() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it doesn't necessarily have to be in this PR, but it would be nice to add an alias for ComplexDecodeBase64ExprMacro
(and also the SQL layer), perhaps just by extending the class and overriding the usage of NAME
so it has consistent naming with this new function, e.g. decode_base64_complex
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this can be done registering an AliasedOperatorConversion similar to how SUBSTR or TRUNC is done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that would fix the SQL layer, but this is native expression macros, which do not currently have a generic aliasing thing here.
@pranavbhole what are your plans for this aliasing? (I just want to make sure this actually gets done whether or not in this PR, and not lost)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will add aliasing in the followup PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'll be good to add
- a test in CalciteQueryTest
- Update website/.spelling with decode_base64_utf8
assertExpr("decode_base64_utf8('aGVsbG8=')", "hello"); | ||
assertExpr("decode_base64_utf8('V2hlbiBhbiBvbmlvbiBpcyBjdXQsIGNlcnRhaW4gKGxhY2hyeW1hdG9yKSBjb21wb3VuZHMgYXJlIHJlbGVhc2VkIGNhdXNpbmcgdGhlIG5lcnZlcyBhcm91bmQgdGhlIGV5ZXMgKGxhY3JpbWFsIGdsYW5kcykgdG8gYmVjb21lIGlycml0YXRlZC4=')", "When an onion is cut, certain (lachrymator) compounds are released causing the nerves around the eyes (lacrimal glands) to become irritated."); | ||
assertExpr("decode_base64_utf8('eyJ0ZXN0IjogMX0=')", "{\"test\": 1}"); | ||
assertExpr("decode_base64_utf8('')", ""); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit. this can be null in SQL compatible mode
601a87e
to
491e4b5
Compare
addressed comments. |
040c898
to
ad20520
Compare
LGTM. One question, is there a way say I have a column of encoded strings. Atm I understand the function works on string values. Not this PR but can we later try to make it work on a column too ? Will that be useful ? Something like
So might want to think about a better error message. But need not be done in context of this PR |
It works on the column as well, provided you have encoded base64 string value stored in column. |
ad20520
to
ed64879
Compare
c592237
to
304d973
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, the aliasing will be done in a followup PR
Description
Adding a new function called decode_base64_utf8 that takes base64 encoded string, decode it and gives back the utf8 encoded string.
This PR has: