-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update REGEXP_MATCH scalar function to support Utf8View #14449
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great to me -- thank you @Omega359 ❤️
} | ||
} | ||
pub fn regexp_match<T: OffsetSizeTrait>(args: &[ArrayRef]) -> Result<ArrayRef> { | ||
pub fn regexp_match(args: &[ArrayRef]) -> Result<ArrayRef> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree technically speaking this is an API change but I also think it is small and is ok. I will flag this PR as API change but I think it will be minimally disruptive
@@ -783,7 +783,7 @@ EXPLAIN SELECT | |||
FROM test; | |||
---- | |||
logical_plan | |||
01)Projection: regexp_match(CAST(test.column1_utf8view AS Utf8), Utf8("^https?://(?:www\.)?([^/]+)/.*$")) AS k | |||
01)Projection: regexp_match(test.column1_utf8view, Utf8View("^https?://(?:www\.)?([^/]+)/.*$")) AS k |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎉
@@ -354,6 +377,29 @@ X | |||
X | |||
X | |||
|
|||
# test string view |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed in #11911 (comment) it would be great to move these tests into string.slt but we can totally do it as a follow on as well
@@ -621,7 +667,7 @@ CREATE TABLE t_stringview AS | |||
SELECT arrow_cast(str, 'Utf8View') as str, arrow_cast(pattern, 'Utf8View') as pattern, arrow_cast(start, 'Int64') as start, arrow_cast(flags, 'Utf8View') as flags FROM t; | |||
|
|||
query I | |||
SELECT regexp_count(str, '\w') from t; | |||
SELECT regexp_count(str, '\w') from t_stringview; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 this looks like a driveby fix
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, it was. Couldn't resist.
* Update REGEXP_MATCH scalar function to support Utf8View * Cargo fmt fix.
Which issue does this PR close?
Closes #11911
Rationale for this change
Add support for utf8view to regexp_match function
What changes are included in this PR?
Code, tests, benchmarks.
Are these changes tested?
Yes
Are there any user-facing changes?
signature for regexp_match rust function (not the udf) changed to not include <T: OffsetSizeTrait>
I am unsure if that is an breaking api change or not. I suppose it should be.