Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update REGEXP_MATCH scalar function to support Utf8View #14449

Merged
merged 2 commits into from
Feb 3, 2025

Conversation

Omega359
Copy link
Contributor

@Omega359 Omega359 commented Feb 3, 2025

Which issue does this PR close?

Closes #11911

Rationale for this change

Add support for utf8view to regexp_match function

What changes are included in this PR?

Code, tests, benchmarks.

Are these changes tested?

Yes

Are there any user-facing changes?

signature for regexp_match rust function (not the udf) changed to not include <T: OffsetSizeTrait>

I am unsure if that is an breaking api change or not. I suppose it should be.

@github-actions github-actions bot added sqllogictest SQL Logic Tests (.slt) functions labels Feb 3, 2025
@Omega359 Omega359 marked this pull request as ready for review February 3, 2025 18:02
@alamb alamb added the api change Changes the API exposed to users of the crate label Feb 3, 2025
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great to me -- thank you @Omega359 ❤️

}
}
pub fn regexp_match<T: OffsetSizeTrait>(args: &[ArrayRef]) -> Result<ArrayRef> {
pub fn regexp_match(args: &[ArrayRef]) -> Result<ArrayRef> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree technically speaking this is an API change but I also think it is small and is ok. I will flag this PR as API change but I think it will be minimally disruptive

@@ -783,7 +783,7 @@ EXPLAIN SELECT
FROM test;
----
logical_plan
01)Projection: regexp_match(CAST(test.column1_utf8view AS Utf8), Utf8("^https?://(?:www\.)?([^/]+)/.*$")) AS k
01)Projection: regexp_match(test.column1_utf8view, Utf8View("^https?://(?:www\.)?([^/]+)/.*$")) AS k
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉

@@ -354,6 +377,29 @@ X
X
X

# test string view
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed in #11911 (comment) it would be great to move these tests into string.slt but we can totally do it as a follow on as well

@@ -621,7 +667,7 @@ CREATE TABLE t_stringview AS
SELECT arrow_cast(str, 'Utf8View') as str, arrow_cast(pattern, 'Utf8View') as pattern, arrow_cast(start, 'Int64') as start, arrow_cast(flags, 'Utf8View') as flags FROM t;

query I
SELECT regexp_count(str, '\w') from t;
SELECT regexp_count(str, '\w') from t_stringview;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 this looks like a driveby fix

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, it was. Couldn't resist.

@alamb
Copy link
Contributor

alamb commented Feb 3, 2025

Since I would like to include this with the 45.0.0 release I am going to merge it in now. Thanks again @Omega359 (and @tlm365 and @Weijun-H -- it takes a village!)

@alamb alamb merged commit 932439f into apache:main Feb 3, 2025
25 checks passed
@alamb alamb mentioned this pull request Feb 3, 2025
32 tasks
alamb pushed a commit to alamb/datafusion that referenced this pull request Feb 3, 2025
* Update REGEXP_MATCH scalar function to support Utf8View

* Cargo fmt fix.
alamb added a commit that referenced this pull request Feb 3, 2025
)

* Update REGEXP_MATCH scalar function to support Utf8View

* Cargo fmt fix.

Co-authored-by: Bruce Ritchie <bruce.ritchie@veeva.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api change Changes the API exposed to users of the crate functions sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Update REGEXP_MATCH scalar function to support Utf8View
2 participants