@isimluk isimluk commented Aug 4, 2025

Rationale for this change

Users have asked for regexp_extract in DataFusion.

What changes are included in this PR?

This PR is fairly isolated: it brings Spark's regexp_extract
(https://spark.apache.org/docs/latest/api/sql/#regexp_extract) to datafusion-spark.

Are these changes tested?

Yes

Are there any user-facing changes?

Yes, Spark's regexp_extract() is now available in datafusion-spark.

@github-actions github-actions bot added the spark label Aug 4, 2025
Contributor

@comphead comphead left a comment

Thanks @isimluk for your PR. I would propose implementing this function in DataFusion first and then reusing it in Spark. Would that work?

Comment on lines +40 to +42
/// REGEXP_EXTRACT function extracts the first string in the str that match
/// the regexp expression and corresponding to the regex group index.
/// <https://spark.apache.org/docs/latest/api/sql/#regexp_extract>
Member

It may be worth adding documentation explaining that Java's regex engine and Rust's regex engine differ, so users should not expect 100% compatibility.

@Omega359
Contributor

Note that there was a previous PR for regexp_extract (#14282). I don't know how it compares to this PR, but I wanted to make it known.

@alamb alamb removed the spark label Aug 12, 2025