ARROW-7205 : [C++][Gandiva] Implement regexp_like in Gandiva #8844
@pravindra @Praveen2112 @projjal Got some time for a review? 😁
Hi @wjones127. Creating a base holder class and two derived classes for just these two functions seems like overkill to me (I think that's why I closed the original PR temporarily). The only difference in logic between the two functions is that for "like" we convert the SQL pattern to a regex pattern before storing the compiled pattern. Would it make sense to pass a flag to the holder that says whether this is a SQL pattern or a regex pattern?
Projjal, that's a good point. The SQL like function is basically a wrapper around I will consolidate them into a single |
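The flag-based design suggested above can be sketched roughly as follows. This is an illustration only: PatternKind, PatternHolder, and LikeToRegex are hypothetical names rather than the Gandiva API, and std::regex stands in for RE2 so the snippet has no external dependencies.

```cpp
#include <regex>
#include <string>

// Hypothetical flag telling the holder how to interpret the pattern string.
enum class PatternKind { kSqlLike, kRegex };

// Minimal LIKE-to-regex translation (assumed behavior, not Gandiva's code):
// '%' becomes ".*", '_' becomes ".", and every other char is matched literally.
std::string LikeToRegex(const std::string& like) {
  static const std::string kSpecials = R"re(.^$|()[]{}*+?\)re";
  std::string out = "^";  // anchor so search behaves like a full match
  for (char c : like) {
    if (c == '%') {
      out += ".*";
    } else if (c == '_') {
      out += '.';
    } else {
      if (kSpecials.find(c) != std::string::npos) out += '\\';
      out += c;
    }
  }
  out += '$';
  return out;
}

// A single holder for both functions; only the construction step differs.
// The constructor throws std::regex_error if the pattern does not compile.
class PatternHolder {
 public:
  PatternHolder(const std::string& pattern, PatternKind kind)
      : regex_(kind == PatternKind::kSqlLike ? LikeToRegex(pattern) : pattern) {}

  // Search (partial-match) semantics for both kinds of pattern.
  bool Matches(const std::string& value) const {
    return std::regex_search(value, regex_);
  }

 private:
  std::regex regex_;
};
```

With this shape, PatternHolder("ab%", PatternKind::kSqlLike) accepts "abc" but not "xabc", while PatternHolder("b+", PatternKind::kRegex) accepts any value containing one or more consecutive "b" characters.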
cpp/src/gandiva/like_holder.cc
std::shared_ptr<LikeHolder>* holder) {
std::string pcre_pattern;
ARROW_RETURN_NOT_OK(RegexUtil::SqlLikePatternToPcre(sql_pattern, pcre_pattern));
Result<std::string> RegexpMatchesHolder::GetPattern(const FunctionNode& node) {
FYI: this looks to be the first use of arrow::Result in the Gandiva codebase, but it seems to be what the main Arrow C++ codebase has been using recently. Let me know if you object.
@@ -29,7 +29,7 @@ namespace gandiva {
/// \brief Utility class for converting sql patterns to pcre patterns.
class GANDIVA_EXPORT RegexUtil {
public:
// Convert an sql pattern to a pcre pattern
// Convert an sql pattern to a pcre pattern for use with PartialMatch
Previously, like was using RE2::FullMatch. However, regexp_matches needs to use RE2::PartialMatch, so I've modified this method to produce patterns meant for partial matching. I have added a new set of tests to validate it.
I didn't see any use of this utility method outside of the like implementation, so I thought this change would be okay.
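To illustrate why the switch matters: under partial matching an unanchored pattern succeeds anywhere inside the value, so a translated LIKE pattern needs explicit anchors to keep its whole-string semantics. The sketch below uses std::regex_match and std::regex_search as stand-ins for RE2::FullMatch and RE2::PartialMatch. The helper names are hypothetical, and whether SqlLikePatternToPcre anchors its output in exactly this way is an assumption.

```cpp
#include <regex>
#include <string>

// Stand-in for RE2::FullMatch: the whole text must match the pattern.
bool FullMatch(const std::string& text, const std::string& pattern) {
  return std::regex_match(text, std::regex(pattern));
}

// Stand-in for RE2::PartialMatch: any substring may match the pattern.
bool PartialMatch(const std::string& text, const std::string& pattern) {
  return std::regex_search(text, std::regex(pattern));
}

// Hypothetical helper: wrap a pattern in anchors so that a partial match
// accepts the same strings a full match would.
std::string Anchor(const std::string& pattern) {
  return "^(?:" + pattern + ")$";
}
```

Here FullMatch("xabcx", "abc") is false and PartialMatch("xabcx", "abc") is true, while PartialMatch("xabcx", Anchor("abc")) is false again, which is the behavior LIKE expects.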
cpp/src/gandiva/regex_util.cc
// Escape any char that is special for pcre regex
if (pcre_regex_specials_.find(cur) != pcre_regex_specials_.end()) {
if (pcre_regex_specials_.find(cur) != pcre_regex_specials_.end() && cur != escape_char) {
This change enables using \ as an escape character in LIKE statements. For example, col like '\_%' can match anything starting with an underscore.
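A rough sketch of how an escape character can be honored during LIKE translation is shown below. It is not the PR's code: LikeToRegexWithEscape and LikeMatches are hypothetical names, std::regex stands in for RE2, and treating a trailing escape character as a literal is an assumption made for simplicity.

```cpp
#include <regex>
#include <string>

// Translate a SQL LIKE pattern to an anchored regex, honoring escape_char.
// A char that follows escape_char is always matched literally.
std::string LikeToRegexWithEscape(const std::string& like, char escape_char) {
  static const std::string kSpecials = R"re(.^$|()[]{}*+?\)re";
  // Append c as a literal, backslash-escaping it if it is a regex special.
  auto append_literal = [](std::string& out, char c) {
    if (kSpecials.find(c) != std::string::npos) out += '\\';
    out += c;
  };
  std::string out = "^";
  for (std::size_t i = 0; i < like.size(); ++i) {
    const char c = like[i];
    if (c == escape_char && i + 1 < like.size()) {
      append_literal(out, like[++i]);  // escaped char loses its LIKE meaning
    } else if (c == '%') {
      out += ".*";  // any run of characters
    } else if (c == '_') {
      out += '.';   // exactly one character
    } else {
      append_literal(out, c);  // includes a trailing escape_char (assumption)
    }
  }
  out += '$';
  return out;
}

bool LikeMatches(const std::string& value, const std::string& like,
                 char escape_char) {
  return std::regex_search(value,
                           std::regex(LikeToRegexWithEscape(like, escape_char)));
}
```

With a backslash as the escape character, the pattern \_% translates to ^_.*$, so it accepts "_abc" and rejects "xabc", whereas the unescaped pattern _% accepts both.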
I think we should merge #9700 before this one, since my refactor of
@wjones127 #9700 has merged, are you planning to pick this back up?
Yes, I'll work on it this weekend.
@projjal This is now ready for review. Since we've added |
fixed linting issues
fixed check style issues
Fixed some names
Remove extra whitespace
Add substr short-circuit for regexp_like
Refactor to share logic between like and rlike
Fix style issues
Fix segfault
Fix SqlLikePatternToPcre to use partial matching
Fix style issues and warning
Fix formatting; ran the docker image and used clang-format
I was looking for "rlike" support in Gandiva and found we had nearly implemented it. This PR revives #5860.