Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] REGEXP_REPLACE as LikeExpression #2041

Closed
ssteinhauser opened this issue Jul 17, 2024 · 3 comments
Closed

[BUG] REGEXP_REPLACE as LikeExpression #2041

ssteinhauser opened this issue Jul 17, 2024 · 3 comments
Assignees

Comments

@ssteinhauser
Copy link
Contributor

I am about to implement the Exasol syntax. For this I'd like to implement REGEXP_LIKE as LikeExpression (see https://docs.exasol.com/db/latest/sql_references/predicates/not_regexp_like.htm). Since it is also allowed to be a function in Oracle (as already implemented), it needs to be added to the whitelisted keywords.
Unfortunately, the following condition prevents it from being added to RelObjectNameWithoutValue since REGEXP_LIKE contains an underscore which is not allwed here:
tokenValue.matches("[A-Za-z]+")

if (CHARSET_ENCODER.canEncode(tokenValue) && tokenValue.matches("[A-Za-z]+")) {

What was the reason for adding this condition and how can or should I proceed to support REGEXP_LIKE as LikeExpression?

@manticore-projects
Copy link
Contributor

Greetings!

Thank you for your interest and contribution.
In genera this syntax is supported already and the following works:

SELECT 'My mail address is my_mail@exasol.com'
       RLIKE '(?i).*[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,4}.*'
       AS contains_email;

Now we only need to register REGEXP_LIKE as an Like Operator (like we did for RLIKE yet).
So my advise is: follow the RLIKE implementation and also take reference to the EXTRACT production (which also can be a normal function as well as a special function). Something like this will work for the REGEXP_LIKE operator too.

I will be able to help you on this by next week only since I am somewhere between Jakarta and Lagos right now. Cheers.

@ssteinhauser
Copy link
Contributor Author

ssteinhauser commented Jul 19, 2024

This was also my first intension to implement it similar to RLIKE, so I've added a new token <K_REGEXP_LIKE: "REGEXP_LIKE"> and added it as an option to LikeExpression as well as to the Java implementation.

But the problem is, that the updateKeywords task won't whitelist the REGEXP_LIKE token (unlike EXTRACT), because it contains an Underscore. The condition tokenValue.matches("[A-Za-z]+") in the following line prevents it from being whitelisted for function names:

if (CHARSET_ENCODER.canEncode(tokenValue) && tokenValue.matches("[A-Za-z]+")) {

If there is no specific reason for the restriction to alphabetical characters only, I would suggest to match this regex: \w+

@ssteinhauser
Copy link
Contributor Author

I've included an implementation of my suggestion in the PR #2044

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants