Skip to content
This repository has been archived by the owner on May 14, 2020. It is now read-only.

Add word boundaries around values in SQL tautologies (942130) #1710

Merged
merged 2 commits into from
May 5, 2020

Conversation

allanrbo
Copy link
Contributor

@allanrbo allanrbo commented Mar 4, 2020

The intention of this rule appears to be to find situations such as 1=1, 123=123, 1!=2, 123!=321, 'hello' NOT LIKE 'world'. SQL expressions that will always evaluate to true - aka. tautologies.

However, I believe the rule had a flaw. For example it would match 11=1, 1=11, and fail to match 1!=11. I believe the reason is because the backreference \1 was given too much flexibility on what it could match. So for example given 11=1, when the regex engine arrives at the backreference, it seems to have the freedom to choose just any permutation of the referred capture group, so instead of choosing the whole 11, it can simply just choose 1. I think maybe the possessive quantifier ++ was an attempt to solve this problem, but it doesn't work. I believe a solution is lock down this freedom by explicitly forcing word boundaries around the capture group ([\d\w]+), so it becomes \b([\d\w]+)\b. Likewise around the \1 backreference.

The existing test case "1" sSOUNDS LIKE "SOUNDS LIKE 1 it appears to me just kind of passed by chance, because of the above described bug. It would match so \1 became SOUNDS, and then refer back to sSOUNDS but just choose the permutation of ignoring the first lower case s. Experiment here: https://regex101.com/r/hyI0Iv/1 .

This fix also has the side effect of solving the perf issue Airween brought up on the Slack channel a few days ago.

@allanrbo
Copy link
Contributor Author

allanrbo commented Mar 5, 2020

wow... just by chance realized that there actually used to be some \b word boundaries in the old CRS v2.2. Somehow it got removed during the switch to v3.0. See here. That one was only on one side of the values though (like \b([\d\w]++) rather than \b([\d\w]++)\b. Maybe it's enough only having it on one side - I am unsure. But it seems safer to have on both I think.

@dune73
Copy link
Contributor

dune73 commented Mar 5, 2020

Wow. Well spotted.

There was a manual consolidation process by Ryan Barnett before we took over the project. It probably got dropped during that period.

@allanrbo
Copy link
Contributor Author

allanrbo commented Mar 6, 2020

Looked a bit more in the git history and realized that it wasn't removed from 2.2 to 3.0, but rather it was fixed in 2.2 after the 3.0 branching, and never ported to 3.x: 6a87ace

@dune73
Copy link
Contributor

dune73 commented Mar 6, 2020

A most welcome PR then. :)

@dune73
Copy link
Contributor

dune73 commented Mar 6, 2020

Also, this was very early in the 3.0 development. This makes me wonder, what else Ryan / we missed there. Ryan quit the project in Autumn 2015.

@franbuehler franbuehler self-requested a review April 6, 2020 19:14
@franbuehler franbuehler self-assigned this Apr 6, 2020
@franbuehler
Copy link
Contributor

@franbuehler will review this PR.

Copy link
Contributor

@franbuehler franbuehler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very well spotted, thanks for this PR @allanrbo !
I reviewed and tested this PR and it works as intended.
This PR is ready to be merged.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants