This repository has been archived by the owner on May 14, 2020. It is now read-only.
Add word boundaries around values in SQL tautologies (942130) #1710
The intention of this rule appears to be to find expressions such as `1=1`, `123=123`, `1!=2`, `123!=321` and `'hello' NOT LIKE 'world'`: SQL expressions that always evaluate to true, a.k.a. tautologies.

However, I believe the rule had a flaw. For example, it would match `11=1` and `1=11`, and fail to match `1!=11`. I believe the reason is that the capture group and its backreference `\1` were given too much flexibility in what they could match: neither was required to line up with a whole value. Given `11=1`, the match can start at the second digit, so the capture group holds just `1` instead of the whole `11`, and `\1` then matches the `1` on the right-hand side. Given `1=11`, `\1` matches only the first digit of `11`. Likewise, `1!=11` is presumably missed because `11` begins with `1`, so the right-hand side is not recognized as different. I think the possessive quantifier `++` may have been an attempt to solve this problem, but it doesn't work: it only stops the engine from giving back characters once the group has matched, not from starting the match partway through a value. I believe a solution is to lock down this freedom by explicitly forcing word boundaries around the capture group `([\d\w]+)`, so it becomes `\b([\d\w]+)\b`, and likewise around the `\1` backreference.

The existing test case
`"1" sSOUNDS LIKE "SOUNDS LIKE 1`
appears to me to have passed only by chance, because of the bug described above. It matched because the capture group could start at the second character of `sSOUNDS`, skipping the leading lowercase `s`, so `\1` became `SOUNDS` and then matched the second `SOUNDS` in the string. Experiment here: https://regex101.com/r/hyI0Iv/1.

This fix also has the side effect of solving the performance issue Airween brought up on the Slack channel a few days ago.
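The effect of the word boundaries described above can be reproduced with a small Python sketch. It is a simplified stand-in, not the actual 942130 regex: the names `OLD`, `NEW` and `SEP` are hypothetical, only the `=`, `!=`, `LIKE` and `NOT LIKE` operators are handled, and the possessive `++` is replaced by a plain `+` (Python's `re` only accepts possessive quantifiers from 3.11 onward).

```python
import re

# Hypothetical separator: whitespace and quotes allowed around the operator.
SEP = r"""[\s'"]*"""

# Simplified, hypothetical version of the unanchored pattern: nothing forces
# the capture group or \1 to cover a whole value.
OLD = re.compile(
    r"([\d\w]+)" + SEP
    + r"(?:(?:!=|not\s+like)" + SEP + r"(?!\1)[\d\w]+"  # inequality: right side must differ
    + r"|(?:=|like)" + SEP + r"\1)",                    # equality: right side repeats the value
    re.IGNORECASE,
)

# The same pattern with \b around the capture group and around \1.
NEW = re.compile(
    r"\b([\d\w]+)\b" + SEP
    + r"(?:(?:!=|not\s+like)" + SEP + r"(?!\b\1\b)[\d\w]+"
    + r"|(?:=|like)" + SEP + r"\b\1\b)",
    re.IGNORECASE,
)

CASES = ["1=1", "123=123", "1!=2", "123!=321", "'hello' NOT LIKE 'world'",
         "11=1", "1=11", "1!=11"]

for case in CASES:
    # Print whether each (hypothetical) pattern finds a tautology in the input.
    print(f"{case:28} old={bool(OLD.search(case))} new={bool(NEW.search(case))}")
```

Run against the examples from the description, the unanchored pattern flags `11=1` and `1=11` and misses `1!=11`, while the bounded pattern does the reverse; both still match the genuine tautologies.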