-
Notifications
You must be signed in to change notification settings - Fork 109
Replace one regex by pool of regex matchers. #557
Conversation
sql/expression/comparison.go
Outdated
if err != nil { | ||
return false, err | ||
if re.pool == nil { | ||
re.pool = &sync.Pool{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this won't work when right is a column, for example. Because we will keep the same right
for all cases. This just works for the case when right
is constant.
Imagine we have: foo REGEXP bar
First eval:
- No pool, create a pool where bar is
1
Second eval:
- There's a pool, get a matcher whose regexp will be
1
, but bar is2
now. The result won't be correct.
There's an example in LIKE, I think, showing how to check if expr is constant to use a different path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, but the question is - if it had worked for us?
Because I didn't change any logic. Before we had a just one matcher, now we have a synced pool, so the same bad things may happen. Generally, once you create an Regexp
instance, right expression cannot mutate (because we keep pre-compiled matcher).
I think we should handle this on higher level, because compiling regex every time may have a really bad impact on performance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Behaviour might have been wrong before, but this still needs to be addressed.
How would we handle this on a higher level?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, so maybe for comparison
and for like
we'll take following approach:
var canCacheRegex = true
Inspect(right, func(e sql.Expression) bool {
if _, ok := e.(*GetField); ok {
canCacheRegex = false
}
return true
})
So if you can't cache it I'll create a new matcher every time (like it's done in like), but because it will impact performance, so for cache-like expressions we'll go with a pool.
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So you mean that we can have regexp patterns in a column? All the uses and tests were done with constant regexp patterns:
select
files.file_path,
ref_commits.repository_id,
files.blob_content
FROM
files
NATURAL JOIN
commit_files
NATURAL JOIN
ref_commits
WHERE
ref_commits.ref_name = 'HEAD'
AND ref_commits.history_index BETWEEN 0 AND 5
AND is_binary(blob_content) = false
AND files.file_path NOT REGEXP '^vendor.*'
AND (
blob_content REGEXP '(?i)facebook.*[\'\\"][0-9a-f]{32}[\'\\"]'
OR blob_content REGEXP '(?i)twitter.*[\'\\"][0-9a-zA-Z]{35,44}[\'\\"]'
OR blob_content REGEXP '(?i)github.*[\'\\"][0-9a-zA-Z]{35,40}[\'\\"]'
OR blob_content REGEXP 'AKIA[0-9A-Z]{16}'
OR blob_content REGEXP '(?i)reddit.*[\'\\"][0-9a-zA-Z]{14}[\'\\"]'
OR blob_content REGEXP '(?i)heroku.*[0-9A-F]{8}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{12}'
OR blob_content REGEXP '.*-----BEGIN PRIVATE KEY-----.*'
OR blob_content REGEXP '.*-----BEGIN RSA PRIVATE KEY-----.*'
OR blob_content REGEXP '.*-----BEGIN DSA PRIVATE KEY-----.*'
OR blob_content REGEXP '.*-----BEGIN OPENSSH PRIVATE KEY-----.*'
);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can, per example, have as regexp input a column. That means the regular expression could be different every time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe we should add a rule to check this cases; If the regular expression input is a Literal, then use always the same matcher. If not, use a different matcher per Eval.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@erizocosmico, @ajnavarro, @jfontan - how about this - PTAL.
For non-cached, every time create a new matcher. For cached, check the pool of matchers
Signed-off-by: kuba-- <kuba@sourced.tech>
Signed-off-by: kuba-- kuba@sourced.tech
First approach to close src-d/gitbase#544