Closed
Description
Having the following query:
explain format=tree SELECT
files.`repository_id`
FROM files
INNER JOIN
commit_files ON
(((files.`repository_id` = commit_files.`repository_id`)
AND (files.`file_path` = commit_files.`file_path`))
AND (files.`blob_hash` = commit_files.`blob_hash`))
AND (files.`tree_hash` = commit_files.`tree_hash`)
INNER JOIN ref_commits ON
(commit_files.`repository_id` = ref_commits.`repository_id`)
AND (commit_files.`commit_hash` = ref_commits.`commit_hash`)
WHERE
files.`file_path` IS NOT NULL
AND ref_commits.`ref_name` IS NOT NULL
AND ref_commits.`history_index` IS NOT NULL
AND ref_commits.`ref_name` REGEXP '^refs/heads/HEAD/'
AND ref_commits.`history_index` = 0
AND (NOT (files.`file_path` REGEXP '^vendor.*'))
AND ((((files.`blob_content` REGEXP '(?i)facebook.*[\'\\"][0-9a-f]{32}[\'\\"]')
OR (files.`blob_content` REGEXP '(?i)twitter.*[\'\\"][0-9a-zA-Z]{35,44}[\'\\"]'))
OR (files.`blob_content` REGEXP '(?i)github.*[\'\\"][0-9a-zA-Z]{35,40}[\'\\"]'))
OR ((files.`blob_content` REGEXP 'AKIA[0-9A-Z]{16}')
OR (files.`blob_content` REGEXP '(?i)reddit.*[\'\\"][0-9a-zA-Z]{14}[\'\\"]')))
OR ((((files.`blob_content` REGEXP '(?i)heroku.*[0-9A-F]{8}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{12}')
OR (files.`blob_content` REGEXP '.*-----BEGIN PRIVATE KEY-----.*'))
OR (files.`blob_content` REGEXP '.*-----BEGIN RSA PRIVATE KEY-----.*'))
OR ((files.`blob_content` REGEXP '.*-----BEGIN DSA PRIVATE KEY-----.*')
OR (files.`blob_content` REGEXP '.*-----BEGIN OPENSSH PRIVATE KEY-----.*')));
The output plan is the following one:
plan: Exchange(parallelism=8)
plan: └─ Project(files.repository_id)
plan: └─ Filter(
NOT(files.file_path IS NULL)
AND NOT(ref_commits.ref_name IS NULL)
AND NOT(ref_commits.history_index IS NULL)
AND ref_commits.ref_name REGEXP "^refs/heads/HEAD/"
AND ref_commits.history_index = 0
AND NOT(files.file_path REGEXP "^vendor.*")
AND files.blob_content REGEXP "(?i)facebook.*['\\\"][0-9a-f]{32}['\\\"]"
OR files.blob_content REGEXP "(?i)twitter.*['\\\"][0-9a-zA-Z]{35,44}['\\\"]"
OR files.blob_content REGEXP "(?i)github.*['\\\"][0-9a-zA-Z]{35,40}['\\\"]"
OR files.blob_content REGEXP "AKIA[0-9A-Z]{16}"
OR files.blob_content REGEXP "(?i)reddit.*['\\\"][0-9a-zA-Z]{14}['\\\"]"
OR files.blob_content REGEXP "(?i)heroku.*[0-9A-F]{8}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{12}"
OR files.blob_content REGEXP ".*-----BEGIN PRIVATE KEY-----.*"
OR files.blob_content REGEXP ".*-----BEGIN RSA PRIVATE KEY-----.*"
OR files.blob_content REGEXP ".*-----BEGIN DSA PRIVATE KEY-----.*"
OR files.blob_content REGEXP ".*-----BEGIN OPENSSH PRIVATE KEY-----.*")
plan: └─ Filter(commit_files.repository_id = ref_commits.repository_id AND commit_files.commit_hash = ref_commits.commit_hash)
plan: └─ SquashedTable(ref_commits, commit_files, files)
plan: ├─ Columns
plan: │ ├─ Column(repository_id, TEXT, nullable=false)
plan: │ ├─ Column(file_path, TEXT, nullable=false)
plan: │ ├─ Column(blob_hash, TEXT, nullable=false)
plan: │ ├─ Column(tree_hash, TEXT, nullable=false)
plan: │ ├─ Column(tree_entry_mode, TEXT, nullable=false)
plan: │ ├─ Column(blob_content, BLOB, nullable=false)
plan: │ ├─ Column(blob_size, INT64, nullable=false)
plan: │ ├─ Column(repository_id, TEXT, nullable=false)
plan: │ ├─ Column(commit_hash, TEXT, nullable=false)
plan: │ ├─ Column(file_path, TEXT, nullable=false)
plan: │ ├─ Column(blob_hash, TEXT, nullable=false)
plan: │ ├─ Column(tree_hash, TEXT, nullable=false)
plan: │ ├─ Column(repository_id, TEXT, nullable=false)
plan: │ ├─ Column(commit_hash, TEXT, nullable=false)
plan: │ ├─ Column(ref_name, TEXT, nullable=false)
plan: │ └─ Column(history_index, INT64, nullable=false)
plan: └─ Filters
plan: ├─ files.repository_id = commit_files.repository_id
plan: ├─ files.file_path = commit_files.file_path
plan: ├─ files.blob_hash = commit_files.blob_hash
plan: └─ files.tree_hash = commit_files.tree_hash
At least, AND ref_commits.history_index = 0
and AND ref_commits.ref_name REGEXP "^refs/heads/HEAD/"
should be pushed down.