Skip to content

Some filters are not correctly pushed down #579

Closed
@ajnavarro

Description

@ajnavarro

Having the following query:

explain format=tree SELECT 
	files.`repository_id` 
FROM files 
INNER JOIN 
	commit_files ON 
	(((files.`repository_id` = commit_files.`repository_id`) 
	AND (files.`file_path` = commit_files.`file_path`)) 
	AND (files.`blob_hash` = commit_files.`blob_hash`)) 
	AND (files.`tree_hash` = commit_files.`tree_hash`) 
INNER JOIN ref_commits ON 
	(commit_files.`repository_id` = ref_commits.`repository_id`) 
	AND (commit_files.`commit_hash` = ref_commits.`commit_hash`) 
WHERE 
	files.`file_path` IS NOT NULL 
	AND ref_commits.`ref_name` IS NOT NULL 
	AND ref_commits.`history_index` IS NOT NULL 
	AND ref_commits.`ref_name` REGEXP '^refs/heads/HEAD/' 
	AND ref_commits.`history_index` = 0 
	AND (NOT (files.`file_path` REGEXP '^vendor.*')) 
	AND ((((files.`blob_content` REGEXP '(?i)facebook.*[\'\\"][0-9a-f]{32}[\'\\"]') 
		OR (files.`blob_content` REGEXP '(?i)twitter.*[\'\\"][0-9a-zA-Z]{35,44}[\'\\"]')) 
		OR (files.`blob_content` REGEXP '(?i)github.*[\'\\"][0-9a-zA-Z]{35,40}[\'\\"]')) 
		OR ((files.`blob_content` REGEXP 'AKIA[0-9A-Z]{16}') 
		OR (files.`blob_content` REGEXP '(?i)reddit.*[\'\\"][0-9a-zA-Z]{14}[\'\\"]'))) 
		OR ((((files.`blob_content` REGEXP '(?i)heroku.*[0-9A-F]{8}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{12}') 
		OR (files.`blob_content` REGEXP '.*-----BEGIN PRIVATE KEY-----.*')) 
		OR (files.`blob_content` REGEXP '.*-----BEGIN RSA PRIVATE KEY-----.*')) 
		OR ((files.`blob_content` REGEXP '.*-----BEGIN DSA PRIVATE KEY-----.*') 
		OR (files.`blob_content` REGEXP '.*-----BEGIN OPENSSH PRIVATE KEY-----.*')));

The output plan is the following one:

plan: Exchange(parallelism=8)
plan:  └─ Project(files.repository_id)
plan:      └─ Filter(
				NOT(files.file_path IS NULL) 
				AND NOT(ref_commits.ref_name IS NULL) 
				AND NOT(ref_commits.history_index IS NULL) 
				AND ref_commits.ref_name REGEXP "^refs/heads/HEAD/" 
				AND ref_commits.history_index = 0 
				AND NOT(files.file_path REGEXP "^vendor.*") 
				AND files.blob_content REGEXP "(?i)facebook.*['\\\"][0-9a-f]{32}['\\\"]" 
					OR files.blob_content REGEXP "(?i)twitter.*['\\\"][0-9a-zA-Z]{35,44}['\\\"]" 
					OR files.blob_content REGEXP "(?i)github.*['\\\"][0-9a-zA-Z]{35,40}['\\\"]" 
					OR files.blob_content REGEXP "AKIA[0-9A-Z]{16}" 
					OR files.blob_content REGEXP "(?i)reddit.*['\\\"][0-9a-zA-Z]{14}['\\\"]" 
					OR files.blob_content REGEXP "(?i)heroku.*[0-9A-F]{8}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{12}" 
					OR files.blob_content REGEXP ".*-----BEGIN PRIVATE KEY-----.*" 
					OR files.blob_content REGEXP ".*-----BEGIN RSA PRIVATE KEY-----.*" 
					OR files.blob_content REGEXP ".*-----BEGIN DSA PRIVATE KEY-----.*" 
					OR files.blob_content REGEXP ".*-----BEGIN OPENSSH PRIVATE KEY-----.*")
plan:          └─ Filter(commit_files.repository_id = ref_commits.repository_id AND commit_files.commit_hash = ref_commits.commit_hash)
plan:              └─ SquashedTable(ref_commits, commit_files, files)
plan:                  ├─ Columns
plan:                  │   ├─ Column(repository_id, TEXT, nullable=false)
plan:                  │   ├─ Column(file_path, TEXT, nullable=false)
plan:                  │   ├─ Column(blob_hash, TEXT, nullable=false)
plan:                  │   ├─ Column(tree_hash, TEXT, nullable=false)
plan:                  │   ├─ Column(tree_entry_mode, TEXT, nullable=false)
plan:                  │   ├─ Column(blob_content, BLOB, nullable=false)
plan:                  │   ├─ Column(blob_size, INT64, nullable=false)
plan:                  │   ├─ Column(repository_id, TEXT, nullable=false)
plan:                  │   ├─ Column(commit_hash, TEXT, nullable=false)
plan:                  │   ├─ Column(file_path, TEXT, nullable=false)
plan:                  │   ├─ Column(blob_hash, TEXT, nullable=false)
plan:                  │   ├─ Column(tree_hash, TEXT, nullable=false)
plan:                  │   ├─ Column(repository_id, TEXT, nullable=false)
plan:                  │   ├─ Column(commit_hash, TEXT, nullable=false)
plan:                  │   ├─ Column(ref_name, TEXT, nullable=false)
plan:                  │   └─ Column(history_index, INT64, nullable=false)
plan:                  └─ Filters
plan:                      ├─ files.repository_id = commit_files.repository_id
plan:                      ├─ files.file_path = commit_files.file_path
plan:                      ├─ files.blob_hash = commit_files.blob_hash
plan:                      └─ files.tree_hash = commit_files.tree_hash

At least, AND ref_commits.history_index = 0 and AND ref_commits.ref_name REGEXP "^refs/heads/HEAD/" should be pushed down.

Metadata

Metadata

Assignees

Labels

bugSomething isn't working

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions