Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some filters are not correctly pushed down #579

Closed
ajnavarro opened this issue Oct 23, 2018 · 0 comments
Closed

Some filters are not correctly pushed down #579

ajnavarro opened this issue Oct 23, 2018 · 0 comments
Assignees
Labels
bug Something isn't working

Comments

@ajnavarro
Copy link
Contributor

Having the following query:

explain format=tree SELECT 
	files.`repository_id` 
FROM files 
INNER JOIN 
	commit_files ON 
	(((files.`repository_id` = commit_files.`repository_id`) 
	AND (files.`file_path` = commit_files.`file_path`)) 
	AND (files.`blob_hash` = commit_files.`blob_hash`)) 
	AND (files.`tree_hash` = commit_files.`tree_hash`) 
INNER JOIN ref_commits ON 
	(commit_files.`repository_id` = ref_commits.`repository_id`) 
	AND (commit_files.`commit_hash` = ref_commits.`commit_hash`) 
WHERE 
	files.`file_path` IS NOT NULL 
	AND ref_commits.`ref_name` IS NOT NULL 
	AND ref_commits.`history_index` IS NOT NULL 
	AND ref_commits.`ref_name` REGEXP '^refs/heads/HEAD/' 
	AND ref_commits.`history_index` = 0 
	AND (NOT (files.`file_path` REGEXP '^vendor.*')) 
	AND ((((files.`blob_content` REGEXP '(?i)facebook.*[\'\\"][0-9a-f]{32}[\'\\"]') 
		OR (files.`blob_content` REGEXP '(?i)twitter.*[\'\\"][0-9a-zA-Z]{35,44}[\'\\"]')) 
		OR (files.`blob_content` REGEXP '(?i)github.*[\'\\"][0-9a-zA-Z]{35,40}[\'\\"]')) 
		OR ((files.`blob_content` REGEXP 'AKIA[0-9A-Z]{16}') 
		OR (files.`blob_content` REGEXP '(?i)reddit.*[\'\\"][0-9a-zA-Z]{14}[\'\\"]'))) 
		OR ((((files.`blob_content` REGEXP '(?i)heroku.*[0-9A-F]{8}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{12}') 
		OR (files.`blob_content` REGEXP '.*-----BEGIN PRIVATE KEY-----.*')) 
		OR (files.`blob_content` REGEXP '.*-----BEGIN RSA PRIVATE KEY-----.*')) 
		OR ((files.`blob_content` REGEXP '.*-----BEGIN DSA PRIVATE KEY-----.*') 
		OR (files.`blob_content` REGEXP '.*-----BEGIN OPENSSH PRIVATE KEY-----.*')));

The output plan is the following one:

plan: Exchange(parallelism=8)
plan:  └─ Project(files.repository_id)
plan:      └─ Filter(
				NOT(files.file_path IS NULL) 
				AND NOT(ref_commits.ref_name IS NULL) 
				AND NOT(ref_commits.history_index IS NULL) 
				AND ref_commits.ref_name REGEXP "^refs/heads/HEAD/" 
				AND ref_commits.history_index = 0 
				AND NOT(files.file_path REGEXP "^vendor.*") 
				AND files.blob_content REGEXP "(?i)facebook.*['\\\"][0-9a-f]{32}['\\\"]" 
					OR files.blob_content REGEXP "(?i)twitter.*['\\\"][0-9a-zA-Z]{35,44}['\\\"]" 
					OR files.blob_content REGEXP "(?i)github.*['\\\"][0-9a-zA-Z]{35,40}['\\\"]" 
					OR files.blob_content REGEXP "AKIA[0-9A-Z]{16}" 
					OR files.blob_content REGEXP "(?i)reddit.*['\\\"][0-9a-zA-Z]{14}['\\\"]" 
					OR files.blob_content REGEXP "(?i)heroku.*[0-9A-F]{8}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{12}" 
					OR files.blob_content REGEXP ".*-----BEGIN PRIVATE KEY-----.*" 
					OR files.blob_content REGEXP ".*-----BEGIN RSA PRIVATE KEY-----.*" 
					OR files.blob_content REGEXP ".*-----BEGIN DSA PRIVATE KEY-----.*" 
					OR files.blob_content REGEXP ".*-----BEGIN OPENSSH PRIVATE KEY-----.*")
plan:          └─ Filter(commit_files.repository_id = ref_commits.repository_id AND commit_files.commit_hash = ref_commits.commit_hash)
plan:              └─ SquashedTable(ref_commits, commit_files, files)
plan:                  ├─ Columns
plan:                  │   ├─ Column(repository_id, TEXT, nullable=false)
plan:                  │   ├─ Column(file_path, TEXT, nullable=false)
plan:                  │   ├─ Column(blob_hash, TEXT, nullable=false)
plan:                  │   ├─ Column(tree_hash, TEXT, nullable=false)
plan:                  │   ├─ Column(tree_entry_mode, TEXT, nullable=false)
plan:                  │   ├─ Column(blob_content, BLOB, nullable=false)
plan:                  │   ├─ Column(blob_size, INT64, nullable=false)
plan:                  │   ├─ Column(repository_id, TEXT, nullable=false)
plan:                  │   ├─ Column(commit_hash, TEXT, nullable=false)
plan:                  │   ├─ Column(file_path, TEXT, nullable=false)
plan:                  │   ├─ Column(blob_hash, TEXT, nullable=false)
plan:                  │   ├─ Column(tree_hash, TEXT, nullable=false)
plan:                  │   ├─ Column(repository_id, TEXT, nullable=false)
plan:                  │   ├─ Column(commit_hash, TEXT, nullable=false)
plan:                  │   ├─ Column(ref_name, TEXT, nullable=false)
plan:                  │   └─ Column(history_index, INT64, nullable=false)
plan:                  └─ Filters
plan:                      ├─ files.repository_id = commit_files.repository_id
plan:                      ├─ files.file_path = commit_files.file_path
plan:                      ├─ files.blob_hash = commit_files.blob_hash
plan:                      └─ files.tree_hash = commit_files.tree_hash

At least, AND ref_commits.history_index = 0 and AND ref_commits.ref_name REGEXP "^refs/heads/HEAD/" should be pushed down.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants