Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

first pass at better handling of special duckdb syntax #6

Merged
merged 8 commits into from
Oct 4, 2023

Conversation

dylanscott
Copy link

SUP-1069 this PR pulls in hex-inc/sqlparse#5 and maps these new keywords to the SELECT query type since they are all variations of same.

@github-actions
Copy link

github-actions bot commented Oct 3, 2023

Pull Request Test Coverage Report for Build 6388458559

  • 12 of 12 (100.0%) changed or added relevant lines in 1 file are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage remained the same at 100.0%

Totals Coverage Status
Change from base Build 6354706848: 0.0%
Covered Lines: 888
Relevant Lines: 888

💛 - Coveralls

Comment on lines 264 to +267
and self.last_keyword_normalized in TABLE_ADJUSTMENT_KEYWORDS
and self.previous_token.normalized not in ["AS", "WITH"]
and self.normalized not in ["AS", "SELECT", "IF", "SET", "WITH"]
and not self.last_keyword_pivot_operator
Copy link
Author

@dylanscott dylanscott Oct 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something that I had encountered before in my playing around in the ast-service and which is on display in this PR is that the table inference logic provided by sql_metadata is definitely not bulletproof. I had encountered some test cases where it would tack on some extra entries in the extracted tables, and the pivot syntax appears to be particularly prone to this. I will comment inline elsewhere but there are some telling test assertions where we just check that a table is in parser.tables instead of asserting the full contents of parser.table because it contains incorrect entries.

I think that is mostly down to this Token.is_potential_table_name function being pretty broad. However, of all the functions in here it is the one I am most terrified of making changes to for fear of breaking some obscure but legitimate cases where an identifier is a table reference. The most I dared change was adding this scoping which I will elaborate on inline below. This feels like the right tradeoff - we care much more about false negatives than false positives here, as false positives will generally be harmless as they won't match the names of variables in the project, and worst case would just add some unnecessary reactive edges.

Comment on lines +5 to +13
source = """
SELECT *
FROM monthly_sales
PIVOT(SUM(amount) FOR MONTH IN ('JAN', 'FEB', 'MAR', 'APR'))
AS p
ORDER BY EMPID
"""
parser = Parser(source)
assert ["monthly_sales"] == parser.tables
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the only test case that is affected by the changes I made in token.py and in fact it is not correcting regressed behavior due to the change in handling of the pivot keyword. I checked and it was parsing the same before those changes. Without the last_keyword_pivot_operator check this would also pull out SUM and amount as tables.

Comment on lines +19 to +20
parser = Parser("select * from pivot join other using (id)")
assert "pivot" in parser.tables and "other" in parser.tables
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is unfortunately getting confused by the new handling of pivot and as a result interprets join as a potential table name as well.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems fine, hopefully a pretty rare case!

Copy link

@jkillian jkillian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

Comment on lines +19 to +20
parser = Parser("select * from pivot join other using (id)")
assert "pivot" in parser.tables and "other" in parser.tables
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems fine, hopefully a pretty rare case!

@dylanscott dylanscott merged commit dfea863 into master Oct 4, 2023
@dylanscott dylanscott deleted the dscott/duckdb-syntax branch October 4, 2023 20:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants